Studies of health effects of long-term air pollution have typically assigned exposures to individuals using measurements made at a few fixed-site monitoring stations.^{1,2} This approach was based on the assumption that the spatially averaged ambient air pollution concentrations measured at monitoring sites in an administrative area (such as a community or a metropolitan area) is representative of the exposure of individuals residing at various locations in the area. However, such exposure data do not capture all the individual spatial heterogeneity in exposure and may result in less accurate or reliable health effect estimates.

To improve exposure assessment, recent studies have predicted individual air pollution exposure instead of using an average area-wide monitored concentration. One prediction approach assigns exposure based on the monitor nearest to the participant's residential location.^{3–6} Another approach is to interpolate predictions by applying a geostatistical method such as kriging.^{7–10} Kriging predicts individual concentrations corresponding to residential locations after estimating the parameters of a model of the spatial surface of air pollution concentration. Although both methods have been used, little is known about how these prediction methods might influence health effect estimation. We conducted a simulation study to investigate the impact on the health effect estimate of predicted exposure from these 2 exposure prediction approaches, using assumptions derived from an analysis of annual average concentration of particulate matter less than or equal to 2.5 μm in aerodynamic diameter (PM_{2.5}) and a previous analysis of the effect of particulate matter on cardiovascular events.^{4}

#### METHODS

##### Data and Assumptions

##### Population

We selected 2000 residential locations from the Los Angeles (LA)-Long Beach-Santa-Ana urbanized area of California (LA urbanized area) as defined by the US census classification (eFigure, http://links.lww.com/A980). Locations were randomly selected using a uniform distribution stratified by census tract to obtain up to 3 residential locations per tract, with the number of individual sites proportional to the population size. To obtain 10,000 subjects, 5 subjects were assumed to live around each residential site and to have identical PM_{2.5} exposure.

##### PM_{2.5} Data and Analysis

The 22 PM_{2.5} monitoring stations used in the simulation were located in counties (Los Angeles, Orange, Riverside, San Bernardino, and Ventura) surrounding the LA urbanized area. These stations were nearly identical to those used by Jerrett et al.^{8} Using annual average concentrations from 2002, we estimated parameters for a spatial model of PM_{2.5} by fitting mean and covariance models. We fit a covariance model to the empirical semi-variogram, half the squared differences of PM_{2.5} against distance. The covariance model has a specific functional form with 3 parameters: the range, partial sill, and nugget. The range is the distance at which the semi-variogram curve reaches the maximum variability. The total variance, named sill, is partitioned into the partial sill and nugget. The partial sill is the component of variance due to spatial variability, whereas the nugget is considered measurement error. The mean model is a regression model with covariates assumed to affect PM_{2.5}. In this analysis, we considered only longitude (X) and latitude (Y) as possible covariates. After exploration of several covariance and mean models, the best-fitting model, chosen by cross-validation, was a spherical covariance model with a second-order polynomial mean model, using the XY coordinates as predictors. This model is referred to as TEM 1 in the tables and figures. It is similar to the one used in previous studies in the LA area.^{8,9} For sensitivity analysis, we also estimated spatial parameters from a constant mean model (TEM 4).

##### Disease

We used an exponential relative risk (RR) model for time to first cardiovascular event and used the relative risk, baseline rate, and drop-out patterns consistent with a previous study of postmenopausal women in the Women's Health Initiative cohort.^{4} Miller et al^{4} reported 2113 cardiovascular and cerebrovascular events over a median follow-up of 6 years among 65,893 women and estimated overall hazard ratio of 1.24 for a 10 μg/m^{3} increase in long-term average PM_{2.5} concentration. In the simulation, we assumed a baseline incidence rate of 0.032 per year and a β coefficient of 0.0215 (=log(1.24)/10) for the relative risk parameter.

##### Simulation

Given the residence locations and model parameters, we generated true long-term PM_{2.5} exposures for 2000 residences and 22 monitor locations, and 10,000 outcomes (times to cardiovascular event) in 1000 simulations. We made the simplifying assumption that PM_{2.5} concentration equals PM_{2.5} exposure. Using only monitoring data, we then predicted PM_{2.5} exposure by 2 methods: nearest monitor and kriging. The effect of PM_{2.5} on cardiovascular event incidence was estimated conditional on predicted or true exposure in a Cox proportional hazards regression analysis. Across all simulations, locations were fixed for the 22 monitors (real locations) and 2000 residences; the true exposures and health outcomes were generated in every simulation.

##### True Models

##### Underlying Exposure Model

We assumed that exposure to PM_{2.5} followed a multivariate normal distribution with variability consisting of spatial and nonspatial components parameterized by the partial sill (*σ*^{2}), range (*φ*), and nugget (*τ*^{2}) in the spherical covariance model. True exposure (*W*) at location *si* was generated as

The true exposure was composed of the mean (*μ*(*si*)) and 2 error terms, the spatial (*v*(*si*)) and nonspatial errors (*ϵ*(*si*)). Variances of these errors are the partial sill (*σ*^{2}) and nugget (*τ*^{2}), respectively. Correlation of spatial errors is a function of the range (*φ*).^{11} To represent different spatial correlation scenarios, we focused primarily on 6 true exposure models with different mean and covariance parameters (Table 1). Parameters for 2 models were estimated from analyses of PM_{2.5} data in LA (section PM_{2.5} Data and Analysis). In the first true exposure model (TEM 1), most of the variability of PM_{2.5} was dominated by the geographic covariates. The mean was determined by a second-order polynomial function of XY coordinates, (*μ*(*si*) = (–46771) + (6.69 * *X*) + (24.07**Y*) + (–0.09 * *X*^{2}) + (–0.31 * *Y*^{2}) + (–0.0016 * *X* * *Y*)), while the partial sill, range, and nugget were relatively small (*σ*^{2} = 5.81, *φ* = 44 km, *τ*^{2} = 0.43). The other model (TEM 4) had a constant mean (*μ*(*si*) = 18.35 μg/m^{3}) so all the spatial dependence was parameterized in the covariance model. Thus, the partial sill and range were larger (*σ*^{2} = 46.9, *φ* = 120 km) and there was no nugget (*τ*^{2} = 0). To explore sensitivity to the exposure model structure, arbitrary parameter values were chosen for 4 other models (TEM 2, TEM 3, TEM 5, and TEM 6). These models had the same constant mean and partial sill as the TEM 4 but different range parameters to represent a range of spatial correlations from none to high. Figure 1 shows examples of each exposure surface; the spatial surface is smoother in the first, fourth, fifth, and sixth models with large spatial dependence from the variogram or the second-order mean models. To better understand how modeled exposure affected health effect estimates, we conducted a sensitivity analysis by simulating an additional set of 40 true exposure models. We assigned a constant mean to half and a polynomial mean to the other half (mean identical to the first true exposure model). Each of the 20 pairs had range parameters (φ) between 10 and 500 km, partial sill parameters (*σ*^{2}) between 0.1 and 90, and no nugget (*τ*^{2} = 0).

Table 1 Image Tools |
Figure 1 Image Tools |

##### Underlying Disease Model

Survival time to cardiovascular event occurrence (*T*) followed an exponential distribution with mean,

,ie,

Censoring was based on 2 mechanisms: cardiovascular disease-free survival for 10 years (study duration) and loss to follow-up (drop-out). Ten percent of all subjects dropped out in the first 10 years with time determined by the uniform distribution so the probability of drop-out was constant over all study years.

##### Fitted Models

##### Predicted Exposure Model

For each of the 6 exposure models, 2 versions of predicted PM_{2.5} were estimated from the simulated (“measured”) PM_{2.5} concentration at the 22 monitor locations. Nearest-monitor exposures (*W**) assigned the measured PM_{2.5} at the closest monitor to each individual residential location. Kriged exposures (*W*^{**}) interpolated measured PM_{2.5} by universal kriging methods. Kriging prediction consists of 2 parts: estimating the spatial structure and interpolating the spatial surface. We estimated parameters from the monitor measurements under a semi-variogram with a spherical covariance and a second-order polynomial mean model. Given the estimated parameters, we predicted PM_{2.5} at individual residence locations.

##### Fitted Disease Model

The effect of PM_{2.5} on time to cardiovascular event was estimated by the Cox proportional hazards model, conditional on true (*W′* = *W*) or predicted exposure (*W′* = *W** or *W*^{**}),

##### Assessment of Validity and Reliability

To evaluate validity and reliability of health effect estimates, bias, variance, mean square error, and 95% coverage probability were estimated. Coverage probability is the proportion of 95% confidence intervals of health effect estimates that include the true *β*.

To gain insight into properties of health effect estimates based on properties of exposures and their prediction, we also investigated validity and reliability of predicted exposure by summarizing average prediction error (difference of predicted PM_{2.5} minus true PM_{2.5}), variance of true and predicted exposures at subject locations, mean square prediction error, and 1 minus mean square prediction error relative to true exposure variance (constrained to always be greater than or equal to 0), which is an estimate of *R*^{2}.

#### RESULTS

Summary statistics of true and predicted long-term PM_{2.5} exposure in the first simulation are shown in the eTable (http://links.lww.com/A980). Across the exposure models, the true PM_{2.5} exposure means for the 2000 residential sites were between 17.1 and 26.1 μg/m^{3}. Predicted PM_{2.5} had smaller standard deviations and ranges than true PM_{2.5}, particularly for kriging prediction. Relative to the variation in the true exposure data, kriged PM_{2.5} varied much less in the second and third true exposure models (TEM 2 and TEM 3), which assumed little spatial correlation. A more detailed picture emerges from the true versus predicted exposure scatter plots displayed in Figure 2. Predicted PM_{2.5} was more correlated with true PM_{2.5} when the true exposure models had more spatial structure. Although models 1 and 4 (TEM 1 and TEM 4) were based on different fits to the same data, model 1 predictions were more correlated with true exposures.

Table 2 gives summary statistics for the quality of predictions over all simulations. Across the models, kriged PM_{2.5} had smaller average prediction error and variance than nearest-monitor PM_{2.5}. Mean square prediction error was smaller for kriging as well, except for model 4. Likewise, estimated *R*^{2} was generally higher for kriged predictions.

Table 2 also summarizes qualities of health effect estimates. Kriged PM_{2.5} health effect estimates were less biased and more variable than health effects from nearest-monitor PM_{2.5}. Estimates with the highest bias were from exposures with the least amount of spatial structure. However, because variance dominated bias, nearest-monitor predictions had smaller mean square error than kriging predictions. This trend was consistent across all 6 exposure models. Moreover, the same pattern was found for higher (RR = 1.34) and lower (RR = 1.14) true effects (results not shown). Coverage for kriged exposure was generally better than for nearest monitor, but was mostly less than the target 95%. With respect to the relationship between predicted exposure and health effect estimates, spatial exposure structures with higher estimated *R*^{2} produced higher coverage of health effect estimates.

Figure 3 shows scatter plots of the relationship between health effect estimates. Predictions from the first exposure model (TEM 1) gave the best health effect estimates from kriged exposure. Similarly, the fourth, fifth, and sixth exposure models with larger spatial correlation (TEM 4, TEM 5, and TEM 6) produced a stronger association between health effect estimates for true and predicted exposures than the exposure models with no or little spatial dependence (TEM 2 and TEM 3). Nearest-monitor prediction underestimated true effects on average, whereas kriging only underestimated on average when there was little spatial correlation in the exposure. Note that because there are estimates in the second and fourth quadrants of the nearest compared with kriged exposure scatter plots, there are multiple realizations for each exposure model where the kriging and nearest-monitor prediction approaches would lead to health effect estimates with opposite signs.

Figure 4 shows a sensitivity analysis of the relationship between spatial correlation (represented by the range) and health effect estimate properties (mean square error and coverage) across exposure models with different covariance parameters and mean structure. The models that produced the best health effect estimates (as defined by good coverage and small mean square error) had large range parameters relative to the partial sill. As the range increased, coverage increased toward 95% and mean square error decreased. Moreover, the difference between prediction approaches for both coverage and mean square error became smaller. Addition of mean structure had a dramatic effect on mean square error but relatively a small impact on coverage. In particular, with mean structure in the model there was only a small difference in mean square error between kriging and nearest-monitor exposures for all range parameters.

#### DISCUSSION

We explored 2 exposure prediction approaches in a simulation study to estimate the effect of long-term PM_{2.5} exposure on cardiovascular events. Kriging prediction estimated more accurate exposure because it had smaller average mean square prediction error. However, kriged exposure predictions were also less variable, which has implications for health effect estimates. Overall, health effect estimates were better with kriged exposure when comparing results based on coverage probability. In contrast, for mean square error kriging generally performed worse than nearest-monitor prediction due to greater variance in the health effect estimates resulting from smaller exposure variance. Differences between the 2 approaches diminished in more spatially dependent exposure structures where the dependence is represented by a longer range or the polynomial mean model.

Health effect estimates for both exposure prediction approaches had better properties as spatial correlation in the covariance model increased and a spatially varying mean structure was included. Coverage converged to 95% and mean square error decreased as the range became larger. This improvement was consistent across different partial sill parameters (Fig. 4). In addition, when the spatially varying mean model was included, coverage improved slightly, whereas mean square error reduced more dramatically. In particular, mean square error of kriging prediction dramatically declined and even became smaller than that of nearest-monitor prediction.

The mean square error and coverage criteria led to generally different conclusions about the preference of nearest-monitor versus kriging predictions. Coverage is a very important property of health effect estimates because interval estimates are the basis for inference. Coverage of 95% means that the 95% confidence interval estimated in an observational study has the correct interpretation, that is, that the interval has a 95% probability of covering the true effect. Although nearest-monitor prediction had generally smaller estimated mean square error, this was due to less variable but mostly biased effect estimates that produced generally smaller coverage below 95%. When we examined the expanded set of exposure models with different range, partial sill, and mean parameters, all models except 5 showed better coverage for kriged compared with nearest-monitor exposures. Four models among them had medium range (120 km) and large partial sill (46.9 and 90), with and without mean structure. The coverages of the health effect estimates of nearest monitor were slightly better (by 1%–4%). In the remaining models, coverage in kriging was much better (> 50%) when spatial correlation was small with the range of 10 km and became slightly better (by 1%–3%) as spatial correlation increased up to the range of 500 km (Fig. 4).

Health effect estimates from the 2 prediction methods showed features typically attributed to classic and Berkson measurement errors depending on the spatial structure. Nearest-monitor health effect estimates were always attenuated, as seen from the lower fitted regression line for nearest monitor compared with true exposure (Fig. 3). Attenuation was also present for kriged exposure with no spatial correlation or little range (TEM 2 and TEM 3). These underestimated effects behave like attenuation due to classic measurement error. In contrast, health estimates from kriged predictions of exposure models with strong spatial structure showed little or no attenuation but the estimates were more variable than for true exposures. This is behavior typically attributed to a Berkson measurement error model, although in this application with a spatially correlated surface the error structure is not independent and identically distributed so the Berkson analogy is only approximate. Because we also showed kriged exposures in models with poor spatial structure gave health effect estimates that were both attenuated and more variable, our results suggest that there is additional deviation from the Berkson error structure due to uncertainty in estimating the prediction model parameters. Szpiro et al^{12} propose a method to correct the standard error estimates that accounts for this additional source of uncertainty. In future work, we plan to evaluate the performance of their methods for our examples, and we expect the corrected coverage probabilities will be closer to the target 95%.

There are other ways to incorporate prediction uncertainty in health analyses. In a 2-stage approach, exposure can be simulated to capture the full exposure distribution. Then multiple realizations of the simulated exposures can be used in health analyses and the estimates combined as with multiple imputation. However, the approach is not true multiple imputation because the outcome is not included in the simulation (imputation) step. Gryparis et al^{14} have shown that this “exposure simulation” approach leads to notable attenuation bias in the health effect estimate. A priori, many researchers would argue that the best approach would use a joint model for both the exposure and the disease outcome. It is notable that a few recent simulation studies have not achieved consistently good coverage for the health effect estimate obtained from a joint model, fit either with maximum likelihood^{14} or using a fully Bayesian model,^{13} contrary to expectations. In addition, joint models allow feedback between health and exposure, and this feedback could influence inference.^{15} It is a defensible scientific choice to keep the exposure modeling separate from the health effect estimation, thus ensuring there is no feedback between the health and exposure models. Tools are being developed to correct second-stage health effect estimates for exposure prediction uncertainty.^{12}

One of the features of our approach is that we considered a monitoring network and exposure models based on existing data to ensure the relevance of our results. Health effect estimates from exposure models derived from analysis of LA data (TEM 1 and TEM 4) showed good properties. Although the Los Angeles area has a great deal of spatial variability of PM_{2.5} and a relatively large number of monitors compared with other areas, the monitoring network is still relatively sparse from a spatial statistics perspective. Thus, it may be challenging in practice to model the spatial structure. More work is needed to determine the best approaches to spatial prediction for air pollution exposure data. As a sensitivity analysis for one aspect of this question, we looked at the impact of monitor density on our simulation results. First we increased the number of monitors by 5 up to 42 in the first true exposure model (TEM 1). Locations of additional monitors were uniformly distributed within the area covered by the existing 22 monitors. The denser network did produce marginally better exposure predictions and health effect estimates. With 20 additional monitors, mean square error for health effect estimates decreased by 16% and 15% in nearest-monitor and kriging predictions, respectively. However, the coverage probability was almost identical to that of the original number of monitors. We also explored the influence of a sparser monitoring network, a more frequent occurrence in air pollution epidemiology, by reducing the number of monitors by 5 down to 12 in total. With only 12 monitors (10 less than original monitoring network), our health effect estimates yielded similar coverage probability estimates and a small increase of mean square error estimates for kriged exposures. In contrast, nearest monitor prediction performed much worse. Mean square error increased by over 50%. Coverage probability dropped by about 20%. This sensitivity analysis suggests that a large number of monitors may not be necessary for kriging prediction in a health effect analysis, at least when there is adequate spatial variability and the form of the spatial model is not in question. Health effect estimates were more sensitive to monitor reduction when individual exposure was predicted using nearest monitor.

We observed that for the kriged exposure, a small percentage of health effect estimates resulted in large outliers (omitted from Fig. 3). In further evaluation of those simulations, we found the exposure model covariance parameter estimates were unreasonably large. Under the spatial structure with very small range, poorly estimated covariance parameters induced the most extreme outlier occurrences. As part of our analysis, we examined both universal and ordinary kriging fitting procedures for all exposure models. We found that estimation using ordinary kriging (ie, with a constant mean) produced consistently poorer covariance parameter estimates across all exposure models. The universal kriging approach gave more stable parameter estimates even when there was no underlying mean structure in the true exposure model. Although it is not practical to investigate a large number of mean and covariance models in a simulation study, further observational research should be done to determine whether other spatial statistical models or fitting algorithms would produce more reasonable parameter estimates in such cases.

To focus this research on features of spatially dependent exposure and prediction that affect health effect parameter estimates, we made some simplifying assumptions. Most important, we assumed ambient air pollution concentration was identical to ambient source exposure. This assumption does not reflect the difference of exposure from concentration due to time activity and local sources or the attenuation of ambient exposure by infiltration. Future research will need to incorporate these exposure features. We also assumed the multivariate normal distribution for the underlying true exposure. In sensitivity analyses, we relaxed this assumption and introduced skewness in the exposure data by changing the underlying distribution to a log-normal. In the few cases we examined, the pattern of health effect estimates remained and kriging still performed better than nearest-monitor prediction. Finally, we assumed there was no additional spatial structure in the disease model. There may be spatial dependence in the health outcome in addition to the spatial structure induced from the exposure distribution. Work is needed to incorporate spatial dependence in the disease model and determine how separable the 2 sources of spatial structure may be in the health analysis.

Although the focus of this paper is specific to the field of air pollution epidemiology, this research is broadly relevant to inference in epidemiologic studies in the presence of limited exposure data. We present a problem in which there are no exposure measurements for study subjects but one in which it is possible to use an external dataset to develop an exposure prediction model to plug into the health effect analysis. We evaluate the merits of 2 exposure prediction approaches directly in terms of the parameter of interest in the epidemiologic study. We learned that it is not sufficient to assess the accuracy and precision of the exposure predictions when the target of inference is the health effect parameter in the disease model. We showed that the bias and precision of the resulting health effect estimates depend on the approach to exposure prediction and the features of the underlying exposure distribution. Although the details of our results are specific to the air pollution epidemiology context, our general approach and overarching conclusions are relevant to other epidemiologic study settings.

In summary, based on the generally better coverage and lower bias properties of the health effect parameter estimate of interest, we conclude that kriging exposure prediction is preferable to nearest-monitor prediction for estimation of long-term air pollution health effects, particularly when the underlying true exposure is less spatially correlated. Both prediction approaches perform well when there is good spatial structure in the underlying exposure distribution.

#### ACKNOWLEDGMENT

We thank Kristin A. Miller for her constructive suggestions and comments.