In the past 2 decades, there has been a wealth of epidemiological research on the health effects of air pollution.^{1–3} Studies have reported important associations between short-term and long-term exposure to ambient levels of air pollution and a wide range of adverse health outcomes.

Air pollution measurements are usually obtained from fixed monitoring locations, while data on health outcomes are generally available at the individual level with geocoded addresses or as aggregated counts within a prespecified geographical region. The common approach to integrate these 2 types of data is to develop a statistical model for predicting levels of air pollution at places where the health outcomes are available.

Various methods can be used to predict missing air pollution values, including nearest-neighbor and kriging approaches.^{4},^{5} Recently, land-use regression has garnered much attention because of its ability to capture local variation in exposure by incorporating land-use (geographic) covariates into the prediction model. Hoek et al^{6} provide a review of land-use regression models, and others^{7–14} have applied this methodology in epidemiological studies.

Another common issue in studies of air pollution and health is confounding,^{15} which arises due to the complex dependencies that exist among air pollution, the health outcome of interest, and other covariates. Researchers use expert knowledge in an attempt to control confounding through the use of covariates associated with both the exposure and the outcome. Great care is taken to minimize the magnitude of bias in the health-effect estimate, although it is unlikely that the bias can be completely removed. We use the term confounder here to define a covariate that is associated with the exposure, associated with the outcome independently of the exposure, and not on the causal pathway between the 2.

Sheppard et al^{15} provide a discussion of both confounding and exposure measurement error in air pollution epidemiology and point out that exposure assessment should be evaluated in the context of health-effect estimation. With effect estimation in mind, it is known that (1) better exposure prediction (ie, smaller prediction error) does not necessarily lead to smaller mean squared error^{16} of the health-effect estimate; and (2) confounding can lead to biased effect estimation.^{17} However, the current literature treats confounding and exposure prediction as separate statistical issues. That is, methods that account for measurement error in the predicted exposure often fail to acknowledge the possibility of confounding, whereas methods designed to control confounding often fail to acknowledge that the exposure has been predicted.

We simultaneously consider exposure prediction and confounding adjustment in a health-effects regression model. Based on theoretical arguments, we show that using different sets of covariates in an exposure prediction model and in a health-effects regression model can bias the health-effect estimate. We provide a simulation study that illustrates this concept in the context of a cohort study on the association between long-term exposure to PM_{2.5} and cardiovascular disease. We show that better prediction (higher *R*^{2}) does not always imply better effect estimation (smaller bias). Our results suggest that exposure prediction and confounding adjustment need to be considered simultaneously. We show that even under a correctly specified health-effects regression model that includes all confounders, the use of a predicted exposure can bias the health-effect estimate unless all the confounders included in the health-effects regression model are also included in the exposure prediction model.

The concepts described in this article, although motivated by epidemiological studies of air pollution and health, are broadly applicable to any context in which an exposure is predicted with covariates that might also be confounders of the exposure-response relationship. In the discussion, we provide examples of the broader applicability of our results.

## CONFOUNDING BIAS DUE TO EXPOSURE PREDICTION

Let **C**_{i} be a set of covariates for the *i*th observation, and assume that the outcome *Y*_{i} and the exposure *X*_{i} are generated under the following linear models:

*Y*_{i} = *β*_{0} + *βX*_{i} + **C**_{i}*γ* + *ε*_{y,i}  (1)

*X*_{i} = *α*_{0} + **C**_{i}*α* + *ε*_{x,i}  (2)

where *ε*_{y,i} and *ε*_{x,i} are independent, normally distributed, mean-zero error terms with variances *σ*^{2}_{y} and *σ*^{2}_{x}. The true exposure *X*_{i} is assumed to be unobserved for all observations; therefore, an exposure prediction model is necessary.
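As a concrete illustration, the data-generating mechanism in Equations 1 and 2 can be simulated directly. This is a minimal sketch with a single covariate and arbitrary illustrative parameter values (none are taken from the article):

```python
import numpy as np

# Minimal sketch of the data-generating mechanism in Equations 1 and 2.
# All parameter values (alpha0, alpha, beta0, beta, gamma) are arbitrary
# illustrative choices, not values from the article.
rng = np.random.default_rng(0)
n = 20000

C = rng.normal(size=n)                    # a single confounder C_i

alpha0, alpha = 1.0, 0.8                  # Equation 2: X_i = alpha0 + C_i*alpha + eps_x
X = alpha0 + alpha * C + rng.normal(size=n)

beta0, beta, gamma = 0.5, 0.3, 0.6        # Equation 1: Y_i = beta0 + beta*X_i + C_i*gamma + eps_y
Y = beta0 + beta * X + gamma * C + rng.normal(size=n)

# With X observed, the OLS fit of Y on (1, X, C) recovers beta
design = np.column_stack([np.ones(n), X, C])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
```

With the true exposure observed and the confounder adjusted for, the regression recovers *β*; the remainder of the article concerns what happens when *X* must be predicted.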

Suppose interest lies in the estimation of the linear exposure-outcome relationship *β*, conditional on the covariates **C**_{i}. Here, and throughout, no restriction is placed on *γ* or *α*, and individual components of the vectors are free to be 0.

The difficulty in estimating *β* in this setting is 2-fold: (1) in a practical application, we do not know the exact set of covariates **C** necessary for confounding adjustment; and (2) the exposure is not directly observed and must be predicted. Current literature fails to acknowledge these 2 important concepts simultaneously, and as such, researchers may be misguided when faced with these issues. Our results provide a starting point for understanding the complex relationship between exposure prediction and confounding adjustment and offer some basic guidance as to what can be expected in practice. We briefly present a few results here; for a fuller discussion and mathematical derivation, see the eAppendix (http://links.lww.com/EDE/A784).

First, imagine an admittedly unrealistic situation in which no confounding adjustment is needed (*γ*_{j} = 0 or *α*_{j} = 0 for all *j* ≥ 1). In such a situation, the bias of a health-effect estimate falls fully within the vast measurement error literature^{18–21} and will therefore not be discussed here. Similarly, consider the case where the exposure is fully observed but confounding adjustment is necessary (*γ*_{j} ≠ 0 and *α*_{j} ≠ 0 for some *j* ≥ 1). In this situation, one can rely exclusively on the large literature on confounding adjustment for standard regression modeling,^{22–24} so this case will likewise not be discussed here.

We address simultaneous consideration of these concepts—that is, when exposure prediction and confounding adjustment are both necessary, and when a potentially large set of covariates are available for confounding adjustment and exposure prediction. Using overlapping sets of covariates to predict exposure and adjust for confounding can lead to biased health-effect estimates. Intuitively, this can be expected; in this situation, the predicted exposure may be more correlated with the confounders than with the true exposure.

When the goal of a study is health-effect estimation, the decision to include a potential confounder in either the exposure prediction model or the health-effects regression model needs to be based on more than just the predictive power of the potential confounder on the exposure or the strength of the relationship with the outcome. Rather, the decision needs to be based on some trade-off between the 2. To emphasize this concept, we will now discuss special cases where exposure prediction is either beneficial or detrimental to the goal of health-effect estimation under the common situation where confounding adjustment is necessary.

### Confounding Bias Inflation Due to Exposure Prediction Under Outcome Model Misspecification

Consider an oversimplified scenario in which all confounders are used to predict the exposure and no confounding adjustment is made in the health-effects regression model. Note that under the true models specified in Equations 1 and 2, this corresponds to a correct specification of the exposure prediction model but an incorrect specification of the health-effects regression model. While this would rarely occur in practice, it serves to illustrate that exposure prediction can increase the magnitude of confounding bias.

More specifically, assume that the true data-generating mechanism is given by Equations 1 and 2. We refer to confounding bias as the bias in the health-effect estimate from a health-effects regression model that fails to control for any confounding (*Y*_{i} = *δ*_{0} + *δX*_{i} + *ε*_{i}). Let *W*_{i} = *α*_{0} + **C**_{i}*α* be the predicted exposure, with *α*_{0} and *α* known from Equation 2. Consider fitting the health-effects regression model that uses the predicted exposure *W*_{i} in place of the true exposure *X*_{i} and fails to control for any confounding (*Y*_{i} = *δ*_{0} + *δW*_{i} + *ε*_{i}). It can be shown that the bias of a health-effect estimate using the predicted exposure is always larger in magnitude than the confounding bias when using the true exposure. In this specific context, using a predicted exposure in a health-effects regression model will always lead to a more biased health-effect estimate than using the true exposure, and the degree to which the predicted exposure increases the bias is determined solely by the prediction accuracy of the exposure prediction model. Therefore, the bias in the health-effect estimate can be large even when the confounding bias is small, if the correctly specified exposure prediction model has poor predictive power. The full analytical expression, with discussion, can be found in the eAppendix (http://links.lww.com/EDE/A784).
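The bias-inflation result can be checked numerically. The sketch below (illustrative parameters, single confounder) compares the unadjusted estimate based on the true exposure with the unadjusted estimate based on the predicted exposure *W*_{i} = *α*_{0} + **C**_{i}*α*; analytically, the bias using *W* is *γ*/*α*, which always exceeds the confounding bias in magnitude:

```python
import numpy as np

# Sketch of the bias-inflation result; parameter values are illustrative.
rng = np.random.default_rng(1)
n = 20000
beta, gamma, alpha0, alpha = 0.3, 0.6, 1.0, 0.8

C = rng.normal(size=n)
X = alpha0 + alpha * C + rng.normal(size=n)          # true exposure, Equation 2
Y = 0.5 + beta * X + gamma * C + rng.normal(size=n)  # outcome, Equation 1
W = alpha0 + alpha * C                               # predicted exposure, coefficients known

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0][1]

bias_true = slope(X, Y) - beta  # confounding bias with the true exposure
bias_pred = slope(W, Y) - beta  # bias with the predicted exposure; equals gamma/alpha here
```

In this sketch the bias with the predicted exposure is larger in magnitude than the confounding bias with the true exposure, no matter how accurate the (correctly specified) prediction model is.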

Such a result occurs because the correlation between the predicted exposure and the confounders that are missing from the health-effects regression model may be far larger than the correlation between the predicted exposure and the true exposure. Similar confounding bias inflation occurs under any health-effects regression model that fails to adequately adjust for confounding when using the correctly specified exposure prediction model.

As we will show in the subsequent sections, this relationship does not generalize to more complex settings. The bias of the health-effect estimate can either increase or decrease in magnitude if a subset of the confounders are used in the exposure prediction model (ie, exposure prediction model misspecification). The bias depends on both the set of covariates used for exposure prediction and the set of covariates used for confounding adjustment in the health-effects regression model.

### Confounding Bias Due to Exposure Prediction Under Exposure Prediction Model Misspecification

In this section, we assume that confounding can be completely adjusted for in the health-effects regression model. We show that, to return an unbiased health-effect estimate, (1) the exposure prediction model must be correctly specified, (2) all confounders included in the health-effects regression model must be included in the exposure prediction model, or (3) the covariates used in the exposure prediction model must be uncorrelated with the confounders.

Assume that the data are generated under the following linear models:

*Y*_{i} = *β*_{0} + *βX*_{i} + **C**^{(1)}_{i}*γ*_{1} + *ε*_{y,i}  (3)

*X*_{i} = *α*_{0} + **C**^{(1)}_{i}*α*_{1} + **C**^{(2)}_{i}*α*_{2} + **C**^{(3)}_{i}*α*_{3} + *ε*_{x,i}  (4)

where **C**^{(1)}, **C**^{(2)}, and **C**^{(3)} denote subsets of **C**. Note that in this data-generating scheme, there is only partial overlap in the sets of covariates in the 2 models and that the necessary set of confounders is **C**^{(1)}.

Assume that **C**^{(1)} is fully observed so that we can use a health-effects regression model that correctly adjusts for confounding by including **C**^{(1)} as a linear predictor. Under this correctly specified health-effects regression model, using a predicted exposure will bias the effect estimate unless (1) the exposure prediction model is correctly specified, (2) the covariates used in the exposure prediction are uncorrelated with the confounders **C**^{(1)}, or (3) all the confounders **C**^{(1)} are included in the exposure prediction model.

More specifically, assume that no data are collected on the set **C**^{(3)} and that *α*_{3} ≠ 0, so that we are guaranteed misspecification of our exposure prediction model. Consider the health-effects regression model that correctly adjusts for confounding but uses a predicted exposure based only on **C**^{(2)} in place of the true exposure (*Y*_{i} = *δ*_{0} + *δW*_{i} + **C**^{(1)}_{i}*δ*_{1} + *ε*_{i}, with *W*_{i} predicted from **C**^{(2)} alone). This exposure prediction model purposely excludes the confounders **C**^{(1)}. Under such a procedure, the health-effect estimate will be biased unless the covariates **C**^{(2)} used to predict the exposure are uncorrelated with the confounders **C**^{(1)} (result in eAppendix, http://links.lww.com/EDE/A784). A related result on estimating the health effect in the presence of unmeasured confounding will be discussed below.

In most applications, the set of covariates used to predict the exposure will not be uncorrelated with the confounders, and excluding the confounders from the exposure prediction model will bias the health-effect estimate. However, an unbiased health-effect estimate can be obtained if the exposure prediction model is based on both **C**^{(1)} and **C**^{(2)} (result in eAppendix, http://links.lww.com/EDE/A784). In other words, by including the confounders in the exposure prediction model, the health-effects regression model that properly adjusts for confounding will remain unbiased. This result may run counter to intuition at first glance but is easily explained: by including the confounders **C**^{(1)} in the exposure prediction model, we guarantee that the variation in the predicted exposure that is used to estimate the health effect is uncorrelated with the confounders. For a mathematical discussion, see the eAppendix (http://links.lww.com/EDE/A784).
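A small simulation illustrates this result. In the sketch below (illustrative parameters; `C2` is a hypothetical covariate correlated with the confounder `C1`), the health model always adjusts for *C*_{1}; the estimate is biased when the exposure prediction model excludes *C*_{1} and approximately unbiased when it includes *C*_{1}:

```python
import numpy as np

# Sketch: excluding the confounder C1 from the exposure prediction model
# biases the health-effect estimate even though the health model adjusts
# for C1; including C1 in the prediction model removes the bias.
# All parameter values are illustrative.
rng = np.random.default_rng(2)
n_train, n = 2000, 20000                 # "monitored" and "unmonitored" observations
beta, gamma1 = 0.3, 0.6
alpha1, alpha2 = 1.0, 0.5

def simulate(m):
    C1 = rng.normal(size=m)
    C2 = 0.6 * C1 + rng.normal(size=m)   # C2 correlated with the confounder C1
    X = 1.0 + alpha1 * C1 + alpha2 * C2 + rng.normal(size=m)
    Y = 0.5 + beta * X + gamma1 * C1 + rng.normal(size=m)
    return C1, C2, X, Y

def fit(A, y):
    """OLS coefficients of y on an intercept plus the columns in list A."""
    return np.linalg.lstsq(np.column_stack([np.ones(len(y))] + A), y, rcond=None)[0]

C1t, C2t, Xt, _ = simulate(n_train)      # monitored: X observed, used to fit prediction models
C1, C2, _, Y = simulate(n)               # unmonitored: X missing, Y observed

a_excl = fit([C2t], Xt)                  # exposure model excluding the confounder
W_excl = a_excl[0] + a_excl[1] * C2
a_incl = fit([C1t, C2t], Xt)             # exposure model including the confounder
W_incl = a_incl[0] + a_incl[1] * C1 + a_incl[2] * C2

beta_excl = fit([W_excl, C1], Y)[1]      # health model adjusts for C1 in both cases
beta_incl = fit([W_incl, C1], Y)[1]
```

Because `C2` is correlated with `C1`, the prediction built from `C2` alone carries confounder variation that the adjustment for *C*_{1} cannot remove; adding *C*_{1} to the prediction model restores the estimate.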

In general, any misspecification in the health-effects regression model or the exposure prediction model will lead to a biased health-effect estimate. The magnitude of these biases will be difficult to compare under various levels of misspecification. More work needs to be done to determine the best strategy for minimizing the bias under model misspecification. However, these preliminary results suggest that under a health-effects regression model that properly adjusts for confounding, one must include confounders in the exposure prediction model to avoid bias.

## EXPOSURE PREDICTION THAT REMOVES CONFOUNDING BIAS

In a simplified situation from the previous section, where the data are generated under Equations 3 and 4, assume that no data are collected on the set **C**^{(1)}, so that direct control of confounding is impossible. Furthermore, assume that *α*_{3} = 0 and that **C**^{(2)} is uncorrelated with the missing confounders **C**^{(1)}. If the predicted exposure is based solely on these covariates **C**^{(2)} that are both predictive of the exposure and uncorrelated with the unmeasured confounding, then the resulting health-effect estimate will be unbiased (result in eAppendix, http://links.lww.com/EDE/A784). This result can be viewed as an instrumental variable approach where **C**^{(2)} is used as an instrument for *X*.^{25},^{26}
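This instrumental-variable-style result can also be sketched numerically (illustrative parameters; `C1` plays the role of the unmeasured confounder and `C2` the role of an exposure predictor independent of it):

```python
import numpy as np

# Sketch of the instrumental-variable-style result: C1 is an *unmeasured*
# confounder, and C2 predicts the exposure while being independent of C1.
# Predicting the exposure from C2 alone removes the confounding bias.
# Parameter values are illustrative.
rng = np.random.default_rng(3)
n = 50000
beta, gamma1, alpha1, alpha2 = 0.3, 0.6, 1.0, 1.0

C1 = rng.normal(size=n)                  # unmeasured confounder
C2 = rng.normal(size=n)                  # independent of C1 (the alpha3 = 0 case)
X = 1.0 + alpha1 * C1 + alpha2 * C2 + rng.normal(size=n)
Y = 0.5 + beta * X + gamma1 * C1 + rng.normal(size=n)

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0][1]

b = slope(C2, X)          # prediction model X ~ C2 (in practice, fit on the monitored subset)
W = b * C2                # predicted exposure built only from C2

beta_naive = slope(X, Y)  # true exposure, confounding unadjusted: biased
beta_iv = slope(W, Y)     # predicted from C2 only: approximately unbiased
```

Because the variation in `W` comes entirely from `C2`, which is uncorrelated with the unmeasured confounder, regressing the outcome on `W` alone recovers *β* even though no confounding adjustment is possible.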

Such a result is applicable in the context of air pollution studies of health. Suppose that we believe there is spatial confounding in our study and are unable to control it because we do not have data at the desired level of spatial resolution. An unbiased health-effect estimate can be obtained by building an exposure prediction model that uses only those covariates that vary on a different spatial scale than the spatial confounding (or those covariates with little to no spatial structure). This is true because these covariates will be uncorrelated with the spatial confounding, and our previous result will hold.

## SIMULATION SETUP AND RESULTS IN AIR POLLUTION EPIDEMIOLOGY

We have provided theoretical evidence that an exposure prediction model chosen solely on its ability to predict the true exposure may lead to a biased health-effect estimate. We now provide a simulated example that mimics a real cohort study of air pollution and health to show that better prediction (higher *R*^{2}) does not imply better effect estimation (smaller bias).

Consider a hypothetical cohort study of the association between long-term exposure to PM_{2.5} and cardiovascular disease in the New England region. Assume we have cardiovascular hospitalization rates over the study period for each of the 2165 zip codes in New England, and we wish to have PM_{2.5} levels available for all 2165 zip codes. Of these, 57 have air pollution monitors within their boundaries, and the exposure for these zip codes can be measured directly as the mean monitor value during the study period. For the remaining 2108 zip codes, we assume the exposure values are missing and must be predicted.

Figure 1 provides a map of the 2165 zip codes in New England, with the 57 PM_{2.5} monitoring locations marked with an x. We observe that the PM_{2.5} monitors are sparse in New England and tend to cluster near major population centers. As such, the spatial heterogeneity in PM_{2.5} across New England will be difficult to capture based solely on spatial location (ie, latitude and longitude).

The intention of this simulation is to illustrate how the choice of covariates used in the PM_{2.5} prediction model will affect the estimated health effect under a misspecified health-effects regression model that does not fully account for confounding. We generate 1000 realizations of our hypothetical cohort in the following manner:

- Use the observed distribution of 9 land-use covariates for each zip code in New England. Table 1 provides a complete list and summary statistics for each land-use covariate considered.
- Augment the 9 land-use covariates with one *N*(0, 1) random variable, and denote the centered and standardized versions of these 10 covariates as **C**_{i}.
- Generate the exposure based on the relationship between the observed PM_{2.5} levels and **C**. That is, fit the exposure model *X*_{i} = *α*_{0} + **C**_{i}*α* + *ε*_{x,i} for the 57 zip codes that have observed PM_{2.5} measurements, and use the resulting estimates *α̂*_{0} = 10.86, *α̂* = (−0.51, −0.44, −0.38, 0.35, 0.22, −0.16, 0.16, 0.13, −0.05, 0), and *σ̂*_{x} (the estimated standard deviation of *ε*_{x}) to generate a simulated “true” exposure as *X*_{i} = *α̂*_{0} + **C**_{i}*α̂* + *ε*_{x,i}.
- Generate the cardiovascular hospitalization rates using the regression model *Y*_{i} = *β*_{0} + *βX*_{i} + **C**_{i}*γ* + *ε*_{y,i}, where *γ* = (0, 0.01, 0.11, −0.15, −0.13, 0.12, −0.14, −0.13, 0.06, 0.01) and *β* = 0.04. Other choices of *γ* were considered and are available in the eAppendix (http://links.lww.com/EDE/A784).
- Remove the “true” PM_{2.5} values *X*_{i} from the data set to reflect the zip codes that are missing exposure. The final data set contains 57 zip codes of (*Y*_{i}, *X*_{i}, **C**_{i}) and 2108 zip codes of (*Y*_{i}, **C**_{i}).
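The steps above can be sketched in code. Because the actual New England land-use data are not reproduced here, independent standard-normal stand-ins are used for the 10 centered and standardized covariates, and the residual standard deviations are illustrative; only *α̂*, *γ*, and *β* are taken from the text:

```python
import numpy as np

# Sketch of the data-generating steps. The covariates are random stand-ins
# for the actual centered/standardized land-use covariates; the residual SDs
# are illustrative. alpha-hat, gamma, and beta are the values given above.
rng = np.random.default_rng(4)
n, n_obs = 2165, 57

C = rng.normal(size=(n, 10))             # stand-in for the 10 covariates
alpha0 = 10.86
alpha = np.array([-0.51, -0.44, -0.38, 0.35, 0.22, -0.16, 0.16, 0.13, -0.05, 0.0])
gamma = np.array([0.0, 0.01, 0.11, -0.15, -0.13, 0.12, -0.14, -0.13, 0.06, 0.01])
beta = 0.04

X = alpha0 + C @ alpha + rng.normal(size=n)                      # simulated "true" PM2.5
Y = 1.0 + beta * X + C @ gamma + rng.normal(scale=0.1, size=n)   # hospitalization rates

monitored = np.zeros(n, dtype=bool)
monitored[:n_obs] = True                 # X retained only for the monitored zip codes
X_obs = np.where(monitored, X, np.nan)   # exposure removed for the other 2108
```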

Table 2 summarizes the covariates that are included in each model of the data-generating mechanism. Note that the true exposure is generated using *C*_{1} through *C*_{9}, while the true confounders are *C*_{2} through *C*_{9}.

For ease of demonstration, this data-generating mechanism purposely uses a nearly worst-case scenario; there is a large overlap in the set of covariates used to predict the exposure and those used to adjust for confounding. In reality, there will be partial overlap between these 2 sets. See the eAppendix (http://links.lww.com/EDE/A784) for further discussion.

We will proceed using land-use regression to estimate PM_{2.5} levels that are missing from the study. Once the land-use regression is used to estimate the missing PM_{2.5} values, a health-effects regression is performed using a completed data set that replaces the missing 2108 PM_{2.5} values with their corresponding predicted values.

The only remaining decision for the purpose of our simulation is which land-use covariates to include in the land-use regression model used to predict the missing PM_{2.5} values. Considering every combination of the land-use covariates would amount to 2^{10} = 1024 possible models. Instead, we consider 10 nested regression models that add the 10 land-use covariates in order of their true predictive power for PM_{2.5}. The following summarizes the steps used to predict PM_{2.5} and estimate the resulting health effect:

1. Fit the land-use regression model including only *C*_{1} as a predictor for the 57 zip codes with observed PM_{2.5}.
2. Estimate the 2108 missing PM_{2.5} values, *W*, based on the model from step 1.
3. Estimate the effect of long-term PM_{2.5} exposure on cardiovascular hospitalization rates using a regression model including only *W* as a predictor (*Y*_{i} = *δ*_{0} + *δW*_{i} + *ε*_{i}). This health-effects regression model is always misspecified.
4. Repeat steps 1–3, but using {*C*_{1}, *C*_{2}}, {*C*_{1}, *C*_{2}, *C*_{3}}, …, {*C*_{1}, …, *C*_{10}} as predictors in the exposure regression model from step 1. Each land-use regression model is misspecified, except for the one that includes {*C*_{1}, …, *C*_{9}}.
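The loop over the 10 nested land-use regression models can be sketched as follows. As above, the covariates are synthetic stand-ins (independent, rather than correlated as real land-use covariates are), so this sketch will not reproduce the exact bias pattern of Figure 2, only the mechanics of steps 1–4:

```python
import numpy as np

# Sketch of steps 1-4 with synthetic stand-ins for the land-use covariates.
# alpha-hat, gamma, and beta are the values from the text; the residual SDs
# are illustrative. Real land-use covariates are correlated, which drives
# the bias pattern in Figure 2; independent stand-ins will not reproduce it.
rng = np.random.default_rng(5)
n, n_obs, beta = 2165, 57, 0.04
C = rng.normal(size=(n, 10))
alpha = np.array([-0.51, -0.44, -0.38, 0.35, 0.22, -0.16, 0.16, 0.13, -0.05, 0.0])
gamma = np.array([0.0, 0.01, 0.11, -0.15, -0.13, 0.12, -0.14, -0.13, 0.06, 0.01])
X = 10.86 + C @ alpha + rng.normal(size=n)
Y = 1.0 + beta * X + C @ gamma + rng.normal(scale=0.1, size=n)
obs = np.arange(n) < n_obs               # the 57 monitored zip codes

def ols(A, y):
    return np.linalg.lstsq(A, y, rcond=None)[0]

biases, r2 = [], []
for k in range(1, 11):
    # Step 1: land-use regression on the monitored zip codes, first k covariates
    a = ols(np.column_stack([np.ones(n_obs), C[obs, :k]]), X[obs])
    # Step 2: predict exposure everywhere from the fitted model
    W = np.column_stack([np.ones(n), C[:, :k]]) @ a
    r2.append(1 - np.var(X[obs] - W[obs]) / np.var(X[obs]))  # in-sample R^2
    W = np.where(obs, X, W)              # completed exposure: measured where available
    # Step 3: misspecified health model, Y on the completed exposure only
    biases.append(ols(np.column_stack([np.ones(n), W]), Y)[1] - beta)
```

Each pass records the land-use regression *R*^{2} and the bias of the corresponding health-effect estimate, the two quantities compared in Figure 2.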

Note that in step 3, we fit a regression model that fails to control confounding and gives a biased health-effect estimate. The magnitude of this bias, which is given in closed form in the eAppendix (http://links.lww.com/EDE/A784), is determined by a trade-off between the bias due to lack of adjustment for confounding and the prediction accuracy of the PM_{2.5} regression model. It does not depend on the true value of *β*. As such, we consider only one value of *β* = 0.04. However, if we had chosen to control confounding for some fixed set of covariates, results would be similar.

Figure 2 provides the *R*^{2} from the land-use regression models and the corresponding bias of the health-effect estimate from the hypothetical study of the association between long-term exposure to PM_{2.5} and cardiovascular hospitalization rates. The land-use regression model that yields the health-effect estimate with the smallest bias is the one that includes the first 5 land-use covariates (% forest, % open space, % urban, traffic density, and elevation) and has a corresponding *R*^{2} value of less than 0.6. This occurs because the bias due to the lack of adjustment for confounding happens to be negated by the bias due to the measurement error in the exposure prediction. By including the 2 additional covariates, distance to major road and point emissions, the *R*^{2} can be increased to 0.7, but this results in a large bias. In this case, the measurement error induced by the exposure prediction no longer negates the confounding bias, and we are left with a biased health-effect estimate. Of the 10 models considered, 5 have a smaller bias than the model that uses the true exposure (the dotted line in Figure 2), suggesting that a predicted exposure can either improve or worsen effect estimation compared with the true exposure in the presence of uncontrolled confounding.

This simple simulation illustrates that in the presence of uncontrolled confounding, a more accurate prediction of the exposure does not necessarily lead to a better health-effect estimate. Exposure prediction can exacerbate the problem of uncontrolled confounding, but all is not lost. Recall that in this hypothetical study we purposely failed to control for any confounding, yet with a properly chosen PM_{2.5} prediction model we were able to return nearly unbiased effect estimates. In that situation, exposure prediction was beneficial for health-effect estimation, but we were able to determine this only because we knew the true exposure and health-effects regression models.

In addition to fitting the data as described above, we illustrate that even under a health-effects regression model that properly adjusts for confounding, a predicted exposure can lead to a biased health-effect estimate. Note that *C*_{2} through *C*_{9} are the confounders. Thus, we proceed as follows:

1. Fit an incorrectly specified land-use regression model that includes only *C*_{1} and *C*_{10} (and excludes *C*_{2} through *C*_{9}) as predictors for the 57 zip codes with observed PM_{2.5}. We purposely include in the model only the covariates that are not confounders.
2. Estimate the 2108 missing PM_{2.5} values, *W*, based on the model from step 1.
3. Estimate the effect of long-term PM_{2.5} exposure on cardiovascular hospitalization rates using a regression model including *W* and all the necessary confounders *C*_{2} through *C*_{9} as predictors. Note that using the true exposure in this model would yield an unbiased health-effect estimate.

Using the above data-fitting algorithm, the bias of the health-effect estimate is 0.01, corresponding to a 25% bias. This occurs because, although the confounding bias is zero, the exposure prediction model has been misspecified. To avoid this bias, an exposure prediction model that includes all the confounders, in addition to the other covariates, can be used in step 1. Doing so makes the bias approximately 0 (the simulated value is 0.0002). This verifies two of our theoretical results discussed previously. First, even under a health-effects regression model that properly adjusts for confounding, a predicted exposure can bias the health-effect estimate. Second, including all the confounders in the exposure prediction model renders the health-effect estimate unbiased.
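The two fits can be sketched as follows. To transmit the bias, the stand-in covariates are given a common correlated factor (real land-use covariates are correlated), and the sample sizes are enlarged well beyond the 57 monitored and 2165 total zip codes so that Monte Carlo noise does not obscure the small bias; the correlation level and residual standard deviations are illustrative:

```python
import numpy as np

# Sketch of the final check: a misspecified exposure model (C1 and C10 only)
# biases the estimate even though the health model adjusts for the confounders
# C2..C9; including all covariates in the exposure model removes the bias.
# Covariates share a correlated factor; sample sizes are enlarged and the
# correlation level and residual SDs are illustrative.
rng = np.random.default_rng(6)
n_train, n, beta = 50000, 100000, 0.04
alpha = np.array([-0.51, -0.44, -0.38, 0.35, 0.22, -0.16, 0.16, 0.13, -0.05, 0.0])
gamma = np.array([0.0, 0.01, 0.11, -0.15, -0.13, 0.12, -0.14, -0.13, 0.06, 0.01])

def covariates(m, rho=0.6):
    Z = rng.normal(size=(m, 10))
    shared = rng.normal(size=(m, 1))
    C = np.sqrt(rho) * shared + np.sqrt(1 - rho) * Z  # equicorrelated covariates
    C[:, 9] = Z[:, 9]                                 # C10 independent of the rest
    return C

def ols(A, y):
    return np.linalg.lstsq(A, y, rcond=None)[0]

Ct = covariates(n_train)                              # "monitored" sample: X observed
Xt = 10.86 + Ct @ alpha + rng.normal(size=n_train)
C = covariates(n)                                     # "unmonitored" sample: X missing
X = 10.86 + C @ alpha + rng.normal(size=n)
Y = 1.0 + beta * X + C @ gamma + rng.normal(scale=0.1, size=n)

conf = C[:, 1:9]                                      # the confounders C2..C9

# Misspecified exposure model: only C1 and C10 (no confounders)
a = ols(np.column_stack([np.ones(n_train), Ct[:, [0, 9]]]), Xt)
W_mis = np.column_stack([np.ones(n), C[:, [0, 9]]]) @ a
bias_mis = ols(np.column_stack([np.ones(n), W_mis, conf]), Y)[1] - beta

# Exposure model including all covariates (and hence all confounders)
a = ols(np.column_stack([np.ones(n_train), Ct]), Xt)
W_all = np.column_stack([np.ones(n), C]) @ a
bias_all = ols(np.column_stack([np.ones(n), W_all, conf]), Y)[1] - beta
```

Even though the health model adjusts for all confounders in both cases, only the exposure model that includes the confounders yields an approximately unbiased estimate.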

## DISCUSSION

We simultaneously consider 2 of the most important and challenging issues in environmental epidemiology: exposure prediction and confounding adjustment. Although the motivation and terminology come from air pollution epidemiology, results apply to any context in which exposure measurements are incomplete and must be predicted. Examples include exposures to herbicides or pesticides, burn pits in Iraq and Afghanistan, low-level radio waves, dietary intake, and blood lead levels.^{27–32} In each, the exposure of interest is not completely observed and must be predicted, while confounding adjustment is necessary when fitting the health-effects regression model.

Current statistical methods dealing with missing exposure and confounding adjustment treat the 2 topics as distinct. For example, methods to overcome missing exposure rely on exposure prediction, and exposure prediction can be viewed as a measurement error problem.^{18–21} Methods for exposure prediction are concerned only with bias of the health-effect estimates due to measurement error associated with the prediction of the true exposure; these methods do not consider how predicting exposure with covariates that are correlated with the outcome might bias the health-effect estimates. Similarly, methods designed for confounding adjustment do not acknowledge that the exposure has been predicted. For example, the approach of Wang et al^{33} was designed for the selection of confounders in the context of linear models for both the outcome and the exposure when the exposure has been fully observed.

New statistical methods are needed to simultaneously predict exposure while adjusting for confounding. While our results indicate that, under a properly specified health-effects regression model, all confounders should be used to predict the exposure, the optimal strategy is less clear when there is uncertainty about which covariates should be included in the exposure prediction model and in the health-effects regression model. The decision to include a covariate in the outcome or the exposure model needs to be based on both the predictive power of the covariate for the exposure and the strength of its relationship with the outcome. An extension of Wang et al^{33} to the context of missing exposure could provide a foundation for methods that simultaneously predict exposure and control confounding.

We framed our results with the priority of returning an unbiased health-effect estimate, and we ignored the possibility of a bias-variance trade-off. We omit results on the variance so as not to distract from the main points. However, we do not believe that there is a simple bias-variance trade-off when simultaneously considering exposure prediction and confounding adjustment.

Our results do not address how spatial smoothing will affect the bias of a health effect in the presence of unmeasured spatial confounding. However, it is reasonable to postulate similar results on the bias of a health-effect estimate when using spatial smoothing. Such results would be related to the work of Dominici et al^{34}; these researchers provide results to reduce confounding bias in the pollution-mortality relationship due to unmeasured time-varying factors such as season and influenza epidemics in the context of time-series studies. One could adapt their results for cross-sectional studies of air pollution and health by indexing by space instead of time.

Issues of bias have been presented in the context of cross-sectional studies. There is a likely statistical parallel for time-series studies. If missing exposure values are imputed using covariates that are temporally correlated with both the exposure and the outcome, then similar biases are likely. For example, meteorological covariates could be temporally correlated with both air pollution and health.

We assumed simple linear relationships among the outcome, the exposure, and the confounders. Deriving analytic results for biases under more complex models is impractical; however, similar biases are expected. Greater care is needed when using predicted exposure values in epidemiological studies.

## ACKNOWLEDGMENTS

We thank Itai Kloog for providing the data needed for our simulation. We also thank Arden Pope, Jennifer Bobb, and the anonymous reviewers for their useful feedback and discussion of this work.

## REFERENCES

1. Dominici F, Sheppard L, Clyde M. Health effects of air pollution: a statistical review. Int Stat Rev. 2003;71:243–276

2. Pope CA 3rd. Mortality effects of longer term exposures to fine particulate air pollution: review of recent epidemiological evidence. Inhal Toxicol. 2007;19(suppl 1):33–38

3. Breysse P, Delfino R, Dominici F, et al. US EPA particulate matter research centers: summary of research results for 2005–2011. Air Qual Atmos Health. 2013;6:333–355

4. Oliver MA, Webster R. Kriging: a method of interpolation for geographical information systems. Int J Geogr Inform Syst. 1990;4:313–332

5. Madsen L, Ruppert D, Altman N. Regression with spatially misaligned data. Environmetrics. 2008;19:453–467

6. Hoek G, Beelen R, de Hoogh K, et al. A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos Environ. 2008;42:7561–7578

7. Henderson SB, Beckerman B, Jerrett M, Brauer M. Application of land use regression to estimate long-term concentrations of traffic-related nitrogen oxides and fine particulate matter. Environ Sci Technol. 2007;41:2422–2428

8. Ross Z, Jerrett M, Ito K, Tempalski B, Thurston GD. A land use regression for predicting fine particulate matter concentrations in the New York City region. Atmos Environ. 2007;41:2255–2269

9. Yanosky JD, Paciorek CJ, Schwartz J, Laden F, Puett R, Suh HH. Spatio-temporal modeling of chronic PM10 exposure for the Nurses’ Health Study. Atmos Environ. 2008;42:4047–4062

10. Sahsuvaroglu T, Jerrett M, Sears MR, et al. Spatial analysis of air pollution and childhood asthma in Hamilton, Canada: comparing exposure methods in sensitive subgroups. Environ Health. 2009;8:14

11. Neupane B, Jerrett M, Burnett RT, Marrie T, Arain A, Loeb M. Long-term exposure to ambient air pollution and risk of hospitalization with community-acquired pneumonia in older adults. Am J Respir Crit Care Med. 2010;181:47–53

12. Kloog I, Coull BA, Zanobetti A, Koutrakis P, Schwartz JD. Acute and chronic effects of particles on hospital admissions in New-England. PloS One. 2012;7:e34664

13. Kloog I, Melly SJ, Ridgway WL, Coull BA, Schwartz J. Using new satellite based exposure methods to study the association between pregnancy PM_{2.5} exposure, premature birth and birth weight in Massachusetts. Environ Health. 2012;11:40

14. Cesaroni G, Badaloni C, Gariazzo C, et al. Long-term exposure to urban air pollution and mortality in a cohort of more than a million adults in Rome. Environ Health Perspect. 2013;121:324–331

15. Sheppard L, Burnett R, Szpiro A, et al. Confounding and exposure measurement error in air pollution epidemiology. Air Qual Atmos Health. 2011;5:203–216

16. Szpiro AA, Paciorek CJ, Sheppard L. Does more accurate exposure prediction necessarily improve health effect estimates? Epidemiology. 2011;22:680–685

17. Pope CA 3rd, Burnett RT. Confounding in air pollution epidemiology: the broader context. Epidemiology. 2007;18:424–426; discussion 427

18. Gryparis A, Paciorek CJ, Zeka A, Schwartz J, Coull BA. Measurement error caused by spatial misalignment in environmental epidemiology. Biostatistics. 2009;10:258–274

19. Szpiro AA, Sheppard L, Lumley T. Efficient measurement error correction with spatially misaligned data. Biostatistics. 2011;12:610–623

20. Carroll RJ, Ruppert D, Stefanski LA. Measurement Error in Nonlinear Models. Boca Raton, FL: Chapman and Hall/CRC; 1995

21. Basagaña X, Aguilera I, Rivera M, et al. Measurement error in epidemiologic studies of air pollution based on land-use regression models. Am J Epidemiol. 2013;178:1342–1346

22. Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med. 2004;23:2937–2960

23. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–560

24. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55

25. Pearl J. On a class of bias-amplifying variables that endanger effect estimates. arXiv:1203.3503. 2012

26. Bhattacharya J, Vogt WB. Do instrumental variables belong in propensity scores? Int J Stat Econ. 2012;9:107–127

27. Choudhury H, Harvey T, Thayer WC, et al. Urinary cadmium elimination as a biomarker of exposure for evaluating a cadmium dietary exposure–biokinetics model. J Toxicol Environ Health A. 2001;63:321–350

28. Frei P, Mohler E, Bürgi A, et al; QUALIFEX Team. A prediction model for personal radio frequency electromagnetic field exposure. Sci Total Environ. 2009;408:102–108

29. Lewin MD, Sarasua S, Jones PA. A multivariate linear regression model for predicting children’s blood lead levels based on soil lead levels: a study at four superfund sites. Environ Res. 1999;81:52–61

30. Institute of Medicine (US) Committee on the Long-Term Health Consequences of Exposure to Burn Pits in Iraq and Afghanistan. Long-Term Health Consequences of Exposure to Burn Pits in Iraq and Afghanistan. Washington, DC: The National Academies Press; 2011

31. Committee on Making Best Use of the Agent Orange Exposure Reconstruction Model. The Utility of Proximity-Based Herbicide Exposure Assessment in Epidemiologic Studies of Vietnam Veterans. Washington, DC: The National Academies Press; 2008

32. National Research Council (US) Committee to Assess Potential Health Effects from Exposures to PAVE PAWS Low-Level Phased-Array Radiofrequency Energy. An Assessment of Potential Health Effects from Exposure to PAVE PAWS Low-Level Phased-Array Radiofrequency Energy: Letter Report, 7 March 2006. Washington, DC: The National Academies Press; 2006

33. Wang C, Parmigiani G, Dominici F. Bayesian effect estimation accounting for adjustment uncertainty. Biometrics. 2012;68:661–671

34. Dominici F, McDermott A, Hastie TJ. Improved semiparametric time series models of air pollution and mortality. J Am Stat Assoc. 2004;99:938–948