Epidemiologic studies suggest an association between short-term fluctuations in ambient air pollution levels and risk of hospitalization for acute cardiovascular events, including ischemic stroke,1 myocardial infarction,2,3 and acute decompensated heart failure.4 Time-series studies using generalized additive models or generalized linear models represent a widely used approach to evaluating the acute health effects of ambient air pollution.5 More recently, case-crossover methods have been applied to time-series data, yielding theoretically equivalent results under certain assumptions.6
Regardless of the analytic approach, the majority of previous time-series studies have used data from administrative databases in which the date of hospitalization is the only information available on the timing of the event. Accordingly, pollution exposure just before the index event is necessarily defined as the average exposure over a fixed period before the calendar day of admission. For example, to evaluate the association between ambient fine particulate matter (PM2.5) and risk of same-day hospitalization for acute ischemic stroke, each case is commonly assigned the mean daily (eg, midnight–midnight) PM2.5 level on the date of hospital admission. Such studies are limited by the fact that hospital admission may occur at any time of day and that the true onset of the event may occur hours or days before hospital admission (Fig. 1).
The impact of misclassification of the time of event onset on the estimated association between ambient air pollution and the risk of acute cardiovascular events has not been evaluated. Accordingly, the aims of this study were to (1) investigate whether significant exposure misclassification may result from using the hospitalization date to estimate air pollution exposure at the true onset time of an acute cardiovascular event, and (2) evaluate the impact of the resulting exposure misclassification on the estimated association between ambient air pollution and the risk of acute cardiovascular events. We used data collected in the Boston metropolitan region as part of an ongoing study of ambient air pollution and the risk of acute ischemic stroke to accomplish the first aim and performed computer simulations to accomplish the second aim.
Assessment of Exposure Misclassification
This study was approved by the Institutional Review Board of the Beth Israel Deaconess Medical Center. First, we investigated whether substantial exposure misclassification may result from using hospitalization date to estimate PM2.5 exposure at the true onset time of an acute cardiovascular event, using hospitalizations for acute ischemic stroke as an example. Data on delay times between stroke onset time and hospital presentation were obtained from an ongoing study in the Boston metropolitan area of ambient air pollution and the risk of acute ischemic stroke. Specifically, we reviewed charts and electronic medical records of all patients aged 21 years and older admitted to the Beth Israel Deaconess Medical Center between 1 April 1999 and 31 December 2004 with a primary discharge diagnosis related to ischemic cerebrovascular disease (ICD-9 codes 433–438). Patients who were admitted for a neurologist-confirmed acute ischemic stroke were identified. Trained abstractors, using standardized forms, estimated stroke symptom onset times as previously described.7,8 Patients with in-hospital strokes or transient ischemic attacks were excluded. Stroke symptom onset times could not be estimated from medical records for less than 1% of patients (7 of 1101) because of conflicting information.
We obtained hourly measures of ambient particles with aerodynamic diameter of 2.5 μm or less (PM2.5) from the Boston/Harvard Countway Library PM Center, located less than 1 km from the study site. PM2.5 was measured continuously (TEOM Model 1400A, Rupprecht and Patashnick; Albany, NY) and calibrated against colocated integrated 24-hour Teflon filter (Teflon, Pall-Gelman, Ann Arbor, MI) gravimetric measurements, as previously described.9 We calculated 24-hour mean values when 75% or more of hourly measurements were available. Patient addresses were obtained from the medical record and geocoded using ArcGIS 9.0 (ESRI, Redlands, CA). To reduce exposure misclassification, we excluded patients who lived more than 40 km from this central monitor, as in previous studies.10
For each case, we calculated exposure to PM2.5 by using 3 different exposure assessment strategies. First, we defined exposure based on the 24-hour mean PM2.5 levels on the calendar day of admission (midnight–midnight). This is equivalent to the exposure assessment approach used in most published time-series studies, in which only the date of hospitalization is available. Second, we defined exposure based on the mean PM2.5 levels over the 24 hours preceding the time of hospital presentation. This approach might be used in future studies with more detailed administrative data. Third, we defined exposure based on the mean PM2.5 levels over the 24 hours preceding the estimated time of stroke symptom onset. This approach assesses exposure over the most etiologically relevant period and has been used in certain previous studies of the transient effects of ambient particles on the risk of acute cardiovascular events.11,12 Because hospitalization necessarily follows onset of stroke symptoms, the first 2 exposure assessment strategies would be expected to result in more exposure misclassification compared with the strategy based on time of stroke symptom onset.
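The 3 exposure windows described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: the function names (`mean_pm`, `exposures`), the dictionary layout of the hourly series, and the toy PM2.5 values are all hypothetical, each hourly value is assumed to be keyed by the timestamp at which that hour begins, and the 75% completeness rule (stated in the text for 24-hour means) is assumed to apply to every window.

```python
from datetime import datetime, timedelta

def mean_pm(hourly, start, end, min_fraction=0.75):
    """Mean of the hourly PM2.5 values whose timestamps fall in [start, end).

    Returns None when fewer than min_fraction of the expected hours are present
    (ASSUMPTION: the 75% completeness rule is applied to every window).
    """
    n_hours = int((end - start).total_seconds() // 3600)
    stamps = (start + timedelta(hours=h) for h in range(n_hours))
    values = [hourly[s] for s in stamps if s in hourly]
    if len(values) < min_fraction * n_hours:
        return None
    return sum(values) / len(values)

def exposures(hourly, onset, presentation):
    """Return the three exposure estimates for one case (times rounded to the hour)."""
    day_start = presentation.replace(hour=0, minute=0, second=0, microsecond=0)
    one_day = timedelta(hours=24)
    return {
        # Strategy 1: midnight-to-midnight mean on the calendar day of admission.
        "calendar_day": mean_pm(hourly, day_start, day_start + one_day),
        # Strategy 2: 24 hours preceding hospital presentation.
        "pre_presentation": mean_pm(hourly, presentation - one_day, presentation),
        # Strategy 3: 24 hours preceding symptom onset.
        "pre_onset": mean_pm(hourly, onset - one_day, onset),
    }

# Toy hourly series (hypothetical values): 10, 20, then 30 ug/m3 on three consecutive days.
hourly = {}
t0 = datetime(2004, 3, 1)
for day, level in enumerate([10.0, 20.0, 30.0]):
    for h in range(24):
        hourly[t0 + timedelta(days=day, hours=h)] = level

onset = datetime(2004, 3, 2, 12)         # symptom onset at noon on day 2
presentation = datetime(2004, 3, 3, 1)   # presentation 13 hours later, on day 3
result = exposures(hourly, onset, presentation)
print(result)
```

In this toy case the onset and presentation fall on different calendar days, so the calendar-day estimate reflects pollution levels that occurred almost entirely after the event began.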
We used the intraclass correlation coefficient estimated by 1-way analysis of variance13,14 to assess the agreement between each of the first 2 exposure assessment strategies and the approach based on symptom onset times. We used Bland-Altman plots15,16 to evaluate whether the exposure misclassification under the first 2 approaches was nondifferential, and formally tested this hypothesis with paired t-tests. Analyses were performed with SAS 9.1 (SAS Institute, Inc., Cary, NC).
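As a rough illustration of these agreement statistics, the sketch below computes a 1-way ANOVA intraclass correlation coefficient for paired measurements, the Bland-Altman summary (mean difference and 95% limits of agreement), and a paired t statistic. It assumes the single-measure 1-way random-effects form of the coefficient, ICC(1,1) in the notation of Shrout and Fleiss; the text does not state which variant was computed in SAS, and the function names and example pairs are hypothetical.

```python
import math
from statistics import mean, stdev

def icc_oneway(pairs):
    """Single-measure ICC from a 1-way ANOVA with k = 2 measurements per case."""
    n, k = len(pairs), 2
    grand = mean(v for p in pairs for v in p)
    case_means = [mean(p) for p in pairs]
    # Between-case and within-case mean squares.
    msb = k * sum((m - grand) ** 2 for m in case_means) / (n - 1)
    msw = sum((v - m) ** 2 for p, m in zip(pairs, case_means) for v in p) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def bland_altman(pairs):
    """Mean difference and 95% limits of agreement (mean +/- 1.96 SD)."""
    diffs = [a - b for a, b in pairs]
    d, s = mean(diffs), stdev(diffs)
    return d, (d - 1.96 * s, d + 1.96 * s)

def paired_t(pairs):
    """t statistic for the null hypothesis that the mean paired difference is 0."""
    diffs = [a - b for a, b in pairs]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical (misclassified, onset-based) exposure pairs in ug/m3.
pairs = [(10.0, 12.0), (20.0, 18.0), (30.0, 31.0), (40.0, 39.0)]
icc = icc_oneway(pairs)
mean_diff, limits = bland_altman(pairs)
t_stat = paired_t(pairs)
print(f"ICC = {icc:.3f}, mean difference = {mean_diff:.2f}, t = {t_stat:.2f}")
```

A mean difference near 0 with a small t statistic is what nondifferential error looks like in this framework, while a low intraclass correlation coefficient signals that the error is large relative to the true between-case variation.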
Impact of Exposure Misclassification on Estimated Regression Coefficients
We performed computer simulations to evaluate the impact of this source of exposure misclassification on the estimated association between PM2.5 and risk of acute ischemic stroke. For simplicity, we assumed that PM2.5 exposure based on the time of onset of stroke symptoms represents the true personal exposure to PM2.5. We compared exposure estimates based on calendar day of hospital admission or date and time of admission to this “true” exposure. These simulations explicitly do not account for any other sources of exposure misclassification.
To mimic the conditions actually observed in the ongoing stroke study, we assumed that about 1100 cases of acute ischemic stroke were identified over a 5-year period. First, we simulated the number of cases observed each hour (Yt) during this period as a Poisson random variable:
Yt ~ Poisson(μt), log(μt) = β0 + β1PMt
where PMt represents the 24-hour moving average for PM2.5 (from time t − 24 to time t) and β1 is the hypothesized linear effect of a 10 μg/m3 increase in PM2.5 24-hour moving average on the log rate of hospitalization for stroke. The observed PM2.5 time-series from the ongoing stroke study was used for PMt. Next, for each simulated case we assigned a random delay time between time of symptom onset and hospital presentation, based on the distribution of delay times observed in the ongoing stroke study. Third, we assessed exposure based on date of hospitalization, time of hospital presentation, and time of symptom onset. Finally, for each exposure assessment strategy we estimated the association between exposure and rate of hospitalization for stroke (β̂1) by using the time-stratified case-crossover design.17 Referents were all days in the same year, month, day-of-week, and time-of-day as the simulated case, excluding the case day. For each simulation, this process was repeated 500 times and the properties of β̂1 summarized. Relative bias was defined as (β̂1 − β1)/β1. A separate simulation was performed for values of exp(β1) of 1.00, 1.10, 1.20, and 1.30, consistent with the range of values observed in previous studies of PM2.5 and risk of acute ischemic stroke.1,18 Simulations were performed in R.19
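The simulation steps described above can be sketched as follows. This is a simplified, single-replicate illustration in Python rather than the R code used for the study, and it departs from the study in ways that are assumptions of this sketch: the PM2.5 series is synthetic (a log-scale autoregressive process) instead of the observed Boston series, delays are drawn from an exponential distribution with a 13-hour median instead of the observed delay distribution, the intercept is only roughly calibrated, and the conditional logistic fit is a hand-written one-covariate Newton-Raphson routine. All function names are hypothetical.

```python
import math
import random
from datetime import datetime, timedelta

random.seed(1)
START = datetime(2000, 1, 1)   # midnight, so hour index t // 24 identifies a calendar day
N_HOURS = 5 * 365 * 24         # roughly five years of hourly data

# ASSUMPTION: synthetic hourly PM2.5 (ug/m3) from an AR(1) process on the log scale.
pm, dev = [], 0.0
for _ in range(N_HOURS):
    dev = 0.97 * dev + random.gauss(0, 0.12)
    pm.append(10.0 * math.exp(dev))
prefix = [0.0]                 # prefix sums make every 24-hour mean an O(1) lookup
for v in pm:
    prefix.append(prefix[-1] + v)

def ma24(t):
    """Mean PM2.5 over the 24 hours ending at hour index t (requires t >= 24)."""
    return (prefix[t] - prefix[t - 24]) / 24

def day_mean(t):
    """Midnight-to-midnight mean PM2.5 for the calendar day containing hour t."""
    d = (t // 24) * 24
    return (prefix[d + 24] - prefix[d]) / 24

def poisson(lam):
    """Knuth's multiplication method; adequate for the small hourly rates used here."""
    limit, k, p = math.exp(-lam), 0, random.random()
    while p > limit:
        k, p = k + 1, p * random.random()
    return k

def stratum(t):
    """Hour indices sharing year, month, weekday, and hour of day with t (t included)."""
    when, out = START + timedelta(hours=t), []
    for k in range(-4, 5):
        other, s = when + timedelta(days=7 * k), t + 7 * 24 * k
        if (other.year, other.month) == (when.year, when.month) and 24 <= s < N_HOURS - 24:
            out.append(s)
    return out

def fit_clogit(sets):
    """One-covariate conditional logistic regression fitted by Newton-Raphson.

    sets holds (x_case, xs) pairs; xs covers the whole stratum, case included.
    """
    beta = 0.0
    for _ in range(50):
        grad = hess = 0.0
        for x_case, xs in sets:
            w = [math.exp(beta * x) for x in xs]
            m1 = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
            m2 = sum(wi * x * x for wi, x in zip(w, xs)) / sum(w)
            grad += x_case - m1          # score contribution of this stratum
            hess += m2 - m1 * m1         # information contribution of this stratum
        step = grad / hess
        beta += step
        if abs(step) < 1e-10:
            break
    return beta

def simulate_once(beta1, n_target=1100):
    """Simulate onsets, add delays, and estimate beta1 under each exposure strategy."""
    beta0 = math.log(n_target / N_HOURS) - beta1   # rough calibration; PM is near 10 ug/m3
    sets = {"onset": [], "presentation": [], "calendar_day": []}
    for t in range(24, N_HOURS - 24 * 40):
        x = ma24(t) / 10                            # exposure per 10 ug/m3
        for _ in range(poisson(math.exp(beta0 + beta1 * x))):
            # ASSUMPTION: exponential delay with a 13-hour median, capped at 30 days.
            p = t + min(int(random.expovariate(math.log(2) / 13)), 24 * 30)
            sets["onset"].append((x, [ma24(s) / 10 for s in stratum(t)]))
            sets["presentation"].append((ma24(p) / 10, [ma24(s) / 10 for s in stratum(p)]))
            sets["calendar_day"].append((day_mean(p) / 10, [day_mean(s) / 10 for s in stratum(p)]))
    return {name: fit_clogit(v) for name, v in sets.items()}, len(sets["onset"])

true_beta1 = math.log(1.2)
estimates, n_cases = simulate_once(true_beta1)
for name, b in estimates.items():
    print(f"{name:>13}: beta1_hat = {b:.3f}, relative bias = {(b - true_beta1) / true_beta1:+.0%}")
```

A single replicate with about 1100 cases is noisy, so the ordering of the 3 estimates can vary from seed to seed; repeating `simulate_once` many times and averaging, as in the 500 replicates described above, is what makes the attenuation pattern visible.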
Assessment of Exposure Misclassification
There were 1101 patients admitted to the Beth Israel Deaconess Medical Center for a neurologist-confirmed acute ischemic stroke who lived within 40 km of the central ambient particle monitor (Table 1). The mean daily ambient PM2.5 concentration over the study period was 11.3 μg/m3 (SD = 6.6 μg/m3; Table 2). The mean distance between patient residential address and the central monitoring site was 9.6 km (SD = 8.0 km), with the majority of patients (89%) living within 20 km of the central monitor. The 25th, 50th, and 75th percentiles for distance were 3.9, 7.3, and 12.4 km, respectively.
Stroke onset occurred at least 1 calendar day before hospital admission among 53% of patients (median delay = 1 day, range = 0–30 days; Fig. 2A). The difference in PM2.5 exposure based on date of hospitalization versus time of stroke onset ranged from −47 to 36 μg/m3 (mean difference ± SD = −0.1 ± 7.1 μg/m3; 5th–95th percentile = −11.7–9.9 μg/m3; Fig. 2B). The intraclass correlation coefficient between these 2 PM2.5 measurements was 0.45. There was no evidence that the exposure misclassification was systematically differential (P = 0.68 from paired t test of δ = 0). A plot of the differences between each pair of PM2.5 measurements against their averages was symmetric around 0 μg/m3 (not shown). Limiting the analysis to patients living less than 20 km from the monitoring site did not materially alter the results.
The median delay between time of symptom onset and time of hospital presentation was 13 hours (Fig. 2C). The difference in PM2.5 assessed based on time of hospital presentation versus time of stroke symptom onset ranged from −46 to 32 μg/m3 (mean difference ± SD = 0.1 ± 6.0 μg/m3; 5th–95th percentile = −9.8–9.0 μg/m3; Fig. 2D). The intraclass correlation coefficient between the 2 measurements was 0.60. There was no evidence of systematic differential misclassification (P = 0.74 from paired t test of δ = 0). A plot of the differences between each pair of PM2.5 measurements against their averages was symmetric around 0 μg/m3 (not shown). Limiting the analysis to patients living less than 20 km from the monitoring site did not materially alter the results.
Impact of Exposure Misclassification on Estimated Regression Coefficients
We performed computer simulations to evaluate the impact of this exposure misclassification on the estimated association between PM2.5 and risk of acute ischemic stroke. Under the null hypothesis of no effect of PM2.5 [ie, β1 = ln(1.0)], all 3 exposure assessment strategies yielded similar unbiased results (Fig. 3). However, under the alternative hypotheses of a positive association between PM2.5 and risk of acute ischemic stroke [β1 = ln(1.1), ln(1.2), ln(1.3)], using an exposure assessment strategy based on date of hospitalization resulted in attenuation of β̂1 by 60%–66% compared with assessing exposure based on time of stroke onset, whereas using an exposure assessment strategy based on time of hospital presentation resulted in attenuation of β̂1 by 37%–42%.
Epidemiologic studies of the short-term effects of ambient air pollution on the risk of acute cardiovascular events often use data from administrative databases in which only the date of hospitalization is known. However, because true time of event onset may precede hospitalization by hours or days, using an exposure assessment strategy based on the date of hospitalization may lead to substantial exposure misclassification. In this study we estimated the degree of exposure misclassification by using as an example data from an ongoing study of the effects of ambient PM2.5 on the risk of hospitalization for acute ischemic stroke. We found that among 1101 patients hospitalized for acute ischemic stroke, date of hospitalization and time of hospital presentation were associated with varying degrees of misclassification of stroke onset time. Specifically, stroke symptoms began a median of 13 hours before hospital admission and occurred on a different calendar day from hospital admission among more than half of all patients. Furthermore, our simulation studies show that an exposure assessment strategy based on date of hospitalization can attenuate estimates of PM2.5 health effects by about 60% compared with using stroke onset times.
The difference between the average PM2.5 levels on the date of hospitalization versus the 24-hour period before symptom onset had a range of 83 μg/m3 and SD of 7.1 μg/m3. The range between the 5th and 95th percentiles of differences between measurements was 22 μg/m3, indicating that about 10% of exposure measurements were misclassified by roughly 10 μg/m3 or more. The observed intraclass correlation coefficient of 0.45 indicates that more than half of the variance in exposure is attributable to misclassification of the onset time, rather than between-day differences in exposure. Moreover, an exposure assessment strategy based on the time of hospital presentation (as might be available in administrative databases with more detailed data) performed only somewhat better. The difference between the mean PM2.5 levels over the 24-hour period before the time of hospital presentation versus the 24-hour period before stroke onset had a range of 78 μg/m3 and an intraclass correlation coefficient of 0.60. Importantly, misclassification of the onset time resulted in exposure misclassification that was nondifferential.
The exposure misclassification quantified in this study is consistent with the classic error model,20,21 in which the observed exposure x* (here, exposure based on the date of hospitalization) is randomly distributed around the true exposure x (here, exposure based on stroke onset time). Although x* is an unbiased estimate of x [ie, E(x*|x) = x], the use of x* in regression analyses is expected to attenuate associations between exposure and the outcome y. In a simple linear regression model with 1 explanatory variable, such that E(y|x) = α + βₓx, if population exposure and measurement error are normally distributed, it has been shown that the regression coefficient βₓ is attenuated by a factor that ranges between 0 and 1 and is given by c = var(x)/[var(x) + var(x* − x)]. Thus, the degree of attenuation increases with the variance of exposure measurement error (x* − x). In this simple model, the exposure misclassification observed in the current study would be predicted to reduce the association between daily PM2.5 levels and the risk of acute ischemic stroke by about 50%. In reality, however, the problem is more complex, and a simple linear regression model would be inappropriate. Because of the discrete nature of the outcome and the choice of study design, the data are analyzed using conditional logistic regression.
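The attenuation factor c can be evaluated directly from the summary statistics reported in the Results. The short Python sketch below does this under one loudly stated assumption: the SD of daily PM2.5 over the study period (6.6 μg/m3) is used as a stand-in for the SD of the true exposure x, which the text does not confirm as the basis for the "about 50%" figure. The function name is hypothetical.

```python
def attenuation_factor(var_true, var_error):
    """Classical error model: c = var(x) / [var(x) + var(x* - x)]."""
    return var_true / (var_true + var_error)

# ASSUMPTION: SD of daily PM2.5 (6.6 ug/m3) approximates the SD of true exposure x.
sd_true = 6.6
sd_error_date = 7.1   # SD of (date-of-hospitalization exposure - onset-based exposure)
sd_error_time = 6.0   # SD of (presentation-time exposure - onset-based exposure)

c_date = attenuation_factor(sd_true ** 2, sd_error_date ** 2)
c_time = attenuation_factor(sd_true ** 2, sd_error_time ** 2)

print(f"date of hospitalization: c = {c_date:.2f}, attenuation = {1 - c_date:.0%}")
print(f"time of presentation:    c = {c_time:.2f}, attenuation = {1 - c_time:.0%}")
```

Under this assumption c is about 0.46 for the date-based strategy, that is, an attenuation of roughly 54%, in line with the "about 50%" stated above and close to the observed intraclass correlation coefficient of 0.45.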
Accordingly, to evaluate the impact of the exposure misclassification in this setting, we conducted a simulation study. We observed a similar degree of attenuation empirically through computer simulations based on the degree of exposure misclassification observed in the ongoing stroke study. Under hypotheses of a positive association between PM2.5 and risk of acute ischemic stroke, using an exposure assessment strategy based on date of hospitalization or time of hospital presentation, the observed association was attenuated by 40%–60% compared with exposure based on time of stroke symptom onset. When simulations were run under the null hypothesis of no association between PM2.5 and hospitalization, all exposure assessment strategies led to unbiased results. Therefore, misclassification of onset time is unlikely to have contributed to positive associations in previous studies.
Results from a few published studies provide anecdotal evidence consistent with the notion that the use of misclassified onset times of acute events may bias health effect estimates toward the null. For example, Rich et al12 evaluated the association between ambient levels of PM2.5 and the risk of ventricular arrhythmias as recorded by implanted cardioverter defibrillators and found an 8% (95% CI = −6%–24%) increased risk per interquartile range increase in PM2.5 when using the midnight-to-midnight mean PM2.5 levels on the same day, but a 19% (2%–38%) increase in risk when using the mean PM2.5 exposure over the 24 hours preceding event onset. Although the impact of onset time misclassification is substantial in this example, the impact may be even more substantial in the context of studies of hospital admissions in which the date of event onset is less precisely known than in the case of ventricular arrhythmias recorded by implanted cardioverter defibrillators.
We used the example of acute ischemic stroke in the Boston metropolitan area to describe onset time and exposure misclassification in typical time-series studies of daily air pollution and acute cardiovascular events. The magnitude of the impact of onset time misclassification observed in this study may not be generalizable to studies in other locations, or of acute events other than acute ischemic stroke. Specifically, we expect that the amount of attenuation of the pollutant–health effect association is influenced by the distribution of prehospital delay times and the temporal characteristics of the pollution time-series being considered.
Our study has other potential limitations. First, we assessed only the exposure misclassification due to misinformation on timing of event onset and did not account for other known sources of exposure misclassification. Second, stroke onset times were estimated based on neurologists’ notes in patient medical records and are likely misclassified with respect to true stroke onset times. However, this residual error is expected to be nondifferential with respect to air pollution levels and would not materially alter the conclusions of this analysis.
Epidemiologic studies of PM-related risk of acute cardiovascular events based on date of hospitalization can substantially underestimate the strength of associations with health effects. Using data on time of hospital presentation rather than date of hospitalization would only marginally attenuate this important source of bias. Whenever possible, investigators should consider collecting detailed data on time of event onset in studies of the effect of environmental exposures on acute events.
1. Wellenius GA, Schwartz J, Mittleman MA. Air pollution and hospital admissions for ischemic and hemorrhagic stroke among medicare beneficiaries. Stroke
2. Zanobetti A, Schwartz J. The effect of particulate air pollution on emergency admissions for myocardial infarction: a multicity case-crossover analysis. Environ Health Perspect
3. D'Ippoliti D, Forastiere F, Ancona C, et al. Air pollution and myocardial infarction in Rome: a case-crossover analysis. Epidemiology
4. Dominici F, Peng RD, Bell ML, et al. Fine particulate air pollution and hospital admission for cardiovascular and respiratory diseases. JAMA
5. Bell ML, Samet JM, Dominici F. Time-series studies of particulate matter. Annu Rev Public Health
6. Lu Y, Zeger SL. On the equivalence of case-crossover and time series methods in environmental epidemiology. Biostatistics
7. Rosamond WD, Reeves MJ, Johnson A, et al. Documentation of stroke onset time: challenges and recommendations. Am J Prev Med. 2006;31(suppl 2):S230–S234.
8. Evenson KR, Rosamond WD, Vallee JA, et al. Concordance of stroke symptom onset time. The Second Delay in Accessing Stroke Healthcare (DASH II) Study. Ann Epidemiol
9. Allen G, Sioutas C, Koutrakis P, et al. Evaluation of the TEOM method for measurement of ambient particulate mass in urban areas. J Air Waste Manag Assoc
10. Dockery DW, Luttmann-Gibson H, Rich DQ, et al. Association of air pollution with increased incidence of ventricular tachyarrhythmias recorded by implanted cardioverter defibrillators. Environ Health Perspect
11. Peters A, Dockery DW, Muller JE, et al. Increased particulate air pollution and the triggering of myocardial infarction. Circulation
12. Rich DQ, Schwartz J, Mittleman MA, et al. Association of short-term ambient air pollution concentrations and ventricular arrhythmias. Am J Epidemiol
13. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods
14. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull
15. Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician
16. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet
17. Lumley T, Levy D. Bias in the case-crossover design: implications for studies of air pollution. Environmetrics
18. Tsai SS, Goggins WB, Chiu HF, et al. Evidence for an association between air pollution and daily stroke admissions in Kaohsiung, Taiwan. Stroke
19. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2005.
20. Thomas D, Stram D, Dwyer J. Exposure measurement error: influence on exposure-disease relationships and methods of correction. Annu Rev Public Health
© 2009 Lippincott Williams & Wilkins, Inc.
21. Zeger SL, Thomas D, Dominici F, et al. Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environ Health Perspect