“We caution against selecting any particular model as ‘correct’…”
From (1) the Department of Epidemiology and (2) the Department of Biostatistics, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD.
Address correspondence to: Jonathan M. Samet, The Johns Hopkins Bloomberg School of Public Health, Department of Epidemiology, Rm W6041, 615 N. Wolfe Street, Baltimore, Maryland 21205; email@example.com
Editors’ note: Another invited commentary on this topic appears on page 13.
Since the early 1990s, numerous time series studies have linked daily mortality counts to levels of particulate air pollution on the same or recent days. 1,2 Similar studies of morbidity indicators, such as hospitalization and clinical status, have provided complementary evidence for adverse effects of particulate air pollution on the public's health. The daily time series studies of air pollution, together with findings of prospective cohort studies that indicate increased mortality associated with long-term exposure to air pollution, have motivated reassessment of air-quality standards for particles in the United States and Europe.
The time series studies of acute effects have largely been of similar design, involving analyses of databases of daily counts of events, daily levels of particles and other air pollutants measured at central site monitors, and daily data on weather, which is a potential confounding factor. The analyses have typically controlled for weather, season and other longer-term time-varying factors (eg, trends of disease mortality) to minimize confounding of the effect estimates for the air pollutant, which may be associated with weather and season. Time series studies estimate relative rates of mortality/morbidity, generally interpreted as percentage increase in mortality/morbidity per unit increase in the air-pollutant levels. These studies have used regression models with nonlinear functions of time and weather variables, including generalized additive models (GAMs) with smoothing splines and generalized linear models (GLMs) with natural cubic splines. Use of GAM became very popular in the mid-1990s with implementation using the S-Plus function gam. 2 We used this software in extensive analyses of air pollution, mortality and hospitalization in the National Morbidity, Mortality and Air Pollution Study (NMMAPS). 3,4
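As a rough illustration of how such a relative rate is usually read, the sketch below converts a coefficient from a log-linear model into a percentage increase per 10 μg/m3. It assumes a model of the form log E[Y] = ... + beta * x; the function name and the example coefficient are hypothetical and are not taken from any of the analyses discussed here.

```python
import math

def percent_increase(beta_per_unit, delta=10.0):
    """Percentage change in the expected daily count for a `delta`-unit
    rise in the pollutant, assuming log E[Y] = ... + beta * x."""
    return 100.0 * (math.exp(beta_per_unit * delta) - 1.0)

# Hypothetical coefficient on the log scale, per 1 ug/m3 of PM10.
beta = 0.00027
print(round(percent_increase(beta), 2))  # prints 0.27
```

For coefficients this small, the percentage increase is nearly 100 * beta * delta, which is why such effects are often quoted directly as a percent per 10 μg/m3.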
In this issue of Epidemiology, Ramsay and colleagues 5 point out that the S-Plus function gam uses a computational approximation which, in the presence of a large correlation between the nonlinear functions included in the model (called concurvity), can underestimate the standard errors of the relative rates. We have recently identified and described another limitation of the S-Plus function gam. 6 In an in-depth exploration of model sensitivity, we discovered that the gam default convergence criteria (S-Plus Version 3.4) were not sufficiently rigorous for these analyses; the result was an overestimation of the effect of particulate air pollution on mortality. In our initial exploration of the sensitivity of model findings to the details of model specification, we have found a complex interplay among the extent of smoothing of time-related confounding, the extent of concurvity and the degree of bias in estimates. 6
Thus, studies using the gam function in S-Plus with default criteria might have overestimated the magnitude of the risk to public health posed by air pollution, tending to provide risk coefficients that were biased upwards and estimated with overstated precision. In the NMMAPS analyses, 3,4 for example, a pooled estimate based on the 90 largest U.S. cities was 0.41% increase in mortality per 10 μg/m3 increase in PM10 (particulate matter less than 10 μm in aerodynamic diameter), with a posterior standard error of 0.05, when the standard gam convergence criteria were used; with use of substantially more strict convergence criteria, 6 the estimate dropped to 0.27% per 10 μg/m3 increase in PM10, again with a posterior standard error of 0.05. For both the original and the revised analyses, there was strong evidence for an effect of air pollution on mortality; posterior probabilities for the PM10 coefficient exceeding zero were essentially 1.0 for both analyses. Pooled estimates from multisite time series are not affected by the underestimation of the standard errors in gam. Multisite time series studies are analyzed by using hierarchic models that estimate the uncertainty in the pooled estimate by the sum of the within-city plus the between-city variance (total variance). 7 Therefore, in hierarchic models, the underestimation of the within-city variance may be balanced by the overestimation of the between-city variance, without affecting the total variance. 8
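The arithmetic behind these statements can be sketched as follows. The first function treats the posterior of the pooled coefficient as approximately normal, which is an assumption made here for illustration only. The second is a generic inverse-variance pooling in which each city's weight uses the total variance (within-city plus between-city); it is not the NMMAPS hierarchic model, and the city-level numbers and between-city variance are invented.

```python
import math

def prob_positive(estimate, se):
    """P(coefficient > 0) under a normal approximation to the posterior."""
    z = estimate / se
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pooled_estimate(betas, within_vars, between_var):
    """Inverse-variance pooling where each city's variance is the
    total variance: within-city variance + between-city variance."""
    weights = [1.0 / (v + between_var) for v in within_vars]
    total = sum(weights)
    mean = sum(w * b for w, b in zip(weights, betas)) / total
    return mean, math.sqrt(1.0 / total)

# Pooled estimates and posterior standard error quoted in the text.
for est in (0.41, 0.27):
    print(est, prob_positive(est, 0.05))

# Invented city-level estimates (% per 10 ug/m3) and variances.
betas = [0.10, 0.30, 0.50]
within = [0.04, 0.02, 0.05]
print(pooled_estimate(betas, within, between_var=0.01))
```

With an estimate of 0.27 and a standard error of 0.05, the coefficient lies more than five standard errors above zero, so under the normal approximation the probability of a positive effect is indistinguishable from 1.0 for both the original and the revised estimate.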
The community of air-pollution researchers is now faced with the obligation of repeating analyses that have used the gam function and considering further methodologic issues, such as that described by Ramsay and colleagues. 5 These methodologic issues are important when the air pollution effects are small and possibly confounded by varying processes, such as weather, which are correlated with pollution exposures. What are the alternative strategies for modeling daily time series data? Of course, there is no “correct” model. We have compared GAMs with GLMs with natural cubic splines for confounder adjustment. 6 The pooled estimate obtained with GLM for the 90 NMMAPS cities (0.21% per 10 μg/m3 increase in PM10) is slightly lower than the estimate obtained with GAM and the updated convergence criteria. We caution against selecting any particular model as “correct,” and we urge researchers to explore the sensitivity of findings to model selection and to the degree of adjustment for confounding factors.
These methodologic issues in time series analyses of air pollution data were identified as the U.S. Environmental Protection Agency was carrying out its process of evidence review for the National Ambient Air Quality Standard (NAAQS) for particulate matter. This process involves the compilation of all relevant evidence since the last review into a comprehensive document, the Criteria Document. In the most recent draft Criteria Document, the time series studies, including NMMAPS, were covered extensively and considered as providing clear evidence of an adverse effect of particulate matter air pollution on human health. 2 The Environmental Protection Agency is also using the effect estimates from the time series studies in a quantitative risk assessment mandated by the Clean Air Act. The new analyses continue to provide strong evidence of an association between acute exposure to particles and mortality. However, the updated estimates of burden of disease and death attributable to acute exposure are smaller. It is important to remember that time series studies only quantify the effects of acute exposure and do not address the larger question of whether chronic exposure increases the risk of disease and death.
Many “lessons learned” might be listed based on the report by Ramsay and colleagues 5 and our recent findings. 6 The difficulty of detecting the small signal of the effect of air pollution amidst the noise of the many other factors affecting mortality merits emphasis. To find this signal, we are analyzing large and complicated databases with models that inherently make assumptions. We are learning just how sensitive the model results are to these assumptions and finding that some of the tools that we have been using need to be improved for this application. Faster computers can now overcome software limitations easily. The S-Plus default convergence parameters have already been revised in the new S-Plus version, and substantially more stringent parameters can be used without much loss in computing time. In addition, revisions of GAM software implementations, allowing “exact” calculations of the standard errors, are underway. We have also learned again that a community of inquisitive researchers will continue to refine their work and replace less adequate approaches with better ones.
About the Authors
THE AUTHORS, all faculty of the Johns Hopkins Bloomberg School of Public Health, have teamed to carry out the National Morbidity Mortality and Air Pollution Study (NMMAPS), a national approach to characterizing the health effects of air pollution. The team includes epidemiologic, biostatistical and clinical expertise, and its work emphasizes development of new methods for environmental epidemiology. The group recently identified an issue with statistical software in analyses of the NMMAPS and other data that has led to reanalysis and reevaluation of major time series studies of air pollution.
1. Pope CA III, Dockery DW. Epidemiology of particle effects. In: Holgate ST, Samet JM, Koren HS, Maynard RL, eds. Air Pollution and Health. San Diego: Academic Press, 1999; 673–705.
2. U.S. Environmental Protection Agency (EPA), National Center for Environmental Assessment. Air Quality Criteria for Particulate Matter. Research Triangle Park, NC: U.S. Environmental Protection Agency, 2002.
3. Samet JM, Zeger S, Dominici F, et al. The National Morbidity, Mortality, and Air Pollution Study (NMMAPS). Part II. Morbidity and Mortality from Air Pollution in the United States. Cambridge, Mass: Health Effects Institute, 2000.
4. Samet JM, Zeger S, Dominici F, Dockery D, Schwartz J. The National Morbidity, Mortality, and Air Pollution Study (NMMAPS). Part I. Methods and Methodological Issues. Cambridge, Mass: Health Effects Institute, 2000.
5. Ramsay T, Burnett R, Krewski D. The effect of concurvity in generalized additive models linking mortality to ambient air pollution. Epidemiology 2003; 14: 18–23.
6. Dominici F, McDermott A, Zeger SL, Samet JM. On generalized additive models in time series studies of air pollution and health. Am J Epidemiol 2002; 156: 193–203.
7. Dominici F, Samet J, Zeger SL. Combining evidence on air pollution and daily mortality from the largest 20 U.S. cities: a hierarchical modeling strategy (with discussion). J Royal Stat Soc Ser A 2000; 163: 263–302.