The U.S. Environmental Protection Agency (EPA) has some nerve—in the sense of scientific courage. That is a trait not often associated with government agencies, but the risk the EPA took in commissioning the 3 analyses published in this issue of the journal1–3 must be recognized and applauded. The implications of this exercise go far beyond the question of the ozone mortality effect, the ostensible focus of these papers. In commissioning this examination in triplicate, the agency was testing not just the ozone-mortality hypothesis, but the methods of science itself—methods the EPA and others have used to justify regulations in many areas. If these methods had failed this test, there would have been broad repercussions for the entire field of environmental risk assessment, not to mention the field of evidence synthesis.
The results of this exercise will undoubtedly elicit a substantial sigh of relief in many quarters and perhaps an equal degree of consternation. The 3 groups used a wide diversity of methods and assumptions as outlined in Table 1. There were differences in the studies selected, the estimates used, the numbers abstracted, the confounders considered, the models used, the conversion factors applied, the subgroups investigated, and the alternatives explored. These many methodologic approaches could be contrasted and critiqued, but the bottom lines were remarkably consistent, within a fraction of a percent—a 0.8% increase in immediate mortality per 10-ppb increase in average daily ozone over the year, with most or all of this risk concentrated in the warmer months.
However, behind the sigh of relief must be some discomfiture. Agreement aside, we are also given a glimpse of that Holy Grail, truth, and it looks somewhat different from what the meta-analyses suggest. The meta-analyses depended on published single-city analyses, each of which used different kinds of data, different analytic techniques, and different reporting, which severely limited the meta-analysts’ ability to control for confounding effects in a sophisticated and uniform manner. However, 2 of the meta-analyses threw in, as a bonus, a primary analysis of independent, multicity air-quality data, with the National Morbidity, Mortality and Air Pollution Study (NMMAPS) representing the mother lode of such information: a longitudinal study of air quality in 95 U.S. cities over a 14-year period.
Although the NMMAPS analysis1 does not qualitatively contravene the meta-analytic results, in that it still shows an ozone hazard, it does point strongly to a smaller effect, less than one third of the risk. Ito et al3 contrasted their meta-analysis with a reanalysis of primary air quality data from 7 U.S. cities. The weather-modeling approach closest to NMMAPS (“four-smoother”) produced a point estimate approximately 40% lower than their meta-analysis. Both the NMMAPS and 7-city contrasts send a strong message that depending on published, single-estimate, single-site analyses is an invitation to bias. This is not the first time that such bias has been demonstrated,4,5 but it may be the most compelling demonstration yet. The most plausible explanation is the one suggested by the authors, that investigators tend to report, if not believe, the analysis that produces the strongest signal; and in each single-site analysis, there are innumerable model choices that affect the estimated strength of that signal. Both Bell et al1 and Ito et al3 provide empiric evidence within their meta-analyses to support this explanation.
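The selection mechanism described above can be made concrete with a small simulation. The Python sketch below is not taken from any of the 3 papers. It assumes a hypothetical true effect, a hypothetical number of model specifications per city, and independent normal noise across specifications; specifications fitted to the same data would in practice be correlated, which would shrink the inflation without removing it. Every name and number in it is illustrative.

```python
import random
from statistics import mean


def simulate_reporting(n_cities=2000, n_specs=6, true_effect=0.3,
                       noise_sd=0.5, seed=1):
    """Compare two reporting rules across many simulated single-city studies.

    All parameters are hypothetical: `true_effect` is the assumed real
    effect (% per 10 ppb), `n_specs` the number of lag/weather models each
    analyst tries, and `noise_sd` the sampling noise of one estimate.
    """
    rng = random.Random(seed)
    prespecified, strongest = [], []
    for _ in range(n_cities):
        # One estimate per candidate model specification for this city.
        estimates = [rng.gauss(true_effect, noise_sd) for _ in range(n_specs)]
        # Rule A: report the model chosen before looking at the data.
        prespecified.append(estimates[0])
        # Rule B: report whichever model gave the largest estimate.
        strongest.append(max(estimates))
    return mean(prespecified), mean(strongest)


mean_prespecified, mean_strongest = simulate_reporting()
print(f"mean of prespecified estimates: {mean_prespecified:.2f}")
print(f"mean of strongest estimates:    {mean_strongest:.2f}")
```

Under these assumptions, the average of the prespecified estimates stays close to the assumed true effect, whereas the average of the strongest estimates is inflated. A meta-analysis that pools only the latter inherits that inflation however carefully the pooling is done.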
There are other meta-lessons to be learned from these meta-analyses. Although the point estimates were almost identical, the stochastic uncertainty varied considerably; the confidence interval width of 0.59% in the study by Bell et al1 was 55% larger than the 0.38% reported by Ito et al.3 The slightly different set of papers and estimates used in the calculations may have contributed to this difference, but it is more likely due to the fully Bayesian approach applied by Bell and colleagues, which produced interval estimates that reflect the uncertainty in the heterogeneity parameter. This example shows the importance of that dimension of uncertainty, and it is not hard to imagine other situations in which a difference of that magnitude could make a qualitative difference in the inference. In environmental risk assessment, in which heterogeneity is the norm, the use of models that adequately represent the stochastic uncertainty in the system is critical.
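One way to see why carrying uncertainty in the heterogeneity parameter widens an interval is to compare a plug-in random-effects interval with one that averages over plausible heterogeneity values. The Python sketch below is a rough stand-in, not the model used by Bell and colleagues. It assumes hypothetical study estimates and standard errors, uses the DerSimonian-Laird moment estimate for the plug-in interval, and approximates a Bayesian analysis with flat priors on the pooled mean and on the between-study standard deviation over a bounded grid (the bound `tau_max` is an arbitrary assumption).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Hypothetical city-level estimates (% per 10 ppb) and standard errors.
Y = [0.3, 0.6, 0.8, 0.9, 1.1, 1.5]
S = [0.20, 0.25, 0.30, 0.25, 0.20, 0.30]


def pooled(y, s, tau2):
    """Inverse-variance pooled mean, its variance, and the weights."""
    w = [1.0 / (si ** 2 + tau2) for si in s]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu, 1.0 / sum(w), w


def dl_tau2(y, s):
    """DerSimonian-Laird moment estimate of between-study variance."""
    mu, _, w = pooled(y, s, 0.0)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)


def plug_in_width(y, s, level=0.95):
    """Interval width treating the estimated tau^2 as if it were known."""
    _, var, _ = pooled(y, s, dl_tau2(y, s))
    return 2 * STD_NORMAL.inv_cdf(0.5 + level / 2) * math.sqrt(var)


def averaged_width(y, s, tau_max=2.0, n_grid=400, level=0.95):
    """Interval width after averaging over a grid of tau values."""
    comps, log_ml = [], []
    for j in range(n_grid):
        tau = tau_max * (j + 0.5) / n_grid  # midpoint grid on [0, tau_max]
        mu, var, w = pooled(y, s, tau ** 2)
        q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
        # Log marginal likelihood of tau with the pooled mean integrated out.
        log_ml.append(0.5 * sum(math.log(wi) for wi in w)
                      - 0.5 * q - 0.5 * math.log(sum(w)))
        comps.append((mu, math.sqrt(var)))
    top = max(log_ml)
    p = [math.exp(v - top) for v in log_ml]
    total = sum(p)

    def cdf(x):
        # Mixture of normal posteriors for the mean, weighted by tau's posterior.
        return sum(pj * STD_NORMAL.cdf((x - mu) / sd)
                   for pj, (mu, sd) in zip(p, comps)) / total

    def quantile(prob):
        lo = min(mu - 10 * sd for mu, sd in comps)
        hi = max(mu + 10 * sd for mu, sd in comps)
        for _ in range(100):  # bisection on the monotone mixture CDF
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if cdf(mid) < prob else (lo, mid)
        return (lo + hi) / 2

    alpha = (1 - level) / 2
    return quantile(1 - alpha) - quantile(alpha)


print(f"plug-in width:  {plug_in_width(Y, S):.2f}")
print(f"averaged width: {averaged_width(Y, S):.2f}")
```

With these hypothetical inputs the averaged interval is wider than the plug-in interval, because heterogeneity values larger than the point estimate remain plausible and each contributes a more diffuse distribution for the pooled mean. The size of the difference depends on the number of studies and on the prior, so the sketch should be read as showing the direction of the effect rather than its magnitude.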
The authors of all 3 of these meta-analyses emphasize the wariness with which we must approach single-site studies. However, analysts who conducted the single-city studies are not necessarily to blame for this state of affairs. Without an a priori biologic model that would tell us definitively which of several lag periods or weather-control models should take priority, one naturally looks for clues from the data. If one had to pick a single estimate, the largest or statistically strongest is a natural one to report, and, indeed, epidemiologists are trained to do exactly that. A key lesson here is the hazard of reporting only one estimate or estimates from one model. Only an array of models, exposure definitions, and outcome measures (as Ito et al3 present in their analysis of the 7-city data) can properly communicate what the data are really telling us and what they are not. The stochastic element represents the minimum uncertainty; even with the widest confidence (or posterior) intervals, the meta-analytic and NMMAPS intervals do not overlap. If these intervals incorporated model uncertainty and perhaps reporting bias, they would be substantially wider and less inconsistent.
Although there is little doubt that huge, multisite studies with uniform measurements and sophisticated analyses are preferable to meta-analyses of single-site reports, there are many areas in environmental assessment where an NMMAPS equivalent is not planned or perhaps possible (eg, water quality and cancer risk). These results may add some urgency to the development of larger, more uniform studies, but until such study results are available, we will have to rely on the kinds of meta-analytic approaches used here. This exercise shows that the value of single-site studies may depend on the comprehensiveness of their analytic strategies and reporting. A failure to be comprehensive in that reporting not only diminishes their value in the collective enterprise of finding the truth, it may indeed distort it. It also underscores the value of making datasets accessible to other researchers after the primary analyses are published so common analytic approaches can be applied. The Internet-based Health and Air Pollution Surveillance System (IHAPSS)6 is a model for how both data and analytic strategies can be shared with the research community.
A closely related lesson is that studies of this kind need to be reported in light of what has been learned from prior studies. Studies from any one region or city must be reported with the assumption that their results will be of value only if they can be used as part of a larger evidence synthesis. If they report results in ways that are not consistent with the prior literature, or they make parameter or model choices that prior studies have shown to be unlikely, they are making it much more difficult for their results to contribute to the collective body of knowledge. Perhaps calls should be made in the environmental risk assessment field similar to those in the field of clinical trials that systematic reviews precede and guide the design and interpretation of any study.7 A more ambitious agenda would be to preplan meta-analyses, prospectively.8 This would require agreement and standardization within the environmental community of how individual studies should be analyzed and reported with standards enforced by journals (as with the CONSORT guidelines in the biomedical arena9).
In the absence of NMMAPS or other multisite analyses, some observers might have taken the agreement of the meta-analyses as confirmation that the meta-analytic method was reliable. However, if our observational methods are all subject to the same biases, as meta-analyses are when they are derived from the same pool of studies, the agreement criterion is testing a narrow range of assumptions. The situation can be likened to multiple eyewitnesses choosing the same alleged perpetrator out of a lineup, with the NMMAPS data being the DNA evidence showing that they were all wrong. In the absence of definitive evidence, it behooves us to understand the causes of errors in our observational methods and use procedures to minimize them. These papers provide a roadmap for doing exactly that. Aside from the estimates of ozone risk, their most valuable aspect is the deeper understanding they provide of the current weaknesses of single-site reports, and the implicit challenge they issue to the environmental epidemiology community to either strengthen or avoid such studies.
ABOUT THE AUTHOR
STEVEN GOODMAN is an Associate Professor of Oncology, Pediatrics, Biostatistics and Epidemiology at the Johns Hopkins Schools of Medicine and Public Health. He is editor of the journal Clinical Trials and PI of Project ImpACT (http://www.projectimpact.info), a project to find and profile the most important clinical trials ever done in medicine and public health. He writes about issues of inference and evidence synthesis in clinical trials and epidemiology.
1. Bell ML, Dominici F, Samet JM. A meta-analysis of time-series studies of ozone and mortality with comparison to the National Morbidity, Mortality and Air Pollution Study. Epidemiology
2. Levy J, Chemerynski SM, Sarnat JA. Ozone exposure and mortality: an empirical Bayes meta-regression analysis. Epidemiology
3. Ito K, DeLeon SF, Lippmann M. Associations between ozone and daily mortality: analysis and meta-analysis. Epidemiology
4. Kelsall JE, Samet JM, Zeger SL, et al. Air pollution and mortality in Philadelphia, 1974–1988. Am J Epidemiol
5. Anderson HR, Atkinson RW, Peacock JL, et al. Ambient particulate matter and health effects: publication bias in studies of short-term associations. Epidemiology
6. Internet-based Health and Air Pollution Surveillance System. Available at: www.ihapss.jhsph.edu/. Accessed April 29, 2005.
7. Savulescu J, Chalmers I, Blunt J. Are research ethics committees behaving unethically? Some suggestions for improving performance and accountability. BMJ
8. Berlin JA, Colditz GA. The role of meta-analysis in the regulatory process for foods, drugs, and devices. JAMA
9. Moher D, Schulz KF, Altman D. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA