The short-term associations between ambient particles and health effects are usually investigated by analyzing time-series of daily data using regression techniques. A common approach is to relate counts of routinely available outcomes (such as daily numbers of deaths or hospital admissions) from a whole population, usually a city, to pollution levels on the same or previous days. The other main approach is to follow a panel of individual subjects over a number of days to obtain data on symptoms, lung function, or use of medications. We use the term “time-series study” to refer to either type of study.
The results of time-series studies have played a central role in identifying the potentially toxic effects of ambient particulate air pollution at levels below national guidelines. The estimates obtained are typically small, but the large number of positive associations reported by investigators around the world, together with emerging evidence of plausible biologic mechanisms, has convinced most public health authorities that there is a health case for stringent abatement strategies.1–5 The size and shape of the concentration-response relationships have also assumed importance for quantifying the health impacts of air pollution6–9 and the health benefits of regulations designed to abate emissions.10
It is important to consider whether the evidence on which these inferences and quantitative estimates are based has been affected by publication bias, a process that leads to the published literature being unrepresentative of the totality of evidence.11–16 Publication bias occurs when studies showing evidence for associations in a particular direction are selectively published. In the case of air pollution, we postulated that studies with evidence of positive (adverse) associations would be more likely to be published than those with evidence for negative associations. If present, publication bias might lead to the adoption of a false hypothesis (in the present case, that air pollution is associated with adverse health effects), or to an estimate of a true effect that is biased away from the null11,17; either of these consequences would be important to air pollution science and policy.
Publication bias might be more likely in the case of whole-population time-series studies than for some other types of observational study for several reasons. First, whole-population studies often use data such as daily mortality counts, which are easily accessible from routine data collection systems. Such easy access might make researchers less determined to pursue publication of negative findings. Second, the complex statistical modeling used for analyzing time-series studies may involve a degree of subjective judgment on the part of the analyst, which might be biased by knowledge of the result. Finally, each study has the potential to generate a large number of estimates, and those chosen for publication may have been selected on the basis of the direction of their effect. The likelihood of publication bias will have been reduced by design in multicity time-series studies18–20 that have used a common protocol for analysis and reporting.
With occasional exceptions,21,22 major reviews and metaanalyses have not mentioned the possibility of publication bias.1–3,23,24 None has formally investigated the presence or size of any such bias. In this article, we investigate the presence of publication bias in time-series studies of particulate matter and daily measures of 4 outcomes: all-cause mortality, hospital admissions for chronic obstructive lung disease (COPD), tests of lung function, and incidence of cough symptom. We also investigate the related bias that may result from the post-hoc selection of effect estimates based on their size and direction, and on the size of their P values.
Identification of Relevant Studies
Medline, Embase, and Web of Science databases were used to identify all peer-reviewed time-series papers published up to January 2002. The search strings are available on request. We placed no constraints on language or date of publication. The search criteria were developed using known papers and then tested several times against citations in published reviews of the literature. We downloaded the full reference and abstract for each paper from the source bibliographic databases into a reference manager database (Reference Manager; ISI Researchsoft, Carlsbad, CA).
These abstracts were reviewed to exclude all that obviously did not contain time-series results; for the remainder, the entire paper was obtained. We then assessed the papers to see if they provided data that could be included in a database of results suitable for quantitative analysis. This assessment addressed the appropriateness of the statistical methods used, data quality, and the provision of the information needed to standardize the effect estimates. The data characterizing the effect estimates were entered into a relational database (ACCESS, Microsoft Corp.).
Time-series studies have the potential to estimate effects of exposure to pollution measured in single days, such as 1 day before the health outcome measure (lag 1), or the average exposure over a number of days before the outcome day. It is common for studies to analyze several possible lags. However, there is no standard approach to the analysis, selection, or presentation of the various lags, which means that it was not possible from the data presented in the papers to select the same lag from each study. Nevertheless, we needed to select an estimate for our metaanalysis. This selected estimate was identified as follows. If only 1 lag estimate was presented (either because only 1 lag was examined or only 1 was presented in the paper), this was recorded as the author-selected lag. If estimates for more than 1 lag were presented, we chose the estimate mentioned by the author in the abstract or emphasized in the presentation of the results or discussion. If the author did not indicate a preference, or an a priori basis for choice of estimate, we chose 1 based first on the smallest P value and then on the size of the estimate. This policy was adopted because it seemed implicit in most papers and had been explicit in some studies.25 However, to avoid introducing bias, we applied these criteria irrespective of the direction of effect. The implications of this selection policy on the summary estimate, heterogeneity, and funnel plot asymmetry were investigated using data from 2 multicity studies, the National Morbidity, Mortality and Air Pollution (NMMAPS) study19 and the Pollution Effects on Asthmatic Children in Europe (PEACE) study20 (see subsequently).
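As an illustration only, the selection rule described above can be sketched in Python. The record layout (`LagEstimate`) and the `author_preferred` flag are hypothetical names introduced for this sketch; they are not taken from the study database.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LagEstimate:
    lag: str                        # label such as "lag 0" or "lag 1"
    beta: float                     # regression coefficient (sign shows direction)
    p_value: float                  # two-sided P value reported for this lag
    author_preferred: bool = False  # hypothetical flag: emphasized by the author


def select_estimate(estimates: List[LagEstimate]) -> LagEstimate:
    """Pick one estimate per study, following the rule described in the text."""
    if not estimates:
        raise ValueError("study reports no estimates")
    # Only one lag presented: record it as the author-selected lag.
    if len(estimates) == 1:
        return estimates[0]
    # Otherwise defer to the estimate the author emphasized, if any.
    preferred = [e for e in estimates if e.author_preferred]
    if preferred:
        return preferred[0]
    # No stated preference: smallest P value first, then largest size,
    # using the absolute value so that direction plays no part.
    return min(estimates, key=lambda e: (e.p_value, -abs(e.beta)))
```

Because the tie-break uses the absolute size of the coefficient, a protective estimate is as eligible for selection as an adverse one.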
We examined the database to identify the outcomes for which there were the most studies of particulate matter with aerodynamic diameter <10 μm (PM10), total suspended particulate matter, or black smoke. These were: daily all-cause mortality (PM10, total suspended particulate matter, black smoke); daily admissions for COPD (PM10); peak expiratory flow in children with respiratory symptoms (PM10); and cough symptom in children with respiratory symptoms (PM10). For each study, we calculated a regression estimate for a pollution increment of 10 μg/m3 using information contained in the paper. Summary estimates were calculated using inverse variance weighting; where there was evidence of heterogeneity (P <0.05), random effects estimates were used.26
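The standardization and pooling steps can be sketched as follows. This is a minimal illustration, not the code used for the analysis. It assumes a log-linear concentration-response model when rescaling a relative risk to a 10-μg/m3 increment, uses the DerSimonian and Laird moment estimator for the between-study variance, and obtains the heterogeneity P value from Cochran's Q. All function names are ours.

```python
import math


def log_rr_per_10(rr, increment):
    """Rescale a relative risk reported per `increment` ug/m3 to the log
    relative risk per 10 ug/m3 (assumes a log-linear model)."""
    return math.log(rr) * 10.0 / increment


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 10000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pool(betas, ses, alpha=0.05):
    """Inverse-variance summary; switches to random effects when the
    heterogeneity test gives P < alpha."""
    w = [1.0 / s ** 2 for s in ses]
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))  # Cochran's Q
    df = len(betas) - 1
    p_het = chi2_sf(q, df) if df > 0 else 1.0
    tau2, model = 0.0, "fixed"
    if p_het < alpha:
        # DerSimonian-Laird moment estimate of between-study variance.
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w = [1.0 / (s ** 2 + tau2) for s in ses]
        model = "random"
    summary = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    return {"summary": summary, "se": math.sqrt(1.0 / sum(w)),
            "model": model, "tau2": tau2, "p_heterogeneity": p_het}
```

For example, a relative risk of 1.03 per 50 μg/m3 rescales to roughly 1.006 per 10 μg/m3 under the log-linear assumption.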
Single-city and Multicity Studies
We divided estimates into those from multicity studies and those from single-city studies. Multicity studies are those in which a number of cities were investigated by 1 research collaboration using a standardized protocol and the results published in metaanalytic form. It is likely that the findings of these studies would be published whatever the result. The 3 studies classified as multicity were Air Pollution and Health, a European Approach (APHEA),18 NMMAPS,19 and PEACE.20 All other studies were classified as single-city studies.
Analysis of Publication Bias
Publication bias may manifest itself as an association between effect size and study precision.27 The funnel plot28 is a simple scatterplot of study effect against study precision. Estimates from smaller studies tend to be scattered more widely than those of larger studies as a result of their relatively greater random variation. In the absence of bias, the plot resembles an inverted symmetric funnel. An asymmetric funnel plot suggests that there is an excess of small studies with estimates biased in a particular direction. Publication bias is a common reason for such a pattern, although there may be other explanations (see the “Discussion”). In our presentation of funnel plots, we have used the inverse of the variance rather than the standard error as a measure of precision because this increases the visual contrast between studies of higher and lower power.
The funnel plot is assessed subjectively. Two statistical tests were used to help assess the evidence for asymmetry in the plots. Egger's linear regression test29 regresses the standardized effect estimate against the inverse of the standard error. A nonzero intercept provides evidence that the funnel plot is asymmetric. Begg's test30 is an adjusted rank correlation method to examine the association between the study estimates and their variances. Sterne et al.31 have shown that in circumstances where there are reasonable numbers of studies in the metaanalysis, including a number of large studies, Begg's test can be too conservative. We have therefore tended toward the Egger test of asymmetry when the P values for the 2 tests differed considerably.
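The 2 tests can be sketched as follows, for illustration only. The Egger function returns the intercept with its t statistic and degrees of freedom, so the P value would be read from a t distribution. The Begg function uses a normal approximation for Kendall's tau without tie or continuity corrections, which is a simplifying assumption of this sketch.

```python
import math


def egger_test(betas, ses):
    """Regress z = beta/se on precision = 1/se by ordinary least squares.
    Returns (intercept, se_of_intercept, t_statistic, degrees_of_freedom)."""
    k = len(betas)
    if k < 3:
        raise ValueError("need at least 3 studies")
    x = [1.0 / s for s in ses]
    y = [b / s for b, s in zip(betas, ses)]
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(rss / (k - 2) * (1.0 / k + mx ** 2 / sxx))
    return intercept, se_int, intercept / se_int, k - 2


def begg_test(betas, ses):
    """Rank correlation between standardized deviates and variances.
    Returns (kendall_tau, z, two_sided_p); assumes no tied values."""
    k = len(betas)
    v = [s ** 2 for s in ses]
    w = [1.0 / vi for vi in v]
    pooled = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    # Variance of (beta_i - pooled) is v_i minus the pooled variance.
    v_star = [vi - 1.0 / sum(w) for vi in v]
    t_star = [(b - pooled) / math.sqrt(vs) for b, vs in zip(betas, v_star)]
    s = 0  # concordant pairs minus discordant pairs
    for i in range(k):
        for j in range(i + 1, k):
            prod = (t_star[i] - t_star[j]) * (v[i] - v[j])
            s += (prod > 0) - (prod < 0)
    z = s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18.0)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s / (k * (k - 1) / 2.0), z, p
```

A positive Egger intercept, or a positive rank correlation, corresponds to less precise studies reporting larger estimates.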
Where asymmetry was indicated, we used the trim-and-fill method to adjust the summary estimate for the observed bias.32 This method removes small studies until symmetry in the funnel plot is achieved—recalculating the center of the funnel before the removed studies are replaced together with their “missing” mirror-image counterparts. A revised summary estimate is then calculated using all of the original studies, together with the hypothetical “filled” studies.
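A simplified sketch of the trim-and-fill procedure is given below. It assumes fixed-effect weights, the L0 estimator of the number of missing studies, no tied deviations, and that the "missing" studies lie on the low side of the funnel. The implementation used for the published analysis may differ in each of these respects.

```python
def _fixed_effect(betas, ses):
    """Inverse-variance weighted mean."""
    w = [1.0 / s ** 2 for s in ses]
    return sum(wi * b for wi, b in zip(w, betas)) / sum(w)


def trim_and_fill(betas, ses, max_iter=50):
    """Returns (k0, original_summary, adjusted_summary)."""
    n = len(betas)
    order = sorted(range(n), key=lambda i: betas[i])  # ascending effect size
    original = _fixed_effect(betas, ses)
    k0 = 0
    for _ in range(max_iter):
        # Trim the k0 largest estimates and re-estimate the funnel centre.
        kept = order[: n - k0]
        centre = _fixed_effect([betas[i] for i in kept], [ses[i] for i in kept])
        # Rank absolute deviations of ALL studies about the trimmed centre.
        dev = [b - centre for b in betas]
        ranked = sorted(range(n), key=lambda i: abs(dev[i]))
        rank = {i: r + 1 for r, i in enumerate(ranked)}
        t_n = sum(rank[i] for i in range(n) if dev[i] > 0)
        # L0 estimator of the number of missing studies.
        l0 = (4 * t_n - n * (n + 1)) / (2 * n - 1)
        new_k0 = max(0, min(n - 1, round(l0)))
        if new_k0 == k0:
            break
        k0 = new_k0
    kept = order[: n - k0]
    centre = _fixed_effect([betas[i] for i in kept], [ses[i] for i in kept])
    # Fill: mirror the trimmed studies about the centre, same standard errors.
    trimmed = order[n - k0:]
    fill_betas = [2 * centre - betas[i] for i in trimmed]
    fill_ses = [ses[i] for i in trimmed]
    adjusted = _fixed_effect(betas + fill_betas, ses + fill_ses)
    return k0, original, adjusted
```

With fixed-effect weights the filled summary coincides with the trimmed centre, because each trimmed study and its mirror image average to that centre. The filled studies then mainly affect the standard error and any random-effects weighting.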
Using these methods, we confirmed our prediction that the multicity studies, NMMAPS,19 APHEA,18 and PEACE,20 would, by their very nature, not be subject to publication or related forms of bias. Our analyses of asymmetry are therefore confined to single-city studies.
Sensitivity of Estimates to Lag Selection Policy
Because it is common to fit separate statistical models for pollution concentration lagged by various numbers of days, it is possible that another source of asymmetry and publication bias could be the selective reporting of the lag with the largest size or the lowest P value. We examined the potential for this by analyzing different lag combinations from NMMAPS (unpublished data provided by the authors) and PEACE,20 for which several lags were reported systematically for all the cities. We compared the summary estimate based on an a priori lag selection with that based on selecting from each study the estimate with the lowest P value irrespective of its direction and in an adverse direction only.
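The mechanism can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of NMMAPS or PEACE data. The true effect is taken to be the same at every lag, lag estimates within a city are taken to be independent with a common standard error, and the default parameter values are arbitrary.

```python
import random


def _pooled(pairs):
    """Inverse-variance weighted mean of (beta, se) pairs."""
    return (sum(b / s ** 2 for b, s in pairs)
            / sum(1.0 / s ** 2 for b, s in pairs))


def simulate_lag_selection(n_cities=90, n_lags=3, true_beta=0.0004, seed=1):
    """Returns summary estimates under 3 lag-selection policies:
    (a_priori, lowest_p_any_direction, lowest_p_adverse_only)."""
    rng = random.Random(seed)
    a_priori, any_direction, adverse_only = [], [], []
    for _ in range(n_cities):
        se = rng.uniform(0.0003, 0.0015)  # city-specific precision
        lags = [rng.gauss(true_beta, se) for _ in range(n_lags)]
        # Policy 1: one lag fixed in advance (here the last one).
        a_priori.append((lags[-1], se))
        # Policy 2: lowest P value irrespective of direction; with a common
        # standard error this is the largest absolute estimate.
        any_direction.append((max(lags, key=abs), se))
        # Policy 3: most significant adverse estimate; if no lag is adverse,
        # the least protective one is taken (an assumption of this sketch).
        adverse_only.append((max(lags), se))
    return _pooled(a_priori), _pooled(any_direction), _pooled(adverse_only)
```

Under these assumptions the adverse-only policy cannot give a smaller summary estimate than the a priori policy, because the selected estimate in each city is at least as large as the prespecified one while the weights are unchanged.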
Screening of the bibliographic databases and abstracts identified 579 papers, of which 297 appeared to contain the results of population time-series studies. All of the references to the papers considered and excluded from the analysis are in Appendix 1 (available with the electronic version of this article). After detailed examination of the full paper, 57 papers were excluded for 1 or more of the following reasons: acceptable estimates but no standard errors given (n = 7); no daily data (n = 8); less than 12 months of data (n = 15); inappropriate or inadequately described control for seasonal or other environmental variables (n = 15); subgroup analysis only (n = 5); outcome or pollutant not usable (n = 3); and duplication (n = 4). From the remaining papers, we extracted our selected estimates for 4 particle-outcome pairs: 185 estimates for PM10 and daily all-cause mortality, 44 for total suspended particulates and daily mortality, 47 for black smoke and daily mortality, and 39 for PM10 and COPD admissions. For daily mortality, 111 estimates for PM10 were from the 2 multicity population studies, APHEA18 and NMMAPS.19 Only 12 studies had been published before 1995, the majority of these for total suspended particulates and daily mortality.
Our search for panel studies yielded 193 papers, of which 95 papers presented relevant results. None was excluded on quality or technical grounds. Forty papers provided estimates for PM10 and peak expiratory flow, and 38 for PM10 and cough. For peak expiratory flow and cough, 28 estimates were obtained from the multicity PEACE study.20
The summary estimates for the selected coefficients are shown in Table 1. All of the references, lags, and estimates used in this analysis are in Appendix 2 (available with the electronic version of this paper). For the single-city studies, Table 1 also shows the P values for the tests of asymmetry and the effect on the estimate of adjusting for this asymmetry. We found visual and statistical evidence of funnel plot asymmetry for daily mortality associated with all 3 particle measures (PM10, total suspended particulates, and black smoke) (Fig. 1A–C). Adjustment for asymmetry using the trim-and-fill technique reduced the estimate (for a 10-μg/m3 increment of pollution) for PM10 and daily mortality from 1.006 to 1.005, and for black smoke and daily mortality from 1.006 to 1.005. The estimate for total suspended particulates and daily mortality did not change, however. When the estimates for single- and multicity studies were combined, there was no suggestion of asymmetry and the adjusted estimate remained at 1.006. The single-city summary estimate for PM10 and daily mortality was larger than that reported by NMMAPS but similar to that reported by APHEA.
There was also evidence of asymmetry for PM10 and daily hospital admissions for COPD, as indicated by the funnel plot (Fig. 1D) and the regression test (Table 1). Adjustment reduced the estimate from 1.013 to 1.011.
The evidence for asymmetry for peak expiratory flow (PEFR) and PM10 was weaker; although the funnel plot looked asymmetric, the P value for the regression test was high (0.46). However, with only 12 studies, the power to detect asymmetry was low (Table 1 and Fig. 1E). The summary estimate of −0.151 L/min was more than twice as large as the effect of −0.067 L/min observed in the PEACE study at lag 1. For cough symptom, there was clearer evidence of publication bias with asymmetry in the funnel plot (Fig. 1F), a conclusion supported by the regression test. Adjustment for this asymmetry reduced the estimate for the odds ratio by 40% from 1.025 to 1.015.
The sensitivity of the estimates to lag selection is shown in Tables 2 and 3. Table 2 is based on data from the NMMAPS study in which bias in the process of analysis and publication was eliminated by design, and for which the results for lags 0, 1, and 2 were available, by courtesy of the authors (Dominici F, et al., unpublished data, December 2002). The summary estimates for lags 0, 1, and 2 were 1.003, 1.004, and 1.003, respectively. When the lag with the smallest P value in an adverse direction was chosen from among the 2 lags 0 and 1, the summary estimate increased to 1.005 (25% to 65% increase). When the choice was among 3 lags, the estimate increased still further to 1.007 (75% to 133% increase).
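The percentage increases quoted here refer to the excess risk (the estimate minus 1) rather than to the relative risk itself. The arithmetic can be checked with the short sketch below, which uses the rounded estimates given in the text. The function name is ours. The rounded values give about 67% rather than the reported 65% for the change from 1.003 to 1.005, presumably because the published percentages were computed from unrounded estimates.

```python
def pct_change_in_excess_risk(rr_from, rr_to):
    """Percentage change in excess risk (relative risk minus 1)."""
    excess_from = rr_from - 1.0
    excess_to = rr_to - 1.0
    return 100.0 * (excess_to - excess_from) / excess_from


# Selection among 2 lags: compare 1.005 with the single-lag range 1.003-1.004.
two_lag_range = (pct_change_in_excess_risk(1.004, 1.005),
                 pct_change_in_excess_risk(1.003, 1.005))

# Selection among 3 lags: compare 1.007 with the same single-lag range.
three_lag_range = (pct_change_in_excess_risk(1.004, 1.007),
                   pct_change_in_excess_risk(1.003, 1.007))
```

The same calculation applied to the cough odds ratios (1.025 adjusted to 1.015) gives the 40% reduction reported for that outcome.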
A similar analysis of PEACE data for PM10 and PEFR is shown in Table 3. The estimates for lags 0, 1, and 2 were 0.045, −0.067, and 0.021 L/min, respectively. When the lag with the smallest P value in an adverse direction was chosen from the 2 lags 0 and 1, the resulting estimate (−0.073 L/min) was larger than any of the single lag estimates. When the most adverse among 3 lags was chosen, the estimate was larger still (−0.094 L/min).
For both the NMMAPS and PEACE studies, selection of the lag with the smallest P value, irrespective of direction, was associated with greater evidence of heterogeneity, but not of asymmetry, compared with analysis of any single lag.
We found evidence that associations between ambient particulate pollution and various health outcomes reported by single-city studies were affected by publication bias. Statistical adjustment for publication bias reduced the size of estimates by as much as 40%. The adjusted estimates from single-city studies tended to be higher than those from planned multicity studies in which publication and lag selection bias were eliminated by design. Using multicity data for several lags, we also demonstrated that preferential selection of the lag in an adverse direction with lowest P value had the potential to increase the regression estimate of PM10 and daily mortality by up to 130%. A similar analysis of a prospective multicity study of PM10 and lung function in panels of children with chronic respiratory symptoms showed a 40% increase in the estimated adverse effect on lung function if there was biased lag selection.
We made every effort to identify and include all eligible studies for which an estimate of effect could be identified. No panel studies were excluded. We excluded only those whole-population time-series studies that could not, for methodologic reasons, provide an estimate for quantitative metaanalysis. Time-series studies usually examine associations between the outcome and pollution on the same day or lagged by 1 or more days but rarely are all of the results published. In the single-city studies, 1 estimate was usually singled out for mention by the author, but rarely was this done on the basis of an a priori hypothesis. Mostly, it seemed to be selected because it was the estimate in a positive direction with the lowest P value or was stated by the author to be the most “significant.” Because the aim of the present study was to look at publication bias, our policy was to accept the decision of the author. When this was not clear, we selected according to the significance, then size of the estimate, irrespective of the direction of the association—the latter to avoid introducing bias and to ensure that our estimate of publication bias would be on the conservative side. Underestimation of publication bias may have occurred because we included in the single-city analysis some studies that unknown to us had adopted an a priori choice of lag.
We were able to examine the likely effect of publication bias empirically by comparing single-city studies with multicity studies that used standard techniques and an a priori choice of lag.18–20 The estimates for PM10 and daily mortality were higher in the single-city studies (1.006) than in the NMMAPS study (1.004), but similar to APHEA 2 (1.006). In the single-city studies, the estimates for peak expiratory flow (−0.151 L/min) and cough (odds ratio [OR] = 1.025) were more adverse than in the multicity PEACE study (−0.067 L/min and OR = 0.991, respectively).
Could publication bias have been responsible for falsely inferring that there is an association between particulate air pollution and health effects? From our analysis, it would appear that for the outcomes of daily mortality and hospital admissions for COPD, publication bias has not affected the conclusions, because positive and precisely estimated associations remained after correcting for bias using the trim-and-fill technique. We recognize, however, that this method may be insufficiently reliable.33 Stronger support for a true association with daily mortality comes from the 2 multicity studies, both of which found positive associations. The evidence from the panel studies is more problematic, because although the multicity study found a positive association with peak expiratory flow, the odds ratio for cough was close to unity.
Could publication bias have been responsible for overestimating the size of a true effect? The answer to this question is almost certainly yes, because the regression estimates from the multicity studies and the corrected estimates from the single-city studies are approximately half of the estimates of the effect of PM10 on daily mortality that were current in the mid-1990s.23 For individual panel studies, correction for publication bias reduced the estimate for cough by 40%. For both cough and peak expiratory flow, the PEACE study found smaller effects than the single-city studies, but interpretation of this finding is difficult because the specific environment of the PEACE study (European winter) might differ from that of some of the single-city panel studies. It is important to note that other sources of bias such as the analytic technique are also likely to affect the size of estimates. For example, reanalysis of NMMAPS and APHEA data using different regression techniques has reduced the estimates for PM10 and daily mortality by 50% and 34%, respectively.34
We have used funnel plots and statistical tests for asymmetry to examine the evidence for publication bias and have assumed that asymmetry in the plot is an indicator of such bias. There are several possible reservations to this approach. First, the funnel plot and associated tests are actually investigating small study effects rather than publication bias per se. However, small study effects are often the result of publication bias and so our assumption is not unreasonable. If publication bias affected larger studies to the same degree as smaller studies, it would not be detected by tests of asymmetry. Second, asymmetry in funnel plots may be the result of other factors such as true heterogeneity. However, this heterogeneity would only result in asymmetry if the effect size varied with the study size and we see no reason why this should be so. Third, asymmetry could be the result of poorer methodologic design in small studies. However, although this remains a possibility, we have scrutinized the studies' methodologies and have omitted those with doubtful validity. Fourth, chance is always a possible explanation for observed asymmetry. It is appreciated that the statistical tests for asymmetry and the trim-and-fill technique are not ideal,35 but the consistency of the findings for the various outcomes in this paper does not support this explanation.
The potential for bias resulting from preferential selection of the adverse effect with the lowest P value is well recognized, but the data available from the single-city studies were insufficient to investigate the effect of this. Instead, we were able to examine the potential for this source of bias by using the data from NMMAPS and PEACE. We found that if the regression estimate was chosen on the basis of being the adverse effect with the lowest P value, it was biased upward by up to 130%. This finding suggests that selective reporting may be an important component in the process of publication bias.
The emergence of time-series evidence associating low levels of ambient air pollution with adverse health effects has been a notable feature of environmental epidemiology over the past 2 decades. It is not unexpected that publication bias would have occurred in the exploratory phases of this research. It is also notable that publication bias, although mentioned in passing by some reviews,21,22 appears not to have been formally investigated.
Various approaches to the problem of publication bias have been proposed, including registration of studies13 and a process of peer review that is blind to the outcome.17 In the case of time-series studies, there is an additional problem, that of publishing all the results of a particular study that may comprise a large number of estimates for various lags, averaging times, pollutants, age groups, diagnoses, and statistical models. To publish results in a conventional form, or to present them at a conference, necessitates selection. This problem could be addressed by the adoption by editors of an agreed standard of presentation that recognizes that selective presentation of results must be avoided and that studies of this type will almost certainly be used subsequently in metaanalyses. The implementation of an agreed standard of presentation would be greatly facilitated by the increasing possibility of publishing extensive study results electronically.
We thank Francesca Dominici and Jon Samet of the Bloomberg School of Public Health, Johns Hopkins Hospital, Baltimore, for providing data from the NMMAPS study.
1. Health effects of outdoor air pollution. Committee of the Environmental and Occupational Health Assembly of the American Thoracic Society. Am J Respir Crit Care Med
2. Pope CA III, Dockery DW. Epidemiology of particle effects. In: Holgate ST, Samet JM, Koren HS, et al., eds. Air Pollution and Health. London: Academic Press; 1999:673–706.
3. Department of Health Committee on the Medical Effects of Air Pollutants. Non-biological Particles and Health. London: HMSO; 1995.
4. Expert Panel on Air Quality Standards. Particles. London: HMSO; 1995.
5. World Health Organization. Air Quality Guidelines for Europe, 2nd ed. European Series, No. 91. Copenhagen: WHO Regional Office for Europe; 2000.
6. Department of Health Committee on the Medical Effects of Air Pollutants. Quantification of the Effects of Air Pollution on Health in the United Kingdom. London: HMSO; 1998.
7. World Health Organization. Quantification of Health Effects of Exposure to Air Pollution. Report on a WHO Working Group. Bilthoven, The Netherlands, 20–22 November 2000. WHO Regional Office for Europe; 2001.
8. Kunzli N, Kaiser R, Medina S, et al. Public-health impact of outdoor and traffic-related air pollution: a European assessment. Lancet
9. Cohen A, Anderson HR, Ostro B. Mortality impacts of urban air pollution. In: Ezzati M, Lopez AD, Rodgers A, eds. Comparative Quantification of Health Risks: Global and Regional Burden of Disease Attributable to Selected Major Risk Factors. Geneva: World Health Organization; 2004.
10. National Research Council. Estimating the Public Health Benefits of Proposed Air Pollution Regulations. Washington: National Academic Press; 2002.
11. Sterling TD. Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. J Am Stat Assoc
12. Mahoney MJ. Publication prejudices: an experimental study of confirmatory bias in the peer review system. Cognitive Ther Res
13. Simes RJ. Publication bias: the case for an international registry of clinical trials. J Clin Oncol
14. Begg CB, Berlin JA. Publication bias: a problem in interpreting medical data. J R Stat Soc A
15. Begg CB, Berlin JA. Publication bias and dissemination of clinical research. J Natl Cancer Inst
16. Dickersin K. How important is publication bias? A synthesis of available data. AIDS Educ Prev. 1997;9(suppl A):15–21.
17. Sterling TD, Rosenbaum WL, Weinkam JJ. Publication decisions revisited: the effect of the outcome of statistical tests on the decision to publish and vice versa. Am Stat
18. Katsouyanni K, Touloumi G, Samoli E, et al. Confounding and effect modification in the short-term effects of ambient particles on total mortality: results from 29 European cities within the APHEA2 project. Epidemiology
19. Samet JM, Zeger SL, Dominici F, et al. The National Morbidity, Mortality and Air Pollution Study. Part II: Morbidity, Mortality and Air Pollution in the United States. Health Effects Institute; 2000.
20. Roemer W, Hoek G, Brunekreef B, et al. Daily variations in air pollution and respiratory health in a multicentre study: the PEACE project. Pollution effects on asthmatic children in Europe. Eur Respir J
21. Levy JI, Hammitt JK, Spengler J. Estimating the mortality impacts of particulate matter: what can be learned from between-study variability? Environ Health Perspect
22. Environmental Protection Agency. Air Quality Criteria for Particulate Matter: Second External Review Draft. Research Triangle Park, NC: US Environmental Protection Agency; 2001.
23. Pope CA, Dockery DW, Schwartz J. Review of epidemiological evidence of health effects of particulate pollution. Inhal Toxicol
24. Stieb DM, Judek S, Burnett RT. Meta-analysis of time-series studies of air pollution and mortality: effects of gases and particles and the influence of cause of death, age, and season. J Air Waste Manage Assoc
25. Katsouyanni K, Zmirou D, Spix C, et al. Short-term effects of air pollution on health: a European approach using epidemiological time-series data. The APHEA project: background, objectives, design. Eur Respir J
26. DerSimonian R, Laird NM. Meta-analysis in clinical trials. Control Clin Trials
27. Sutton AJ, Duval SJ, Tweedie RL, et al. Empirical assessment of effect of publication bias on meta-analyses. BMJ
28. Light RJ, Pillemer DB. Summing Up: The Science of Reviewing Research. Cambridge: Harvard University Press; 1984.
29. Egger M, Davey Smith G, Schneider M, et al. Bias in meta-analysis detected by a simple, graphical test. BMJ
30. Begg CB, Mazumdar M. Operating characteristics of a rank correlation test for publication bias. Biometrics
31. Sterne JA, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol
32. Duval S, Tweedie R. Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics
33. Sterne JA, Egger M. Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. J Clin Epidemiol
34. Revised Analyses of Time-Series Studies of Air Pollution and Health. Boston: Health Effects Institute; 2003.
35. Sterne JA, Egger M, Davey Smith G. Investigating and dealing with publication and other biases in meta-analyses. BMJ