ISEE Commentary

In Defense of the Weight-of-Evidence Approach to Literature Review in the Integrated Science Assessment

Richmond-Bryant, Jennifer

doi: 10.1097/EDE.0000000000001254


The Clean Air Act requires the US Environmental Protection Agency (EPA) Administrator to issue “quality criteria for an air pollutant [that] shall accurately reflect the latest scientific knowledge useful in indicating the kind and extent of all identifiable effects on public health or welfare” as the basis for the review of the National Ambient Air Quality Standards (NAAQS).1 The NAAQS review includes a science assessment of the health and ecologic effects of exposure to a criteria pollutant (particulate matter, O3, NO2, SO2, CO, or Pb), followed by a risk assessment and a policy assessment containing recommendations for retaining or changing the NAAQS. Ultimately, the EPA Administrator makes a decision after reviewing the Integrated Science Assessment (ISA) and these recommendations.

EPA produces an ISA to evaluate the scientific literature and make determinations about the causal nature of relationships between criteria pollutant exposure and health and welfare effects2 (eFigure 1). The ISA takes a weight-of-evidence approach, in which it considers the body of scientific evidence spanning atmospheric chemistry, exposure assessment, dosimetry, health effects, and welfare effects related to a given criteria pollutant to determine if the literature taken together provides evidence of causality. Health effect determinations are made by considering evidence from controlled human exposure, animal toxicology, and epidemiology studies together. For example, the ISA can consider whether (1) exposure to a criteria pollutant is followed by a physiologic response in humans (controlled human exposure), (2) biologic mechanisms exist through which an exposure may cause a health effect (animal toxicology), and (3) populations exposed to ambient concentrations experience health effects (epidemiology). Key to EPA’s weight-of-evidence approach are certain Sir Bradford Hill aspects, including (1) consistency: agreement among studies about the existence of an effect within a given study type, (2) coherence: evidence of an effect among multiple lines of evidence, and (3) biologic plausibility: evidence of a mechanism by which the exposure may cause a health outcome.3 When multiple studies from multiple disciplines mostly point toward the same conclusion, chance, confounding, and other biases have likely been reduced so that a conclusion of a causal relationship is supported. This approach has been lauded by past Clean Air Scientific Advisory Committees (CASAC; the scientific advisory committee that provides external review of the ISAs as mandated by the Clean Air Act)4 and by the Administrative Conference of the United States5 (a federal agency that convenes external experts to recommend efficiencies in implementing federal regulations and programs). However, the Trump Administration EPA is attempting to disqualify many epidemiologic studies from consideration in the NAAQS review process by changing the way in which the peer-reviewed literature is considered in the ISA.

An important part of the ISA science review process is evaluation of the quality of individual studies comprising the body of evidence informing the causality determinations. For the epidemiologic literature, the ISA considers for each study whether:

  • concentrations observed in the study are at or near ambient concentrations;
  • models control for confounding by copollutants and other factors;
  • testing has been performed for potential effect modification;
  • health endpoints have been included in the study design;
  • the study presents new information pertinent to populations, groups, or lifestages; and
  • methodologic issues such as lagged effects and thresholds have been included in the study’s design.2

Within this process, EPA recognizes the insufficiency of methods used in epidemiologic studies to address copollutant confounding.2,6 Because each individual epidemiologic study controls for different potential confounders, the Hill approach to evaluating consistency across epidemiologic studies provides an indication of whether observed effects are robust to confounding.2,3 The EPA qualitatively examined study quality criteria (study design, study population, exposure assessment, outcome assessment, confounding, statistical analysis) to identify strengths and threats to internal validity or risk of bias of studies that were included in the recently published 2020 ISA for Ozone.7,8 Risk of bias did not necessarily disqualify a study from inclusion in this ISA8 if it was informative.9 However, wording found in the Ozone ISA Process Appendix8 stating “references that did not pass the study quality review, and deemed critically deficient, were excluded from the ISA” raises concern that individuals within the Agency may be pushing for the use of study quality evaluation to reduce the evidence base rather than contextualize it, as in past ISAs2 (eFigure 1).


Dr. Tony Cox, the current Administrator-appointed CASAC chair and industry consultant with clients including the American Petroleum Institute and Philip Morris International, has focused his review on whether individual studies demonstrate causality. Unlike the ISA’s weight-of-evidence approach to determine if the body of literature supports a conclusion that criteria pollutant exposure causes a health effect, Cox proposed limiting the considered epidemiologic literature to “manipulative causation” studies.10 These studies require that some intervention be conducted to change air pollutant concentrations, with all other factors kept constant, to demonstrate a change in the health effect.11 Accordingly, he has called for use of systematic review study quality criteria in the ISAs to substantiate excluding studies based on the following considerations10:

  • study does not control for potential confounders or selection bias;
  • study estimates exposure by fixed-site monitor or modeled concentrations in lieu of using true personal exposure;
  • study design does not allow for testing of threats to internal validity;
  • study design does not allow for evaluation of external validity;
  • study does not perform sensitivity testing; or
  • study design only addresses association rather than permitting “valid inferences about (manipulative) causation.”10

Cox’s focus on study elimination suits the systematic review process of addressing a narrow question about observed health effects following an intervention12 through his manipulative causation test. This differs from the weight-of-evidence approach used in the ISA, as framed by the Clean Air Act, to evaluate all of the latest scientific evidence together to uncover all known health and welfare effects of the criteria pollutant.

Goodman et al13 proposed a quality score for studies considered for the ISA to quantitatively rate peer-reviewed studies based on a set of criteria similar to those listed by Cox,10 for short-term ozone exposure and asthma severity (eTable 1). In this system, an attribute deemed positive by Goodman et al13 would receive a score of “+1,” and an attribute deemed negative by Goodman et al13 would receive a score of “−1.” A positive score, summed across criteria, would result in designation as a Tier I study, whereas a negative score would result in designation as a Tier II study. In this analysis, four out of 19 panel studies, 10 out of 28 time-series studies, and one out of eight case-crossover studies (27% of all studies) would be downgraded or excluded by Cox’s approach.10 Similar approaches by Goodman et al13 have been published with respect to the relationships between particulate matter exposure and lung cancer biomarkers,14 long-term ozone exposure and cardiovascular effects,15 and short-term ozone exposure and cardiovascular effects,16 as well as in a broad evaluation of the framework employed by the EPA to evaluate the science used for the NAAQS review.17
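As a rough illustration of the scoring arithmetic described above, the following Python sketch sums +1/−1 attribute scores into a tier and recomputes the 27% share from the study counts given in the text. The function name, the attribute names, and the handling of a zero sum are assumptions made for illustration only; they are not taken from Goodman et al13.

```python
from typing import Dict


def assign_tier(attribute_scores: Dict[str, int]) -> str:
    """Sum +1/-1 attribute scores and map the total to a tier.

    Positive total -> "Tier I", negative total -> "Tier II".
    The text does not say what happens when the total is zero,
    so that case is returned as "unassigned" (an assumption).
    """
    for name, score in attribute_scores.items():
        if score not in (1, -1):
            raise ValueError(f"{name}: score must be +1 or -1, got {score}")
    total = sum(attribute_scores.values())
    if total > 0:
        return "Tier I"
    if total < 0:
        return "Tier II"
    return "unassigned"


# Hypothetical attribute names for a single study (not from the source).
example_study = {"confounder_control": 1, "exposure_metric": -1, "sensitivity_tests": 1}
example_tier = assign_tier(example_study)

# Study counts quoted in the text: (flagged, total) per study design.
counts = {"panel": (4, 19), "time_series": (10, 28), "case_crossover": (1, 8)}
flagged_total = sum(flagged for flagged, _ in counts.values())
study_total = sum(total for _, total in counts.values())
flagged_share_percent = round(100 * flagged_total / study_total)
```

Under these counts, 15 of 55 studies are flagged, which rounds to the 27% reported in the text.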

Evaluation of quality scores in systematic reviews has demonstrated that different scoring systems can produce drastically different results for the same set of studies, making study rankings arbitrary and therefore unsystematic.18–20 Armijo-Olivo et al18 compared two study quality metrics and found that the same conclusions about study quality were reached for only two out of 20 studies. Whiting et al20 developed five weighting scores and found that although the scores mostly agreed for highest and lowest quality scoring studies, results among the studies ranking in the middle were vastly different. Rooney et al21 performed a comparison of five different qualitative study quality evaluation methods used in systematic review and found that, although all of the methods addressed issues of selection, exposure, attrition, confounding, outcome assessment, and publication bias, conversion of these elements to scores would be inadvisable due to uncertainties about each domain and how they are weighted. Savitz et al9 also recognized this limitation of study quality scores and instead recommended a qualitative approach to investigate the risk of bias and its potential magnitude and direction of effect through critical analysis of relevant peer-reviewed literature for each potential source of bias. Because the ISA is a complex, cross-disciplinary synthesis of the literature with a mandate under the Clean Air Act to be comprehensive, study weighting is inappropriate and exclusion risks loss of evidence that should be considered.11 Qualitative analysis in line with the approach of Savitz et al9 is therefore more amenable to the critical assessment of literature within the ISA.
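The sensitivity of study rankings to the choice of scoring system can be seen in a toy example. The Python sketch below ranks three entirely hypothetical studies under two hypothetical weighting schemes; the domain names, ratings, and weights are invented for illustration and do not come from any of the cited comparisons.

```python
from typing import Dict, List

# Hypothetical domain ratings (0 = poor, 2 = good) for three made-up studies.
ratings: Dict[str, Dict[str, int]] = {
    "study_1": {"confounding": 2, "exposure": 0, "outcome": 2},
    "study_2": {"confounding": 1, "exposure": 2, "outcome": 0},
    "study_3": {"confounding": 0, "exposure": 1, "outcome": 1},
}

# Two hypothetical scoring systems: equal weights vs. exposure-heavy weights.
weights_equal = {"confounding": 1, "exposure": 1, "outcome": 1}
weights_exposure_heavy = {"confounding": 1, "exposure": 4, "outcome": 1}


def weighted_score(domains: Dict[str, int], weights: Dict[str, int]) -> int:
    """Weighted sum of domain ratings for one study."""
    return sum(weights[d] * r for d, r in domains.items())


def rank(all_ratings: Dict[str, Dict[str, int]], weights: Dict[str, int]) -> List[str]:
    """Return study names ordered from highest to lowest weighted score."""
    return sorted(
        all_ratings,
        key=lambda name: weighted_score(all_ratings[name], weights),
        reverse=True,
    )


ranking_equal = rank(ratings, weights_equal)
ranking_exposure_heavy = rank(ratings, weights_exposure_heavy)
```

With equal weights the order is study_1, study_2, study_3; with the exposure-heavy weights it becomes study_2, study_3, study_1, even though the underlying ratings are unchanged.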


The question about whether to discard studies due to a perceived study quality issue or simply to acknowledge their limitations can be examined for the exposure assessment domain. Dr. Sabine Lange, a CASAC charter member, wrote in her comments on the Particulate Matter ISA10:

the systematic review guidelines for TSCA22 lists [sic] study quality criteria for epidemiologic studies (amongst others). They state as a criterion for deeming a study unacceptable (and therefore for removal from the review) ‘There is evidence of substantial exposure misclassification that would significantly alter results.’ This needs to be seriously considered for studies that use ambient monitors as surrogates for personal exposure.

It is true that exposure assessments conducted for air pollution epidemiologic studies have limitations that may add bias and uncertainty to effect estimates. However, a recent review of the influence of exposure measurement error on effect estimates has shown that the bias introduced by exposure measurement error is usually negative.23 In other words, the presence of bias would not negate the observed effect and, in fact, would often cause an observed effect to be underestimated. Conversely, discarding epidemiologic studies from consideration in the ISA due to exposure measurement error could lead to reporting bias. This could result in loss of evidence of an association between the exposure and health effect from the ISA.
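One way to see why measurement error can lead to underestimation is a small simulation. The Python sketch below assumes classical, nondifferential, additive error in a simple linear model, which is only one of several error structures and is not the analysis performed in the cited review23; other error structures can behave differently. All names and parameter values are hypothetical.

```python
import random
from typing import List, Tuple


def ols_slope(x: List[float], y: List[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n: int = 20000, true_slope: float = 2.0,
             error_sd: float = 1.0, seed: int = 0) -> Tuple[float, float]:
    """Return (slope using true exposure, slope using error-prone exposure)."""
    rng = random.Random(seed)
    true_exposure = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcome = [true_slope * x + rng.gauss(0.0, 1.0) for x in true_exposure]
    # Classical error: measured value = true value + independent noise.
    measured = [x + rng.gauss(0.0, error_sd) for x in true_exposure]
    return ols_slope(true_exposure, outcome), ols_slope(measured, outcome)


# With exposure variance 1 and error variance 1, the expected attenuation
# factor under this model is 1 / (1 + 1) = 0.5.
slope_true_exposure, slope_measured_exposure = simulate()
```

In this setup the slope estimated from the error-prone exposure is pulled toward zero rather than reversed, which is the sense in which the observed effect is underestimated and not negated.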


Using systematic review criteria such as individual study quality ratings for the ISA may not improve the validity of causality determinations. This practice could, in fact, introduce uncertainty and bias into the process by excluding informative scientific studies for the purported reason of minimizing bias. In contrast, expert evaluation of the strengths and limitations of studies through the weight-of-evidence approach reduces bias through the triangulation process.24 Without thoughtful consideration of these concerns, adopting a systematic review study rating methodology is likely to create an ill-conceived set of criteria that will make the ISAs less, not more, defensible and could result in weakened NAAQS.


Dr. Jennifer Richmond-Bryant was a staff scientist with the US EPA National Center for Environmental Assessment from 2008 to 2019, as the exposure assessment lead on the team writing the Integrated Science Assessment. She is currently an Associate Professor of the Practice in the Department of Forestry and Environmental Resources at North Carolina State University. Her research areas include assessing human exposure to ambient air pollution, transport and dispersion of air pollutants, and disparities in exposures among population groups.


The author would like to thank Dr. Joel Kaufmann for his advice on developing this commentary.


1. 42 U.S. Code Section 7408 - Air quality criteria and control techniques.
2. U.S. EPA. Preamble to the Integrated Science Assessments. Research Triangle Park, NC: Office of Research and Development; 2015.
3. Hill AB. The environment and disease: association or causation? Proc R Soc Med. 1965;58:295–300.
4. CASAC. CASAC Review of the EPA’s Integrated Science Assessment for Oxides of Nitrogen - Health Criteria (Second External Review Draft). Washington, DC: Office of Research and Development; 2015.
5. Wagner W. Science in Regulation: a Study of Agency Decisionmaking Approaches. Washington, DC: Administrative Conference of the United States; 2013.
6. Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 1998.
7. U.S. EPA. Draft Ozone ISA: Study Quality (2019). 2019. Accessed 10 January 2020.
8. U.S. EPA. Integrated Science Assessment for Ozone and Related Photochemical Oxidants. 2020. Research Triangle Park, NC: Office of Research and Development; EPA/600/R-20/012.
9. Savitz DA, Wellenius GA, Trikalinos TA. The problem with mechanistic risk of bias assessments in evidence synthesis of observational studies and a practical alternative: assessing the impact of specific sources of potential bias. Am J Epidemiol. 2019;188:1581–1585.
10. Cox LA. CASAC Review of the EPA’s Integrated Science Assessment for Particulate Matter (External Review Draft - October 2018). Washington, DC: Clean Air Scientific Advisory Committee; 2019.
11. Campaner R. Mechanistic causality and counterfactual-manipulative causality: recent insights from philosophy of science. J Epidemiol Community Health. 2011;65:1070–1074.
12. Boell SK, Cecez-Kecmanovic D. On being ‘systematic’ in literature reviews in IS. J Inf Technol. 2015;30:161–173.
13. Goodman JE, Zu K, Loftus CT, et al. Short-term ozone exposure and asthma severity: weight-of-evidence analysis. Environ Res. 2018;160:391–397.
14. Lynch HN, Loftus CT, Cohen JM, Kerper LE, Kennedy EM, Goodman JE. Weight-of-evidence evaluation of associations between particulate matter exposure and biomarkers of lung cancer. Regul Toxicol Pharmacol. 2016;82:53–93.
15. Prueitt RL, Lynch HN, Zu K, Sax SN, Venditti FJ, Goodman JE. Weight-of-evidence evaluation of long-term ozone exposure and cardiovascular effects. Crit Rev Toxicol. 2014;44:791–822.
16. Goodman JE, Prueitt RL, Sax SN, et al. Weight-of-evidence evaluation of short-term ozone exposure and cardiovascular effects. Crit Rev Toxicol. 2014;44:725–790.
17. Goodman JE, Prueitt RL, Sax SN, Bailey LA, Rhomberg LR. Evaluation of the causal framework used for setting national ambient air quality standards. Crit Rev Toxicol. 2013;43:829–849.
18. Armijo-Olivo S, Stiles CR, Hagen NA, Biondo PD, Cummings GG. Assessment of study quality for systematic reviews: a comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: methodological research. J Eval Clin Pract. 2012;18:12–18.
19. Jüni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999;282:1054–1060.
20. Whiting P, Harbord R, Kleijnen J. No role for quality scores in systematic reviews of diagnostic accuracy studies. BMC Med Res Methodol. 2005;5:19.
21. Rooney AA, Cooper GS, Jahnke GD, et al. How credible are the study results? Evaluating and applying internal validity tools to literature-based assessments of environmental health hazards. Environ Int. 2016;92–93:617–629.
22. U.S. EPA. Application of Systematic Review in TSCA Risk Evaluations. Research Triangle Park, NC: Office of Chemical Safety and Pollution Prevention; 2018.
23. Richmond-Bryant J, Long TC. Influence of exposure measurement error on results from epidemiologic studies of different designs. J Expo Sci Environ Epidemiol. 2020;30:420–429.
24. Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45:1866–1886.

Keywords: Clean Air Scientific Advisory Committee; Integrated Science Assessment; National Ambient Air Quality Standards; Study quality; Weight-of-evidence


Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved.