False Positives: Rejoinder
X-WAS, Traditional Epidemiology, and Policy Action
Ioannidis, John P. A. (a); Tarone, Robert (b); McLaughlin, Joseph K. (b)
From the (a) Department of Medicine, and Department of Health Research and Policy, Stanford Prevention Research Center, Stanford University School of Medicine, Stanford, CA; and (b) International Epidemiology Institute, Rockville, MD.
Correspondence: John P. A. Ioannidis, C. F. Rehnborg Professor in Disease Prevention, Professor of Medicine and of Health Research and Policy, Stanford University School of Medicine, MSOB X306, 251 Campus Drive, Stanford, CA 94305. E-mail: email@example.com.
We thank Daniele Fallin and Linda Kao,1 George Davey Smith,2 and Shalom Wacholder3 for their insightful commentaries. Fallin and Kao1 question whether X-WAS is the future of all epidemiology and whether the rest of epidemiology beyond genomics is as biased as we implied.4 Epidemiology, and observational research in general, comprises a vast cluster of diverse disciplines; some areas may be less biased than human genome epidemiology was in the candidate-gene era, but others are probably even more biased. There are hundreds of possible biases.5 No one denies that traditional epidemiology and observational studies have sometimes been successful; we could add numerous examples of successful identification of risk factors6 besides the 2 mentioned by Fallin and Kao.1 However, as fellow commentator Davey Smith notes,2 the average track record of observational epidemiology to date has been poor, amounting almost to a systemic failure.7 The successes are needles in a haystack of associations reported in thousands and possibly millions of studies. A quick PubMed search with [risk factor OR (case-control OR cohort OR cross-sectional) OR association] yielded 1,489,912 articles. Perusing only a small sample, we found that 90% of these retrievals were observational epidemiologic studies, and many other observational studies are probably not captured by this preliminary search. Hundreds of thousands of scientists have been involved in observational studies across diverse domains. Some of them have not even realized that this is the approach they have been using, or that they may have employed spurious inferential methods. Some prolific cohorts have published well over 1000 papers, each often examining only 1 or 2 exposures for a single disease outcome. We do not think that such fragmentation serves our science well.
Fallin and Kao indicate that GWAS findings hold translational potential, but they are concerned that stringent criteria will limit the pursuit of important paths to translation. Successful translation occurs only rarely in biomedical science.8 This difficulty should be acknowledged and communicated to the wider public. We do not expect massive, agnostic approaches to make the translation process much easier. However, a rigorous approach with appropriate FP:FN stringency, large sample sizes, transparent design, and nonselective reporting will ensure a starting point with fewer false leads. False positives continue to haunt the literature, even after being roundly refuted.9 Moreover, putative discoveries from observational studies continue to have a high refutation rate even when findings have come close to translation.10 The current emphasis of funders on translation may exacerbate the false-positive problem, because it presupposes that research findings are valid. Funding incentives to rush to the bedside with presumed treatments will likely encourage bad science.
We agree with the principle of including additional lines of evidence, such as animal models,11 in vitro work, and other sources of biologic plausibility. However, biologic plausibility also needs an equally systematic and transparent approach to the evidence; otherwise, scientists may selectively invoke those pieces of the evidence that best match their results or prejudices.12 Biases related to statistical testing and selective reporting are also highly prevalent in these additional lines of biologic evidence.13–16
Wacholder3 nicely summarizes some differences between human genome epidemiology and other disciplines. He points out that, with more stringent FP:FN criteria, the power of current studies would be eroded. However, if one were able to combine all the fragmented single-study efforts in traditional epidemiology, the cumulative sample sizes of the resulting consortia would often be as large as or larger than the typical samples amassed in current genetic consortia. Future efforts may be even more far-reaching. If 50 biobanks are launched around the world, with an average of 500,000 participants each, the total sample size will be 25,000,000 participants. Some nationwide studies can already accommodate information from many millions of participants each. Unless the criteria for discovery and validation are stringent, these studies may show nominally statistically significant results for almost any tested association. Wacholder also highlights differences between traditional and genome epidemiology in cost, free availability of measurements, and use of combined (pooled) analyses. We do not see why traditional epidemiology cannot follow the paradigm of genetics in these respects. The cost of genetic measurements has decreased a million-fold in the last 20 years. One can also envision a sharp decrease in the cost of nongenetic measurements and data gathering. Cost is high when technologies are homemade, idiosyncratic, and nongeneralizable. Conversely, cost routinely decreases when a technology becomes standardized and is then applied by many scientists using common analytic plans toward converging purposes.
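The point that very large pooled samples can make almost any association nominally significant can be checked with simple arithmetic. The Python sketch below applies a standard two-sided, pooled two-proportion z-test to a hypothetical weak association (a risk of 1.05% in the exposed versus 1.00% in the unexposed, a relative risk of 1.05), first in a hypothetical single study of 10,000 participants and then in the 25,000,000 participants of the 50-biobank scenario, split evenly between exposed and unexposed. The risks, group sizes, and choice of test are illustrative assumptions, not data from any study.

```python
import math


def two_proportion_z_test(risk_exposed, risk_unexposed, n_exposed, n_unexposed):
    """Two-sided z-test for a difference in risk between two groups, using a pooled SE."""
    events = risk_exposed * n_exposed + risk_unexposed * n_unexposed
    pooled = events / (n_exposed + n_unexposed)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exposed + 1 / n_unexposed))
    z = (risk_exposed - risk_unexposed) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical pooled resource: 50 biobanks x 500,000 participants each.
total_participants = 50 * 500_000
half = total_participants // 2

# Hypothetical weak association (relative risk 1.05), small enough to arise
# from residual confounding or measurement bias alone.
for n_per_group in (5_000, half):
    z, p = two_proportion_z_test(0.0105, 0.0100, n_per_group, n_per_group)
    print(f"n per group = {n_per_group:>10,}: z = {z:6.2f}, p = {p:.1e}")
```

With these inputs, the single study gives a z of roughly 0.25 (p of about 0.8), whereas the pooled sample gives a z of roughly 12.4, far beyond even the genome-wide threshold of 5 × 10^-8. Nothing in the test distinguishes a causal effect of this size from a small bias, which is why stringent discovery and validation criteria matter at this scale.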
Several envirome measurement platforms are already available and are continuously being improved, including tools to capture the epigenome, metabolome, and microbiome; administrative databases and national registries; and electronic epidemiology, to name a few.17 We also see no reason why traditional epidemiologic data should not be freely available,18 or why prospectively designed combined analyses in overarching consortia should not be the norm. Once these deficits have been remedied, one can deal more seriously with the other differences that may indeed be more specific to nongenetic epidemiology, including the range of prior odds and the highly dense correlation pattern among environmental exposures.19,20
Davey Smith2 shows how Mendelian randomization bridges the gap between genetic and nongenetic epidemiology and offers a wealth of new possibilities. We believe that it is important for different measurement platforms to coexist within the same cohorts and biobanks.20 To date, most studies that have extensive genomic measurements have limited or no information on the envirome, and studies that are strong in environmental measurements often show little interest in genomics. Hopefully, this unfortunate dissociation will gradually be remedied.
Finally, what does all this mean for policy and decision-making? As we pointed out originally4 and as our commentators further elaborated, scientific appraisal of the evidence is not the same as making policy decisions based on this evidence. We simply suggest that scientists should provide transparent estimates of the FP:FN ratio in the field from which the information was obtained, systematic and transparent synopses of the prior evidence in the field, and estimates of the posterior odds that take the range of existing uncertainty into account. Scientific studies, when correctly executed and validated, inform us about what is, but not about what ought to be. This may seem obvious to some, but one need only follow newspaper or TV news to see what happens when scientists become advocates on the basis of their own research findings.
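As a rough illustration of posterior odds reported across a range of uncertainty, the Python sketch below applies Bayes' theorem in odds form: a result declared significant at level alpha is treated as evidence with likelihood ratio power/alpha, so that posterior odds = prior odds × power / alpha, and the probability that the finding is true is posterior odds / (1 + posterior odds). The prior odds, the power of 0.8, and the two alpha levels are hypothetical values chosen for illustration, and the calculation deliberately ignores bias and selective reporting; it is a simplified sketch, not the method of the original article.

```python
def posterior_odds(prior_odds, power, alpha):
    """Bayes' theorem in odds form for a result declared significant at level alpha.

    Likelihood ratio of a 'significant' result = P(sig | true) / P(sig | null) = power / alpha.
    Simplifying assumption: no bias, no selective reporting.
    """
    return prior_odds * power / alpha


def prob_true(prior_odds, power, alpha):
    """Probability that a significant finding is true (positive predictive value)."""
    post = posterior_odds(prior_odds, power, alpha)
    return post / (1 + post)


# Hypothetical range of prior odds, standing in for uncertainty about the field.
prior_range = [1 / 10, 1 / 100, 1 / 1000, 1 / 10000]
power = 0.8  # hypothetical power of the study

for alpha in (0.05, 5e-8):
    for prior in prior_range:
        ppv = prob_true(prior, power, alpha)
        print(f"alpha = {alpha:g}, prior odds = 1:{round(1 / prior):>5}, P(true) = {ppv:.3f}")
```

Under these assumptions, a finding significant at 0.05 with prior odds of 1:10 has a probability of being true of about 0.62, but this falls to about 0.14 at 1:100 and below 0.02 at 1:1000; at 5 × 10^-8 the same finding stays above 0.99 across the whole range shown. Reporting the full range, rather than a single value, makes the dependence on uncertain prior odds explicit.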
1. Fallin M, Kao HW. Is “X”-WAS the future for all of epidemiology? Epidemiology. 2011;22:457–459.
2. Davey Smith G. Random allocation in observational data: how small but robust effects can facilitate hypothesis-free causal inference. Epidemiology. 2011;22:460–463.
3. Wacholder S. On standards of evidence. Epidemiology. 2011;22:464–466.
4. Ioannidis JP, Tarone R, McLaughlin JK. The false-positive to false-negative ratio in epidemiologic studies. Epidemiology. 2011;22:450–456.
5. Chavalarias D, Ioannidis JP. Science mapping analysis characterizes 235 biases in biomedical research. J Clin Epidemiol. 2010;63:1205–1215.
6. Petitti D. Triumphs in epidemiology. Epidemiol Monitor. Aug/Sept 2001.
7. Young S. Acknowledge and fix the multiple testing problem. Int J Epidemiol. 2010;39:934.
8. Contopoulos-Ioannidis DG, Ntzani E, Ioannidis JP. Translation of highly promising basic science research into clinical applications. Am J Med. 2003;114:477–484.
9. Tatsioni A, Bonitsis NG, Ioannidis JP. Persistence of contradicted claims in the literature. JAMA. 2007;298:2717–2726.
10. Contopoulos-Ioannidis DG, Alexiou GA, Gouvias TC, Ioannidis JP. Life cycle of translational research for medical interventions. Science. 2008;321:1298–1299.
11. Kitsios GD, Tangri N, Castaldi PJ, Ioannidis JP. Laboratory mouse models for the human genome-wide associations. PLoS One. 2010;5:e13782.
12. Ioannidis JP, Polyzos NP, Trikalinos TA. Selective discussion and transparency in microarray research findings for cancer outcomes. Eur J Cancer. 2007;43:1999–2010.
13. Crossley NA, Sena E, Goehler J, et al. Empirical evidence of bias in the design of experimental stroke studies: a metaepidemiologic approach. Stroke. 2008;39:929–934.
14. Perel P, Roberts I, Sena E, et al. Comparison of treatment effects between animal experiments and clinical trials: systematic review. BMJ. 2007;334:197.
15. Macleod MR, Fisher M, O'Collins V, et al. Good laboratory practice: preventing introduction of bias at the bench. Stroke. 2009;40:e50–e52.
16. Sena ES, van der Worp HB, Bath PM, Howells DW, Macleod MR. Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol. 2010;8:e1000344.
17. Rappaport SM, Smith MT. Environment and disease risks. Science. 2010;330:460–461.
18. Baggerly K. Disclose all data in publications. Nature. 2010;467:401.
19. Smith GD, Lawlor DA, Harbord R, Timpson N, Day I, Ebrahim S. Clustered environments and randomized genes: a fundamental distinction between conventional and genetic epidemiology. PLoS Med. 2007;4:e352.
20. Ioannidis JP, Loy EY, Poulton R, Chia KS. Researching genetic versus non-genetic determinants of disease: a comparison and proposed unification. Sci Transl Med. 2009;1:7ps8.
© 2011 Lippincott Williams & Wilkins, Inc.