From the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece; and Institute for Clinical Research and Health Policy Studies, Tufts University School of Medicine and Tufts Medical Center, Boston, Massachusetts.
Submitted 23 May 2008; accepted 3 June 2008.
Correspondence: John P. A. Ioannidis, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina 45110, Greece. E-mail: firstname.lastname@example.org.
I am greatly honored by the very insightful comments of Senn,1 Willett,2 and Kraft3 on my commentary.4
Professor Senn1 nicely exemplifies that there is no need to invoke a sequential design to observe inflated effects. Any pilot (underpowered) study would do. Nevertheless, evidence in any field can be seen as sequential data accumulation. Meta-analysis, the total-evidence paradigm, is primarily a sequential (cumulative) enterprise5 and, for most topics, evidence is still in the pilot phase. As suggested by the Cochrane meta-analyses, even the best-conducted meta-analyses may have inflated effects, even after many trials are assembled. Moreover, I agree that additional problems beyond regression-to-the-mean are not necessary to see inflated effects. Yet, I believe I have mentioned enough empirical examples demonstrating inflationary practices4 (see also other writings6,7 and the Cochrane Methodology register8). We need more empirical evidence to measure the relative contribution of various reasons for inflated effects, but some healthy scepticism is warranted, as Senn suggests.
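Senn's point can be sketched with a short simulation: a single underpowered study, with no sequential design, is enough to inflate the effects that survive a statistical-significance filter. Every input here is a hypothetical choice made for illustration and does not come from any of the commentaries. The assumed values are a true log odds ratio of 0.2, a per-study standard error of 0.15, and a two-sided 5% threshold, and the function name simulate_winners_curse is likewise made up.

```python
import random
from statistics import mean

TRUE_EFFECT = 0.2  # assumed true log odds ratio (hypothetical)

def simulate_winners_curse(true_effect=TRUE_EFFECT, se=0.15, n_studies=20000,
                           z_crit=1.96, seed=1):
    """Simulate many small, independent studies of the same true effect.

    Assumes each study's estimate is normally distributed around true_effect
    with standard error se (an idealized, hypothetical setting). Returns the
    mean estimate over all studies, the mean over 'significant' studies only,
    and the empirical power."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Keep only estimates crossing the threshold in the positive direction,
    # mimicking the selection of "discovered" associations.
    significant = [b for b in estimates if b / se > z_crit]
    return {
        "mean_all": mean(estimates),
        "mean_significant": mean(significant),
        "power": len(significant) / n_studies,
    }

result = simulate_winners_curse()
print(f"power: {result['power']:.2f}")
print(f"mean estimate, all studies: {result['mean_all']:.3f}")
print(f"mean estimate, significant only: {result['mean_significant']:.3f}")
print(f"inflation factor: {result['mean_significant'] / TRUE_EFFECT:.2f}")
```

With these assumed inputs, power is roughly one in four. The average over all studies stays close to the true value, but the average "significant" estimate is close to double it.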
Professor Willett2 offers an extremely illuminating and dense statement that would solve the entire problem: “If the original question was reasonable, the results are of interest whether statistically significant or not.” However, the phrasing that follows, “many of us have published many ‘negative’ studies,” is alarming. Why not: “all of us have published primarily ‘negative’ studies, which is what one expects to get usually, even with careful thinking and meticulous design, without dredging the hell out of data”? We can debate how many “negative” studies are expected, but meanwhile 100% of prognostic studies in the International Journal of Cancer in 2005 reported statistically significant results.9 I deliberately report here an inflated estimate, the most spectacular percentage; across 343 journals (1575 articles), the percentage was 95.8%.
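How many "negative" studies should we expect? Take the simplest assumptions: one pre-specified test per study, no bias, and no selective reporting. Under those assumptions, the expected share of statistically significant results is π(1 − β) + (1 − π)α. Here π is the proportion of tested associations that are real, 1 − β is the power, and α is the significance level. The function name and the input values below are illustrative assumptions only.

```python
def expected_significant_share(prior_true, power, alpha=0.05):
    """Expected fraction of studies reporting p < alpha when a fraction
    prior_true of tested associations are real and each study has the given
    power. Assumes one pre-specified test per study and no bias."""
    return prior_true * power + (1 - prior_true) * alpha

observed = 0.958  # share across 343 journals (1575 articles), from the text

for prior_true, power in [(0.1, 0.8), (0.5, 0.8), (0.5, 0.5), (1.0, 0.8)]:
    share = expected_significant_share(prior_true, power)
    print(f"prior_true={prior_true:.1f}, power={power:.1f}: "
          f"expected {share:.3f} vs observed {observed}")
```

Suppose every tested association were real and every study had 80% power. Even then, only about 80% of studies would be expected to be "positive", still short of the 95.8% observed.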
I definitely don't define epidemiology as the collection of data that are sifted mindlessly. Epidemiology's strength is the careful thinking and planning, both in exploration and replication. However, the average project and paper is not conceived, designed, conducted, and reported by giants of epidemiologic thinking of Willett's calibre. I simply propose that one fully records the process (how this was done) so that everybody can see, admire, replicate, and possibly critique analyses that allow large vibration of effects. I agree that data go beyond simple 2 × 3 tables. This is only one more reason to present them as explicitly as possible, with full attention to any prior biological hypotheses, potential recognized sources of confounding, misclassification, complexity of temporal relationships, and recall and selection biases.10 Reporting 2 × 9 or even 2 × 9000 tables is trivial; computers can readily handle petabytes of information.11 Consortia can be instrumental in enhancing both standardization and transparency of information,12 and efforts such as the Pooling Project of Prospective Studies of Diet and Cancer should attract more followers.13
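One reason to present a full 2 × k table is that readers can then see how much an effect estimate moves with a single analytic choice. An example of such a choice is where to dichotomize an ordered exposure. The sketch below computes the odds ratio at every possible cutpoint of one 2 × 5 table. The counts and the helper names are hypothetical.

```python
from typing import List, Tuple

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2 x 2 table: exposed cases a, exposed controls b,
    unexposed cases c, unexposed controls d."""
    return (a * d) / (b * c)

def cutpoint_odds_ratios(cases: List[int],
                         controls: List[int]) -> List[Tuple[int, float]]:
    """Dichotomize an ordered 2 x k table at every possible cutpoint.

    Categories with index >= cut are treated as 'exposed'; the rest are the
    reference group. Returns a list of (cut, odds ratio) pairs."""
    results = []
    for cut in range(1, len(cases)):
        a = sum(cases[cut:])
        b = sum(controls[cut:])
        c = sum(cases[:cut])
        d = sum(controls[:cut])
        results.append((cut, odds_ratio(a, b, c, d)))
    return results

# Hypothetical counts for five ordered exposure categories (not real data).
cases = [40, 35, 30, 28, 25]
controls = [60, 45, 35, 25, 15]

print("category  cases  controls")
for i, (ca, co) in enumerate(zip(cases, controls)):
    print(f"{i:>8}  {ca:>5}  {co:>8}")
for cut, or_value in cutpoint_odds_ratios(cases, controls):
    print(f"exposed = categories >= {cut}: OR = {or_value:.2f}")
```

With these made-up counts, the odds ratio ranges from about 1.5 to about 2.1 depending only on the cutpoint. This is the kind of vibration of effects that a fully reported table makes visible.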
Introductory epidemiology needs some reappraisal. Specifically, “consistency with other biological information” sounds august, but we need more empirical evidence on its workings. Sometimes we think we know everything about biology, while we know little or nothing. Using “consistency with other biological information” subjectively as the post hoc guarantor of poor data dredging is problematic. The commentary3 by Kraft (with whom I fully agree) nicely exemplifies the dangers of trusting biological consistency by describing how much one particular field (genetic epidemiology) has changed through the advent of “agnostic” genome-wide association studies14 (of which I am a great enthusiast). Several years ago, when I suggested that <10% to 30% of seemingly/partially replicated candidate genetic epidemiologic associations were true,6,15 many colleagues felt I was stubbornly unwilling to see the clear consistency of these candidate associations with other biological information. With the paradigm shift of the genome-wide association studies, the old ship was abandoned by its old-time enthusiasts to sink with all its cargo. (By the way, unlike the deserters, I still believe that some candidate associations were true.) Now I hear talks celebrating that we have for the first time 200 (and rising) true associations, while 5 years ago the same lofty speakers were confident we already had 2000 true associations. What would happen if traditional epidemiology went through a similar paradigm shift in measurement capacity? Would many/most effects (including several “classics” of introductory epidemiology) shrink or sink? Interestingly, as we make more progress, the associations that remain credible apparently become fewer, claims for causality are toned down, and effects decrease.
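The estimate that only a minority of candidate associations were true rests on a positive-predictive-value argument of the kind developed in reference 6. Let R be the pre-study odds that a probed association is real, 1 − β the power, and α the significance level. In the absence of bias, the positive predictive value of a significant finding is PPV = (1 − β)R / (R − βR + α). The sketch below evaluates this expression for a few values of R. These inputs are illustrative assumptions, not the estimates used in references 6 and 15.

```python
def ppv(pre_study_odds, power=0.8, alpha=0.05):
    """Positive predictive value of a 'significant' finding without bias:
    PPV = (1 - beta) * R / (R - beta * R + alpha), where power = 1 - beta."""
    beta = 1 - power
    r = pre_study_odds
    return (1 - beta) * r / (r - beta * r + alpha)

# Illustrative pre-study odds: even odds, 1:10, and 1:100 (assumed values).
for r in [1.0, 0.1, 0.01]:
    print(f"R = {r:<5}: PPV = {ppv(r):.3f}")
```

With 80% power and α = 0.05, PPV falls from about 0.94 at even pre-study odds to about 0.14 at pre-study odds of 1:100. The sketch ignores bias and replication, both of which would change these numbers.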
Kraft3 also highlights some of the dangers of stretching inflated effect sizes for predictive purposes. Certainly great caution is needed in the presentation and interpretation of this otherwise fascinating knowledge.16 One might argue that seasoned epidemiologists would be immune to tricks, e.g., the presentation of results as odds ratios of extreme centiles based on multiplicative models. However, empirical evidence shows that it is seasoned epidemiologists who use these tricks par excellence.17 Ordinary people, and even most physicians, don't recognize what an odds ratio is.18 Educating thinking citizens may be useful, but is this feasible when even we scientists oversell our data?
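The extreme-centile presentation can be illustrated with a short sketch. Assume a normally distributed exposure and a log-linear (multiplicative) model with an odds ratio of 1.2 per standard deviation. Under those assumptions, the odds ratio between two people at different centiles grows with the distance between those centiles. The helper risk_from_or translates an odds ratio into an absolute risk for a given baseline risk. That is a framing closer to the absolute-risk tools discussed by Gigerenzer and Edwards.18 Both function names and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def or_between_centiles(or_per_sd: float, upper: float, lower: float) -> float:
    """Odds ratio comparing a person at the `upper` centile of a normally
    distributed exposure with one at the `lower` centile, assuming a log-linear
    model in which each SD multiplies the odds by or_per_sd."""
    z = NormalDist()
    sd_gap = z.inv_cdf(upper) - z.inv_cdf(lower)
    return math.exp(math.log(or_per_sd) * sd_gap)

def risk_from_or(baseline_risk: float, odds_ratio: float) -> float:
    """Absolute risk implied by applying odds_ratio to a baseline risk."""
    odds = baseline_risk / (1 - baseline_risk) * odds_ratio
    return odds / (1 + odds)

per_sd = 1.2      # assumed per-SD odds ratio (hypothetical)
baseline = 0.01   # assumed risk for a person at the lower centile

for upper, lower in [(0.75, 0.25), (0.90, 0.10), (0.99, 0.01)]:
    or_value = or_between_centiles(per_sd, upper, lower)
    print(f"{upper:.0%} vs {lower:.0%} centile: OR = {or_value:.2f}, "
          f"risk {baseline:.1%} -> {risk_from_or(baseline, or_value):.1%}")
```

The same modest per-SD association appears as roughly 1.3 for the interquartile contrast and roughly 2.3 for the 99th versus 1st centile. Yet if a person at the 1st centile has a 1% risk, one at the 99th centile has a risk of only about 2.3%.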
1. Senn S. Transposed conditionals, shrinkage, and direct and indirect unbiasedness. Epidemiology
2. Willett W. The search for truth must go beyond statistics. Epidemiology
3. Kraft P. Curses—winner's and otherwise—in genetic epidemiology. Epidemiology
4. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology
5. Lau J, Antman EM, Jimenez-Silva J, et al. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med
6. Ioannidis JP. Why most published research findings are false. PLoS Med
7. Ioannidis JP. Evolution and translation of research findings: from bench to where? PLoS Clin Trials
9. Kyzas PA, Denaxa-Kyza D, Ioannidis JP. Almost all articles on cancer prognostic markers report statistically significant results. Eur J Cancer
10. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet
11. Ioannidis JP. Molecular evidence-based medicine: evolution and integration of information in the genomic era. Eur J Clin Invest
12. Seminara D, Khoury MJ, O'Brien TR, et al. The emergence of networks in human genome epidemiology: challenges and opportunities. Epidemiology
13. Smith-Warner SA, Spiegelman D, Ritz J, et al. Methods for pooling results of epidemiologic studies: the pooling project of prospective studies of diet and cancer. Am J Epidemiol
14. McCarthy MI, Abecasis GR, Cardon LR, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet
15. Ioannidis JP. Genetic associations: false or true? Trends Mol Med
16. Hunter DJ, Khoury MJ, Drazen JM. Letting the genome out of the bottle—will we get our wish? N Engl J Med
17. Kavvoura FK, Liberopoulos G, Ioannidis JP. Selection in reported epidemiological risks: an empirical assessment. PLoS Med
18. Gigerenzer G, Edwards A. Simple tools for understanding risks: from innumeracy to insight. BMJ