Big Data in Epidemiology: Too Big to Fail?

Chiolero, Arnaud

doi: 10.1097/EDE.0b013e31829e46dc

Institute of Social and Preventive Medicine (IUMSP), University Hospital Center, Lausanne, Switzerland; Observatoire valaisan de la santé (OVS), Sion, Switzerland

The authors report no conflicts of interest.

To the Editor:

Big data seems to be the cornerstone of modern epidemiology. In the July issue of EPIDEMIOLOGY, Toh and Platt1 argued that size could be the “next big thing in epidemiology”—notably because computational and analytic tools are available to deal with huge and complex datasets. Others have stated that the data deluge would “make the scientific method obsolete” because “more is different.”2 This enthusiasm for research driven by big data should not distract from the well-known limits of observational epidemiology.

Indeed, observational epidemiology has faced serious debacles, with major findings of large cohort studies later contradicted by randomized controlled trials.3 The large size of some of these observational studies was probably one major reason why their results were highly trusted. It is painful for epidemiologists to admit that large cohort studies, a highly valued study design, could lead to such mistakes. However, large size alone is not enough for credible epidemiology. Obsession with study power and precision may have obscured fundamental validity issues that increasing sample size does not solve, such as measurement error, selection bias, or residual confounding.4
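The point that precision is not validity can be illustrated with a small simulation. The sketch below is not taken from the letter or its references; the function name `crude_slope`, the data-generating model, and the sample sizes are all assumptions chosen for illustration. An exposure has no true effect on the outcome, but both depend on an unmeasured confounder, so the crude regression slope settles near 0.5 rather than 0, and a larger sample only tightens the interval around that biased value.

```python
import random


def crude_slope(n, seed=0):
    """Return (slope, standard error) of outcome regressed on exposure.

    Hypothetical model: exposure and outcome each equal an unmeasured
    confounder plus independent noise. The true effect of exposure is 0.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)               # unmeasured confounder
        xs.append(u + rng.gauss(0, 1))    # exposure
        ys.append(u + rng.gauss(0, 1))    # outcome, no effect of exposure
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = (rss / (n - 2) / sxx) ** 0.5     # ordinary least squares SE
    return slope, se


for n in (1_000, 100_000):
    slope, se = crude_slope(n)
    low, high = slope - 1.96 * se, slope + 1.96 * se
    print(f"n={n}: slope={slope:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Under these assumptions, the larger sample yields a narrower interval that still excludes the true value of zero: greater precision around the wrong answer.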

I am concerned that the current appetite for big data–driven epidemiology could lead to similar mistakes. Admittedly, big data has one major advantage: the larger the study, the more likely the research findings are to be true.5 However, features of data-driven epidemiology (such as flexible data analysis and lack of prespecified hypotheses) can also produce research findings that are not true.5 Big data does not speak for itself any more than “small” data does. Addressing research questions that are truly answerable by observational epidemiology (such as unexpected adverse effects of an exposure or a treatment),3 using sensitivity analyses,4 being aware of inflated interpretations of selected chance findings,6 and drawing on recent developments in causal thinking can all help big data miners avoid some traps. But be aware that nothing is “too big to fail.”
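The risk posed by analyses without prespecified hypotheses can be sketched in the same spirit. The example below is an illustration under stated assumptions, not a method from the letter: the function name `count_chance_findings`, the 200 candidate exposures, and the 1.96 threshold are hypothetical choices. Every exposure is pure noise, yet screening all of them at the conventional 5% level is expected to flag roughly one in twenty as "significant".

```python
import random


def count_chance_findings(n_subjects=500, n_exposures=200, seed=1):
    """Count noise exposures whose correlation with the outcome gives |t| > 1.96.

    Hypothetical setup: the outcome and every exposure are independent
    standard normal draws, so any "finding" is a chance finding.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    my = sum(outcome) / n_subjects
    syy = sum((y - my) ** 2 for y in outcome)
    hits = 0
    for _ in range(n_exposures):
        exposure = [rng.gauss(0, 1) for _ in range(n_subjects)]  # pure noise
        mx = sum(exposure) / n_subjects
        sxx = sum((x - mx) ** 2 for x in exposure)
        sxy = sum((x - mx) * (y - my) for x, y in zip(exposure, outcome))
        r = sxy / (sxx * syy) ** 0.5                       # Pearson correlation
        t = r * ((n_subjects - 2) / (1 - r * r)) ** 0.5    # t statistic for r
        if abs(t) > 1.96:
            hits += 1
    return hits


hits = count_chance_findings()
print(f"{hits} of 200 unrelated exposures pass the 5% threshold")
```

Reporting only the exposures that pass the threshold would present chance findings as discoveries, which is why prespecified hypotheses and sensitivity analyses matter at any sample size.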

Arnaud Chiolero

1. Toh S, Platt R. Is size the next big thing in epidemiology? Epidemiology. 2013;24:349–351
2. Anderson C. The end of theory: the data deluge makes the scientific method obsolete. Available at: Accessed 13 May 2013
3. Vandenbroucke JP. When are observational studies as credible as randomised trials? Lancet. 2004;363:1728–1731
4. Lawlor DA, Davey Smith G, Kundu D, Bruckdorfer KR, Ebrahim S. Those confounded vitamins: what can we learn from the differences between observational versus randomised trial evidence? Lancet. 2004;363:1724–1727
5. Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2:e124
6. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008;19:640–648
Copyright © 2013 Wolters Kluwer Health, Inc. All rights reserved.