The central point of the paper by Ioannidis1 in this issue is that newly discovered true associations often have inflated effects compared with true effect sizes. This phenomenon is widely recognized by epidemiologists, and will occur because an association is more likely to become statistically significant when random variation is in the same direction as the underlying true association, thus inflating both statistical significance and the magnitude of association. Viewed another way, an implausibly strong association that is barely statistically significant has a high likelihood of being due to chance. Clearly the primary approach to such a finding should be caution about the effect size, and indeed skepticism about whether the newly discovered association is true at all.
Although the tendency described by Ioannidis is real, the problem as he describes it is inflated by his perspective that epidemiology is primarily the collection of data that are sifted mindlessly for the occurrence of statistically significant associations, which are then reported. Those who practice epidemiology understand that the primary research mode is still the development of testable hypotheses based on sound biologic reasoning, leading to submission of research proposals for peer review, implementation of research protocols and collection of data, and finally the publication of findings regardless of their statistical significance. If the original question was reasonable, the results are of interest whether statistically significant or not, and many of us have published “negative” studies. Epidemiologists, and epidemiology journals, are probably better than most areas of science in reporting “negative” results, a practice that some basic science journals strongly discourage.
Still, investigators and journals can be influenced by multiple potential biases (some described by Ioannidis) that favor publication of statistically significant results. An emphasis on “positive” findings in a subgroup is one important example. Exploratory data analyses have a place in epidemiology, but they will inevitably yield a mix of true findings and those that occur by chance—often with inflated associations. Although these analyses are often pejoratively described as “fishing” or “data dredging,” such explorations can lead to findings that are unexpected and important. Phil Cole appropriately has encouraged his students, having completed a study, to “dredge the hell out of your data.” Probably most importantly, the reporting of findings based on exploratory analyses should be described as such, and conveyed with a strong note of caution.
Genome-wide association studies have taken agnostic exploratory analyses to new heights. Currently available DNA chips provide genotype information in 3 categories for 500,000 to 1 million genetic variants, and create massive numbers of false-positive associations if usual levels of statistical significance are used. The use of standard corrections for multiple comparisons—for example, using a P-value threshold of 10⁻⁷ for statistical significance—will address false-positive findings, but can result in many false-negative findings. It was necessary to re-examine the 30,000 most significant associations in an initial genome-wide association study to find some true (repeatedly replicable) associations with prostate cancer.2
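The arithmetic behind that threshold can be made concrete. The following sketch is an illustration of my own (not from the commentary), assuming 500,000 independent tests, the lower end of the variant counts cited above:

```python
# Why genome-wide scans need a much stricter significance threshold
# than the conventional 0.05 (illustrative arithmetic only).
n_tests = 500_000   # assumed number of variants tested, per the text
alpha = 0.05        # conventional significance level

# Under the null hypothesis, each test has an alpha chance of a
# false-positive result, so the expected count of spurious "hits" is:
expected_false_positives = n_tests * alpha   # about 25,000

# A Bonferroni-style correction divides alpha by the number of tests,
# which recovers the 10^-7 threshold mentioned in the text:
bonferroni_threshold = alpha / n_tests

print(round(expected_false_positives), bonferroni_threshold)
```

The trade-off described above follows directly: the stricter threshold suppresses the ~25,000 expected false positives, but any true association whose P-value falls between 10⁻⁷ and 0.05 is discarded along with them, hence the false negatives.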
Some researchers report all results of a genome-wide association study in a single paper (typically as massive numbers of 2 × 3 tables posted on the Web), and make all data public. Ioannidis suggests that this practice should be applied to other areas of epidemiology as a general solution for inflated effect sizes. However, the data from genome-wide association studies are unique in ways that make simple 2 × 3 tables informative. First, these studies are by design without prior biologic hypotheses, and the interpretation of findings is based almost entirely on statistical probability. Second, if controls are properly matched on ethnicity, there is relatively little confounding in genetic association studies, unlike studies of diet or other typical topics of epidemiologic research. Third, the error rate in genetic analyses can be very low with good techniques and quality control, leading to relatively little misclassification. Fourth, the complexity of temporal relationships in most epidemiologic studies is not an issue because exposure is constant for a lifetime. Finally, recall and selection biases are relatively unimportant in genetic studies. In contrast, large sets of 2 × 9 tables of, for example, food frequency categories by disease status would be virtually useless.
Furthermore (and as acknowledged by Ioannidis), the reality of making primary data from genetic studies publicly available collides with increasing efforts to ensure confidentiality of personal information, greatly limiting what can be released for general access. Although this is a rapidly evolving area, large studies funded by the National Institutes of Health are required to have data sharing plans, which allow other researchers to gain access.
Beyond the inflation of effect sizes in exploratory analyses is the broader challenge of summarizing the available epidemiologic data. I recently experienced this as a member of a committee that attempted to summarize the global epidemiologic data on diet, nutrition, and cancer.3 For a few central hypotheses, such as dietary fat and breast cancer, almost all large cohort studies with data had published their findings. However, for most potential diet and cancer relationships, only some of the studies that presumably had collected the data had published findings. This may have been for lack of resources or time, but may also have been because the findings were null and deemed less interesting. The potential for bias is obvious. Even when multiple studies had published on a topic, the data were often difficult to combine because of differences in variable definitions, covariates, and statistical models. Often, findings could be presented only as high versus low intake, with the contrast highly variable among studies.
A better way to compile the best available data on a particular relationship, as mentioned briefly by Ioannidis, is the creation of consortia for collaborative analyses. The Pooling Project of Prospective Studies of Diet and Cancer4 is such an example. In this collaboration, all prospective studies of diet and cancer with a specified minimum number of cases of a specific cancer are invited to join, and participation has been almost complete. For a specific diet and cancer relationship, all the data from all the studies are analyzed in a standard manner using standard definitions of exposure. Interestingly, substantial heterogeneity among studies has been rare, and the process maximizes power and minimizes publication bias. A limitation is that associations will tend to be underestimated because of measurement error, although error correction methods are used when possible (data from the reference method are not always available).
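The underestimation from measurement error noted above follows a well-known pattern for classical (independent, random) error: the observed regression slope is the true slope multiplied by a reliability ratio. The sketch below is my own illustration of that attenuation and its reversal, with hypothetical numbers; it is not drawn from the Pooling Project's actual methods or data:

```python
# Classical measurement error shrinks an observed association by the
# reliability ratio lambda = var(true) / (var(true) + var(error)).
def attenuation_factor(var_true, var_error):
    """Multiplier by which a regression slope is attenuated when the
    exposure is measured with independent random error."""
    return var_true / (var_true + var_error)

true_slope = 0.50   # hypothetical true association per unit of intake

# Assume half the measured variance is noise, a plausible situation
# for food-frequency questionnaires:
lam = attenuation_factor(var_true=1.0, var_error=1.0)   # 0.5
observed_slope = lam * true_slope                       # 0.25

# Regression-calibration-style correction reverses the shrinkage by
# dividing by lambda, which must be estimated from a validation
# substudy using a reference method -- when such data exist:
corrected_slope = observed_slope / lam                  # 0.50

print(lam, observed_slope, corrected_slope)
```

This makes the limitation concrete: with equal signal and noise variance, the observed association is only half the true one, and correction is possible only when a reference-method substudy allows lambda to be estimated.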
The issue of effect inflation raised by Ioannidis is part of the larger question of interpreting findings from epidemiologic studies, and many of the solutions he describes are taught in introductory epidemiology classes. Replication is central to the interpretation of any research findings, but it is surprising that he does not mention consistency with other biologic information. For many topics of epidemiologic research, this is critical because the next step he proposes—conducting randomized trials—will often be impossible for practical or ethical reasons.
For many hypotheses, the best evidence will be a combination of randomized trials of intermediate endpoints, which can often be conducted with few subjects and short follow-up periods, combined with replicated findings from prospective epidemiologic studies. For example, findings from many controlled feeding studies have shown that dietary trans fatty acids have adverse effects on blood lipids and other metabolic risk factors, and positive associations with coronary heart disease incidence have been seen in multiple prospective studies.5 This combination of evidence has been deemed sufficient for public health actions to eliminate or reduce consumption of partially hydrogenated vegetable oils.
Epidemiology is likely to be most successful when it is used as one approach, and an extremely important one, in an integrated effort to understand the impact of environment on human biology.
ABOUT THE AUTHOR
WALTER WILLETT is Professor of Epidemiology and Nutrition at Harvard School of Public Health. His work focuses on dietary assessment and on dietary risk factors for disease, much of this based on three large cohort studies that include nearly 300,000 men and women. He is the author of Nutritional Epidemiology, the first textbook on this topic.