In this issue, Ioannidis and colleagues1 highlight the use of the false positive to false negative (FP:FN) ratio as a metric to guide design, analysis, and interpretation of epidemiologic studies. They hold up the genetic epidemiology experience of the last decade as an example of a field's movement from the generally high FP:FN ratios of the candidate-gene-association-study era to the extremely low ratios in the current setting of genome-wide association studies (GWAS). They further argue that this shift to ratios much less than 1 is due to a combination of several related factors: improved technology allowing agnostic scanning of nearly all possible genetic risk factors, requirements of very stringent statistical significance thresholds in the context of extreme multiple testing (eg, P < 10⁻⁸), and very large sample sizes made possible through pooling or meta-analysis of many studies. Because this shift has resulted in many fewer “false” leads and quicker consensus regarding particular genetic findings, the authors promote it as a model for other areas of epidemiology that, according to their assessment, are still in a “candidate gene” phase. In essence, this proposes an “X”-WAS approach across areas of epidemiology, where “X” represents an exhaustive list of risk factors for a particular field, such as a “nutrition”-wide association study or a “toxicant”-wide association study. Indeed, an “environment”-wide association study has already been published in the field of diabetes.2
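To make the arithmetic behind this shift concrete, the following is a minimal Python sketch, not taken from Ioannidis and colleagues, of how the expected FP:FN ratio responds to the significance threshold, statistical power, and the fraction of tested factors that are truly associated. The function name and every numeric input are illustrative assumptions rather than estimates from any study.

```python
def expected_fp_fn_ratio(alpha, power, prior):
    """Expected false positives divided by expected false negatives.

    alpha: per-test significance threshold (type 1 error rate)
    power: probability of detecting a true association at that threshold
    prior: fraction of tested factors that are truly associated
    """
    false_positives = alpha * (1.0 - prior)   # null factors wrongly declared positive
    false_negatives = (1.0 - power) * prior   # true associations that are missed
    return false_positives / false_negatives


# Hypothetical candidate-gene setting: lenient threshold, modest power.
candidate = expected_fp_fn_ratio(alpha=0.05, power=0.2, prior=0.01)

# Hypothetical GWAS setting: genome-wide threshold, more power from pooled samples.
gwas = expected_fp_fn_ratio(alpha=5e-8, power=0.5, prior=0.0001)

print(f"candidate-gene FP:FN = {candidate:.3g}")
print(f"GWAS FP:FN           = {gwas:.3g}")
```

Under these assumed inputs the candidate-gene ratio is well above 1 (about 6) and the GWAS ratio is far below 1 (about 0.001). The stringent threshold does most of the work, and the ratio stays low even though half of the true associations are still missed.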
The recent success of GWAS is evident in the numerous new findings that have achieved consensus across multiple studies,3 especially for diabetes, common forms of cancer, and cardiovascular traits. These discoveries have not only highlighted some known biologic pathways but, more impressively, have pointed research in new directions, identifying previously unsuspected genes for various conditions. There is great potential for translational success in this field, with novel findings from population-based studies driving new basic science research and ultimately new therapeutic research.4 A further benefit of the low FP:FN ratio in the genomics era is the decrease in false-positive results and in the consequent mixed messages in media, regulations, and policy. A low FP:FN ratio implies that most positive findings are true associations, and thus that the precious resources and time allotted to following up those findings are used efficiently. Nonetheless, it might be premature to declare “mission accomplished,” as the many GWAS discoveries have not had ample time to bear the fruits of translational work.
The genetic epidemiology example, at first glance, is indeed a happy model of successful implementation of discovery and efficient follow-up. However, there are costs to this approach. First, for every erroneous conclusion, the odds are that it is wrongly negative rather than wrongly positive. Although this indeed conserves precious resources that would otherwise be spent on lengthy follow-up, it also means that many truths about the biology of a disease will remain undiscovered, and potentially important paths to translation will not be pursued. Specifically, in the GWAS setting, many scientists are concerned that findings in the suggestive statistical significance range (eg, 10⁻⁴ > P > 10⁻⁷), all presently considered “negative,” may harbor many false negatives that will remain undiscovered. If this ratio were adhered to strictly, genetic epidemiology research would quickly hit a discovery ceiling, leaving many truths untouched. Genetic epidemiologists fully acknowledge this and are pushing alternatives to the strict GWAS criteria to allow elucidation of these false negatives. This will likely mean changing design approaches and statistical significance criteria, 2 of the factors highlighted by Ioannidis et al1 as reasons for the very low FP:FN ratio. Thus, these alternative strategies, which the field realizes are necessary, will not maintain the FP:FN ratio the authors recommend. Further, new strategies are moving beyond statistical criteria alone, and instead seek ways to incorporate other realms of evidence into decisions about follow-up of a genetic association, including known biologic plausibility, animal-modeling evidence, in vitro evidence, and so on. One example is a recent movement to focus on a selected set of SNPs from a GWAS representing a particular biologic pathway, and to incorporate other evidence into the interpretation of these results, embracing many of the tenets of candidate-gene approaches.
Indeed, the genetic epidemiology world may therefore be circling back to its starting model, albeit better informed about the perils of poor study design, lack of replication, and so on. Another drawback of holding to a stringent threshold for significance is the overwhelmingly large sample sizes required in GWAS. On the one hand, the development of large consortia has led to unprecedented collaborations; on the other, large consortium studies tend not to be conducive to more innovative or detailed analyses. This concern may become even more pressing as the field begins examining gene-by-environment interactions, which may require more sophisticated modeling.
Should other fields of epidemiology strive for a lower FP:FN ratio through design and statistical criteria? As previously mentioned, the movement to very low FP:FN ratios comes at a potentially nontrivial cost, one that the genetic epidemiology field has had to acknowledge and shift strategies to accommodate. More importantly, it is unclear whether the rest of epidemiology is indeed as biased toward high FP:FN ratios as implied by Ioannidis and colleagues.1 Epidemiology, working at its current FP:FN ratio, has been successful in identifying risk factors for important public health problems, such as smoking as a causal risk factor for lung cancer and high cholesterol as a risk factor for coronary heart disease. Further, as pointed out by Ioannidis et al, the optimal target for this ratio will depend on the philosophy and needs of particular fields. The optimal ratio is a function of several considerations, including urgency and severity of the problem, ease of follow-up, costs, and policy implications. If the severity or urgency is high, a field may not tolerate false negatives and would gladly accept more false positives to capture more true associations. Similarly, if follow-up measures to filter out false positives among all positive findings are affordable and timely, a larger FP:FN ratio may be desired. On the other hand, if the cost of follow-up to distinguish false positives from true positives is high, then the field may want a GWAS-like FP:FN ratio. In some fields of epidemiology, failure to identify a true risk factor, such as a toxicant or behavior that could be modified, could be devastating for public health. Each field would have to weigh the policy and messaging consequences of a wrongly negative report against the consequences of wrongly naming a factor as positively associated.
Even if a particular area of epidemiology is ready to shift toward this new paradigm, issues of feasibility are not trivial. In the genetic epidemiology example, the transition to a fractional ratio was a result of very stringent family-wise type 1 error control, large consortia analyses, a limited set of standardized methods to assess the markers, and agnostic screens of all possible markers. Despite the optimism of Ioannidis and colleagues about applying these methods to other areas of epidemiology, the approach may simply not be feasible to implement elsewhere. For example, were one interested in associations of dietary factors with colon cancer, it is unclear that an exhaustive agnostic set of dietary factors could be conceived and assessed uniformly across studies and populations. Thus, setting an informed “per-test” alpha to accommodate multiple testing would be difficult, as would achieving any stringent threshold without combining multiple studies. In this situation, the combination of studies to achieve a sufficiently large sample size may be plagued with intractable issues of measurement heterogeneity, measurement error, true etiologic heterogeneity across cultures and ethnicities, and differential context problems (such as different prevalences of measured and unmeasured confounders and effect modifiers).
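The difficulty of setting a “per-test” alpha can be illustrated with a short Python sketch. It assumes a Bonferroni correction, which is only one common way of controlling the family-wise error rate and is not named in the commentary under discussion. The function name and the candidate counts of dietary factors are hypothetical.

```python
def per_test_alpha(family_alpha, n_tests):
    """Bonferroni per-test threshold: keeps the family-wise type 1
    error rate at or below family_alpha across n_tests tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# With roughly one million independent markers, the count is well defined
# and the familiar genome-wide threshold follows directly.
gwas_threshold = per_test_alpha(0.05, 1_000_000)
print(f"GWAS-style per-test alpha: {gwas_threshold:.1e}")

# For dietary factors the number of tests is itself uncertain
# (the counts below are made up), so the threshold is too.
for n_factors in (50, 500, 5000):
    print(n_factors, f"{per_test_alpha(0.05, n_factors):.1e}")
```

The per-test threshold moves by two orders of magnitude across the assumed counts, which is the practical problem: without an agreed exhaustive list of factors, there is no agreed denominator.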
In summary, the GWAS model of a very low FP:FN ratio has served its initial purpose in the field of genetic epidemiology by identifying likely true biologic pathways with hope for efficient translation. However, the field has recognized the pitfalls of this paradigm, and is promoting alternative, complementary strategies that do not rely on the characteristics that have driven this movement to low FP:FN ratios, namely strict adherence to statistical evidence based on very stringent family-wise error rate control in a frequentist setting, and the large sample sizes needed in such a setting. Instead, these complementary approaches seek converging evidence from multiple lines of research. This lesson should give other fields pause as they consider a movement toward this paradigm. It may indeed be that no one strategy is ideal, and that if a movement toward the characteristics that drive low FP:FN ratios is pursued, it should be with full realization of the pitfalls as well as the promises. Further, the features of genetic studies that made this shift possible may not translate well into many areas of epidemiology, and such a movement may simply never be feasible.
ABOUT THE AUTHORS
DANIELE FALLIN is Associate Professor of Epidemiology at the Johns Hopkins University Bloomberg School of Public Health. She works in the area of genetic epidemiology of neuropsychiatric disorders, and is currently studying genetic and environmental risk factors for autism. LINDA KAO is also Associate Professor of Epidemiology at Johns Hopkins. She has worked in the area of genetic epidemiology of diabetes and kidney disease and has recently identified several genetic loci for renal function through genome-wide association studies.