The estimated effect of a marker allele from the initial study reporting the marker-allele association is often exaggerated relative to the estimated effect in follow-up studies (the “winner's curse” phenomenon). This is a particular concern for genome-wide association studies, where markers typically must pass very stringent significance thresholds to be selected for replication. A related problem is the overestimation of the predictive accuracy that occurs when the same data set is used to select a multilocus risk model from a wide range of possible models and then estimate the accuracy of the final model (“over-fitting”). Even in the absence of these quantitative biases, researchers can over-state the qualitative importance of their findings—for example, by focusing on relative risks in a context where sensitivity and specificity may be more appropriate measures. Epidemiologists need to be aware of these potential problems: as authors, to avoid or minimize them, and as readers, to detect them.
From the Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA.
Submitted 9 May 2008; accepted 30 May 2008.
Editors’ Note: Related articles appear on pages 652, 655, and 657.
Correspondence: Peter Kraft, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, 665 Huntington Avenue, Building 2 Room 207, Boston MA 02115. E-mail: firstname.lastname@example.org.
Considering the recent avalanche of results linking common genetic variation to common, complex diseases and traits, Ioannidis’ article1 is very timely. As he points out, genome-wide association studies have avoided some of the problems that plagued earlier studies of candidate markers or genes, including small sample size, inappropriate significance thresholds relative to the large number of markers tested and the low prior probability that any particular marker is truly associated with the studied trait, and a rather loose definition of replication.2–6 These problems have given genetic association studies a reputation for generating more than their fair share of false-positive results and caused a great deal of confusion in the scientific and lay press. Now that genome-wide association studies have discovered and replicated over 100 robust marker-trait associations,7 there is understandable interest in characterizing these associations,8 that is, in estimating how the trait varies across marker genotypes. Ioannidis reminds us that we need to use caution when interpreting the effect estimates coming out of initial discovery studies, due to the “winner's curse” phenomenon.9 (This may be less of an issue as we move beyond the first wave of genome-wide association studies, which largely attempted to replicate only a very small number of markers that had achieved a stringent significance level in the initial scan. As multiple groups begin to collaborate and pool data10,11 or as larger numbers of markers are genotyped in large replication samples,12,13 the statistical power to detect modest effects will increase greatly and hence the bias in effect estimates will decrease.) I would like to expand on several of his points as they pertain to genetic association studies.
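To make the winner's curse concrete, here is a minimal simulation sketch in Python. The true log odds ratio (0.10), its standard error (0.05), and the z-score threshold (4.0) are illustrative assumptions rather than values from any study cited here, and the function name is hypothetical. Each simulated estimate is unbiased, yet averaging only over the studies that pass the threshold gives an inflated value.

```python
import random


def simulate_winners_curse(true_log_or=0.10, se=0.05, z_threshold=4.0,
                           n_studies=20000, seed=1):
    """Compare the mean estimate over all studies with the mean over 'winners'.

    All numeric defaults are assumptions chosen for illustration only.
    """
    rng = random.Random(seed)
    # Each discovery study reports an unbiased, normally distributed estimate.
    estimates = [rng.gauss(true_log_or, se) for _ in range(n_studies)]
    # Only studies whose z-score passes the threshold are "selected for replication".
    winners = [b for b in estimates if b / se > z_threshold]
    return {
        "mean_all": sum(estimates) / len(estimates),
        "mean_winners": sum(winners) / len(winners),
        "n_winners": len(winners),
    }


result = simulate_winners_curse()
print(result)
```

Under these assumptions a study can pass the threshold only if its estimate exceeds 4.0 × 0.05 = 0.20, so every selected estimate overstates the assumed true value of 0.10. A smaller standard error (that is, more power) narrows the gap, which is the reason pooled data should reduce the bias.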
First, arguably the primary goal of genome-wide association studies is to identify loci that influence traits; estimating the effects of marker alleles at those loci is secondary. In fact, to increase power, many genetic studies have adopted designs that will yield upwardly biased estimates of a marker allele's effect in the general population, such as comparing cases who have a family history of disease with general population controls14 or comparing cases with “hypernormal” controls (eg, men with low PSA levels15). Sometimes unexpected discoveries (such as the links between variants in the gene-poor 8q24 region and several cancers,16–18 or variants near JAZF1 and prostate cancer and diabetes risk19,20) may lead to novel hypotheses about the basic biology underlying these diseases. This is true regardless of the size of the marker alleles’ effects. Indeed, because the markers discovered via genome-wide association studies are almost certainly surrogates for the causal variant, the effect of a marker allele may not be a good estimate of the effect(s) of the causal allele(s). Similarly, the effect of a causal allele does not necessarily predict the ultimate clinical utility of a treatment that (loosely speaking) targets a pathway implicated by the causal locus: the treatment effect could be much greater. Nonetheless, we should heed Ioannidis’ warning that claims that genetic associations “can be translated into a major benefit for treatment of diseases” have rarely been borne out.
Ioannidis briefly touches on inflated estimates of gene–environment interaction parameters, but it is worth mentioning that there is also a tendency to inflate the qualitative interpretation of gene–environment and gene–gene interactions. Setting aside the problems of low power and lack of replication that have plagued these interaction analyses, it is difficult, if not impossible, to draw conclusions about the underlying biology from the observation that disease risk (or mean trait level) does not follow an additive model on some scale.21,22 For example, departures from a multiplicative odds ratio model for disease risk across categories defined by the genotypes at 2 risk loci do not imply that the corresponding gene products physically interact, nor does physical interaction imply nonmultiplicative interactions.
Finally, I would like to discuss a topic that Ioannidis did not explicitly raise here (although he has elsewhere23), namely, the dangers of quantitative and qualitative over-interpretation in the context of risk prediction. The large public and private investments in genetic research over the last 25 years have been justified not only by the goal of a better understanding of basic biology, but also by the hope that multiple genetic markers might be used to identify high-risk individuals who could benefit from therapeutic interventions to prevent disease.24 With the recent replicated discoveries of multiple common alleles linked to disease risk, we are finally able to take the first steps toward that goal (earlier attempts were hampered by their reliance on markers that proved to be false positives25–27). For example, although individual risk alleles for cardiovascular disease or prostate cancer each convey only a small increase in risk, combining them can explain a much larger proportion of the variation in risk.19,28 However, it is frustratingly easy to present inflated estimates of the association of disease risk with a combination of risk alleles and other known risk factors. To name just 1 problem: researchers will often fit many different models mapping multilocus genotypes to risk, based on different assumptions about which loci to include, whether to simply count the number of risk alleles or allow effects to differ across loci, whether to model risk at each locus additively or dominantly, whether to consider gene–gene or gene–environment interactions, etc. They then present the model with the lowest deviance or training-sample prediction error, leading to overestimates of model performance in future data sets.29 This over-fitting can be ameliorated or avoided by reporting all of the models considered and the process used to select the final model, and by estimating test-sample error using an independent test data set or cross-validation.30
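The optimism that comes from choosing the best of many models on the training sample can be illustrated with a small simulation sketch. Everything here is an assumption made for illustration: genotypes and disease status are independent coin flips (so no marker has any true predictive value), each candidate “model” simply predicts disease for carriers of one marker, and the sample sizes and function names are hypothetical.

```python
import random


def accuracy(predictions, outcomes):
    """Fraction of individuals whose predicted status matches the observed one."""
    return sum(p == y for p, y in zip(predictions, outcomes)) / len(outcomes)


def simulate_selection_optimism(n_markers=500, n_train=100, n_test=2000, seed=7):
    """Select the best single-marker rule on training data, then re-test it."""
    rng = random.Random(seed)

    def draw(n):
        # Null scenario: disease status and carrier status are unrelated.
        outcomes = [rng.randint(0, 1) for _ in range(n)]
        genotypes = [[rng.randint(0, 1) for _ in range(n)]
                     for _ in range(n_markers)]
        return genotypes, outcomes

    train_g, train_y = draw(n_train)
    test_g, test_y = draw(n_test)

    # Candidate model j: predict disease if and only if the person carries marker j.
    train_acc = [accuracy(g, train_y) for g in train_g]
    best = max(range(n_markers), key=train_acc.__getitem__)

    # Report the selected model's training accuracy and its independent-test accuracy.
    return train_acc[best], accuracy(test_g[best], test_y)


selected_train_acc, selected_test_acc = simulate_selection_optimism()
print(f"training accuracy of selected model: {selected_train_acc:.3f}")
print(f"independent-test accuracy:           {selected_test_acc:.3f}")
```

Because the data contain no signal, any training accuracy above one half reflects selection alone, and the independent test sample returns an estimate near chance. Cross-validation that repeats the selection step within each fold would serve the same purpose when no separate test set is available.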
Moreover, even when the association between a set of alleles and disease is accurately estimated, there is a danger that the qualitative implications of the association will be exaggerated.31 If alleles interact multiplicatively on the odds ratio scale (and to date there is little evidence for nonmultiplicative interactions32,33), then very quickly one can report impressive-sounding odds ratios of 5, 10, or 20 comparing the very small extreme of the population carrying many risk alleles to the small extreme carrying very few. However, risk factors with such large effects on relative risk do not necessarily translate into good predictors that can distinguish, with high sensitivity and specificity, those who will go on to get disease from those who will not.34,35 Even predictors with good sensitivity and specificity can have poor positive predictive values if the disease is rare. This is important because many people will not request a genetic test “for entertainment purposes only,” nor are these tests typically marketed that way (outside of the fine print). Those classified as high-risk based on their genetic profile will want to take action to reduce their risk. If the positive predictive value is low, most people exposed to the risks of preventive intervention will not see any benefits (presuming an effective risk-reducing intervention even exists).
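Both points in this paragraph come down to simple arithmetic, sketched below. The inputs are assumptions chosen only for illustration (a per-allele odds ratio of 1.25, a 12-allele difference between the extremes, and a test with 90% sensitivity and 90% specificity for a disease with 1% prevalence), and the function names are hypothetical.

```python
def extreme_odds_ratio(per_allele_or, allele_difference):
    """Odds ratio between two groups under a multiplicative per-allele model."""
    return per_allele_or ** allele_difference


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive test."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Assumed inputs; none of these numbers come from the studies cited in the text.
extreme_or = extreme_odds_ratio(per_allele_or=1.25, allele_difference=12)
ppv = positive_predictive_value(sensitivity=0.90, specificity=0.90,
                                prevalence=0.01)

print(f"odds ratio between extremes: {extreme_or:.1f}")
print(f"positive predictive value:   {ppv:.3f}")
```

Under these assumptions the modest per-allele effect compounds to an odds ratio of roughly 14.6 between the extremes, while the seemingly accurate test has a positive predictive value of 1/12, or about 8%: roughly 11 of every 12 people who test positive would never develop the disease.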
Clearly, despite the possibility of developing genetic tests that are useful for some diseases in some contexts (eg, in the presence of family history of disease), companies that are marketing genetic tests (especially direct-to-consumer genetic tests) have a financial incentive to exaggerate the importance of tests that may be about as effective as the Magic 8-Ball I keep on my desk. As Ruth Hubbard warned in the early 1990s: “[T]he best candidates for mass marketing are predictive diagnostic tests that could be conducted on large numbers of healthy people. If an atmosphere can be generated in which none of us feels safe until we have assessed the likelihood that we or our children will develop sundry diseases and disabilities, we will be willing to support this new industry in the style to which it would like to become accustomed.”36 Academics are not immune from pressures to oversell their results to pad their citation indices and get grants. Minimizing these often-unconscious biases will require discipline on the part of authors and vigilance on the part of readers. The principles that Ioannidis outlines (Table 3 of his paper) can serve as a guide to both.
1. Ioannidis JPA. Why most discovered true associations are inflated. Epidemiology
2. Hirschhorn JN, Altshuler D. Once and again—issues surrounding replication in genetic association studies. J Clin Endocrinol Metab
3. Ioannidis JP. Why most published research findings are false. PLoS Med
4. Wacholder S, Chanock S, Garcia-Closas M, et al. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst
5. Mutsuddi M, Morris DW, Waggoner SG, et al. Analysis of high-resolution HapMap of DTNBP1 (Dysbindin) suggests no consistency between reported common variant associations and schizophrenia. Am J Hum Genet
6. Chanock SJ, Manolio T, Boehnke M, et al. Replicating genotype–phenotype associations. Nature
7. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest
8. Thomas D. Gene characterization studies: an overview. Monogr Natl Cancer Inst
9. Lohmueller KE, Pearce CL, Pike M, et al. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet
10. Loos RJ, Lindgren CM, Li S, et al. Common variants near MC4R are associated with fat mass, weight and risk of obesity. Nat Genet
11. Lettre G, Jackson AU, Gieger C, et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet
12. Zhong H, Prentice RL. Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies. Biostatistics. 2008 Feb 28. [Epub ahead of print].
13. Yu K, Chatterjee N, Wheeler W, et al. Flexible design for following up positive findings. Am J Hum Genet
14. Easton DF, Pooley KA, Dunning AM, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature
15. Eeles RA, Kote-Jarai Z, Giles GG, et al. Multiple newly identified loci associated with prostate cancer susceptibility. Nat Genet
16. Tomlinson I, Webb E, Carvajal-Carmona L, et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet
17. Witte JS. Multiple prostate cancer risk variants on 8q24. Nat Genet
18. Fletcher O, Johnson N, Gibson L, et al. Association of genetic variants at 8q24 with breast cancer risk. Cancer Epidemiol Biomarkers Prev
19. Thomas G, Jacobs KB, Yeager M, et al. Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet
20. Zeggini E, Scott LJ, Saxena R, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet
21. Thompson W. Effect modification and the limits of biological inference from epidemiologic data. J Clin Epidemiol
22. Greenland S, Rothman K. Concepts of interaction. In: Rothman K, Greenland S, eds. Modern Epidemiology. Philadelphia: Lippincott Williams & Wilkins; 1998.
23. Ioannidis JP. Microarrays and molecular research: noise discovery? Lancet
24. Collins FS. Shattuck lecture—medical and societal consequences of the Human Genome Project. N Engl J Med
25. Haga SB, Khoury MJ, Burke W. Genomic profiling to promote a healthy lifestyle: not ready for prime time. Nat Genet
26. Davey Smith G, Ebrahim S, Lewis S, et al. Genetic epidemiology and public health: hope, hype, and future prospects. Lancet
27. Janssens AC, Gwinn M, Bradley LA, et al. A critical appraisal of the scientific basis of commercial genomic profiles used to assess health risks and personalize health interventions. Am J Hum Genet
28. Kathiresan S, Melander O, Anevski D, et al. Polymorphisms associated with cholesterol and risk of cardiovascular events. N Engl J Med
29. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer; 2001.
30. Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics
31. Hunter DJ, Khoury MJ, Drazen JM. Letting the genome out of the bottle—will we get our wish? N Engl J Med
32. Hill WG, Goddard ME, Visscher PM. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet
33. Maller J, George S, Purcell S, et al. Common variation in three genes, including a noncoding variant in CFH, strongly influences risk of age-related macular degeneration. Nat Genet
34. Ware JH. The limitations of risk factors as prognostic tools. N Engl J Med
35. Wald NJ, Hackshaw AK, Frost CD. When can a risk factor be used as a worthwhile screening test? BMJ
36. Hubbard R, Wald E. Exploding the Gene Myth: How Genetic Information is Produced and Manipulated by Scientists, Physicians, Employers, Insurance Companies, Educators and Law Enforcers. Boston: Beacon Press; 1999.