If all risk factors were created equal, then the last 4 years would qualify as a golden age of observational epidemiologic risk-factor discovery. Since the publication of the first genome-wide association studies1 (GWAS), more than 950 papers have reported new associations between more than 1400 genetic variants and a wide variety of diseases and traits. All risk factors are not, of course, equal, and these GWAS-discovered variants are relatively “weak” risk factors (most with relative risks of 1.05 to 1.50 per allele). Moreover, these are not modifiable factors with direct potential to reduce disease incidence. The chief utility of these genetic variants is likely to be in improved understanding of disease mechanisms2 and (potentially) in identification of persons at higher or lower risk of specific diseases.3
The speed and robustness of these discoveries stand in sharp contrast to much of the previous 3 or 4 decades of observational epidemiologic research in noncommunicable diseases. Few “new” discoveries in chronic-disease epidemiology have been accepted as valid on first publication; indeed, Ioannidis has proposed that “most published research findings are false.”4 A process of claim and counter-claim, meta-analysis, and expert reports is usually required before acceptance of a new risk factor. As Ioannidis and colleagues5 point out, the ratio of false-positive to false-negative reports in genetic epidemiology has flipped since the advent of GWAS. What are the epidemiologic methods that have permitted this sudden bounty, and are there lessons for observational epidemiology of nongenetic risk factors?
Key Characteristics of the GWAS
Sample Size Rules
The biggest difference between GWAS and past approaches in epidemiology has been the consortium building that has been necessary to obtain the large sample sizes needed to drive P values to genome-wide significance (now customarily P < 5 × 10−8). We have known for decades that an association estimated from a large study is more likely to be a true positive than one estimated from a small study.6 Sample sizes of 10,000 cases or more are routine for multistage GWAS of diseases; the latest analyses of genetic variants and height (by the GIANT consortium) include data from more than 180,000 persons.7 There is a long tradition of meta-analysis in epidemiology, although most meta-analyses accumulate data from published studies, many of which are individually underpowered. The GWAS have, by and large, involved de novo analysis of multiple studies, without requiring them to have been published individually. In this way, the GWAS have proceeded much faster, without the back-and-forth findings that have preceded many meta-analyses.
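The genome-wide significance threshold itself is simple arithmetic. As a hedged sketch, assuming (as is conventional) roughly one million independent common-variant tests across the genome, a Bonferroni correction yields the customary threshold:

```python
# Bonferroni arithmetic behind the genome-wide significance threshold.
# The figure of ~1 million independent tests is a conventional
# assumption, not an exact count of variants tested in any one study.
family_wise_alpha = 0.05          # desired overall chance of any false positive
independent_tests = 1_000_000     # assumed independent common variants

per_test_alpha = family_wise_alpha / independent_tests
print(per_test_alpha)             # on the order of 5e-08, ie, P < 5 x 10^-8
```

The point of the large consortia is then visible: only very large case-control collections have power to push a true association of modest effect past so stringent a threshold.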
New Methods of Collaboration Are Key
Epidemiology has a long tradition of large-scale analyses and data-sharing of both primary raw data and preanalyzed study-specific data, but these have generally involved a single data-analysis center collating data from multiple studies—often with in-person meetings of investigators to review results. Examples include the Pooling Project of Prospective Studies of Diet and Cancer8 and the Collaborative Group on Hormonal Factors in Breast Cancer.9 Both of these have summarized evidence from more than 20 studies, with analyses usually taking several years. In contrast, GWAS collaborations often convene by teleconference and move to completion on a scale of months. Due to data-privacy and confidentiality concerns, primary data are not always shared; instead, the process involves the exchange of study-specific summary results that are combined by meta-analysis.
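The exchange-of-summary-statistics workflow described above can be sketched in a few lines. This is a hedged illustration with invented numbers, using a standard fixed-effect inverse-variance meta-analysis of per-study log odds ratios, not any particular consortium's actual pipeline:

```python
import math

# Each study shares only a per-SNP log odds ratio and its standard
# error; no individual-level data leave the study site.
# The numbers below are made up for illustration.
studies = [  # (log odds ratio, standard error) per study
    (0.10, 0.04),
    (0.08, 0.05),
    (0.12, 0.06),
]

# Fixed-effect inverse-variance weighting: precise studies count more.
weights = [1 / se ** 2 for _, se in studies]
pooled_beta = sum(w * b for (b, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
z = pooled_beta / pooled_se

print(f"pooled OR = {math.exp(pooled_beta):.3f}, Z = {z:.2f}")
```

Because only the (log OR, SE) pairs cross institutional boundaries, the privacy constraints mentioned above are respected while the pooled estimate gains the precision of the combined sample.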
Relaxation of Methodological Orthodoxy—the “Flight to Quantity”
To the chagrin of some epidemiologists, these large sample sizes are not reached by combining studies that all meet standard criteria for high-quality epidemiologic study design. Many studies included in multistage GWAS are essentially case series, with “public” controls, ie, a database of single nucleotide polymorphism (SNP) prevalences from a group such as blood donors, or cases or controls from a different study. The canonical example is the Wellcome Trust Case Control Consortium,10 in which 7 case series of various diseases were initially compared with controls from 2 sources: (1) blood donors and (2) the 1958 Birth Cohort—a sample of members of an older cohort who were recontacted in 2003 and consented to give a blood sample.11 The largest single collection of GWAS studies has been conducted in Iceland by deCODE genetics, Inc., with a design in which a case series of each disease was compared with “controls,” mostly case series of other diseases studied.12 It is probably fair to say that neither of these designs would have been reviewed with enthusiasm by a typical NIH study section of epidemiologists and statisticians.
There is a recurring tension in our discipline between “quality” and “quantity”—ie, between smaller studies that meet best-design principles and have high-quality exposure measurements, and larger studies that fall short of best-design principles and have poorer measurements of exposure. In some circumstances, a smaller study, with less misclassification of exposure, can be more powerful than a much larger study with a more misclassified exposure13 or with less optimal control selection.
However, in the GWAS era, there has been a flight to quantity, driven by the need for the large sample sizes described earlier in the text. By and large, the most successful consortia have maximized sample size by including all possible case series regardless of “quality” or the source of controls. Given that epidemiologic best practices are designed mainly to minimize information bias, selection bias, and confounding, are the GWAS immune to these sources of bias?
Sources of Bias and Error Applied to GWAS
Much attention in study design is directed at ensuring that an observed association is not due to differential quality of information between cases and controls. Here, technology has handed the GWAS a large advantage in exposure assessment. Completeness rates for genotyping genetic variants are usually >95%, and concordance rates comparing 2 methods of genotyping are often >99%. There have been some exceptions; for example, Clayton et al14 observed that genotyping of samples on DNA prepared and extracted in 2 different laboratories might lead to small differences in the genotype calling, potentially causing differential bias and false-positive findings. If the problem is recognized, however, statistical methods can be applied to attenuate this bias.
If the best defense against selection bias is to understand the population that gives rise to the cases and to obtain high participation rates, many of the studies used in GWAS are grossly inadequate. As described earlier in the text, GWAS control series are often not temporally or geographically matched to cases (eg, UK studies use “public” controls from the USA, and vice versa). Participation rates of controls in the underlying studies (on the rare occasions that these are even reported) are often low due to the additional requirements for both a blood sample and informed consent for data deposition in the NIH-administered database (a requirement for NIH-funded studies). This would seem to be a recipe for false positives due to selection bias; however, very few examples of false-positive findings seem to have occurred. One possible explanation may be that the departure from what is usually recommended as sound epidemiologic practice (amalgamation of several “control” series of varying provenance) may actually balance out the biases in a single control series. A more likely reason is that the sources of potential bias (eg, lack of geographic matching and refusal to participate) are simply not as strongly associated with genotype frequency as they often are with environmental exposures.
Before the GWAS era, the major threat to the validity of genetic studies using unrelated controls was usually said to be “population stratification” (ie, confounding by ethnicity, in which allele-frequency differences between cases and controls are due not to the allele being causal but to ancestry differences between cases and controls, such that any allele whose frequency differs sufficiently across ancestral groups appears as a case-control difference). This source of bias was sufficiently feared that, in 1994, experts advocated that parental or family-based controls “should be routinely used” to match closely on ethnic ancestry.15 Simulations showed that this problem was probably not as large or intractable as many geneticists feared, assuming that studies followed such basic epidemiologic principles as matching or controlling for self-reported race/ethnicity.16 The ultimate solution turned out to be in the GWAS themselves.17 Approaches such as principal components analysis of the vast number of gene variants measured in the GWAS permit identification of case-control differences in population substructure; these can then be controlled for by standard statistical analyses. However, it is commonly reported that controlling for population substructure makes little empirical difference, as long as basic steps have been taken to match cases and controls on self-reported race/ethnicity.
An early insight of GWAS analyses was the danger inherent in prioritizing any of the SNPs on the microarrays over any other SNPs. Many of us had assumed that the usual pecking order of probable functional variants in the genome (roughly, exons > introns > putative promoter regions > intergenic regions) would be applied to genome-wide analysis by weighting these regions according to the “priors” implied by this hierarchy. In fact, relatively few of the GWAS-identified variants have been in exons, and the largest single category is in intergenic regions—a category that would have been downweighted if the above priors had been attached.18
In parallel with this, more than 10 years of “candidate gene” studies yielded few reproducible disease-associated variants and a plethora of unreproduced associations.19 Similarly, large-scale and comprehensive analyses of candidate-gene “pathways,” such as the steroid hormone and insulin-like growth factor pathways for breast and prostate cancer, have shown poor yield.20,21 Opinions differ on why the yield is poor, and some still retain optimism for the approach. Perhaps we are simply still too ignorant about disease mechanisms to create strong priors for most diseases, and a virtuous cycle may emerge in which improved biologic knowledge, partly derived from GWAS findings, improves our choice of priors. Although we continue to have confidence in the importance of inherited contributions to most diseases, efforts to identify specific hypotheses for specific genes (in the absence of prior linkage or other genetic information) have been almost futile.
Most of us were trained to worship hypotheses, and indeed the cry “What is the hypothesis?” is still regularly heard at study sections. The GWAS have shown us that, under some circumstances, specific hypotheses only distract, and that “data mining” (the polite term for “data dredging”), combined with a ruthless agnosticism about prior hypotheses and insistence on replication, is the shortest path to the truth. Of course, once discovered, these gene variants become “priors,” and in subsequent studies they are often reproduced at levels of significance that may fall short of genome-wide significance (eg, 10−2 to 10−7) due to limited power.
Broad hypotheses may be critical in the design of studies—if the risk factor is not “hypothesized,” there is little incentive to invest the time in designing and obtaining measurements of the appropriate exposure. We would do well to recognize, however, that most contemporary observational studies do not test a single hypothesis, or even a small number of hypotheses (no matter what was said in the “specific aims” section of the grant proposal). Instead, studies attempt to gather a large amount of data on all evaluable exposures of proven or possible relevance to the disease under study. This is particularly true of cohort studies, which are usually designed to investigate all disease outcomes that can be reliably ascertained. Even if every exposure measured in a cohort study can be attached by the investigators to a specific disease hypothesis, there is nothing to stop the imaginative investigator from “hypothesizing” an association with any of the other diseases ascertained. The problem arises when we test these at a low level of statistical stringency (eg, P < 0.05), without acknowledging the large number of statistical tests either made or implicit in the existence of multiple exposures and multiple diseases. One solution often proposed—paying attention only to a limited number of prespecified “primary hypotheses”—is enormously wasteful of data. One lesson of GWAS is that we might be better off doing a large amount of statistical testing, but being more rigorous in acknowledging the cumulative numbers of comparisons being made. We could insist on more extreme levels of statistical significance before we declare a “positive,” while speedily aggregating sample size across studies to minimize false negatives. The debate over multiple comparisons has a long history, preceding even the articles in the earliest issues of this journal.22–24 Of course, insistence on any arbitrary significance threshold, be it 0.05 or 5 × 10−8, makes little epistemologic sense.
However, the lesson of GWAS is to remind us that at the very least, we should be skeptical of claims for any new association at a marginal level of statistical significance.
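The multiple-comparisons arithmetic above is easy to make concrete. As a hedged illustration with simulated data: if 1000 truly null exposure-disease pairs are each tested at P < 0.05, we expect about 50 “discoveries” by chance alone:

```python
import random

# Simulate many truly null tests. Under the null hypothesis, each
# P value is uniform on [0, 1], so the chance any one test falls
# below alpha is exactly alpha. Counts vary with the seed.
random.seed(1)
n_tests, alpha = 1000, 0.05

false_positives = sum(random.random() < alpha for _ in range(n_tests))
print(false_positives)  # expected value is n_tests * alpha = 50
```

This is the core of the argument for stringent thresholds: the per-test error rate must shrink as the number of tests, explicit or implicit, grows.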
In GWAS, what we gain by insisting on agnosticism and very small P values to maximize true positives, we lose by limiting study power and thus declaring false negatives. This has led to the criticism that the proportion of the heritability for most diseases explained by the true-positive variants discovered so far is small—a problem often referred to as “missing heritability.”25 However, recent analyses and simulations suggest that, ultimately, a much higher proportion will be explained when larger sample sizes increase power and more true positives are declared.26 Preliminary results suggest that discovery of rare variants in the neighborhood of common GWAS variants may substantially increase the amount of heritability explained.27
Small Effect Sizes Are Reproducible if Derived From Large Studies
Most of us were trained to be intensely suspicious of weak associations, even if derived from large studies or meta-analyses, because a small relative risk (RR) could be due to information or selection biases operating in the same direction across multiple studies. Weak in this context often meant RRs of 1.3 or so, and RRs of 1.1 or 1.05 seemed out of reach of conventional observational studies. Reproducible discoveries of this magnitude have now been made in GWAS, with low false-positive rates, raising the bar for accurate resolution of epidemiologic associations. Requiring rigorous replication and extreme statistical significance before making a claim of association would almost certainly reduce false positives in nongenetic epidemiology as well.
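A back-of-envelope power calculation shows why such weak associations demand GWAS-scale samples. This is a hedged sketch using a textbook allele-counting approximation for a 1:1 case-control design (the variance formula and inputs are illustrative assumptions, not any study's actual design):

```python
from math import log
from statistics import NormalDist

def cases_needed(odds_ratio, allele_freq, alpha=5e-8, power=0.80):
    """Approximate cases (with equal controls) to detect a per-allele
    odds ratio at a two-sided significance level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    beta = log(odds_ratio)
    # Allele-counting approximation: var(log OR) ~ 1 / (N * p * (1 - p))
    # for N cases and N controls with risk-allele frequency p.
    return (z_alpha + z_beta) ** 2 / (allele_freq * (1 - allele_freq) * beta ** 2)

n = cases_needed(1.1, 0.30)
print(round(n))  # tens of thousands of cases for a per-allele OR of 1.1
```

Under these assumptions, an OR of 1.1 at a 30% allele frequency requires on the order of 20,000 cases at genome-wide significance, which is consistent with the consortium sample sizes described earlier in the text.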
Applications to Epidemiology Beyond GWAS
It should be acknowledged that the GWAS are privileged with respect to some of the aforementioned sources of bias and error. Without strict calibration studies, laboratory differences and drift are often sources of substantial heterogeneity in studies of continuous biomarkers, and DNA has a robustness to storage methods and duration of storage that is not available in measurements made in plasma or serum. Selection bias is much more likely to influence environmental-exposure prevalences than SNP prevalences, which appear to be infrequently correlated with selection probability. A similarly absent or low correlation of SNPs with most environmental exposures means that conventional confounding of SNPs by lifestyle causes is uncommon (with the notable exception of SNPs that influence addictive behaviors such as smoking). As more studies of gene-environment interaction are performed, heterogeneity in environmental-exposure methods becomes more problematic. Thus, the lessons from GWAS cannot be applied blindly or without careful attention to the specific exposure-disease relationships under study. There will always be a place for the single study of the right association. The lesson of the GWAS, however, is that these studies are likely to be exceptional, and that we should be wary of claiming special insight from a single, weakly powered study—particularly if it is our own!
Epidemiology and “The Art of the Soluble”
The GWAS demonstrate that advances in epidemiology are often made when new methods of exposure measurement can be applied to testing exposure-disease associations, an application of what Sir Peter Medawar called “the art of the soluble.”28 The availability of information from the International HapMap (about the distribution of common genetic variants in the genome), together with the technical means to measure hundreds of thousands of these variants in a single DNA sample, is what has made the GWAS possible. Some epidemiologists saw this as a technology-driven bandwagon or passing fad, discouraging younger epidemiologists from getting involved. To be fair, the long-term contribution of GWAS to the understanding and prevention of diseases remains to be seen. By the measure of the number of new and reproducible associations between risk factors and disease, however, it is hard to argue with the success of GWAS. A major lesson from the GWAS is that epidemiologists need to be trained to understand the potential of new measurement technologies and to distinguish those that offer robust measurements that may advance the field from those that are insufficiently robust or are unlikely to measure causal exposures. Getting in on the ground floor of informative new technologies, genetic or otherwise, is to be recommended. The individual genome sequence is next, and its focus on rarer genetic variants will only reinforce the need for large-scale collaborative studies to address the exigencies of sample size and power.
1. Hindorff LA, Junkins HA, Hall PN, et al. A catalog of published genome-wide association studies. Available at: www.genome.gov/gwastudies. Accessed December 1, 2010.
2. Hunter DJ, Altshuler D, Rader DJ. From Darwin's finches to canaries in the coal mine—mining the genome for new biology. N Engl J Med. 2008;358:2796–2803.
3. Wacholder S, Hartge P, Prentice R, et al. Performance of common genetic variants in breast-cancer risk models. N Engl J Med. 2010;363:2272.
4. Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2:e124.
5. Ioannidis JP, Tarone R, McLaughlin JK. The false-positive to false-negative ratio in epidemiologic studies. Epidemiology. 2011;22:450–456.
6. Peto R, Pike MC, Armitage P, et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I: Introduction and design. Br J Cancer. 1976;34:585–612.
7. Lango Allen H, Estrada K, Lettre G, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–838.
8. Smith-Warner SA, Spiegelman D, Ritz J, et al. Methods for pooling results of epidemiologic studies: the Pooling Project of Prospective Studies of Diet and Cancer. Am J Epidemiol. 2006;163:1053–1064.
9. Collaborative Group on Hormonal Factors in Breast Cancer. Breast cancer and hormone replacement therapy: collaborative reanalysis of data from 51 epidemiological studies of 52,705 women with breast cancer and 108,411 women without breast cancer. Lancet. 1997;350:1047–1059.
10. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature. 2007;447:661–678.
11. Power C, Elliott J. Cohort profile: 1958 British birth cohort (National Child Development Study). Int J Epidemiol. 2006;35:34–41.
12. Gudmundsson J, Sulem P, Steinthorsdottir V, et al. Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet. 2007;39:977–983.
13. McKeown-Eyssen GE, Tibshirani R. Implications of measurement error in exposure for the sample sizes of case-control studies. Am J Epidemiol. 1994;139:415–421.
14. Clayton DG, Walker NM, Smyth DJ, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet. 2005;37:1243–1246.
15. Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265:2037–2048.
16. Wacholder S, Rothman N, Caporaso N. Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. J Natl Cancer Inst. 2000;92:1151–1158.
17. Price AL, Patterson NJ, Plenge RM, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–909.
18. Hindorff LA, Sethupathy P, Junkins HA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009;106:9362–9367.
19. Hirschhorn JN, Lohmueller K, Byrne E, et al. A comprehensive review of genetic association studies. Genet Med. 2002;4:45–61.
20. Beckmann L, Husing A, Setiawan VW, et al. Comprehensive analysis of hormone and genetic variation in 36 genes related to steroid hormone metabolism in pre- and postmenopausal women from the breast and prostate cancer cohort consortium (BPC3). J Clin Endocrinol Metab. 2011;96:E360–E367.
21. Canzian F, Cox DG, Setiawan VW, et al. Comprehensive analysis of common genetic variation in 61 genes related to steroid hormone and insulin-like growth factor-I metabolism and breast cancer risk in the NCI breast and prostate cancer cohort consortium. Hum Mol Genet. 2010;19:3873–3884.
22. Poole C. Multiple comparisons? No problem! Epidemiology. 1991;2:241–243.
23. Greenland S, Robins JM. Empirical-Bayes adjustments for multiple comparisons are sometimes useful. Epidemiology. 1991;2:244–251.
24. Rothman KJ. No adjustments are needed for multiple comparisons. Epidemiology. 1990;1:43–46.
25. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753.
26. Lee SH, Wray NR, Goddard ME, et al. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet. 2011;88:294–305.
27. Sanna S, Li B, Mulas A, et al. Fine mapping of five loci associated with low-density lipoprotein cholesterol detects variants that double the explained heritability. PLoS Genet. 2011;7:e1002198.
28. Medawar PB. The Art of the Soluble. London: Methuen; 1969.