From the Department of Medicine, Cardiovascular Health Research Unit, University of Washington, Seattle, WA; Department of Epidemiology, University of Washington, Seattle, WA; Department of Health Services, University of Washington, Seattle, WA; and Group Health Research Institute, Group Health Cooperative, Seattle, WA.
Editors’ note: This series addresses topics that affect epidemiologists across a range of specialties. Commentaries are first invited as talks at symposia organized by the Editors. This paper was originally presented at the 45th Annual Meeting of the Society for Epidemiologic Research (SER) in Minneapolis, MN, 2012.
Supported, in part, by grants HL078888, HL080295, HL085251, HL087652, HL103612, HL105756 from the National Heart, Lung, and Blood Institute.
B.M.P. serves on the Data Safety and Monitoring Board for a clinical trial of a device funded by Zoll LifeCor and on the Steering Committee for the Yale Open Data Access Project funded by Medtronic.
Correspondence: Bruce M. Psaty, Cardiovascular Health Research Unit, 1730 Minor Avenue, Suite 1360, Seattle, WA 98101. E-mail: firstname.lastname@example.org.
Genome-wide association studies (GWAS) involve an unbiased mapping effort to identify common genetic loci that typically have a minor allele frequency of greater than 5% and are associated with phenotypes of interest. The development of genotyping arrays has enabled relatively low-cost genetic studies in unrelated people. The primary goal is new insights about mechanisms rather than prediction or public health interventions. This agnostic approach to biology sometimes leads to new discoveries about biology. As of May 2012, the National Human Genome Research Institute website had identified 1261 GWAS publications reporting 6411 genetic loci associated with a variety of phenotypes.
In response to a request for applications that funded about a dozen projects, the Cardiovascular Health Study (CHS) received funding for GWAS genotyping and analysis. The CHS is a prospective cohort study of 5888 older adults recruited at four sites in 1989–1993 [1]. At baseline, participants underwent a detailed examination that included traditional risk factors, biomarkers, and measures of subclinical disease. The participants returned for additional visits and examinations, and they have been followed for the occurrence of cardiovascular events. Although the CHS has a rich set of cardiovascular and aging phenotypes, the focus of the CHS GWAS was on the incidence of myocardial infarction, stroke, and heart failure in the 4000 participants who were free of cardiovascular disease at baseline.
As the CHS genotype data were becoming available in 2007, GWAS were reporting associations with relative risks typically in the range of 1.2–1.3 [2]. For these effect sizes and with corrections for the multiple testing in GWAS, CHS by itself was clearly underpowered to achieve its main aims. CHS investigators were not the only ones funded to conduct GWAS, and CHS was not the only study to find itself underpowered to meet its primary aims. The search for improved power and replication, a side effect of the GWAS technology, brought a number of investigators together on conference calls. Between June 2007 and February 2008, the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium emerged as a voluntary federation among five cohort studies [3]: the Age, Gene/Environment Susceptibility-Reykjavik Study; the Atherosclerosis Risk in Communities Study; CHS; the Framingham Heart Study; and the Rotterdam Study [1,4–9]. The organizing principle of the consortium was the cohort study design rather than any specific phenotype. These prospective cohort studies have multiple cardiovascular or aging phenotypes in common, and although investigators from several cohorts had occasionally collaborated previously [10,11], there was no precedent for a consortium of cardiovascular epidemiology cohorts. By 2008, however, it became clear that a cohort-level collaboration would facilitate a series of prospectively planned GWAS analyses.
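The power argument at the start of this passage can be illustrated with a rough calculation. The sketch below is not the CHS investigators' own power analysis: it assumes a single additive variant, treats the relative risk as an odds ratio, uses a large-sample normal approximation, and adopts the conventional genome-wide threshold of 5e-8. The event fraction (15%) and minor allele frequency (0.25) are hypothetical values chosen only for illustration; the cohort size of 4000 and the relative risk of 1.2 come from the text.

```python
import math
from statistics import NormalDist


def approx_power(n, case_fraction, maf, relative_risk, alpha=5e-8):
    """Approximate power of a two-sided test for one additive SNP effect.

    Assumptions (illustrative, not from the article): the relative risk is
    treated as an odds ratio, genotypes follow Hardy-Weinberg proportions,
    and the log odds ratio has large-sample variance
    1 / (n * Var(G) * phi * (1 - phi)), where phi is the case fraction.
    """
    beta = math.log(relative_risk)
    genotype_var = 2.0 * maf * (1.0 - maf)          # Var(G) under HWE
    info = n * genotype_var * case_fraction * (1.0 - case_fraction)
    ncp = abs(beta) * math.sqrt(info)               # expected |z| statistic
    z_crit = -NormalDist().inv_cdf(alpha / 2.0)     # two-sided critical value
    # The contribution of the opposite tail is negligible and is ignored.
    return NormalDist().cdf(ncp - z_crit)


# Hypothetical inputs: 15% of participants with events, allele frequency 0.25.
single = approx_power(n=4000, case_fraction=0.15, maf=0.25, relative_risk=1.2)
pooled = approx_power(n=40000, case_fraction=0.15, maf=0.25, relative_risk=1.2)
print(f"single cohort of 4000:   power = {single:.4f}")
print(f"pooled sample of 40000:  power = {pooled:.4f}")
```

Under these assumed inputs, the approximate power for a single cohort of 4000 is below 1%, whereas a tenfold larger pooled sample exceeds 90%, which is the kind of gap that motivates combining cohorts.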
The organizational structure is simple: there is a Research Steering Committee, an Analysis Committee, a Genotyping Committee, and approximately 35 phenotype-specific Working Groups. The Research Steering Committee is responsible for establishing the other committees; for nominating Working Group members; and for developing general guidelines for collaboration, authorship, sharing of results, publication, and timely participation. The Analysis Committee develops guidelines that the Working Groups are encouraged to adopt or adapt, and the Committee solves analytic problems that arise as those plans are implemented. The Genotyping Committee coordinates consortium-wide genotyping. The Research Steering Committee and Analysis Committee recommendations are advisory, and the primary decision-making authority rests with the Working Groups.
As Hunter [12] has pointed out, the “flight” to large sample sizes has helped to create new methods of collaboration. The wide range of health-related phenotypes measured in these population-based cohort studies quickly spawned about 35 phenotype-specific Working Groups. The Working Groups harmonize phenotypes across the cohorts, decide whether and how to include nonmember studies with similar phenotypes, and agree on prespecified analysis plans with input from the Analysis Committee. The phenotype Working Groups develop plans for authorship, evaluate results, write manuscripts, and decide on follow-up genotyping or functional work. Many CHARGE Working Groups have already engaged investigators from nonmember studies as collaborators. Over the years, several other prospective cohort studies (including the Health, Aging, and Body Composition Study; the Multi-Ethnic Study of Atherosclerosis; the Coronary Artery Risk Development in Young Adults Study; and the Jackson Heart Study [13–16]) have become regular participants. Collaborating nonmember studies either agree to the overall CHARGE principles, or the CHARGE Working Group develops and negotiates a new CHARGE-compatible agreement with the nonmember studies or consortia.
In this setting, the CHARGE Analysis Committee emerged to provide consortium-wide recommendations about analysis methods, plans, and problems. Many investigators wanted to use the traditional two-stage approach of discovery and replication, but because all the cohorts had genotype data on all participants, there were no cost savings from ignoring some of the data from the “replication” cohorts. The Analysis Committee recommended meta-analysis of study-specific results from each participating cohort as the most powerful approach to identify genuine associations [17], not only avoiding the difficulties of individual-level data sharing on an international scale but also maintaining nearly equivalent power to a cohort-adjusted analysis of individual-level data [18]. The use of meta-analysis rather than a two-stage discovery-replication design proved to be a powerful social force in promoting collaboration within each Working Group: this approach avoided intercohort competition for claims of “discovery,” with the result that all cohorts participated more or less equally in joint meta-analyses. The Analysis Committee has solved a wide variety of analytic problems encountered by particular Working Groups and rapidly disseminated its recommendations and methods to the many Working Groups through both direct participation in the Working Groups and web-based resources that are available to all CHARGE investigators [19].
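To make the idea of combining study-specific results concrete, the following is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis for a single variant. The text does not specify the weighting scheme, so the choice of inverse-variance weights is an assumption, and the function name and the per-cohort estimates are hypothetical. Each cohort contributes only a regression coefficient and its standard error, so no individual-level data need to be exchanged.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(betas, ses):
    """Pool per-cohort effect estimates with inverse-variance weights.

    betas: per-cohort regression coefficients for one variant (e.g., log
           hazard ratios per copy of the coded allele).
    ses:   the corresponding standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equal in length")
    weights = [1.0 / (se * se) for se in ses]       # weight = 1 / variance
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Use the lower tail directly to keep precision for very small p-values.
    p = 2.0 * NormalDist().cdf(-abs(z))
    return pooled_beta, pooled_se, z, p


# Hypothetical summary statistics from five cohorts for one variant.
cohort_betas = [0.20, 0.15, 0.22, 0.18, 0.17]
cohort_ses = [0.07, 0.06, 0.08, 0.05, 0.09]
beta, se, z, p = fixed_effect_meta(cohort_betas, cohort_ses)
print(f"pooled beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

Because every cohort enters the pooled estimate on the same footing, no cohort is designated as the "discovery" or "replication" sample in this kind of analysis.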
The Working Group model emerged in early CHS, whereby several senior investigators invited junior investigators from CHS and non-CHS institutions to participate in epidemiologic research related to renal impairment and cardiovascular disease [20]. From the outset, CHARGE adopted both the CHS Working Group model and a commitment to young investigators. The primary organizing structure of each Working Group is the clinical-research phenotype area rather than any specific research study, department, school, or academic institution. The Working Group model has helped to establish multi-institutional, multistudy, international, and interdisciplinary teams populated by both senior scientists and eager well-trained young investigators. Some of these Working Groups have served as the setting for additional grant applications and even for K-award applications from young investigators.
Since the consortium's inception in 2008, CHARGE investigators have published more than 150 articles, of which about 85 are main or primary GWAS articles published or in press. Many CHARGE publications have appeared in high-impact journals, including 27 in Nature Genetics, 5 in Nature, 2 in JAMA, 2 in Lancet, and 1 in the New England Journal of Medicine. CHARGE is associated with many other minor contributions, commentaries, and methods work. Some commentators have expressed the concern that “big science” may concentrate resources in a few hands and limit the control and credit available to individual investigators, particularly junior investigators [21]. In accordance with the CHARGE principles (http://web.chargeconsortium.com), investigators at all sites have made special efforts to provide opportunities for students, fellows, and junior investigators to lead manuscripts. These young investigators have served as the first-first authors of a large proportion of CHARGE publications. Often, they have also served as mentors to the mentors, particularly in the use of new bioinformatics tools, recent network algorithms, and the visual display of complex data.
Recent directions include the merging of some CHARGE Working Groups with other consortia to publish larger joint meta-analyses [22], new genotyping efforts such as whole-exome sequencing and whole-genome sequencing in samples, and the use of genotyping arrays that assay rare variants inexpensively in multiple cohorts. The analysis of rare variants not only requires large sample sizes to ensure sufficient power to detect associations but also poses a number of new analytic and bioinformatics challenges [23]. Moreover, some investigators have carried their collaborations over to nongenetic areas and included replication in traditional epidemiologic studies that might previously have reported the results from a single study or site [24].
GWAS consortia such as CHARGE are the result of the confluence of several recent forces: not only advances in genotyping technology and computing power over the past two decades but also the decision by the US National Institutes of Health to invest resources in GWAS and sequencing studies. Single-study analyses of GWAS data could not have advanced the field: the efficient and effective use of grant funds required not just a large effort and commitment by many researchers worldwide but also the development of novel collaborative research structures. For all its success, CHARGE remains a fragile entity, dependent on continuing funding by grants. This complex framework for collaborative work could potentially serve as a future model or infrastructure for open science, although this approach would require organizational and infrastructural support.
1. Fried LP, Borhani NO, Enright P, et al. The Cardiovascular Health Study: design and rationale. Ann Epidemiol. 1991;1:263–276
2. Hindorff LA, Sethupathy P, Junkins HA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009;106:9362–9367
3. Psaty BM, O’Donnell CJ, Gudnason V, et al; CHARGE Consortium. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium: design of prospective meta-analyses of genome-wide association studies from 5 cohorts. Circ Cardiovasc Genet. 2009;2:73–80
4. Harris TB, Launer LJ, Eiriksdottir G, et al. Age, Gene/Environment Susceptibility-Reykjavik Study: multidisciplinary applied phenomics. Am J Epidemiol. 2007;165:1076–1087
5. The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. Am J Epidemiol. 1989;129:687–702
6. Dawber TR, Meadors GF, Moore FE Jr. Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health Nations Health. 1951;41:279–281
7. Feinleib M, Kannel WB, Garrison RJ, McNamara PM, Castelli WP. The Framingham Offspring Study. Design and preliminary data. Prev Med. 1975;4:518–525
8. Splansky GL, Corey D, Yang Q, et al. The Third Generation Cohort of the National Heart, Lung, and Blood Institute’s Framingham Heart Study: design, recruitment, and initial examination. Am J Epidemiol. 2007;165:1328–1335
9. Hofman A, Breteler MM, van Duijn CM, et al. The Rotterdam Study: objectives and design update. Eur J Epidemiol. 2007;22:819–829
10. Howard G, Manolio TA, Burke GL, Wolfson SK, O’Leary DH. Does the association of risk factors and atherosclerosis change with age? An analysis of the combined ARIC and CHS cohorts. The Atherosclerosis Risk in Communities (ARIC) and Cardiovascular Health Study (CHS) investigators. Stroke. 1997;28:1693–1701
11. Folsom AR, Cushman M, Tsai MY, et al. A prospective study of venous thromboembolism in relation to factor V Leiden and related factors. Blood. 2002;99:2720–2725
12. Hunter DJ. Lessons from genome-wide association studies for epidemiology. Epidemiology. 2012;23:363–367
13. Cesari M, Penninx BW, Newman AB, et al. Inflammatory markers and onset of cardiovascular events: results from the Health ABC study. Circulation. 2003;108:2317–2322
14. Friedman GD, Cutter GR, Donahue RP, et al. CARDIA: study design, recruitment, and some characteristics of the examined subjects. J Clin Epidemiol. 1988;41:1105–1116
15. Bild DE, Bluemke DA, Burke GL, et al. Multi-ethnic study of atherosclerosis: objectives and design. Am J Epidemiol. 2002;156:871–881
16. Taylor HA Jr, Wilson JG, Jones DW, et al. Toward resolution of cardiovascular health disparities in African Americans: design and methods of the Jackson Heart Study. Ethn Dis. 2005;15(suppl 6):S6–4–17
17. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet. 2006;38:209–213
18. Lin DY, Zeng D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet Epidemiol. 2010;34:60–66
19. Voorman A, Lumley T, McKnight B, Rice K. Behavior of QQ-plots and genomic control in studies of gene-environment interaction. PLoS One. 2011;6:e19416
20. Shlipak MG, Fried LF, Stehman-Breen C, Siscovick D, Newman AB. Chronic renal insufficiency and cardiovascular events in the elderly: findings from the Cardiovascular Health Study. Am J Geriatr Cardiol. 2004;13:81–90
21. Ness RB. “Big” science and the little guy. Epidemiology. 2007;18:9–12
22. Ehret GB, Munroe PB, Rice KM, et al; for The International Consortium for Blood Pressure Genome-Wide Association Studies. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature. 2011;478:103–109
23. Basu S, Pan W. Comparison of statistical tests for disease association with rare variants. Genet Epidemiol. 2011;35:606–619
24. Schnabel RB, Aspelund T, Li G, et al. Validation of an atrial fibrillation risk algorithm in whites and African Americans. Arch Intern Med. 2010;170:1909–1917