Primary Data Collection: What Should Well-Trained Epidemiology Doctoral Students Be Able To Do?

Buring, Julie E.

doi: 10.1097/EDE.0b013e318162a947
The Changing Face of Epidemiology

The increasing size and complexity of epidemiologic studies are leading more and more doctoral students in epidemiology to base their thesis work on existing data. While the analysis of existing data provides useful experience in complex analyses, it gives trainees little or no hands-on experience in the actual design and conduct of an epidemiologic study. As these students pursue their careers, most will eventually want to collect original data. I discuss what hands-on experience a well-trained doctoral-level epidemiology student should receive, and argue that we short-change our students if we do not provide them with the opportunity for primary data collection during their doctoral training.

From the Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.

Editors' note: This series addresses topics that affect epidemiologists across a range of specialties. Commentaries are first invited as talks at symposia organized by the Editors. This paper was originally presented at the 2007 Society for Epidemiologic Research Annual Meeting.

Submitted 24 September 2007; accepted 8 October 2007.

Editors' note: Related articles appear on pages 345, 350, and 353.

Correspondence: Julie E. Buring, Division of Preventive Medicine, Brigham and Women's Hospital, 900 Commonwealth Avenue East, Boston, Massachusetts 02215.

No one would disagree that the practice of epidemiology has experienced an incredible period of change over the last 25 years. There has been a shift in the content and focus of epidemiologic research towards large-scale, multisite studies that are more complex in their design, conduct, and analysis, and that include biomarker or genetic components. Moreover, these changes have occurred in a funding environment that is seriously constrained and likely to continue to be so.

These changes in our research environment have altered the nature of the student research experience, especially with regard to doctoral theses. Just 25 years ago, the typical doctoral student had responsibility (under careful supervision of mentors) for every aspect of the thesis from beginning to end: writing the grant; obtaining funding for the project; designing, conducting, analyzing, and interpreting the study; and preparing the manuscript and presenting the results. This range of learning activities was ideal from the standpoint of providing the doctoral student with the broadest range of practical experiences. But as funding became harder to obtain, it became infeasible for students to perform all these functions within a reasonable time period. It became more practical for students to base their dissertations on a recently funded project of their mentors, while still taking responsibility for design, conduct, analysis, and interpretation of the work.

Unfortunately, as funding has become ever more scarce and epidemiologic studies more complex, the scope of a doctoral thesis has continued to shrink. Now the most common type of thesis in many programs is an in-depth analysis of already-collected data. As a result, there are graduates of doctoral programs who are superbly trained in advanced data analysis but who lack a full understanding of the complexities of the parent study they are analyzing, let alone an understanding of what it takes to collect data “from scratch.”

Consequences for the Education of Doctoral Students

Is this a problem? Some may argue that it is not. In this period of limited funding and expanding access to existing data, some epidemiologists may go their entire careers without ever launching and directing their own study. The skills of primary data collection may not seem relevant. The wealth of high-quality data available from existing datasets and ongoing large-scale studies can provide years of fruitful analysis of important hypotheses. Such secondary analyses using well-documented and carefully developed datasets are indeed a very cost-effective way of addressing epidemiologic questions. Furthermore, secondary analyses are more likely to be funded than new studies that require extensive primary data collection on large samples.

Even so, there are many of us—not just senior faculty, but also newly minted graduates in the field—who would argue passionately that this narrowing in the focus of epidemiology doctoral theses leaves our students unprepared to be independent investigators. Why do we believe this so strongly? First, even if a student's thesis involves only secondary analyses, the student cannot give the most valid interpretation of the results without knowing the strengths and limitations of the parent dataset, which in turn requires an understanding of the complexities of the primary data collection. This is all the more important for secondary data analyses, which typically include respondents' self-reported outcomes or conditions that were not the focus of the main study. To grasp the data's limitations, one must know how the sample was obtained, how the data were collected, how the information was validated, how the endpoints were defined, and how biologic specimens were handled. Knowing what can go wrong in the collection and assembly of data does not come from a book—it can come only from first-hand experience.

A second reason for students to have experience in data collection is that, sooner or later, they are likely to want to collect data of their own. For most epidemiologists, there comes a time when the available data are not enough. New data may be needed to evaluate mechanistic questions, or examine gene-environment interactions, or explore new hypotheses. New data may come by adding an ancillary study onto an existing study, or by initiating a new study. Regardless, the investigator will be required to make countless practical decisions that will affect the completeness and validity of the data. Investigators with no previous experience will face serious obstacles in obtaining the data they need.

The Women's Health Study provides a good illustration of how new data collection can be built into established studies. The Women's Health Study was a randomized trial that evaluated the role of low-dose aspirin and vitamin E in the prevention of cardiovascular disease and cancer. There were 39,876 healthy women over the age of 45 randomized to the various arms of the study.1 More than 28,000 of the women also provided a prerandomization blood sample. Although the active treatment phase of the trial has been completed, there is ongoing observational follow-up. Yearly questionnaires collect information on a large number of demographic, medical history, and risk factor variables. Average duration of follow-up is now approximately 13 years. DNA has been extracted for all participants who provided blood, and a large number of biomarkers have been analyzed. Medical records have been obtained for all reported cardiovascular and cancer endpoints, and the events confirmed by an endpoints committee.

The major findings of the clinical trial have been published.2–4 However, the Women's Health Study provides an opportunity to test additional hypotheses regarding the effect of the randomized treatment or other factors on the primary study endpoints or other outcomes. One ancillary study, for example, has evaluated the roles of vitamin E and aspirin in the prevention or delay of cognitive decline in healthy older women.5,6 This required supplementary efforts—the administration of 3 telephone cognitive assessments over 5 years to a subset of participants; an assessment of the sensitivity of this cognitive testing; assurance of adequate compliance with testing and follow-up; and the conduct of specialized statistical analyses. The Women's Health Study, with its high compliance and follow-up, offered a unique and highly cost-effective opportunity for this additional project. Still, the principal investigator of the ancillary study bore complete responsibility for writing and submitting the grant application and carrying out the study.

There have been more than 50 ancillary studies within the Women's Health Study, with outcomes as varied as migraine, Parkinson disease, venous thrombosis, hypertension, diabetes, cognitive decline, cataracts, age-related macular degeneration, rheumatoid arthritis, and bone fractures. These ancillary studies have tremendous advantages over secondary-data analyses. While both approaches are cost-effective and more fundable than assembling a brand-new study population, ancillary studies can be far more closely tailored to an investigator's research question. However, the investigator must have the skills needed to carry this out. Study investigators may supervise a staff member or research fellow who does the actual work, but even this requires that the investigator know the steps involved in doing the work correctly.

It is a rare epidemiologist whose entire career is limited to analyzing existing data. In my opinion, an inability to design and conduct a study, or to collect data, will limit a young epidemiologist's ability to compete for faculty positions and to earn promotion. Many would consider an epidemiologist who cannot do the basic fieldwork of epidemiology unqualified to be an independent investigator.

Do doctoral students agree with this assessment? As part of a review of methodology courses at the Harvard School of Public Health, the School recently sent an e-mail survey to all graduates of the doctoral and 2-year master's programs from the last 10 years. This survey solicited comments regarding all aspects of methodologic training in epidemiology, especially any gaps that pertained to their postgraduate positions. While the question of the importance of primary data collection was not explicitly asked, a surprising number of graduates spontaneously added comments about this. Specifically, graduates noted that training in the practical aspects of design, conduct, and management of epidemiologic studies would have been valuable, as most of them have already been, or expect to be, charged with various aspects of these responsibilities. They noted that learning how to run a study turns out to be much more useful over time than theory alone.

The question remains as to how primary data collection can be made a part of our doctoral students' education. Many schools are grappling with this question, and a number have reinforced their commitment to providing students with hands-on experience. How can this be done? Ideally, it would be part of the thesis work, but this is not always feasible. Experience with primary data collection can be obtained in other ways, perhaps as part of another study. The important thing is that it be required, and that it be valued for the indispensable experience it provides.

The bottom line is that the competent conduct of field studies is a basic tool of the trade. To be well-trained, doctoral students in epidemiology need exposure to the business of conducting a study, from IRBs to HIPAA, from preparation of budgets to responding to critiques of grant applications. Students need to know what it takes to be a part of an interdisciplinary team in which researchers from other disciplines turn to them for help with all epidemiologic aspects of the study. The more that such experiences can be made available during doctoral students' education, the better our training programs will serve our graduates and the field of epidemiology.

JULIE E. BURING is Professor of Medicine at Harvard Medical School and Brigham and Women's Hospital, and Professor of Epidemiology at the Harvard School of Public Health. Her research focuses on the epidemiology of cardiovascular disease and cancer, including large-scale clinical trials on prevention. She is engaged in teaching and training students and fellows in epidemiology, and is director of an NIH training grant in the epidemiology of aging.

1. Rexrode KM, Lee IM, Cook NR, et al. Baseline characteristics of participants in the Women's Health Study. J Womens Health Gend Based Med. 2000;9:19–27.
2. Ridker PM, Cook NR, Lee IM, et al. A randomized trial of low-dose aspirin in the primary prevention of cardiovascular disease in women. N Engl J Med. 2005;352:1293–1304.
3. Cook NR, Lee IM, Gaziano JM, et al. Low-dose aspirin in the primary prevention of cancer. The Women's Health Study: a randomized controlled trial. JAMA. 2005;294:47–55.
4. Lee IM, Cook NR, Gaziano JM, et al. Vitamin E in the primary prevention of cardiovascular disease and cancer. The Women's Health Study: a randomized controlled trial. JAMA. 2005;294:56–65.
5. Kang JH, Cook N, Manson J, et al. A randomized trial of vitamin E supplementation and cognitive function of women. Arch Intern Med. 2006;166:2462–2468.
6. Kang JH, Cook N, Manson J, et al. Low-dose aspirin and cognitive function in the Women's Health Study cognitive cohort. BMJ. 2007;334:987.
© 2008 Lippincott Williams & Wilkins, Inc.