Control Recruitment in Population-Based Case-Control Studies

Bernstein, Leslie

doi: 10.1097/01.ede.0000209440.94875.42

From the Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA.

Leslie Bernstein is a professor of preventive medicine at the Keck School of Medicine, University of Southern California, where she holds the AFLAC Inc. Endowed Chair in Cancer Research. She is director of the Women's Cancers Program at the Norris Comprehensive Cancer Center and Scientific Director for the Los Angeles County Cancer Surveillance Program, one of the SEER registries. Dr. Bernstein is best known for her research on physical activity and cancer.

Address correspondence to: Leslie Bernstein, Norris Comprehensive Cancer Center, University of Southern California, 1441 Eastlake Ave., Room 4449, Los Angeles, CA 90033. E-mail:

Hartge1 emphasizes the issues related to response in population-based epidemiologic studies. She notes that epidemiologists do not always document response rates in their publications and that, when available, these rates have declined, particularly among potential control subjects being recruited into case–control studies. In fact, a critical and challenging aspect of population-based case–control study designs is how to develop a scientifically sound and cost-effective method for identifying and enrolling control subjects.

Care must be taken to define the study's population base when developing the approach for control identification. Ideally, this base is the population that spawned the case subjects. Should any member of the control population be diagnosed with the disease under study, it must be guaranteed that the individual would be identified as a member of the case group. The defined population base will depend on the source of case subjects: medical practices or hospitals, health plans, or population-based registries. To minimize bias, it is key to ensure that the selection of potential control subjects does not depend on their exposure status. As Hartge1 points out in her essay, this is more easily said than done, because case–control studies will assess numerous exposures, and having minimal bias in one exposure does not ensure lack of bias in another. Ideally, our control population will represent the exposure distribution in the underlying population giving rise to the cases, thus minimizing selection bias for all exposures, and the recruitment of controls will be efficient in terms of time and cost.

Historically, sources of population-based controls for case–control studies have included hospitals and clinics, random-digit dialing (RDD) or telephone directories, electoral rolls and other “population” registers, neighborhood or area surveys, Medicare beneficiary files, and department of motor vehicle (DMV) lists of registered or licensed drivers. Patients from hospitals and clinics may not represent the case base population if referral patterns vary by disease or condition. Furthermore, when using hospital- or clinic-based control subjects, we must guard against the potential for unrecognized shared causes for the case-defining disease and the diseases or conditions of patients selected for recruitment as control subjects.

RDD has been a favored method for identifying population-based control subjects. However, problems with RDD have increased substantially with advancing technology, and response rates have declined, in part as a result of increasing use of cellular telephones, telephone answering machines and caller ID, and multiple telephone numbers for a given household. Two recent multicenter case–control studies used RDD for control identification and recruitment: the Women's Contraceptive and Reproductive Experiences Study,2 conducted from 1994 to 1999, and a study of non-Hodgkin lymphoma,3 based on the Surveillance, Epidemiology and End Results data, conducted from 1999 to 2002. The control participation rates were 64% and 44%, respectively. RDD response rates depend on the proportion of residential telephone numbers successfully screened and the cooperation rate among those selected from the roster of identified eligible individuals. Studies have shown that RDD tends to underenumerate women, older persons, and lower socioeconomic status groups.4,5 There is an additional problem with RDD for geographically defined base populations. In the past, area codes and the first few digits of the phone number were used to represent a geographic area, but this is no longer necessarily the case.
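The dependence of the RDD response rate on its two components can be illustrated with a short calculation. The sketch below assumes the common convention of taking the overall rate as the product of the household screening rate and the cooperation rate among selected eligible persons; the function name and the input values are hypothetical and are not drawn from any study cited here.

```python
def overall_response_rate(screening_rate: float, cooperation_rate: float) -> float:
    """Overall response rate, assumed here to be the product of the
    proportion of residential numbers successfully screened and the
    cooperation rate among eligible persons selected from the roster."""
    for name, value in (("screening_rate", screening_rate),
                        ("cooperation_rate", cooperation_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return screening_rate * cooperation_rate


# Illustrative values only (assumptions, not reported data).
rate = overall_response_rate(screening_rate=0.80, cooperation_rate=0.70)
print(f"Overall response rate: {rate:.0%}")
```

Under this convention, a shortfall at either stage lowers the overall rate, so a high cooperation rate among enumerated persons cannot compensate for households that are never screened.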

Telephone directories suffer from problems similar to those presented for RDD and, in addition, their use precludes inclusion of persons (cases or controls) who have unlisted telephone numbers. The availability of other types of directories (such as electoral rolls and DMV listings) varies by state and locale. Although such directories may provide a list of potential control subjects, they do not always cover the entire population and they may be out of date. These listings do not generally record race or ethnicity. DMV files may not completely cover the older population, and in some states, individuals can choose not to be included in listings.

In the past, lists of Medicare beneficiaries provided an excellent resource for recruiting older control subjects (≥65 years). However, the rules for accessing Medicare files have changed.6 Currently, access is indirect, with investigators having to indicate criteria that define individuals of interest (eg, age, race, gender, location) and then providing recruitment materials to a mailing house that sends materials to a selected group of beneficiaries. Only persons who respond to the mailing by contacting the investigator can be recruited into the study. Thus, the investigator has little control over the selection of the sampling frame and little ability to “sell” the research study to potential control subjects.

Area survey methods are another approach for control identification.5 In one comparison of RDD with area survey methods, the household-screening success rate was higher for area sampling than for RDD, and households successfully screened by RDD were less likely to enumerate residents eligible for consideration as controls.5

We have used a neighborhood survey method for control identification for individually matched case–control studies. To implement this approach, we use an algorithm that identifies a residence with a specific geographic relationship to the home of the patient. This is the starting point in a walk through the neighborhood to identify the first person who matches a case patient on several criteria (generally, birth year within a fixed number of years, race, and sex).7,8 We follow a specific preplanned walk pattern that excludes the several blocks surrounding the home of the case patient and then spirals out through the neighborhood. Each household is contacted in sequence and a census is determined so that the first person who lives in the sequence of homes contacted and who fulfills the matching criteria is recruited. If that individual refuses, the next eligible person in the walk pattern is recruited. This is labor-intensive and requires “clean up” efforts to obtain censuses in residences where no one is present at the time of the neighborhood survey. When access is denied to security-locked areas with multiple housing units, more extensive efforts are needed (contact by mail and phone). We continue to use this method of control identification but acknowledge that it is difficult to define an accurate response rate.
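The selection rule in this neighborhood walk can be sketched in a few lines. The example below is a simplified illustration, not the field protocol itself: it assumes the households are already listed in walk order (the spiral pattern and the excluded blocks are not modeled), that residents within a household are considered in the order enumerated, and that a 5-year birth-year window is used. The names `Person`, `matches`, `first_matched_control`, and the `agrees` flag are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    birth_year: int
    race: str
    sex: str
    agrees: bool = True  # hypothetical flag: consents when approached


def matches(case: Person, candidate: Person, max_year_gap: int) -> bool:
    """True if the candidate meets the matching criteria for the case."""
    return (abs(case.birth_year - candidate.birth_year) <= max_year_gap
            and candidate.race == case.race
            and candidate.sex == case.sex)


def first_matched_control(case: Person,
                          households: List[List[Person]],
                          max_year_gap: int = 5) -> Optional[Person]:
    """Walk the households in order and return the first eligible
    resident who agrees; eligible residents who refuse are passed over."""
    for residents in households:
        for person in residents:
            if matches(case, person, max_year_gap) and person.agrees:
                return person
    return None  # no match found along this walk


# Illustrative walk (made-up data): one ineligible resident, one eligible
# refusal, then the recruited control.
case = Person(birth_year=1950, race="A", sex="F")
walk = [
    [Person(1980, "A", "F")],
    [Person(1952, "A", "F", agrees=False)],
    [Person(1948, "A", "F"), Person(1949, "A", "M")],
]
control = first_matched_control(case, walk)
print(control)
```

The sketch also shows why a response rate is hard to define for this design: the number of eligible persons who would have been encountered in unscreened households is unknown.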

One alternative approach is the use of commercially available, comprehensive population directories or databases that provide information on household residents' demographic characteristics. It is not clear how complete and up-to-date these are relative to other sources, and their use is only recently being explored by epidemiologists. These resources generally are compiled from information available through utility company files, telephone directories, postal service change of address records, public records such as tax records, and survey data. These resources are frequently purchased for use by telemarketers, and access costs vary.

Once potential controls are identified, it is necessary to contact and recruit them into the study. The success of recruitment depends in large part on the willingness of the potential controls to participate in medical research studies, which is a function of their attitudes toward research, prior experiences as participants, demographics, and the disease under study.9 Rogers et al10 have documented that the effort required (eg, number of contacts) to recruit controls into their studies increased substantially between 1991 and 2003.

Some respondents will agree to participate immediately, whereas others delay their decision. What is important is whether exposure misclassification is greater among participants who are difficult to recruit. Several studies have shown differences in exposure prevalence of late versus early responders (summarized by Stang and Jöckel11). In a simulation study, Stang and Jöckel11 have shown that studies with low response rates are less biased than those that maximize recruitment by increasing the participation rates of late responders if nondifferential misclassification increases with time to recruitment into the study. Voigt et al12 studied this pattern in a large breast cancer case–control study in which a validation substudy of prescription drug use was conducted among participants enrolled in a health maintenance organization. Control subjects who were late responders (participating 2 or more months after first contact) were more likely to have been exposed to antihypertensive medications than were early responders. However, little exposure misclassification occurred among controls. This study highlights the importance of pursuing the more difficult-to-recruit study subjects.
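The mechanism behind this concern, that nondifferential exposure misclassification pulls the odds ratio toward the null, can be shown with an expected-value calculation. The sketch below does not reproduce the simulation of Stang and Jöckel; the exposure prevalences, sensitivity, and specificity are hypothetical values chosen only for illustration.

```python
def observed_prevalence(true_prev: float, sensitivity: float,
                        specificity: float) -> float:
    """Expected proportion classified as exposed, given imperfect
    exposure measurement applied to a group with true_prev exposed."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def odds_ratio(prev_cases: float, prev_controls: float) -> float:
    """Exposure odds ratio comparing cases with controls."""
    odds_cases = prev_cases / (1.0 - prev_cases)
    odds_controls = prev_controls / (1.0 - prev_controls)
    return odds_cases / odds_controls


# Hypothetical true exposure prevalences (assumptions, not study data).
p_cases, p_controls = 0.40, 0.25
true_or = odds_ratio(p_cases, p_controls)

# The same imperfect measurement in cases and controls (nondifferential).
se, sp = 0.70, 0.90
observed_or = odds_ratio(observed_prevalence(p_cases, se, sp),
                         observed_prevalence(p_controls, se, sp))

print(f"True OR: {true_or:.2f}; observed OR: {observed_or:.2f}")
```

If reporting accuracy were worse among late responders than among early responders, adding them to the study would move the pooled estimate further toward the null, which is the condition under which a higher response rate could yield a more biased result.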

A number of strategies can be used to improve the response rates among control participants. These include use of the media to advertise the study; having attractive, easy-to-read, yet informative, study brochures; and providing small incentives for participation (either money or cards to purchase merchandise).

The population-based case–control study will continue to be an important approach to defining associations between risk factors and disease outcomes. Thus, it is critical that we begin to explore innovative means for identifying potential control subjects and encouraging the participation of those we want to recruit.

1. Hartge P. Participation in population studies. Epidemiology. 2006;17:xx–xx.
2. Marchbanks PA, McDonald JA, Wilson HG, et al. Oral contraceptives and breast cancer risk: findings from the NICHD Women's Contraceptive and Reproductive Experiences Study. N Engl J Med. 2002;346:2025–2032.
3. Chatterjee N, Hartge P, Cerhan JR, et al. Risk of non-Hodgkin's lymphoma and family history of lymphatic, hematologic and other cancers. Cancer Epidemiol Biomarkers Prev. 2004;13:1415–1421.
4. Glaser SL, Clarke CA, Keegan TH, et al. Attenuation of social class and reproductive risk factor associations for Hodgkin lymphoma due to selection bias in controls. Cancer Causes Control. 2004;15:731–739.
5. Brogan DJ, Denniston MM, Liff JM, et al. Comparison of telephone sampling and area sampling: response rates and within-household coverage. Am J Epidemiol. 2001;153:1119–1127.
6. Research Data Assistance Center (ResDAC). What's new: CMS to review data requests for the Medicare Name & Address File, July 2004. Available at: Accessed October 29, 2005.
7. Bernstein L, Henderson BE, Hanisch R, et al. Physical exercise activity reduces the risk of breast cancer in young women. J Natl Cancer Inst. 1994;86:1403–1408.
8. Wu AH, Wan P, Bernstein L. A multiethnic population-based study of smoking, alcohol and body size and risks of adenocarcinomas of the stomach and esophagus (United States). Cancer Causes Control. 2001;12:721–732.
9. Trauth JM, Musa D, Siminoff L, et al. Public attitudes regarding willingness to participate in medical research studies. J Health Soc Policy. 2000;12:23–43.
10. Rogers A, Murtaugh MA, Edwards S, et al. Contacting controls: Are we working harder for similar response rates and does it make a difference? Am J Epidemiol. 2004;160:85–90.
11. Stang A, Jöckel KH. Studies with low response proportions may be less biased than studies with high response proportions. Am J Epidemiol. 2004;159:204–210.
12. Voigt LF, Boudreau DM, Weiss NS, et al. Re: Studies with low response proportions may be less biased than studies with high response proportions [Letter]. Am J Epidemiol. 2005;161:401–402.
© 2006 Lippincott Williams & Wilkins, Inc.