

Accounting for Selection Bias in Association Studies with Complex Survey Data

Wirth, Kathleen E.ᵃ; Tchetgen Tchetgen, Eric J.ᵃ,ᵇ

doi: 10.1097/EDE.0000000000000037

Obtaining representative information from hidden and hard-to-reach populations is fundamental to describing the epidemiology of many sexually transmitted diseases, including HIV. Unfortunately, simple random sampling is impractical in these settings, as no registry of names exists from which to sample the population at random. However, complex sampling designs can be used, as members of these populations tend to congregate at known locations, which can be enumerated and sampled at random. For example, female sex workers may be found at brothels and street corners, whereas injection drug users often come together at shooting galleries. Despite their logistical appeal, complex sampling schemes lead to unequal probabilities of selection, and failure to account for this differential selection can result in biased estimates of population averages and relative risks. Yet standard techniques to account for selection can lead to substantial losses in efficiency. Consequently, researchers adopt a variety of strategies in an effort to balance validity and efficiency. Some researchers fully or partially account for the survey design, whereas others do nothing and treat the sample as a realization of the population of interest. We use directed acyclic graphs to show how certain survey sampling designs, combined with subject-matter considerations unique to individual exposure-outcome associations, can induce selection bias. Finally, we present a novel yet simple maximum likelihood approach for analyzing complex survey data; this approach optimizes statistical efficiency at no cost to validity. We use simulated data to illustrate this method and compare it with other analytic techniques.
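The bias from unequal selection probabilities, and the standard weighting correction alluded to above, can be illustrated with a small simulation. This is a minimal sketch under hypothetical assumptions: two venue types with made-up sizes, prevalences, and selection probabilities, and independent (Poisson) sampling of individuals rather than a true multistage venue-based design. The weighted estimator shown is the standard Hájek inverse-probability-weighted mean, not the maximum likelihood approach proposed in the article, and all function and variable names are illustrative.

```python
import random

# Hypothetical design parameters chosen for illustration only (not from the article):
# (venue type, population size, prevalence, per-person selection probability)
GROUPS = [
    ("A", 2_000, 0.40, 0.50),   # few people, high prevalence, heavily sampled venues
    ("B", 18_000, 0.10, 0.05),  # many people, low prevalence, lightly sampled venues
]


def simulate_population(rng):
    """Return a list of (venue_type, infected, selection_prob) tuples."""
    population = []
    for venue, size, prevalence, p_select in GROUPS:
        for _ in range(size):
            population.append((venue, rng.random() < prevalence, p_select))
    return population


def draw_sample(population, rng):
    """Independent Bernoulli (Poisson) sampling with known, unequal inclusion probabilities."""
    return [unit for unit in population if rng.random() < unit[2]]


def naive_prevalence(sample):
    """Unweighted proportion: treats the sample as if it were a simple random sample."""
    return sum(infected for _, infected, _ in sample) / len(sample)


def weighted_prevalence(sample):
    """Hajek estimator: each sampled person is weighted by 1 / selection probability."""
    numerator = sum(infected / p for _, infected, p in sample)
    denominator = sum(1 / p for _, _, p in sample)
    return numerator / denominator


rng = random.Random(2014)
pop = simulate_population(rng)
true_prev = sum(infected for _, infected, _ in pop) / len(pop)
sample = draw_sample(pop, rng)
naive = naive_prevalence(sample)
weighted = weighted_prevalence(sample)
print(f"population {true_prev:.3f}  naive {naive:.3f}  weighted {weighted:.3f}")
```

Under these assumptions, the heavily sampled venues are also the high-prevalence ones, so the unweighted estimate is roughly double the population prevalence (about 0.26 versus 0.13 in expectation), while the weighted estimate tracks the population value. The cost is efficiency: the large weights given to people from lightly sampled venues inflate the variance of the weighted estimator, which is the validity-efficiency trade-off described above.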

Supplemental Digital Content is available in the text.

From the ᵃDepartment of Epidemiology, Harvard School of Public Health, Boston, MA; and ᵇDepartment of Biostatistics, Harvard School of Public Health, Boston, MA.

The authors report no conflicts of interest.

This project was supported by the National Institutes of General Medical Sciences (U54 GM088558), Environmental Health Sciences (R21 ES019712), Allergy and Infectious Diseases (R37 AI51164), and Heart, Lung, and Blood (R01 HL080644).

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article. This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.

Correspondence: Kathleen E. Wirth, Department of Epidemiology, Harvard School of Public Health, 677 Huntington Avenue, Suite 501, Boston, MA 02115. E-mail:

© 2014 by Lippincott Williams & Wilkins, Inc