Adjustment for Missing Data in Complex Surveys Using Doubly Robust Estimation: Application to Commercial Sexual Contact Among Indian Men

Wirth, Kathleen E. (a); Tchetgen Tchetgen, Eric J. (a,b); Murray, Megan (a,c,d)

doi: 10.1097/EDE.0b013e3181f57571
Methods: Original Article

Background: The Demographic and Health Survey program routinely collects nationally representative information on HIV-related risk behaviors in many countries, using face-to-face interviews and a complex sampling scheme. If respondents skip questions about behaviors perceived as socially undesirable, such interviews may introduce bias. We sought to implement a doubly robust estimator to correct for dependent missing data in this context.

Methods: We applied 3 methods of adjustment for nonresponse to self-reported commercial sexual contact data from the 2005–2006 India Demographic Health Survey to estimate the prevalence of sexual contact between sexually active men and female sex workers. These methods were inverse-probability weighted regression, outcome regression, and doubly robust estimation, a recently described approach that is more robust to model misspecification because it remains consistent if either the nonresponse model or the outcome model is correctly specified.
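To make the three adjustment strategies concrete, the following is a minimal Python sketch on simulated data. It is not the authors' SAS or MATLAB code. The function names (`fit_logistic`, `prevalence_estimates`), the simulated covariate, and all parameter values are assumptions chosen for illustration, and the sketch handles sampling weights only: it ignores clustering and stratification and does not attempt variance estimation.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, w, n_iter=25):
    """Weighted logistic regression by Newton-Raphson.

    X must already contain an intercept column; w are nonnegative weights.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def prevalence_estimates(X, y, r, sw):
    """Estimate a prevalence when the binary outcome y is missing for r == 0.

    X: (n, p) covariates including an intercept; y: outcome (NaN allowed
    where r == 0); r: 1 if the item was answered; sw: sampling weights.
    """
    obs = r == 1
    # Response model P(R = 1 | X), fitted on all sampled units.
    pi = expit(X @ fit_logistic(X, r.astype(float), sw))
    # Outcome model P(Y = 1 | X), fitted on respondents only.
    m = expit(X @ fit_logistic(X[obs], y[obs].astype(float), sw[obs]))
    y0 = np.where(obs, y, 0.0)  # placeholder value for nonrespondents
    total = sw.sum()
    return {
        "unadjusted": float(np.sum(sw[obs] * y[obs]) / sw[obs].sum()),
        # Inverse-probability weighting (normalized weights).
        "ipw": float(np.sum(sw * r * y0 / pi) / np.sum(sw * r / pi)),
        # Outcome regression: average the model predictions over everyone.
        "outcome_regression": float(np.sum(sw * m) / total),
        # Doubly robust: predictions plus weighted residuals of respondents.
        "dr": float(np.sum(sw * (m + r * (y0 - m) / pi)) / total),
    }


# Simulated example (all values hypothetical): nonresponse depends on a
# binary covariate that is also associated with the outcome.
rng = np.random.default_rng(0)
n = 50_000
x = rng.binomial(1, 0.2, size=n)                    # e.g. an "unmarried" flag
X = np.column_stack([np.ones(n), x])
y_full = rng.binomial(1, expit(-4.5 + 2.5 * x))     # outcome before missingness
r = rng.binomial(1, expit(2.0 - 3.0 * x))           # item response indicator
sw = rng.uniform(0.5, 1.5, size=n)                  # sampling weights
y = np.where(r == 1, y_full, np.nan)

truth = float(np.sum(sw * y_full) / sw.sum())
est = prevalence_estimates(X, y, r, sw)
print("full-data prevalence:", round(truth, 4))
for name, value in est.items():
    print(name, round(value, 4))
```

In this simulation the group with the higher outcome prevalence also responds less often, so the unadjusted estimate is pulled downward, while the three adjusted estimators recover something close to the full-data prevalence. With a single binary covariate both working models are saturated, so the three adjusted estimates nearly coincide; they would be expected to diverge when one of the models is misspecified.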

Results: Compared with an unadjusted prevalence of commercial sexual contact of 0.9% (95% confidence interval = 0.8%–1.0%), adjustment for nonresponse using doubly robust estimation yielded a prevalence of 1.1% (1.0%–1.2%). We found similar estimates with adjustment by outcome regression and inverse-probability weighting. Marital status was strongly associated with item nonresponse, and correction for nonresponse led to a nearly 80% increase in the prevalence of commercial sexual contact among unmarried men (from 6.9% to 12.1%–12.4%).
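As a quick check on the relative increase among unmarried men, using only the prevalences reported above (6.9% unadjusted; 12.1% and 12.4% as the lower and upper adjusted estimates):

```latex
\[
\frac{12.1 - 6.9}{6.9} \approx 0.75,
\qquad
\frac{12.4 - 6.9}{6.9} \approx 0.80
\]
```

The adjusted estimates therefore correspond to a relative increase of roughly 75% to 80%, depending on the adjustment method.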

Conclusions: Failure to correct for nonresponse produced a bias in self-reported commercial sexual contact. To facilitate the application of these methods (including the doubly robust estimator) to complex survey data settings, we provide analytical variance estimators and the corresponding SAS and MATLAB code. These variance estimators remain valid regardless of whether the modeling assumptions are correct.


From the Departments of (a) Epidemiology and (b) Biostatistics, Harvard School of Public Health, Boston, MA; (c) Division of Global Health Equity, Brigham and Women's Hospital, Boston, MA; (d) Infectious Disease Unit, Massachusetts General Hospital, Boston, MA.

Submitted 12 February 2010; accepted 30 June 2010.

Supported by the US National Institutes of Health (AI 007433).

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article.

Correspondence: Kathleen E. Wirth, Department of Epidemiology, Harvard School of Public Health, 641 Huntington Ave, Boston, MA 02115. E-mail:

© 2010 Lippincott Williams & Wilkins, Inc.