On the Analysis of Case–Control Studies in Cluster-correlated Data Settings

Haneuse, Sebastien; Rivera-Rodriguez, Claudia

doi: 10.1097/EDE.0000000000000763

In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case–control design is appealing in that it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference require acknowledging any clustering; to our knowledge, however, no statistical methods have been published for the analysis of case–control data for which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, case–control sampling within clinics, rather than across all clinics, has been suggested as a more practical strategy. To our knowledge, although similar outcome-dependent sampling schemes have been described in the literature, a case–control design specific to correlated data settings is new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case–control studies in cluster-correlated settings based on inverse probability–weighted generalized estimating equations. Inference is based on a robust sandwich estimator, with correlation parameters estimated so as to account appropriately for the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data on a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and the potential trade-offs between standard case–control sampling and case–control sampling performed within clusters.
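To make the weighting idea concrete, the following is a minimal sketch in Python and is not the authors' implementation. It simulates a clustered population, draws cases and controls within each cluster, weights each sampled person by the inverse of their sampling fraction, and solves weighted logistic estimating equations with a cluster-level sandwich variance. Several things here are assumptions made for illustration: the function names `simulate_and_sample` and `ipw_logistic_gee`, all parameter values, the random-intercept data-generating model, and the use of an independence working correlation (the article describes estimating correlation parameters, which this sketch does not attempt). The variance also treats the weights as known constants.

```python
import numpy as np

def simulate_and_sample(rng, n_clusters=40, cluster_size=500, n_per_arm=20,
                        beta=(-2.0, 0.5), sigma=0.5):
    """Hypothetical population with cluster random intercepts, followed by
    case-control sampling within each cluster (illustrative values only)."""
    cluster, x_s, y_s, w_s = [], [], [], []
    for k in range(n_clusters):
        b = rng.normal(0.0, sigma)                      # cluster-level effect
        x = rng.binomial(1, 0.4, size=cluster_size)     # binary exposure
        p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x + b)))
        y = rng.binomial(1, p)
        for outcome in (1, 0):                          # cases, then controls
            idx = np.flatnonzero(y == outcome)
            if idx.size == 0:
                continue
            take = min(n_per_arm, idx.size)
            chosen = rng.choice(idx, size=take, replace=False)
            cluster.append(np.full(take, k))
            x_s.append(x[chosen])
            y_s.append(y[chosen])
            # Inverse probability weight: stratum size / number sampled.
            w_s.append(np.full(take, idx.size / take))
    x_all = np.concatenate(x_s).astype(float)
    X = np.column_stack([np.ones_like(x_all), x_all])
    return (np.concatenate(cluster), X,
            np.concatenate(y_s).astype(float), np.concatenate(w_s))

def ipw_logistic_gee(X, y, w, cluster, n_iter=50, tol=1e-10):
    """Weighted logistic estimating equations under an independence working
    correlation, with a sandwich variance that sums scores within clusters."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):                             # Newton-Raphson
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - mu))
        bread = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
        step = np.linalg.solve(bread, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    bread = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
    U = X * (w * (y - mu))[:, None]                     # per-person scores
    _, inv = np.unique(cluster, return_inverse=True)
    Uc = np.zeros((inv.max() + 1, X.shape[1]))
    np.add.at(Uc, inv, U)                               # per-cluster scores
    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ (Uc.T @ Uc) @ bread_inv           # sandwich estimator
    return beta, cov
```

Calling `simulate_and_sample(np.random.default_rng(0))` and passing the result to `ipw_logistic_gee` returns a coefficient vector and its estimated covariance matrix; the square roots of the diagonal give standard errors. Because the weights within each cluster sum to the cluster size, the weighted sample stands in for the full population, which is what allows the intercept to be estimated despite the outcome-dependent sampling.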

From the Harvard T.H. Chan School of Public Health, Boston, MA.

Editor’s Note: A Commentary on this article appears on p.76.

Submitted July 20, 2016; accepted September 27, 2017.

Supported, in part, by Harvard University Center for AIDS Research Feasibility Project grant P03 A106054 and National Institutes of Health grant 5DP1 ES025459. All code for the simulations is available from the first author on request.

The authors report no conflicts of interest.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (

Correspondence: Sebastien Haneuse, Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Ave, Building II, Boston, MA 02115. E-mail:

Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved.