

On the analysis of case–control studies in cluster-correlated data settings

Haneuse, Sebastien; Rivera, Claudia
doi: 10.1097/EDE.0000000000000763
Original Article

In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy-makers consider evaluating individual-level outcomes, such as treatment adherence or mortality, the well-known case–control design is appealing because it provides efficiency gains over random sampling. In the context that motivates this paper, valid estimation and inference require accounting for clustering; to our knowledge, however, no statistical methods have been published for the analysis of case–control data drawn from a clustered population. Furthermore, in an ongoing collaboration in Malawi, case–control sampling within clinics, rather than across all clinics, has been suggested as a more practical strategy. While similar outcome-dependent sampling schemes have been described in the literature, a case–control design tailored to correlated data settings is, to our knowledge, new. In this paper, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case–control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator, with correlation parameters estimated in a way that appropriately accounts for the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data from a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and potential trade-offs between standard case–control sampling and case–control sampling performed within clusters.
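To make the estimation strategy more concrete, the following is a minimal Python/NumPy sketch of inverse probability-weighted estimation under within-cluster case–control sampling. It is not the authors' code (which is available from the first author on request). It rests on several assumptions: a hypothetical simulated population (the helper names `simulate_population`, `within_cluster_case_control`, and `ipw_gee_logistic` are illustrative only), a fixed number of cases and controls drawn per cluster (a balanced scheme), weights equal to the inverse of the known stratum-specific sampling fraction, and an independence working correlation. The last point is a deliberate simplification. The article estimates correlation parameters, whereas here within-cluster dependence enters only through a cluster-robust sandwich variance that treats the weights as known.

```python
import numpy as np


def expit(eta):
    # Logistic (inverse-logit) function.
    return 1.0 / (1.0 + np.exp(-eta))


def simulate_population(rng, n_clusters=50, cluster_size=400, sd_b=0.5):
    # Hypothetical clustered population (not the Malawi data): a clinic-level
    # random intercept b_k, one individual-level covariate x, rare binary y.
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    b = rng.normal(0.0, sd_b, n_clusters)[cluster]
    x = rng.normal(size=cluster.size)
    y = rng.binomial(1, expit(-3.0 + 0.5 * x + b))
    X = np.column_stack([np.ones_like(x), x])  # intercept + covariate
    return X, y, cluster


def within_cluster_case_control(y, cluster, n_cases, n_controls, rng):
    # Within each cluster, draw up to n_cases cases and n_controls controls
    # without replacement. Each sampled unit gets weight 1 / pi, where pi is
    # the sampling fraction in its (cluster, case status) stratum.
    idx, wts = [], []
    for k in np.unique(cluster):
        for status, target in ((1, n_cases), (0, n_controls)):
            pool = np.flatnonzero((cluster == k) & (y == status))
            m = min(target, pool.size)
            if m == 0:
                continue  # e.g. a cluster with no cases
            idx.append(rng.choice(pool, size=m, replace=False))
            wts.append(np.full(m, pool.size / m))
    return np.concatenate(idx), np.concatenate(wts)


def ipw_gee_logistic(X, y, w, cluster, max_iter=100, tol=1e-10):
    # Solve sum_i w_i x_i (y_i - mu_i) = 0 by Newton-Raphson, i.e. a weighted
    # logistic GEE with an independence working correlation (a simplification).
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = expit(X @ beta)
        score = X.T @ (w * (y - mu))
        info = X.T @ (X * (w * mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Cluster-robust sandwich: bread^{-1} (sum_k U_k U_k^T) bread^{-1},
    # with U_k the weighted score contribution of cluster k; weights are
    # treated as known rather than estimated.
    mu = expit(X @ beta)
    bread_inv = np.linalg.inv(X.T @ (X * (w * mu * (1.0 - mu))[:, None]))
    meat = np.zeros((X.shape[1], X.shape[1]))
    for k in np.unique(cluster):
        rows = cluster == k
        u_k = X[rows].T @ (w[rows] * (y[rows] - mu[rows]))
        meat += np.outer(u_k, u_k)
    return beta, bread_inv @ meat @ bread_inv.T


# Usage: sample 10 cases and 10 controls per cluster, then fit with weights.
rng = np.random.default_rng(2017)
X, y, cluster = simulate_population(rng)
sel, w = within_cluster_case_control(y, cluster, n_cases=10, n_controls=10, rng=rng)
beta_ipw, cov_ipw = ipw_gee_logistic(X[sel], y[sel], w, cluster[sel])
print("IPW estimates:", beta_ipw, "robust SEs:", np.sqrt(np.diag(cov_ipw)))
```

Refitting the same sample with unit weights illustrates the role of the weights: without them, the intercept reflects the roughly 1:1 case-to-control ratio in the sample rather than the outcome prevalence in the population. Because sampling is stratified by cluster and case status, a variance that ignores this stratification, as this sketch does, will tend to be conservative.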

This project was funded, in part, by Harvard University Center for AIDS Research Feasibility Project grant P03 A106054 and National Institutes of Health grant 5DP1 ES025459. All code for the simulations is available from the first author upon request.

The authors declare that they have no conflicts of interest.

Editor’s Note: A Commentary on this article appears on p. xxx.

Corresponding author: Sebastien Haneuse, PhD, Department of Biostatistics, 655 Huntington Ave, Building II, Boston, MA 02115, Phone: 617.432.3980, Fax: 617.432.5619, Email:

Copyright © 2017 Wolters Kluwer Health, Inc. All rights reserved.