Extending the case-control design to longitudinal data: stratified sampling based on repeated binary outcomes

Schildcrout, Jonathan S.; Schisterman, Enrique F.; Mercaldo, Nathaniel D.; Rathouz, Paul J.; Heagerty, Patrick J.
doi: 10.1097/EDE.0000000000000764
Original Article: PDF Only

We detail study design options that generalize case–control sampling when longitudinal outcome data are already collected as part of a primary cohort study, but new exposure data must be retrospectively processed for a secondary analysis. Furthermore, we assume that cost will limit the size of the subsample that can be evaluated. We describe a novel class of stratified outcome–dependent sampling designs for longitudinal binary response data where distinct strata are created for subjects who never, sometimes, and always experienced the event of interest during longitudinal follow-up. Individual designs within this class are differentiated by the stratum-specific sampling probabilities. We show that, for parameters associated with time-varying exposures, subjects who experience the event/outcome at some but not all of the follow-up times (i.e., those who exhibit response variation) are highly informative. If the time-varying exposure varies exclusively within individuals (i.e., the intraclass correlation coefficient is 0), then sampling all subjects with response variability can yield highly precise parameter estimates even when compared to an analysis of the original cohort. The flexibility of the designs and analysis procedures also permits estimation of parameters that correspond to time-fixed covariates, and we show that with an imputation–based estimation procedure, baseline covariate associations can be estimated with very high precision irrespective of the design. We demonstrate features of the designs and analysis procedures via a plasmode simulation using data from the Lung Health Study.
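The stratification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code (which is in the online appendix). The function names, the cohort layout (subject id mapped to a list of 0/1 outcomes), and the sampling probabilities are all hypothetical, and independent Bernoulli sampling within each stratum is an assumption made for illustration.

```python
import random


def stratum(outcomes):
    """Classify a subject by their repeated binary outcomes (0/1 per visit)."""
    events = sum(outcomes)
    if events == 0:
        return "never"
    if events == len(outcomes):
        return "always"
    return "sometimes"  # response variation: event at some but not all visits


def sample_by_stratum(cohort, probs, seed=0):
    """Select subjects independently with their stratum's sampling probability.

    cohort: dict mapping subject id -> list of 0/1 outcomes (hypothetical layout)
    probs:  dict mapping stratum name -> sampling probability in [0, 1]
    Returns a dict mapping each sampled id -> (stratum, sampling probability),
    so that a later analysis can account for the outcome-dependent design.
    """
    rng = random.Random(seed)
    sampled = {}
    for subject_id, outcomes in cohort.items():
        s = stratum(outcomes)
        if rng.random() < probs[s]:
            sampled[subject_id] = (s, probs[s])
    return sampled


# Illustrative cohort and design: keep everyone with response variation
# and a fraction of the "never" and "always" strata.
cohort = {
    "a": [0, 0, 0, 0],
    "b": [0, 1, 0, 1],
    "c": [1, 1, 1, 1],
    "d": [1, 0, 0, 0],
}
design = {"never": 0.2, "sometimes": 1.0, "always": 0.2}
subsample = sample_by_stratum(cohort, design, seed=1)
```

Different designs in the class correspond to different choices of `design`. Setting the "sometimes" probability to 1 mirrors the abstract's case of sampling all subjects with response variability. Because selection depends on the outcome, any analysis of `subsample` has to correct for the sampling scheme, and the recorded probabilities are kept for that purpose.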

This is an open access article distributed under the Creative Commons Attribution License 4.0 (CCBY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Acknowledgments: The authors wish to thank the supported effort of the faculty and staff members of the Johns Hopkins University Bayview Genetics Research Facility, NHLBI grant HL066583 (Garcia/Barnes, PI) and NHGRI grant HG004738 (Barnes/Hansel, PI). The Lung Health Study was supported by U.S. Government contract No. N01-HR-46002 from the Division of Lung Diseases of the National Heart, Lung and Blood Institute. Data were downloaded from the NCBI database of genotypes and phenotypes (accession number phs000335.v2.p2).

Sources of funding: This project was partially funded by NIH grants R01 HL094786 and R01 HL072966 from the National Heart, Lung and Blood Institute, the Long-Range Research Initiative of the American Chemistry Council, and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health.

Conflict of interest statement: The authors have no conflicts of interest. PJR is a Charter Member of a Data Safety Monitoring Board for Sunovian Pharmaceuticals, Inc., in Fort Lee, New Jersey. Sunovian is a pharmaceutical and drug development company.

Data and code availability: Data for the analyses conducted here can be downloaded from the database of genotypes and phenotypes (dbGaP). The code for conducting the analyses is available in the online electronic appendix: http://links.lww.com/EDE/B286 and http://links.lww.com/EDE/B287.

Editor’s Note: A Commentary on this article appears on p. xxx.

Corresponding author address: Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 11000, Nashville, Tennessee 37203

Copyright © 2017 Wolters Kluwer Health, Inc. All rights reserved.