Outcome-related, auxiliary variable sampling designs for longitudinal binary data

Schildcrout, Jonathan S.; Schisterman, Enrique F.; Aldrich, Melinda C.; Rathouz, Paul J.
doi: 10.1097/EDE.0000000000000765
Original Article: PDF Only


Epidemiologists have long used case–control and related study designs to enhance variability of response and information available to estimate exposure–disease associations. Less has been done for longitudinal data.


We discuss an epidemiological study design and analysis approach for longitudinal binary response data. We seek to gain statistical efficiency by over-sampling relatively informative subjects for inclusion in the sample. In this methodological demonstration, we develop this concept by sampling repeatedly from an existing cohort study to estimate the relationship of chronic obstructive pulmonary disease to past-year smoking in a panel of baseline smokers. To account for over-sampling, we describe a sequential offsetted regressions approach for valid inferences in this setting.
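The role of an offset under over-sampling can be illustrated with a deliberately simplified, cross-sectional sketch. This is not the authors' sequential offsetted regressions procedure, which handles longitudinal data and sampling on an auxiliary variable; it only shows the general idea that known sampling probabilities can enter a logistic model as an offset. Here sampling depends directly on the outcome, and all function names, sampling probabilities (`pi1`, `pi0`), and coefficient values are assumptions made for illustration.

```python
import math
import random


def simulate_cohort(n, beta0, beta1, rng):
    """Simulate (x, y) pairs with logit P(y=1 | x) = beta0 + beta1 * x."""
    cohort = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        y = 1 if rng.random() < p else 0
        cohort.append((x, y))
    return cohort


def outcome_dependent_sample(cohort, pi1, pi0, rng):
    """Keep subjects with y=1 with probability pi1 and y=0 with probability pi0."""
    return [(x, y) for x, y in cohort
            if rng.random() < (pi1 if y == 1 else pi0)]


def fit_logistic(data, offset=0.0, iters=25):
    """Newton-Raphson fit of logit P(y=1 | x, sampled) = offset + b0 + b1 * x."""
    # Start so that the initial linear predictor is zero for every subject.
    b0, b1 = -offset, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(offset + b0 + b1 * x)))
            r, w = y - p, p * (1.0 - p)
            g0 += r            # score for the intercept
            g1 += r * x        # score for the slope
            h00 += w           # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical values: a rare outcome, with cases over-sampled 10 to 1.
rng = random.Random(0)
true_beta0, true_beta1 = -3.0, 1.0
pi1, pi0 = 1.0, 0.1

cohort = simulate_cohort(100_000, true_beta0, true_beta1, rng)
sample = outcome_dependent_sample(cohort, pi1, pi0, rng)

naive = fit_logistic(sample)                                  # ignores the design
corrected = fit_logistic(sample, offset=math.log(pi1 / pi0))  # offset = log sampling ratio

print("naive     (b0, b1):", naive)
print("corrected (b0, b1):", corrected)
```

In this simplified setting the naive intercept is shifted by the log ratio of the sampling probabilities, while the offset fit recovers the population intercept; the slope is unaffected either way. The longitudinal, auxiliary-variable designs discussed here do not reduce to this simple case, which is why a more elaborate correction is needed.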


Targeted sampling can lead to increased statistical efficiency when combined with sequential offsetted regressions. Efficiency gains are degraded with increased prevalence of the disease response variable, with decreased association between the sampling variable and the response, and with other design and analysis parameters, providing guidance to those wishing to use these types of designs in the future.


These designs hold promise for efficient use of resources in longitudinal cohort studies.

This is an open access article distributed under the Creative Commons Attribution License 4.0 (CCBY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Acknowledgments: The authors wish to thank the supported effort of the faculty and staff members of the Johns Hopkins University Bayview Genetics Research Facility, NHLBI grant HL066583 (Garcia/Barnes, PI) and NHGRI grant HG004738 (Barnes/Hansel, PI). The Lung Health Study was supported by U.S. Government contract No. N01-HR-46002 from the Division of Lung Diseases of the National Heart, Lung and Blood Institute. Data were downloaded from the NCBI database of genotypes and phenotypes (accession number phs000335.v2.p2).

Sources of funding: This project was partially funded by the NIH grants R01 HL094786 and R01 HL072966 from the National Heart Lung and Blood Institute, the Long-Range Research Initiative of the American Chemistry Council, and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health.

Conflict of interest statement: The authors have no conflicts of interest. PJR is a Charter Member of a Data Safety Monitoring Board for Sunovian Pharmaceuticals, Inc., in Fort Lee, New Jersey. Sunovian is a pharmaceutical and drug development company.

Data and code availability: Code for conducting sequential offsetted regressions analysis is available from the first author's website ( and LHS are available at the National Center for Biotechnology Information database of genotypes and phenotypes (

Editor’s Note: A Commentary on this article appears on p. xxx.

Corresponding author address: Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 11000, Nashville, Tennessee 37203

Copyright © 2017 Wolters Kluwer Health, Inc. All rights reserved.