Outcome-related, Auxiliary Variable Sampling Designs for Longitudinal Binary Data

Schildcrout, Jonathan S. (a); Schisterman, Enrique F. (b); Aldrich, Melinda C. (c); Rathouz, Paul J. (d)

doi: 10.1097/EDE.0000000000000765

Background: Epidemiologists have long used case–control and related study designs to enhance variability of response and information available to estimate exposure–disease associations. Less has been done for longitudinal data.

Methods: We discuss an epidemiological study design and analysis approach for longitudinal binary response data. We seek to gain statistical efficiency by oversampling relatively informative subjects for inclusion into the sample. In this methodological demonstration, we develop this concept by sampling repeatedly from an existing cohort study to estimate the relationship of chronic obstructive pulmonary disease to past-year smoking in a panel of baseline smokers. To account for oversampling, we describe a sequential offsetted regressions approach for valid inferences in this setting.
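The role of an offset under oversampling can be illustrated with a much simpler, cross-sectional analogue. The sketch below is not the authors' sequential offsetted regressions procedure: it assumes a single binary outcome, sampling that depends directly on that outcome with known probabilities, and a single binary exposure. All names and parameter values (`simulate_sampled_cells`, `fit_grouped_logistic`, `PI_CASE`, `PI_NONCASE`, the true coefficients) are hypothetical. It relies on the standard result that, under such sampling, logistic regression in the sample has its intercept shifted by the log ratio of the sampling probabilities, so supplying that log ratio as a fixed offset recovers the population intercept.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_sampled_cells(seed=1, n=200_000, b0=-3.0, b1=0.7,
                           pi_case=1.0, pi_noncase=0.1):
    """Simulate a cohort, then keep cases and non-cases with different
    (known) probabilities. Returns x -> [cases, non-cases] in the sample."""
    rng = random.Random(seed)
    cells = {0: [0, 0], 1: [0, 0]}
    for _ in range(n):
        x = 1 if rng.random() < 0.4 else 0
        y = 1 if rng.random() < expit(b0 + b1 * x) else 0
        if rng.random() < (pi_case if y else pi_noncase):
            cells[x][0 if y else 1] += 1
    return cells


def fit_grouped_logistic(cells, offset=0.0, n_iter=100, tol=1e-10):
    """Fit logit P(Y=1 | x) = b0 + b1*x + offset to grouped binary data
    by Newton-Raphson with step halving. The offset is held fixed."""

    def loglik(b0, b1):
        total = 0.0
        for x, (cases, noncases) in cells.items():
            eta = b0 + b1 * x + offset
            # Numerically stable log(1 + exp(eta)).
            log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
            total += cases * eta - (cases + noncases) * log1pexp
        return total

    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, (cases, noncases) in cells.items():
            m = cases + noncases
            mu = expit(b0 + b1 * x + offset)
            r = cases - m * mu          # score contribution
            w = m * mu * (1.0 - mu)     # information contribution
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system H d = g for the Newton direction.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the log-likelihood does not decrease.
        step, current = 1.0, loglik(b0, b1)
        while loglik(b0 + step * d0, b1 + step * d1) < current and step > 1e-8:
            step /= 2.0
        b0 += step * d0
        b1 += step * d1
        if max(abs(d0), abs(d1)) * step < tol:
            break
    return b0, b1


PI_CASE, PI_NONCASE = 1.0, 0.1  # assumed known sampling probabilities
cells = simulate_sampled_cells(pi_case=PI_CASE, pi_noncase=PI_NONCASE)
naive = fit_grouped_logistic(cells)
corrected = fit_grouped_logistic(cells, offset=math.log(PI_CASE / PI_NONCASE))
print("naive     (b0, b1):", naive)
print("corrected (b0, b1):", corrected)
```

In this toy setting the exposure coefficient is the same with or without the offset, and only the intercept changes. The design discussed in the article is more involved because sampling depends on an auxiliary variable related to a longitudinal response, which is what motivates a sequential approach.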

Results: Targeted sampling can lead to increased statistical efficiency when combined with sequential offsetted regressions. Efficiency gains are degraded with increased prevalence of the disease response variable, with decreased association between the sampling variable and the response, and with other design and analysis parameters, providing guidance to those wishing to use these types of designs in the future.
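The dependence of efficiency gains on prevalence can be seen in a simplified, cross-sectional calculation. The sketch below is an assumption-laden analogue, not a reproduction of the article's results: it uses a single binary exposure, the Woolf large-sample variance of the log odds ratio evaluated at expected cell counts, and hypothetical parameter values. It compares a simple random sample of size `n` with a sample of the same size that is balanced between cases and non-cases, and reports the variance ratio as the intercept `b0` (and hence prevalence) increases.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def prevalence(b0, b1, p_x):
    """Marginal P(Y=1) when logit P(Y=1 | x) = b0 + b1*x and P(x=1) = p_x."""
    return p_x * expit(b0 + b1) + (1.0 - p_x) * expit(b0)


def var_log_or(n, case_fraction, b0, b1, p_x):
    """Woolf variance of the log odds ratio (sum of reciprocal cell counts),
    using expected counts for a sample of size n in which a fraction
    case_fraction of subjects are cases."""
    prev = prevalence(b0, b1, p_x)
    px1_case = p_x * expit(b0 + b1) / prev                  # P(x=1 | Y=1)
    px1_non = p_x * (1.0 - expit(b0 + b1)) / (1.0 - prev)   # P(x=1 | Y=0)
    n_case = n * case_fraction
    n_non = n - n_case
    expected = [
        n_case * px1_case, n_case * (1.0 - px1_case),
        n_non * px1_non, n_non * (1.0 - px1_non),
    ]
    return sum(1.0 / c for c in expected)


def relative_efficiency(b0, b1=0.7, p_x=0.4, n=1000):
    """Variance under random sampling divided by variance under a
    balanced (half cases, half non-cases) sample of the same size."""
    prev = prevalence(b0, b1, p_x)
    v_random = var_log_or(n, prev, b0, b1, p_x)
    v_balanced = var_log_or(n, 0.5, b0, b1, p_x)
    return prev, v_random / v_balanced


results = [relative_efficiency(b0) for b0 in (-4.0, -3.0, -2.0, -1.0)]
for prev, ratio in results:
    print(f"prevalence={prev:.3f}  var(random)/var(balanced)={ratio:.2f}")
```

Under these assumptions the variance ratio shrinks toward 1 as the outcome becomes more common, which mirrors the qualitative statement above. The magnitudes should not be read as estimates for the longitudinal design studied in the article.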

Conclusions: These designs hold promise for efficient use of resources in longitudinal cohort studies.

Supplemental Digital Content is available in the text.

From the (a) Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN; (b) Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD; (c) Department of Thoracic Surgery and Division of Epidemiology, Vanderbilt University Medical Center, Nashville, TN; and (d) Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI.

Editor’s Note: A Commentary on this article appears on p.76.

Submitted July 22, 2016; accepted September 27, 2017.

This project was partially funded by NIH grant R01 HL094786 from the National Heart, Lung, and Blood Institute, NIH grant K07 CA172294 from the National Cancer Institute, the Long-Range Research Initiative of the American Chemistry Council, and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health.

Disclosure: The authors have no conflicts of interest. P.J.R. is a Charter Member of a Data Safety Monitoring Board for Sunovion Pharmaceuticals, Inc., in Fort Lee, New Jersey. Sunovion is a pharmaceutical and drug development company.

Data and code availability: Code for conducting sequential offsetted regressions analysis is available from the first author's website, and the LHS data are available from the National Center for Biotechnology Information database of Genotypes and Phenotypes (dbGaP).

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article.

Correspondence: Jonathan S. Schildcrout, Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 11000, Nashville, TN 37203.

This is an open access article distributed under the Creative Commons Attribution License 4.0 (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved.