March 2014 - Volume 25 - Issue 2
doi: 10.1097/EDE.0000000000000057

Reuse of Controls in Nested Case-Control Studies

Støer, Nathalie C.; Meyer, Haakon E.; Samuelsen, Sven Ove

Author Information

Department of Mathematics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway

Department for Chronic Diseases, Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway

Department of Mathematics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway

The serum analysis of vitamins was supported by a grant from the Norwegian Cancer Society and the Throne Holst Foundation. The Norwegian Cancer Society provided salary as a PhD grant for N.C.S.

To the Editor:

A nested case-control design with risk set sampling1,2 matches controls to cases on time and often on additional factors. This matching has been thought to make the controls unusable for other endpoints. However, recent methods3–5 enable the reuse of controls, thereby improving efficiency. We demonstrate this in an example assuming proportional hazards models for time-to-event and estimating hazard ratios (HRs).

We consider inverse probability–weighted Cox regression models weighted by 1.0 divided by the probability of ever being sampled as control. Subjects can be sampled at each event time they are at risk and meet the matching criteria, thus typically at several occasions. We estimate these probabilities using two methods. For the Kaplan-Meier method,3,4 note that the probability of ever being sampled is 1.0 minus the probability of never being sampled, and the latter probability is the product of probabilities of not being sampled at each possible event time. This leads to the formula

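$$p_j = 1 - \prod_{i} \left[ 1 - \frac{m_i}{n_i}\, I(\text{subject } j \text{ can be sampled as a control for the case at } t_i) \right],$$

which is similar to the Kaplan-Meier estimator for the sampling probabilities $p_j$. The $n_i$ and $m_i$ are numbers of possible and sampled controls for the case at time $t_i$, respectively. The $I(\cdot)$ is an indicator function that is either 0 or 1; it equals 1 when subject $j$ is at risk at $t_i$ and meets the matching criteria for that case. For the second method,5,6 referred to as generalized linear model weights, we consider indicators of ever being sampled as controls among all noncases. The sampling probabilities are estimated using logistic regression with entry time $v_j$, censoring time $c_j$, and matching variables $z_j$ as covariates,

$$\operatorname{logit}(p_j) = \beta_0 + \beta_1 v_j + \beta_2 c_j + \beta_3^{\top} z_j.$$

See the eAppendix for more details.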
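The Kaplan-Meier type calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Subject` and `CaseSet` records, the single `stratum` matching variable, and the toy numbers are all hypothetical, and ties between event times and censoring times are ignored.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    entry: float    # start of follow-up
    exit: float     # event or censoring time
    stratum: str    # hypothetical single matching variable (e.g. county)


@dataclass
class CaseSet:
    time: float       # event time of the case
    stratum: str      # matching stratum of the case
    n_eligible: int   # number of possible controls for this case
    m_sampled: int    # number of controls actually sampled


def eligible(subject: Subject, case_set: CaseSet) -> bool:
    """Indicator: subject is at risk at the event time and meets the matching criteria.

    Simplification: strict inequalities, so a subject whose follow-up ends
    exactly at the event time is not counted as a possible control.
    """
    at_risk = subject.entry < case_set.time < subject.exit
    return at_risk and subject.stratum == case_set.stratum


def km_inclusion_probability(subject: Subject, case_sets: list[CaseSet]) -> float:
    """Probability of ever being sampled = 1 - product of 'not sampled' probabilities."""
    prob_never_sampled = 1.0
    for cs in case_sets:
        if eligible(subject, cs):
            prob_never_sampled *= 1.0 - cs.m_sampled / cs.n_eligible
    return 1.0 - prob_never_sampled


# Toy example: one noncase who is eligible for the first two case sets only.
case_sets = [
    CaseSet(time=2.0, stratum="A", n_eligible=10, m_sampled=1),
    CaseSet(time=3.0, stratum="A", n_eligible=8, m_sampled=1),
    CaseSet(time=4.0, stratum="B", n_eligible=5, m_sampled=1),
]
noncase = Subject(entry=0.0, exit=5.0, stratum="A")

p_sampled = km_inclusion_probability(noncase, case_sets)
ipw_weight = 1.0 / p_sampled  # weight used if this subject was sampled as a control
print(f"P(ever sampled) = {p_sampled:.4f}, weight = {ipw_weight:.2f}")
```

In the toy example the subject escapes sampling with probability (1 - 1/10)(1 - 1/8), so the probability of ever being sampled is 1 - 0.9 × 0.875. In a weighted analysis along these lines, cases would typically enter with weight 1, which is an assumption of this sketch rather than something stated in the text.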

We applied inverse probability weighting in a study of serum 25-hydroxyvitamin D (s-25(OH)D) and prostate cancer7 to evaluate this method. The cohort consisted of participants in health surveys in Norway, comprising 116,493 men. Among those, 2,118 were diagnosed with prostate cancer during follow-up. For each incident case, one control was sampled from the case’s risk set, matched on age at serum sampling ±6 months, date of serum sampling ±2 months, and county of residence. Meyer et al7 focused on the association between s-25(OH)D and incidence of prostate cancer. Due to the increased practice of screening for this cancer, death from prostate cancer might be a better endpoint when considering the most serious cases. Among the incident cases, 367 men died from prostate cancer. Traditional analysis of nested case-control data can use only the controls for incident cases who also died from prostate cancer, whereas all sampled controls can be used with inverse probability–weighted analysis. Robust variance estimation, possibly slightly conservative,6 was chosen for the present analyses.
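To make the idea of an inverse probability–weighted Cox analysis concrete, the following is a minimal sketch for a single covariate, assuming no tied event times. The function name `weighted_cox_beta`, the toy data, and the Newton-Raphson solver are illustrative assumptions, not the software used in the study, and the sketch returns only the point estimate; the robust variance estimation mentioned above is not implemented here.

```python
import math


def weighted_cox_beta(times, events, x, weights, n_iter=50, tol=1e-10):
    """Maximize a weighted Cox partial likelihood for one covariate.

    Each subject j contributes weights[j] * exp(beta * x[j]) to every risk set
    it belongs to; each event's score contribution is multiplied by its own
    weight. Solved by Newton-Raphson starting from beta = 0.
    """
    beta = 0.0
    for _ in range(n_iter):
        score = 0.0
        info = 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # only events contribute terms
            s0 = s1 = s2 = 0.0
            for t_j, x_j, w_j in zip(times, x, weights):
                if t_j >= t_i:  # subject j is still at risk at t_i
                    r = w_j * math.exp(beta * x_j)
                    s0 += r
                    s1 += r * x_j
                    s2 += r * x_j * x_j
            mean_x = s1 / s0
            score += weights[i] * (x[i] - mean_x)
            info += weights[i] * (s2 / s0 - mean_x * mean_x)
        if info == 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy data: follow-up time, event indicator, binary exposure.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
events = [1, 1, 0, 1, 0, 1, 1, 0]
x = [1, 0, 1, 1, 0, 0, 1, 0]

# Cases get weight 1; sampled controls get 1 / P(ever sampled), here set to 4
# purely for illustration.
ipw = [1.0 if d else 4.0 for d in events]

beta_unweighted = weighted_cox_beta(times, events, x, [1.0] * len(times))
beta_ipw = weighted_cox_beta(times, events, x, ipw)
print(f"unweighted log HR = {beta_unweighted:.3f}, IPW log HR = {beta_ipw:.3f}")
```

Multiplying all weights by the same constant leaves the estimate unchanged, because the constant cancels in the weighted risk-set averages; only the relative weights of cases and sampled controls matter.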

The Table displays results from traditional analyses and inverse probability–weighted analyses with Kaplan-Meier and generalized linear model weights. For both endpoints, the hazard ratios from inverse probability weighting and traditional analyses were approximately equal. The inverse probability–weighted standard errors for the incidence endpoint were somewhat smaller than the standard error from the traditional analysis. In contrast, the inverse probability–weighted standard errors for the death endpoint were considerably smaller: because all sampled controls could be used, rather than only the controls matched to the 367 cases who died from prostate cancer, efficiency increased.

We also analyzed a physical activity variable available in the complete cohort (n = 116,493) to contrast cohort and nested case-control analyses. The endpoint was incidence of prostate cancer. Comparing moderate activity with sedentary, the HR from the cohort data was 1.07 (95% confidence interval = 0.95–1.22), compared with 1.09 (0.92–1.29) using traditional nested case-control analysis and 1.07 (0.90–1.26) and 1.01 (0.85–1.21) using generalized linear model and Kaplan-Meier weights, respectively. Hence, the traditional estimates are not necessarily closer to cohort estimates than inverse probability–weighted estimates (see eAppendix for full analysis).
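The comparison can be made on the log scale with a little arithmetic. The sketch below takes the four reported HRs and intervals, computes each estimate's distance from the cohort estimate, and backs out an approximate standard error of the log HR. It assumes that all four intervals are 95% Wald intervals symmetric on the log scale, and the results are limited by the two-decimal rounding of the published figures; the dictionary labels are chosen here for illustration.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval

# (HR, lower limit, upper limit) as reported for moderate vs sedentary activity.
reported = {
    "cohort": (1.07, 0.95, 1.22),
    "traditional NCC": (1.09, 0.92, 1.29),
    "IPW, GLM weights": (1.07, 0.90, 1.26),
    "IPW, KM weights": (1.01, 0.85, 1.21),
}


def se_log_hr(lower: float, upper: float, z: float = Z_95) -> float:
    """Approximate SE of log(HR) from a Wald interval: width on log scale / (2z)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


cohort_log_hr = math.log(reported["cohort"][0])

se = {}
log_diff = {}
for name, (hr, lower, upper) in reported.items():
    se[name] = se_log_hr(lower, upper)
    log_diff[name] = math.log(hr) - cohort_log_hr  # distance from cohort estimate
    print(f"{name:18s} log-HR diff vs cohort = {log_diff[name]:+.3f}, "
          f"SE(log HR) ~ {se[name]:.3f}")
```

On these rounded figures, the generalized linear model estimate coincides with the cohort estimate, the traditional estimate lies slightly above it, and the Kaplan-Meier estimate lies further below it, which is consistent with the statement that the traditional estimate is not necessarily the closest.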

Our experience suggests that Kaplan-Meier and generalized linear model weights have similar performance with comparable estimated HRs and variances. However, with extremely close matching, simulations indicate that Kaplan-Meier weights can yield biased estimates.8

We have demonstrated that inverse probability weighting can be a powerful alternative with sub-endpoints. Moreover, reuse of controls can be helpful in many multiple-outcome settings. The eAppendix gives an example of inverse probability weighting for specific metastasis groups.

Acknowledgments

We thank Tone Bjørge and the Janus serum bank for making this study possible.

Nathalie C. Støer
Department of Mathematics
Faculty of Mathematics
and Natural Sciences
University of Oslo
Oslo, Norway
Haakon E. Meyer
Department for Chronic Diseases
Division of Epidemiology
Norwegian Institute of Public Health
Oslo, Norway
Sven Ove Samuelsen
Department of Mathematics
Faculty of Mathematics
and Natural Sciences
University of Oslo
Oslo, Norway

References

1. Thomas DC. Addendum to “Methods of cohort analysis: appraisal by application to asbestos mining” by Liddell FDK, McDonald JC and Thomas DC. J Roy Stat Soc Ser A. 1977; 140:469–491

2. Langholz B, Goldstein L. Risk set sampling in epidemiologic cohort studies. Stat Sci. 1996; 11:35–53

3. Salim A, Hultman C, Sparén P, Reilly M. Combining data from 2 nested case-control studies of overlapping cohorts to improve efficiency. Biostatistics. 2009; 10:70–79

4. Samuelsen SO. A pseudolikelihood approach to analysis of nested case-control studies. Biometrika. 1997; 84:379–394

5. Saarela O, Kulathinal S, Arjas E, Läärä E. Nested case-control data utilized for multiple outcomes: a likelihood approach and alternatives. Stat Med. 2008; 27:5991–6008

6. Samuelsen SO, Ånestad H, Skrondal A. Stratified case-cohort analysis of general cohort sampling designs. Scand J Stat. 2007; 34:103–119

7. Meyer HE, Robsahm TE, Bjørge T, Brustad M, Blomhoff R. Vitamin D, season, and risk of prostate cancer: a nested case-control study within Norwegian health studies. Am J Clin Nutr. 2013; 97:147–154

8. Støer NC, Samuelsen SO. Inverse probability weighting in nested case-control studies with additional matching-a simulation study. Stat Med. 2013; 32:5328–5339

Supplemental Digital Content

Copyright © 2014 by Lippincott Williams & Wilkins
