# Reuse of Controls in Nested Case-Control Studies

Støer, Nathalie C.; Meyer, Haakon E.; Samuelsen, Sven Ove

doi: 10.1097/EDE.0000000000000057
Letters

Department of Mathematics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway, nathalcs@math.uio.no

Department for Chronic Diseases, Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway

Department of Mathematics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway

The serum analysis of vitamins was supported by a grant from the Norwegian Cancer Society and the Throne Holst Foundation. The Norwegian Cancer Society provided salary as a PhD grant for N.C.S.

## To the Editor:

A nested case-control design with risk set sampling1,2 matches controls to cases on time and often on additional factors. This matching has been thought to make the controls unusable for other endpoints. However, recent methods3–5 enable the reuse of controls, thereby improving efficiency. We demonstrate this in an example assuming proportional hazards models for time-to-event and estimating hazard ratios (HRs).

We consider inverse probability–weighted Cox regression models in which each sampled control is weighted by the inverse of its probability of ever being sampled as a control. Subjects can be sampled at each event time at which they are at risk and meet the matching criteria, thus typically on several occasions. We estimate these probabilities using two methods. For the Kaplan-Meier method,3,4 note that the probability of ever being sampled is 1.0 minus the probability of never being sampled, and the latter probability is the product of probabilities of not being sampled at each possible event time. This leads to the formula

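$$p_i = 1 - \prod_{j} \left\{ 1 - \frac{m_j}{n_j}\, I(\text{subject } i \text{ could be sampled at } t_j) \right\},$$

which is similar to the Kaplan-Meier estimator for the sampling probabilities $p_i$. The $n_j$ and $m_j$ are numbers of possible and sampled controls for the case at time $t_j$, respectively. The $I(\cdot)$ is an indicator function that is either 0 or 1. For the second method,5,6 referred to as generalized linear model weights, we consider indicators of ever being sampled as controls $O_i$ among all noncases. The sampling probabilities are estimated using logistic regression with entry time $l_i$, censoring time $r_i$, and matching variables $z_i$ as covariates,

$$\operatorname{logit} P(O_i = 1) = \beta_0 + \beta_1 l_i + \beta_2 r_i + \beta_3^{\top} z_i.$$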

See the eAppendix (http://links.lww.com/EDE/A762) for more details.
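As an illustration of the Kaplan-Meier type weights, the following minimal sketch computes, for each subject, one minus the product over event times of the probability of not being sampled, writing n_j and m_j for the numbers of possible and sampled controls at event time t_j. The record layout, the function names, the toy cohort, the single matching variable (age with a caliper), and the assumption that exactly `m` controls are drawn whenever enough are available are all hypothetical choices made for the example, not details taken from the study.

```python
from math import prod

def is_eligible(subject, case, caliper):
    """True if `subject` could have been drawn as a control for `case`."""
    t = case["exit"]                                    # event time of the case
    at_risk = subject["entry"] < t <= subject["exit"]   # under follow-up at t
    matched = abs(subject["age"] - case["age"]) <= caliper
    return subject["id"] != case["id"] and at_risk and matched

def km_sampling_probabilities(cohort, m=1, caliper=0.5):
    """Probability of ever being sampled as a control, keyed by subject id."""
    not_sampled = {s["id"]: [] for s in cohort}
    for case in (s for s in cohort if s["event"]):
        pool = [s for s in cohort if is_eligible(s, case, caliper)]
        if not pool:
            continue
        fraction = min(m, len(pool)) / len(pool)         # m_j / n_j
        for s in pool:
            not_sampled[s["id"]].append(1.0 - fraction)
    # 1 - product of the probabilities of not being sampled at each event time
    return {sid: 1.0 - prod(f) for sid, f in not_sampled.items()}

# Hypothetical toy cohort: A and D are cases, the others are noncases.
cohort = [
    {"id": "A", "entry": 0, "exit": 5,  "event": True,  "age": 60.0},
    {"id": "B", "entry": 0, "exit": 10, "event": False, "age": 60.2},
    {"id": "C", "entry": 0, "exit": 10, "event": False, "age": 60.4},
    {"id": "D", "entry": 0, "exit": 8,  "event": True,  "age": 60.1},
    {"id": "E", "entry": 0, "exit": 3,  "event": False, "age": 60.0},
]
probs = km_sampling_probabilities(cohort)
```

In this toy cohort, subject B is eligible at both event times (with 3 and then 2 possible controls), so its probability is 1 − (2/3)(1/2) = 2/3, whereas subject E leaves follow-up before any event and has probability 0. In a weighted analysis, sampled controls would receive weight 1 divided by this probability, and cases would receive weight 1.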

We applied inverse probability weighting in a study of serum 25-hydroxyvitamin D (s-25(OH)D) and prostate cancer7 to evaluate this method. The cohort consisted of participants in health surveys in Norway, comprising 116,493 men. Among those, 2,118 were diagnosed with prostate cancer during follow-up. For each incident case, one control was sampled from the case’s risk set, matched on age at serum sampling ±6 months, date of serum sampling ±2 months, and county of residence. Meyer et al7 focused on the association between s-25(OH)D and incidence of prostate cancer. Due to the increased practice of screening for this cancer, death from prostate cancer might be a better endpoint when considering the most serious cases. Among the incident cases, 367 men died from prostate cancer. Traditional analysis of nested case-control data can use only the controls for incident cases who also died from prostate cancer, whereas all sampled controls can be used with inverse probability–weighted analysis. Robust variance estimation, possibly slightly conservative,6 was chosen for the present analyses.
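To make the weighted analysis concrete, here is a minimal sketch of an inverse probability–weighted Cox fit for a single covariate, using Newton-Raphson on the weighted partial-likelihood score. It assumes no tied event times, a common entry time for everyone, and hypothetical toy data in which cases have weight 1 and controls have weight 1 divided by their sampling probability. It stops at the point estimate; the robust variance used in the analyses above is not implemented here.

```python
from math import exp

def score_and_information(beta, data):
    """Weighted partial-likelihood score and information for one covariate.

    `data` holds (time, event, x, weight) tuples; tied event times and
    delayed entry are not handled in this sketch.
    """
    score = info = 0.0
    for t_i, event_i, x_i, w_i in data:
        if not event_i:
            continue
        s0 = s1 = s2 = 0.0
        for t_k, _, x_k, w_k in data:
            if t_k >= t_i:                       # subject k is at risk at t_i
                r = w_k * exp(beta * x_k)
                s0 += r
                s1 += r * x_k
                s2 += r * x_k * x_k
        mean = s1 / s0                           # weighted risk-set mean of x
        score += w_i * (x_i - mean)
        info += w_i * (s2 / s0 - mean * mean)
    return score, info

def fit_weighted_cox(data, n_iter=25):
    """Newton-Raphson iterations for the log hazard ratio, starting at 0."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = score_and_information(beta, data)
        beta += score / info
    return beta

# Hypothetical (time, event, x, weight) records.
data = [
    (1, True, 1, 1.0), (2, True, 0, 1.0), (3, False, 1, 2.0),
    (4, True, 0, 1.0), (5, False, 0, 3.0), (6, True, 1, 1.0),
    (7, False, 0, 1.5),
]
beta_hat = fit_weighted_cox(data)
hazard_ratio = exp(beta_hat)
```

Multiplying every weight by the same constant leaves the estimate unchanged, because the weights enter the score only through ratios within each risk set and as a common factor on each case's contribution.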

The Table displays results from traditional analyses and inverse probability–weighted analyses with Kaplan-Meier and generalized linear model weights. For both endpoints, the hazard ratios from inverse probability weighting and traditional analyses were approximately equal. The inverse probability–weighted standard errors for the incidence endpoint were somewhat smaller than the standard error from the traditional analysis. In contrast, the inverse probability–weighted standard errors for the death endpoint were considerably smaller: because all sampled controls could be used, rather than only those matched to the cases who died, efficiency increases.

We also analyzed a physical activity variable available in the complete cohort (n = 116,493) to contrast cohort and nested case-control analyses. The endpoint was incidence of prostate cancer. Comparing moderate activity with sedentary, the HR with cohort data was 1.07 (95% confidence interval = 0.95–1.22), compared with 1.09 (0.92–1.29) using traditional nested case-control analysis and 1.07 (0.90–1.26) and 1.01 (0.85–1.21) with generalized linear model and Kaplan-Meier weights, respectively. Hence, the traditional estimates are not necessarily closer to cohort estimates than inverse probability–weighted estimates (see eAppendix [http://links.lww.com/EDE/A762] for full analysis).
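The reported intervals can be put on a common scale by back-calculating approximate standard errors of the log HR. The sketch below assumes the intervals are Wald intervals that are symmetric on the log scale, which is an assumption on our part; because the published limits are rounded to two decimals, the results are only approximate.

```python
from math import log

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def log_hr_and_se(hr, lower, upper):
    """Log hazard ratio and the standard error implied by a 95% Wald interval."""
    return log(hr), (log(upper) - log(lower)) / (2 * Z_975)

# HRs and 95% confidence limits as reported in the text above.
reported = {
    "cohort": (1.07, 0.95, 1.22),
    "traditional nested case-control": (1.09, 0.92, 1.29),
    "generalized linear model weights": (1.07, 0.90, 1.26),
    "Kaplan-Meier weights": (1.01, 0.85, 1.21),
}

for label, (hr, lower, upper) in reported.items():
    log_hr, se = log_hr_and_se(hr, lower, upper)
    print(f"{label}: log HR = {log_hr:.3f}, approx. SE = {se:.3f}")
```

Under these assumptions the implied standard error is about 0.064 for the cohort analysis and between roughly 0.086 and 0.090 for the three nested case-control analyses.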

Our experience suggests that Kaplan-Meier and generalized linear model weights perform similarly, with comparable estimated HRs and variances. However, with extremely close matching, simulations indicate that biased estimates can occur when applying Kaplan-Meier weights.8
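For the generalized linear model weights, a minimal sketch is a logistic regression of the indicator of ever being sampled on entry time, censoring time, and a matching variable, fitted among noncases. The hand-rolled gradient-ascent fitter, the function names, the linear predictor, and the toy data are assumptions made to keep the example self-contained; in practice a standard statistical package's logistic regression would take the place of `fit_logistic`.

```python
from math import exp, sqrt

def standardize(column):
    """Center and scale a covariate (assumes it is not constant)."""
    mean = sum(column) / len(column)
    sd = sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]

def expit(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(rows, outcomes, n_iter=2000, step=1.0):
    """Plain gradient ascent on the mean Bernoulli log-likelihood."""
    n, k = len(rows), len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for x, o in zip(rows, outcomes):
            p = expit(sum(b * v for b, v in zip(beta, x)))
            for j in range(k):
                grad[j] += (o - p) * x[j] / n
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta

def glm_sampling_probabilities(entry, exit_, match, sampled):
    """Fitted probabilities of ever being sampled, one per noncase."""
    cols = [standardize(c) for c in (entry, exit_, match)]
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(sampled))]
    beta = fit_logistic(rows, sampled)
    return [expit(sum(b * v for b, v in zip(beta, x))) for x in rows]

# Hypothetical noncases: entry time, censoring time, matching variable (age),
# and whether the subject was ever sampled as a control.
entry   = [0, 0, 0, 0, 0, 0, 5, 5, 0, 0]
exit_   = [2, 2, 2, 10, 10, 10, 10, 10, 10, 10]
age     = [60, 60, 60, 60, 60, 60, 61, 61, 62, 62]
sampled = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]

p_hat = glm_sampling_probabilities(entry, exit_, age, sampled)
weights = [1.0 / p for p in p_hat]   # applied only to subjects actually sampled
```

The toy data contain only four distinct covariate patterns, so the four-parameter model is saturated and the fitted probabilities simply reproduce the sampled fraction within each pattern (1/3, 2/3, 1/2, and 1/2). With real data the model smooths across subjects with similar follow-up and matching values.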

We have demonstrated that inverse probability weighting can be a powerful alternative with sub-endpoints. Moreover, reuse of controls can be helpful in many multiple outcomes settings. The eAppendix (http://links.lww.com/EDE/A762) gives an example of inverse probability weighting for specific metastasis groups.

## ACKNOWLEDGMENTS

We thank Tone Bjørge and the Janus serum bank for making this study possible.

Nathalie C. Støer, Department of Mathematics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway, nathalcs@math.uio.no

Haakon E. Meyer, Department for Chronic Diseases, Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway

Sven Ove Samuelsen, Department of Mathematics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway

## REFERENCES

1. Thomas DC. Addendum to “Methods of cohort analysis: appraisal by application to asbestos mining” by Liddell FDK, McDonald JC and Thomas DC. J Roy Stat Soc Ser A. 1977;140:469–491
2. Langholz B, Goldstein L. Risk set sampling in epidemiologic cohort studies. Stat Sci. 1996;11:35–53
3. Salim A, Hultman C, Sparén P, Reilly M. Combining data from 2 nested case-control studies of overlapping cohorts to improve efficiency. Biostatistics. 2009;10:70–79
4. Samuelsen SO. A pseudolikelihood approach to analysis of nested case-control studies. Biometrika. 1997;84:379–394
5. Saarela O, Kulathinal S, Arjas E, Läärä E. Nested case-control data utilized for multiple outcomes: a likelihood approach and alternatives. Stat Med. 2008;27:5991–6008
6. Samuelsen SO, Ånestad H, Skrondal A. Stratified case-cohort analysis of general cohort sampling designs. Scand J Stat. 2007;34:103–119
7. Meyer HE, Robsahm TE, Bjørge T, Brustad M, Blomhoff R. Vitamin D, season, and risk of prostate cancer: a nested case-control study within Norwegian health studies. Am J Clin Nutr. 2013;97:147–154
8. Støer NC, Samuelsen SO. Inverse probability weighting in nested case-control studies with additional matching-a simulation study. Stat Med. 2013;32:5328–5339


© 2014 by Lippincott Williams & Wilkins, Inc