The nested case-control design is the most widely used method for sampling from epidemiologic cohorts when investigators need to collect additional data in a reduced sample.1 Using incidence density sampling, the potential impact of exposures on disease occurrence can be studied by hazard ratios in a reduced data set.1,2 Furthermore, the cumulative incidence function can also be estimated if information from the full cohort is used.3 However, the occurrence of the disease of interest is often precluded by other "competing" events (or risks).4,5 There are two statistical approaches for analyzing competing risks data: the event-specific hazard approach, which addresses the etiological point of view, and the subdistribution hazard approach, which is linked to the cumulative incidence function5,6; the latter is suitable for prediction.
It is well known that the covariate effect on the event-specific hazard can be very different from the effect on the cumulative incidence function.4,6 The reason is that the cumulative incidence function of the event of interest depends on the event-specific hazards of all events, not only on the hazard of the event of interest.4,6,7 Nested case-control studies in cohorts with competing events have been described.8–11 In contrast to these articles, we propose a sampling method to approximate the proportional hazards model for the subdistribution of a competing event.6 Building on this, we use cohort information to estimate the cumulative incidence function.
This methodology is illustrated by an example of hospital-acquired (nosocomial) infection. Competing events for the occurrence of a nosocomial infection are discharge from the hospital and death in the hospital without a nosocomial infection. Competing events are often ignored in the analysis of nosocomial infections, which may lead to wrong conclusions when studying risk factors12 or interventions for nosocomial infections.13 We study a setting in which data are available for the full cohort. This allows us to know the ("true") full-cohort results that we want to approximate with fewer observations by using the nested case-control approach. A description of the cohort study data and detailed event-specific results are given in the eAppendix (http://links.lww.com/EDE/A741).
Subdistribution Hazard Approach
Nonparametric Estimation of the Cumulative Incidence Function: Full Cohort
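Nosocomial infection is the event of interest, and discharge and death without nosocomial infection are competing events. According to Andersen et al,14 in a competing risks framework with k competing events and event-specific hazards α1(u), …, αk(u), the marginal survival probability (the probability of remaining free of all events) is defined as

$$S(t) = \exp\left(-\sum_{j=1}^{k} \int_0^t \alpha_j(u)\,du\right) \qquad (1)$$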
Then, the cumulative incidence function for event i = 1, …, k is defined as

$$F_i(t) = \int_0^t S(u^-)\,\alpha_i(u)\,du \qquad (2)$$
These formulas show that the cumulative incidence function depends on all competing hazards. We used the Aalen-Johansen estimator15 for the nonparametric estimation of the cumulative incidence function of nosocomial infections (Figure 1), comparing patients with Acute Physiology and Chronic Health Evaluation (APACHE II) scores >15 versus ≤15.
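To make the Aalen-Johansen estimator concrete, the sketch below computes a cumulative incidence function in plain Python. It is a minimal illustration and not the software used in this study. The function name `aalen_johansen_cif` and the event coding (0 = censored, 1 = infection, 2 = discharge or death) are assumptions. Patients censored at an event time are counted as at risk at that time. Each step adds S(t-) times the event-specific hazard increment, which is the discrete analogue of Equation 2.

```python
from collections import Counter


def aalen_johansen_cif(times, events, cause=1):
    """Aalen-Johansen estimate of the cumulative incidence function (CIF).

    times  : follow-up times (eg, days since admission)
    events : 0 = censored; 1, 2, ... = type of the first event observed
    cause  : event type whose CIF is wanted (hypothetical coding)
    Returns a list of (t, CIF(t)) at every time with at least one event.
    """
    # Count each event type (including censoring, code 0) at each distinct time
    at_time = {}
    for t, e in zip(times, events):
        at_time.setdefault(t, Counter())[e] += 1

    n_at_risk = len(times)
    surv = 1.0  # overall event-free probability just before t, S(t-)
    cif = 0.0
    out = []
    for t in sorted(at_time):
        counts = at_time[t]
        d_all = sum(c for e, c in counts.items() if e != 0)  # events of any type
        d_cause = counts[cause]                               # events of interest
        if d_all > 0:
            cif += surv * d_cause / n_at_risk   # S(t-) times cause-specific hazard increment
            surv *= 1.0 - d_all / n_at_risk     # update overall event-free probability
            out.append((t, cif))
        # Everyone observed at t (events and censorings) leaves the risk set afterwards
        n_at_risk -= sum(counts.values())
    return out


# Toy data (hypothetical): 1 = infection, 2 = discharge or death, 0 = censored
times = [1, 2, 3, 4, 5]
events = [1, 2, 1, 0, 2]
infection = aalen_johansen_cif(times, events, cause=1)
competing = aalen_johansen_cif(times, events, cause=2)
```

In this toy example the two estimated functions end at 0.4 and 0.6. They add up to one minus the estimated event-free probability, which is zero after the last event.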
Semiparametric Estimation: Full Cohort
Analogous to the event-specific approach (see eAppendix, http://links.lww.com/EDE/A741), we used a proportional hazards model to estimate the subdistribution hazard ratio of infection.6 To do this, we fitted a Fine-Gray6 model with subdistribution hazard $\lambda^{\mathrm{sub}}_{I}(t; X) = \lambda^{\mathrm{sub}}_{I,0}(t)\exp\{X(t)\beta_0\}$, which yields subdistribution hazard ratios exp(β0) for nosocomial infection. In effect, the model sets the event times of the competing events (discharge or death) to infinity (in practice, to the time of potential censoring) before fitting a proportional hazards model, so that these patients remain in the risk set.5,6,16 This principal idea is displayed in Figure 2.
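The data modification behind the Fine-Gray model can be sketched as follows. The sketch assumes, as in this cohort, that each patient's potential (administrative) censoring time is known. The function name and event codes are hypothetical. With censoring-complete data of this kind, passing the modified times to a standard Cox regression routine is one way to obtain the subdistribution hazard ratio.

```python
def subdistribution_data(times, events, censor_times, cause=1):
    """Modify follow-up data for the subdistribution (Fine-Gray) risk set.

    times        : observed follow-up times
    events       : 0 = censored, `cause` = event of interest, other codes = competing
    censor_times : potential (administrative) censoring time of each patient
    Patients with a competing event stay "at risk" until their potential
    censoring time and are treated as censored there.
    """
    new_times, new_events = [], []
    for t, e, c in zip(times, events, censor_times):
        if e not in (0, cause):      # competing event (eg, discharge or death)
            new_times.append(c)      # push the time to the potential censoring time
            new_events.append(0)     # ... where the patient counts as censored
        else:                        # event of interest or censoring: unchanged
            new_times.append(t)
            new_events.append(e)
    return new_times, new_events


# Hypothetical example: infection on day 3, discharge on day 5, censoring on day 10
mod_times, mod_events = subdistribution_data([3, 5, 10], [1, 2, 0], [30, 40, 10])
```

The discharged patient (second entry) now remains in the risk set until day 40 instead of leaving it on day 5.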
Again, we compare patients with APACHE II scores >15 versus ≤15: the subdistribution hazard ratio is 4.0 (95% confidence interval = 3.3–4.9). The corresponding cumulative incidence functions are displayed in Figure 1 (dark gray).
Nested Case-control Approach
Our aim was to approximate the Fine-Gray6 model of the full cohort. To do this, we adopted the principal idea of this model and set the event times of the competing events (discharge or death) in the cohort (source) data to "infinity" (time until potential censoring). This is displayed in Figure 2 as gray lines, which show that these admissions remain in the new risk set. Then, we performed incidence density sampling (after breaking ties) with the modified data. As shown in Figure 2, controls must be disease-free at the time of diagnosis of the case to which they are matched. However, owing to the modified times, each infected case now has more eligible controls because discharged patients are still "at risk." For instance, the patient who acquired an infection on day 10 had only three potential controls in traditional incidence density sampling, whereas there are seven potential controls in the subdistribution sampling: all admissions that cross the corresponding vertical line.
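Incidence density sampling on the modified data can be sketched as follows. This is a simplified illustration under several assumptions:

- Case times are distinct (ties already broken).
- A subject is eligible as a control if its follow-up time is at least the case's time.
- Subjects who become cases later may serve as controls.
- The function name `sample_matched_sets` is hypothetical.

Applied to the original times, the function performs traditional incidence density sampling. Applied to times in which competing events were moved to the potential censoring time, it performs the subdistribution sampling described above.

```python
import random


def sample_matched_sets(times, events, n_controls, cause=1, seed=2024):
    """Incidence density sampling of controls for each case.

    times, events : follow-up data (original or subdistribution-modified)
    n_controls    : number of controls per case (fewer if the risk set is small)
    A subject is eligible as a control for a case at time t if it is still
    under observation at t (time >= t) and is not the case itself; subjects
    who become cases later are eligible. Case times are assumed to be distinct.
    Returns a list of (case_index, [control_indices]).
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    matched = []
    for i, (t, e) in enumerate(zip(times, events)):
        if e != cause:
            continue
        eligible = [j for j, tj in enumerate(times) if tj >= t and j != i]
        k = min(n_controls, len(eligible))
        matched.append((i, rng.sample(eligible, k)))
    return matched


# Hypothetical cohort: patient 0 infected on day 10, patients 1 and 2 discharged
# on days 6 and 8, patients 3 and 4 censored on days 20 and 30
original = sample_matched_sets([10, 6, 8, 20, 30], [1, 2, 2, 0, 0], n_controls=4)
# Same cohort after moving the discharges to their potential censoring time (day 30)
modified = sample_matched_sets([10, 30, 30, 20, 30], [1, 0, 0, 0, 0], n_controls=4)
```

The case has two eligible controls in the original data and four in the modified data, mirroring the day-10 example in Figure 2 on a smaller scale.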
The estimated odds ratio from the conditional logistic regression model approximates the subdistribution hazard ratio exp(β0) from the Fine-Gray6 model. The results are comparable with those from the full cohort (Table).
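The conditional logistic regression step can be illustrated for a single binary exposure (eg, APACHE II score >15). The sketch maximizes the conditional likelihood ∏ exp(βx_case) / Σ_{j in set} exp(βx_j) by Newton-Raphson. It is a teaching example with a hypothetical function name, not a replacement for established software. It also assumes that at least one matched set is discordant, because otherwise the information is zero.

```python
import math


def conditional_logit(matched_sets, max_iter=50, tol=1e-10):
    """Conditional logistic regression for one covariate (Newton-Raphson).

    matched_sets : list of (x_case, [x_control, ...]) covariate values
    Returns (odds ratio exp(beta), standard error of beta).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for x_case, x_controls in matched_sets:
            xs = [x_case] + list(x_controls)
            w = [math.exp(beta * x) for x in xs]       # exp(beta * x_j) within the matched set
            total = sum(w)
            mean = sum(wj * x for wj, x in zip(w, xs)) / total
            mean_sq = sum(wj * x * x for wj, x in zip(w, xs)) / total
            score += x_case - mean                     # first derivative of the log-likelihood
            info += mean_sq - mean ** 2                # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # info was evaluated just before the last (negligible) step
    return math.exp(beta), 1.0 / math.sqrt(info)


# Hypothetical 1:1 matched pairs with binary exposure (1 = APACHE II score > 15)
pairs = [(1, [0])] * 6 + [(0, [1])] * 2 + [(1, [1])] * 3 + [(0, [0])] * 4
or_hat, se_beta = conditional_logit(pairs)
```

For 1:1 pairs the conditional maximum likelihood estimate equals the ratio of discordant pairs, here 6/2 = 3, which gives a convenient check. Concordant pairs contribute nothing to the estimate.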
As with the event-specific cumulative hazards (see eAppendix, http://links.lww.com/EDE/A741), we used cohort information (modified numbers at risk at the event times) to calculate the cumulative infection subdistribution hazards for each exposure category.3 With these, we approximated the cumulative incidence function of infection (Figure 1). Note that the crude risks of nosocomial infection in the two score categories (3.5% for scores ≤15 and 13.2% for scores >15) correspond to the plateaus of the cumulative incidence functions in Figure 1, because administrative censoring is very low (about 0.001%).
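One way to turn cohort information into absolute risks is sketched below under stated assumptions:

- The modified numbers at risk in one exposure category are known at each case time. Patients with a competing event are retained until their potential censoring time.
- The cumulative subdistribution hazard is estimated with a Nelson-Aalen-type sum.
- The cumulative incidence function follows from the identity F(t) = 1 − exp{−Λsub(t)}, which holds for the subdistribution hazard.

The estimator of reference 3 is more general and may differ in detail. The function name and inputs here are hypothetical.

```python
import math


def cif_from_cohort_info(case_times, modified_at_risk):
    """Cumulative incidence from the cumulative subdistribution hazard.

    case_times       : times of all cases (event of interest) in one exposure group
    modified_at_risk : dict, case time -> number at risk in that group in the
                       modified cohort (competing-event patients kept at risk)
    Returns a list of (t, CIF(t)).
    """
    cum_hazard = 0.0
    out = []
    for t in sorted(set(case_times)):
        d = case_times.count(t)                       # cases at time t
        cum_hazard += d / modified_at_risk[t]         # Nelson-Aalen-type increment
        out.append((t, 1.0 - math.exp(-cum_hazard)))  # F(t) = 1 - exp(-Lambda_sub(t))
    return out


# Hypothetical group: cases on days 2 and 5 with 10 and 8 patients in the modified risk set
cif = cif_from_cohort_info([2, 5], {2: 10, 5: 8})
```

Here the cumulative subdistribution hazard reaches 1/10 + 1/8 = 0.225 by day 5, so the approximated cumulative incidence is 1 − exp(−0.225).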
We propose a modified sampling technique to study the impact of risk factors on the event of interest in terms of a comparison of cumulative incidence functions. The mathematical justification for this approach rests on a straightforward, step-by-step combination of established methodology: the Fine-Gray model with adaptation,6,17 incidence density sampling,1,2 and the use of cohort information for absolute risk estimation.3 Statistical software is available.3,16–18
Interpreting the results of a competing risks analysis is challenging, and conclusions can easily be misleading if competing events are ignored.13 In our example, the cumulative incidence functions would be grossly overestimated if the competing events were ignored. The infection risk 30 days after admission would be 35% for patients with low APACHE II scores and 45% for those with high scores, compared with only 3.5% and 13.2%, respectively, when we account for the competing risks. Borgan8 proposed a method that uses only the cumulative hazards of the event of interest and assumes that the exposure has no effect on the competing events. In our example, this would mean that the APACHE II score has no effect on the discharge hazard. This assumption also leads to biased results, because the score does affect discharge. Under this assumption, the estimated infection risk 30 days after admission would be 5% (low APACHE II score) and 7% (high APACHE II score). Additional knowledge of the cumulative hazards of the competing event is needed to overcome this problem,8 but that would require further nested case-control studies on the competing event or subcohorting. In contrast to this approach, we propose a direct sampling method.
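The direction of the bias from ignoring competing events can be seen with constant event-specific hazards, where Equation 2 has a closed form: F1(t) = α1/(α1 + α2)·[1 − exp{−(α1 + α2)t}]. The hazard values below are purely hypothetical and are not estimates from the study data: 0.01 per day for infection and 0.1 per day for discharge or death.

```python
import math


def cif_constant(a1, a2, t):
    """CIF of event 1 at time t when both event-specific hazards are constant."""
    a = a1 + a2
    return a1 / a * (1.0 - math.exp(-a * t))


def naive_risk(a1, t):
    """Quantity targeted by 1 - Kaplan-Meier when competing events are censored."""
    return 1.0 - math.exp(-a1 * t)


# Hypothetical hazards per day: infection 0.01, discharge or death 0.1
naive = naive_risk(0.01, 30)               # about 0.26
correct = cif_constant(0.01, 0.1, 30)      # about 0.088
# Same infection hazard, but a lower competing hazard (eg, slower discharge)
slower_discharge = cif_constant(0.01, 0.05, 30)  # about 0.14
```

The last line illustrates the problem with assuming no exposure effect on the competing event. An exposure that changes only the competing hazard still changes the cumulative incidence of infection.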
Our approach has limitations. First, proportionality of event-specific hazards does not imply proportionality of subdistribution hazards, and vice versa. In our example, the fit of both models was acceptable. This is in line with the results of Grambauer et al,19 who showed that the subdistribution hazard approach provides a summary analysis even if the model is misspecified. Second, we dichotomized a continuous variable (APACHE II score). We did so only for illustrative purposes, because this score is associated with the hazards for infection, discharge, and death. However, we emphasize that the proposed sampling method works well with exposures on a continuous scale and in a multivariable setting (data not shown). Third, in our cohort, the potential censoring times were available because censoring was administrative. If this is not the case, we recommend imputing these values before sampling; methodology and software are available.16,17
Researchers who are planning a nested case-control study in a cohort with competing events should first decide which model they want to study: the etiologic model with event-specific hazards or the prediction model with subdistribution hazards. We recommend studying both to obtain a complete picture of direct and indirect effects and to derive correct conclusions.20 If there is one event of interest (in our example, nosocomial infection), it is sufficient to combine the other competing events (eg, discharge [alive or dead]), because the cumulative incidence function depends on the sum of all competing cumulative hazards (Equation 2). However, before performing separate nested case-control studies for the event of interest and the competing event, one might consider re-using controls21,22 or choosing a case-cohort design.
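The claim that the competing events enter the cumulative incidence function only through their sum can be checked numerically with the Aalen-Johansen estimator. Coding discharge and death separately or as one combined competing event gives the same estimate for infection. The toy data, the codes (1 = infection, 2 = discharge, 3 = death), and the function name are hypothetical.

```python
from collections import Counter


def cif_infection(times, events):
    """Aalen-Johansen CIF of event type 1 at the end of follow-up.

    events: 0 = censored, 1 = event of interest, any other code = competing event.
    """
    counts = Counter(zip(times, events))  # (time, event code) -> number of patients
    n_at_risk, surv, cif = len(times), 1.0, 0.0
    for t in sorted(set(times)):
        d1 = counts[(t, 1)]
        d_any = sum(c for (tt, e), c in counts.items() if tt == t and e != 0)
        n_t = sum(c for (tt, _), c in counts.items() if tt == t)
        cif += surv * d1 / n_at_risk          # S(t-) times hazard increment of event 1
        surv *= 1.0 - d_any / n_at_risk       # competing events enter only via d_any
        n_at_risk -= n_t                      # remove everyone observed at t
    return cif


times = [1, 2, 3, 4, 5, 6]
separate = [1, 2, 1, 0, 3, 2]  # 2 = discharge, 3 = death (hypothetical coding)
combined = [1, 2, 1, 0, 2, 2]  # discharge and death merged into code 2
```

Both codings return the same value (1/3 in this toy example), because only the total number of events of any type at each time changes the event-free probability.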
1. Gail MH, Benichou J. Encyclopedia of Epidemiologic Methods. Chichester, UK: John Wiley & Sons Inc; 2000
2. Vandenbroucke JP, Pearce N. Case-control studies: basic concepts. Int J Epidemiol. 2012;41:1480–1489
3. Langholz B. Use of cohort information in the design and analysis of case-control studies. Scand J Stat. 2007;34:120–136
4. Andersen PK, Geskus RB, de Witte T, Putter H. Competing risks in epidemiology: possibilities and pitfalls. Int J Epidemiol. 2012;41:861–870
5. Lau B, Cole SR, Gange SJ. Competing risk regression models for epidemiologic data. Am J Epidemiol. 2009;170:244–256
6. Fine J, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94:496–509
7. Grambauer N, Schumacher M, Dettenkofer M, Beyersmann J. Incidence densities in a competing events analysis. Am J Epidemiol. 2010;172:1077–1084
8. Borgan Ø. Estimation of covariate-dependent Markov transition probabilities from nested case-control data. Stat Methods Med Res. 2002;11:183–202
9. Lubin JH. Extensions of analytic methods for nested and population-based incident case-control studies. J Chronic Dis. 1986;39:379–388
10. Lubin JH. Case-control methods in the presence of multiple failure times and competing risks. Biometrics. 1985;41:49–54
11. Flanders WD, Louv WC. The exposure odds ratio in nested case-control studies with competing risks. Am J Epidemiol. 1986;124:684–692
12. Wolkewitz M, Di Termini S, Cooper B, Meerpohl J, Schumacher M. Paediatric hospital-acquired bacteraemia in developing countries. Lancet. 2012;379:1484; author reply 1484–1485
13. Wolkewitz M, Harbarth S, Beyersmann J. Daily chlorhexidine bathing and hospital-acquired infection. N Engl J Med. 2013;368:2330
14. Andersen PK, Abildstrom SZ, Rosthøj S. Competing risks as a multi-state model. Stat Methods Med Res. 2002;11:203–215
15. Aalen OO, Johansen S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat. 1978;5:141–150
16. Beyersmann J, Allignol A, Schumacher M. Competing Risks and Multistate Models with R. New York: Springer; 2011
17. Ruan PK, Gray RJ. Analyses of cumulative incidence functions via non-parametric multiple imputation. Stat Med. 2008;27:5709–5724
18. Richardson DB. An incidence density sampling program for nested case-control analyses. Occup Environ Med. 2004;61:e59
19. Grambauer N, Schumacher M, Beyersmann J. Proportional subdistribution hazards modeling offers a summary analysis, even if misspecified. Stat Med. 2010;29:875–884
20. Latouche A, Allignol A, Beyersmann J, Labopin M, Fine JP. A competing risks analysis should report results on all cause-specific hazards and cumulative incidence functions. J Clin Epidemiol. 2013;66:648–653
21. Støer NC, Samuelsen SO. Comparison of estimators in nested case-control studies with multiple outcomes. Lifetime Data Anal. 2012;18:261–283
22. Salim A, Yang Q, Reilly M. The value of reusing prior nested case-control data in new studies with different outcome. Stat Med. 2012;31:1291–1302