The Kaplan–Meier curve [1] is a standard statistical tool used in cohort studies to illustrate how survival during follow-up depends on time-fixed covariates measured at baseline. However, in many scenarios, it is also of interest to study how survival depends on time-varying covariates that are measured on repeated occasions during follow-up. For this purpose, Snapinn et al. [2] proposed an “extended” Kaplan–Meier curve, which is constructed by letting subjects move across risk sets as their covariate levels change during follow-up. This procedure is analogous to how time-varying covariates are handled in the Cox proportional hazards model [3].
The article by Snapinn et al. [2] has been quite influential; it has been cited more than 100 times, and the proposed method has been used in many epidemiologic and medical studies, several of which have been published in high-impact journals, such as the New England Journal of Medicine [4], JAMA [5], BMJ [6], Circulation [7], European Heart Journal [8,9], Annals of Internal Medicine [10,11], Annals of Neurology [12], and Clinical Cancer Research [13]. However, to the best of our knowledge, there has been no proper discussion of how to interpret extended Kaplan–Meier curves. Snapinn et al. [2] claimed that, under a certain independence assumption, the extended Kaplan–Meier curve has a causal interpretation as representing a “hypothetical cohort whose covariate values remain constant during follow-up”; however, they did not formally prove this claim.
In this note, we first review the definition of the extended Kaplan–Meier curve. We then use the potential outcome framework [14,15] to formalize the notion of a hypothetical cohort with constant covariate values. We use causal diagrams [15,16] to show that, in the absence of confounding, the extended Kaplan–Meier curve can indeed be given the aforementioned causal interpretation under the independence assumption proposed by Snapinn et al. [2]. However, we argue that the causal implications of this independence assumption are highly unrealistic, and that a causal interpretation of the extended Kaplan–Meier curve is therefore typically unwarranted. The two concluding sections of the article are somewhat more theoretical. In these sections, we discuss how the independence assumption proposed by Snapinn et al. [2] is related to the standard Cox proportional hazards model, and how to appropriately control for time-varying confounders in the estimation of causal survival functions.
NOTATION, ASSUMPTIONS, AND DEFINITION OF THE EXTENDED KAPLAN–MEIER CURVE
Snapinn et al. [2] considered a continuous-time framework. For pedagogic purposes, we discretize time into $t = 0, 1, \ldots, T$, where $t = 0$ represents baseline. This has no major practical consequences, since time is always discretized in practice, e.g., into months or days. We let $Y$ be the outcome of interest, and define $Y_t = 1$ if the outcome event (e.g., death or cancer diagnosis) happens at time $t$, and $Y_t = 0$ otherwise. We let $X$ be a categorical time-varying covariate of interest, with the observed value at time $t$ denoted by $X_t$. We assume that $X_t$ occurs just before $Y_t$, so that the temporal order is given by $(X_0, Y_0, X_1, Y_1, \ldots, X_T, Y_T)$. For any time-varying variable $V$, we define $\bar{V}_t = (V_0, \ldots, V_t)$, $\bar{V}_{-1} = V_{-1} = 0$, and we use $\bar{V}$ as shorthand for $\bar{V}_T$. To keep notation simple, we ignore truncation and censoring. However, all conclusions that we make are valid in the presence of left-truncation and right-censoring, provided that these are noninformative [3].
Snapinn et al. [2] considered an example where serum creatinine is the covariate and end-stage renal disease (ESRD) is the outcome. To motivate their extended Kaplan–Meier curve they wrote “...it may be reasonable to assume that the risk of ESRD is a function of the patient’s current value of serum creatinine and, conditional on that current value, is not related to any previous values. This simplifying assumption is commonly used in the medical literature...”. In the notation introduced above, this assumption reads
$$Y_t \perp\!\!\!\perp \bar{X}_{t-1} \mid X_t, \bar{Y}_{t-1} = 0, \tag{1}$$
where we have used “$A \perp\!\!\!\perp B \mid C$” for “$A$ and $B$ conditionally independent, given $C$.” Note that the conditioning on $\bar{Y}_{t-1} = 0$ (event-free up to time $t-1$) is necessary in order for the variables $X_t$ and $Y_t$ to be defined.
Let $R_t(x)$ be the risk set containing those subjects in the observed cohort who are still event-free (e.g., have not yet developed ESRD) just before time $t$ and have the covariate (e.g., serum creatinine) value $x$ at time $t$. Assumption (1) implies that the risk sets are “homogeneous,” in the sense that the risk of the event is the same for all subjects in the risk set $R_t(x)$, irrespective of their covariate history $\bar{X}_{t-1}$. Let $n_t(x)$ be the total number of subjects in $R_t(x)$, and let $d_t(x)$ be the number of subjects in $R_t(x)$ who have the event at time $t$. Snapinn et al. [2] defined the extended Kaplan–Meier curve for fixed covariate level $x$, as a function of $t$, as
$$\hat{S}_x(t) = \prod_{s=0}^{t} \left\{ 1 - \frac{d_s(x)}{n_s(x)} \right\}.$$
This extended Kaplan–Meier curve differs from the ordinary Kaplan–Meier curve in that it allows subjects to enter and exit the risk set multiple times through follow-up as their current covariate level varies; this is analogous to how time-varying covariates are handled in the Cox proportional hazards model [3].
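The construction can be illustrated with a minimal Python sketch. It assumes person-period data, i.e., one row `(t, x, y)` per subject and discrete time point at which the subject is still event-free just before `t`, with `x` the current covariate value and `y` the event indicator at `t`. The function name `extended_km`, the data layout, and the toy data are hypothetical and not taken from Snapinn et al.; leaving the curve unchanged when a risk set is empty is likewise a convention assumed here.

```python
from collections import defaultdict


def extended_km(rows, level):
    """Extended Kaplan-Meier curve for one fixed covariate level.

    rows: iterable of (t, x, y) person-period records, one per subject and
          time point at which the subject is event-free just before t.
    Returns a dict mapping each observed time t to the curve value at t.
    """
    n = defaultdict(int)  # size of the risk set at time t for this level
    d = defaultdict(int)  # number of events at time t within that risk set
    times = set()
    for t, x, y in rows:
        times.add(t)
        if x == level:  # subjects contribute only while at this level
            n[t] += 1
            d[t] += y
    curve, surv = {}, 1.0
    for t in sorted(times):
        if n[t] > 0:  # assumed convention: an empty risk set changes nothing
            surv *= 1.0 - d[t] / n[t]
        curve[t] = surv
    return curve


# Hypothetical toy data: four subjects followed over t = 0, 1.
rows = [
    (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1),  # t = 0: one event at level 1
    (1, 1, 0), (1, 0, 0), (1, 1, 1),             # t = 1: one subject moved to level 1
]
curve_level1 = extended_km(rows, level=1)
curve_level0 = extended_km(rows, level=0)
```

In the toy data, one subject contributes to the level-0 risk set at baseline and to the level-1 risk set at the next time point, which is exactly the movement across risk sets that distinguishes the extended curve from the ordinary one.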
CAUSAL INTERPRETATION OF THE EXTENDED KAPLAN–MEIER CURVE
Snapinn et al. [2] claimed that, under assumption (1), the extended Kaplan–Meier curve can be interpreted as representing a “hypothetical cohort whose covariate values remain constant during follow-up.” To formalize the notion of a hypothetical cohort with constant covariate levels, we let $Y_t(x)$ be the potential outcome [14,15] for a given subject at time $t$, had the covariate been set to $x$ for that subject throughout follow-up. In this notation,
$$S_x(t) = \Pr\{\bar{Y}_t(x) = 0\}$$
is the survival function under a counterfactual (hypothetical) scenario where the covariate is set to $x$ throughout follow-up for everybody. By contrasting $S_x(t)$ for different values of $x$ (e.g., by taking the difference $S_x(t) - S_{x'}(t)$), we obtain a measure of the causal covariate effect on survival.
Assumption (1) is a purely statistical assumption. However, to see how the extended Kaplan–Meier curve relates to the causal survival function $S_x(t)$, it is useful to consider the causal implications of the assumption. The causal diagram in Figure 1 illustrates a possible data-generating mechanism for the covariate and the outcome. For brevity, the diagram only illustrates two time points, but generalization to more time points is straightforward. The arrows from $X_0$ to $Y_0$ and from $X_1$ to $Y_1$ represent an effect of the current covariate level on the current outcome level (e.g., a “short-term” or “acute” [17] effect of serum creatinine on ESRD), and the arrow from $X_0$ to $Y_1$ represents an effect of previous covariate levels (e.g., a “long-term” effect). The arrows from $Y_0$ to $X_1$ and $Y_1$ indicate that $X_1$ and $Y_1$ are only defined if the subject does not have the event at time $0$, i.e., if $Y_0 = 0$. The variables $U_X$ and $U_Y$ represent all measured and unmeasured factors that affect only the covariate and only the outcome, respectively. In the example by Snapinn et al. [2], $U_X$ and $U_Y$ may for instance be genetic factors, with specific and independent effects on serum creatinine and ESRD, respectively. In practice, $U_X$ and $U_Y$ may be time-varying as well, but for brevity we depict them in Figure 1 as time-fixed.
Snapinn et al. [2] made no explicit reference to confounding. However, since the extended Kaplan–Meier curve is an unadjusted measure of association, it is clear that it requires an assumption of no confounding to have a causal interpretation. This assumption follows by design if the covariate of interest is randomized at each time point, but would often be violated in observational studies. In this section, we proceed under the assumption of no confounding, which is encoded in Figure 1 by the absence of common causes of $X$ and $Y$. We later relax this assumption, and discuss the implications of time-varying confounding.
The independence assumption (1) imposes several restrictions on the data-generating mechanism. Clearly, the assumption rules out a direct effect of $X_0$ on $Y_1$, since such an effect would make $Y_1$ associated with $X_0$, conditionally on $(X_1, Y_0 = 0)$. Less obvious perhaps, the assumption also rules out the simultaneous presence of a direct effect of $X_0$ on $Y_0$ and common causes of $Y_0$ and $Y_1$. This is because the simultaneous presence of these would make $Y_1$ associated with $X_0$, conditionally on $(X_1, Y_0 = 0)$, through the path $X_0 \rightarrow Y_0 \leftarrow U_Y \rightarrow Y_1$; by conditioning on the collider $Y_0$ this path becomes open [18,19]. Thus, assumption (1) implies that the data-generating mechanism either looks like in Figure 2, where there is no direct effect of $X_0$ on $Y_0$, or as in Figure 3, where there are no common causes of $Y_0$ and $Y_1$. We emphasize that the arrow from $X_1$ to $Y_1$ in Figure 2 is only present since $t = 1$ is the last time point. When $U_Y$ is present, the arrow from $X_t$ to $Y_t$ must be absent for all time points $t$, except at the very last time point, since each such arrow would induce conditional associations between $Y_{t+1}$ and $X_t$, given $(X_{t+1}, \bar{Y}_t = 0)$, via open paths through $U_Y$.
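The collider argument can be checked numerically. The following Python sketch is an illustration under assumed values, not part of the original analysis: it takes two time points, a binary outcome-specific factor `u` with prevalence 0.5, linear risk models with arbitrarily chosen coefficients, no long-term effect of the baseline covariate, and a covariate at the second time point that is randomized (so that conditioning on it does not change the distribution of `u`). The function name and all numbers are hypothetical. It computes, by exact enumeration over `u`, the risk at the second time point given the baseline covariate value, the current covariate value, and being event-free at baseline.

```python
def risk_t1_given_history(x0, x1, a, c):
    """Pr(event at t=1 | baseline covariate x0, current covariate x1, event-free at t=0).

    a: direct effect of the baseline covariate on the baseline outcome
    c: effect of the shared outcome-specific factor u on both outcomes
    There is deliberately no x0 term in the risk at t=1 (no long-term effect).
    """
    num = den = 0.0
    for u in (0, 1):
        p_u = 0.5                     # assumed prevalence of u
        h0 = 0.1 + a * x0 + c * u     # risk of the event at t=0
        h1 = 0.1 + 0.2 * x1 + c * u   # risk of the event at t=1
        w = p_u * (1.0 - h0)          # proportional to Pr(u | x0, event-free at t=0)
        num += w * h1
        den += w
    return num / den


# Both the baseline effect and the shared factor present: assumption (1) fails.
both = (risk_t1_given_history(0, 0, a=0.3, c=0.3),
        risk_t1_given_history(1, 0, a=0.3, c=0.3))
# No baseline effect (as in Figure 2): risk does not depend on x0.
no_effect = (risk_t1_given_history(0, 0, a=0.0, c=0.3),
             risk_t1_given_history(1, 0, a=0.0, c=0.3))
# No shared factor (as in Figure 3): risk does not depend on x0.
no_shared = (risk_t1_given_history(0, 0, a=0.3, c=0.0),
             risk_t1_given_history(1, 0, a=0.3, c=0.0))
```

Only in the first configuration do the two risks differ, even though the baseline covariate has no direct effect on the later outcome; the dependence arises purely from selecting subjects who are event-free at baseline.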
Using standard algebra for counterfactual variables it can be shown (see Section “Estimation of the Causal Survival Function in the Presence of Time-Varying Confounding”) that, if either of the causal diagrams in Figures 2 and 3 holds, then the extended Kaplan–Meier curve converges in probability to the causal survival function $S_x(t)$. Hence, in the absence of confounding, the extended Kaplan–Meier curve does indeed have a causal interpretation under the independence assumption (1), as claimed by Snapinn et al. [2]. This is good news; however, we will argue in the next section that this causal interpretation is almost always unwarranted in practice, since the causal implications of assumption (1) are highly unrealistic.
PLAUSIBILITY OF ASSUMPTION (1)
The absence of a direct effect of $X_0$ on $Y_1$ in Figure 1 means that the covariate only affects the outcome through its current value, i.e., the covariate has no long-term effect on the outcome. This assumption may sometimes be realistic, at least as an approximation. For instance, in the study by Lichtenstein et al. [4], the covariate of interest is medication for attention deficit-hyperactivity disorder, which mainly has a short-term effect on the patient’s cognitive functioning [20].
The main problem with assumption (1) is that it rules out the simultaneous presence of a direct effect of $X_0$ on $Y_0$ and common causes of $Y_0$ and $Y_1$. If there is no direct effect of $X_0$ on $Y_0$, then the covariate level at baseline has no effect at all on the outcome (Figure 2), since assumption (1) also rules out a direct effect of $X_0$ on $Y_1$. With more than two time points, the covariate would have no effect at any time point on the outcome, except possibly at the very end of follow-up. Thus, in the example by Snapinn et al. [2], this means that serum creatinine has essentially no effect at all on the risk of ESRD. A total absence of covariate effects may be plausible in some scenarios, but then there is not much point in drawing extended Kaplan–Meier curves for different covariate levels, as these would be identical (in large samples). If, on the other hand, there are no common causes of $Y_0$ and $Y_1$, then there are no factors that simultaneously influence the risk of an event at different time points (Figure 3). Clearly, this assumption is highly unrealistic. This is particularly the case in the example by Snapinn et al. [2], where the risk of ESRD at any given time point is clearly affected by a large number of genetic factors, lifestyle factors, etc., that may influence the risk of ESRD throughout follow-up. Many of these factors would typically be hard to measure, or even unknown to the researcher, and it would thus not be realistic to approximate the assumption by stratifying on or adjusting for measured covariates.
In light of this conclusion, one may wonder what made Snapinn et al. [2] state that assumption (1) “may be reasonable.” We guess that these authors simply overlooked some of the causal implications of the assumption, and perhaps equated it with the absence of long-term covariate effects, e.g., the absence of an arrow from $X_0$ to $Y_1$ in Figure 1. However, as noted above, this less restrictive assumption is not enough to give the extended Kaplan–Meier curve a causal interpretation.
RELATION BETWEEN ASSUMPTION (1) AND THE COX PROPORTIONAL HAZARDS MODEL
Snapinn et al. [2] claimed that assumption (1) is “commonly used in the medical literature.” We disagree with this claim, and believe that it may stem from a misunderstanding of common modeling practice. In medical research, the most common way to analyze survival data is to use the Cox proportional hazards model. In principle, this model may include the whole covariate history, or some function thereof, at any given time point. In practice though, it is common to only include the most current covariate value. In our experience, this modeling practice is often thought of as being justified by, or even requiring, assumption (1). This is not the case though, as the following example illustrates.
Suppose that the true data-generating mechanism looks like in Figure 1. In this figure, the direct effect of $X_0$ on $Y_1$, the direct effect of $X_0$ on $Y_0$, and the common causes of $Y_0$ and $Y_1$ are all present, so that assumption (1) is violated. Suppose further that data are generated from the discrete-time proportional hazards model
$$\Pr(Y_t = 1 \mid \bar{X}_t, \bar{Y}_{t-1} = 0) = \lambda_t \exp(\beta X_t + \gamma X_{t-1}). \tag{3}$$
In this model, $\lambda_t$ is the unspecified baseline hazard, and the coefficients $\beta$ and $\gamma$ are the short- and long-term effects of the covariate, respectively.
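Now, suppose that we follow common practice and analyze data by only including the current covariate value in the model. We thus fit the model
$$\Pr(Y_t = 1 \mid X_t, \bar{Y}_{t-1} = 0) = \lambda^*_t \exp(\beta^* X_t). \tag{4}$$
Does this model incorrectly make assumption (1), and what result should we then expect when fitting the model? To address this question, we first note that models (3) and (4) are equivalent at $t = 0$, with $\lambda^*_0 = \lambda_0$ and $\beta^* = \beta$, since $X_{-1}$ is by definition equal to 0. At $t = 1$, we have that
$$\Pr(Y_1 = 1 \mid X_1, Y_0 = 0) = E\{\Pr(Y_1 = 1 \mid X_0, X_1, Y_0 = 0) \mid X_1, Y_0 = 0\} = \lambda_1 \exp(\beta X_1)\, E\{\exp(\gamma X_0) \mid X_1, Y_0 = 0\}. \tag{5}$$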
Suppose now, for pedagogic purposes, that both the direct effect of $X_0$ on $X_1$ and the direct effect of $U_X$ on $X_1$ are absent. This would, for instance, be the case if the covariate is randomized at $t = 1$ without taking the value at $t = 0$ into account. It then follows that $X_0$ and $X_1$ are conditionally independent, given $Y_0 = 0$, so that $E\{\exp(\gamma X_0) \mid X_1, Y_0 = 0\} = E\{\exp(\gamma X_0) \mid Y_0 = 0\}$. Thus, the right-hand side of (5) further simplifies to
$$\lambda^*_1 \exp(\beta^* X_1),$$
where $\lambda^*_1 = \lambda_1 E\{\exp(\gamma X_0) \mid Y_0 = 0\}$ and $\beta^* = \beta$. Thus, model (4) is in fact correct, because it is implied by the data-generating model (3). Furthermore, even though ignoring the previous covariate value changes the interpretation of the baseline hazard, it does not change the interpretation of the short-term covariate effect $\beta$.
This example illustrates that the exclusion of previous covariate values from the model does not require assumption (1) to hold, and it does not mean that the model is misspecified if assumption (1) is violated. It simply means that the model is agnostic about (i.e., marginalizes over) the association between current outcome status and previous covariate values.
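The marginalization argument can be verified with a small exact calculation. The Python sketch below is an illustration with assumed values only: it takes a binary covariate, two time points, hazards of the form "baseline hazard times exp(short-term coefficient times current value plus long-term coefficient times previous value)", and a covariate at the second time point that is independent of the baseline covariate among those event-free at baseline. The function name and the parameter values are hypothetical.

```python
import math


def marginal_hazard_t1(x1, lam0, lam1, beta, gamma, p0=0.5):
    """Hazard at t=1 given only the current covariate x1 and being event-free at t=0.

    Averages the full-history hazard lam1 * exp(beta*x1 + gamma*x0) over the
    distribution of the baseline covariate x0 among baseline survivors.
    p0 is the assumed probability that x0 equals 1.
    """
    num = den = 0.0
    for x0 in (0, 1):
        p_x0 = p0 if x0 == 1 else 1.0 - p0
        surv0 = 1.0 - lam0 * math.exp(beta * x0)  # Pr(event-free at t=0 | x0)
        w = p_x0 * surv0                          # proportional to Pr(x0 | event-free at t=0)
        num += w * lam1 * math.exp(beta * x1 + gamma * x0)
        den += w
    return num / den


# Hypothetical parameter values, chosen so that all hazards stay below 1.
params = dict(lam0=0.05, lam1=0.05, beta=0.7, gamma=0.4)
h_exposed = marginal_hazard_t1(1, **params)
h_unexposed = marginal_hazard_t1(0, **params)
log_hazard_ratio = math.log(h_exposed / h_unexposed)
```

Under these assumptions the log hazard ratio for the current covariate value equals the short-term coefficient, whereas the hazard at the reference level exceeds `lam1` because it absorbs the averaged long-term effect, in line with the changed interpretation of the baseline hazard.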
ESTIMATION OF THE CAUSAL SURVIVAL FUNCTION IN THE PRESENCE OF TIME-VARYING CONFOUNDING
In observational studies, there is almost always confounding (i.e., common causes) of the covariate and the outcome. Considerable efforts have been devoted to the estimation of causal effects in the presence of time-varying confounding; see Hernán and Robins [21] and the references therein. In this section, we briefly discuss how the causal survival function can be estimated in the presence of time-varying confounding; we refer to Hernán and Robins [21] for details.
Consider the causal diagram in Figure 4. Here, $L$ and $U$ represent the sets of all measured and unmeasured confounders, respectively. In Figure 4, we have allowed for $L$ to be time-varying. For brevity, we have depicted $U$ as time-fixed, but all conclusions below are valid when $U$ is time-varying.
The causal diagram makes an important assumption, namely that there is no direct effect of the unmeasured confounders $U$ on $X$; all effect of $U$ on $X$ is mediated through the measured variables $L$. This assumption may be reasonable, at least as an approximation, in well-designed observational studies where efforts have been made to collect data on all variables that directly determine the value of $X$. Under this assumption, it can be shown [21] that
$$S_x(t) = \sum_{\bar{l}_t} \prod_{s=0}^{t} \Pr(Y_s = 0 \mid \bar{X}_s = \bar{x}_s, \bar{L}_s = \bar{l}_s, \bar{Y}_{s-1} = 0) \Pr(L_s = l_s \mid \bar{X}_{s-1} = \bar{x}_{s-1}, \bar{L}_{s-1} = \bar{l}_{s-1}, \bar{Y}_{s-1} = 0), \tag{6}$$
where $\bar{x}_s = (x, \ldots, x)$ denotes the covariate history held fixed at $x$ up to time $s$.
The expression in (6) is a special case of what is often referred to as the “G-formula.” It expresses the causal survival function of interest (the left-hand side) in terms of the probability distribution for the observed data (the right-hand side). If $L$ contains continuous elements, then the summations for these elements on the right-hand side of (6) are replaced by integrals.
In low-dimensional settings (e.g., if $T$ is small, and both $X$ and $L$ are binary), the probabilities on the right-hand side of (6) may be estimated nonparametrically, which produces a nonparametric estimate of $S_x(t)$. In high-dimensional settings, estimation typically requires regression modeling. One approach is to fit a parametric model for each of the probabilities on the right-hand side of (6), which implies a parametric model for $S_x(t)$ through the G-formula. Another approach is to explicitly postulate a parametric model for $S_x(t)$, and fit this so-called marginal structural model with inverse probability weighting. For details on these and other approaches, we refer to Hernán and Robins [21] and the references therein.
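For a low-dimensional setting, the G-formula can be evaluated by direct enumeration. The Python sketch below is a minimal illustration under stated assumptions: a single measured confounder that takes finitely many levels and precedes the covariate at each time point, and conditional probabilities supplied by the user as functions (in practice these would be estimated from data). The function names, argument layout, and probability values are hypothetical.

```python
from itertools import product


def g_formula_survival(t, x, p_surv, p_l, l_levels=(0, 1)):
    """Causal survival probability at time t with the covariate fixed at x.

    p_surv(s, x, l_hist): Pr(no event at s | covariate fixed at x up to s,
                          confounder history l_hist up to s, event-free before s)
    p_l(s, x, l_prev, l): Pr(confounder = l at s | covariate fixed at x up to s-1,
                          confounder history l_prev up to s-1, event-free before s)
    """
    total = 0.0
    # Sum over all possible confounder histories up to time t.
    for l_hist in product(l_levels, repeat=t + 1):
        term = 1.0
        for s in range(t + 1):
            term *= p_l(s, x, l_hist[:s], l_hist[s])
            term *= p_surv(s, x, l_hist[:s + 1])
        total += term
    return total


# Hypothetical inputs: a binary confounder with probability 0.5 at each time,
# and an event risk that rises with both the covariate and the confounder.
def p_l_example(s, x, l_prev, l):
    return 0.5


def p_surv_example(s, x, l_hist):
    return 1.0 - (0.1 + 0.1 * x + 0.2 * l_hist[-1])


s1 = g_formula_survival(1, 1, p_surv_example, p_l_example)
s0 = g_formula_survival(1, 0, p_surv_example, p_l_example)
```

The number of confounder histories grows exponentially with the number of time points, which is one reason why regression modeling is typically needed in high-dimensional settings.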
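We end this section by noting that the causal diagrams in Figures 2 and 3 are special cases of the causal diagram in Figure 4, with both $L$ and $U$ being absent. It thus follows from (6) that, in these special cases, the G-formula simplifies to
$$S_x(t) = \prod_{s=0}^{t} \Pr(Y_s = 0 \mid \bar{X}_s = \bar{x}_s, \bar{Y}_{s-1} = 0).$$
Under assumption (1), we further have that $\Pr(Y_s = 0 \mid \bar{X}_s = \bar{x}_s, \bar{Y}_{s-1} = 0) = \Pr(Y_s = 0 \mid X_s = x, \bar{Y}_{s-1} = 0)$. Thus, we have that $S_x(t) = \prod_{s=0}^{t} \Pr(Y_s = 0 \mid X_s = x, \bar{Y}_{s-1} = 0)$, which is the asymptotic probability limit of the extended Kaplan–Meier curve.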
DISCUSSION
In this note, we have shown that, in the absence of confounding, the extended Kaplan–Meier curve has a causal interpretation under the independence assumption (1), as claimed by Snapinn et al. [2]. However, we have argued that the causal implications of this assumption are highly unrealistic, and that a causal interpretation of the extended Kaplan–Meier curve is therefore typically unwarranted.
Even though assumption (1) has important causal implications, it is a statistical assumption per se. As such, it can be empirically tested. A natural way of testing the assumption is to fit one hazard model that includes both the current covariate value and some function of the covariate history, such as a cumulative covariate [17,22] or a lagged covariate [23], and one hazard model that only includes the current covariate value. The goodness-of-fit of these models can then be compared with, for instance, the Akaike information criterion (AIC) [17]. A substantially better fit of the former model indicates that assumption (1) is violated.
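The comparison can be sketched in Python for the simplest possible case. This is a simplified stand-in for the hazard regression models mentioned above, not the cited methods: it assumes two time points and a binary covariate, and replaces the regression models with saturated binomial models for the risk at the second time point, one indexed by the current value only and one indexed by both the lagged and the current value. The function names and the counts are hypothetical.

```python
import math
from collections import defaultdict


def binomial_loglik(cells):
    """Maximized binomial log-likelihood; cells maps a key to (events, at_risk)."""
    ll = 0.0
    for d, n in cells.values():
        p = d / n
        if 0.0 < p < 1.0:  # cells with p equal to 0 or 1 contribute zero
            ll += d * math.log(p) + (n - d) * math.log(1.0 - p)
    return ll


def aic_current_vs_lagged(counts):
    """counts maps (lagged value, current value) to (events, at_risk) at t=1.

    Returns (AIC of the current-value-only model, AIC of the model that
    also uses the lagged value); each cell probability is one parameter.
    """
    current = defaultdict(lambda: [0, 0])
    for (x0, x1), (d, n) in counts.items():
        current[x1][0] += d  # pool over the lagged value
        current[x1][1] += n
    aic_current = 2 * len(current) - 2 * binomial_loglik(current)
    aic_lagged = 2 * len(counts) - 2 * binomial_loglik(counts)
    return aic_current, aic_lagged


# Hypothetical counts in which the risk clearly depends on the lagged value.
counts_dependent = {(0, 0): (10, 100), (1, 0): (30, 100),
                    (0, 1): (20, 100), (1, 1): (50, 100)}
# Hypothetical counts in which the lagged value carries no information.
counts_independent = {(0, 0): (20, 100), (1, 0): (20, 100),
                      (0, 1): (35, 100), (1, 1): (35, 100)}
aic_dep = aic_current_vs_lagged(counts_dependent)
aic_ind = aic_current_vs_lagged(counts_independent)
```

A lower AIC for the model with the lagged value, as in the first set of counts, is evidence against assumption (1); in the second set the extra parameters are only penalized and the simpler model is preferred.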
Ordinary Kaplan–Meier curves are often used as purely descriptive tools, without reference to causality. Thus, one may argue that extended Kaplan–Meier curves can be used for a similar purpose, even when assumption (1) is violated. We do not take a strong stance on this question, but we wish to emphasize that, even though extended Kaplan–Meier curves have features similar to those of ordinary Kaplan–Meier curves (e.g., nonincreasing in time, always between 0 and 1), they cannot generally be interpreted as proper survival functions. Furthermore, even though the shape of the extended Kaplan–Meier curve depends on the statistical association between the covariate and the outcome, it does so in a complex fashion that mixes the short-term covariate effect, the long-term covariate effect, the confounding of the covariate and the outcome, and the noncausal association between previous covariate levels and current outcome status that arises by conditioning on being currently event-free (e.g., the collider $Y_0$ on the path $X_0 \rightarrow Y_0 \leftarrow U_Y \rightarrow Y_1$ in Figure 1). Thus, the noncausal interpretation of extended Kaplan–Meier curves is far more intricate than the noncausal interpretation of ordinary Kaplan–Meier curves.
REFERENCES
1. Kaplan E, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53:457–481.
2. Snapinn S, Jiang Q, Iglewicz B. Illustrating the impact of a time-varying covariate with an extended Kaplan–Meier estimator. Am Stat. 2005;59:301–307.
3. Klein J, Moeschberger M. Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer Science & Business Media; 2005.
4. Lichtenstein P, Halldner L, Zetterqvist J, et al. Medication for attention deficit–hyperactivity disorder and criminality. N Engl J Med. 2012;367:2006–2014.
5. Vigen R, O’Donnell CI, Barón AE, et al. Association of testosterone therapy with mortality, myocardial infarction, and stroke in men with low testosterone levels. JAMA. 2013;310:1829–1836.
6. Heinze G, Kainz A, Hörl WH, Oberbauer R. Mortality in renal transplant recipients given erythropoietins to increase haemoglobin concentration: cohort study. BMJ. 2009;339:b4018.
7. Conen D, Tedrow UB, Koplan BA, Glynn RJ, Buring JE, Albert CM. Influence of systolic and diastolic blood pressure on the risk of incident atrial fibrillation in women. Circulation. 2009;119:2146–2152.
8. Okin PM, Kjeldsen SE, Julius S, et al. All-cause and cardiovascular mortality in relation to changing heart rate during treatment of hypertensive patients with electrocardiographic left ventricular hypertrophy. Eur Heart J. 2010;31:2271–2279.
9. Van Gelder IC, Healey JS, Crijns HJGM, et al. Duration of device-detected subclinical atrial fibrillation and occurrence of stroke in ASSERT. Eur Heart J. 2017;38:1339–1344.
10. Larochelle MR, Liebschutz JM, Zhang F, Ross-Degnan D, Wharam JF. Opioid prescribing after nonfatal overdose and association with repeated overdose: a cohort study. Ann Intern Med. 2016;164:1–9.
11. Okin PM, Devereux RB, Harris KE, et al.; LIFE Study Investigators. Regression of electrocardiographic left ventricular hypertrophy is associated with less hospitalization for heart failure in hypertensive patients. Ann Intern Med. 2007;147:311–319.
12. Heneka MT, Fink A, Doblhammer G. Effect of pioglitazone medication on the incidence of dementia. Ann Neurol. 2015;78:284–294.
13. Lee CK, Marschner IC, Simes RJ, et al. Increase in cholesterol predicts survival advantage in renal cell carcinoma patients treated with temsirolimus. Clin Cancer Res. 2012;18:3188–3196.
14. Rubin D. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educat Psychol. 1974;66:688–701.
15. Pearl J. Causality: Models, Reasoning and Inference. 2nd ed. New York: Cambridge University Press; 2009.
16. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48.
17. Abrahamowicz M, Beauchamp ME, Sylvestre MP. Comparison of alternative models for linking drug exposure with adverse effects. Stat Med. 2012;31:1014–1030.
18. Greenland S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology. 2003;14:300–306.
19. Cole SR, Platt RW, Schisterman EF, et al. Illustrating bias due to conditioning on a collider. Int J Epidemiol. 2010;39:417–420.
20. Faraone SV, Buitelaar J. Comparing the efficacy of stimulants for ADHD in children and adolescents using meta-analysis. Eur Child Adolesc Psychiatry. 2010;19:353–364.
21. Hernán MA, Robins JM. Causal Inference: What If. Boca Raton, FL: Chapman & Hall/CRC; 2020.
22. Sylvestre MP, Abrahamowicz M. Flexible modeling of the cumulative effects of time-dependent exposures on the hazard. Stat Med. 2009;28:3437–3453.
23. Gasparrini A. Modeling exposure-lag-response associations with distributed lag non-linear models. Stat Med. 2014;33:881–899.