# Causal Mediation Analysis With Survival Data

Epidemiology: July 2011 - Volume 22 - Issue 4 - p 582-585

doi: 10.1097/EDE.0b013e31821db37e

Methods: Commentary

From the Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA.

Supported by National Institutes of Health grant HD060696.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com).

Correspondence: Tyler J. VanderWeele, Harvard School of Public Health, Departments of Epidemiology and Biostatistics, 677 Huntington Ave, Boston, MA 02115. E-mail: tvanderw@hsph.harvard.edu.

In the last few years, there have been a number of papers developing methods for mediation analysis from a counterfactual perspective, building on some of the original insights of Robins and Greenland^{1} and Pearl.^{2} Until the paper by Lange and Hansen,^{3} in this issue of Epidemiology, there has not, however, been any work addressing the survival-analysis setting from the perspective of causal inference. Using an additive hazard model, Lange and Hansen^{3} have provided a useful flexible method to analyze direct and indirect effects for time-to-event data.

Here, I would like to discuss different effect measures of interest when direct and indirect effects in survival analysis are in view, show how an approach similar to that of Lange and Hansen^{3} is possible for a proportional hazards model with a rare outcome or accelerated failure time models generally, and relate these ideas to previous work on mediation analysis with survival data published in the social science literature.^{4}

## CONCEPTS AND DEFINITIONS

Let *A* denote an exposure of interest, *T* a time-to-event outcome, *M* a mediator, and *C* a set of covariates. Let *T*_{a} denote the counterfactual event time if *A* had been set to *a*; likewise let *T*_{am} denote the counterfactual event time if *A* had been set to *a* and *M* had been set to *m*. Let *M*_{a} be the counterfactual value of the mediator if *A* had been set to *a*. We restrict our attention here to the setting of a single event, rather than considering multiple events as in Lange and Hansen.^{3} With these definitions we can also consider nested counterfactual event times. For example,

$$T_{aM_{a^*}}$$

is an individual's event time if the exposure had been set to *a* and the mediator had been set to the level it would have been had exposure been *a**. We assume composition,^{5} that

$$T_a = T_{aM_a}.$$

For an arbitrary time-to-event variable *V*, we will let *S*_{V}(*t*) denote the survival function at time *t*, that is *S*_{V}(*t*) = *P*(*V* > *t*); the survival function conditional on covariates *C* = *c* can likewise be defined as *S*_{V}(*t*|*c*) = *P*(*V* > *t*|*c*). We will use λ_{V}(*t*) and λ_{V}(*t*|*c*) for the hazard or conditional hazard at time *t*, that is the instantaneous rate of the event conditional on *V* ≥ *t*.
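As a small numerical illustration of these definitions, the sketch below checks that the hazard equals the negative derivative of the log survival function. The Weibull form, the function names, and the parameter values are assumptions chosen only for illustration; none of them come from the commentary.

```python
import math

def weibull_survival(t, shape, scale):
    """S_V(t) = P(V > t) for a Weibull time-to-event variable V (assumed form)."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """Closed-form hazard: instantaneous event rate at t given V >= t."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def numerical_hazard(survival, t, eps=1e-5):
    """Approximate lambda_V(t) = -d/dt log S_V(t) with a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)

# Illustrative (hypothetical) parameter values.
shape, scale, t = 1.5, 10.0, 4.0
closed_form = weibull_hazard(t, shape, scale)
approx = numerical_hazard(lambda s: weibull_survival(s, shape, scale), t)
```

The two quantities should agree up to the error of the finite-difference approximation, which is the sense in which the survival function and the hazard carry the same information.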

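An interesting feature of survival data within the context of mediation analysis is that there are multiple ways or scales by which we might decompose a total effect comparing exposure levels *a* and *a** into direct and indirect effects. For example, if we were to consider the survival functions, we could decompose a comparison of the survival functions $S_{T_a}(t)$ and $S_{T_{a^*}}(t)$ as follows:

$$S_{T_a}(t) - S_{T_{a^*}}(t) = \left[S_{T_{aM_a}}(t) - S_{T_{aM_{a^*}}}(t)\right] + \left[S_{T_{aM_{a^*}}}(t) - S_{T_{a^*M_{a^*}}}(t)\right]$$

where the first expression in brackets is the natural indirect effect on the survival function scale and the second is the natural direct effect on the survival function scale. We could alternatively but similarly decompose the overall difference in hazards as the sum of natural indirect and direct effects on the hazard scale:

$$\lambda_{T_a}(t) - \lambda_{T_{a^*}}(t) = \left[\lambda_{T_{aM_a}}(t) - \lambda_{T_{aM_{a^*}}}(t)\right] + \left[\lambda_{T_{aM_{a^*}}}(t) - \lambda_{T_{a^*M_{a^*}}}(t)\right]$$

Both of these measures, along with a cumulative hazard effect decomposition, were considered by Lange and Hansen.^{3} We could, however, also consider other effect decompositions. We could, for example, consider a decomposition in terms of mean survival times:

$$E(T_a) - E(T_{a^*}) = \left[E(T_{aM_a}) - E(T_{aM_{a^*}})\right] + \left[E(T_{aM_{a^*}}) - E(T_{a^*M_{a^*}})\right]$$

Or if we let *Q*_{a} and *Q*_{am} denote the median counterfactual survival time if *A* had been set to *a* or if *A* had been set to *a* and *M* had been set to *m*, respectively, then we have the decomposition:

$$Q_a - Q_{a^*} = \left[Q_{aM_a} - Q_{aM_{a^*}}\right] + \left[Q_{aM_{a^*}} - Q_{a^*M_{a^*}}\right]$$

One could also consider using the difference in log-survival function, or log hazards, or log-expected survival times, etc. For example, with log-hazard one has the decomposition:

$$\log\{\lambda_{T_a}(t)\} - \log\{\lambda_{T_{a^*}}(t)\} = \left[\log\{\lambda_{T_{aM_a}}(t)\} - \log\{\lambda_{T_{aM_{a^*}}}(t)\}\right] + \left[\log\{\lambda_{T_{aM_{a^*}}}(t)\} - \log\{\lambda_{T_{a^*M_{a^*}}}(t)\}\right]$$

which, on exponentiating, can also be written as

$$\frac{\lambda_{T_a}(t)}{\lambda_{T_{a^*}}(t)} = \frac{\lambda_{T_{aM_a}}(t)}{\lambda_{T_{aM_{a^*}}}(t)} \times \frac{\lambda_{T_{aM_{a^*}}}(t)}{\lambda_{T_{a^*M_{a^*}}}(t)}$$

so that the hazard ratio is the product of the natural indirect and direct effect hazard ratios. All of the above measures could also be considered conditional on strata of covariates *C* = *c*.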

With each of these potential decompositions on the difference scale, one could calculate a “proportion mediated” by taking a ratio of the natural indirect effect to the sum of the natural direct and indirect effects (ie, the total effect). These measures of the proportion mediated may vary across scales. Also, depending on the specific survival model, the natural direct and indirect effects may be analytically tractable on certain scales but not on others.
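To illustrate how the proportion mediated can depend on the scale, the following sketch computes it on the survival function scale and on the log-survival scale from the same three counterfactual quantities. The survival probabilities are hypothetical numbers chosen for illustration, and the helper names are not from the commentary.

```python
import math

def decompose(value_aMa, value_aMastar, value_astarMastar):
    """Split a total contrast into (indirect, direct) components.

    Arguments are the chosen measure evaluated under T_{a M_a},
    T_{a M_{a*}} and T_{a* M_{a*}}, in that order.
    """
    indirect = value_aMa - value_aMastar
    direct = value_aMastar - value_astarMastar
    return indirect, direct

def proportion_mediated(indirect, direct):
    """Natural indirect effect divided by the total effect (difference scale)."""
    return indirect / (indirect + direct)

# Hypothetical counterfactual survival probabilities at a fixed time t.
s_aMa, s_aMastar, s_astarMastar = 0.60, 0.70, 0.90

pm_survival = proportion_mediated(*decompose(s_aMa, s_aMastar, s_astarMastar))
pm_log_survival = proportion_mediated(
    *decompose(math.log(s_aMa), math.log(s_aMastar), math.log(s_astarMastar))
)
```

With these inputs the proportion mediated is one third on the survival scale but larger on the log-survival scale, so a reported proportion mediated should always be accompanied by the scale on which it was computed.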

Irrespective of the decomposition chosen, however, certain fairly strong no-unmeasured-confounding assumptions need to be made. Following an identification approach initiated by Pearl^{2} and used by subsequent authors on mediation,^{5–8} Lange and Hansen^{3} make 4 assumptions about no confounding conditional on the covariates. Essentially, these state that, conditional on covariates, there is (i) no confounding for the exposure-outcome relationship, (ii) no confounding for the mediator-outcome relationship, (iii) no confounding for the exposure-mediator relationship, and (iv) no mediator-outcome confounder that is an effect of the exposure. These are assumptions (A.1)–(A.4) in Lange and Hansen, and we likewise assume that they hold here. Sensitivity analysis techniques for direct and indirect effects can be useful when these assumptions do not hold.^{9,10}

## MEDIATION WITH AN ADDITIVE HAZARD MODEL

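Lange and Hansen^{3} present an approach to mediation analysis with survival data using an additive hazard model. In the most basic form they consider, the model can be written as:

$$\lambda_T(t \mid a, m, c) = \lambda_0(t) + \lambda_1 a + \lambda_2' c + \lambda_3 m \tag{1}$$

They propose a linear regression model for the mediator, when it is continuous, with normally distributed error:

$$M = \beta_0 + \beta_1 a + \beta_2' c + e, \qquad e \sim N(0, \sigma^2) \tag{2}$$

They proceed to show that on the hazard scale, natural direct and indirect effects are given by:

$$\lambda_{T_{aM_a}}(t \mid c) - \lambda_{T_{aM_{a^*}}}(t \mid c) = \lambda_3 \beta_1 (a - a^*)$$

$$\lambda_{T_{aM_{a^*}}}(t \mid c) - \lambda_{T_{a^*M_{a^*}}}(t \mid c) = \lambda_1 (a - a^*)$$

where the first expression is the indirect effect and the second the direct effect on the hazard scale.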

The use of the coefficient λ_{1} for the exposure in the model for the outcome as the direct effect, and the product of the coefficient for the exposure in the model for the mediator times the coefficient for the mediator in the model for the outcome (λ_{3}β_{1}) as a measure of the indirect effect, has a long history in the social sciences.^{11,12} The causal inference literature has clarified the assumptions needed to interpret these measures as causal direct and indirect effects,^{2,5,8} eg, assumptions (i)–(iv) above. The causal inference literature has also given formal counterfactual definitions of these effects, and has extended the notions of direct and indirect effects to much more general settings. Lange and Hansen^{3} have shown how these notions extend further to survival data and have provided a model, the additive hazard model, under which the traditional social science direct and indirect coefficient measures hold.
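The coefficient-based measures described above can be sketched in a few lines. The function below simply evaluates λ_{1}(*a* − *a**) and λ_{3}β_{1}(*a* − *a**); the function name and the coefficient values are hypothetical, and in practice the coefficients would come from fitted additive hazard and mediator models.

```python
def additive_hazard_effects(lambda1, lambda3, beta1, a, a_star):
    """Natural direct and indirect effects on the hazard-difference scale.

    lambda1: exposure coefficient in the additive hazard model for the outcome
    lambda3: mediator coefficient in the additive hazard model for the outcome
    beta1:   exposure coefficient in the linear model for the mediator
    """
    direct = lambda1 * (a - a_star)
    indirect = lambda3 * beta1 * (a - a_star)
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}

# Hypothetical coefficient values, chosen only for illustration.
effects = additive_hazard_effects(lambda1=0.02, lambda3=0.01, beta1=1.5, a=1, a_star=0)
```

Here the effects are differences in event rates per unit time, so the direct and indirect components add up to the total hazard difference.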

However, the paper of Lange and Hansen^{3} goes much further than this. Their approach allows the hazard functions to vary over time, allows for the possibility of multiple types of events, and could be extended to incorporate exposure-mediator interactions as well. The generality of the approach proposed is impressive, and the methodology and software provided will certainly be of use for causal mediation analysis within a survival context. Additive hazard models are not employed with great frequency in the epidemiologic literature, but the paper by Lange and Hansen demonstrates their potential utility and perhaps should give epidemiologists reason to rethink their choice of survival analysis models.

## MEDIATION WITH ACCELERATED FAILURE TIME AND PROPORTIONAL HAZARDS MODELS

The survival analysis models most frequently employed in the epidemiologic and social science literatures are probably, first, the proportional hazards model, and, second, accelerated failure time models. The possibility of conducting mediation analysis with survival data under both models was in fact considered in a paper by Tein and MacKinnon^{4} in the social science literature some years ago. There have traditionally been 2 methods for undertaking mediation analysis. The “difference method,”^{13} which is more common in epidemiology, considers an outcome model both with and without the mediator, and takes the difference in the coefficients for the exposure as the measure of the indirect or mediated effect. The “product method,”^{11} more common in the social sciences, takes as a measure of the indirect effect the product of the coefficient for the exposure in the model for the mediator (ie, β_{1} in model (2)) and the coefficient for the mediator in the model for the outcome. If the outcome and mediator are continuous and there are no interactions in the model for the outcome, then the 2 methods coincide.^{8,14} However, with binary outcomes and logistic regression, the 2 methods may diverge^{8,15}; they will approximately coincide when the binary outcome is rare.^{8}
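The agreement of the two methods for a continuous outcome and continuous mediator without interaction can be checked numerically. The sketch below is a minimal illustration on simulated data with no covariates, using ordinary least squares written out by hand; the data-generating values and helper names are assumptions made for the example, and the outcome here is a plain continuous variable rather than a survival time.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def slope(x, y):
    """OLS slope of y on x (model includes an intercept)."""
    return cov(x, y) / cov(x, x)

def slopes2(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with intercept), via the normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated data with hypothetical coefficients.
rng = random.Random(0)
n = 2000
A = [rng.gauss(0, 1) for _ in range(n)]
M = [0.5 + 0.8 * a + rng.gauss(0, 1) for a in A]
Y = [1.0 + 0.3 * a + 0.6 * m + rng.gauss(0, 1) for a, m in zip(A, M)]

beta1 = slope(A, M)                # exposure coefficient, mediator model
theta1, theta2 = slopes2(A, M, Y)  # exposure and mediator coefficients, outcome model
total = slope(A, Y)                # exposure coefficient, outcome model without M

product_method = beta1 * theta2
difference_method = total - theta1
```

In this linear setting the two estimates agree up to floating-point error in any sample, because the identity is algebraic rather than asymptotic; it is this equivalence that can fail for nonlinear outcome models.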

Tein and MacKinnon^{4} considered whether the 2 approaches coincide with proportional hazards and accelerated failure time models. They effectively use model (2) for the mediator and use

$$\lambda_T(t \mid a, m, c) = \lambda_T(t \mid 0, 0, 0)\, e^{\gamma_1 a + \gamma_2 m + \gamma_4' c} \tag{3}$$

for the proportional hazards model and

$$\log(T) = \theta_0 + \theta_1 a + \theta_2 m + \theta_4' c + \nu \varepsilon \tag{4}$$

for the accelerated failure time model, where ε is a random variable following an extreme value distribution and *ν* is a scale parameter so that *T* follows a Weibull distribution. Using simulations, Tein and MacKinnon find that the difference method and product method give different results for the proportional hazards model but the same results for the accelerated failure time model. Their results raise the question of whether either of these methods for either of the models has a clear causal interpretation. Lange and Hansen^{3} have given a rigorous causal interpretation for the parameters of an additive hazard model. Do similar results hold for the proportional hazards or accelerated failure time models?

Let us first consider the accelerated failure time model. We note first that it is no coincidence that the product and difference methods coincide for the accelerated failure time model in (4). In the eAppendix (http://links.lww.com/EDE/A487), we give an analytic proof that this is so, provided that the models are correctly specified and that there are no interactions in model (4); the result holds for arbitrary distributions of ε in model (4), ie, not just Weibull models. We moreover show in the eAppendix that the measures of direct and indirect effects obtained by these methods are the natural direct and indirect effects on the log conditional mean survival time scale. That is, the natural direct effect

$$\log\{E(T_{aM_{a^*}} \mid c)\} - \log\{E(T_{a^*M_{a^*}} \mid c)\}$$

is equal to θ_{1}(*a* − *a**) and the natural indirect effect

$$\log\{E(T_{aM_a} \mid c)\} - \log\{E(T_{aM_{a^*}} \mid c)\}$$

is equal to β_{1}θ_{2}(*a* − *a**). In other words, we once again obtain the result that the exposure coefficient in model (4) for the outcome is a measure of the direct effect, and the product of the exposure coefficient in model (2) for the mediator times the mediator coefficient in model (4) for the outcome is a measure of the indirect effect. In fact, for the accelerated failure time model, these analytic expressions can be extended so as to allow for exposure-mediator interaction in model (4). Suppose we extended model (4) to allow for such interaction:

$$\log(T) = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m + \theta_4' c + \nu \varepsilon \tag{5}$$

If model (5) holds for the outcome and model (2) holds for the mediator, then natural direct and indirect effects on the log mean survival time scale conditional on *C* = *c* are given by:

$$\log\{E(T_{aM_a} \mid c)\} - \log\{E(T_{aM_{a^*}} \mid c)\} = (\theta_2 \beta_1 + \theta_3 \beta_1 a)(a - a^*)$$

$$\log\{E(T_{aM_{a^*}} \mid c)\} - \log\{E(T_{a^*M_{a^*}} \mid c)\} = \{\theta_1 + \theta_3(\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 \sigma^2)\}(a - a^*) + 0.5\,\theta_3^2 \sigma^2 (a^2 - a^{*2})$$

where the first expression is the natural indirect effect and the second expression is the natural direct effect, where β_{0} + β_{1}*a** + β_{2}′*c* is the mean of the mediator under model (2) at exposure level *a** and covariates *c*, and where σ^{2} is the variance of the error term in regression model (2) for the mediator. These results hold for arbitrary distributions for ε in model (5) but do require a normally distributed mediator in model (2). Note that when there is no interaction (θ_{3} = 0), the expressions reduce to those given above and considered by Tein and MacKinnon.^{4} The expressions given here for the accelerated failure time model are analogous to those given by VanderWeele and Vansteelandt^{8} for odds ratios for mediation analysis for a dichotomous outcome. Expressions for standard errors for these direct and indirect effects could likewise be adapted from VanderWeele and Vansteelandt.^{8}
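A minimal sketch of how such closed-form effects might be evaluated from fitted coefficients is given below. It assumes the expressions take the same form as the analogous odds ratio formulas of VanderWeele and Vansteelandt,^{8} with the accelerated failure time coefficients in place of the logistic ones; `beta2_c` is a hypothetical name for the scalar β_{2}′*c* in the covariate stratum of interest, and all numerical values are invented for illustration.

```python
def aft_natural_effects(theta1, theta2, theta3, beta0, beta1, beta2_c,
                        sigma2, a, a_star):
    """Natural direct and indirect effects on the log mean survival time scale.

    theta1, theta2, theta3: exposure, mediator and interaction coefficients
                            in the accelerated failure time model
    beta0, beta1:           intercept and exposure coefficient, mediator model
    beta2_c:                value of beta_2'c for the covariate stratum (assumed scalar)
    sigma2:                 error variance of the mediator model
    """
    indirect = (theta2 * beta1 + theta3 * beta1 * a) * (a - a_star)
    mediator_mean = beta0 + beta1 * a_star + beta2_c
    direct = ((theta1 + theta3 * (mediator_mean + theta2 * sigma2)) * (a - a_star)
              + 0.5 * theta3 ** 2 * sigma2 * (a ** 2 - a_star ** 2))
    return direct, indirect

# Hypothetical coefficients; with theta3 = 0 the expressions collapse to
# theta1 * (a - a_star) and beta1 * theta2 * (a - a_star).
direct0, indirect0 = aft_natural_effects(
    theta1=0.4, theta2=0.3, theta3=0.0,
    beta0=1.0, beta1=0.5, beta2_c=0.2, sigma2=1.0, a=1, a_star=0,
)
direct1, indirect1 = aft_natural_effects(
    theta1=0.4, theta2=0.3, theta3=0.1,
    beta0=1.0, beta1=0.5, beta2_c=0.2, sigma2=1.0, a=1, a_star=0,
)
```

Exponentiating either returned value would give the corresponding ratio of conditional mean survival times. Note that with interaction the direct effect depends on the covariate stratum through the mediator mean, whereas the indirect effect does not.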

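Let us now turn to the proportional hazards model in (3). With the proportional hazards model, somewhat analogous results can be obtained, but only when the outcome is rare. Specifically, consider an extension to model (3) which allows for exposure-mediator interaction:

$$\lambda_T(t \mid a, m, c) = \lambda_T(t \mid 0, 0, 0)\, e^{\gamma_1 a + \gamma_2 m + \gamma_3 a m + \gamma_4' c} \tag{6}$$

If model (6) holds for the outcome and model (2) holds for the mediator, then we show in the eAppendix (http://links.lww.com/EDE/A487), using arguments similar to those in Lin et al,^{16} that, provided the outcome is rare, natural direct and indirect effects on the log hazard difference scale are given by:

$$\log\{\lambda_{T_{aM_a}}(t \mid c)\} - \log\{\lambda_{T_{aM_{a^*}}}(t \mid c)\} \approx (\gamma_2 \beta_1 + \gamma_3 \beta_1 a)(a - a^*)$$

$$\log\{\lambda_{T_{aM_{a^*}}}(t \mid c)\} - \log\{\lambda_{T_{a^*M_{a^*}}}(t \mid c)\} \approx \{\gamma_1 + \gamma_3(\beta_0 + \beta_1 a^* + \beta_2' c + \gamma_2 \sigma^2)\}(a - a^*) + 0.5\,\gamma_3^2 \sigma^2 (a^2 - a^{*2})$$

where the first expression is the natural indirect effect and the second the natural direct effect, β_{0}, β_{1}, and β_{2} are the coefficients of the mediator model (2), and σ^{2} is again the variance of the error term in regression model (2) for the mediator. The expressions are likewise analogous to those obtained by VanderWeele and Vansteelandt^{8} for a dichotomous outcome, but these expressions only apply for a rare outcome. Natural indirect and direct effect hazard ratios can be obtained by exponentiating the right-hand sides of these expressions. We moreover show in the eAppendix that when there is no exposure-mediator interaction as in model (3), and when the outcome is rare, then the product and difference methods will coincide approximately.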
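One way to see where rare-outcome expressions of this kind come from is to average the proportional hazards model over a normally distributed mediator, which is the step the rare-outcome approximation permits. The sketch below does this using the moment generating function of the normal distribution and then exponentiates the contrasts to obtain hazard ratios. The coefficient names (`g1`, `g2`, `g3` for the exposure, mediator, and interaction log hazard ratios; `b0`, `b1`, `b2_c` for the mediator model) and all values are hypothetical, and the approach is an illustration under the rare-outcome assumption rather than the eAppendix derivation itself.

```python
import math

def log_hazard_rare(a, a_med, g1, g2, g3, b0, b1, b2_c, sigma2):
    """Approximate log hazard of T_{a M_{a_med}}, omitting the baseline hazard
    and covariate terms, which cancel in every contrast below.

    Uses E[exp(k * M)] = exp(k * mu + 0.5 * k**2 * sigma2) for normal M.
    """
    k = g2 + g3 * a                 # coefficient multiplying the mediator
    mu = b0 + b1 * a_med + b2_c     # mediator mean when exposure is a_med
    return g1 * a + k * mu + 0.5 * k ** 2 * sigma2

def natural_effect_hazard_ratios(a, a_star, **p):
    """Return (direct, indirect) natural effect hazard ratios."""
    hr_indirect = math.exp(log_hazard_rare(a, a, **p) - log_hazard_rare(a, a_star, **p))
    hr_direct = math.exp(log_hazard_rare(a, a_star, **p)
                         - log_hazard_rare(a_star, a_star, **p))
    return hr_direct, hr_indirect

# Hypothetical coefficient values, for illustration only.
params = dict(g1=0.4, g2=0.3, g3=0.1, b0=1.0, b1=0.5, b2_c=0.2, sigma2=1.0)
hr_direct, hr_indirect = natural_effect_hazard_ratios(1, 0, **params)
hr_total = math.exp(log_hazard_rare(1, 1, **params) - log_hazard_rare(0, 0, **params))
```

By construction the total hazard ratio equals the product of the direct and indirect hazard ratios. When the outcome is common this averaging step is no longer a good approximation, so the resulting numbers should not be read as effect measures in that setting.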

In the general setting (with nonrare outcome), unfortunately, neither the product method nor the difference method for the proportional hazards model has a clear causal interpretation as a measure of effect. Tein and MacKinnon^{4} show that the product and difference methods can diverge, and that they may even be of opposite signs. Lange and Hansen^{3} noted that in the general setting (ie, common outcome) with the proportional hazards model, natural direct and indirect effects do not have any simple analytic expression. We do nevertheless show in the eAppendix that even if the outcome is common, the product method using models (2) and (3) at least provides a valid test for whether there is any mediated effect, provided the models are correctly specified and that assumptions (i)–(iv) hold. With the proportional hazards model and a common outcome, the product method can thus be useful at least in testing the hypothesis of any mediated effect. But neither the product nor the difference method should in general be used as a measure of an indirect effect. Indeed, Hafeman^{17} has recently demonstrated the danger of using such measures in nonlinear models; they can result in weighted averages of causal effects in which the weights do not in fact sum to one.

## CONCLUSION

The discussion here has provided expressions for natural direct and indirect effects for the accelerated failure time model and for the proportional hazards model when the outcome is rare. A major contribution of the counterfactual approach to causal mediation analysis has been to clarify the no-confounding assumptions required for the identification of direct and indirect effects. Within the context of survival data, the counterfactual approach also clarifies when different methods for direct and indirect effects can be interpreted as measures of effect rather than simply as a test for a mediated effect. The causal inference approach clarifies further on what scale these measures apply when they can be so interpreted. The observations of Tein and MacKinnon^{4} have been given a more rigorous formulation and the approach has been extended to allow for exposure-mediator interactions.

Because the proportional hazards model is commonly used in epidemiologic research, the development of methodology and causal direct and indirect effect measures that can be used in conjunction with the model when the outcome is common may be an important direction for future research. However, the additive hazard model employed by Lange and Hansen^{3} constitutes an important and very general alternative for mediation analysis with survival data.

## REFERENCES

1. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. *Epidemiology*. 1992;3:143–155.
2. Pearl J. Direct and indirect effects. In: *Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence*. San Francisco: Morgan Kaufmann; 2001:411–420.
3. Lange T, Hansen JV. Direct and indirect effects in a survival context. *Epidemiology*. 2011;22:575–581.
4. Tein J-Y, MacKinnon DP. Estimating mediated effects with survival data. In: Yanai H, Rikkyo AO, Shigemasu K, Kano Y, Meulman JJ, eds. *New Developments on Psychometrics*. Tokyo, Japan: Springer-Verlag Tokyo Inc; 2003:405–412.
5. VanderWeele TJ, Vansteelandt S. Conceptual issues concerning mediation, interventions and composition. *Statist Interface*. 2009;2:457–468.
6. Petersen ML, Sinisi SE, van der Laan MJ. Estimation of direct causal effects. *Epidemiology*. 2006;17:276–284.
7. VanderWeele TJ. Marginal structural models for the estimation of direct and indirect effects. *Epidemiology*. 2009;20:18–26.
8. VanderWeele TJ, Vansteelandt S. Odds ratios for mediation analysis with a dichotomous outcome. *Am J Epidemiol*. 2010;172:1339–1348.
9. VanderWeele TJ. Bias formulas for sensitivity analysis for direct and indirect effects. *Epidemiology*. 2010;21:540–551.
10. Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. *Psychological Methods*. 2010;15:309–334.
11. Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. *J Pers Soc Psychol*. 1986;51:1173–1182.
12. MacKinnon DP. *An Introduction to Statistical Mediation Analysis*. New York: Lawrence Erlbaum Associates; 2008.
13. Judd CM, Kenny DA. Process analysis: estimating mediation in treatment evaluations. *Eval Rev*. 1981;5:602–619.
14. MacKinnon DP, Warsi G, Dwyer JH. A simulation study of mediated effect measures. *Multivariate Behav Res*. 1995;30:41–62.
15. MacKinnon DP, Dwyer JH. Estimating mediated effects in prevention studies. *Eval Rev*. 1993;17:144–158.
16. Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounding in observational studies. *Biometrics*. 1998;54:948–963.
17. Hafeman DM. Proportion explained: a causal interpretation for standard measures of indirect effect? *Am J Epidemiol*. 2009;170:1443–1448.