Numerous articles have in recent years laid the foundation for causal mediation analysis in the context of linear and nonlinear models for continuous, binary, and survival outcomes, and likewise in situations where an interaction may be present between the exposure and the mediator.^{1–7} These advances have clarified conditions under which traditional mediation techniques for estimating the indirect effect, such as the “product” and “difference” methods, are equivalent.^{3}^{,}^{5} In a recent commentary, VanderWeele^{8} established that the well-known equivalence of the difference and product methods in linear models with no exposure–mediator interaction holds exactly for the mean of the log of a survival outcome under a certain accelerated failure time model. This result confirmed previous findings by Tein and MacKinnon,^{9} who showed in simulation studies that under a Weibull accelerated failure time model the product and difference estimators were consistent for the indirect effect; however, both Tein and MacKinnon^{9} and VanderWeele^{8} only considered a setting in which censoring was absent.

In practice, outcomes in survival data are typically subject to some form of censoring, primarily due to loss to follow-up and administrative censoring. The accelerated failure time model is a prominent approach for handling censored survival data in the health sciences and has become widely used in practice by epidemiologists partly because it is readily available from popular standard commercial software packages such as PROC LIFEREG in SAS,^{10} streg in Stata,^{11} and survreg in R.^{12} Another approach to handling censored survival data is the Cox proportional hazards model, but the regression coefficients from these models can be hard to interpret due to built-in selection bias as they condition on information about the outcome.^{13}^{,}^{14} The central role of survival models in medical research coupled with the increasing popularity of causal mediation analyses has led to method development and software that will estimate direct and indirect effects. Recently, Valeri and VanderWeele^{15} created a SAS macro to estimate direct and indirect effects in censored survival data using accelerated failure time and Cox proportional hazards models. They cautioned that the estimated direct and indirect effects for a Cox proportional hazards model are based on a rare disease assumption, while the accelerated failure time model does not require such an assumption.^{15} In a more recent article, Gelfand et al.^{16} encourage the use of accelerated failure time models over Cox proportional hazards models in mediation analysis. A commonly noted drawback of the accelerated failure time model is that the event times are assumed to follow a specific distribution, but Gelfand et al.^{16} argue that the breadth of distributions available can capture the variability in survival data, and the Weibull distribution can represent distributions commonly found in clinical research.

For mediation analysis with a censored survival outcome and assuming no exposure–mediator interaction in the outcome model, researchers can easily estimate direct and indirect effects in standard software through a combination of accelerated failure time and linear models (see R Code in eAppendix; https://links.lww.com/EDE/B212). In doing so, researchers may estimate the indirect effect either by the product or difference method, as the results by Tein and MacKinnon^{9} and VanderWeele^{8} may inadvertently lead one to incorrectly assume that both estimators are valid. However, Gelfand et al.^{16} demonstrate in simulation studies that in the presence of right censoring the product and difference estimators are not necessarily equivalent under a Weibull accelerated failure time model. Interestingly, they note that the product method appeared to remain unaffected by right censoring, while the difference method underestimated the indirect effect in their simulation studies.^{16} The results of our article will provide the theoretical underpinning for the conclusions drawn from simulation studies by Tein and MacKinnon^{9} and Gelfand et al.^{16} We supply formal justification for their conjecture that the difference method will often fail to provide a consistent estimator of the indirect effect under an accelerated failure time model even when the product method does.

Specifically, in this article, we formally establish that the equivalence between the product and difference method generally fails in the presence of censoring, primarily due to lack of consistency of the difference method arising from a form of model misspecification. We will formally show that this form of model misspecification gives rise to bias in the difference method estimator when censoring is present. However, in the absence of censoring, this form of model misspecification is relatively benign and does not generally induce bias in the estimated indirect effect using the difference method. This misspecification does not arise in the special case of normal mediator–normal outcome model, and, thus, both difference and product estimators are consistent in the presence of independent censoring. In a simulation study, we confirm these results for normal and Weibull distributed time-to-event outcomes, respectively. We also consider the implication of our findings in estimating the indirect effect of human immunodeficiency virus (HIV) status mediated by height for age at sexual maturity in the Pediatric HIV/AIDS Cohort Study (PHACS), Adolescent Master Protocol (AMP) and Pediatric AIDS Clinical Trials Group 219C (PACTG) studies and the indirect effect of combination treatment mediated by viral suppression on time to death or opportunistic infection among HIV-infected adults using multiple studies of the AIDS Clinical Trial Group.^{17–20} The PHACS AMP and PACTG 219/219C protocols were approved by institutional review boards (IRBs) at each participating site and the Harvard T.H. Chan School of Public Health. Written informed consent was obtained from all participants or their legal guardians; assent was obtained from minors as allowed by site IRBs. Although the article focuses on the implications of censoring, all of the main results hold for left truncation as we formally show in the eAppendix (https://links.lww.com/EDE/B212).

## NOTATION AND ASSUMPTIONS

Throughout, we focus on a binary exposure *A*, continuous mediator *M*, and time-to-event outcome *T*. To simplify the presentation, we do not explicitly include pre-exposure covariates, and therefore, for all practical purposes, our analysis may be viewed as if we had conditioned on a specific level of such covariates. However, in the eAppendix (https://links.lww.com/EDE/B212), formal statements of our main results and corresponding proofs explicitly account for covariates. Let *M*(*a*) denote the counterfactual mediator had the exposure taken value *a* and *T*(*a*) = *T*(*a*, *M*(*a*)) denote the counterfactual outcome had exposure taken value *a*. In mediation analysis, we will also consider the counterfactual outcome had exposure taken value *a* = 0,1 and the mediator taken the value it would have under

.

We consider the following models for the survival outcome *T* and mediator *M*:

where *ξ* has mean zero and is independent of *A*, *ε* follows a known distribution and is independent of *A* and *M*, and *σ* is some positive scale parameter. Note that model (1) assumes no exposure–mediator interaction, which is necessary for possible equivalence between the product and difference representation of the indirect effect.

As shown in the eAppendix (https://links.lww.com/EDE/B212), under treatment randomization (A1) and cross-world counterfactuals independence (A2) assumptions and following the same reasoning as Pearl,^{2} the average natural (or pure) direct or indirect effects on the log mean scale is nonparametrically identified. (A1) is a standard no unmeasured confounding assumption of the effects of *A* on (*M*,*T*), while (A2) is a somewhat stronger no unmeasured confounding assumption of the effects of *M* on *T*.^{21} The natural direct

and indirect

effects for the log-survival time are defined in terms of these counterfactuals and under models (1) and (2) we have that

Letting *a* = 1 and

, this leads to a natural direct effect of

and a natural indirect effect given by the product rule

. The expression for the difference method, given by

, is obtained upon marginalizing over *M* by positing a second accelerated failure time model for *T* as a function of *A* only, which shall be referred to as the reduced form model and is typically specified as followed:

where

independent of *A* and typically assumed to follow the same distribution as *ε* in (1), and

is a positive scale parameter. This specification is also used by both VanderWeele^{8} and Tein and MacKinnon.^{9} Under this formulation, we see that

is the total effect of *A* on the mean of *T* on the log scale and satisfies

This equivalence follows from direct substitution of Equation (2) into (1) and evaluation of total effect

.

## EQUIVALENCE OF THE PRODUCT AND DIFFERENCE METHOD

As we discuss further below, Equation (4) is usually misspecified because the error distribution specified in models (1) and (2) completely determine the error distribution in model (3) as a convolution of these two laws (see eAppendix; https://links.lww.com/EDE/B212 (A7)). Unless the error distribution of model (3) is carefully chosen to match this convolution, the model will be misspecified. This convolution seldom reduces to a standard model typically implemented in off-the-shelf software when *M* and *T* follow standard distributions. For instance, suppose that *M* is assumed normal and *T* assumed to follow a Weibull distribution, then the reduced form error distribution is a convolution of a normal with a Weibull distribution, which is neither normal nor Weibull and is in fact not of a standard closed form (see eAppendix; https://links.lww.com/EDE/B212 (A10)). In this case, the assumption that the error in the reduced model is Weibull is clearly incorrect. A fairly prominent setting in which the reduced form model is correctly specified is the normal mediator–normal outcome model, in which case the error distribution of model (3) is also normal.

As shown in the eAppendix (https://links.lww.com/EDE/B212), in the absence of censoring, maximum likelihood estimation of the reduced form model will be consistent for the total effect

even if the error distribution is misspecified. This follows from the fact that in the absence of censoring, consistency of the estimated regression parameters in an accelerated failure time model depends on correct specification of the regression model, not on the choice of error distribution. In contrast, result (A8) in the eAppendix (https://links.lww.com/EDE/B212) formally establishes that in the presence of censoring, maximum likelihood estimation of the reduced form model will fail to be consistent when the error distribution is incorrect because the corresponding score function fails to be unbiased, which is a basic requirement of consistency. Under correct specification of models (1) and (2), the product method corresponds to the maximum likelihood estimator and is guaranteed to be consistent whether or not censoring is present and irrespective of choice of models for residual errors.

On these theoretical grounds, we conclude that one should exert caution when using the difference method in the presence of censoring (or as shown in the eAppendix; https://links.lww.com/EDE/B212, in the presence of truncation), as it is prone to model misspecification of model (3) even when models (1) and (2) are correctly specified. When these two models are correctly specified, the product method gives a valid estimator for the indirect effect. In the next section, we illustrate this phenomenon in extensive simulation studies and two separate applications.

## SIMULATION

In simulation studies, we considered two scenarios, one where *T* is normal and the other where *T* is Weibull distributed. In both settings, *A* was generated Bernoulli with probability equal to 0.5. In the first setting, *M* was generated from a normal model with mean

and variance 1, where

. The time-to-event outcome, *T*, was generated from a normal distribution with mean

and variance 1, where

. We investigated the following three censoring scenarios: no censoring, 70% right-censored, and the remaining 30% with observed event times, and 70% right-censored and the remaining 30% interval-censored. Models (1), (2), and (3) were estimated using survreg in the R survival package,^{12} with Gaussian time-to-event distribution for models (1) and (3) and a linear regression for (2). For the right-censoring only setting, a censoring distribution was generated from a normal distribution to yield approximately 70% censored. For the right and interval-censoring setting, the same right-censoring distribution was used, and a censoring interval was generated for observed event times. The length of the censoring interval was generated from a multinomial distribution; to choose where on the interval the true time occurred, we generated a proportion from a uniform(0,1) distribution.

For the second setting with a Weibull distributed time-to-event, the mediator *M* was generated from a normal model with mean

and variance 1, where

. Finally, the time-to-event outcome was generated as

, where

, *σ* is 0.25, and *ε* is the extreme value density. For this model, we investigated the following two censoring scenarios: no censoring and 30% right-censored. Models (1), (2), and (3) were fit using survreg in the R survival package^{12} with a Weibull distribution for time. For the right-censoring scenario, a censoring distribution was generated for a Weibull distribution to yield ~30% censored. We performed 10,000 simulations for each scenario, with sample size ranging from 800 to 4,000.

We evaluated the following characteristics for each distribution and censoring type: absolute proportion difference between the estimators

and proportion bias of each estimator (

and

), where *IE* is the true indirect effect,

is the Monte Carlo mean of the product estimator, and

is the Monte Carlo mean of the difference estimator. Simulation results for each scenario are summarized in Figures 1 and 2.

Figure 1A shows the absolute proportion difference between the product and difference method under the normal model. In the absence of censoring, the proportion difference was identically zero for all sample sizes, thus confirming that in this setting the estimators are numerically identical as theory dictates. In the presence of censoring (whether right or interval censoring), the difference between the estimators decreased as sample size increased. Though not displayed in the figure, this trend continues and the proportion difference converges to zero with increasing sample size. Figure 1B shows that both the product and difference methods produced consistent estimators for the indirect effect.

Figure 2A shows the absolute proportion difference between the product and difference method under a Weibull model. In the absence of censoring, the proportion difference decreased as sample size increased but was still relatively large for small sample sizes. In the presence of right censoring, the proportion difference between the estimators was very large and did not decrease with increasing sample size. Figure 2B gives a summary of the proportion bias incurred by each estimator under the two censoring scenarios. For no censoring, both product and difference methods produced consistent estimators of the indirect effect. Under right censoring, the product method also produced an consistent indirect effect estimator across all sample sizes. In contrast, the difference method under right censoring had a proportion bias of about 45%, which does not appear to decrease with increasing sample size. The results from Figure 2B reveal that under right censoring, the difference method failed to be consistent for the indirect effect, with significant bias regardless of sample size. When there was no censoring, the difference method produced a consistent estimator of the indirect effect, although, in small samples the difference between the estimators was substantial.

In the above simulations, we only considered the indirect effect, but the results can be easily expanded to the total effect. For the normal model scenario, the total effect estimator is consistent. Thus, in both the absence and presence of censoring, the total effect can be estimated from the reduced form model (

estimator) or by summation of direct and indirect effect estimators based on the product or difference method and the direct effect (

estimator). For the Weibull model scenario, in the absence of censoring, the total effect can be estimated using either method, similar to the normal model scenario. In the presence of censoring, the total effect should only be estimated by the summation of the product method indirect effect estimator and the direct effect as the estimate of

will be biased.

Figure A1 in the Appendix (https://links.lww.com/EDE/B212) shows the Monte Carlo variances for each estimator discussed above. As expected, the variance of all estimators decreased toward zero as sample size increased.

## APPLICATIONS

We considered a data application which combined two cohort studies of HIV-exposed persons: PHACS and PACTG 219C. The studies both followed perinatally HIV-exposed males and females upon entry into study and measured various outcomes. The outcome *T* evaluated was age at sexual maturity for males only, which was subject to both interval and right censoring. Sexual maturity is defined as having reached stage 5 of the Tanner stage criteria for genitalia.^{22} Previous research has modeled the outcome with a normal distribution, because age at attainment of pubertal milestones generally follows a normal distribution.^{23}^{,}^{24} Thus, *T* is adequately modeled as a normal outcome, and, therefore, we expect results for the normal model to apply. Of the 1,380 males in the sample, 28% reached sexual maturity during follow-up and were subject to interval censoring; the remaining 72% were right-censored. The exposure *A* was binary perinatal HIV infection. The mediator *M* was height age- and sex-adjusted *Z* score (HTZ) at first visit occurring at age 7 or older.

We adjusted for confounding by birth year and race. These correspond to *Z* in models below. We fit a normal accelerated failure time model as age at sexual maturity is known to follow an approximately normal distribution. We used R to fit the following models to estimate the direct, indirect, and total effects using both the product and difference method. Note that R allows a “Gaussian” distributed outcome in the survreg^{12} function, so that the model can be written in terms of *T* rather than log*T*, though the same model can be obtained by fitting log*T* as a log-normal model for exp(age):

where *ε*, *ξ*, and *ζ* are all normally distributed variables with mean zero and unknown variance.

Table 1 displays effect estimates, their standard errors, and 95% confidence intervals. Bootstrap estimates of standard errors were used for indirect (difference) and total (product) effect estimates. Our analysis indicated that HIV-infected youth had a 7.1-month delay in age at sexual maturity compared with uninfected youth; height *Z* score accounting for approximately 40% of the effect. There was a 3% difference between the product and difference method estimators of the indirect effect. As discussed above, we do not expect numerical equivalence in the presence of censoring, though asymptotically both estimators should be consistent for the indirect effect. In addition, as we saw in the simulations, this sample size yielded a similar percent difference between estimators.

In a second application, we combined four different randomized studies HIV-infected adults from the US-based AIDS Clinical Trials Group studies.^{18–20} The binary exposure *A* was treatment assignment at baseline to combination antiretroviral therapy versus monotherapy. The outcome *T* was time to opportunistic infection or death and modeled as Weibull distributed. Out of 719 HIV-infected patients, 18% experienced the outcome, and the remaining 82% were right-censored. The mediator *M* was change in viral load (log base 10 scale), which was measured at 8 weeks of follow-up. We excluded 12 people who had the event or were lost to follow-up within the first 8 weeks after treatment initiation and any subjects with missing values for change in viral load. Of the four studies, two randomized participants to either combination antiretroviral therapy versus monotherapy, while the other two studies randomized participants to two different types of monotherapy. As our comparison is no longer based on randomization, we adjusted for potential confounding by sex, weight, and intravenous drug use at baseline; these correspond to variable *Z* in models below.

where

is normally distributed with mean zero and unknown variance and

and *ξ~* follow an extreme value distribution with unknown scale parameters.

Table 2 displays effect estimates, their standard errors, and 95% confidence intervals. Bootstrap estimates of standard errors were used for the indirect (difference) and total (product) effect estimates. Our estimates indicated a two-fold increase in mean time to death or opportunistic infection for adults starting combination antiretroviral drugs as compared with monotherapy, but 28% of this effect was mediated by decrease in viral load. The proportion difference between the two estimators was 12%. As previously discussed, we do not expect exact numerical equivalence. Even asymptotically, we expect the estimators to be different, with only the product method yielding a consistent estimator for the indirect effect. As shown in the eAppendix (https://links.lww.com/EDE/B212), the total effect estimator via the reduced form accelerated failure time model (“difference” in Table 2) will be biased, and the total effect (“product” in Table 2) is the only valid estimate for the total effect. Unlike the normal model discussed previously, we had no prior knowledge about the distribution of the time-to-event outcome. To assess goodness-of-fit, we compared the Akaike information criterion of our full model with other potential distributions for the outcome: exponential, log-normal, log-logistic, and Rayleigh. We found that the Weibull time-to-event outcome provided the best fit as it had the lowest Akaike information criterion. Furthermore, the Cox-Snell residual plot for our Weibull model showed that the fit was adequate as the residuals were relatively linear through the origin.

## CONCLUSIONS

In recent years, there has been an explosion of work to identify direct and indirect effects through causal mediation analysis in a variety of settings. The ease of estimating and interpreting direct and indirect effects from accelerated failure time models holds tremendous appeal to researchers; however, there are currently no explicit guidelines regarding possible complications due to censoring or truncation, two common phenomena in survival analysis. This article offers such guidance, based on theoretical considerations, simulation studies, and two applications, establishing that the well-known equivalence of the product and difference approaches for estimating an indirect effect in linear models does not generally apply in the presence of censoring or truncation.

Specifically, we have formally established that the reduced form accelerated failure time model upon marginalizing over the mediator is misspecified when the error distribution of the accelerated failure time model is not collapsible with respect of the error distribution of the mediator. In the presence of censoring or truncation, this misspecification can cause bias of the reduced form estimator of total effect, and therefore bias of the difference estimator of indirect effect. In the absence of censoring or truncation, the difference method yields a consistent estimator of the indirect effect. However, the model-based variance of the difference methods is generally incorrect, because the information matrix is derived from an incorrect likelihood. In theory, one could correct this by using the nonparametric bootstrap or the sandwich variance estimator for inference.

The normal mediator–normal outcome model is an exception to the above phenomenon because the reduced form accelerated failure time model is correctly specified; thus, the product and difference method are both consistent for the indirect effect whether or not censoring or truncation is present. Crucially, consistency relies on both the mediator and the outcome following a normal distribution. If the mediator is not normally distributed, then the reduced form accelerated failure time model will be misspecified. However, in the absence of censoring or truncation, we have shown that this form of model misspecification does not compromise consistency of the estimator of the indirect effect with either the product or difference method. Thus, the normality assumption of the mediator is only needed in the presence of censoring and truncation.

The normal mediator–normal outcome simulation study confirmed these results as the product and difference methods yielded consistent estimators of the indirect effect regardless of censoring. In addition, the Weibull simulation study confirmed that the difference method indirect effect estimator was biased and, thus, inconsistent in the presence of censoring, but consistent when there was no censoring. As shown in the eAppendix (https://links.lww.com/EDE/B212) and our simulation results, under certain assumptions, the product method will always yield a consistent estimator of the indirect effect. Thus, we caution users against employing the difference method for accelerated failure time models, and generally recommend using the product method as it yields a consistent estimator of the indirect effect in any of the above scenarios. In addition, one could also use alternative semiparametric methods that are less susceptible to modeling bias.^{6}^{,}^{7}^{,}^{25} Regardless of the approach used, a careful evaluation of the distributional choice for models and assessment of potential confounders should always be conducted.