Causal Mediation Analysis in the Presence of a Mismeasured Outcome

Jiang, Zhichao; VanderWeele, Tyler J.

doi: 10.1097/EDE.0000000000000204

To the Editor:

Previous work on measurement error in mediation analysis has focused on mismeasured mediators.1–4 Here, we consider mediation analysis with a mismeasured outcome. Let A denote an exposure, M denote a mediator, Y denote an outcome of interest, and C denote a vector of covariates. Let $Y_a$ and $M_a$ denote the values of the outcome and mediator, respectively, that would have been observed if the exposure A had been set to level a. Let $Y_{am}$ denote the value of the outcome that would have been observed if the exposure and the mediator had been set to levels a and m, respectively. The average total effect, conditional on $C = c$, comparing exposure levels a and a', is defined by $E(Y_a - Y_{a'} \mid c)$. The controlled direct effect (CDE), conditional on $C = c$, comparing the effect of the exposure levels a and a' while fixing the mediator at level m, is defined by $E(Y_{am} - Y_{a'm} \mid c)$. The natural direct effect (NDE), conditional on $C = c$, comparing the effect of the exposure levels a and a' while fixing the mediator at the level it would naturally have taken under some reference condition for the exposure, $A = a'$, is defined by $E(Y_{aM_{a'}} - Y_{a'M_{a'}} \mid c)$. The natural indirect effect (NIE), conditional on $C = c$, comparing the effect of the mediator at levels $M_a$ and $M_{a'}$ while fixing the exposure at level a, is defined by $E(Y_{aM_a} - Y_{aM_{a'}} \mid c)$.5,6

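Let $(A \perp B \mid C)$ denote that A is independent of B conditional on C. The following 4 confounding assumptions suffice to identify the NDE and NIE: conditioning on covariates C, there is no unmeasured confounding of (1) the exposure–outcome relationship $(Y_a \perp A \mid C)$, (2) the mediator–outcome relationship $(Y_{am} \perp M \mid C, A)$, (3) the exposure–mediator relationship $(M_a \perp A \mid C)$, and (4) there are no mediator–outcome confounders affected by the exposure $(Y_{am} \perp M_{a'} \mid C)$. Under these assumptions, we can obtain the formulae for the CDE, NDE, and NIE as follows:

$$\mathrm{CDE} = E(Y \mid a, m, c) - E(Y \mid a', m, c), \tag{1}$$

$$\mathrm{NDE} = \sum_m \{E(Y \mid a, m, c) - E(Y \mid a', m, c)\}\, P(m \mid a', c), \tag{2}$$

$$\mathrm{NIE} = \sum_m E(Y \mid a, m, c)\, \{P(m \mid a, c) - P(m \mid a', c)\}. \tag{3}$$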
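As a small numerical illustration of the mediation formulae for the CDE, NDE, and NIE, the sketch below evaluates them within a single covariate stratum for a discrete mediator. The function name `mediation_effects`, the dictionary layout, and all numbers are hypothetical choices made for this example; they are not taken from the letter.

```python
def mediation_effects(mean_y, p_m, a, a_ref, m_fixed):
    """Evaluate the CDE, NDE, and NIE within one covariate stratum c.

    mean_y[(a, m)] holds E(Y | a, m, c); p_m[a][m] holds P(M = m | a, c).
    Both inputs are assumed to be already estimated or known.
    """
    levels = list(p_m[a_ref].keys())
    # CDE: contrast of outcome means with the mediator fixed at m_fixed.
    cde = mean_y[(a, m_fixed)] - mean_y[(a_ref, m_fixed)]
    # NDE: exposure contrast averaged over the mediator law under a_ref.
    nde = sum((mean_y[(a, m)] - mean_y[(a_ref, m)]) * p_m[a_ref][m]
              for m in levels)
    # NIE: outcome mean under a, weighted by the shift in the mediator law.
    nie = sum(mean_y[(a, m)] * (p_m[a][m] - p_m[a_ref][m]) for m in levels)
    return cde, nde, nie


# Hypothetical inputs: binary exposure and binary mediator.
mean_y = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.50}
p_m = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

cde, nde, nie = mediation_effects(mean_y, p_m, a=1, a_ref=0, m_fixed=0)
total = nde + nie  # the NDE and NIE sum to the total effect
print(cde, nde, nie, total)  # approximately 0.10, 0.13, 0.09, 0.22
```

In this example the total effect can also be computed directly as E(Y | a = 1, c) minus E(Y | a = 0, c), which gives 0.38 − 0.16 = 0.22 and matches the sum of the NDE and NIE.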

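Now suppose Y is subject to misclassification and let Y* denote the observed outcome. Suppose $Y^* = Y + U$, where U denotes the measurement error. We assume that the misclassification is nondifferential, ie, $P(Y^* \mid Y, A, M, C) = P(Y^* \mid Y)$. We then have that U is independent of A, M, and C conditional on Y. The naive estimators use the observed outcome instead of the true outcome to calculate the direct and indirect effects, denoted by $\mathrm{CDE}^*$, $\mathrm{NDE}^*$, and $\mathrm{NIE}^*$, respectively. Because $E(Y^* \mid a, m, c) = E(Y \mid a, m, c) + E\{E(U \mid Y) \mid a, m, c\}$, we can obtain that

$$\mathrm{CDE}^* = \mathrm{CDE} + E\{E(U \mid Y) \mid a, m, c\} - E\{E(U \mid Y) \mid a', m, c\},$$

$$\mathrm{NDE}^* = \mathrm{NDE} + \sum_m \left[E\{E(U \mid Y) \mid a, m, c\} - E\{E(U \mid Y) \mid a', m, c\}\right] P(m \mid a', c),$$

$$\mathrm{NIE}^* = \mathrm{NIE} + \sum_m E\{E(U \mid Y) \mid a, m, c\}\, \{P(m \mid a, c) - P(m \mid a', c)\}.$$

The estimates thus depend on the form of $E(U \mid Y)$. Under classical measurement error, $E(U \mid Y) = 0$, which means that the misclassification is completely random, and the naive estimators give consistent estimates of the direct and indirect effects. A correction is also possible under other forms of measurement error when $E(U \mid Y)$ is specified.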
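To make the role of a specified error mean concrete, consider a worked case in which the expected error is linear in the true outcome. The linear form and the symbols $\beta_0$ and $\beta_1$ are assumptions introduced only for this illustration; the letter does not commit to any particular form.

```latex
% Illustrative assumption (not from the letter): linear error mean.
\begin{align*}
E(U \mid Y) &= \beta_0 + \beta_1 Y, \qquad \beta_1 > -1, \\
% Nondifferential error gives E(U | Y, a, m, c) = E(U | Y), hence
E(Y^* \mid a, m, c) &= \beta_0 + (1 + \beta_1)\, E(Y \mid a, m, c), \\
% beta_0 cancels in each contrast, since sum_m {P(m|a,c) - P(m|a',c)} = 0:
\mathrm{CDE}^* &= (1 + \beta_1)\, \mathrm{CDE}, \\
\mathrm{NDE}^* &= (1 + \beta_1)\, \mathrm{NDE}, \\
\mathrm{NIE}^* &= (1 + \beta_1)\, \mathrm{NIE}.
\end{align*}
```

Under this assumed form, dividing each naive estimate by $1 + \beta_1$ would recover the corresponding true effect, and setting $\beta_0 = \beta_1 = 0$ returns the classical case in which no correction is needed.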

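For a binary outcome, the probability of misclassification can be characterized by the sensitivity $SN = P(Y^* = 1 \mid Y = 1)$ and the specificity $SP = P(Y^* = 0 \mid Y = 0)$. We assume that $SN + SP > 1$, which is plausible since the observed outcome is more likely to be 1 (or 0) if the true outcome is 1 (or 0). We again assume nondifferential measurement error so that $P(Y^* = 1 \mid Y = y, a, m, c) = P(Y^* = 1 \mid Y = y)$, and we can obtain that

$$E(Y^* \mid a, m, c) = (SN + SP - 1)\, E(Y \mid a, m, c) + (1 - SP).$$

Substituting the above formula in (1) to (3), we can obtain the following:

$$\mathrm{CDE}^* = (SN + SP - 1)\, \mathrm{CDE}, \quad \mathrm{NDE}^* = (SN + SP - 1)\, \mathrm{NDE}, \quad \mathrm{NIE}^* = (SN + SP - 1)\, \mathrm{NIE}.$$

Because $0 < SN + SP - 1 \le 1$, we have that the naive estimators bias both the direct and indirect effects toward the null. We can furthermore get corrected estimates and confidence intervals by dividing the estimate and both limits of the confidence interval of the naive estimator by $SN + SP - 1$ for the CDE, the NDE, and the NIE. Also, because the correction factor is the same for the NDE and the NIE, the proportion-mediated measure $\mathrm{NIE}/(\mathrm{NDE} + \mathrm{NIE})$ will not be biased by nondifferential misclassification of the outcome.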
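The correction for a binary outcome amounts to a single division, which the sketch below illustrates. It assumes that the sensitivity and specificity are known (for example, from a validation study) and treats them as fixed, so the rescaled interval ignores any uncertainty in those two quantities. The function names and all numerical inputs are hypothetical.

```python
def correction_factor(sn, sp):
    """Return SN + SP - 1, the attenuation factor for a binary outcome."""
    factor = sn + sp - 1.0
    if factor <= 0:
        raise ValueError("the correction requires SN + SP > 1")
    return factor


def correct_naive(estimate, ci_lower, ci_upper, sn, sp):
    """Divide a naive estimate and both confidence limits by SN + SP - 1."""
    factor = correction_factor(sn, sp)
    return estimate / factor, ci_lower / factor, ci_upper / factor


def proportion_mediated(nde, nie):
    """Proportion mediated, NIE / (NDE + NIE)."""
    return nie / (nde + nie)


# Hypothetical naive results obtained from the misclassified outcome.
sn, sp = 0.9, 0.8
naive_nde, naive_nde_ci = 0.091, (0.035, 0.147)
naive_nie = 0.063

nde_hat, nde_low, nde_high = correct_naive(naive_nde, *naive_nde_ci, sn, sp)
nie_hat, _, _ = correct_naive(naive_nie, naive_nie, naive_nie, sn, sp)

# The common factor cancels in the ratio, so both proportions agree.
pm_naive = proportion_mediated(naive_nde, naive_nie)
pm_corrected = proportion_mediated(nde_hat, nie_hat)
print(nde_hat, (nde_low, nde_high), nie_hat, pm_naive, pm_corrected)
```

With these inputs the factor is 0.7, so the corrected NDE is about 0.13 with limits of about 0.05 and 0.21, the corrected NIE is about 0.09, and the proportion mediated is about 0.41 both before and after correction.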

This conclusion enables us to draw qualitative conclusions in practice, even when we cannot observe the true outcome. For example, if our estimate of the indirect effect is positive using the observed outcome, we can conclude that our estimate of the true natural indirect effect is positive, ie, mediation is present. On the other hand, if the naive estimator is zero, the estimate of the true natural indirect effect will also be zero, indicating the absence of mediation.

Similar results hold with parametric models, and it is straightforward to implement an expectation-maximization correction algorithm. See the eAppendix for details.
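The eAppendix is not reproduced here, so the following is only a generic sketch of how an expectation-maximization correction can work for a misclassified binary outcome, and it should not be read as the authors' algorithm. It assumes a logistic model with an intercept and a single regressor, known sensitivity and specificity, and nondifferential misclassification. The function name `em_logistic_misclassified` and the data are hypothetical; an outcome model for mediation analysis would also include the mediator and covariates.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def em_logistic_misclassified(x, y_obs, sn, sp, n_em=200, n_newton=5):
    """EM for logit P(Y = 1 | x) = b0 + b1 * x when only Y* is observed.

    Assumes known sensitivity sn and specificity sp (illustrative setup).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_em):
        # E-step: posterior probability that the true outcome equals 1.
        w = []
        for xi, yi in zip(x, y_obs):
            p = expit(b0 + b1 * xi)
            if yi == 1:
                w.append(sn * p / (sn * p + (1 - sp) * (1 - p)))
            else:
                w.append((1 - sn) * p / ((1 - sn) * p + sp * (1 - p)))
        # M-step: Newton steps for a logistic fit to the fractional outcomes.
        for _ in range(n_newton):
            g0 = g1 = h00 = h01 = h11 = 0.0
            for xi, wi in zip(x, w):
                p = expit(b0 + b1 * xi)
                v = p * (1 - p)
                g0 += wi - p            # score for the intercept
                g1 += (wi - p) * xi     # score for the slope
                h00 += v                # information matrix entries
                h01 += v * xi
                h11 += v * xi * xi
            det = h00 * h11 - h01 * h01
            b0 += (h11 * g0 - h01 * g1) / det
            b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 100 unexposed with 34 observed events,
# 100 exposed with 62 observed events.
x = [0] * 100 + [1] * 100
y_obs = [1] * 34 + [0] * 66 + [1] * 62 + [0] * 38

b0, b1 = em_logistic_misclassified(x, y_obs, sn=0.9, sp=0.8)
risk_unexposed = expit(b0)
risk_exposed = expit(b0 + b1)
print(risk_unexposed, risk_exposed)
```

In this saturated example the observed risks are 0.34 and 0.62, and the fitted risks settle near 0.20 and 0.60, which agrees with the closed-form correction (0.34 − 0.2)/0.7 and (0.62 − 0.2)/0.7. The iterative approach becomes useful when the outcome model is not saturated and no closed form is available.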

Zhichao Jiang

School of Mathematical Sciences

Peking University

Beijing, China

[email protected]

Tyler J. VanderWeele

Departments of Epidemiology and


Harvard School of Public Health

Boston, MA


1. Hoyle RH, Kenny DA. Sample size, reliability, and tests of statistical mediation. In: Hoyle RH, ed. Statistical Strategies for Small Sample Research. Vol. 1. Thousand Oaks, CA: Sage; 1999:195–222
2. le Cessie S, Debeij J, Rosendaal FR, Cannegieter SC, Vandenbroucke JP. Quantification of bias in direct effects estimates due to different types of measurement error in the mediator. Epidemiology. 2012;23:551–560
3. VanderWeele TJ, Valeri L, Ogburn EL. The role of measurement error and misclassification in mediation analysis. Epidemiology. 2012;23:561–564
4. Valeri L, VanderWeele TJ. The estimation of direct and indirect causal effects in the presence of misclassified binary mediator. Biostatistics. 2014;15:498–512
5. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3:143–155
6. Pearl J. Direct and indirect effects. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. San Francisco, CA: Morgan Kaufmann; 2001:411–420


© 2015 by Lippincott Williams & Wilkins, Inc