# Causal Mediation Analysis in the Presence of a Mismeasured Outcome

Jiang, Zhichao; VanderWeele, Tyler J.

doi: 10.1097/EDE.0000000000000204
Letters

Supplemental Digital Content is available in the text.

School of Mathematical Sciences, Peking University, Beijing, China, zhichaojiang@pku.edu.cn

Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA

Supported by National Institutes of Health grant ES017876.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com). This content is not peer-reviewed or copy-edited; it is the sole responsibility of the authors.

## To the Editor:

Previous work on measurement error in mediation analysis has focused on mismeasured mediators.1–4 Here, we consider mediation analysis with a mismeasured outcome. Let $A$ denote an exposure, $M$ denote a mediator, $Y$ denote an outcome of interest, and $C$ denote a vector of covariates. Let $Y_a$ and $M_a$ denote the values of the outcome and mediator that would have been observed if the exposure $A$ had been set to level $a$. Let $Y_{am}$ denote the value of the outcome that would have been observed if the exposure and the mediator had been set to levels $a$ and $m$, respectively. The average total effect, conditional on $C = c$, comparing exposure levels $a$ with $a'$, is defined by $E[Y_a - Y_{a'} \mid c]$. The controlled direct effect (CDE), conditional on $C = c$, comparing the effect of the exposure levels $a$ and $a'$ while fixing the mediator at level $m$, is defined by $E[Y_{am} - Y_{a'm} \mid c]$. The natural direct effect (NDE), conditional on $C = c$, comparing the effect of the exposure levels $a$ and $a'$ while fixing the mediator to the level it would have naturally been under some reference condition for the exposure, $A = a'$, is defined by $E[Y_{aM_{a'}} - Y_{a'M_{a'}} \mid c]$. The natural indirect effect (NIE), conditional on $C = c$, comparing the effect of the mediator at levels $M_a$ and $M_{a'}$ while fixing the exposure at level $a$, is defined by $E[Y_{aM_a} - Y_{aM_{a'}} \mid c]$.5,6
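The way these effects fit together can be made explicit with a short worked identity. The sketch below assumes the standard composition property $Y_a = Y_{aM_a}$, which is not stated in the letter itself; under it, the total effect splits into the natural indirect and natural direct effects by adding and subtracting $Y_{aM_{a'}}$.

```latex
\begin{align*}
E[Y_a - Y_{a'} \mid c]
  &= E[Y_{aM_a} - Y_{a'M_{a'}} \mid c] \\
  &= \underbrace{E[Y_{aM_a} - Y_{aM_{a'}} \mid c]}_{\text{NIE}}
   + \underbrace{E[Y_{aM_{a'}} - Y_{a'M_{a'}} \mid c]}_{\text{NDE}}
\end{align*}
```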

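Let $(A \perp B \mid C)$ denote that $A$ is independent of $B$ conditional on $C$. The following 4 confounding assumptions suffice to identify the NDE and NIE: conditional on covariates $C$, there is no unmeasured confounding of (1) the exposure–outcome relationship $(Y_a \perp A \mid C)$, (2) the mediator–outcome relationship $(Y_{am} \perp M \mid C, A)$, (3) the exposure–mediator relationship $(M_a \perp A \mid C)$, and (4) there are no mediator–outcome confounders affected by the exposure $(Y_{am} \perp M_{a'} \mid C)$. Under these assumptions, we can obtain the formulae for the CDE, NDE, and NIE as follows.

$$\mathrm{CDE} = E[Y \mid a, m, c] - E[Y \mid a', m, c] \tag{1}$$

$$\mathrm{NDE} = \sum_m \{E[Y \mid a, m, c] - E[Y \mid a', m, c]\}\, P(m \mid a', c) \tag{2}$$

$$\mathrm{NIE} = \sum_m E[Y \mid a, m, c]\, \{P(m \mid a, c) - P(m \mid a', c)\} \tag{3}$$

Now suppose $Y$ is subject to misclassification and let $Y^*$ denote the observed outcome. Suppose $Y^* = Y + U$, where $U$ is the measurement error. We assume that the misclassification is nondifferential, ie, $(Y^* \perp \{A, M, C\} \mid Y)$. We then have that $U$ is independent of $A$, $M$, and $C$ conditional on $Y$. The naive estimators use the observed outcome instead of the true outcome in formulae (1) to (3) to calculate the direct and indirect effects, denoted by $\mathrm{CDE}^*$, $\mathrm{NDE}^*$, and $\mathrm{NIE}^*$, respectively. Because $E[Y^* \mid a, m, c] = E[Y \mid a, m, c] + E[E(U \mid Y) \mid a, m, c]$, we can obtain that

$$\mathrm{CDE}^* = \mathrm{CDE} + E[E(U \mid Y) \mid a, m, c] - E[E(U \mid Y) \mid a', m, c],$$

$$\mathrm{NDE}^* = \mathrm{NDE} + \sum_m \{E[E(U \mid Y) \mid a, m, c] - E[E(U \mid Y) \mid a', m, c]\}\, P(m \mid a', c),$$

$$\mathrm{NIE}^* = \mathrm{NIE} + \sum_m E[E(U \mid Y) \mid a, m, c]\, \{P(m \mid a, c) - P(m \mid a', c)\}.$$

The estimates thus depend on the form of $E(U \mid Y)$. Under classical measurement error, $E(U \mid Y) = 0$, which means that the misclassification is completely random, and the naive estimators give consistent estimates of the direct and indirect effects. Under other forms of measurement error, a correction is still possible once $E(U \mid Y)$ is specified.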
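As an illustration of how the identification formulae for the CDE, NDE, and NIE are evaluated for a discrete mediator, here is a minimal Python sketch. It takes $E[Y \mid a, m, c]$ and $P(m \mid a, c)$ at a fixed covariate level $c$ as plain functions. The function name `mediation_effects`, its parameters, and all numbers are hypothetical and chosen only for the example; they do not come from the letter or its eAppendix.

```python
from typing import Callable, Dict, Sequence


def mediation_effects(
    mean_y: Callable[[int, int], float],
    p_m: Callable[[int, int], float],
    a: int,
    a_ref: int,
    m_levels: Sequence[int],
    m_fixed: int,
) -> Dict[str, float]:
    """Evaluate the CDE, NDE, and NIE at one fixed covariate level c.

    mean_y(a, m) stands for E[Y | a, m, c]; p_m(m, a) stands for P(M = m | a, c).
    a_ref plays the role of the reference exposure level a'.
    """
    # CDE: contrast the exposure levels with the mediator held at m_fixed.
    cde = mean_y(a, m_fixed) - mean_y(a_ref, m_fixed)
    # NDE: average the exposure contrast over the mediator distribution under a'.
    nde = sum((mean_y(a, m) - mean_y(a_ref, m)) * p_m(m, a_ref) for m in m_levels)
    # NIE: hold the exposure at a and shift the mediator distribution from a' to a.
    nie = sum(mean_y(a, m) * (p_m(m, a) - p_m(m, a_ref)) for m in m_levels)
    return {"CDE": cde, "NDE": nde, "NIE": nie}


# Hypothetical inputs for a binary exposure and a binary mediator (illustration only).
MEAN_Y = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.30, (1, 1): 0.50}  # E[Y | a, m, c]
P_M1 = {0: 0.3, 1: 0.6}  # P(M = 1 | a, c)


def mean_y(a: int, m: int) -> float:
    return MEAN_Y[(a, m)]


def p_m(m: int, a: int) -> float:
    return P_M1[a] if m == 1 else 1.0 - P_M1[a]


effects = mediation_effects(mean_y, p_m, a=1, a_ref=0, m_levels=(0, 1), m_fixed=0)
print(effects)
```

With these illustrative inputs the CDE is about 0.20, the NDE about 0.23, and the NIE about 0.06, and the NDE and NIE sum to the total effect of about 0.29.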

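For a binary outcome, the probability of misclassification can be characterized by the sensitivity $\mathrm{SN} = P(Y^* = 1 \mid Y = 1)$ and the specificity $\mathrm{SP} = P(Y^* = 0 \mid Y = 0)$. We assume that $\mathrm{SN} + \mathrm{SP} > 1$, which is plausible since the observed outcome is more likely to be 1 (or 0) if the true outcome is 1 (or 0). We again assume nondifferential measurement error, so that $P(Y^* = 1 \mid Y = y, a, m, c) = P(Y^* = 1 \mid Y = y)$, and we can obtain that

$$E[Y^* \mid a, m, c] = (\mathrm{SN} + \mathrm{SP} - 1)\, E[Y \mid a, m, c] + (1 - \mathrm{SP}).$$

Substituting this expression into the naive versions of formulae (1) to (3), we can obtain the following:

$$\mathrm{CDE}^* = (\mathrm{SN} + \mathrm{SP} - 1)\, \mathrm{CDE}, \quad \mathrm{NDE}^* = (\mathrm{SN} + \mathrm{SP} - 1)\, \mathrm{NDE}, \quad \mathrm{NIE}^* = (\mathrm{SN} + \mathrm{SP} - 1)\, \mathrm{NIE}.$$

Because $0 < \mathrm{SN} + \mathrm{SP} - 1 \le 1$, the naive estimators of both the direct and indirect effects are biased toward the null. We can furthermore get corrected estimates and confidence intervals by dividing the estimate and both limits of the confidence interval of the naive estimator by $\mathrm{SN} + \mathrm{SP} - 1$ for the CDE, the NDE, and the NIE. Also, because the correction factor is the same for the NDE and the NIE, the proportion-mediated measures will not be biased by nondifferential misclassification of the outcome.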
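The attenuation and its correction can be illustrated with a few lines of Python. This is a minimal sketch that treats sensitivity and specificity as known constants. The helper names (`observed_mean`, `correction_factor`, `correct_estimate`) and all numeric values are hypothetical and used only for illustration.

```python
from typing import Tuple


def observed_mean(true_mean: float, sensitivity: float, specificity: float) -> float:
    """E[Y* | a, m, c] implied by E[Y | a, m, c] under nondifferential misclassification."""
    return sensitivity * true_mean + (1.0 - specificity) * (1.0 - true_mean)


def correction_factor(sensitivity: float, specificity: float) -> float:
    """Return SN + SP - 1, which must be positive for the correction to apply."""
    factor = sensitivity + specificity - 1.0
    if factor <= 0.0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return factor


def correct_estimate(
    estimate: float,
    ci_low: float,
    ci_high: float,
    sensitivity: float,
    specificity: float,
) -> Tuple[float, float, float]:
    """Divide a naive estimate and both confidence limits by SN + SP - 1."""
    factor = correction_factor(sensitivity, specificity)
    return estimate / factor, ci_low / factor, ci_high / factor


# Hypothetical values, for illustration only.
SN, SP = 0.90, 0.80

# A true CDE of 0.30 - 0.10 = 0.20 is attenuated when computed from Y*.
naive_cde = observed_mean(0.30, SN, SP) - observed_mean(0.10, SN, SP)

# Correct the naive estimate together with a made-up confidence interval.
corrected = correct_estimate(naive_cde, 0.07, 0.21, SN, SP)

# The proportion mediated is unchanged because NDE and NIE share one factor.
true_nde, true_nie = 0.23, 0.06
factor = correction_factor(SN, SP)
pm_true = true_nie / (true_nde + true_nie)
pm_naive = (factor * true_nie) / (factor * true_nde + factor * true_nie)

print(naive_cde, corrected, pm_true, pm_naive)
```

Here the factor is 0.7, so the naive CDE is about 0.14 and dividing by the factor recovers about 0.20. Because the factor is positive, dividing the confidence limits preserves their order.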

This conclusion enables us to draw qualitative conclusions in practice, even when we cannot observe the true outcome. For example, if our estimate of the indirect effect is positive using the observed outcome, we can conclude that our estimate of the true natural indirect effect is positive, ie, mediation is present. On the other hand, if the naive estimator is zero, the estimate of the true natural indirect effect will also be zero, indicating the absence of mediation.

Similar results hold with parametric models and it is straightforward to implement an expectation maximization correction algorithm. See eAppendix (http://links.lww.com/EDE/A850) for details.

Zhichao Jiang

School of Mathematical Sciences

Peking University

Beijing, China

zhichaojiang@pku.edu.cn

Tyler J. VanderWeele

Departments of Epidemiology and Biostatistics

Harvard School of Public Health

Boston, MA

## REFERENCES

1. Hoyle RH, Kenny DA. Sample size, reliability, and tests of statistical mediation. In: Hoyle RH, ed. Statistical Strategies for Small Sample Research. Vol. 1. Thousand Oaks, CA: Sage; 1999:195–222.
2. le Cessie S, Debeij J, Rosendaal FR, Cannegieter SC, Vandenbroucke JP. Quantification of bias in direct effects estimates due to different types of measurement error in the mediator. Epidemiology. 2012;23:551–560.
3. VanderWeele TJ, Valeri L, Ogburn EL. The role of measurement error and misclassification in mediation analysis. Epidemiology. 2012;23:561–564.
4. Valeri L, VanderWeele TJ. The estimation of direct and indirect causal effects in the presence of misclassified binary mediator. Biostatistics. 2014;15:498–512.
5. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3:143–155.
6. Pearl J. Direct and indirect effects. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. San Francisco, CA: Morgan Kaufmann; 2001:411–420.

© 2015 by Lippincott Williams & Wilkins, Inc