# Commentary: The Role of Measurement Error and Misclassification in Mediation Analysis: Mediation and Measurement Error

From the Departments of Epidemiology, and Biostatistics; Department of Biostatistics; Department of Epidemiology, Harvard School of Public Health, Boston, MA.

Tyler J. VanderWeele was supported by National Institutes of Health grant HD060696. The authors reported no other financial interests related to this research.

Correspondence: Tyler J. VanderWeele, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115. E-mail: tvanderw@hsph.harvard.edu.

Methods to estimate direct and indirect effects have been rapidly expanding.^{1–14} It is now well documented that such mediation analyses are subject to strong no-confounding assumptions and that an unmeasured confounder of the mediator-outcome relationship can lead to substantial bias in direct and indirect effect estimates.^{1},^{2},^{6},^{7},^{14},^{15} Much less attention has been given to the question of how measurement error may bias estimates of direct and indirect effects. le Cessie and colleagues^{16} have done a service to investigators interested in direct effects by providing simple correction formulas for direct effect estimates in a variety of mediator measurement-error scenarios. Here, we will consider the implications of these and other results as they relate to making inferences not just about direct effects but also about mediation and indirect effects.

### Mediation and Nondifferential Measurement Error

Suppose that a mediator is subject to nondifferential measurement error or misclassification (ie, the error does not depend on the exposure or outcome conditional on the true mediator and covariates). Intuitively, we might expect that such measurement error will weaken the association between the mediator and the outcome and will therefore, perhaps, bias estimates of mediated effects toward the null and bias estimates of direct effects away from the null. An important question is under what conditions this intuition holds.

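le Cessie et al^{16} consider a logistic regression model of the form:

$$\operatorname{logit}\{P(Y = 1 \mid X = x, M = m, C = c)\} = \beta_0 + \beta_1 x + \beta_2 m + \beta_3' c \tag{1}$$

where *Y* is the outcome, *X* is the exposure, *M* is the mediator, and *C* is the covariates. They suppose that the investigator has access to a mismeasured mediator *M** = *M* + *U*, where *U* is normally distributed with mean 0 and independent of *M*. Let λ denote the proportion of the variance in *M** explained by *M*, conditional on *X* and *C*. An investigator might then fit the logistic model with the mismeasured mediator:

$$\operatorname{logit}\{P(Y = 1 \mid X = x, M^* = m^*, C = c)\} = \beta_0^* + \beta_1^* x + \beta_2^* m^* + \beta_3^{*\prime} c \tag{2}$$

le Cessie et al also consider a linear regression for the mediator:

$$E[M \mid X = x, C = c] = \alpha_0 + \alpha_1 x + \alpha_2' c \tag{3}$$

If the linear regression is fit with the mismeasured mediator, this would be:

$$E[M^* \mid X = x, C = c] = \alpha_0^* + \alpha_1^* x + \alpha_2^{*\prime} c \tag{4}$$

Under these assumptions, the coefficients in models (3) and (4) will be the same. However, for the logistic regression, the coefficients in models (1) and (2) will differ. Under their assumptions about measurement error, they note that the relationships between the coefficients in models (1) and (2) are given by:

$$\beta_1 \approx \beta_1^* - \alpha_1 \beta_2^* \, \frac{1 - \lambda}{\lambda} \tag{5}$$

$$\beta_2 \approx \frac{\beta_2^*}{\lambda} \tag{6}$$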

Provided that the covariates *C* control for confounding of the exposure-outcome and mediator-outcome relationships, exp(β_{1}) is equal to the controlled direct-effect odds ratio.^{5} le Cessie et al^{16} thus note that even when the mediator is subject to such measurement error, we could obtain a corrected direct effect estimate as follows. We could use model (2) with the mismeasured mediator to obtain estimates of β_{1}^{*} and β_{2}^{*}. We could then use model (4) to estimate α_{1}. If we specify λ (the proportion of the variance of *M** explained by *M*, conditional on *X* and *C*), then we could use these estimates in equation (5) to obtain a corrected estimate of β_{1}. The corrected controlled direct-effect odds ratio is then simply given by exp(β_{1}).

Similar logic can, in fact, also be used for mediated effects. Suppose that *C* controls for confounding of the (i) exposure-outcome, (ii) mediator-outcome, and (iii) exposure-mediator relationships, and that (iv) there is no mediator-outcome confounder that is affected by exposure.^{2},^{4},^{5} If, in addition, the outcome is rare and the models are correctly specified, then the so-called natural direct- and indirect-effect odds ratios are given by exp(β_{1}) and exp(α_{1}β_{2}), respectively. In this case, we could fit models (2) and (4), specify λ, and use the expressions in (5) and (6) to obtain estimates of β_{1} and β_{2} that are corrected for measurement error. We could then use exp(β_{1}) and exp(α_{1}β_{2}) as estimates of natural direct and indirect effects. Corrected confidence intervals must either take into account that α_{1} is estimated (because α_{1} appears in the correction formula for β_{1}; this can be done with the delta method) or be obtained by bootstrapping.
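The correction step can be sketched in a few lines of Python. This is a minimal illustration, assuming the correction formulas take the regression-calibration form β_{1} = β_{1}^{*} − α_{1}β_{2}^{*}(1 − λ)/λ and β_{2} = β_{2}^{*}/λ; the function name `corrected_effects` and the numeric inputs are hypothetical and not taken from le Cessie et al.

```python
import math


def corrected_effects(beta1_star, beta2_star, alpha1, lam):
    """Correct naive coefficients for classical mediator measurement error.

    Assumed relations (not quoted from the source):
        beta2 = beta2_star / lam
        beta1 = beta1_star - alpha1 * beta2_star * (1 - lam) / lam
    where lam is the share of Var(M* | X, C) explained by M.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    beta2 = beta2_star / lam
    beta1 = beta1_star - alpha1 * beta2_star * (1.0 - lam) / lam
    return {
        "beta1": beta1,
        "beta2": beta2,
        # Odds-ratio interpretation requires a rare outcome and assumptions (i)-(iv).
        "nde_or": math.exp(beta1),
        "nie_or": math.exp(alpha1 * beta2),
    }


# Hypothetical estimates from models (2) and (4), with a user-specified lam.
result = corrected_effects(beta1_star=0.50, beta2_star=0.30, alpha1=0.40, lam=0.75)
print(result)
```

Because λ is specified by the analyst rather than estimated, it may be sensible to repeat the call over a range of plausible λ values as a sensitivity analysis.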

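The results also have interesting implications for the direction of bias of these effects. If we ignore measurement error and use exp(α_{1}β_{2}^{*}) as the estimate of the indirect effect, then this would be biased toward the null odds ratio of 1 because, with β_{2}^{*} ≈ λβ_{2} and 0 < λ ≤ 1,

$$|\alpha_1 \beta_2^*| \approx \lambda \, |\alpha_1 \beta_2| \le |\alpha_1 \beta_2|.$$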

Similarly, it follows from equation (5) that if the direct and indirect effects are in the same direction and if we ignore measurement error and use exp(β_{1}^{*}) as a measure of the direct effect, then this will be biased away from the null odds ratio of 1 (ie, if the true direct-effect odds ratio is greater than 1, then exp(β_{1}^{*}) will be even larger; if the true direct-effect odds ratio is less than 1, then exp(β_{1}^{*}) will be even smaller). Thus, under classic nondifferential measurement error with a normally distributed mediator, the bias of the mediated effect under models (1) and (3) is always toward the null. If the direct and indirect effects are in the same direction, then the bias of the direct effect is away from the null.
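A small numeric sketch may make the direction of these biases concrete. It assumes the relations β_{2}^{*} = λβ_{2} and β_{1}^{*} = β_{1} + (1 − λ)α_{1}β_{2}, a rearrangement of the correction formulas; the "true" coefficient values below are hypothetical.

```python
import math


def naive_coefficients(beta1, beta2, alpha1, lam):
    """Coefficients expected when M* replaces M in the outcome model.

    Assumed relations (not quoted from the source):
        beta2_star = lam * beta2
        beta1_star = beta1 + (1 - lam) * alpha1 * beta2
    """
    beta1_star = beta1 + (1.0 - lam) * alpha1 * beta2
    beta2_star = lam * beta2
    return beta1_star, beta2_star


# Hypothetical true values; direct and indirect effects are both positive.
beta1, beta2, alpha1 = 0.46, 0.40, 0.40
true_direct_or = math.exp(beta1)
true_indirect_or = math.exp(alpha1 * beta2)

bias_table = []
for lam in (1.0, 0.75, 0.5):
    b1s, b2s = naive_coefficients(beta1, beta2, alpha1, lam)
    naive_direct_or = math.exp(b1s)
    naive_indirect_or = math.exp(alpha1 * b2s)
    bias_table.append((lam, naive_direct_or, naive_indirect_or))
    print(f"lam={lam:.2f}  naive direct OR={naive_direct_or:.3f}  "
          f"naive indirect OR={naive_indirect_or:.3f}")
```

With λ = 1 the naive and true values coincide; as λ falls, the naive indirect-effect odds ratio moves toward 1 and the naive direct-effect odds ratio moves away from it.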

Will nondifferential measurement error result in a similar pattern of biases in other settings? In related work, we have shown that if a binary mediator is subject to nondifferential misclassification then, once again, the bias of the mediated effect is toward the null, and the bias of the direct effect is away from the null.^{17} Unfortunately, however, nondifferential misclassification of a polytomous mediator will not always result in biases that follow these patterns. It is possible to construct examples of a nondifferentially misclassified mediator with 3 levels such that the bias of the mediated effect is away from the null and the bias of the direct effect is toward the null.^{17} It is even possible to construct examples in which the direct and mediated effect estimates, when ignoring measurement error, lie on the wrong side of the null. An important task will be to better characterize those settings in which nondifferential measurement error or misclassification of a mediator leads to bias patterns of the type we would intuitively expect.

One final point is of interest before moving on. Using the definitions of natural direct and indirect effects given in the causal inference literature, the total effect will always decompose into the sum of the natural direct and indirect effects on a difference scale; a total effect on the odds ratio scale will always decompose into the product of the natural direct- and indirect-effect odds ratios.^{7} We saw previously that in the presence of measurement error for the mediator, the standard estimators for the direct and indirect effects will be biased. Perhaps surprisingly, however, if we use these biased direct and indirect measures and take their product on the odds ratio scale (or sum on the difference scale), we will still get an unbiased estimate of the total effect. In some ways, this is intuitive. Even if we have measurement error of the mediator, we should still be able to obtain valid estimates of total effects by simply ignoring the mediator. What may be surprising is that even if we use the mismeasured mediator to estimate biased direct and indirect effects, their combination is still unbiased for the total effect. In fact, this property holds not simply for nondifferential measurement error of the mediator, but, as shown in the Appendix, for any form of measurement error of the mediator.
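For the classic nondifferential setting of the previous paragraphs, this invariance can be checked in one line. The following is a sketch that assumes the correction formulas rearrange to $\beta_1^* \approx \beta_1 + (1-\lambda)\alpha_1\beta_2$ and $\beta_2^* \approx \lambda\beta_2$; on the log-odds scale, the naive direct and indirect effects then sum to

$$\beta_1^* + \alpha_1\beta_2^* \approx \beta_1 + (1-\lambda)\,\alpha_1\beta_2 + \lambda\,\alpha_1\beta_2 = \beta_1 + \alpha_1\beta_2,$$

so that $\exp(\beta_1^*)\exp(\alpha_1\beta_2^*) \approx \exp(\beta_1)\exp(\alpha_1\beta_2)$. The portion $(1-\lambda)\alpha_1\beta_2$ that measurement error removes from the indirect effect is exactly the portion it adds to the direct effect.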

### Other Forms of Measurement Error and More Complex Models

This discussion of effect decomposition, in fact, also suggests another way to harness the results of le Cessie et al^{16} to reason not only about direct effects but about mediated effects. In the previous section, under aforementioned assumptions, we used exp(α_{1}β_{2}) as a measure of the mediated effect. The use of this and similar expressions is sometimes referred to as the “product method” because in essence it takes the product of the exposure coefficient in the mediator model with the mediator coefficient in the outcome model as an indirect effect.^{11},^{18} An alternative way to go about estimating mediated effects—at least under models (1) and (3)—is sometimes referred to as the “difference method.” The difference method first estimates a total effect (eg, by not using data on the mediator), then estimates a direct effect, as in model (1), and takes as the estimate of the mediated effect, the “difference” between the total effect and the direct effect. For odds ratios, this “difference” is done on the log-odds scale. Equivalently, then, we could take the total-effect odds ratio divided by the direct-effect odds ratio to get an indirect-effect odds ratio. Provided there are no exposure-mediator interactions or other interactions between variables, the product method and difference method will coincide for continuous outcomes and coincide approximately for binary outcomes if the outcome is rare.^{5},^{18}

This difference method supplies an approach to obtain corrected estimates of the mediated effect in the various other mediator measurement-error scenarios described by le Cessie et al^{16} (eg, differential measurement error with the exposure or outcome affecting the mediator measurement, differential or nondifferential intraindividual variation over time, or trigger mechanisms). If model (2) is correctly specified, there are no exposure-mediator interactions, the no-confounding assumptions (i)–(iv) hold, and the outcome is rare, we could get corrected estimates of the direct-effect odds ratio exp(β_{1}) using the formulas in le Cessie et al for any of the measurement-error scenarios that they consider. We could then estimate the total-effect odds ratio by simply ignoring data on the mediator; measurement error of the mediator will not affect this estimate. Finally, we could take the ratio of our total-effect odds ratio to the measurement-error–corrected direct-effect odds ratio to obtain a measurement-error–corrected indirect-effect odds ratio. Standard errors could be obtained by bootstrapping. This approach will work for any of the forms of mediator measurement error described by le Cessie et al, and it will work for any other form of mediator measurement error for which we are able to obtain measurement-error–corrected estimates of the direct effect.
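The final step of this difference-method approach is simple arithmetic on the log-odds scale. The sketch below assumes that a total-effect odds ratio (from a model that omits the mediator) and a measurement-error-corrected direct-effect odds ratio have already been obtained; the function name and the two numeric inputs are hypothetical.

```python
import math


def indirect_or_difference_method(total_or, corrected_direct_or):
    """Indirect-effect odds ratio as total effect 'minus' direct effect.

    The subtraction is done on the log-odds scale, which is equivalent to
    dividing the total-effect odds ratio by the direct-effect odds ratio.
    """
    if total_or <= 0 or corrected_direct_or <= 0:
        raise ValueError("odds ratios must be positive")
    log_indirect = math.log(total_or) - math.log(corrected_direct_or)
    return math.exp(log_indirect)


# Hypothetical inputs: total-effect OR and corrected direct-effect OR.
indirect_or = indirect_or_difference_method(total_or=1.86, corrected_direct_or=1.58)
print(f"corrected indirect-effect OR = {indirect_or:.3f}")
```

For bootstrap standard errors, the whole pipeline (both model fits, the correction, and this ratio) would be rerun within each resample rather than resampling the two odds ratios separately.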

Of course, all of our discussion here has presupposed that the models in equations (1)–(4) were correctly specified. One advantage of the approach to mediation that has developed within the causal inference literature is that it has allowed for the definition and estimation of direct and indirect effects even in the presence of exposure-mediator interactions.^{1–6},^{11} We could thus also extend models (1) and (2) to allow for such interaction. Analytic expressions for direct and indirect effects in the presence of interactions when there is no measurement error are given elsewhere.^{4},^{5},^{11} As noted by le Cessie and colleagues,^{16} in such cases, the simple formulas that they derived are no longer applicable, although one could use SIMEX methods to attempt to correct for measurement error. In fact, in related work, we have shown that even in the presence of exposure-mediator interactions, analytic expressions can be derived for measurement-error–corrected direct and indirect effects, at least for classic nondifferential measurement error of a continuous mediator.^{19} Even these expressions can be complicated, and comparison with other approaches, such as SIMEX, becomes important.^{19}

## DISCUSSION

le Cessie et al^{16} have provided a number of helpful results in correcting direct effect estimates for mediator measurement error. Here, we have discussed how correction methods can similarly be applied to indirect or mediated effects, and we have discussed simple rules to know a priori, at least in certain cases, the direction of the bias of direct and indirect effects. Other work in the social sciences addresses mediator measurement error by using data on multiple measurements or on variables related to the mediator.^{20},^{21} Future work could consider settings in which both the exposure and the mediator, or both the mediator and the outcome, are subject to either differential or nondifferential measurement error.

Concerns about measurement error in mediation analysis are not simply hypothetical. Such issues arose in a study on the extent to which certain genetic variants affected lung cancer through nicotine dependence and associated smoking behavior (measured in terms of cigarettes per day) versus other pathways.^{22} The number of cigarettes per day was assessed by self-report and was, thus, subject to measurement error; moreover, cigarettes per day, even correctly reported, would be a crude measure of nicotine dependence. Analyses ignoring measurement error suggested that most of the effect of the variants was direct; using methods of the type described here, it was possible to show that, although the indirect effect may have been underestimated in the initial analysis, allowing for even substantial measurement error would not change the qualitative conclusion that most of the effect was direct. Similar questions are likely to arise in other settings in which mediation is of interest. Measurement error could potentially prove to be as significant a threat to mediation analyses as unmeasured mediator-outcome confounding. Correction methods such as the type described by le Cessie et al^{16} will be helpful in addressing this potential source of bias.

## APPENDIX

*Proof that under mediator measurement error, the sum of the biased direct and indirect effect estimators still gives an unbiased total effect:*

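We will consider a difference scale; the proof for the odds ratio scale is similar. When the relevant no-confounding assumptions hold, estimators for natural direct and indirect effects are given by Pearl^{2} as follows, written here for a comparison of exposure levels *x*_{1} and *x*_{0} conditional on *C* = *c*.

The natural direct effect is given by:

$$\sum_{m} \{E[Y \mid x_1, m, c] - E[Y \mid x_0, m, c]\} \, P(m \mid x_0, c)$$

and the natural indirect effect is given by:

$$\sum_{m} E[Y \mid x_1, m, c] \, \{P(m \mid x_1, c) - P(m \mid x_0, c)\}.$$

If the mismeasured mediator *M** were used instead of *M*, these 2 expressions would be respectively:

$$\sum_{m^*} \{E[Y \mid x_1, m^*, c] - E[Y \mid x_0, m^*, c]\} \, P(m^* \mid x_0, c)$$

$$\sum_{m^*} E[Y \mid x_1, m^*, c] \, \{P(m^* \mid x_1, c) - P(m^* \mid x_0, c)\}.$$

The sum of these is equal to:

$$\sum_{m^*} E[Y \mid x_1, m^*, c] \, P(m^* \mid x_1, c) - \sum_{m^*} E[Y \mid x_0, m^*, c] \, P(m^* \mid x_0, c) = E[Y \mid x_1, c] - E[Y \mid x_0, c],$$

where the final equality follows by iterated expectations and the final quantity is equal to the total effect, conditional on *C* = *c*, of the exposure on the outcome provided that *C* suffices to control for confounding of the effect of *X* on *Y*. Thus, the sum of the biased natural direct and indirect effect estimators using data on the mismeasured mediator *M** rather than *M* will still be an unbiased estimator of the total effect. Note that the argument made no assumptions either about confounding for the effect of *M* or about the form of the measurement error for *M*.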
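The argument can be checked numerically. The sketch below uses a hypothetical 3-level mediator, a binary exposure, no covariates, and an arbitrary exposure-dependent (differential) misclassification matrix. All numbers are invented for illustration, and *Y* is assumed independent of *M** given (*M*, *X*) purely so that E[*Y* | *X*, *M**] can be computed from the other inputs.

```python
LEVELS = range(3)

# Hypothetical inputs, indexed by exposure level x in {0, 1}.
p_m = {0: [0.6, 0.3, 0.1], 1: [0.2, 0.3, 0.5]}   # P(M = m | X = x)
ey = {0: [1.0, 2.0, 4.0], 1: [1.5, 3.0, 6.0]}    # E[Y | X = x, M = m]
# q[x][m][k] = P(M* = k | M = m, X = x); it differs by x, so error is differential.
q = {
    0: [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    1: [[0.6, 0.3, 0.1], [0.1, 0.6, 0.3], [0.3, 0.1, 0.6]],
}


def observed(x):
    """Return P(M* = k | X = x) and E[Y | X = x, M* = k] for each level k."""
    p_star = [sum(p_m[x][m] * q[x][m][k] for m in LEVELS) for k in LEVELS]
    ey_star = [
        sum(ey[x][m] * p_m[x][m] * q[x][m][k] for m in LEVELS) / p_star[k]
        for k in LEVELS
    ]
    return p_star, ey_star


def natural_effects(p, e):
    """Natural direct and indirect effects (difference scale), x = 1 versus 0."""
    nde = sum((e[1][k] - e[0][k]) * p[0][k] for k in LEVELS)
    nie = sum(e[1][k] * (p[1][k] - p[0][k]) for k in LEVELS)
    return nde, nie


p_star, ey_star = {}, {}
for x in (0, 1):
    p_star[x], ey_star[x] = observed(x)

nde, nie = natural_effects(p_m, ey)                   # using the true mediator
nde_star, nie_star = natural_effects(p_star, ey_star) # using the mismeasured one
total = sum(ey[1][m] * p_m[1][m] for m in LEVELS) - sum(
    ey[0][m] * p_m[0][m] for m in LEVELS
)

print(f"true:  NDE={nde:.3f}  NIE={nie:.3f}  sum={nde + nie:.3f}")
print(f"naive: NDE={nde_star:.3f}  NIE={nie_star:.3f}  sum={nde_star + nie_star:.3f}")
print(f"total effect={total:.3f}")
```

In this constructed example the naive direct and indirect effects each differ from their true values, yet both pairs sum to the same total effect.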