# On the Relative Nature of Overadjustment and Unnecessary Adjustment

From the Department of Health Studies, University of Chicago, Chicago, IL.

Correspondence: Tyler J. VanderWeele, Department of Health Studies, University of Chicago, 5841 S. Maryland Ave, MC 2007, Chicago, IL 60637. E-mail: vanderweele@uchicago.edu.

Control for relevant covariates is unquestionably important in drawing inferences about causation from observational data. Determining which covariates warrant or require control can be challenging. Causal directed acyclic graphs have provided a formal conceptual framework and tool for making these judgments.^{1–3} In their paper, Schisterman et al^{4} use such causal diagrams to discuss examples of what they call “overadjustment” and “unnecessary adjustment.” It is well known that, when the aim is to estimate the total effect of an exposure on some outcome, control for an intermediate will generally bias the estimate of that total effect.^{5–8} When adjustment is made for such intermediates (or effects of such intermediates), Schisterman et al call the resulting bias “overadjustment bias.” They consider several examples of such bias and derive formulas for it in some simple linear structural equation models. They distinguish “overadjustment” from what they call “unnecessary adjustment,” a term they use to describe instances in which control for a variable does not affect bias but may affect precision.

In this commentary, I discuss the relative nature of overadjustment and unnecessary adjustment. I point out that overadjustment and unnecessary adjustment are relative to the causal effect of interest, relative to the method chosen to estimate that effect, and relative to the other variables for which control is made.

### The Relative Nature of Overadjustment

Schisterman et al define overadjustment bias as “control for an intermediate variable (or a descending proxy for an intermediate variable) on a causal path from exposure to outcome.” They point out that control for an intermediate variable (or an effect of an intermediate variable) on the path from exposure to outcome will often bias estimates of the total causal effect towards the null. The description and discussion of “overadjustment” given by Schisterman et al is generally restricted to an analysis of “total effects.” However, the issue of overadjustment seems relevant not simply to total effects but also to direct effects, indirect effects, joint effects, conditional effects, etc. The term “overadjustment” might be used for any setting in which control for a variable introduces (rather than eliminates) bias. Furthermore, whether control for a particular variable constitutes an instance of “overadjustment” will sometimes be relative to the effect of interest. Consider the case of the estimation of controlled direct effects.^{9–12} Schisterman et al note that, although adjustment for an intermediate variable may bias estimates of the total effect of an exposure on the outcome, analyses with such adjustment can, under additional assumptions,^{9–12} sometimes be interpreted as estimates of controlled direct effects. This is indeed the case. Moreover, in some settings, control for an additional intermediate variable is necessary to eliminate bias in estimates of controlled direct effects. Consider the causal directed acyclic graph in DAG 1. Suppose no data are available for *U* but data are available for *E, L, M, D*. If the total effect of *E* on *D* were of interest, then control for the intermediate *L* or *M* would generally bias estimates of the total causal effect. 
However, if the controlled direct effect of *E* on *D* were of interest with *M* set to a particular level, then control for the intermediate *L* would be necessary; otherwise the relationship between *M* and *D* would be confounded, resulting in biased estimates of the controlled direct effect.^{9,13,14} For estimating the total effect of *E* on *D*, adjustment for intermediate *L* should not be made; for estimating the controlled direct effect of *E* on *D* with *M* set to a particular level, adjustment for intermediate *L* should be made. Overadjustment is clearly relative to the effect of interest.
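The contrast between an analysis with and without adjustment for an intermediate can be illustrated with a small simulation. The sketch below is not DAG 1: it assumes a simpler linear model with a single intermediate *M*, no unmeasured confounding, no interaction, and hypothetical coefficients `a`, `b`, and `c`. The `ols` helper is written out only to keep the example self-contained.

```python
import random


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] of y on the columns xs."""
    cols = [[1.0] * len(y)] + [list(x) for x in xs]
    k = len(cols)
    # Augmented normal equations (X'X | X'y).
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    a, b, c = 0.5, 0.8, 1.0  # hypothetical paths: E->M, M->D, E->D
    E = [rng.gauss(0, 1) for _ in range(n)]
    M = [a * e + rng.gauss(0, 1) for e in E]
    D = [c * e + b * m + rng.gauss(0, 1) for e, m in zip(E, M)]
    unadjusted = ols(D, E)[1]   # coefficient on E, no control for M
    adjusted = ols(D, E, M)[1]  # coefficient on E, controlling for M
    return unadjusted, adjusted, c + a * b, c


unadjusted, adjusted, total, direct = simulate()
print(f"total effect (c + a*b) = {total:.2f}, unadjusted estimate = {unadjusted:.2f}")
print(f"direct path  (c)       = {direct:.2f}, M-adjusted estimate = {adjusted:.2f}")
```

Under these assumptions the estimate without *M* should sit near the total effect and the estimate with *M* near the direct path alone. The second number is "overadjusted" only if the total effect is the target; in this toy model, which has no confounding of *M* and *D*, it is what one would want for a controlled direct effect.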

Even if we restrict attention to the analysis of total causal effects, overadjustment may still be relative to the method used to identify the effect. Consider the causal directed acyclic graph in DAG 2.

Suppose data are available on all variables except *U*. The effect of *E* on *D* in DAG 2 is confounded by *U*, and data are not available for *U*. An ordinary regression of *D* on *E* will not give valid estimates of the total effect of *E* on *D*, regardless of whether control is made for *M*_{1} and *M*_{2}. However, Pearl^{1} has shown, in a result he calls the “front-door path adjustment theorem,” that in cases such as DAG 2 the total effect of *E* on *D* can still be estimated from data on *E*, *M*_{1}, *M*_{2}, and *D*, provided that the intermediates (*M*_{1} and *M*_{2} in this case) completely mediate the effect of *E* on *D* and provided that the effects of *E* on the intermediates and the effects of the intermediates on the outcome *D* are unconfounded. The formula given by Pearl in his paper^{1} requires more than a simple regression of *D* on *E*, *M*_{1}, and *M*_{2}, but his result is an example in which adjustment for intermediate variables can be used in the estimation of total effects. In fact, in this case, such adjustment must be used in the estimation of total effects if data are not available on *U*. Such cases of adjustment for intermediates *M*_{1} and *M*_{2} would thus not constitute instances of overadjustment. Time will perhaps tell whether results like Pearl's front-door path adjustment theorem and its generalizations^{15} are actually useful for epidemiologic research or whether the results are simply of theoretical interest. In any case, Pearl's result makes clear that, even for total effects, whether adjustment for an intermediate constitutes “overadjustment” is relative to the method used in estimation.
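A simulation can make the front-door idea concrete. The sketch below assumes a structure consistent with the description of DAG 2 (*U* affects *E* and *D*; *E* affects *D* only through *M*_{1} and *M*_{2}), together with linearity, no interactions, and hypothetical coefficients. Under those added assumptions the front-door formula reduces to a sum of products of regression coefficients; this is a special case and not Pearl's general formula.

```python
import random


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] of y on the columns xs."""
    cols = [[1.0] * len(y)] + [list(x) for x in xs]
    k = len(cols)
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, seed=2):
    rng = random.Random(seed)
    a1, b1, a2, b2, g = 0.6, 0.8, -0.5, 0.4, 0.7  # hypothetical coefficients
    U = [rng.gauss(0, 1) for _ in range(n)]        # unmeasured confounder
    E = [u + rng.gauss(0, 1) for u in U]
    M1 = [a1 * e + rng.gauss(0, 1) for e in E]
    M2 = [a2 * e + rng.gauss(0, 1) for e in E]
    D = [b1 * m1 + b2 * m2 + g * u + rng.gauss(0, 1)
         for m1, m2, u in zip(M1, M2, U)]

    naive = ols(D, E)[1]                    # confounded by U
    conditioned = ols(D, E, M1, M2)[1]      # E coefficient given both mediators
    # Front-door, linear case: (E -> M_j) times (M_j -> D given E), summed over j.
    e_to_m1 = ols(M1, E)[1]
    e_to_m2 = ols(M2, E)[1]
    _, m1_to_d, m2_to_d, _ = ols(D, M1, M2, E)
    front_door = e_to_m1 * m1_to_d + e_to_m2 * m2_to_d
    return naive, conditioned, front_door, a1 * b1 + a2 * b2


naive, conditioned, front_door, truth = simulate()
print(f"true total effect             = {truth:.2f}")
print(f"regression of D on E          = {naive:.2f}")
print(f"E coefficient given M1 and M2 = {conditioned:.2f}")
print(f"front-door estimate           = {front_door:.2f}")
```

Only the front-door combination should recover the total effect here, and it does so precisely by using the intermediates, without any data on *U*.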

Two further comments on overadjustment perhaps merit some attention. First, in questions of overadjustment, distinctions between “control,” “conditioning,” “stratification,” “restriction,” and “adjustment” should perhaps be preserved. Consider the causal directed acyclic graph in DAG 3, and suppose data were available on all variables.

If the joint effects of *E* and *M* on *D* were of interest, then simple stratification on *L* or conditioning on *L* in a regression would yield biased estimates of the joint effects of *E* and *M* on *D*.^{16,17} However, adjustment for *L* using inverse probability of treatment weighting (IPTW) techniques can be used to obtain valid estimates of the joint effects of *E* and *M* on *D*.^{16,17} In this case, once again, overadjustment is relative to the method employed, and here distinctions among methods of adjustment also become important. “Stratification” or simple “conditioning” on an intermediate does not yield unbiased results, but IPTW “adjustment” does; “conditioning” and “adjustment” are not equivalent here.

Second, it is not clear that discussion of overadjustment bias should be restricted to cases of control for intermediates. Consider the causal directed acyclic graph in DAG 4 and suppose data are available for *L, E, D* but not *U*_{1}, *U*_{2}. In this case, using rules from causal directed acyclic graphs,^{1–3} it can be shown that the effect of *E* on *D* is unconfounded if control is not made for any variables, but that the estimate is biased if control is made for *L* alone because of so-called “collider stratification.”^{18–20} The variable *L* in DAG 4 is not an intermediate on the pathway from *E* to *D*, but it seems reasonable in this case to call control for *L* an instance of “overadjustment bias.” Furthermore, if we do consider control for *L* in DAG 4 an instance of “overadjustment,” then it follows from this example that “overadjustment” is also relative to the other variables for which control is made. This is because, if data were in fact available for *U*_{1}, one could obtain an unbiased estimate of the effect of *E* on *D* by controlling for both *U*_{1} and *L*. Thus, if control is made for *U*_{1}, then control for *L* will not introduce bias; if control is not made for *U*_{1}, then control for *L* will introduce bias, a bias which we might consider a form of “overadjustment bias.”
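The collider-stratification argument can be checked numerically. The sketch below assumes that DAG 4 has the structure implied by the text (*U*_{1} affects *E* and *L*; *U*_{2} affects *L* and *D*; *E* affects *D*), with linear relationships and hypothetical unit coefficients. The variable names `U1` and `U2` and the helper `ols` are illustrative.

```python
import random


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] of y on the columns xs."""
    cols = [[1.0] * len(y)] + [list(x) for x in xs]
    k = len(cols)
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, seed=3):
    rng = random.Random(seed)
    true_effect = 1.0  # hypothetical effect of E on D
    U1 = [rng.gauss(0, 1) for _ in range(n)]
    U2 = [rng.gauss(0, 1) for _ in range(n)]
    E = [u1 + rng.gauss(0, 1) for u1 in U1]
    L = [u1 + u2 + rng.gauss(0, 1) for u1, u2 in zip(U1, U2)]  # collider
    D = [true_effect * e + u2 + rng.gauss(0, 1) for e, u2 in zip(E, U2)]
    return {
        "no control": ols(D, E)[1],
        "control for L": ols(D, E, L)[1],
        "control for L and U1": ols(D, E, L, U1)[1],
    }


estimates = simulate()
for label, value in estimates.items():
    print(f"{label:22s} {value:.2f}")
```

With these coefficients the estimate with control for *L* alone should fall visibly below the true value of 1, while the estimates with no control, or with control for both *L* and *U*_{1}, should be close to it.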

### The Relative Nature of Unnecessary Adjustment

Schisterman et al use the term “unnecessary adjustment” to describe instances in which control for a variable does not affect bias but may affect precision. Similar remarks to those made earlier for overadjustment can also be made for unnecessary adjustment, namely that unnecessary adjustment is relative to the effect of interest, to the method used to estimate the effect, and to the other variables for which control is made. Thus, for example, in the estimation of total effects, it is unnecessary to control for variables that confound the intermediate-outcome relationship but that do not confound the exposure-outcome relationship; however, it is necessary to control for such variables in the estimation of controlled direct effects. Unnecessary adjustment is thus relative to the effect of interest.

Suppose we again restrict our attention to the estimation of total effects. Consider the causal diagram in DAG 5 and suppose data are available on all variables.

If regression is used in the estimation of total effect of *E* on *D*, then control for *C*_{1} is necessary but control for *C*_{2} is not necessary; control for *C*_{2} will not bias estimates but may affect precision and thus control for *C*_{2} would constitute a case of unnecessary adjustment. If, however, Pearl's front-door path adjustment theorem^{1} were used, then control for *C*_{2} would be necessary but control for *C*_{1} would not be necessary. We thus see that, even for total effects, unnecessary adjustment may be relative to the method used to identify the causal effect of interest.
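The way the necessary covariate switches with the method can be sketched by simulation. The structure below is an assumption chosen to agree with the description of DAG 5, since the figure is not reproduced here: *C*_{1} affects *E* and *D*, *C*_{2} affects an intermediate *M* and *D*, and *E* affects *D* only through *M*. Linearity and the coefficient values are also hypothetical.

```python
import random


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] of y on the columns xs."""
    cols = [[1.0] * len(y)] + [list(x) for x in xs]
    k = len(cols)
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, seed=5):
    rng = random.Random(seed)
    a_em, b_md, g1, g2 = 0.6, 0.8, 0.7, 0.9  # hypothetical coefficients
    C1 = [rng.gauss(0, 1) for _ in range(n)]
    C2 = [rng.gauss(0, 1) for _ in range(n)]
    E = [c1 + rng.gauss(0, 1) for c1 in C1]
    M = [a_em * e + c2 + rng.gauss(0, 1) for e, c2 in zip(E, C2)]
    D = [b_md * m + g1 * c1 + g2 * c2 + rng.gauss(0, 1)
         for m, c1, c2 in zip(M, C1, C2)]
    e_to_m = ols(M, E)[1]  # first front-door stage (E -> M)
    return {
        "truth": a_em * b_md,
        "regression, no covariates": ols(D, E)[1],
        "regression, C1": ols(D, E, C1)[1],
        "regression, C1 and C2": ols(D, E, C1, C2)[1],
        "front-door, no covariates": e_to_m * ols(D, M, E)[1],
        "front-door, C2": e_to_m * ols(D, M, E, C2)[1],
    }


results = simulate()
for label, value in results.items():
    print(f"{label:28s} {value:.2f}")
```

In this sketch the regression estimates need *C*_{1} and are unaffected in expectation by adding *C*_{2}, whereas the front-door estimate needs *C*_{2} and makes no use of *C*_{1}.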

Finally, we consider how unnecessary adjustment is relative to the other variables for which control is made. Schisterman et al describe 5 cases of unnecessary adjustment: one case in which adjustment is made for a variable that is related to neither the exposure nor the outcome of interest, and 4 cases in which adjustment is made for a variable whose only structural relationship with the other variables on the graph is one, and only one, of the following: (1) the variable is only a cause of the exposure, (2) the variable is only an effect of the exposure, (3) the variable is only a cause of the outcome, or (4) the variable is only an effect of the outcome.
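Two of these cases, a variable that is only a cause of the exposure and a variable that is only a cause of the outcome, can be used to illustrate how adjustment that leaves bias unchanged can still move precision in either direction. The sketch below repeats a small linear simulation many times and compares the spread of the exposure coefficient; the model, sample size, and names (`cause_of_e`, `cause_of_d`) are hypothetical.

```python
import random
import statistics


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] of y on the columns xs."""
    cols = [[1.0] * len(y)] + [list(x) for x in xs]
    k = len(cols)
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def run(reps=300, n=200, seed=4, true_effect=1.0):
    rng = random.Random(seed)
    est = {"none": [], "cause_of_e": [], "cause_of_d": []}
    for _ in range(reps):
        cause_of_e = [rng.gauss(0, 1) for _ in range(n)]  # affects E only
        cause_of_d = [rng.gauss(0, 1) for _ in range(n)]  # affects D only
        E = [z + rng.gauss(0, 1) for z in cause_of_e]
        D = [true_effect * e + w + rng.gauss(0, 1)
             for e, w in zip(E, cause_of_d)]
        est["none"].append(ols(D, E)[1])
        est["cause_of_e"].append(ols(D, E, cause_of_e)[1])
        est["cause_of_d"].append(ols(D, E, cause_of_d)[1])
    # (mean, empirical standard deviation) of the E coefficient per strategy
    return {k: (statistics.mean(v), statistics.stdev(v)) for k, v in est.items()}


results = run()
for label, (mean, sd) in results.items():
    print(f"adjusting for {label:11s} mean = {mean:.3f}, sd = {sd:.3f}")
```

All three strategies should be centered near the true effect. In this setup, adjusting for the cause of the outcome tends to narrow the spread, and adjusting for the cause of the exposure tends to widen it.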

However, other more complicated cases of unnecessary adjustment are possible in which, for some variable *C*, control for *C* is necessary when control is made for one set of variables but is not necessary when control is made for a different set of variables. Consider the causal directed acyclic graph in DAG 6.

Greenland et al^{3} suggest that this causal diagram might describe the relationships concerning the effect of antihistamine treatment *E* on asthma incidence *D* among children attending public schools, with confounding variables including air pollution *A*, sex *S*, and bronchial reactivity *B*. Using rules from causal directed acyclic graphs,^{1–3} it can be shown that, if DAG 6 correctly depicts the causal relationships, control for *A* is not necessary when control is made for *S* and *B*; but if control is made for *B* and not *S* (if data on *S* were missing, say), then conditioning on *A* would be necessary to control for confounding of the effect of *E* on *D*. Unnecessary adjustment is thus relative to the other variables for which control is made. Following the simulation results of Schisterman et al, if asthma severity *D* were recorded continuously and we thought the relationships with other variables were plausibly linear, we might want to control for *S* and *B* but not *A*: in DAG 6, *S* is a direct cause of *D*, so control for it may allow us to gain precision, whereas *A* is a direct cause of *E* but not of *D*, so controlling for it unnecessarily may diminish precision. An important direction for future research may be the development of variable selection methods that aim to improve the efficiency of estimates while taking into account the structure of particular causal directed acyclic graphs, to ensure that the set of variables ultimately selected still suffices to control for confounding.
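The graphical reasoning for DAG 6 can be mechanized with a small back-door checker. The edge list below is an assumption, chosen to match the description in the text (*A* and *B* are causes of *E*; *A* and *S* are causes of *B*; *B*, *S*, and *E* are causes of *D*), because the figure itself is not reproduced here. The function names are illustrative, and path enumeration is used for transparency rather than efficiency.

```python
# Assumed edge list for DAG 6, written as (parent, child) pairs.
EDGES = {("A", "E"), ("A", "B"), ("S", "B"), ("B", "E"),
         ("B", "D"), ("S", "D"), ("E", "D")}


def descendants(node, edges):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for parent, child in edges:
            if parent == cur and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(start, end, edges, path=None):
    """Yield every simple path from start to end, ignoring edge direction."""
    path = path or [start]
    if path[-1] == end:
        yield list(path)
        return
    for parent, child in edges:
        for here, there in ((parent, child), (child, parent)):
            if here == path[-1] and there not in path:
                yield from simple_paths(start, end, edges, path + [there])


def is_blocked(path, z, edges):
    """True if conditioning on the set z blocks the path (d-separation rules)."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider:
            # A collider blocks unless it, or one of its descendants, is in z.
            if node not in z and not (descendants(node, edges) & z):
                return True
        elif node in z:
            # A non-collider blocks when it is conditioned on.
            return True
    return False


def satisfies_backdoor(z, exposure, outcome, edges):
    """Back-door criterion: no descendants of exposure in z, all back-door paths blocked."""
    z = set(z)
    if z & descendants(exposure, edges):
        return False
    backdoor = [p for p in simple_paths(exposure, outcome, edges)
                if (p[1], exposure) in edges]  # first edge points into exposure
    return all(is_blocked(p, z, edges) for p in backdoor)


for z in [set(), {"B"}, {"S", "B"}, {"A", "B"}, {"A", "S", "B"}]:
    label = ", ".join(sorted(z)) or "nothing"
    print(f"control for {label:8s} sufficient: {satisfies_backdoor(z, 'E', 'D', EDGES)}")
```

On the assumed graph, control for *B* alone fails because conditioning on *B* opens the path through *A* and *S*, on which *B* is a collider, while adding either *S* or *A* closes it again.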

## CONCLUSION

The concepts of “overadjustment” and “unnecessary adjustment” are relative; they are relative to the effect of interest, relative to the method used to estimate the effect, and relative to the other variables for which control is made. Consequently, when thinking about issues of overadjustment and unnecessary adjustment, researchers should ensure that they have first clarified the effect of interest, the method used to estimate that effect, and the other control variables being considered in the analysis. Although the concepts of overadjustment and unnecessary adjustment are relative, the central points in the paper by Schisterman et al are good ones and important ones: if the total causal effect is of interest, adjustment should not generally be made for intermediate variables or for any effects of the exposure of interest; for total causal effects, adequate control should be made for variables that confound the relationship between the exposure and the outcome of interest. Sometimes it is not necessary to control for particular preexposure covariates but these should be excluded only with adequate justification. Additionally, if controlled direct effects are of interest, and control is made for an intermediate, then it is also necessary to make adequate control for variables that confound the relationship between the intermediate and the outcome.^{9,13,14} Researchers neglecting these principles risk introducing bias into their analyses, potentially quite severe bias; the principles should be violated only in very special cases in which both methodologic and substantive justification can be given.

## ABOUT THE AUTHOR

TYLER VANDERWEELE is an assistant professor in the Department of Health Studies at the University of Chicago. His research concerns the development of epidemiologic methods and theory for reasoning about causality. His current research focus includes problems related to interaction, mediation, and causal diagrams.

## REFERENCES

1. *Biometrika*. 1995;82:669–688.
2. *Causality: Models, Reasoning, and Inference*. Cambridge: Cambridge University Press; 2000.
3. *Epidemiology*. 1999;10:37–48.
4. *Epidemiology*. 2009;20:486–493.
5. *Int J Epidemiol*. 1980;9:361–367.
6. *J Roy Stat Soc Ser A*. 1984;147:656–666.
7. *Stat Med*. 1989;8:679–701.
8. *Am J Epidemiol*. 1993;137:1–8.
9. *Epidemiology*. 1992;3:143–155.
10. *Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence*. San Francisco: Morgan Kaufmann; 2001:411–420.
11. *Epidemiology*. 2006;17:276–284.
12. *Epidemiology*. 2009;20:18–26.
13. *Process Analysis*. 1981;5:602–619.
14. *Int J Epidemiol*. 2002;31:163–165.
15. *J Jpn Stat Soc*. 1999;29:105–117.
16. *Statistical Models in Epidemiology: The Environment and Clinical Trials*. New York: Springer-Verlag; 1999:95–134.
17. *Epidemiology*. 2000;11:550–560.
18. *Epidemiology*. 2003;14:300–306.
19. *Am J Epidemiol*. 2007;166:1096–1104.
20. *Epidemiology*. 2004;15:615–625.