Assessing Confounder Balance in Outcome Regressions

Popham, Frank; Leyland, Alastair H.

doi: 10.1097/EDE.0000000000000871

MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, Scotland.

The example data set comes from Hernán and Robins’ causal inference book1 and is obtainable from the book’s website. Stata (StataCorp, College Station) and R (R Foundation, Vienna) code are provided as supplemental materials to replicate the results described.

F.P. and A.H.L. are funded by the Medical Research Council (MC_UU_12017/13) and Scottish Government Chief Scientist Office (SPHSU13).

The authors report no conflicts of interest.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article.

F.P. had the original idea, inspired by reading Abadie A, Diamond A, Hainmueller J. (2015). Comparative Politics and the Synthetic Control Method. American Journal of Political Science, 59, 495–510. F.P. and A.H.L. developed the idea. A.H.L. derived the matrix algebra and proofs. F.P. wrote the first draft, which was extensively edited by A.H.L. F.P. wrote the Stata (StataCorp, College Station) and R (R Foundation, Vienna) code. F.P. revised the original paper to a research letter with edits by A.H.L. F.P. and A.H.L. have approved the manuscript.

To the Editor:

Estimating causal effects is a key aim of observational epidemiology. Given a binary exposure, the effect of interest might be the difference in the average outcome if the whole population were treated compared with if none of it were: the population average effect. Alternatively, we might want the average effect for the treated, which is the difference in average outcome between being treated and not being treated among those actually treated. Average causal effects with these different target populations, although involving counterfactuals, can be estimated from observed data on the outcomes of treated and untreated groups, given a number of assumptions including exchangeability (no confounding).1,2 A full discussion of these assumptions is available.1 Assessing confounder balance after confounder control checks both the effect’s target population and the absence of confounding by observed confounders. A guide to best practice in assessing balance is available.3 Modern causal methods, including inverse probability weighting (IPW) and related weighting schemes,1,2,4 make it simple to specify and check the target population. Given effect heterogeneity, the average effect will vary with different target populations.1 However, despite the availability of these causal methods, so-called outcome regression remains common for causal inference. An outcome regression is a model of the exposure’s effect on the outcome controlling for observed confounders.1 The effect is taken as the regression coefficient for the exposure. Thus, when there is no interaction term between the confounders and the exposure, the assumption is effect homogeneity. Specifying the target population is not usual when an outcome regression is used; perhaps the assumption is that it represents the dataset’s population or the wider population of which the dataset is a (representative) sample. Implicitly, this could then be the population average effect.
We illustrate a method for assessing where an outcome regression balances binary confounders when the effect of interest is the population average effect for a binary exposure on a continuous outcome.
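
As a rough illustration of how a weighting method lets the target population be checked, the following Python sketch computes inverse probability weights and compares weighted confounder means with the population means. It is not the authors' Stata or R code and does not use the Hernán and Robins example data. The simulated data, the variable names (`x`, `z`, `q`), and the helper `ipw_weights` are all assumptions made for illustration, and the propensity model is assumed to be saturated in the two binary confounders.

```python
import numpy as np

# Simulated data (an assumption, not the example data set from the letter).
rng = np.random.default_rng(1)
n = 5000
z = rng.binomial(1, 0.4, n)                    # binary confounder Z
q = rng.binomial(1, 0.5, n)                    # binary confounder Q
x = rng.binomial(1, 0.1 + 0.3 * z + 0.2 * q)   # binary exposure X depends on Z and Q


def ipw_weights(x, z, q):
    """Inverse probability weights from a saturated propensity model.

    The probability of treatment is estimated as the observed proportion
    treated within each of the four Z-by-Q cells. Every cell is assumed to
    contain both treated and untreated observations (positivity).
    """
    w = np.empty(len(x), dtype=float)
    for zv in (0, 1):
        for qv in (0, 1):
            cell = (z == zv) & (q == qv)
            p_treated = x[cell].mean()
            w[cell & (x == 1)] = 1.0 / p_treated
            w[cell & (x == 0)] = 1.0 / (1.0 - p_treated)
    return w


w = ipw_weights(x, z, q)
treated = x == 1

# Compare population means with the weighted means in each exposure group.
for name, c in [("Z", z), ("Q", q), ("Z*Q", z * q)]:
    pop = c.mean()
    in_treated = np.average(c[treated], weights=w[treated])
    in_control = np.average(c[~treated], weights=w[~treated])
    print(f"{name}: population={pop:.3f} treated={in_treated:.3f} control={in_control:.3f}")
```

Under these assumptions, the weighted confounder means in both exposure groups match the population means, which is what targeting the population average effect requires.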

In the eAppendix, we describe our method. Recent work shows that observations receive different weights when deriving a regression coefficient.5–8 Our method uses weights from a matrix representation of a regression. For a binary exposure, the control group’s weights will sum to −1 and the treatment group’s weights to 1. When applied to the outcome, the two sets of weights compare two counterfactual means: everyone exposed to the treatment and everyone not exposed. Their sum is the difference between these means, the average causal effect. In the Table, the column “no weight” shows the observed distribution of two confounders, Z and Q, and their interaction (which is modeled), both in the total sample and over the exposure, X. However, the outcome regression does not balance at the population mean. Thus, the outcome regression is not estimating the population average effect but an average effect for a different population. The IPW does balance at the population mean, confirming that this is a suitable approach for estimating the population average effect. In contrast to the outcome regression, the IPW allows for the interaction between the exposure and confounders when calculating the average effect. We believe that our approach can be generalized to all exposure and confounder types, as well as to other types of outcome covered by generalized linear modeling (e.g., logistic regression). This is because a different approach to assessing balance in an outcome regression that follows the same logic as ours has been generalized.8 Confirming that our approach can be generalized requires further work.
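
The weighting logic described above can be sketched in Python. The eAppendix is not reproduced here, so this sketch assumes that the "matrix representation" means the ordinary least squares identity: the exposure coefficient equals the exposure row of (D'D)^{-1}D' applied to the outcome, where D is the design matrix. The simulated data and all variable names are illustrative assumptions. This is not the authors' code or their example data set.

```python
import numpy as np

# Simulated data (an assumption, not the example data set from the letter).
rng = np.random.default_rng(2)
n = 5000
z = rng.binomial(1, 0.4, n)                    # binary confounder Z
q = rng.binomial(1, 0.5, n)                    # binary confounder Q
x = rng.binomial(1, 0.1 + 0.3 * z + 0.2 * q)   # binary exposure X depends on Z and Q
# Continuous outcome with a heterogeneous effect of X (it varies with Z).
y = 1.0 + 2.0 * x + 1.5 * z - 1.0 * q + 3.0 * x * z + rng.normal(size=n)

# Outcome regression design: intercept, exposure, confounders, and the
# confounder interaction, but no exposure-by-confounder interaction.
design = np.column_stack([np.ones(n), x, z, q, z * q])

# Row 1 of (D'D)^{-1} D' holds one weight per observation; applying those
# weights to y returns the exposure coefficient.
weights = np.linalg.solve(design.T @ design, design.T)[1]
coef_from_weights = weights @ y
coef_from_ols = np.linalg.lstsq(design, y, rcond=None)[0][1]

treated = x == 1
print("sum of treated weights:", weights[treated].sum())
print("sum of control weights:", weights[~treated].sum())
print("coefficient via weights:", coef_from_weights, "via OLS:", coef_from_ols)

# Confounder means implied by the regression weights in each exposure group.
# The control weights are negated so that each group's weights sum to 1.
for name, c in [("Z", z), ("Q", q), ("Z*Q", z * q)]:
    in_treated = np.sum(weights[treated] * c[treated])
    in_control = np.sum(-weights[~treated] * c[~treated])
    print(f"{name}: population={c.mean():.3f} treated={in_treated:.3f} control={in_control:.3f}")
```

In this sketch the two groups are balanced with each other on every modeled confounder term, but the common value need not equal the population mean, which is the pattern the letter reports for the outcome regression.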




F.P. thanks colleagues who patiently listened to early iterations of this idea and gave feedback. Although not referenced in the paper, the following blog post taught F.P. a great deal about matrix transformation: Gould W. (2011). Understanding matrices intuitively, part 1. The Stata Blog: Not Elsewhere Classified.

Frank Popham

Alastair H. Leyland

MRC/CSO Social and Public Health Sciences Unit

University of Glasgow

Glasgow, Scotland


1. Hernán M, Robins J. Causal Inference Book. Boca Raton, FL: Chapman Hall/CRC. (forthcoming). Available at:
2. Li F, Morgan KL, Zaslavsky AM. Balancing covariates via propensity score weighting. J Am Stat Assoc. 2017. Available at:
3. Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med. 2015;34:3661–3679.
4. Zubizarreta JR. Stable weights that balance covariates for estimation with incomplete outcome data. J Am Stat Assoc. 2015;110:910–922.
5. Abadie A, Diamond A, Hainmueller J. Comparative politics and the synthetic control method. Am J Polit Sci. 2015;59:495–510.
6. Brown JD. Linear Models in Matrix Form: A Hands-On Approach for the Behavioral Sciences. London: Springer; 2014.
7. Imbens GW. Matching methods in practice: three examples. J Human Res. 2015;50:373–419.
8. Aronow PM, Samii C. Does regression produce representative estimates of causal effects? Am J Polit Sci. 2016;60:250–267.

Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved.