Variable Selection for Propensity Score Estimation via Balancing Covariates

Zhu, Yeying; Schonbach, Maya; Coffman, Donna L.; Williams, Jennifer S.

doi: 10.1097/EDE.0000000000000237
Letters

Department of Statistics and Actuarial Science University of Waterloo Waterloo, ON, Canada yeying.zhu@uwaterloo.ca

The Methodology Center Pennsylvania State University University Park, PA

The Center for Childhood Obesity Research Pennsylvania State University University Park, PA

Disclosure: The authors report no conflict of interest.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com). This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.

To the Editor:

Recently, several new approaches have been proposed for estimating propensity scores by achieving balance in the covariates. The philosophy is that by achieving balance, the bias in the estimated causal treatment effect due to measured covariates can be reduced.1 In this study, we focus on two approaches in this class: the generalized boosted model2 and the covariate balancing propensity score.3 For both approaches, the estimation depends on the covariates that we aim to balance. The traditional belief is that we should obtain balance on all the available covariates in a study.4 However, will including covariates that are not real confounders increase the variance of the causal estimator? Should we also include covariates that are related only to the treatment assignment?
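As a rough illustration of what "balance" means here, the sketch below computes a weighted standardized mean difference for one covariate between treated and untreated subjects. It is not taken from the letter or its eAppendix: the function names (ipw_weights, weighted_mean, standardized_mean_difference) are hypothetical, and the denominator (the pooled unweighted standard deviation) is one common convention among several.

```python
import math
import statistics


def ipw_weights(t, ps):
    """Inverse probability weights: 1/ps for treated, 1/(1 - ps) for untreated."""
    return [1.0 / p if ti == 1 else 1.0 / (1.0 - p) for ti, p in zip(t, ps)]


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(x, t, weights=None):
    """Weighted difference in covariate means (treated minus untreated),
    divided by the pooled unweighted standard deviation (an assumed convention)."""
    if weights is None:
        weights = [1.0] * len(x)
    x1 = [xi for xi, ti in zip(x, t) if ti == 1]
    x0 = [xi for xi, ti in zip(x, t) if ti == 0]
    w1 = [wi for wi, ti in zip(weights, t) if ti == 1]
    w0 = [wi for wi, ti in zip(weights, t) if ti == 0]
    pooled_sd = math.sqrt((statistics.variance(x1) + statistics.variance(x0)) / 2.0)
    return (weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd


# Toy data: the covariate is clearly imbalanced across treatment groups.
x = [1.0, 2.0, 3.0, 4.0]
t = [1, 1, 0, 0]
smd_unweighted = standardized_mean_difference(x, t)
# Constant propensity scores give constant weights, so the imbalance is unchanged.
smd_weighted = standardized_mean_difference(x, t, ipw_weights(t, [0.5] * 4))
```

A value near zero indicates that the weighted covariate means are similar in the two groups. The question raised in this letter is which covariates should be held to that standard.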

To investigate which set of covariates should be included in the balancing condition, we conduct a simulation study following Brookhart et al.5 We first generate three covariates, (X1, X2, X3), from a standard normal distribution. Then, the treatment indicator T is generated from a Bernoulli distribution and the outcome variable Y is generated from a Poisson model with the true treatment effect α = 0.5 (details in the eAppendix, http://links.lww.com/EDE/A868). Based on the simulation setup, X1 is the real confounder that is jointly related to the treatment and the outcome variable; X3 is related only to the treatment variable and X2 is related only to the outcome variable.
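The data-generating structure described above can be sketched as follows. The exact specification is in the eAppendix and is not reproduced here, so the probit form of the treatment model, all coefficient values other than α = 0.5, the intercept, and the function names are assumptions made only for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's method (adequate for modest means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_dataset(n, alpha=0.5, seed=0):
    """Return n rows (x1, x2, x3, t, y).

    X1 affects treatment and outcome (confounder), X2 affects only the
    outcome, and X3 affects only the treatment. Coefficients are hypothetical.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2, x3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        # Assumed probit treatment model in X1 and X3.
        p_treat = normal_cdf(0.5 * x1 + 0.75 * x3)
        t = 1 if rng.random() < p_treat else 0
        # Assumed log-linear Poisson outcome model in T, X1, and X2.
        mean_y = math.exp(0.5 + alpha * t + 0.5 * x1 + 0.5 * x2)
        y = poisson_draw(rng, mean_y)
        rows.append((x1, x2, x3, t, y))
    return rows


data = simulate_dataset(500, seed=1)
```

Because X3 enters only the treatment model and X2 only the outcome model, a dataset generated this way allows a direct comparison of balancing conditions that include or exclude each covariate.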

We employ two approaches to estimate the treatment effect: inverse probability weighting and matching (details in the eAppendix, http://links.lww.com/EDE/A868). We generate 1,000 datasets at each of two sample sizes, n = 500 and n = 2,500. We record the bias, variance, and mean squared error of the estimated treatment effect, α̂. The results for covariate balancing propensity scores are displayed in the Table, and the results for the generalized boosted model are displayed in eTable 1 in the eAppendix (http://links.lww.com/EDE/A868). In both tables, the reference model for estimating the propensity scores is the probit model with X1 and X2 as the covariates. We choose this model because Brookhart et al.5 found that it leads to the smallest variances and mean squared errors among all possible probit models for estimating the propensity scores.
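For readers who want to see the mechanics, the sketch below shows one way an inverse probability weighted estimate and the three Monte Carlo summaries could be computed. It is not the authors' implementation (that is described in the eAppendix). It assumes the treatment effect is expressed as a log ratio of weighted mean outcomes, which matches a log-linear Poisson outcome model, and the function names ipw_log_rate_ratio and summarize are hypothetical.

```python
import math
import statistics


def ipw_log_rate_ratio(t, y, ps):
    """Normalized IPW estimate of log(E[Y(1)] / E[Y(0)]).

    t: 0/1 treatment indicators, y: outcomes, ps: propensity scores.
    Assumes both weighted means are positive, so the log is defined.
    """
    w1 = [ti / p for ti, p in zip(t, ps)]
    w0 = [(1 - ti) / (1.0 - p) for ti, p in zip(t, ps)]
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return math.log(mean1 / mean0)


def summarize(estimates, truth):
    """Bias, variance, and mean squared error across simulated datasets."""
    bias = statistics.fmean(estimates) - truth
    # Population variance, so that mse equals bias**2 + variance.
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean((e - truth) ** 2 for e in estimates)
    return bias, variance, mse


# Toy check: equal propensity scores, treated mean 4, untreated mean 2.
estimate = ipw_log_rate_ratio([1, 1, 0, 0], [4, 4, 2, 2], [0.5] * 4)  # log(2)

# Two hypothetical replicate estimates around a true effect of 0.5.
bias, variance, mse = summarize([0.4, 0.6], truth=0.5)
```

In a full simulation, ipw_log_rate_ratio would be applied to each of the 1,000 datasets with propensity scores estimated under a given set of balancing conditions, and summarize would then be called on the resulting list of estimates.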

From the simulation results, we find that the best propensity score model is the one with X1 and X2 in the balancing conditions. Placing an additional balancing condition on X3 leads to increased variance and mean squared error. For the inverse probability weighting and matching estimators, it also increases the bias of the causal estimates in most cases. Compared with covariate balancing propensity scores, the generalized boosted model generally has larger biases but smaller variances and smaller mean squared errors. This set of simulations is limited in that the setup includes only three covariates. In practice, to guard against unmeasured confounding, researchers usually collect information on a large number of covariates. The generalized boosted model tends to perform better when there is a large number of covariates because it can automatically perform variable selection without specifying a parametric model.6 In summary, the simulation results indicate that for both approaches, we should aim to achieve balance on real confounders, as well as on covariates that are related only to the outcome variable.

Finally, this study is also in line with Austin et al.4 and Stuart et al.7 The former compares several propensity score models by evaluating their ability to balance all available covariates in the study. The latter compares balance statistics in terms of their ability to remove bias. The focus of this study, however, is to investigate which set of covariates should be included in these evaluation procedures.

Yeying Zhu

Maya Schonbach

Department of Statistics and Actuarial Science

University of Waterloo

Waterloo, ON, Canada

yeying.zhu@uwaterloo.ca

Donna L. Coffman

The Methodology Center

Pennsylvania State University

University Park, PA

Jennifer S. Williams

The Center for Childhood Obesity Research

Pennsylvania State University

University Park, PA

REFERENCES

1. Harder VS, Stuart EA, Anthony JC. Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychol Methods. 2010;15:234–249.
2. McCaffrey DF, Ridgeway G, Morral AR. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods. 2004;9:403–425.
3. Imai K, Ratkovic M. Covariate balancing propensity score. J R Stat Soc Series B Stat Methodol. 2014;76(1):243–263.
4. Austin PC, Grootendorst P, Anderson GM. A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: a Monte Carlo study. Stat Med. 2007;26:734–753.
5. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol. 2006;163:1149–1156.
6. Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med. 2010;29:337–346.
7. Stuart EA, Lee BK, Leacy FP. Prognostic score-based balance measures can be a useful diagnostic for propensity score methods in comparative effectiveness research. J Clin Epidemiol. 2013;66(8 Suppl):S84–S90.e1

Copyright © 2015 Wolters Kluwer Health, Inc. All rights reserved.