Using Super Learner Prediction Modeling to Improve High-Dimensional Propensity Score Estimation

Wyss, Richard; Schneeweiss, Sebastian; van der Laan, Mark; Lendle, Sam D.; Ju, Cheng; Franklin, Jessica M.
doi: 10.1097/EDE.0000000000000762
Original Article: PDF Only

The high-dimensional propensity score is a semi-automated variable selection algorithm that can supplement expert knowledge to improve confounding control in non-experimental medical studies utilizing electronic healthcare databases. While the algorithm can be used to generate hundreds of patient-level variables and rank them by their potential confounding impact, it remains unclear how to select the optimal number of variables for adjustment. Super Learner and collaborative targeted maximum likelihood estimation (collaborative targeted MLE) are tools for prediction modeling and causal inference that can be combined with the high-dimensional propensity score to improve propensity score estimation and confounding control in large healthcare databases. We used plasmode simulations based on empirical data to evaluate the performance of combining the high-dimensional propensity score with Super Learner and a scalable version of collaborative targeted MLE. We evaluated performance using bias and mean squared error (MSE) in effect estimates. Results showed that the high-dimensional propensity score can be sensitive to the number of variables included for adjustment and that severe overfitting of the propensity score model can negatively impact the properties of effect estimates. Combining the high-dimensional propensity score with the scalable version of collaborative targeted MLE performed well for many of the scenarios considered, but was sensitive to the parameter specifications within the algorithm. Combining the high-dimensional propensity score with Super Learner was the most consistent strategy, in terms of reducing bias and MSE in the effect estimates, and may be promising for semi-automated data-adaptive propensity score estimation in high-dimensional covariate datasets.
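The two evaluation metrics named above, bias and MSE in effect estimates, can be illustrated with a minimal Python sketch. This is not the authors' code (their simulation code is in R), and the function name `bias_and_mse` and the example values are hypothetical. It assumes that each simulated dataset yields one effect estimate and that the true effect is known by construction, as it is in a plasmode simulation.

```python
from statistics import fmean, pvariance


def bias_and_mse(estimates, true_effect):
    """Summarize effect estimates from repeated simulations.

    estimates: one effect estimate per simulated dataset (hypothetical input).
    true_effect: the effect value built into the simulation.
    Returns (bias, variance, mse).
    """
    if len(estimates) == 0:
        raise ValueError("estimates must be non-empty")
    # Bias: average estimate minus the known true effect.
    bias = fmean(estimates) - true_effect
    # Population variance of the estimates across simulations.
    variance = pvariance(estimates)
    # MSE: average squared distance from the true effect.
    mse = fmean([(e - true_effect) ** 2 for e in estimates])
    return bias, variance, mse


# Hypothetical estimates from three simulated datasets, true effect of 0.
bias, variance, mse = bias_and_mse([0.10, 0.20, 0.30], true_effect=0.0)
```

With the population variance used here, MSE equals squared bias plus variance, so an overfit propensity score model can raise MSE through either term.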

Code availability: Software for the methods discussed in the manuscript is available at

R code for producing plasmode simulations is available upon request.

Financial support: This work was funded by PCORI contract ME-1303-5638.

Conflicts of interest: none declared.

* Corresponding author. Contact details: Richard Wyss, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital and Harvard Medical School, 1620 Tremont St. Suite 3030, Boston, MA 02120, USA. Email:, Phone: +1 617 278 0627.

Copyright © 2017 Wolters Kluwer Health, Inc. All rights reserved.