Commentary: Balancing Automated Procedures for Confounding Control with Background Knowledge

Wyss, Richard; Stürmer, Til

doi: 10.1097/EDE.0000000000000068
Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC.

T.S. receives investigator-initiated research funding and support as Principal Investigator (R01 AG023178) and Co-Investigator (R01 AG042845) from the National Institute on Aging (NIA), as Co-Investigator (R01 CA174453) from the National Cancer Institute (NCI) at the National Institutes of Health (NIH), and as Principal Investigator of a Pilot Project from the Patient Centered Outcomes Research Institute (PCORI). He also received research funding as Principal Investigator of the UNC-DEcIDE center from the Agency for Healthcare Research and Quality. T.S. does not accept personal compensation of any kind from any pharmaceutical company, though he receives salary support from the Center for Pharmacoepidemiology and research support from pharmaceutical companies (Amgen, Genentech, GlaxoSmithKline, Merck, UCB) to the Department of Epidemiology, University of North Carolina at Chapel Hill. R.W. has no conflicts to report.

Editors’ note: A related article appears on page 268.

Correspondence: Til Stürmer, Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, McGavran-Greenberg, CB # 7435, Chapel Hill, NC 27599-7435. E-mail:

In this issue of EPIDEMIOLOGY, Patorno et al1 illustrate the importance of using subject-matter knowledge to complement the automated high-dimensional propensity-score algorithm when controlling for confounding in studies based on claims data with few exposed outcomes.

The topic of variable selection for propensity-score models in settings involving large numbers of potential confounders has received considerable attention in recent years. This interest is due in part to the uncertainty in determining what role automated procedures should play in the variable-selection process. With large healthcare databases becoming increasingly used in epidemiology,2–4 automated procedures can be beneficial in selecting potential confounders that are unknown to the investigator.5–7 Furthermore, the application of automated procedures is likely to expand with increased attention to safety surveillance as part of the Food and Drug Administration’s Sentinel Initiative.8 In these settings, automated procedures such as the high-dimensional propensity score can increase the speed and efficiency of active surveillance.7 With this increasing need for automated methods for confounding control, the question becomes: how should investigators balance automated procedures with the use of subject-matter knowledge?

Automated procedures can be beneficial in identifying empirical associations among large numbers of covariates. Still, empirical associations and probability distributions by themselves are not sufficient to determine causal relations.9–13 In a recent commentary, Pearl13 explains that probability distributions of observed variables cannot completely characterize causal relations and that “every exercise in causal inference must commence with some extra knowledge that cannot be expressed in probability alone.” On the topic of variable selection, many authors have argued that the identification of confounding variables should be grounded in previous substantive knowledge.9–11 In discussing this issue, Hernán et al11 have emphasized that “causal inference from observational data requires previous causal assumptions or beliefs, which must be derived from subject-matter knowledge, not from statistical associations detected in the data.”

Although it is easy to acknowledge the theoretical limitations of using empirical associations to identify causal relations, the practical consequences of these limitations are less clear. For example, two potential obstacles for automated variable-selection procedures are bias amplification, caused by controlling for instrumental variables, and collider-stratification bias (eg, M-bias). Although the negative effects of these biases are obvious in theory, their impact in practice is harder to pin down. Recent simulation studies have examined the magnitude of these biases in several practical settings and have shown that such increases in bias are generally small compared with the bias resulting from the exclusion of confounding variables.14,15 On the basis of their results, Myers et al14 and Liu et al15 recommend that controlling for confounding take precedence over avoiding bias amplification14 or M-bias.15 Although we agree, it is important to recognize that there are scenarios where collider-stratification bias and, in particular, bias amplification can be substantial.
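To make the M-bias structure concrete, the following is a minimal simulation sketch and not a reproduction of the cited simulation studies. It assumes a linear model with standard normal variables, a continuous treatment, and unit path coefficients: two unmeasured causes, u1 and u2, affect the treatment and the outcome, respectively, and both affect a measured pretreatment covariate m. All variable names, coefficients, and the sample size are illustrative assumptions. There is no confounding and the true treatment effect is set to zero, so any nonzero estimate is bias.

```python
import random


def slope(x, y):
    """Ordinary least squares slope of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residualize(x, z):
    """Residuals of x after a least squares regression on z."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    b = slope(z, x)
    return [xi - mx - b * (zi - mz) for xi, zi in zip(x, z)]


def simulate_m_bias(n=50_000, true_effect=0.0, seed=1):
    """Return (crude bias, bias after adjusting for the collider m)."""
    rng = random.Random(seed)
    u1 = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured cause of treatment
    u2 = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured cause of outcome
    # m is a collider on the path treatment <- u1 -> m <- u2 -> outcome
    m = [a + b + rng.gauss(0, 1) for a, b in zip(u1, u2)]
    x = [a + rng.gauss(0, 1) for a in u1]  # treatment (assumed continuous)
    y = [true_effect * xi + b + rng.gauss(0, 1) for xi, b in zip(x, u2)]

    crude = slope(x, y)
    # Frisch-Waugh-Lovell: the coefficient of x in a regression of y on
    # (x, m) equals the slope of y on x residualized on m.
    adjusted = slope(residualize(x, m), y)
    return crude - true_effect, adjusted - true_effect


bias_crude, bias_adjusted = simulate_m_bias()
print(f"bias without adjustment:   {bias_crude:+.3f}")
print(f"bias after adjusting for m: {bias_adjusted:+.3f}")
```

Under these assumed coefficients the crude estimate is unbiased in large samples, whereas adjusting for m gives a large-sample bias of -0.2. The magnitude depends entirely on the assumed path strengths; weaker paths through m give a smaller bias, which is consistent with the finding that M-bias is generally small relative to uncontrolled confounding.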

In his commentary on the study by Myers et al,14 Pearl16 describes theoretical situations where the cumulative effect of conditioning on multiple variables with strong effects on treatment but weak effects on the outcome (ie, near-instruments) can result in bias amplification that is more pronounced than the bias reduction obtained from controlling for those variables. More specifically, Pearl16 states that “the cumulative effect of sequential conditioning has a built-in slant toward bias amplification as compared with confounding reduction; the latter is tempered by sign cancellations, the former is not.” Published empirical examples that demonstrate bias amplification or collider-stratification bias are rare but do exist. One such study by Patrick et al17 reported that including a glaucoma diagnosis covariate (a strong predictor of receiving the comparator drug, an antiglaucoma medication) in the propensity score resulted in a substantial increase in bias when estimating the effect of statins on mortality and hip fracture.
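The mechanism of bias amplification can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis of any cited study: a linear model with standard normal variables, a continuous treatment x, one unmeasured confounder u, and one pure instrument z that affects treatment only. The coefficients a, b, and c, the sample size, and all names are hypothetical choices made for illustration. The true treatment effect is zero, so the estimated slope equals the bias.

```python
import random


def slope(x, y):
    """Ordinary least squares slope of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residualize(x, z):
    """Residuals of x after a least squares regression on z."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    b = slope(z, x)
    return [xi - mx - b * (zi - mz) for xi, zi in zip(x, z)]


def simulate_amplification(n=50_000, a=2.0, b=1.0, c=1.0,
                           true_effect=0.0, seed=0):
    """Return (crude bias, bias after adjusting for the instrument z)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: affects x only
    u = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured confounder
    x = [a * zi + b * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [true_effect * xi + c * ui + rng.gauss(0, 1)
         for xi, ui in zip(x, u)]

    crude = slope(x, y)
    # Frisch-Waugh-Lovell: the coefficient of x in a regression of y on
    # (x, z) equals the slope of y on x residualized on z.
    adjusted = slope(residualize(x, z), y)
    return crude - true_effect, adjusted - true_effect


bias_crude, bias_adjusted = simulate_amplification()
print(f"bias without adjustment:   {bias_crude:+.3f}")
print(f"bias after adjusting for z: {bias_adjusted:+.3f}")
```

In this assumed model the large-sample bias is bc/(a² + b² + 1) = 1/6 without adjustment and bc/(b² + 1) = 1/2 after adjusting for z. Conditioning on the instrument removes treatment variation that is unrelated to u, so the confounded share of the remaining variation grows. A stronger instrument (larger a) widens the gap between the two estimates.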

Although bias amplification and collider-stratification bias can be substantial in specific settings, there is little empirical evidence of these obstacles having a major impact on the automated high-dimensional propensity-score algorithm. Several studies have demonstrated that this score, when used to complement investigator-specified covariate adjustment, often improves confounding control and performs no worse than investigator-specified approaches by themselves.5–7,18 In discussing the high-dimensional propensity score for confounding adjustment in large healthcare databases, Rassen and Schneeweiss7 concluded that “any confounding bias will likely be greater in magnitude than collider bias” and “an automated confounding adjustment system that selects a large number of covariates, even with somewhat imperfect variable selection, should improve study validity far more than it will harm it.”

Although we tend to agree, investigators should be aware of the potential limitations of bias amplification and collider-stratification bias and the potential for these biases to affect the performance of the propensity score in some circumstances. Researchers should further be aware of the possibility for additional limitations to arise as the high-dimensional propensity score is applied in settings where its performance has not been well established. For example, recent studies have described situations involving few exposed events where the performance of the automated high-dimensional propensity-score algorithm may be limited.6,7,18

The theoretical limitations outlined above imply that automated procedures for confounding control are not optimal in every situation. Indeed, there are published examples where the high-dimensional propensity score, when used as a replacement for investigator-specified covariate adjustment, increased bias compared with covariate-adjustment procedures that incorporate expert knowledge.1,19 In the study by Toh et al19 based on a U.K. electronic medical record database, one of the key points was that “compared with adjustment for investigator-identified variables only, adjustment using the high-dimensional propensity-score algorithm (with only age and sex included a priori) was closer to the unadjusted estimate.”

The study by Patorno et al1 is the first to empirically demonstrate within claims data that the high-dimensional propensity score can perform worse than investigator-specified covariate adjustment when used in the absence of a prespecified set of investigator-selected variables. Their results highlight the problem that the high-dimensional propensity score can be especially vulnerable in settings with few exposed outcomes. These findings are valuable contributions to the literature, as they help us understand settings in which the theoretical limitations of automated procedures in the absence of investigator input can have practical consequences and should thus be avoided.

As the need for automated procedures for confounding control grows, researchers will continue to apply the high-dimensional propensity score in novel studies and new areas where its performance is not well established. It is important to recognize that these can include situations where this propensity score (and automated procedures in general) can perform poorly in both theory and practice in the absence of investigator input. We therefore emphasize the importance of using substantive knowledge to obtain an understanding of the data and the underlying causal structure before applying automated procedures for confounding control.10 Although the high-dimensional propensity score is a valuable tool in helping researchers control for large numbers of potential confounders, automated procedures cannot replace background knowledge. As emphasized by Patorno et al,1 such methods should complement investigator input, not replace it.


RICHARD WYSS is a doctoral student in the Department of Epidemiology, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill. His thesis addresses applications of propensity scores and disease-risk scores in pharmacoepidemiology. TIL STÜRMER is a Professor and Director of the Center for Pharmacoepidemiology in the same Department. He has conducted extensive research on the application of propensity scores and preventive drug use in the elderly.


1. Patorno E, Glynn RJ, Hernandez-Diaz S, Liu J, Schneeweiss S. Studies with many covariates and few outcomes: selecting covariates and implementing propensity-score-based confounding adjustments. Epidemiology. 2014;25:268–278
2. Ray WA. Population-based studies of adverse drug effects. N Engl J Med. 2003;365:475–481
3. Arbogast PG, Ray WA. Use of disease risk scores in pharmacoepidemiologic studies. Stat Methods Med Res. 2009;18:67–80
4. Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol. 2005;58:323–337
5. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20:512–522
6. Rassen JA, Glynn RJ, Brookhart MA, Schneeweiss S. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples. Am J Epidemiol. 2011;173:1404–1413
7. Rassen JA, Schneeweiss S. Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):41–49
8. Platt R, Wilson M, Chan KA, Benner JS, Marchibroda J, McClellan M. The new Sentinel Network—improving the evidence of medical-product safety. N Engl J Med. 2009;361:645–647
9. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48
10. Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology. 2001;12:313–320
11. Hernán MA, Hernández-Díaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002;155:176–184
12. Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. New York, NY: Cambridge University Press; 2009
13. Pearl J. Comment on ‘Causal inference, probability theory, and graphical insights’ by Stuart G. Baker. Stat Med. 2013;32:4331–4333
14. Myers JA, Rassen JA, Gagne JJ, et al. Effects of adjusting for instrumental variables on bias and precision of effect estimates. Am J Epidemiol. 2011;174:1213–1222
15. Liu W, Brookhart MA, Schneeweiss S, Mi X, Setoguchi S. Implications of M bias in epidemiologic studies: a simulation study. Am J Epidemiol. 2012;176:938–948
16. Pearl J. Invited commentary: understanding bias amplification. Am J Epidemiol. 2011;174:1223–1228
17. Patrick AR, Schneeweiss S, Brookhart MA, et al. The implications of propensity score variable selection strategies in pharmacoepidemiology: an empirical illustration. Pharmacoepidemiol Drug Saf. 2011;20:551–559
18. Rassen J, Schneeweiss S. Letter to the editor. Pharmacoepidemiol Drug Saf. 2011;20:1110–1111
19. Toh S, García Rodríguez LA, Hernán MA. Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records. Pharmacoepidemiol Drug Saf. 2011;20:849–857
© 2014 by Lippincott Williams & Wilkins, Inc