From the Division of Biostatistics, Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, PA.
Correspondence: Marshall M. Joffe, Division of Biostatistics, Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, 602 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104-6021. E-mail: email@example.com.
Behind every analysis of observational data lurks the specter of inadequate control of confounding. Inadequate control can result from inadequate measurement of confounding variables, from errors in measurement or classification of those variables, from incorrect assumptions about the nature of relationships among variables, and from laziness and exhaustion. Analysts faced with hundreds or thousands of measured covariates that potentially confound the association of the study treatment or exposure with the outcome may throw up their hands in desperation; one who tries to use the usual advice for building and checking models will quickly get exhausted.
Automating the process of model building thus becomes an appealing option. But how can one automate model building in a way that does a good job of controlling confounding? The problem becomes particularly acute when one is using propensity scores to control confounding; here, modeling the treatment process accurately is not an end in itself, and, in terms of efficiency and even bias, it can be harmful to control for covariates that predict the treatment but not the outcome.1
Automated algorithms have been developed for several purposes: to discover causal structure, to build well-fitting predictive models, and to control for confounding. Algorithms to discover causal structure can, in principle, sometimes determine whether a covariate is affected by treatment, and so should not be controlled, or is a pretreatment covariate that may need to be controlled.2 Unfortunately, these algorithms may be of limited use in observational epidemiology.3
There is a large literature on automated algorithms for predictive models, both in statistics and machine learning.4 Criteria for evaluating models may relate to measures of predictive ability (eg, R²), or to calibration or discrimination. Many algorithms are designed to try to optimize these measures, and use these measures explicitly in model building and selection. These criteria are based on functions of observable quantities; thus, cross-validation plays a key role, as random subsets of the observed data may be used to evaluate rules derived from other random subsets.
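The role of cross-validation described above can be illustrated with a minimal sketch. It is not drawn from any of the cited works: the simulated data, the two candidate prediction rules (`fit_mean` and `fit_line`), and the choice of 5 folds are all assumptions made purely for illustration. Each rule is derived from one random subset of the data and evaluated, by mean squared error, on the held-out remainder.

```python
import random


def fit_mean(xs, ys):
    """Prediction rule that ignores x and always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Prediction rule from simple least-squares regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_validated_mse(xs, ys, fit, k=5, seed=0):
    """Average squared prediction error over k held-out folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    squared_error = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # The rule is derived only from the training subset ...
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        # ... and evaluated only on the observations it has not seen.
        squared_error += sum((ys[i] - predict(xs[i])) ** 2 for i in held_out)
    return squared_error / len(xs)


# Simulated data (illustrative assumption): y depends linearly on x plus noise.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

mse_mean = cross_validated_mse(xs, ys, fit_mean)
mse_line = cross_validated_mse(xs, ys, fit_line)
print(f"CV mean squared error, mean-only rule: {mse_mean:.2f}")
print(f"CV mean squared error, linear rule:    {mse_line:.2f}")
```

Both error estimates are functions of observed outcomes only, which is what makes this kind of empirical comparison possible for prediction.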
Controlling confounding presents a harder problem, because the purpose of the modeling relates to unobservable or counterfactual outcomes.5–7 Thus, evaluation of the performance of a modeling strategy using empirical data will rely on assumptions external to the data; in this way, it is similar to controlling confounding using observational data. The assumptions can relate to the internal data at hand or to external data. Internal assumptions might be that identified measured covariates are sufficient for controlling confounding; this assumption seems implicit in the process used for model and variable selection by Schneeweiss et al8 in this issue of Epidemiology. External assumptions might include an assumption that other studies, especially randomized trials, provide less biased estimates of the same quantities; this also seems to be assumed by Schneeweiss et al. Although there are reasons this might not be true, including differences between the populations used in the randomized studies and the observational studies at hand, it is plausible that this is a good working approximation.
Compared with some other approaches for model selection in propensity score models,9 the approach of Schneeweiss et al is presented with little formal or theoretical justification. Nonetheless, on the whole, the algorithm seems reasonable. Formal methods based on minimizing some criterion can be somewhat problematic when using traditional propensity score adjustments with logistic or proportional hazards regression models, because, due to noncollapsibility of the measures of effect, the true parameter of interest changes with the variables included in the propensity score model.9–11 This noncollapsibility may be unimportant if the outcome is rare, as is the case in the examples provided.
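Noncollapsibility, and the reason it matters less for rare outcomes, can be seen in a small numerical sketch. The setup is a hypothetical one chosen for illustration and is not taken from Schneeweiss et al: a single binary covariate Z that is independent of treatment (so there is no confounding), a logistic outcome model, and arbitrary coefficient values. The odds ratio for treatment conditional on Z is the same in both strata, yet the odds ratio obtained after averaging over Z differs from it when the outcome is common.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p):
    return p / (1.0 - p)


def marginal_odds_ratio(intercept, log_or_treatment, log_or_covariate, p_z=0.5):
    """Marginal odds ratio for treatment, averaging risk over a binary Z.

    Assumes logit P(Y=1 | A, Z) = intercept + log_or_treatment * A
    + log_or_covariate * Z, with Z independent of treatment A.
    """
    def risk(a):
        base = intercept + log_or_treatment * a
        return (1 - p_z) * expit(base) + p_z * expit(base + log_or_covariate)

    return odds(risk(1)) / odds(risk(0))


# Illustrative values: the conditional odds ratio is exp(1) in every stratum of Z.
conditional_or = math.exp(1.0)

# Common outcome: the marginal odds ratio is pulled toward the null.
common_outcome_or = marginal_odds_ratio(-1.0, 1.0, 3.0)

# Rare outcome: the marginal and conditional odds ratios nearly coincide.
rare_outcome_or = marginal_odds_ratio(-8.0, 1.0, 3.0)

print(f"conditional OR:             {conditional_or:.3f}")
print(f"marginal OR, common outcome: {common_outcome_or:.3f}")
print(f"marginal OR, rare outcome:   {rare_outcome_or:.3f}")
```

Because Z does not confound the treatment effect here, the gap between the two odds ratios in the common-outcome case is not bias; the two quantities are simply different parameters, which is why a criterion that compares estimates across adjustment sets can be aiming at a moving target.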
It is intriguing that in each of the 3 examples, the authors' algorithm produces estimates closer to those expected on the basis of randomized trials or other external knowledge than do simpler adjustment methods using fewer covariates. Although suggestive, this is not fully convincing evidence of the utility of their approach, as the authors acknowledge. These results might be viewed as a meta-study with a sample size of 3; amassing more empirical evidence (by increasing sample size) is clearly a difficult and time-consuming process. Further, there is moderate imprecision in the estimates in each of the studies, and so it is hard to know definitively whether the closer approximation of their results to expectations is due to better performance of their methods in reducing bias.
In summary, Schneeweiss et al provide a reasonable-looking algorithm and intriguing if limited evidence for its utility in reducing bias. Despite the difficulty of obtaining empirical evidence of an algorithm's benefit, the use of this algorithm, and other automated algorithms, is to be encouraged. With high-dimensional covariate data, model selection by hand in a reasonable way is an impossible task that is too often attempted in epidemiology. Just as automation was required for the mass production of industrial goods, it will be required for the mass consideration of large numbers of high-dimensional confounders. Schneeweiss et al are to be congratulated for moving our field in this direction.
ABOUT THE AUTHOR
MARSHALL JOFFE is Associate Professor of Biostatistics at the University of Pennsylvania. His independent research concentrates on statistical methods for causal inference, and he has concentrated on methods for controlling for confounding in general and confounding by indication in particular. He collaborates extensively in nephrology and other fields.
1. Brookhart MA, Schneeweiss S, Rothman K, Glynn RJ, Avorn J, Sturmer T. Variable selection in propensity score models. Am J Epidemiol
2. Spirtes P, Glymour C, Scheines R. Causation, Prediction, and Search. 2nd ed. Cambridge, MA: MIT Press; 2000.
3. Robins JM, Wasserman L. On the impossibility of inferring causation from association without background knowledge. In: Glymour C, Cooper G, eds. Computation, Causation, and Discovery. Menlo Park, CA: AAAI Press/The MIT Press; 1999:305–321.
4. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer; 2001.
5. Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. Int J Epidemiol
6. Neyman J. On the application of probability theory to agricultural experiments. Essay on principles. Translated by D.M. Dabrowska and edited by T. P. Speed. Statist Sci
7. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol
8. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology
9. Brookhart MA, van der Laan MJ. A semiparametric model selection criterion with applications to the marginal structural model. Comput Stat Data Anal
10. Greenland S. Interpretation and choice of effect measures in epidemiologic analyses. Am J Epidemiol
11. Greenland S, Robins J, Pearl J. Confounding and collapsibility in causal inference. Statist Sci