In the article titled “High-dimensional Propensity Score Adjustment in Studies of Treatment Effects Using Health Care Claims Data,” the authors introduce a semiautomated variable selection algorithm for high-dimensional proxy adjustment within health care insurance claims databases.1 The high-dimensional propensity score (HDPS) algorithm evaluates thousands of diagnostic, procedural, and medication claims codes and, for each code, generates binary variables based on how frequently the code occurs during a defined pre-exposure covariate assessment period. The HDPS then ranks each variable by its potential for bias, assessing the variable’s prevalence and its univariate associations with the treatment and the outcome according to the Bross formula.1,2 From this ordered list, investigators specify the number of variables to include in the HDPS model along with prespecified variables such as age and sex.1 A full description of the HDPS algorithm is provided elsewhere.1
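The variable-generation step can be illustrated with a minimal Python sketch. It assumes, as we understand the original algorithm, that each code yields three indicators (occurred at least once, at least the median number of times, at least the 75th percentile number of times), with cutoffs computed among patients who have the code at all. The function names, the input layout, and the nearest-rank percentile rule are hypothetical choices for illustration, not the distributed HDPS software.

```python
from collections import Counter
from math import ceil


def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile of a non-empty sorted list (0 < pct <= 100)."""
    k = max(1, ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[k - 1]


def frequency_indicators(claims, patients):
    """Build binary frequency variables per claims code.

    claims:   iterable of (patient_id, code) pairs observed during the
              pre-exposure covariate assessment period (assumed layout).
    patients: list of all patient ids in the cohort.
    Returns {patient_id: {variable_name: 0 or 1}}.
    """
    counts = Counter(claims)  # (patient_id, code) -> number of occurrences
    codes = sorted({code for _, code in counts})
    out = {p: {} for p in patients}
    for code in codes:
        # Cutoffs are taken among patients with at least one occurrence.
        nonzero = sorted(n for (_, c), n in counts.items() if c == code)
        cutoffs = {
            "once": 1,
            "median": nearest_rank(nonzero, 50),
            "p75": nearest_rank(nonzero, 75),
        }
        for p in patients:
            n = counts.get((p, code), 0)
            for label, cut in cutoffs.items():
                out[p][f"{code}_{label}"] = int(n >= cut)
    return out
```

Each resulting variable is binary, so its prevalence in the exposed and unexposed groups and its association with the outcome can be fed directly into the Bross-based prioritization.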
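In the original article by Schneeweiss et al.,1 the Bross bias multiplier for prioritizing covariates was defined as follows:

$$
\mathrm{Bias}_M =
\begin{cases}
\dfrac{P_{C1}(RR_{CD} - 1) + 1}{P_{C0}(RR_{CD} - 1) + 1}, & \text{if } RR_{CD} \geq 1 \\[2ex]
\dfrac{P_{C1}(1/RR_{CD} - 1) + 1}{P_{C0}(1/RR_{CD} - 1) + 1}, & \text{otherwise}
\end{cases}
$$

where $P_{C1}$ represents the prevalence of the binary covariate within the exposed group, $P_{C0}$ the prevalence of the binary covariate within the unexposed group, and $RR_{CD}$ the relative risk for the univariate association between the binary covariate and the study outcome.
One of us (B.F.) noted that for correct assessment of a binary covariate’s confounding impact, the Bross bias multiplier should be defined simply as follows:

$$
\mathrm{Bias}_M = \frac{P_{C1}(RR_{CD} - 1) + 1}{P_{C0}(RR_{CD} - 1) + 1}
$$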
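The difference between the two definitions can be sketched in a few lines of Python. The first function is the standard Bross form, in which the multiplier is the ratio of p × (RR − 1) + 1 evaluated at the exposed and unexposed covariate prevalences. The second is our reading of the earlier definition, which inverted the relative risk whenever it fell below 1. The function names and the ranking rule (sorting by the absolute log of the multiplier) are assumptions made for illustration and are not taken from the distributed HDPS software.

```python
from math import log


def bross_multiplier(p_c1, p_c0, rr_cd):
    """Standard Bross bias multiplier for a binary covariate.

    p_c1:  covariate prevalence among the exposed
    p_c0:  covariate prevalence among the unexposed
    rr_cd: relative risk for the covariate-outcome association
    """
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def bross_multiplier_inverting(p_c1, p_c0, rr_cd):
    """Assumed earlier variant: replace RR by 1/RR when RR < 1."""
    rr = rr_cd if rr_cd >= 1 else 1 / rr_cd
    return bross_multiplier(p_c1, p_c0, rr)


def rank_by_bias(covariates, multiplier=bross_multiplier):
    """Order covariate names by |log(multiplier)|, largest first.

    covariates: {name: (p_c1, p_c0, rr_cd)} (hypothetical input layout).
    """
    return sorted(
        covariates,
        key=lambda name: abs(log(multiplier(*covariates[name]))),
        reverse=True,
    )
```

For relative risks of 1 or more the two functions agree, so any change in ordering comes only from covariates that are inversely associated with the outcome.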
We repeated a subset of the analyses from the original manuscript using a revised HDPS that included the correct implementation of the Bross formula. A full description of the data sources that were used for the empirical analyses is provided in the original article.1,3 The Table shows that the results were almost unchanged from those reported in the original manuscript after using the corrected Bross formula for covariate prioritization. For the nonsteroidal anti-inflammatory drug data example (Table), 199 of the top 200 ranked variables and 476 of the top 500 ranked variables were common to both the ordering from the revised HDPS and the ordering from the original manuscript. For the statin data example (Table), 193 of the top 200 ranked variables and 486 of the top 500 ranked variables were common to both orderings.
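The overlap counts reported above amount to intersecting the top-k entries of two ordered lists. A minimal sketch, assuming each ordering is available as a Python list of variable names sorted from highest to lowest priority (the function name and inputs are hypothetical):

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Count variables that appear in the top k of both rankings.

    Order within the top k is ignored; only membership is compared.
    """
    return len(set(ranking_a[:k]) & set(ranking_b[:k]))
```

Calling `top_k_overlap(revised_order, original_order, 200)` on the two orderings would give the kind of count summarized in the text. Here `revised_order` and `original_order` are placeholder names for the two ranked lists.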
The HDPS software that is distributed online has been updated to include the modified implementation of the Bross formula.4 Results from analyses that have been conducted using older versions of the HDPS algorithm are unlikely to change meaningfully after this correction.
1. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20:512–522.
2. Bross ID. Spurious effects from an extraneous variable. J Chronic Dis. 1966;19:637–647.
3. Rassen JA, Glynn RJ, Brookhart MA, Schneeweiss S. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples. Am J Epidemiol. 2011;173:1404–1413.
4. Rassen JA, Doherty M, Huang W, Schneeweiss S. Pharmacoepidemiology Toolbox. Boston, MA.