Studies with Many Covariates and Few Outcomes: Selecting Covariates and Implementing Propensity-Score–Based Confounding Adjustments

Patorno, Elisabettaa; Glynn, Robert J.a; Hernández-Díaz, Soniab; Liu, Juna; Schneeweiss, Sebastiana

doi: 10.1097/EDE.0000000000000069
Methods

Background: Propensity scores are useful for confounding adjustment in the commonly observed setting of many potential confounders, frequent exposure, and rare events. However, with few exposed outcomes to inform covariate selection and many candidate confounders, optimal approaches to construct and implement propensity-score–based confounding adjustment remain unclear.

Methods: In a cohort study on the effect of anticonvulsant drugs on cardiovascular risk among adult patients from the HealthCore Integrated Research Database, we compared the performance for confounding control of various covariate-selection strategies for propensity-score estimation (expert knowledge only, expert knowledge informed by empirical covariate selection via the high-dimensional propensity score, and high-dimensional propensity-score empirical specification only) and of propensity-score–based adjustment methods (propensity-score matching and propensity-score decile stratification). This article focuses on the first 90 days of follow-up because any treatment effect identified in this temporal window almost certainly originates from residual confounding rather than pharmacologic action.

Results: We identified 166,031 new users and 564 ischemic cardiovascular events. Among those, 12,580 patients initiated anticonvulsants that strongly induce cytochrome P450 enzymes; these patients experienced 68 events. The unadjusted hazard ratio was 1.72 (95% confidence interval = 1.34–2.22). Adjustment for investigator-identified covariates led to 41% to 59% reductions in the hazard ratio; adjustment for both investigator-identified and high-dimensional propensity-score empirically identified covariates led to larger reductions (54% to 72%). A selection strategy based on high-dimensional propensity-score empirical specification alone produced less-attenuated and more-volatile hazard ratio estimates. This volatility seemed to be slightly attenuated in a trimmed propensity-score–stratified analysis.

Conclusions: The high-dimensional propensity-score algorithm complements expert knowledge for confounding adjustment, but in settings with few exposed outcomes, its performance without investigator-specified covariates is less clear and may be associated with an increased likelihood of bias. In our example, investigator specification of variables combined with high-dimensional propensity-score empirical selection and the use of trimmed propensity-score–stratified analysis seem to improve effect estimation. Plotting the relation of effect estimates to the increasing number of empirical covariates is a useful diagnostic.

Supplemental Digital Content is available in the text.

From the aDivision of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA; and bDepartment of Epidemiology, Harvard School of Public Health, Boston, MA.

Supported by the Pharmacoepidemiology Program at Harvard School of Public Health, which is funded by Pfizer and Asisa (to E.P.). Robert J. Glynn has received grants to his institution from AstraZeneca and Novartis for the design, interim monitoring, and analysis of clinical trials. Sonia Hernández-Díaz participates in The North American Antiepileptic Drugs (AED) Pregnancy Registry, which receives grants from multiple pharmaceutical companies, and has consulted for Novartis and GSK Biologics pregnancy registries for medications not the subject of this analysis. The Pharmacoepidemiology program at Harvard School of Public Health receives funds for training grants from Pfizer. Jun Liu has no conflicts to report.

Dr. Schneeweiss is Principal Investigator of the Harvard-Brigham Drug Safety and Risk Management Research Center funded by FDA. He is a consultant to WHISCON LLC and Booz & Co., and his research is partially funded by investigator-initiated grants to the Brigham and Women’s Hospital from Pfizer, Novartis, and Boehringer-Ingelheim unrelated to the topic of this study. He is a consultant to and shareholder of Aetion, Inc., a start-up software company that seeks to provide a database backbone and intuitive user interface for database analysis of real-world data assets.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com). This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.

Editors’ note: A commentary on this article appears on page 279.

Correspondence: Elisabetta Patorno, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, 1620 Tremont Street (Suite 3030), Boston, MA 02120. E-mail: epatorno@partners.org.

It is common in epidemiologic studies to have frequent exposures, many potential confounders, and few outcomes. This setting is frequent in pharmacoepidemiology and in other epidemiologic areas that investigate rare diseases, such as cancer epidemiology (eg, low-incidence cancers), reproductive health (eg, birth defects), neuroepidemiology (eg, amyotrophic lateral sclerosis), and psychiatric epidemiology (eg, suicide).

There is a wide array of methods for confounder adjustment in epidemiology, including outcome regression models, g-computation,1 and doubly robust methods.2 However, many adjustment methods are limited to parsimonious models because only a limited number of covariates can be considered per outcome event.3 In a setting of relatively few exposed outcomes and many potential confounders, parsimonious models may lead to suboptimal confounder control and unreliable effect estimation.4 The use of nonparsimonious models can improve the validity of effect estimation, but only a few strategies allow nonparsimonious models with few outcomes. These are primarily confounder summary scores that bundle many variables into one dimension, such as exposure propensity scores and disease-risk scores. The disease-risk score is a less-useful tool in settings with rare outcomes, where reliable multivariable risk prediction is problematic.5 The propensity score—the estimated probability of exposure given the covariates included in the propensity-score model6—is an attractive approach with sparse outcomes, frequent exposure, and many potential confounders.7 Estimated propensity scores can be used to reduce confounding via matching, stratification, weighting, regression adjustment, or some combination of these strategies.8 Propensity scores also play a role in causal modeling, including marginal structural models9 and some doubly robust methods that directly incorporate propensity-score estimation,10 including collaborative targeted maximum likelihood estimation.11 Several strategies are available for fitting propensity-score models, including logistic regression, Lasso regression,12 and machine-learning techniques.13–15

In general, construction of the propensity score should consider any variable that is thought to be a common cause of the exposure and the outcome. Rubin16 has recommended that all variables related to the outcome also be included in the propensity-score model, regardless of their association with exposure, a practice that has been shown to decrease the variance of an estimated exposure effect.17 Conversely, variables associated with the exposure but not with the outcome (ie, instrumental variables) should not be included in the propensity-score (PS) model, because they increase the variance of exposure effect estimates and, when unmeasured confounders are present, may increase bias.17,18

A variety of approaches guide covariate selection, ranging from strategies that rely solely on subject-matter knowledge to specify variables to strategies that rely completely on empirical specification. In the setting of rare outcomes, covariate selection through subject-matter knowledge may play a particularly important role, because the information needed to empirically identify potential confounders and instrumental variables may be limited. However, with many potential confounders, such as is found in research with large healthcare utilization databases, available knowledge may be inadequate to specify all relevant confounders and their measurement across various coding systems and coding habits,19 so an automated technique for variable selection may add to the covariate-selection effort.

The high-dimensional propensity-score algorithm,20 an automated extension of propensity-score methodology, directly addresses the challenge of covariate selection and propensity-score estimation in the context of a large number of potential confounders. This algorithm assesses thousands of diagnoses, procedures, and drug treatment codes—in terms of their prevalence, strength of the association with exposure and outcome, and clustering in time—and assists in identifying potential confounders and proxies for confounders, ultimately addressing aspects of unmeasured confounding.

Although propensity-score methodology and its automated extension, the high-dimensional propensity score, have been shown to be useful tools with many potential confounders, frequent exposure, and rare events (eg, in epidemiology using secondary databases),21,22 optimal approaches to constructing and implementing PS-based confounding adjustment with few exposed outcomes and many candidate confounders remain unclear.

In a cohort study on the effect of anticonvulsant drugs on ischemic cardiovascular risk, we explored the performance for confounding adjustment of variable-selection strategies based on the following: subject-matter knowledge only; subject-matter knowledge informed by high-dimensional propensity-score empirical specification of covariates; and high-dimensional propensity-score empirical specification of covariates only. We then applied two propensity-score–based adjustment methods: propensity-score matching and propensity-score stratification. For this study, we included adult patients with no recorded major ischemic cardiovascular conditions at baseline and focused solely on the 90 days immediately after the initiation of an anticonvulsant medication. This temporal window was chosen on the basis of the underlying biological theory: any treatment effect identified shortly after treatment initiation is likely to originate from inadequate confounding adjustment rather than true pharmacological action, so proper confounding adjustment is of particular importance. This investigation provides an example of a study with relatively rare outcomes, a frequent exposure characterized by a highly heterogeneous pattern of utilization23,24 and limited information on indication of use, and many potential confounders.


OVERVIEW OF HIGH-DIMENSIONAL PROPENSITY-SCORE ALGORITHM FOR VARIABLE SELECTION

The high-dimensional propensity-score algorithm is an automated technique that examines thousands of covariates among various claims-data dimensions in the study population. Each dimension describes an aspect of care, such as dispensed drugs, recorded diagnoses, and performed procedures. For each dimension, the top n most prevalent codes are identified and classified into three levels of within-patient frequency of occurrence during the baseline period: code occurred at least once, at least the median number of times the code was observed for the cohort (sporadic), and at least the 75th percentile number of times the code was observed for the cohort (frequent). A code classified as frequent would have a “true” value for all three levels of frequency of occurrence. These codes are subsequently transformed into binary covariates and then individually assessed for selection into a propensity score; for example, with five dimensions (hospital diagnoses, hospital procedures, doctor’s office diagnoses, doctor’s office procedures, and pharmacy prescription fills), the 200 most prevalent codes from each dimension (n = 200), and three levels of within-patient frequency of occurrence of each code (once, sporadic, or frequent), there are up to 3000 possible indicator variables that could be added to a propensity score.

By default, the high-dimensional propensity-score algorithm then prioritizes each of these variables by its potential to bias the exposure–outcome relation under study (“bias” ranking criterion), on the basis of the formula by Bross,25 and includes the top k = 500 of these covariates in a propensity score. Other methods for selecting variables have also been developed.26 The top k variables can be selected on the basis of the confounder–exposure association (“exposure-only” ranking criterion, measured by the ratio of the prevalence of the confounder in the exposed vs. the unexposed), most suitable for settings with few exposed outcomes; or on the basis of the confounder–outcome association (“outcome-only” ranking criterion, measured by the ratio of the prevalence of the confounder in people who experienced the outcome vs. those who did not), most suitable for disease-risk scores. In addition to choosing the “exposure-only” ranking criterion for selecting empirical variables, the issue of few exposed cases can also be addressed by adding 0.1 to each cell of the covariate–exposure and covariate–outcome 2 × 2 tables, which makes the confounders’ associations with exposure and outcome consistently computable when cells contain zero patients.21 High-dimensional propensity-score methodology has been shown to function robustly in small samples, although with few exposed events it may not offer any improvement beyond the investigator’s specification of covariates.22
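To make the “bias” ranking step concrete, the sketch below (in Python rather than the SAS macro distributed with the algorithm) computes the Bross bias term for each candidate covariate from its 2 × 2 counts, applies the 0.1 cell correction described above, and keeps the top k = 500. The counts container, its field names, and the candidate list are hypothetical placeholders, not part of the published implementation.

```python
import math

ZERO_CORRECTION = 0.1  # added to every 2x2 cell so associations stay computable with zero counts

def _prevalence(with_code, without_code, eps=ZERO_CORRECTION):
    """Covariate prevalence within one exposure group, with the cell correction."""
    return (with_code + eps) / (with_code + without_code + 2 * eps)

def bross_bias_term(counts):
    """|log| of the Bross (1966) bias multiplier for one binary candidate covariate.

    `counts` is a dict of hypothetical 2x2 cells:
      e1_c1 / e1_c0: exposed patients with / without the code
      e0_c1 / e0_c0: unexposed patients with / without the code
      d1_c1 / d0_c1: patients with the code who did / did not have the outcome
      d1_c0 / d0_c0: patients without the code who did / did not have the outcome
    """
    eps = ZERO_CORRECTION
    pc1 = _prevalence(counts["e1_c1"], counts["e1_c0"])  # prevalence among exposed
    pc0 = _prevalence(counts["e0_c1"], counts["e0_c0"])  # prevalence among unexposed
    risk_c1 = (counts["d1_c1"] + eps) / (counts["d1_c1"] + counts["d0_c1"] + 2 * eps)
    risk_c0 = (counts["d1_c0"] + eps) / (counts["d1_c0"] + counts["d0_c0"] + 2 * eps)
    rr_cd = risk_c1 / risk_c0                            # covariate-outcome risk ratio
    bias = (pc1 * (rr_cd - 1.0) + 1.0) / (pc0 * (rr_cd - 1.0) + 1.0)
    return abs(math.log(bias))

def select_top_k(candidates, k=500):
    """Rank (code_name, counts) pairs by the 'bias' criterion and keep the top k."""
    ranked = sorted(candidates, key=lambda item: bross_bias_term(item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]
```

Under the “exposure-only” criterion described above, the same candidates would instead be ranked by the prevalence ratio pc1/pc0 alone.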

Although some of the empirically identified covariates will be strongly correlated with the variables selected a priori by the investigator, others may carry new information obtained from the observed data. These covariates could be proxies for constructs that are complex and difficult to measure even in highly controlled settings and might therefore improve adjustment for confounding.

The high-dimensional propensity-score algorithm and its associated Statistical Analysis System (SAS) macro code are available at www.hdpharmacoepi.org.26


METHODS

Example Study

Some of the older anticonvulsants that strongly induce cytochrome P450 enzyme system activity (carbamazepine, phenobarbital, phenytoin, and primidone) have been associated with metabolic changes that may contribute to increased cardiovascular risk. Previous investigations have shown that patients treated with these anticonvulsants experienced increased levels of total cholesterol and of most lipid fractions, including low-density lipoprotein (LDL) cholesterol and serum triglycerides.27–30 Meaningful increases in the levels of total cholesterol and LDL cholesterol have been observed after 2 months of therapy with carbamazepine or phenytoin.27,30

We followed a cohort of patients 40 to 64 years old from the HealthCore Integrated Research Database who had initiated an anticonvulsant medication between the years 2001 and 2006 and had no recorded major coronary or cerebrovascular condition31 in the 6 months before treatment initiation (eTable 1, http://links.lww.com/EDE/A766). Medical and pharmacy data were collected from the HealthCore Integrated Research Database of commercially insured members. This database contains longitudinal healthcare claims data from commercial health plans in the southeastern, mid-Atlantic, central, and western regions of the United States.

For our purposes, anticonvulsants were grouped as follows: anticonvulsants that strongly induce cytochrome P450 enzyme system activity (carbamazepine, phenobarbital, phenytoin, and primidone) and others that do not (gabapentin, lamotrigine, levetiracetam, oxcarbazepine, pregabalin, tiagabine, topiramate, valproate, and zonisamide).32,33 On the basis of the medication prescribed on the index date, each subject was identified as a new user of a specific anticonvulsant category. Exposed participants were initiators of any strongly inducing anticonvulsant; unexposed participants were initiators of any other anticonvulsant. Follow-up began on the day after the prescription was filled for the first time. Patients were allowed to have gaps of up to 30 days between prescription fill dates in the calculation of continuous therapy. In the case of drug discontinuation (ie, more than 30 days between fill dates), the exposure-risk window for each patient treatment episode extended until 30 days after the expiration of the supply of the last prescription of continuously filled therapy. Participants were followed for up to 90 days, to the end of their exposure-risk window, until switching to another anticonvulsant agent, until the occurrence of a study event, until death from causes not included in the study outcome, until end of continuous health plan enrollment, or until the end of the study period, whichever came first (as-treated analysis). For the purpose of this study, we focused solely on the 90 days after drug initiation, an exposure window during which anticonvulsant drugs are unlikely to affect ischemic cardiovascular risk through metabolic changes. Shortly after treatment initiation, ischemic cardiovascular events are unlikely to be the result of the treatment and proper confounding adjustment will therefore play a crucial role.
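As a rough illustration of these censoring rules, the sketch below derives the end of as-treated follow-up as the earliest of the events listed above. All argument names and the study-end date are hypothetical placeholders, and the real analysis would work from each patient's full sequence of fill dates rather than a single last fill.

```python
from datetime import date, timedelta

GRACE_PERIOD = timedelta(days=30)    # allowed gap after the last supply runs out
MAX_FOLLOW_UP = timedelta(days=90)   # analysis restricted to the first 90 days

def follow_up_end(index_date, last_fill_date, last_days_supply,
                  switch_date=None, event_date=None, death_date=None,
                  disenrollment_date=None, study_end=date(2006, 12, 31)):
    """Earliest of: 90 days of follow-up, end of the exposure-risk window, switching,
    a study event, death from other causes, end of enrollment, or end of the study."""
    risk_window_end = last_fill_date + timedelta(days=last_days_supply) + GRACE_PERIOD
    candidates = [index_date + MAX_FOLLOW_UP, risk_window_end, study_end]
    candidates += [d for d in (switch_date, event_date, death_date, disenrollment_date)
                   if d is not None]
    return min(candidates)

# Example: a patient whose last 30-day fill was dispensed 45 days after the index date
# would be censored at day 90 unless an earlier censoring event occurred.
print(follow_up_end(date(2004, 3, 1), date(2004, 4, 15), 30))
```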

The primary study outcome was a composite of ischemic coronary events (hospitalization for myocardial infarction, acute coronary syndrome, cardiac revascularization procedure, or death from ischemic heart disease) and ischemic cerebrovascular events (ischemic stroke or ischemic cerebrovascular death). For myocardial infarction, acute coronary syndrome, cardiac revascularization procedure, and ischemic stroke, we used previously validated claims algorithms (eTable 1, http://links.lww.com/EDE/A766).34–36 Causes of death were determined through National Death Index linkage. Deaths from ischemic heart disease were identified through recorded International Classification of Diseases (ICD)-10 codes I20-I25, whereas cerebrovascular ischemic deaths were identified as codes I63-I66, I67.2, I67.8, or I67.9.37 Only primary causes of death were considered. Patient characteristics were identified during the 6 months preceding cohort entry. Age, sex, calendar year, and comorbidities were investigated via ICD-9 codes and Current Procedural Terminology-4 codes38 and medication use via National Drug Codes.
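A minimal sketch of the cause-of-death classification, using only the ICD-10 ranges quoted above; the function name and return labels are illustrative, not part of the study's code.

```python
def ischemic_death_category(primary_icd10: str):
    """Map a primary ICD-10 cause-of-death code to the study's outcome components.

    I20-I25 -> death from ischemic heart disease; I63-I66, I67.2, I67.8, I67.9 ->
    ischemic cerebrovascular death; any other code is not a study outcome (None).
    """
    code = primary_icd10.strip().upper()
    if any(code.startswith(f"I{n}") for n in range(20, 26)):
        return "ischemic heart disease death"
    if any(code.startswith(f"I{n}") for n in range(63, 67)) or code in {"I67.2", "I67.8", "I67.9"}:
        return "ischemic cerebrovascular death"
    return None

assert ischemic_death_category("I21.9") == "ischemic heart disease death"
assert ischemic_death_category("I67.2") == "ischemic cerebrovascular death"
assert ischemic_death_category("C34") is None
```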

The study was approved by the institutional review board of Brigham and Women’s Hospital.


Statistical Analysis

Propensity-score methodology was used to control for confounding by indication and to evaluate ischemic cardiovascular risk among anticonvulsant new users. The propensity score was the probability of initiating anticonvulsants that strongly induce cytochrome P450 activity versus other anticonvulsant agents. We used three strategies to select the potential confounders used to estimate the propensity score. On the basis of expert knowledge only, we constructed an exposure propensity score including basic variables (age, sex, and calendar year) and an extended set of investigator-selected covariates from the subjects’ baseline characteristics (strategy 1). The rationale for this variable selection was that variables had to be clinically recognized as being associated with both exposure and outcome or with outcome alone.16,17 We then augmented the propensity score based on investigator-selected covariates with two sets of variables empirically identified through the high-dimensional propensity-score algorithm (strategy 2): the first set was selected and ranked on the basis of the potential to bias the exposure–outcome relation (“bias” ranking criterion); the second set was selected and ranked on the basis of the confounder–exposure association (“exposure-only” ranking criterion), because of the limited number of exposed events. Each of these sets was added separately to the propensity-score model based on basic and investigator-selected covariates, leading to two final scores informed by empirical specification. Finally, we estimated two additional propensity scores based exclusively on basic variables and variables empirically identified through the high-dimensional propensity-score algorithm, the latter identified with either the “bias” or the “exposure-only” ranking criterion (strategy 3). Because of the few exposed cases, the high-dimensional propensity-score algorithm automatically added 0.1 to each cell of the covariate–exposure and covariate–outcome 2 × 2 tables, to make the confounders’ associations with exposure and outcome consistently computable when there were cells with zero patients. For this analysis, we used high-dimensional propensity-score algorithm version 2, which incorporates an automated assessment of health-service usage intensity and the creation of summary utilization variables; these are screened like all the other high-dimensional propensity-score variables and are eligible for inclusion in the final propensity score.26
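The sketch below illustrates, under assumed column names, how the same logistic propensity-score model might be refit under the three covariate-selection strategies. It is not the SAS code used for the analysis, and the placeholder covariate lists stand in for the Table 1 variables and the high-dimensional propensity-score output.

```python
import pandas as pd
import statsmodels.api as sm

def estimate_ps(df: pd.DataFrame, covariates, exposure="strong_inducer"):
    """Unpenalized logistic model for the probability of initiating a strongly
    inducing anticonvulsant; returns one predicted propensity score per row."""
    X = sm.add_constant(df[covariates])
    fit = sm.Logit(df[exposure], X).fit(disp=0)
    return fit.predict(X)

# Hypothetical covariate sets mirroring the three strategies described above:
# basic          = ["age_cat", "sex", "year"]
# strategy1      = basic + investigator_selected            # expert knowledge only
# strategy2_bias = strategy1 + hdps_bias_ranked[:500]       # expert + "bias"-ranked empirical
# strategy2_exp  = strategy1 + hdps_exposure_ranked[:500]   # expert + "exposure-only"-ranked
# strategy3      = basic + hdps_bias_ranked[:500]           # empirical specification only
```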

The estimated propensity scores were then used to reduce confounding via two adjustment methods: propensity-score matching and stratification by propensity-score deciles. Before stratification into deciles, the range of propensity scores was trimmed asymmetrically according to the 5th and 95th percentiles of the propensity-score distribution in the exposed and unexposed patients, respectively (10% asymmetric trimming), to exclude those patients who were treated (or not) most contrary to prediction.39 For propensity-score–matched analyses, exposure groups were 1:1 matched on their propensity score using a “greedy” matching algorithm,40 with a maximum caliper of 0.01. For both propensity-score–matched and propensity-score–stratified analyses, we computed incidence rates and hazard ratios (HRs) with 95% confidence intervals (CIs), and we recorded the following: (1) unadjusted HR and its 95% CI; (2) HR and 95% CI after adjusting for basic variables only; (3) HR and 95% CI after adjusting for basic and investigator-selected variables; (4) HR and 95% CI after adjusting for basic, investigator-selected variables, and empirically selected covariates (k = 500 and k = 700); and (5) HR and 95% CI after adjusting for basic and empirically selected covariates only (k = 500 and k = 700).
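A simplified sketch of the two adjustment methods, assuming a data frame with hypothetical `ps` and `strong_inducer` columns. The greedy matching here is a plain nearest-neighbor illustration, not the published SAS macro,40 and a real analysis would handle ties and matching order more carefully.

```python
import pandas as pd

def asymmetric_trim(df, ps="ps", exposure="strong_inducer"):
    """10% asymmetric trimming: drop patients below the 5th percentile of the exposed
    PS distribution or above the 95th percentile of the unexposed PS distribution."""
    low = df.loc[df[exposure] == 1, ps].quantile(0.05)
    high = df.loc[df[exposure] == 0, ps].quantile(0.95)
    return df[(df[ps] >= low) & (df[ps] <= high)]

def greedy_match(df, ps="ps", exposure="strong_inducer", caliper=0.01):
    """1:1 greedy nearest-neighbor matching on the PS within a fixed caliper."""
    exposed = df[df[exposure] == 1]
    available = df.loc[df[exposure] == 0, ps].copy()
    pairs = []
    for idx, score in exposed[ps].items():
        if available.empty:
            break
        j = (available - score).abs().idxmin()
        if abs(available[j] - score) <= caliper:
            pairs.append((idx, j))            # (exposed index, matched unexposed index)
            available = available.drop(j)
    return pairs

def decile_strata(df, ps="ps"):
    """PS deciles (computed after trimming) for the stratified analysis."""
    return pd.qcut(df[ps], 10, labels=False, duplicates="drop")
```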

To better assess the impact of confounding adjustment provided by the inclusion of empirically selected variables, we sequentially added these empirical covariates, one by one up to k = 700, to two logistic models used to estimate the propensity score, including either basic variables only (as per strategy 3) or basic and investigator-selected covariates (as per strategy 2). For each added empirical variable, we estimated a propensity score, created separate propensity-score–matched and propensity-score–trimmed populations, performed propensity-score–matched and propensity-score–stratified analyses, and computed adjusted estimates of the HR. We then plotted the relation of the HR estimates to the increasing number of empirical covariates (k = 0 to k = 700) using the LOESS procedure in SAS.41 Separate plots were created on the basis of the strategy of confounder selection (strategies 2 and 3), the ranking criterion for empirical covariates (“bias” or “exposure-only”), and the adjustment method (propensity-score matching or propensity-score stratification). In general, we would expect that, by sequentially adding empirical covariates, the confounding adjustment would improve and the slope of the HR estimate curves would progressively flatten around the least biased estimate.
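A sketch of this diagnostic loop, assuming a ranked list of empirical covariates and a placeholder `estimate_hr` function that runs the PS-matched or PS-stratified Cox analysis for a given covariate set; the LOWESS smoother from statsmodels stands in for the SAS LOESS procedure.41

```python
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

def hr_versus_k(cohort, base_covariates, empirical_ranked, estimate_hr, k_max=700):
    """Re-estimate the PS with the first k empirical covariates added (k = 0..k_max)
    and record the adjusted hazard ratio returned by `estimate_hr` at each step."""
    return [(k, estimate_hr(cohort, base_covariates + empirical_ranked[:k]))
            for k in range(k_max + 1)]

def plot_hr_curve(results):
    """Smoothed curve of the adjusted HR against the number of empirical covariates."""
    ks = [k for k, _ in results]
    hrs = [hr for _, hr in results]
    smooth = lowess(hrs, ks, frac=0.2)        # returns [k, smoothed HR] pairs sorted by k
    plt.scatter(ks, hrs, s=5, alpha=0.4)
    plt.plot(smooth[:, 0], smooth[:, 1])
    plt.xlabel("Number of empirical covariates added (k)")
    plt.ylabel("Adjusted hazard ratio")
    plt.show()
```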


RESULTS

We identified 166,031 new treatment episodes (for 150,124 patients) and 564 ischemic cardiovascular events during the 90-day follow-up period. Among those, 12,580 were new users of strongly inducing anticonvulsants who experienced 68 ischemic cardiovascular events. Table 1 shows the variables included in the propensity-score model: basic covariates, ie, age (five categories), sex, and calendar year (six categories); and 33 investigator-selected covariates (30 binary and three categorical covariates). As expected, we found considerable imbalances suggesting potential confounding. Compared with initiators of other agents, new users of strongly inducing anticonvulsants were more likely to be male, to have more frequent previous hospitalizations, and to have a history of cardiac arrhythmia, other cardiovascular diseases, hemorrhagic stroke, and epilepsy. Patients taking those medications also visited a physician less frequently, were less likely to have a history of diabetes and hyperlipidemia, and were less likely to have received antihypertensives, lipid-lowering agents, insulin, and oral hypoglycemics in the 6 months before treatment initiation.

TABLE 1

Tables 2 and 3 show the impact of the various strategies of variable selection, ranking criteria for identification of empirical variables, and adjustment methods on the relative risk of cardiovascular events during the first 90 days after initiation of anticonvulsants that were strong inducers compared with other anticonvulsant agents. The unadjusted HR was 1.72 (95% CI = 1.34–2.22). After adjustment for basic covariates, the HR was 1.42 (0.98–2.05) using propensity-score matching and 1.63 (1.24–2.13) using stratification by propensity-score deciles. Adjustment for the propensity score based on basic and investigator-selected covariates (strategy 1) led to further reductions in the HR of cardiovascular events in the matched and stratified analyses (HR = 1.31 [0.88–1.94] and HR = 1.13 [0.76–1.69], respectively).

TABLE 2

TABLE 3

When the propensity score based on investigator-selected covariates was informed by empirically selected variables (strategy 2), the adjusted estimates produced under strategy 1 were further attenuated toward the null. The change in adjusted estimates was particularly evident in the propensity-score–matched analysis, with 23% to 31% reductions when the propensity score based on investigator-selected covariates was augmented with empirical covariates. Only augmentation with 700 “bias”-ranked covariates seemed to perform less optimally than strategy 1 in the propensity-score–stratified analysis (HR = 1.18 [0.75–1.87]).

We also considered adjustment for basic and empirically identified covariates only (strategy 3). This strategy seemed to produce wide and inconsistent variations in the HR estimates and to perform less well than, or at least not consistently better than, strategies 1 and 2. This unpredictable pattern seemed to be relatively independent of the ranking criterion for identification of empirical variables, although it seemed to be attenuated in the propensity-score–stratified analysis.

Figures 1 and 2 show the relation between the risk of cardiovascular events associated with enzyme-inducing anticonvulsants compared with other anticonvulsants and the number of empirical covariates (k = 0 to k = 700) sequentially added to the propensity-score model, per strategies 2 and 3, respectively. Under strategy 2, the sequential inclusion of empirical covariates led to HRs that tended toward generally attenuated estimates compared with the propensity-score model including basic and investigator-identified covariates (k = 0). The best performance for confounding adjustment was generally achieved within the first 500 empirical covariates added to the propensity-score model. Under strategy 3, the sequential inclusion of empirical covariates led to wide and unpredictable variations of the point estimates, which became evident after the first 50 to 100 empirical covariates were added to the propensity-score model. As previously observed, this pattern seemed to be slightly attenuated in the propensity-score–stratified analysis. However, no attenuation was observed when the propensity-score–stratified analysis was performed without trimming (eFigure 1, http://links.lww.com/EDE/A766). When the propensity-score–matched analysis was performed after 10% trimming, an attenuated pattern was also observed, although less evidently than in the trimmed propensity-score–stratified analysis (eFigure 2, http://links.lww.com/EDE/A766). We investigated the informativeness of the empirically selected covariates by plotting the logarithm of the absolute value of the bias term (the potential to bias the exposure–outcome relation) derived from the Bross formula25 (Figure 3). The informativeness of the empirically selected covariates tended to drop rapidly when we added variables beyond the first 100. We also investigated the relation of the c-statistic of each estimated propensity score to the number of empirical covariates sequentially added to the propensity-score model (eFigure 3, http://links.lww.com/EDE/A766). The c-statistic tended to increase with the number of empirical covariates, though not monotonically for strategy 3.

FIGURE 1

FIGURE 2

FIGURE 3

Finally, we explored the codes and descriptors for the 700 variables chosen by each of the ranking criteria for empirical covariates and added to the propensity-score “model 0,” which included basic and investigator-selected covariates (as per strategy 2) or basic covariates only (as per strategy 3) (eTables 2–5, http://links.lww.com/EDE/A766). Many of these codes represent variables identified by the investigator on the basis of subject-matter knowledge alone, such as seizures, cardiovascular diseases, diabetes, and cardiovascular drug therapies. Other codes are variables that we as investigators would not have considered for adjustment; these could be proxies for unmeasured confounders, such as head or brain radiologic examination, head injury, emergency-related services, or brain-related diagnoses that were not captured in our exclusion criteria relating to cancer diagnosis (ie, neoplasm codes more likely indicating ascertainment for suspected cancer than a definite cancer diagnosis). Certain other codes, based on their prevalence and univariate associations with the exposure and outcome, can function as instruments and have the potential to introduce bias (eg, neuralgia, low back pain, medications for pain management). We also investigated which codes, among the ones used for covariate definition by the investigator in Table 1, were not selected as empirical covariates. Several medical diagnoses (eg, hyperlipidemia, diabetes complications, cardiac arrhythmia, previous myocardial infarction, stable angina, dementia, and drug abuse or dependence), medical procedures (eg, peripheral vascular disease or vascular surgery), and drug therapies (eg, specific statins and other lipid-lowering agents, beta-blockers, angiotensin receptor blockers, diuretics, glucose-lowering agents, antidepressants, and other psychotropic medications) were either not captured or only partially captured (eTables 2–5, http://links.lww.com/EDE/A766).


DISCUSSION

In a study on the effect of anticonvulsant drugs on ischemic cardiovascular risk, we evaluated the confounding-adjustment performance of several covariate-selection strategies for propensity-score estimation, based on the following: expert knowledge only, expert knowledge with additional high-dimensional propensity-score empirical specification of variables, or high-dimensional propensity-score empirical specification only. We also considered multiple adjustment methods based on the estimated propensity score. Among the wide array of methods available for confounding adjustment, we focused on propensity-score methodology, as it is a useful tool in the setting of many potential confounders, frequent exposure, and rare events, a common circumstance in epidemiology using secondary databases.

We found that the high-dimensional propensity-score algorithm can be a valuable tool for complementing the investigator’s choice of variables for propensity-score estimation in a context characterized by a limited number of exposed cases, considerable confounding by indication, therapies with a heterogeneous pattern of utilization (on- and off-label), and limited clinical information on their indication. We also confirmed earlier findings that, in a setting with rare outcomes, a selection strategy based on empirically identified variables alone performed less consistently than a strategy combining subject-matter knowledge with empirical specification.21,22 These results seemed largely independent of the ranking criterion for empirical variables (“bias” or “exposure-only”). Finally, we observed that, when outcomes were rare, a trimmed propensity-score–stratified analysis stabilized the HR estimates more than an untrimmed propensity-score–matched analysis.

Our adjusted findings suggested that confounding explains most—if not all—of the increase in the relative risk of ischemic coronary or cerebrovascular events associated with the use of anticonvulsants that strongly induce cytochrome P450 activity compared with other anticonvulsant agents during a 90-day follow-up period. Adjustment for investigator-identified covariates led to 41% to 59% reductions in the unadjusted HR of ischemic cardiovascular events; adjustment for both investigator-identified and empirically identified covariates led to larger reductions (54% to 72%), suggesting that the high-dimensional propensity-score algorithm can have a considerable impact on confounding adjustment, as previously demonstrated.20–22 Furthermore, consistent with previous studies finding that 350 to 400 variables are usually sufficient for maximal confounding adjustment,22 we generally achieved the best performance for confounding adjustment within the first 500 empirical covariates added to the propensity-score model including basic and investigator-identified covariates.

The volatility in the relative risk estimates observed with empirical variable specification alone has been described previously. In particular, it has been shown that, in the setting of rare exposed outcomes, there might not be sufficient information for high-dimensional propensity-score adjustment to function optimally, and thus the high-dimensional propensity score might not offer any improvement beyond the investigator’s specification of covariates.21,22

In our study example, the distribution of the estimated propensity scores of exposed and unexposed subjects tended to be highly right-skewed, and a substantial number of cardiovascular events occurred among the subjects falling in the tails of the propensity-score distribution. Minor shifts in the propensity-score distribution as covariates are added one by one may lead to the inclusion or exclusion of multiple cases in the matched sets. Given the overall small number of cases, this can strongly influence the effect estimates and thus produce the observed volatility. In such a context, trimmed analyses (which excluded portions of the tails of the propensity-score distribution) stabilized the effect estimates more than untrimmed analyses.

Another consideration is that the accurate identification of some confounders in healthcare utilization databases may involve complex algorithms combining multiple diagnosis, procedure, and drug codes.42 Although this is the usual approach to variable identification in an investigator-based covariate-selection strategy, the use of principal-components analysis in future versions of the high-dimensional propensity-score algorithm may better capture this information. The nature of the wide and unpredictable variations in the point estimates observed under our strategy using empirical specification alone needs further exploration.

Regardless of the explanation, an inspection of the relation of the effect estimates to the increasing number of empirical covariates may be a useful diagnostic. Any meaningful departure from a monotonic trend, or a slope that does not flatten after several hundred covariates, should be interpreted as a warning signal.
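One crude way to operationalize this warning signal, using the (k, HR) pairs from the sequential analysis sketched earlier; the tail length and tolerance here are arbitrary illustrative choices, not thresholds recommended in the paper.

```python
import math

def curve_flattens(results, tail=200, tolerance=0.05):
    """True if the adjusted HR drifts by no more than `tolerance` on the log scale
    over the last `tail` covariates added; False suggests a warning signal."""
    last_k = results[-1][0]
    tail_log_hrs = [math.log(hr) for k, hr in results if k >= last_k - tail]
    return (max(tail_log_hrs) - min(tail_log_hrs)) <= tolerance
```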

Although it is a limitation that we focused on only a specific selection of strategies for propensity-score estimation and confounding adjustment with rare outcomes, these are standard approaches for variable selection and confounding adjustment. Our results may have implications beyond the specific methods considered (ie, for alternative automated approaches to variable selection) and may be of interest to the broader area of rare-disease epidemiology.

In conclusion, our study shows that the high-dimensional propensity-score algorithm is a valuable tool for complementing the investigator’s subject-matter knowledge in propensity-score estimation and confounding adjustment, and it is particularly useful in applications where the confounders are largely unmeasured or unknown. However, with a limited number of exposed patients with outcomes, the performance of a selection strategy based on high-dimensional propensity-score empirical specification without expert knowledge is less well understood and may be associated with an increased likelihood of bias. In this scenario, thoughtful variable selection by the investigator combined with empirical specification seems to be a better strategy for balancing important covariates across treatment groups. Furthermore, in the context of rare outcomes, trimmed propensity-score analyses may stabilize effect estimates better than untrimmed analyses. Finally, plotting the relation of effect estimates to the increasing number of empirical covariates is a useful diagnostic tool for signaling situations of suboptimal confounding adjustment.


ACKNOWLEDGMENTS

We thank HealthCore Inc. and specifically Rhonda Bohn, Peter Wahl, and Daniel Mines for enabling and supporting this research.


REFERENCES

1. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7:1393–1512
2. van der Laan MJ, Rose S. Targeted Learning: Causal Inference for Observational and Experimental Data. New York, NY: Springer; 2011
3. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49:1373–1379
4. Steyerberg EW, Schemper M, Harrell FE. Logistic regression modeling and the number of events per variable: selection bias dominates. J Clin Epidemiol. 2011;64:1464–1463
5. Glynn RJ, Gagne JJ, Schneeweiss S. Role of disease risk scores in comparative effectiveness research with emerging therapies. Pharmacoepidemiol Drug Saf. 2012;21(suppl 2):138–147
6. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55
7. Braitman LE, Rosenbaum PR. Rare outcomes, common treatments: analytic strategies using propensity scores. Ann Intern Med. 2002;137:693–695
8. D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17:2265–2281
9. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–560
10. Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–973
11. Lendle SD, Fireman B, van der Laan MJ. Targeted maximum likelihood estimation in safety analysis. J Clin Epidemiol. 2013;66(8 suppl):S91–S98
12. Tibshirani R. Regression shrinkage and selection via the Lasso. J Royal Statist Soc B. 1996;58:267–288
13. Setoguchi S, Schneeweiss S, Brookhart MA, Glynn RJ, Cook EF. Evaluating uses of data mining techniques in propensity score estimation: a simulation study. Pharmacoepidemiol Drug Saf. 2008;17:546–555
14. Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med. 2010;29:337–346
15. Austin PC. Using ensemble-based methods for directly estimating causal effects: an investigation of tree-based G-computation. Multivariate Behav Res. 2012;47:115–135
16. Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med. 1997;127(8 pt 2):757–763
17. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol. 2006;163:1149–1156
18. Myers JA, Rassen JA, Gagne JJ, et al. Effects of adjusting for instrumental variables on bias and precision of effect estimates. Am J Epidemiol. 2011;174:1213–1222
19. Brookhart MA, Stürmer T, Glynn RJ, Rassen J, Schneeweiss S. Confounding control in healthcare database research: challenges and potential approaches. Med Care. 2010;48(6 suppl):S114–S120
20. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20:512–522
21. Rassen JA, Glynn RJ, Brookhart MA, Schneeweiss S. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples. Am J Epidemiol. 2011;173:1404–1413
22. Rassen JA, Schneeweiss S. Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):41–49
23. Chen H, Deshpande AD, Jiang R, Martin BC. An epidemiological investigation of off-label anticonvulsant drug use in the Georgia Medicaid population. Pharmacoepidemiol Drug Saf. 2005;14:629–638
24. Radley DC, Finkelstein SN, Stafford RS. Off-label prescribing among office-based physicians. Arch Intern Med. 2006;166:1021–1026
25. Bross ID. Spurious effects from an extraneous variable. J Chronic Dis. 1966;19:637–647
26. Rassen JA, Doherty M, Huang W, Schneeweiss S. Pharmacoepidemiology Toolbox. Boston, MA. Available at: http://www.hdpharmacoepi.org
27. Mintzer S, Skidmore CT, Abidin CJ, et al. Effects of antiepileptic drugs on lipids, homocysteine, and C-reactive protein. Ann Neurol. 2009;65:448–456
28. Calandre EP, Rodriquez-Lopez C, Blazquez A, Cano D. Serum lipids, lipoproteins and apolipoproteins A and B in epileptic patients treated with valproic acid, carbamazepine or phenobarbital. Acta Neurol Scand. 1991;83:250–253
29. Nikolaos T, Stylianos G, Chryssoula N, et al. The effect of long-term antiepileptic treatment on serum cholesterol (TC, HDL, LDL) and triglyceride levels in adult epileptic patients on monotherapy. Med Sci Monit. 2004;4:MT50–MT52
30. Isojärvi JI, Pakarinen AJ, Myllylä VV. Serum lipid levels during carbamazepine medication. A prospective study. Arch Neurol. 1993;50:590–593
31. International Classification of Diseases, Ninth Revision, Clinical Modification. Washington, DC: US Dept of Health and Human Services; 1988
32. McNamara JO. Pharmacotherapy of the epilepsies. In: Brunton LL, Lazo JS, Parker KL, eds. Goodman & Gilman’s The Pharmacological Basis of Therapeutics. 11th ed. New York, NY: McGraw-Hill; 2006
33. Perucca E. Clinically relevant drug interactions with antiepileptic drugs. Br J Clin Pharmacol. 2006;61:246–255
34. Solomon DH, Avorn J, Katz JN, et al. Immunosuppressive medications and hospitalization for cardiovascular events in patients with rheumatoid arthritis. Arthritis Rheum. 2006;54:3790–3798
35. Rassen JA, Choudhry NK, Avorn J, Schneeweiss S. Cardiovascular outcomes and mortality in patients using clopidogrel with proton pump inhibitors after percutaneous coronary intervention or acute coronary syndrome. Circulation. 2009;120:2322–2329
36. Wahl PM, Rodgers K, Schneeweiss S, et al. Validation of claims-based diagnostic and procedure codes for cardiovascular and gastrointestinal serious adverse events in a commercially-insured population. Pharmacoepidemiol Drug Saf. 2010;19:596–603
37. World Health Organization. International Statistical Classification of Diseases, 10th Revision (ICD-10). Geneva, Switzerland: World Health Organization; 1992
38. Physicians’ Current Procedural Terminology (CPT). 4th ed. Chicago, IL: American Medical Association; 1989
39. Stürmer T, Rothman KJ, Avorn J, Glynn RJ. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution—a simulation study. Am J Epidemiol. 2010;172:843–854
40. Parsons L. Reducing Bias in a Propensity Score Matched-Pair Sample Using Greedy Matching Techniques. In: Proceedings of the 26th Annual SAS Users Group International Conference. Cary, NC: SAS Institute Inc; 2001. Available at: http://www2.sas.com/proceedings/sugi26/p214-26.pdf. Accessed June 24, 2013
41. Cleveland WS. Robust locally-weighted regression and smoothing scatterplots. J Am Stat Assoc. 1979;74:829–836
42. Toh S, García Rodríguez LA, Hernán MA. Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records. Pharmacoepidemiol Drug Saf. 2011;20:849–857

© 2014 by Lippincott Williams & Wilkins, Inc