Secondary Logo

Journal Logo


A Propensity-score-based Fine Stratification Approach for Confounding Adjustment When Exposure Is Infrequent

Desai, Rishi J.; Rothman, Kenneth J.; Bateman, Brian .T; Hernandez-Diaz, Sonia; Huybrechts, Krista F.

Author Information
doi: 10.1097/EDE.0000000000000595


Propensity scores (PSs), which are balancing scores defined as the predicted probability of receiving a particular exposure given covariate realizations of a study subject,1 are commonly used to account for a large number of confounders efficiently in pharmacoepidemiology studies, especially when the outcome is infrequent and adjusting for a large number of confounders in conventional multivariable models may result in overfitting and biased estimates.2 Two of the most widely used propensity score-based approaches to control confounding are matching a comparison group to the exposed group by propensity score and stratifying subjects by a function of the score.3 Propensity-score matching has previously been shown to be superior with respect to bias reduction to stratification using five strata.4,5 Matching, however, reduces precision because, along with exposed patients for whom no reference patient with comparable propensity is found, it also omits those comparison patients who would be matchable but who are not selected because a prespecified number of matches were already selected for each exposed patient.4,5 This loss of subjects is especially relevant when exposure is rare, in which case propensity-score matching excludes a large proportion of comparison subjects and results in considerable loss of information. In contrast, all these potential comparison subjects would be retained in an analysis that stratifies or weights by a function of the propensity score.

It has long been thought that creating five strata based on a continuous variable, such as quintiles of the propensity score, with the stratum boundaries determined by its distribution in the exposed and the comparison group combined, eliminates approximately 90% of measured confounding.6,7 When exposure is infrequent, however, determining the stratum boundaries by the propensity-score values for the combined groups of exposed and comparison patients may result in all the exposed patients being aggregated in one or more extreme strata. This approach is therefore prone to considerable residual confounding. This aggregation of exposed subjects in the extreme strata can be mitigated by increasing the number of strata, or by forming the stratum boundaries based on the values of the PS for the exposed group alone. While both these approaches have been discussed in the literature,8,9 they have not been compared with other standard confounding control approaches in settings with infrequent exposure, where the aggregation of exposed at one end of the PS distribution is most severe. Therefore, using both a simulation study and a previously published empirical example of the fetal safety of first trimester statin use during pregnancy,10 we assessed confounding control and precision for various PS stratification approaches and for PS matching and weighting.


Empirical Example

For our empirical evaluation, we assessed the prevalence of congenital malformations among live-born infants after first trimester statin exposure, compared with nonexposure. Statins are contraindicated during pregnancy owing to teratogenicity concerns from animal studies. Furthermore, the indications for statin use are uncommon in women of reproductive age. Therefore, exposure to statins during pregnancy is rare. We studied a previously assembled cohort of Medicaid-enrolled women ages 12 to 55 years with completed pregnancies resulting in live-born infants, from 46 US states and Washington, DC, for the period of 2000–2007.11 The use of this deidentified database for research was approved by the Institutional Review Board of Brigham and Women’s Hospital. We estimated the date of last menstrual period (LMP) based on the delivery date combined with a validated algorithm based on diagnosis codes for preterm delivery.12 We required women to have Medicaid eligibility continuously from 3 months before the estimated LMP month through 1 month postpartum. Similarly, to have complete information on malformations during the outcome measurement period (first 3 months of life), linked infants were also required to have continuous enrollment in Medicaid for at least 3 months following birth, unless they died in which case a shorter eligibility period was allowed. We excluded pregnancies with exposure to a known teratogenic medication (i.e., lithium, antineoplastic agents, retinoids, or thalidomide) during the first trimester and pregnancies in which the infant was diagnosed with a chromosomal abnormality.

For this analysis, women who filled at least two prescriptions for statins during the first trimester were defined as exposed. Statins included in our study were simvastatin, lovastatin, pravastatin, fluvastatin, atorvastatin, cerivastatin, and rosuvastatin. We only considered statin exposure in the first trimester because it is the etiologically relevant window of exposure for the outcome of congenital malformations. Women not meeting this exposure definition were classified as unexposed. The outcome of interest was any congenital malformation in the infant. These malformations were identified based on the presence of ICD-9 diagnostic codes on two or more separate days in the infant inpatient or outpatient records during the first 3 months of life. The potential confounders considered were demographic characteristics of maternal age, race and region, maternal comorbid conditions including diabetes, hypertension, dyslipidemia, obesity, tobacco use, chronic renal disease, illicit drug and alcohol abuse, maternal comedication use including other suspected teratogenic medications, oral antidiabetic medications, hypertension medications, and insulin, and healthcare utilization variables including the number of distinct prescription medications (other than statins) and physician visits. Details on the study design, variable measurements, and substantive results have been previously published.10

Simulation Study

To augment the findings from the empirical example, we also conducted a simulation study with 1,000 simulated cohorts of 20,000 women. To base our simulation parameters on realistic covariate distributions, we generated 10 confounders (c1–c10) using confounder–exposure associations, confounder–outcome associations, and confounder prevalences observed in the empirical study. In addition, we generated two hypothetical nonconfounding exposure determinants (c11, c12) and two hypothetical nonconfounding outcome determinants (c13, c14). eTable 1 ( summarizes the parameters that were used to generate simulated data.

A binary exposure variable was generated indicating first trimester statin use versus nonuse. A logistic regression model was used to predict the probability of the exposure using 10 confounders and two exposure determinants (c1–c12). A binary outcome variable was generated indicating the presence or absence of congenital malformations, also using logistic regression models with 10 confounders (c1–c10) and two outcome determinants (c13, c14). A true null exposure effect of statins on the risk of congenital malformations was generated by not including the simulated exposure variable in the logistic regression model predicting the outcome. We used logistic regression models to generate the binary outcome variable and retrieve the true effect estimates as odds ratios (ORs) in our simulations as these models are theoretically more robust to nonconvergence compared with log-binomial models13–15 and when the treatment effect is null, ORs and risk ratios (RRs) coincide.

We generated eight simulation scenarios by varying the exposure prevalence and outcome risks. We varied simulated exposure prevalence (0.5%, 1%, 5%, and 10%) by changing the intercept term in the logistic regression models used to generate the exposure variable, while keeping other coefficients constant. Two outcome risk scenarios were generated for each exposure scenario; the first was based on the risk observed in the empirical example (3.5%) and the second was chosen to represent more common outcomes (10%). The null exposure effect was held constant across the simulation scenarios.

Statistical Analysis

We implemented propensity score-based methods in both the empirical example and the simulation studies to account for measured confounders. The propensity score was estimated as the predicted probability of statin exposure during the first trimester using logistic regression models. In the empirical study, all the confounders described above (under empirical example) were included in the propensity-score model, while in the simulation studies the 10 confounders, c1–c10, were included in the propensity-score models. After propensity-score estimation, the following three approaches were used to derive adjusted associations between statin exposure and congenital malformations.

  1. Propensity-score stratification: Two approaches were used to conduct stratification: (1) creating equally-sized propensity-score strata, numbering 10, 50, or 100, after ranking the entire cohort based on the propensity score (hereafter referred to as the propensity-score strata cohort approach); and (2) creating unequally sized propensity-score strata, numbering 10, 50, or 100, after ranking only the exposed patients based on the propensity score and assigning unexposed patients to these strata based on their PS (hereafter referred to as the propensity-score strata exposed approach). Weighted regression models were used to derive an adjusted exposure effect after stratification, in which each exposed patient received a weight of 1 and unexposed patients were weighted in proportion to the distribution of the exposed in the stratum into which they fell. The unexposed group weights were scaled to sum to the number of unique unexposed individuals included in the analysis

  2. Effectively, this weighting creates a pseudopopulation in which confounder distribution concordance is achieved between the exposed and unexposed groups, to the extent that it is achieved within each stratum. The exposure effect estimate can then obtained by computing the marginal effect estimates in this weighted population using generalized linear models, with exposure term as the only independent variable. Because the weighting in this approach aims to make the confounder distribution among the unexposed akin to their distribution in the exposed, the marginal estimates computed in the weighted population consistently estimate the average treatment effect among the exposed (also referred to as the average treatment effect among the treated).3,16 SAS macros used for the propensity-score stratification approaches can be downloaded from our website (
  3. Propensity-score matching: We used three propensity-score matching approaches. The first two approaches matched unexposed women to each statin-exposed woman using a nearest neighbor approach with a caliper of 0.01 in ratios of 1:1 and 1:5. For 1:5 matching, we employed a variable-ratio matching strategy, allowing for fewer than the target number of matches as long as at least one match is found, as this strategy has been recommended over fixed-ratio matching as the preferred method for achieving greater confounding control.17,18 The third propensity-score matching approach consisted of a recently proposed full matching strategy, in which exposed and unexposed individuals were matched on the propensity score to form matched sets that contained at least one exposed and at least one unexposed individual using an optimal matching algorithm seeking to minimize the mean within matched-set differences in the propensity score between the exposed and unexposed individuals.19 Unlike traditional propensity-score matching approaches, full matching seeks to utilize information on the majority of the original patient population by including all matchable exposed and unexposed individuals in matched sets. Full propensity-score matching was implemented using the R package MatchIt (R Version 3.2.3).16 After matching, generalized linear models were used to derive average treatment effect estimates among the exposed for all three approaches. For 1:5 and full matching, weights induced by matching were incorporated in the regression models. These weights were created in a manner similar to the stratification weights, where unexposed individuals were weighted according to the distribution of the exposed in each matched set. To account for the clustering of subjects within matched sets, we used a robust variance estimator to compute 95% confidence intervals (CIs) for propensity-score matching methods.19
  4. SMR weighting: We also used the SMR weighting approach based on the propensity score to derive exposure effect estimates. In this approach, the unexposed patients within our cohort received weights equal to the ratio of (PS/(1 − PS)), while the exposed patients received weights of 1.20 Marginal effect estimates and 95% CI for exposure effects were computed using weighted generalized linear models and robust variance estimators. Because this approach reweights the unexposed population to be similar to the exposed population with respect to confounder distribution, it also results in estimation of the average treatment effect among the exposed.21

We excluded the observations from the nonoverlapping regions of the propensity-score distributions among exposed and unexposed populations before conducting propensity-score full matching, stratification, and SMR weighting. This step, also referred to as “trimming,” ensures exclusion of patients who will always or never receive therapy because of indications or contraindications and focuses the estimation of treatment effects in a population with clinical equipoise.22 The ability of propensity-score-based approaches to allow researchers to measure treatment effects in a population with clinical equipoise through trimming is one of its great strengths over traditional multivariable outcome regression models.

Evaluation of Performance

In the simulation study, we compared performance across 1,000 simulations of each method on the following metrics: (1) confidence limit ratios (CLR),23 which were computed on the OR scale as the average of ratios of upper to lower 95% confidence limit for the exposure effect estimates, as indicators of precision; (2) relative (%) bias estimates as indicators for the extent of confounding control, which were calculated on the OR scale as

, where

is the exponentiated average of the estimated coefficients and

is exponentiated true exposure coefficient, (3) mean squared errors (MSE), which were computed on the log OR scale as the average of the squared differences between estimated and true exposure coefficients, as indicators for the overall accuracy.24

In the empirical study, baseline characteristics of the full cohort were reported by statin exposure status. The differences in the baseline characteristics between the exposed and the unexposed women were summarized using absolute standardized differences.25 For 1:5 and full PS matching, all stratification approaches, and the SMR weighting approach, absolute standardized differences were computed after weighting the unexposed observations according to the weight calculations described above. We further computed risk differences (RDs) and RRs along with their 95% confidence interval for each of the method using weighted generalized linear models with identity and log links, respectively, in the SAS GENMOD procedure (SAS version 9.3, SAS institute, Cary, NC). Because the truth is unknown in the empirical study, we did not compute other measures of performance (relative bias estimates and MSEs).


Empirical Study

The full cohort comprised 886,996 pregnancies. Of these, 335 (0.04%) were classified as exposed to statins during the first trimester. Important baseline differences between statin exposed and unexposed women were observed in the full cohort (Table). Statin-exposed women were older, had a higher prevalence of all of the comorbid conditions considered, and higher use of antihypertensive medications, insulin, and oral diabetes medications. 1:1 and 1:5 PS matching, SMR weighting, as well as propensity-score stratification (exposed) approaches resulted in excellent covariate balance between exposed and unexposed groups, as demonstrated by low values of absolute standardized differences (Table). Notably, propensity-score stratification (cohort) approaches demonstrated remaining imbalances in numerous important baseline characteristics, including diabetes and hypertension diagnosis, as well as medication use for these conditions (absolute standardized difference > 0.1). Full propensity-score matching also resulted in imbalances in a number of characteristics including race and age categories between exposed and unexposed groups.

Distribution of Selected Baseline Characteristics in Pregnant Women by Their Statin Exposure Status in the First Trimester and Differences in Baseline Characteristics with Different Analytic Approaches, Medicaid Data 2000–2007

In the unadjusted analysis, statin exposure was associated with increased risk of malformations compared with nonexposure (RD: 0.048, 95% CI: 0.019, 0.078 and RR: 2.4, 95% CI: 1.6, 3.4; Figure 1). After 1:1 matching on the propensity score, this large increase in risk associated with statin exposure was greatly diminished (RR: 1.2, 95% CI: 0.69, 2.0). 1:5 and full propensity-score matching resulted in similar attenuation of the RR estimates (RR: 1.3, 95% CI: 0.86, 2.0 and RR: 1.2, 95% CI: 0.76, 1.9, respectively). Trimming nonoverlapping regions of the propensity-score distribution resulted in exclusion of 315,908 unexposed patients (35.8%) and none of the exposed patients. In the propensity-score stratification (cohort) approach, the RR (95% CI) was estimated at 1.7 (1.2, 2.4) with 10 strata. This value was reduced to 1.4 (0.97, 2.0) with 50 and 1.3 (0.91, 1.8) with 100 strata. In the propensity-score stratification (exposed) approach, the RR (95% CI) was estimated at 1.2 (0.86, 1.7) with 10 strata, 1.2 (0.85, 1.7) with 50 strata, and 1.3 (0.88, 1.8) with 100 strata. The SMR weighting approach resulted in an RR (95% CI) of 1.1 (0.79, 1.6). The patterns in RDs with each method mirrored the patterns observed in RRs (Figure 1).

Measures of association between congenital malformation and first trimester statin exposure versus nonexposure using different analytic approaches, Medicaid Data 2000–2007. LCL indicates lower confidence limit; PSScohort, propensity-score stratification, strata created by ranking the entire cohort; PSSexposed, propensity-score stratification, strata created by ranking only the exposed group; SMRW, standardized mortality ratio weighting; UCL, upper confidence limit.

Simulation Study

For almost all of the simulation scenarios, relative bias was higher with the propensity-score stratification (cohort) approach using 10 and 50 strata compared with the propensity-score stratification (exposed) approach using the same number of strata (Figure 2). Relative bias with the cohort approach decreased substantially with increasing number of strata, as well as increasing exposure prevalence or outcome risk. For all propensity-score matching approaches, exposed approaches, and the SMR weighting approach, relative bias was close to zero regardless of the exposure prevalence and outcome risk.

Relative bias for different analytic approaches over 1,000 simulations. The dashed line indicates unbiased estimates (% bias of 0). PSScohort indicates propensity-score stratification, strata created by ranking the entire cohort; PSSexposed, propensity-score stratification, strata created by ranking only the exposed group; SMRW, standardized mortality ratio weighting.

In terms of precision, propensity-score matching approaches resulted in greater confidence limit ratios (lower precision) compared with all the stratification approaches under all exposure–outcome combinations studied (Figure 3). The SMR weighting approach also resulted in lower precision compared with the stratification approaches. Propensity-score stratification cohort and exposed approaches were equivalent in terms of their precision across all the scenarios tested.

Confidence limit ratios for different analytic approaches over 1,000 simulations. The dotted line indicates precision (as measured by the CLR) of the crude estimate at 3.5% outcome risk; the dot-dashed line indicates precision (as measured by the CLR) of the crude estimate at 10% outcome risk. PSScohort indicates propensity-score stratification, strata created by ranking the entire cohort; PSSexposed, propensity-score stratification, strata created by ranking only the exposed group; SMRW, standardized mortality ratio weighting.

Overall, all stratification approaches and the SMR weighting approach resulted in lower MSEs compared with propensity-score matching approaches at 0.5% and 1% exposure prevalences (Figure 4). At the lowest exposure prevalence and outcome risk combination (0.5% exposure–3.5% outcome), increasing the number of strata in the stratification approaches appeared to increase the MSEs. At 5% and 10% exposure prevalences, all approaches resulted in MSEs close to zero, indicating equivalent performance of these methods.

Mean squared errors for different analytic approaches over 1,000 simulations. PSScohort indicates propensity-score stratification, strata created by ranking the entire cohort; PSSexposed, propensity-score stratification, strata created by ranking only the exposed group; SMRW, standardized mortality ratio weighting.


At very low exposure prevalences (0.5% and 1%), propensity-score stratification approaches with strata formed based on the exposed group distribution resulted in greater confounding control as compared with stratification approaches based on the whole cohort. Propensity-score stratification based on the exposed group also provided greater precision compared with propensity-score matching and SMR weighting approaches. Regardless of the ranking method used, creating finer strata by increasing the number of strata resulted in greater confounding control without any meaningful loss in precision. At higher exposure prevalences (5% and 10%), very little difference in the performance of these methods was noted.

Our finding of improved precision with propensity-score stratification approaches compared with propensity-score matching is in keeping with theoretical expectations and previous reports.3,4 In general, propensity-score stratification could be expected to produce estimates with greater precision compared with propensity-score matching approaches that discard unmatched unexposed or exposed patients from the analysis. The higher precision of stratification compared with the SMR weighting approach implies that conducting a weighted analysis based on strata-specific weighting of unexposed may be less prone to the influence of extreme weights than SMR weighting. Full matching also resulted in lower precision compared with stratification approaches despite using a nearly equivalent amount of information. However, the confidence intervals for full propensity-score matching were computed based on robust variance estimators, while confidence intervals for stratification were not. One could argue that propensity-score stratification approaches may also result in formation of clustered samples of individuals in each stratum similar to the clustering observed in propensity-score matching approaches, and therefore stratification approaches should use robust variance estimators. However, because the number of strata in the stratification approach is typically much smaller than the number of matched sets in full propensity-matching in practice, computation of robust variance estimators for estimates from stratification approaches is challenging.26

The gain in precision with propensity-score stratification approaches can be substantial in circumstances with low exposure prevalence. Exposure prevalence may be low in various settings. In our empirical example, the exposure was a treatment used despite contraindications. Other settings where exposure prevalence might be expected to be low are safety evaluations of newly marketed drugs, and safety evaluations of infrequently used drugs in other vulnerable populations such as elderly or children. At higher exposure prevalences, the gain in precision offered by propensity-score stratification may be modest compared with propensity-score matching.

Our simulations demonstrate the possibility of substantial residual confounding in coarse stratification approaches with ranking based on the whole cohort (the propensity-score stratification cohort approach). This problem can be remedied by finer stratification, but the approach of forming strata based on the exposed only (the propensity-score stratification exposed approach) was highly effective. In essence, both these solutions address the problem of residual confounding by defining propensity-score strata narrowly so that they resemble propensity-score matched sets in which multiple exposed individuals are matched to multiple unexposed individuals based on their propensity score. Each narrow stratum contains approximately exchangeable exposed and unexposed individuals. Therefore, an analysis accounting for the narrow-stratum membership of each individual results in excellent confounding control. The problem of residual confounding with coarse stratification in the propensity-score stratification cohort approach is especially severe when the exposure is infrequent, as stratum boundaries are almost entirely driven by the propensity-score distribution in the unexposed. Consequently, the exposed patients cluster in just a few strata at the extreme, leaving the potential for substantial confounding within strata. Such clustering of exposed patients is avoided in the propensity-score stratification exposed approach, because the strata are determined by the propensity-score distribution in the exposed (eTables 2a and 2b; In situations with low exposure prevalence, the propensity-score stratification exposed approach is a method that is both efficient and effective in controlling confounding.

An alternative way to create fine strata would be to use fixed stratum boundaries defined on the probability scale. With 10 equally wide strata, this approach would place all the subjects with propensity-score values between 0 and 0.1 into the first stratum, and similarly for the other nine strata. The results from this approach applied to our empirical example, using 100 strata, were similar to stratification approaches that use distributions of the propensity scores either in the whole cohort or the exposed group (100 strata with the fixed-width approach [OR: 1.34, 95% CI: 0.91, 1.97], compared with the stratification cohort approach [OR: 1.33, 95% CI: 0.90, 1.95] and stratification-exposed approach [OR: 1.27, 95% CI: 0.86, 1.87]).

Our findings also underscore that it may be difficult to make general recommendations about the optimum number of strata that should be used in propensity-score stratification in situations with low exposure prevalence. Full propensity-score matching is another promising method that uses the majority of the individuals in the cohort and selects the number of strata automatically. However, this method produced disappointing results with low exposure prevalence. In our simulations with 0.5% and 1% exposure prevalence, full propensity-score matching resulted in higher MSEs compared with a majority of other approaches. In a previous simulation study, Austin and Stuart19 have also reported high MSEs with full propensity-score matching when not using any caliper restrictions compared with nearest-neighbor propensity-score matching. Introducing a caliper restriction to full propensity-score matching may lead to improved performance.

Traditionally, either Mantel-Haenszel pooling or computing a weighted average of subclass-specific estimates have been used to derive adjusted exposure effects after propensity-score stratification. We used weighted regression models after stratification, over these traditionally used approaches, because this approach does not rely on the homogeneity assumption (as opposed to Mantel-Haenszel pooling) and provides stable estimates under circumstances of a large number of strata or sparsely populated strata (as opposed to the weighted average method). In a separate set of simulations, we have found the performance of the Mantel-Haenszel pooling approach to be robust with propensity-score stratification.27 If researchers can reasonably assume homogeneity of the exposure effect across strata, Mantel-Haenszel pooling is a good, simple alternative to the weighting approach used in this study. The precision of the summary exposure effect estimate using the Mantel-Haenszel estimator is optimal as the strata are pooled using inverse-variance weighting.

In conclusion, our findings indicate that performance of propensity-score-based fine stratification in confounding control is equivalent to propensity-score matching at higher exposure prevalence and better than propensity-score matching at low exposure prevalence. Therefore, propensity-score-based fine stratification should be considered as a strategy for confounding control in routine pharmacoepidemiology practice. Creation of strata should be based on the propensity-score distribution of the exposed group when evaluating outcomes of an infrequent exposure.


1. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
2. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49:1373–1379.
3. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res. 2011;46:399–424.
4. Wang Z. Propensity score methods to adjust for confounding in assessing treatment effects: bias and precision. Int J Epidemiol. 2009;7:1–8.
5. Austin PC, Mamdani MM. A comparison of propensity score methods: a case-study estimating the effectiveness of post-AMI statin use. Stat Med. 2006;25:2084–2106.
6. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc. 1984;79:516–524.
7. Cochran WG. The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics. 1968;24:295–313.
8. D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17:2265–2281.
9. Akers A. Determination of the optimal number of strata for bias reduction in propensity score matching. 2010.University of North Texas.
10. Bateman BT, Hernandez-Diaz S, Fischer MA, et al. Statins and congenital malformations: cohort study. BMJ. 2015;350:h1035.
11. Palmsten K, Huybrechts KF, Mogun H, et al. Harnessing the Medicaid Analytic eXtract (MAX) to evaluate medications in pregnancy: design considerations. PLoS One. 2013;8:e67405.
12. Margulis AV, Setoguchi S, Mittleman MA, Glynn RJ, Dormuth CR, Hernández-Díaz S. Algorithms to estimate the beginning of pregnancy in administrative databases. Pharmacoepidemiol Drug Saf. 2013;22:16–24.
13. Williamson T, Eliasziw M, Fick GH. Log-binomial models: exploring failed convergence. Emerg Themes Epidemiol. 2013;10:14.
14. Skov T, Deddens J, Petersen MR, Endahl L. Prevalence proportion ratios: estimation and hypothesis testing. Int J Epidemiol. 1998;27:91–95.
15. McNutt LA, Wu C, Xue X, Hafner JP. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol. 2003;157:940–943.
16. Ho D, Imai K, King G, Stuart EA. MatchIt: nonparametric preprocessing for parametric causal inference. J Stat Softw. 2011;42:28.
17. Ming K, Rosenbaum PR. Substantial gains in bias reduction from matching with a variable number of controls. Biometrics. 2000;56:118–124.
18. Rassen JA, Shelat AA, Myers J, Glynn RJ, Rothman KJ, Schneeweiss S. One-to-many propensity score matching in cohort studies. Pharmacoepidemiol Drug Saf. 2012;21(Suppl 2):69–80.
19. Austin PC, Stuart EA. Estimating the effect of treatment on binary outcomes using full matching on the propensity score. Stat Methods Med Res. 2015. Sep. pii: 0962280215601134. [Epub ahead of print] PubMed PMID: 26329750.
20. Brookhart MA, Wyss R, Layton JB, Stürmer T. Propensity score methods for confounding control in nonexperimental research. Circ Cardiovasc Qual Outcomes. 2013;6:604–611.
21. Kurth T, Walker AM, Glynn RJ, et al. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol. 2006;163:262–270.
22. Glynn RJ, Schneeweiss S, Stürmer T. Indications for propensity scores and review of their use in pharmacoepidemiology. Basic Clin Pharmacol Toxicol. 2006;98:253–259.
23. Poole C. Low P-values or narrow confidence intervals: which are more durable? Epidemiology. 2001;12:291–294.
24. Devore JL, Berk KN. Modern Mathematical Statistics with Applications. 2012.New York, NY: Springer;.
25. Austin PC. Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research. Commun Stat Simulat Comput. 2009;38:1228–1234.
26. Arceneaux K, Nickerson DW. Modeling certainty with clustered data: a comparison of methods. Polit Anal. 2009;17:177–190.
27. Desai RJ RK, Bateman B, Hernandez-Diaz S, Huybrechts KF. Confounding control using propensity scores when the exposure is infrequent: making the case for a fine stratification approach. Pharmacoepidemiol Drug Safety. 2015;24:1–587.

Supplemental Digital Content

Copyright © 2017 Wolters Kluwer Health, Inc. All rights reserved.