
In cohort studies of cardiovascular disease that use large databases of medical care encounters, confounding by a substantial number of cardiovascular medications and comorbidities is common. The standard method of controlling for such confounding is to fit multiple-regression models that include a term for the exposure variable of primary interest as well as a term for each of the potential confounders. However, problems may occur for models in which the number of variables is large relative to the number of events. Peduzzi and colleagues^{1} conducted simulation studies evaluating events per variable in logistic regression; they observed that with fewer than 10 events per variable, regression coefficients could be biased and standard errors (SEs) could be incorrect. Harrell and colleagues^{2} reported similar findings.

There are 2 general approaches to this problem. Models can be made parsimonious by excluding variables that do not meet the criteria for potential confounders. However, cardiovascular disease is characterized by a large number of potential risk factors and medical treatments, each of which may make a small but nonzero contribution to the risk of disease, and which should thus be retained as potential confounders. Furthermore, some of the techniques for constructing parsimonious models (eg, stepwise regression) may require arbitrary decisions by the investigators, and have limitations that can lead to SEs of regression coefficient estimates that are biased low.^{3}

A second approach is to calculate a single summary score from several variables, which reduces the number of variables in the regression model, thus increasing the events per variable. At present, the most common approach to reducing the dimensionality of models is the propensity score.^{4} For dichotomous exposure variables, the score calculated is the probability that each subject is exposed, conditional on their observed confounders. In simulation studies, Cepeda and colleagues^{5} compared logistic regression with propensity score analyses in settings with multiple confounders and few events. They reported that the propensity score was a good alternative when there were 7 or fewer events per confounder.

However, propensity scores have several potential limitations. Although studies are typically powered to have an adequate number of outcome events, some assess exposure categories that are relatively infrequent; because the exposure is the outcome in the propensity score estimation, instabilities may occur in such situations. Another limitation is that propensity scores are typically applied to dichotomous exposures. They can be applied to exposures with more than 2 levels,^{6} but this approach seems to be rarely used and is not well studied. Furthermore, propensity scores currently cannot be used for continuous exposures. In addition, the use of propensity scores in nested case-control studies (which are often used to study cardiovascular disease) is not well understood.

We recently have used an alternative approach to reduce dimensionality of regression models in cohort^{7–11} and nested case-control^{12} studies of the adverse cardiovascular effects of medications. We have calculated a cardiovascular risk score that is a slight variation of the multivariate confounder score originally proposed by Miettinen as a means to adjust for multiple confounders with a single summary measure.^{13} Such a summary score has 3 important potential advantages.

First, it provides a single numerical measure characterizing cardiovascular disease risk. This is helpful for assessing both how the study groups differ according to cardiovascular risk and whether or not there is effect modification. The distribution of the summary risk score, which can be interpreted as an index of overall cardiovascular risk, provides useful information about the baseline comparability of study groups. The investigator also can determine whether there is effect modification by examining the association between exposure and outcome changes for varying levels of cardiovascular risk.^{7} This would be difficult if not impossible when considering a large number of individual cardiovascular medications and comorbidities.

Second, it can reduce the dimensionality of analyses. A typical study may have 40 covariates that are surrogate indicators of cardiovascular illness plus an exposure variable with 31 levels (such as 10 different nonsteroidal anti-inflammatory drugs [NSAIDs], each with 3 separate doses, vs. a nonuser group). Using risk scores, the ultimate models may have 35 parameters to estimate (30 for the exposure and 5 for the quintiles of the risk score), as opposed to the 70 to be estimated in a conventional analysis.

Third, use of a risk score may be more objective than traditional covariate selection procedures. These procedures use a variety of criteria to select the covariates that are ultimately included in the model (eg, use of a *P* value for the test that a parameter estimate differs from 0, or a measure of the effect of including the covariate on the estimate of the exposure parameter). Moreover, there is no wide agreement on threshold *P* values, with use of 0.05, 0.10, or even 0.20 sometimes recommended.

However, despite the potential advantages of confounder summary scores, their use has been inhibited by an early study of their statistical properties. Pike and colleagues^{14} examined the multivariate confounder score in the context of case-control studies, demonstrating that under certain conditions for multivariable discriminant analysis and logistic regression, the confounder score would exaggerate the statistical significance of the exposure of interest. They also reported results from a simulation study in which they considered an exposure (not related to case-control status) and 2 confounders, with simulated samples of 100 cases and 100 controls. They evaluated alternative degrees of correlation (ranging from 0 to 0.7) between the exposure and confounders and observed that the empirical Type I error rates were larger than the nominal significance levels.

Cook and Goldman^{15} reexamined the multivariate confounder score and, in particular, revisited the simulation study conducted by Pike and colleagues.^{14} Cook and Goldman agreed with Pike and colleagues' theoretical arguments based on the analysis of covariance model. However, what Cook and Goldman considered an actual multivariate confounder score analysis (ie, creating strata with the confounder score and conducting stratified analyses) was not examined by Pike and colleagues, who instead treated the score as a continuous variable. Cook and Goldman replicated the simulation study reported by Pike and colleagues, comparing the traditional discriminant analysis with the confounder score analysis Pike and colleagues used, as well as with the stratified confounder score analysis and with a propensity score analysis. The stratified confounder score analysis showed an exaggeration of statistical significance similar to that reported by Pike and colleagues. However, this exaggeration was most pronounced when the squared multiple correlation coefficient exceeded 90%, implying a degree of predictability of the exposure by the confounders that is unlikely to occur in practice. When the correlation between exposure and confounders was lower, there was little exaggeration of statistical significance. The propensity score analysis was less affected by the high correlation.

Despite this analysis suggesting that Pike and colleagues' work may have overstated the limitations of confounder scores for most practical applications, the confounder score has rarely been used, and Pike and colleagues' paper is often cited as the reason. However, although they provided theoretical arguments for an exaggeration of statistical significance, their numerical evidence on the magnitude of this exaggeration is limited. Their simulation study addressed a very specific and relatively uncommon setting, using a statistical method that is rarely used today (discriminant analysis). In terms of current pharmacoepidemiologic practice, the confounder score has not been well studied.

In recent years, interest in confounder scores has increased.^{16–18} Our studies of adverse cardiovascular effects of medications^{7–12} had very large sample sizes (tens of thousands to hundreds of thousands of subjects), exposures that were not binary, and a large number of potentially confounding cardiovascular risk factors. Thus, we used confounder scores and found that, within the context of the individual studies, they gave results very similar to those from more traditional methods. However, because results from individual studies may have limited applicability, we present here a more extensive investigation of the performance of confounder scores, set in the context of studies of cardiovascular disease. We conducted extensive simulation studies comparing regression models using the cardiovascular risk score with regression models fitting the individual components of the risk score.

## METHODS

### Cardiovascular Risk Score

The cardiovascular risk score we calculate is a slight variation of the multivariate confounder score proposed by Miettinen.^{13} For the traditional multivariate confounder score, a regression model is first fit relating the study outcome to the exposure of interest and the confounders for the entire study cohort. The score is then computed for each study subject as the fitted value from that regression model, with exposure status set to nonexposed for all subjects. The subjects are then grouped into strata based on the score, and the exposure-outcome association is analyzed across strata. In our variation, the regression model used to derive the summary score is restricted to nonexposed study subjects (instead of being fit to the entire cohort with the exposure variable included). The rest of the algorithm is the same as Miettinen's.

We calculated the risk score in the nonexposed group to reduce bias in situations where the exposure is correlated with the confounders. However, for the primary scenarios considered in our simulations, the 2 methods had very similar performance.

All of our simulations assume a binary, nontime-dependent exposure, a binary disease outcome, and a fixed period of follow-up with no censoring. The cardiovascular risk score (which is the confounder score) is derived from a logistic regression model relating the confounders to the outcome, restricted to nonexposed persons. This summary score can be expressed using either the linear predictor or the fitted value from the regression model and is computed for each member of the cohort. Although the cardiovascular risk score can be treated as a continuous variable, our practice^{7–12} has been to treat it as a categorical variable: the lowest level is defined by persons who have none of the cardiovascular risk factors, and the remaining levels are defined by quantiles. This avoids assuming a functional form for the relationship between the risk score and the outcome, and it is also convenient for assessing effect modification. Although a functional form is avoided in the outcome model, one is still needed to estimate the risk score.
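Our simulations were implemented in SAS; purely for illustration, the score construction can be sketched in Python/NumPy (the toy cohort, prevalences, ORs, and the `fit_logistic` helper below are hypothetical, not the study's values or code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cohort (illustrative only): 5 binary risk factors, a binary exposure,
# and a binary outcome generated from a logistic model.
n = 5000
X = rng.binomial(1, 0.2, size=(n, 5))              # confounders
exposed = rng.binomial(1, 0.1, size=n)             # exposure indicator
logit = -3.0 + X @ np.log([1.25, 1.25, 1.5, 1.5, 2.0]) + np.log(2.0) * exposed
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))  # outcome

def fit_logistic(X, y, iters=25):
    """Unpenalized logistic regression via Newton-Raphson; intercept first."""
    Xd = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ b))
        b += np.linalg.solve((Xd.T * (p * (1 - p))) @ Xd, Xd.T @ (y - p))
    return b

# Step 1: fit the outcome model among the NONEXPOSED only (this paper's
# variant of Miettinen's score).
b = fit_logistic(X[exposed == 0], y[exposed == 0])

# Step 2: compute the linear predictor (the risk score) for EVERY subject.
score = b[0] + X @ b[1:]

# Step 3: categorize -- the lowest stratum holds subjects with no risk
# factors; the rest are quartiles among subjects with at least one.
any_rf = X.sum(axis=1) > 0
cuts = np.quantile(score[any_rf], [0.25, 0.5, 0.75])
stratum = np.zeros(n, dtype=int)
stratum[any_rf] = 1 + np.searchsorted(cuts, score[any_rf])

# The stratum indicators (plus exposure) then replace the individual
# confounders in the final outcome regression.
```

The final model would then regress the outcome on exposure and the stratum indicators, which is how the score reduces dimensionality.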

### Data Simulation

Our simulation studies were designed to mimic the recently published study of nonsteroidal anti-inflammatory drugs and the risk of acute myocardial infarction and sudden cardiac death,^{12} with the exception that we simulated cohort data instead of case-control samples. Our simulation study design was also consistent with the other pharmacoepidemiologic studies that motivated this work.^{7–11} The outcome was simulated from a logistic regression model that included a binary exposure and 30 confounders (eg, cardiovascular risk factors). Regression parameters were specified such that event probabilities took values of approximately 0.1 and 0.01. We considered exposure probabilities of 0.1, 0.01, and 0.001. We employed a 2-step approach to simulate the cardiovascular risk factors. First, we simulated whether a study subject had any cardiovascular risk factors, using probabilities of 0.58 and 0.8, which were the reported prevalences of any cardiovascular drug use in the past year for controls and cases, respectively.^{12} If the study subject had cardiovascular risk factors, we then simulated the individual risk factors under 2 different settings. The first consisted of all dichotomous confounders, reflecting our motivating study. The second consisted of confounders of which half were dichotomous and half were continuous. The prevalences of the dichotomous risk factors were based on those reported among controls in Table 1 of the motivating study.^{12} The continuous risk factors were simulated from a normal distribution with mean 0 and standard deviation 1; a continuous risk factor was considered present if it took a positive value. To induce confounding, the exposure prevalence was increased by one-tenth of the proportion of confounders that the study subject possessed. For example, if a study subject had 15 of the 30 cardiovascular risk factors, then the probability of being exposed was increased by one-tenth times one-half, ie, by 0.05.

We set the odds ratio (OR) for the exposure to 2.0. For the confounders, we set one-third of their ORs to 1.25, another third to 1.5, and the final third to 2.0. For the simulations in which half the confounders were continuous, we assigned these ORs in thirds separately among the dichotomous confounders and among the continuous confounders.

### Sampling Strategy

For each simulation setting, 1000 samples were generated. The motivating pharmacoepidemiologic studies included very large samples, consisting of hundreds of thousands of study subjects. To reflect this, and also due to computational considerations, each simulated sample consisted of 10,000 study subjects.

### Model-Fitting Approaches

For each simulated sample, the data were fit using 5 different logistic regression models. We first fit the exposure and all 30 individual confounders in a single regression model, which was designated as the “conventional” model. The second, third, and fourth regression models fit the exposure and the summary risk score. The second model included the risk score as a linear term. The third model used 5 groups in which the lowest group consisted of study subjects with no risk factors and the remaining 4 groups were defined by quartiles based on study subjects who had any cardiovascular risk factors. The fourth model used 10 groups in which the lowest group consisted of study subjects with no risk factors and the remaining 9 groups were defined by evenly spaced percentiles based on study subjects who had any cardiovascular risk factors. For the fifth model, we derived the multivariate confounder score as described by Miettinen^{13} and categorized it into 5 groups similar to that of our third model.

Because the 3 regression models using the summary risk score had similar results, we only report the simulation findings when the risk score was expressed as 5 groups, which is consistent with Miettinen's approach.

### Evaluation Strategies

To evaluate the performance of both the risk score and conventional models, we computed the percent bias (the average difference between the estimated and true regression coefficient divided by the true regression coefficient times 100%). We computed the mean SE of the estimated regression coefficient for the exposure as well as the standard error estimate (SEE) from the simulated samples. The SEE is the standard deviation of the estimated regression coefficients from the simulated samples, and the mean SEs should approximate the SEEs. Coverage probabilities—the proportion of the 95% confidence intervals (CIs) that include the true regression coefficient—were also computed for each model. If a method has proper coverage, then for a 95% CI, the coverage should be very close to 0.95. Finally, the empirical power was estimated by dividing the exposure's estimated regression coefficient by its SE and tabulating the proportion of simulations in which this ratio exceeded 1.96, the upper 97.5 percentile for a 2-tailed 0.05-level test. Given that there is a true association between the exposure and disease, values closer to 1 are desirable. Simulations were conducted using SAS version 9.1 (SAS Institute, Cary, NC). The simulation code is presented in the Appendix (available with the online version of the article).
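These summary measures are straightforward to compute from the per-sample estimates. A minimal Python sketch (the paper used SAS; the `evaluate` helper and the simulated estimates below are hypothetical, for illustration only):

```python
import numpy as np

def evaluate(beta_hat, se_hat, beta_true, z=1.96):
    """Summarize one model's exposure-coefficient estimates across simulations.

    beta_hat, se_hat: arrays of per-sample estimates and their model-based SEs.
    """
    pct_bias = 100.0 * (beta_hat.mean() - beta_true) / beta_true
    see = beta_hat.std(ddof=1)          # empirical SD of the estimates (SEE)
    mean_se = se_hat.mean()             # should approximate the SEE
    covered = (beta_hat - z * se_hat <= beta_true) & \
              (beta_true <= beta_hat + z * se_hat)
    power = np.mean(np.abs(beta_hat / se_hat) > z)  # 2-tailed 0.05-level test
    return pct_bias, see, mean_se, covered.mean(), power

# Hypothetical check: unbiased estimates with correctly specified SEs should
# give ~0% bias, mean SE ~ SEE, and coverage near 0.95.
rng = np.random.default_rng(2)
truth = np.log(2.0)
beta_hat = rng.normal(truth, 0.15, size=1000)
se_hat = np.full(1000, 0.15)
pct_bias, see, mean_se, coverage, power = evaluate(beta_hat, se_hat, truth)
```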

## RESULTS

Tables 1 and 2 contain simulation study results in which all of the confounders are dichotomous. In Table 1, the probability of having any cardiovascular risk factor is 0.58. The mean SEs approximated the SEEs well for all of the regression models, and the mean SEs and SEEs from the risk score models were similar to the corresponding values from the “conventional” models, although in some settings they were slightly smaller. Bias, coverage probabilities, and power were similar between risk score and conventional models, although in some settings the risk score models had both slightly greater bias and slightly greater power. Table 2 summarizes the findings when the probability of having any cardiovascular risk factor was 0.8. These results are similar to those in Table 1, except when the event probability is 0.01 and the exposure probability is 0.001; in this setting, the mean SEs from both the risk score and conventional models clearly overestimate their corresponding SEEs.

Tables 3 and 4 present simulation study results in which half the confounders are dichotomous and half are continuous. The probabilities of having any cardiovascular risk factors were 0.58 and 0.8 for Tables 3 and 4, respectively. In both tables, the mean SEs approximated the SEEs well for all models and all event/exposure settings. The mean SEs and SEEs from the risk score models were similar to the corresponding mean SEs and SEEs from the conventional models, although the SEs from the risk score models were consistently slightly smaller. In both tables, when the event probability was 0.01 and exposure probability was 0.001, the mean SEs from the risk score and conventional models overestimated their corresponding SEEs. Bias, coverage probabilities, and power were consistent across the risk score and conventional models, although the risk score models consistently had slightly lower power.

The above simulations had only moderate correlation between the exposure and the confounding variables, reflecting the underlying pharmacoepidemiologic examples. We thus conducted additional simulations to assess the effect of more extreme correlation between exposure and confounders. To simplify our investigation, we considered 10 dichotomous confounders: 3 with ORs set to 1.25, 4 with ORs set to 1.5, and 3 with ORs set to 2.0. We simulated exposure status with probability 0.1, and then, to induce correlation, simulated the confounders conditional on exposure status. We selected 3 confounders (1 with OR set to 1.25, 1 with OR set to 1.5, and 1 with OR set to 2.0) and, for an exposed subject, simulated these confounders with prevalences of 0.5; if the subject was unexposed, these confounders were simulated with prevalences of 0.1. That is, Pr(3C+|E+) = 0.5. All other confounders, regardless of exposure status, were simulated with prevalences of 0.1. We also evaluated prevalences of 0.2 and 0.1 for the 3 confounders among exposed subjects, the latter simulating independence between exposure and confounders. We considered exposure ORs of 1.0 and 2.0; the former corresponds to the simulations conducted by Pike and colleagues^{14} and Cook and Goldman.^{15} We then simulated the outcome using logistic regression as previously described, such that event probabilities were approximately 0.01. For each simulation setting, 1000 samples were generated, each consisting of 10,000 study subjects.
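This correlation-inducing step can be sketched as follows (illustrative Python/NumPy; the intercept is a placeholder tuned so the event probability is near 0.01):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Exposure first; the confounders are then drawn CONDITIONAL on exposure.
exposed = rng.binomial(1, 0.1, size=n).astype(bool)

# 10 dichotomous confounders; the first 3 are the correlated ones:
# prevalence 0.5 among the exposed, 0.1 elsewhere, ie, Pr(3C+|E+) = 0.5.
prev = np.full((n, 10), 0.1)
prev[exposed, :3] = 0.5
X = rng.binomial(1, prev)

# Outcome: confounder ORs 1.25 (x3), 1.5 (x4), 2.0 (x3); exposure OR 2.0.
beta = np.log(np.array([1.25] * 3 + [1.5] * 4 + [2.0] * 3))
logit = -5.3 + np.log(2.0) * exposed + X @ beta   # intercept is illustrative
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
```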

Table 5 summarizes our findings. Note that when the exposure OR was 1.0, the empirical power corresponded to the empirical size (Type I error rate). When the exposure and confounders were independent, the risk score and conventional models had similar results. When the prevalence of the 3 correlated confounders was 0.2, the mean SE slightly underestimated the SEE and the empirical size slightly increased in the risk score model. When the prevalence of the 3 correlated confounders was 0.5, the difference between the mean SE and the SEE was greater, the empirical size of the risk score model increased further, and we observed more bias in the risk score model than in the conventional model. We also considered prevalences of 0.7 and 0.9 for the 3 correlated confounders and observed increasing differences between the mean SE and SEE and increasing empirical sizes. We simulated other combinations of event and exposure probabilities and observed similar findings. We also simulated intercorrelation among the potential confounders and again noted similar findings, ie, both risk score and conventional models performed similarly as long as the analogue of Pr(3C+|E+) was no greater than 0.2.

We considered 2 additional scenarios that are likely to occur in pharmacoepidemiologic studies. In each of these, the OR for the exposure was 2.0 and there were 10 dichotomous covariates that were potential confounders. In the first, each covariate had a modest association with the outcome (OR of 1.2 for each). With standard regression methods, such confounders may not be identified and included in the model, particularly when variable selection methods based upon statistical significance are used. However, with the risk score method, they may be identified a priori and included in the risk score. As before, we set the confounder prevalences to 0.1 for unexposed persons and considered 0.1, 0.2, and 0.5 for the 3 confounders among exposed persons. We compared the risk score model to the logistic regression model that adjusted only for the exposure. When the confounder prevalence for exposed persons was 0.1 (simulating no confounding), the risk score and exposure-only models were similar. Otherwise, the risk score models had less bias than the standard exposure-only model (2.0% vs. 6.4% in the exposure-only model for confounder prevalence of 0.2, and 9.5% vs. 25.9% for confounder prevalence of 0.5). The exposure-only model had smaller SEE and mean SE and lower coverage probabilities.

In the second scenario, the 10 covariates included one that was a strong confounder (OR of 3.0) and 9 that were not confounders. We performed simulations with respective confounder prevalences of 0.1, 0.2, and 0.5, comparing the risk score model to a conventional model that included only the exposure variable and the 1 true confounder. When the confounder prevalence was 0.1 (no confounding), there were no differences between the risk score and conventional models. When the confounder prevalence was 0.2 for exposed persons, the risk score model had greater percent bias (4.8% compared with 0.5% in the conventional model). The SEEs, mean SEs, and coverage probabilities were similar between risk score and conventional models. When the confounder prevalence was 0.5, the risk score model had greater percent bias (10.8% compared with 1.7% in the conventional model). The SEEs and mean SEs were similar between risk score and conventional models, and the mean SEs were slightly smaller than the corresponding SEEs for both. The risk score model had slightly lower coverage probability.

## DISCUSSION

In the context of large cohort studies of cardiovascular disease, we used simulation studies to compare the performance of regression models using a summary risk score for cardiovascular disease with models that included all of the individual components of the summary score (conventional models). Although the performance of the models varied according to the exposure and outcome probabilities, there was little systematic difference between the performance of the confounder score and conventional models, as long as the correlation between the covariates and the exposure was not large. In particular, under these circumstances, there was no exaggeration of statistical significance, the major concern with regard to use of confounder scores.

Recently, Sturmer and colleagues^{19} compared several methods to adjust for confounding in a study of NSAIDs and 1-year all-cause mortality in 103,133 elderly Medicaid beneficiaries. They compared several propensity score analyses (matching, inverse probability weighting, stratification, modeling), the multivariate confounder score (called the “disease risk score”), and “conventional” adjustment (modeling the individual confounders directly in the regression models). They considered 71 covariates that were selected into the propensity score, confounder score, and “conventional” models using forward variable selection. They also compared these methods by randomly resampling 10,000, 1000, and 500 persons 1000 times from their underlying study. In all of these comparisons, they examined the estimated rate ratios and 95% CIs for NSAID use and observed no major differences among the methods.

These results are entirely consistent with those of the present study. However, our study extends these findings by broadening the assessment of confounder score performance through simulations employing a range of parameters that reflected several motivating pharmacoepidemiologic studies. In addition, we compared the confounder score and conventional models in a number of different aspects, which included bias, estimated SEs, coverage probabilities, and power.

We also considered scenarios with more extreme correlation between exposure and confounders. Our measure of correlation was the ratio of the prevalence of the confounder among exposed persons to that among unexposed persons. As long as this ratio was 2 or less, there was little difference between the risk score and conventional models. In one of the motivating pharmacoepidemiologic studies,^{12} the mean of this ratio for the cardiovascular drugs used in the risk score was 1.36. Thus, at least for this and comparable studies, exposure-confounder correlation should not undermine the validity of risk score-based analyses. Similar findings were present when a comparable degree of intercorrelation was introduced among the covariates used to derive the risk score.

However, for more extreme correlations, the performance of the risk score deteriorated. When there were covariates in the risk score with a 5-fold difference in the ratio of exposed:unexposed prevalence, the risk score had coverage probabilities that were too low, as was also the case when there was a similar degree of intercorrelation between the confounders. This is consistent with what was reported by Pike and colleagues^{14} and Cook and Goldman.^{15} Thus, risk scores should be used with caution in the presence of high correlation between either the exposure and confounder variables or the confounder variables themselves.

We identified 1 scenario in which the risk score might perform better than the conventional analysis. When the confounders were modestly associated with the outcome and included in the risk score model, it performed better in terms of bias than did an exposure-only model. The exposure-only model represented the situation in which modest confounders may not be identified using conventional methods, particularly those that use variable selection for models based upon statistical significance, even though they make small but nonzero contributions to disease risk.

However, better performance was not the primary factor that led to use of the risk score in the motivating pharmacoepidemiologic studies. For these examples, the more important considerations were the ability to characterize cardiovascular risk with a single number, the reduction of model dimensionality, and the elimination of potentially subjective variable selection procedures. Nevertheless, we speculate that there may be scenarios in which risk scores provide better performance, such as case-control studies requiring conditional logistic regression in which the exposure variable has a large number of levels. Further work would be useful.

Since we used logistic regression in our simulations, the measure of association between exposure and outcome was the OR. We could have used Poisson regression with a sandwich variance estimator^{20} or binomial regression to estimate relative risks. However, this would have entailed more time-consuming simulations, given these models’ additional computational complexity and the characteristics of our simulation studies (ie, large sample sizes, many covariates, large variety of simulation settings). The logistic regression models thus provide a good starting point for study of the characteristics of risk scores; further work with other models would be useful.

A limitation of the simulation studies is whether we adequately covered the range of possible scenarios that are likely to be encountered in large pharmacoepidemiologic studies. We designed our simulations to mimic such published studies in terms of numbers of confounders, incidence of outcome, prevalences of exposure and confounders, measures of association, and sample size, and we simulated a variety of settings. However, it is possible that we may have not considered some important settings.

To summarize, when a large number of covariates are mutually independent or only modestly intercorrelated, are related to the outcome, and are only modestly associated with the exposure of interest (eg, when the ratio of exposed to unexposed persons with each covariate is no more than 2), the risk score performs as well as conventional analyses in terms of SE estimation, bias, and coverage probability, and it can be used to characterize cardiovascular risk. If the covariates are strongly associated with the exposure or strongly intercorrelated, the risk score model may have greater bias, smaller SEs, and poorer coverage than standard analysis methods and hence should be used cautiously or not at all. If one is considering a large number of covariates, all of which may be only modestly associated with the outcome—a situation that is common for cardiovascular medications and comorbidities in pharmacoepidemiologic studies—the risk score can perform better than conventional analyses, which may fail to consider some covariates.

An important question not addressed by the present study is the relative performance of propensity scores and confounder scores. Cepeda and colleagues^{5} observed bias in some instances when using propensity scores in their simulation studies, and this bias persisted regardless of the number of events per confounder. It is possible that multivariate confounder scores are less subject to bias in some circumstances. Further work directly comparing these 2 methods in scenarios that realistically reflect contemporary pharmacoepidemiologic studies is needed.

## ACKNOWLEDGMENTS

We thank the reviewers for their valuable comments and suggestions.

## REFERENCES

1. *J Clin Epidemiol*. 1996;49:1373–1379.
2. *Cancer Treat Rep*. 1985;69:1071–1077.
3. *Stat Med*. 1989;8:771–783.
4. *Biometrika*. 1983;70:41–55.
5. *Am J Epidemiol*. 2003;158:280–287.
6. *Observational Studies.* New York, NY: Springer-Verlag; 2002.
7. *Arch Gen Psychiatry*. 2001;58:1161–1167.
8. *Lancet*. 2002;359:118–123.
9. *Lancet*. 2002;360:1071–1073.
10. *Clin Pharmacol Ther*. 2004;75:234–241.
11. *N Engl J Med*. 2004;351:1089–1096.
12. *Lancet*. 2005;365:475–481.
13. *Am J Epidemiol*. 1976;104:609–620.
14. *J Epidemiol Community Health*. 1979;33:104–106.
15. *J Clin Epidemiol*. 1989;42:317–324.
16. *J Clin Epidemiol*. 1998;51:233–236.
17. *Pharmacoepidemiol Drug Saf*. 2003;12:551–558.
18. *J Clin Epidemiol*. 2004;57:1223–1231.
19. *Am J Epidemiol*. 2005;161:891–898.
20. *Am J Epidemiol*. 2004;159:702–706.