Pharmacoepidemiology and health services research studies based on health insurance claims require a mechanism to describe and adjust for the baseline health status of study subjects. Several claims-based comorbidity or risk-adjustment metrics are reported in the literature. The International Classification of Diseases (9th revision) codes for ambulatory or hospital services provide the basis for several metrics that are adaptations of the Charlson Index, 1 the first widely used comorbidity metric, 2–5 and for the Ambulatory Care Group risk-adjustment tool. 6 Instead of using clinical diagnoses, the Chronic Disease Score (CDS) uses drugs dispensed as surrogate markers for chronic illness. 7,8 Claims-based comorbidity metrics were recently reviewed by Schneeweiss et al.9
The CDS was developed by Von Korff et al.7 at a single health maintenance organization (HMO). It was designed to describe an individual’s comorbidity, using automated pharmacy dispensing data. A multidisciplinary expert panel applied predefined scoring rules for various medication-use patterns to 1 year of dispensing records, creating a summary measure of an individual’s burden of chronic disease. After controlling for age and gender, a score of 7 or more when compared with the lowest score of 0 was predictive of following-year hospitalizations in two HMOs (odds ratios [OR] = 5.0 and 5.9) 7,8 and of following-year deaths in one HMO (OR = 9.8). 7 Clark et al.10 revised the CDS weighting scheme to create three sets of scoring weights, one for each of the following outcomes: total health care costs (Clark-TC), outpatient care costs (Clark-OC), and primary care visits (Clark-OV). Empirical weights were derived from linear regression models of age, gender, and medication use on each of these outcomes. The resulting scores were more predictive of utilization outcomes and of following-year hospitalizations than the Von Korff expert-panel approach, and the three CDS versions were equally predictive of hospitalization. 10 The most recent revision of the CDS, by Fishman et al., 11 uses empirically derived weights that predicted health care utilization costs associated with chronic diseases among enrollees of two HMOs. A comparison of three versions of the CDS is shown in Table 1.
Comorbidity adjustment based on medication claims has been used to control for potential confounding in a number of studies, most of which were based on HMO data. 12–18 Therefore, further assessment of the applicability of these medication-based scores for this use, as indicated by their association with hospitalization, is needed. The following analysis evaluates the capacity of four versions of the CDS (Clark-TC, Clark-OC, Clark-OV, and Fishman) to predict subsequent hospitalization in eight HMOs.
We used data from a study of gastroduodenal safety of alendronate that had been conducted in eight HMOs. 19 Participating HMOs are members of the HMO Research Network, a geographically diverse group of HMOs committed to public-domain research. 20,21 The organizational structures of these HMOs include staff, group, and network models, as well as independent physician network models that represent a variety of healthcare-delivery systems in the United States. Automated membership, demographic, drug dispensing, and hospital discharge records formed the basis of this analysis. The Institutional Review Board of each HMO approved both the original (alendronate) study and the subsequent use of the data for the present study.
The population included randomly selected nonusers of alendronate who had been frequency-matched for age and gender to users of alendronate within each of the eight HMOs. We restricted the study population to the 29,247 female nonusers of alendronate who were 45 years of age or older on 1 October 1994 and who had continuous HMO enrollment that included a prescription drug benefit from October 1994 through September 1995.
Calculation of Chronic Disease Score
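We used records of drugs dispensed from October 1994 through September 1995 to construct the Fishman CDS for all study subjects. Records of drugs dispensed from March 1995 through September 1995 were used to construct the three Clark versions of CDS. To construct the scores, we used selected drugs to represent the presence or absence of the specified chronic diseases. The scores were calculated by the following equations:

$$\mathrm{CDS}_{\mathrm{Clark}} = w_{\mathrm{age,gender}} + \sum_{i=1}^{28} w_i D_i$$

(The Clark weighted scores use 28 disease categories.)

$$\mathrm{CDS}_{\mathrm{Fishman}} = w_{\mathrm{age,gender}} + \sum_{i=1}^{29} w_i D_i$$

(The Fishman weighted score uses 29 disease categories.)

Here $D_i$ equals 1 if at least one drug representing disease category $i$ was dispensed during the ascertainment period and 0 otherwise, $w_i$ is the scoring weight for that category, and $w_{\mathrm{age,gender}}$ is the weight for the subject's age and gender group.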
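The scoring step can be illustrated with a minimal Python sketch. It assumes that a score is the sum of an age/gender weight and one weight for each disease category in which at least one drug was dispensed. The category names, the drug-to-category mapping, and all weights below are hypothetical placeholders, not the published Clark or Fishman values.

```python
from typing import Dict, Iterable, Set

# Hypothetical weights for illustration only; the published Clark and
# Fishman weights are not reproduced here.
DISEASE_WEIGHTS: Dict[str, float] = {
    "heart_disease_hypertension": 3.0,
    "diabetes": 2.0,
    "asthma": 1.0,
}

# Hypothetical mapping from a dispensed drug to its disease category.
DRUG_TO_CATEGORY: Dict[str, str] = {
    "diltiazem": "heart_disease_hypertension",
    "nifedipine": "heart_disease_hypertension",
    "insulin": "diabetes",
    "albuterol": "asthma",
}


def chronic_disease_score(dispensed_drugs: Iterable[str],
                          age_gender_weight: float) -> float:
    """Add the age/gender weight to one weight per disease category present.

    A category counts once, no matter how many of its drugs were dispensed
    or how often. Drugs outside the mapping contribute nothing.
    """
    categories: Set[str] = {
        DRUG_TO_CATEGORY[drug]
        for drug in dispensed_drugs
        if drug in DRUG_TO_CATEGORY
    }
    return age_gender_weight + sum(DISEASE_WEIGHTS[c] for c in categories)


# Two calcium-channel blockers map to the same category and count once.
example_score = chronic_disease_score(
    ["diltiazem", "nifedipine", "insulin"], age_gender_weight=1.5
)
```

A subject with no relevant dispensings receives the age/gender weight alone, for example `chronic_disease_score([], 1.5)`.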
The SAS code and drug list used for the CDS calculations can be found at http://www.hsph.harvard.edu/faculty/arnoldchan.html. The scores were then ranked lowest to highest and divided into quintiles and deciles. We assigned eight score variables to each individual: two ranks (quintiles and deciles) for each of four scores.
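The ranking step can be sketched in Python as follows. This is an illustrative version, not the SAS code referenced above. It assumes equal-sized groups and breaks tied scores by input order; both choices are assumptions, and the function name is hypothetical.

```python
from typing import List, Sequence


def quantile_ranks(scores: Sequence[float], n_groups: int) -> List[int]:
    """Assign each score to one of n_groups groups of near-equal size.

    Group 1 holds the lowest scores. Ties are split by input order
    because Python's sort is stable (an assumption of this sketch).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0] * n
    for position, index in enumerate(order):
        ranks[index] = position * n_groups // n + 1
    return ranks


# Hypothetical scores for ten subjects.
scores = [0.5, 3.2, 1.1, 7.8, 2.4, 0.9, 4.6, 5.5, 6.3, 1.9]
quintiles = quantile_ranks(scores, 5)
deciles = quantile_ranks(scores, 10)
```

Applying the function with 5 and 10 groups to each of the four scores gives the eight rank variables per subject.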
Identification of Outcomes
For each study subject, the first hospital discharge from October 1995 through September 1996 was the outcome of interest.
We used logistic regression models to estimate the odds ratios associated with increasing rank decile or rank quintile (as indicator variables) for each CDS as predictors of hospitalization; the lowest quantile was the reference group for each model. Women who left the HMO before 30 September 1996 were considered lost to follow-up. Indicator variables for each HMO were included in all models. In addition to the Mantel-Haenszel test for heterogeneity, 22 we estimated stratified regression models for individual HMOs to assess between-HMO differences. Comparisons within the decile and the quintile models were made using several criteria: the magnitude of the beta estimates (indicator variables and categorical trend), the Hosmer-Lemeshow (H-L) goodness-of-fit statistic, 23 Akaike’s Information Criterion (AIC), 24,25 the area under the receiver operating characteristic (ROC) curve (C statistic), 26,27 and the model R² for explained variation. 28 In addition, we used proportional-hazards models to analyze observed person-time of all study subjects. The proportional-hazards analyses incorporated the same variables for CDS and HMOs that were used in the logistic regression models.
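As an illustration of one of these criteria, the C statistic can be computed directly from predicted probabilities and observed outcomes. It is the proportion of (hospitalized, not hospitalized) pairs in which the hospitalized subject has the higher predicted probability, with ties counted as one-half. The sketch below is a plain pairwise implementation written for clarity, not the procedure used in the analysis; the function name and the example data are hypothetical.

```python
from typing import Sequence


def c_statistic(predicted: Sequence[float], outcomes: Sequence[int]) -> float:
    """Return the area under the ROC curve by pairwise comparison.

    outcomes must be coded 1 (event) or 0 (no event). Runtime grows with
    the number of event/non-event pairs, so a rank-based formula would be
    preferable for a large cohort.
    """
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5  # a tied pair counts as half concordant
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted probabilities and outcomes for four subjects.
auc = c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 0.75
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.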
As of 1 October 1994, the mean age of the study population was 67.2 years (standard deviation = 10.6). During the follow-up year, approximately 6% of the cohort (fewer than 150 women per month) was lost to follow-up as a result of disenrollment or death. In the 6 or 12 months preceding 30 September 1995, a large percentage of the cohort (32% and 27%, respectively) received no medications relevant to their CDS; the scores of these patients represent the contribution of age and gender only. The distributions of all four CDS versions are skewed to the right (Table 2).
Of the 29,247 study subjects, 3,371 (12%) had at least one hospitalization from October 1995 through September 1996. Women with higher CDS quintile ranks had a higher risk of hospitalization (Figure 1), as did those with higher decile ranks (Figure 2). For both quintiles and deciles, the risk of hospitalization by quantile was very similar for the four scores. For the lowest quintile, 4% (all scores) were hospitalized, whereas for the highest quintile the risk ranged from 22% (Clark-OV) to 24% (Clark-TC). For the lowest decile, 4% (all scores) were hospitalized compared with 27% (Clark-OV, Fishman) to 29% (Clark-TC) in the highest decile.
Table 3 presents examples of the categorizations for each CDS version. The subject with the most comorbidity in this table (example 2) has the most consistent ranking by all categorizations. In contrast, examples 1 and 3 have fewer comorbid conditions and have somewhat less consistent rankings by either quintile or decile categorization. There was very high agreement between the categorizations derived from scores using the Fishman and Clark versions. Weighted kappa statistics ranged from 0.84 (Clark-OV and Fishman) to 0.91 (Clark-TC and Fishman) for both quintile and decile categorizations.
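A weighted kappa of this kind can be computed from two sets of quantile ranks, as in the Python sketch below. The weighting scheme used in the analysis is not stated in the text, so linear disagreement weights are an assumption here; the function name and the example ranks are hypothetical.

```python
from typing import Sequence


def weighted_kappa(ranks_a: Sequence[int], ranks_b: Sequence[int],
                   n_categories: int) -> float:
    """Weighted kappa for two rankings coded 1..n_categories.

    Uses linear disagreement weights |i - j| / (n_categories - 1),
    which is an assumption of this sketch.
    """
    n = len(ranks_a)
    k = n_categories
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ranks_a, ranks_b):
        counts[a - 1][b - 1] += 1
    row_totals = [sum(counts[i]) for i in range(k)]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * counts[i][j]
            expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - observed / expected


# Hypothetical quintile ranks from two score versions for ten subjects.
ranks_a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
ranks_b = [1, 2, 3, 4, 5, 2, 3, 4, 5, 5]
kappa = weighted_kappa(ranks_a, ranks_b, 5)
```

Identical rankings give a kappa of 1, and rankings that agree no better than chance give a value near 0.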
Model parameters of interest are shown in Table 4. The H-L goodness-of-fit tests show that no model is considered a poor fit. According to the C statistic, all models discriminated similarly. Graphically, the ROC curves for each model (not shown) were very similar across the range of sensitivity and (1 minus) specificity. Whether comparing models using quintiles or deciles, the Clark-TC scores have the lowest AIC, followed by the Fishman scores. The model R² value ranged between 0.21 and 0.24 for quintiles and between 0.23 and 0.25 for deciles. Both categorizations of the Clark-TC score have the largest model R² values, although there is little difference between the models themselves. Also shown in Table 4 are the odds ratios for the strength of linear association for a one-unit increase in quintile or decile from models in which the quantiles were modeled as ordinal variables. The Clark-TC scores had the highest value for this ordinal predictor within each comparison group. The range of predicted probabilities is greater for the decile models than for the quintile models, and among the decile models this range is greatest for the Clark-TC score.
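For readers less familiar with AIC, its most common form is given below. The exact penalty term used in the analysis is not stated in the text, so the multiplier of 2 is an assumption.

$$\mathrm{AIC} = -2 \ln \hat{L} + 2k$$

Here $\hat{L}$ is the maximized likelihood of the fitted model and $k$ is the number of estimated parameters. Lower values indicate a better trade-off between fit and complexity. A decile model estimates more indicator parameters than a quintile model and so carries a larger penalty, which is one reason to compare AIC values within each categorization.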
All CDS versions were highly associated with hospitalization. The odds ratios comparing the highest with the lowest decile ranged from 8.9 for the Clark-OV model to 10.2 for the Clark-TC model. Comparing the highest with the lowest quintiles, odds ratios ranged from 7.4 for the Clark-OV model to 8.6 for the Clark-TC model. Similar results were observed when proportional-hazards regression was used. Estimated hazard ratios were 6.6 for Clark-OV (quintiles), 7.6 for Clark-TC (quintiles), 7.6 for Clark-OV (deciles), and 8.9 for Clark-TC (deciles).
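The scale of these estimates can be checked against the crude risks reported above. The Python sketch below computes an unadjusted odds ratio with a Woolf (log-based) 95% confidence interval from a 2×2 table. Unlike the models in this analysis, it does not adjust for HMO. The group size of 1,000 per quintile is a hypothetical round number, chosen to reproduce the reported risks of roughly 4% in the lowest quintile and 24% in the highest (Clark-TC).

```python
import math
from typing import Tuple


def odds_ratio_with_ci(events_high: int, n_high: int,
                       events_low: int, n_low: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio (high vs low group) with a Woolf confidence interval."""
    a = events_high            # hospitalized, highest quantile
    b = n_high - events_high   # not hospitalized, highest quantile
    c = events_low             # hospitalized, lowest quantile
    d = n_low - events_low     # not hospitalized, lowest quantile
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts: 240 of 1,000 (24%) versus 40 of 1,000 (4%).
crude_or, ci_lower, ci_upper = odds_ratio_with_ci(240, 1000, 40, 1000)
# crude_or is about 7.6
```

The crude value of about 7.6 is close to the lower end of the adjusted quintile odds ratios, which suggests that adjustment for HMO changes the estimates only modestly.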
The risk of hospitalization by CDS varied among the HMOs. Mantel-Haenszel tests showed heterogeneity for the strength of association between CDS and subsequent-year hospitalization among HMOs in the Fishman quintile model (P = 0.03) but not for the other CDS models. Stratified regression for individual HMOs showed a similar trend of increased risk of hospitalization for increasing scores among all HMOs, suggesting that although quantitative heterogeneity between the HMOs may exist, the results were qualitatively similar.
The Chronic Disease Score (based on HMO records of dispensed medications, age, and gender) is a predictor of hospitalization in a heterogeneous group of delivery systems that reflects the type of healthcare delivery used by approximately one-third of the U.S. population. 29 A strong association between all levels of categorized score compared with the lowest category was observed among all four versions of the score studied. Given that the CDS is an independent predictor of subsequent-year all-cause hospitalizations, it may also be an independent predictor of the specific outcome of interest for a given study. If the CDS is also correlated with the exposure of interest, it may then be a useful tool for the adjustment of confounding.
Records of drugs dispensed are suitable markers for chronic diseases because they lack the variability in coding that accompanies diagnosis-based comorbidity metrics. 9 They are not subject to limitations on number of diagnoses recorded in a single claim or to false positives caused by rule-out diagnoses. The CDS is relatively easy to compute once the drugs of interest are classified by disease category. As a stratification variable, the CDS can be modeled as a binary or other quantile variable or matched to a range of scores (as a continuous variable) depending on the needs of the study.
Several aspects of prescription benefit management that may appear to cause bias in a dispensed-drug-based comorbidity metric affect the CDS minimally, if at all. Tiered payment systems will have no effect because drugs in different tiers (eg, different calcium-channel blockers) will contribute to the same disease category (heart disease/hypertension). Annual pharmacy coverage caps would affect only newly diagnosed comorbidities, and only if the cap is reached before a new diagnosis for which a CDS-relevant drug is prescribed (this is a theoretical advantage of the 1-year vs 6-month scoring systems).
Dispensing records have several limitations as surrogates for disease. Factors that influence prescribing, including care-seeking behavior, are not incorporated into the CDS and may cause misclassification. Undertreatment is another source of misclassification that occurs when people with disease are not prescribed medications for their condition. Observations by Redelmeier et al.30 and Glynn et al.31 suggest that selective underprescribing, based on comorbidity and age, respectively, is possible, and that the absence of prescriptions to treat a condition does not necessarily indicate a lack of disease. Redelmeier et al.30 noted that persons with one of several chronic diseases were less likely to be treated for their comorbid conditions. Glynn et al.31 reported that elderly Medicaid recipients who were prescribed certain drug classes (including antidiabetic and antihypertensive agents) experienced reduced mortality relative to nonusers; selective underprescribing in the elderly was suggested as a possible cause of this paradoxical finding. Moreover, drug-based comorbidity measures will not ascertain comorbidity for which there is no specific drug treatment, nor will they ascertain alternate disease states for drugs that treat multiple conditions (eg, angiotensin-converting enzyme inhibitors or methotrexate).
These limitations may explain the relative underperformance of the Clark-OV score in a recent study that compared several administrative claims-based comorbidity metrics among elderly drug-treated hypertensives in British Columbia, Canada. 32 The analysis suggested that the differences between metrics were important enough to affect the ability of the metrics to control confounding (using dichotomous score variables).
Misclassification of comorbidity may also be caused by the choice of categorization and by model specification. In a large study, the amount of residual confounding related to categorization could easily be decreased by making group comparisons using a closer range of scores (ie, using deciles rather than quintiles). Fishman et al.11 investigated several modeling specifications of the CDS to address their predictive capacity. Although the more sophisticated modeling specifications provided better fit, they did not predict the outcome of interest (total cost) as well as the ordinary least-squares regression did.
When calculating the CDS, new use of a medication is not differentiated from long-term use. At any time during the 6- or 12-month CDS ascertainment period, a single dispensing for a drug that represents a disease category will trigger the scoring weight for that category. Similarly, use immediately before hospitalization is considered the same as use months before hospitalization. This assumption may cause overestimation of the effect of CDS because recent medication use may be associated with increasing risk of hospitalization (sudden decline in health) and will also contribute to increased CDS.
The risk of hospitalization by CDS varied somewhat among the HMOs. This finding was not unexpected, given the geographic diversity of the HMOs and variations in both benefit plan and physician practice. Most HMO-specific confidence limits were within the range of the confidence limits of the overall ORs. Subjects from one smaller HMO had lower risk of hospitalization and had the smallest HMO-specific C-statistic values, regardless of the CDS version, indicating that the scores were less predictive of hospitalization in that HMO. The inter-HMO differences observed do not diminish the utility of the CDS as a predictor of hospitalization, however, because all scores were significant predictors (comparing highest to lowest quantiles) in all HMOs.
Prescription drug coverage was made one of the study eligibility requirements in order to ensure that all subjects who received no dispensings were at least eligible for prescription benefits if a dispensing had been medically indicated. This restriction, while reducing heterogeneity caused by disparate insurance coverage, may preclude generalizability of the CDS as a predictor of hospitalization to the 5–15% of HMO enrollees with no prescription drug coverage 33 and to delivery systems where prescription drug coverage is either unavailable or only sporadically available. The older women in this study would be expected to have higher scores than the general HMO enrollment. However, within the cohort, the scores included a range wide enough to predict hospitalization to a similar, and possibly greater, extent than in the original Von Korff report, 7 which used the general adult population.
The model comparison statistics used for this analysis represent a variety of methods for comparing nonnested models. 23 For the main analyses, we used logistic regression models rather than proportional-hazards models in order to explore a wider range of comparisons among the scores, particularly the C-statistic and ROC curves. The various proportional-hazards models showed qualitatively similar results. Model R² is not often discussed in the context of logistic regression. However, Mittlebock and Schemper 28 recommend a model R² equation for logistic regression that best agreed with a standard model R² (linear regression) across a broad range of predicted probabilities. 25 This model R² value facilitates comparisons between comorbidity metrics that use linear regression, including both the Clark and Fishman scores (predicting costs) and the Charlson Index. The values observed in this study compare favorably with the model R² values in the original validation samples, which were between 0.09 (Fishman) 11 and 0.23 (Clark-OC). 10 For the Charlson Index, which uses medical record abstraction to identify comorbidity, the model R² for the original validation sample was 0.41. 1
All four versions of the CDS predicted subsequent-year hospitalization approximately equally. Based on the statistics used to evaluate the scores, the Clark-TC weights performed slightly better than the others. Another benefit to the Clark scores is that the 6-month data requirement allows more subjects to meet eligibility requirements. Although no comorbidity or risk-adjustment metric is an ideal measure of baseline health status, 34 the findings of this study suggest that the CDS comorbidity metrics, on the basis of their strong association with subsequent year hospitalization, can be applied to multi-HMO studies to control for potential confounding.
We thank James Donahue and Keith Kaye for classification of drugs. We also thank Emily Cain, Rachel Dokholyan, Ryan Lee, and Parker Pettus for technical assistance.
1. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chron Dis 1987; 40: 373–383.
2. Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol 1992; 45: 613–619.
3. Roos LL, Sharp SM, Cohen MM, Wajda A. Risk adjustment in claims-based research: the search for efficient approaches. J Clin Epidemiol 1989; 42: 1193–1206.
4. Roos LL, Sharp SM, Cohen MM. Comparing clinical information with claims data: some similarities and differences. J Clin Epidemiol 1991; 44: 881–888.
5. D’Hoore W, Sicotte C, Tilquin C. Risk adjustment in outcome assessment: the Charlson comorbidity index. Methods Inf Med 1993; 32: 382–387.
6. Weiner JP, Starfield BH, Steinwachs DM, Mumford LM. Development and application of a population-oriented measure of ambulatory care case-mix. Med Care 1991; 29: 452–472.
7. Von Korff M, Wagner EH, Saunders K. A chronic disease score from automated pharmacy data. J Clin Epidemiol 1992; 45: 197–203.
8. Johnson RE, Hornbrook MC, Nichols GA. Replicating the chronic disease score (CDS) from automated pharmacy data. J Clin Epidemiol 1994; 47: 1191–1199.
9. Schneeweiss S, Maclure M. Use of comorbidity scores for control of confounding in studies using administrative databases. Int J Epidemiol 2000; 29: 891–898.
10. Clark DO, Von Korff M, Saunders K, Baluch WM, Simon GE. A chronic disease score with empirically derived weights. Med Care 1995; 33: 783–795.
11. Fishman PA, Goodman MJ, Hornbrook MC, Meenan R, Bachman D, O’Keefe-Rosetti M. Risk adjustment using automated pharmacy data: a global chronic disease score. International Health Economics Association, Second International Health Economic Conference, June 8, 1999, Rotterdam, the Netherlands.
12. van Hulten R, Teeuw KB, Bakker AB, Bakker A, Leufkens HG. Characteristics of current benzodiazepine users as indicators of differences in physical and mental health. Pharm World Sci 2000; 22: 96–101.
13. Thurston-Hicks A, Paine S, Hollifield M. Functional impairment associated with psychological distress and medical severity in rural primary care patients. Psychiatr Serv 1998; 49: 951–955.
14. Malone DC, Carter BL, Billups SJ, et al. Can clinical pharmacists affect SF-36 scores in veterans at high risk for medication-related problems? Med Care 2001; 39: 113–122.
15. Helms LJ, Melnikow J. Determining costs of health care services for cost-effectiveness analyses: the case of cervical cancer prevention and treatment. Med Care 1999; 37: 652–661.
16. Grad R, Tamblyn R, Holbrook AM, Hurley J, Feightner J, Gayton D. Risk of a new benzodiazepine prescription in relation to recent hospitalization. J Am Geriatr Soc 1999; 47: 184–188.
17. Chan KA, Andrade SE, Boles M, et al. Inhibitors of hydroxymethylglutaryl-coenzyme A reductase and risk of fracture among older women. Lancet 2000; 355: 2185–2188.
18. Buist DS, LaCroix AZ, Newton KM, Keenan NL. Are long-term hormone replacement therapy users different from short-term and never users? Am J Epidemiol 1999; 149: 275–281.
19. Donahue JG, Chan KA, Andrade SE, et al. Alendronate, osteoporotic fracture, and gastroduodenal perforation, ulcer, or bleeding. Arch Int Med
20. Platt R, Davis R, Soumerai S, et al. Multicenter epidemiologic and health services research on therapeutics in the HMO Research Network Center for Education and Research on Therapeutics. Pharmacoepidemiol Drug Saf 2001; 10: 373–377.
21. Selby JV. Linking automated databases for research in managed care settings. Ann Int Med 1997; 127: 719–724.
22. Breslow NE, Day NE. Statistical Methods in Cancer Research. vol. 1. The Analysis of Case-Control Studies. IARC Scientific Pub. No. 32. Lyon: International Agency for Research on Cancer, 1980.
23. Hosmer DW, Lemeshow S. Applied Logistic Regression. 2nd ed. New York: John Wiley and Sons, Inc., 1998.
24. Collett D. Modeling Survival Data in Medical Research. London: Chapman and Hall, 1994.
25. Hoel P. Introduction to Mathematical Statistics. New York: John Wiley and Sons, 1984; 435.
26. Miller ME, Hui SL, Tierney WM. Validation techniques for logistic regression models. Stat Med 1991; 10: 1213–1226.
27. Walsh SJ. Limitations to the robustness of binormal ROC curves: effects of model misspecification and location of decision thresholds on bias, precision, size and power. Stat Med 1997; 16: 669–679.
28. Mittlebock M, Schemper M. Explained variation for logistic regression. Stat Med 1996; 15: 1987–1997.
29. Anonymous. 1997 saw HMO membership up, hospital days down. Am J Health Syst Pharm
30. Redelmeier DA, Tan SH, Booth GL. The treatment of unrelated disorders in patients with chronic medical diseases. N Engl J Med 1998; 338: 1516–1520.
31. Glynn RJ, Monane M, Gurwitz JH, Choodnovskiy I, Avorn J. Aging, comorbidity, and reduced rates of drug treatment for diabetes mellitus. J Clin Epidemiol 1999; 52: 781–790.
32. Schneeweiss S, Seeger JD, Maclure M, Wang PS, Avorn J, Glynn RJ. Performance of comorbidity scores to control for confounding in epidemiologic studies using claims data. Am J Epidemiol 2001; 154: 854–864.
33. U.S. Department of Health and Human Services. Report to the President: Prescription Drug Coverage, Spending, Utilization, and Prices. April. Available at: http://aspe.hhs.gov/health/reports/drugstudy./ Accessed November 13, 2001.
34. Iezzoni LI. Risk adjustment for medical effectiveness research: an overview of conceptual and methodologic considerations. J Invest Med 1995; 43: 136–150.