#### What We Already Know about This Topic

#### What This Article Tells Us That Is New

Fair comparisons of health outcomes across hospitals require proper adjustment for heterogeneity among providers in terms of patient and procedural risk.^{1,2} Consequently, risk-adjustment models abound,^{3–8} although there is evidence that many of these models are inconsistent in their characterizations of risk,^{9} leading to the potential for misclassification of high- and low-quality hospitals.^{10,11}

However, the Medicare Provider Analysis and Review database on which the RSI^{3} was developed does not distinguish preexisting conditions that were present-on-admission (POA) from complications that occurred during hospitalization. This is a potentially serious limitation because risk for certain patients might be inflated by codes that result from hospital-acquired complications. Hospitals might thus appear to have a higher-risk population when in fact at least some fraction of the apparent risk resulted from hospital-acquired complications rather than baseline risk *per se*.^{12–14} With the increasing ubiquity of administrative datasets that incorporate POA information,^{15} establishing such models is timely.

#### Materials and Methods

##### Model Development

In the ICD-9-CM coding system, the first three digits of a diagnosis code specify the disease category, the fourth digit provides additional detail such as anatomical site (*e.g.*, 410.2X refers to the inferolateral wall), and the fifth digit specifies the episode of care. As such, many of the five-digit codes lacked sufficient representation for inclusion in our logistic model, so an aggregation routine was used to ensure that predictors had adequate cell sizes. As in the development of the original RSI,^{3} we aggregated sparsely represented diagnoses by truncating the fifth digit of the corresponding ICD-9-CM diagnosis code: codes with fewer than 1,000 discharges per year on average in the 80% model development cohort were truncated to four digits (for this average, we excluded the year 2004 because a number of new codes were introduced the following year). The process was then repeated, truncating sparsely represented four-digit codes to three digits; three-digit codes represented by fewer than 1,000 discharges per year were excluded from model development. A comparable aggregation algorithm was applied to the procedure codes, although procedure codes have at most four digits and base codes of only two digits; we therefore aggregated procedure codes from four to three to two digits using the same 1,000-discharges-per-year criterion.
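The truncation cascade described above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual routine: the function and parameter names are ours, codes are shown without decimal points, and the number of cohort years is an assumed parameter.

```python
from collections import Counter

def aggregate_sparse_codes(code_counts, min_per_year=1000, n_years=4, base_len=3):
    """Collapse sparsely represented codes onto shorter prefixes.

    code_counts: Counter mapping code string -> total discharges over n_years.
    Any code averaging fewer than min_per_year discharges per year is
    truncated by one trailing digit and merged into its parent code; the
    pass repeats until codes reach base_len digits.  Base-length codes
    that remain sparse are dropped (i.e., excluded from the model).
    """
    threshold = min_per_year * n_years
    counts = Counter(code_counts)
    max_len = max((len(c) for c in counts), default=base_len)
    for length in range(max_len, base_len, -1):  # e.g., 5 -> 4 -> 3 digits
        merged = Counter()
        for code, n in counts.items():
            if len(code) == length and n < threshold:
                merged[code[:-1]] += n  # truncate the last digit
            else:
                merged[code] += n
        counts = merged
    return Counter({c: n for c, n in counts.items() if n >= threshold})
```

Diagnosis codes would use `base_len=3`, while procedure codes, which bottom out at two-digit base codes, would use `base_len=2`.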

The elastic net^{16} is a “shrinkage” methodology designed to protect against overfitting a model to the development cohort. The term “shrinkage” refers to the fact that regression coefficients are deliberately biased toward zero, which has been shown to improve prediction accuracy in external cohorts^{17,18} (specifically, the elastic net encourages highly correlated predictors to be averaged while encouraging irrelevant predictors to be removed from the model altogether). Removing variables in this manner has been shown to have favorable statistical properties compared with traditional methods such as stepwise variable selection or the use of significance criteria for model entry.^{18} To fit these models, we used the R statistical software package “glmnet” developed by Friedman *et al.*^{19} (on R version 2.13.0 for 64-bit Linux, The R Project for Statistical Computing, Vienna, Austria). The overall model shrinkage parameter (parameter λ in the glmnet software) was chosen using fivefold cross-validation^{20} (specifically, we used the largest value of λ within one cross-validated standard error of the minimum in the model development cohort), and we used an elastic net mixing parameter (parameter α in the glmnet software) of 0.15, which favored averaging of correlated predictors somewhat more than removing irrelevant ones. Sensitivity analysis (not reported) revealed little change in predictive accuracy for values of α between 0.05 and 1.00.
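As a rough analogue of the glmnet fit described above, the following scikit-learn sketch (our own hypothetical translation on synthetic data, not the authors' code) fits an elastic net logistic model with fivefold cross-validation. In scikit-learn's parameterization, `l1_ratio` plays the role of glmnet's mixing parameter α, and `C` is inversely related to the shrinkage parameter λ; the one-standard-error rule for choosing λ is not built in and is omitted here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))  # stand-ins for diagnosis/procedure indicators
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

# l1_ratio=0.15 mirrors the elastic net mixing parameter alpha = 0.15,
# favoring averaging of correlated predictors over outright removal;
# C (inverse regularization strength) is tuned by fivefold CV.
base = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.15, max_iter=5000)
search = GridSearchCV(base, {"C": np.logspace(-3, 1, 5)},
                      cv=5, scoring="neg_log_loss")
search.fit(X, y)
model = search.best_estimator_
```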

##### Calibration

In other words, there is either a perceived or real lack of agreement between the predicted probability of an outcome produced by the model in question and the actual probability of the outcome in a new set of patients, *i.e.*, a lack of model calibration.^{21} Model calibration is sometimes overlooked in risk-adjustment modeling^{22}; even when calibration is considered, it is often treated as a model diagnostic^{23} rather than as a prescription for adjusting the model estimates to remove any biases introduced by the lack of calibration. We used a recently developed recalibration technique to adjust our model estimates;^{24} methodological details are briefly reviewed in the appendix. For our particular modeling application, we initially calibrated our model using the randomly reserved 20% calibration cohort, with the intention that, as with any risk-adjustment model, calibration should be assessed and, if necessary, corrected whenever the model is applied to new data (as, for instance, when we used the 2009 data to compare this model with a second model that ignored POA status of diagnoses and timing of procedures; see following section).
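The specific recalibration technique of reference 24 is not reproduced here; as an illustration of the general idea, the sketch below fits a standard intercept-and-slope (logistic) recalibration on a held-out calibration cohort and returns a function that maps raw model probabilities to recalibrated ones. All names are ours, and this is a generic method rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def _logit(p):
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

def fit_recalibration(p_pred, y_obs):
    """Fit logit(Pr[y=1]) = a + b * logit(p_pred) on a calibration cohort.

    Generic intercept/slope recalibration (illustrative only); returns a
    function that recalibrates new predicted probabilities.
    """
    lr = LogisticRegression(C=1e6)  # essentially unpenalized
    lr.fit(_logit(p_pred).reshape(-1, 1), np.asarray(y_obs))
    return lambda p: lr.predict_proba(_logit(p).reshape(-1, 1))[:, 1]
```

Fitting on the reserved cohort corrects systematic over- or under-prediction: the recalibrated probabilities match the observed event rate in that cohort.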

##### Comparator Models

##### Model Performance and Reliability

Discrimination was quantified by the C-statistic,^{25} and models were compared on C-statistics using two-sample *z*-tests for proportions (the Bonferroni correction for three simultaneous pairwise comparisons was applied to these tests).
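Treating the C-statistics as proportions, as described, the comparison can be sketched as follows. This is a hypothetical, stdlib-only illustration (the function names and the effective sample sizes `n1`/`n2` are ours); the Bonferroni step simply multiplies the p-value by the number of comparisons.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_sample_z_test(p1, n1, p2, n2, n_comparisons=1):
    """Two-sample z-test for proportions with optional Bonferroni correction."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2.0 * (1.0 - phi(abs(z)))          # two-sided
    return z, min(1.0, n_comparisons * p_value)  # Bonferroni-adjusted
```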

Acceptable agreement was defined *a priori* as at least 95% of hospitals having an O/E ratio within ±20% of that defined by the POARisk model, or in other words a “ratio of O/E ratios” between 0.8 and 1.2.
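The “ratio of O/E ratios” criterion can be sketched as follows (illustrative code with hypothetical names: `obs` holds per-discharge outcomes, `exp_a`/`exp_b` hold predicted probabilities from the comparator and reference models, and `hospital_ids` assigns each discharge to a hospital):

```python
import numpy as np

def oe_ratio(observed, expected):
    """Observed-to-expected ratio: total events over total predicted risk."""
    return np.sum(observed) / np.sum(expected)

def oe_agreement(obs, exp_a, exp_b, hospital_ids, tol=0.2):
    """Fraction of hospitals whose O/E ratio under model A falls within
    +/- tol of the O/E ratio under reference model B, i.e., whose ratio
    of O/E ratios lies between 1 - tol and 1 + tol."""
    within = 0
    hospitals = np.unique(hospital_ids)
    for h in hospitals:
        m = hospital_ids == h
        ratio = oe_ratio(obs[m], exp_a[m]) / oe_ratio(obs[m], exp_b[m])
        within += (1 - tol) <= ratio <= (1 + tol)
    return within / len(hospitals)
```

Under the criterion above, agreement would be declared when this fraction is at least 0.95.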

#### Results

(*P* < 0.001, two-sample *z*-test for proportions) and the modified RSI model (0.0444 [0.0440–0.0448], *P* < 0.001). In addition, the POARisk model discriminated better than the modified RSI model (0.0216 [0.0212–0.0221], *P* < 0.001).

The median percent difference in the O/E ratio, using the AllCodeRisk model *versus* the POARisk model, was 0.0% (fig. 3B). The O/E ratio under the AllCodeRisk model was within ±20% of the O/E ratio under the POARisk model for 89.0% of hospitals, which fell short of our predefined criterion for agreement (*i.e.*, at least 95% of hospitals within ±20%). For 95% of hospitals, the O/E ratio under the AllCodeRisk model was between −18.1% and +51.2% of the O/E ratio under the POARisk model. Comparing the modified RSI model with the POARisk model (fig. 3, C and D), we found similar results. The median percent difference was again 0.0%. The O/E ratio under the modified RSI model was within ±20% of the O/E ratio under the POARisk model for 81.3% of hospitals. For 95% of hospitals, the O/E ratio under the modified RSI model was between −31.3% and +35.8% of the O/E ratio under the POARisk model.

#### Discussion

The POARisk model used only codes plausibly reflecting baseline risk (*i.e.*, POA diagnosis codes, principal procedure codes, and secondary procedure codes occurring on dates exclusively before the date of the principal procedure), whereas the AllCodeRisk model used all codes regardless of their timing. Both models were assessed for calibration and corrected for use in external datasets, although we recommend that an additional calibration step be performed whenever the model is applied, in order to adjust model estimates specifically to the population being analyzed (as we did with the 2009 data in our validation analysis). Instructions for downloading model coefficients and calculating POARisk predicted probabilities can be found at the Cleveland Clinic POARisk Model website.††

Finally, our analysis did not account for potential correlation among data from multiple repeated visits within a person.^{26} Thus, external validation is necessary to evaluate the suitability of our models in other states’ data.

A further consideration is that some secondary conditions may themselves be caused (*i.e.*, mediated) by the chronic exposure of interest. For example, if the goal of a study is to estimate the independent risk associated with diabetes mellitus, one would likely not wish to adjust for secondary diseases that are caused by diabetes and thus worsen outcomes.