Available metrics for comparing hospital safety have expanded in recent years. These measures have transitioned from voluntary self-report to compulsory national collection of standardized instruments, such as those on the Centers for Medicare & Medicaid Services (CMS) Hospital Compare website.1
Figure 1 illustrates the timeline of hospital and patient safety measure development. The Leapfrog Group, founded in 2000 by employers to encourage transparency of hospital performance, provided the earliest measures.2 In 2001, they launched the Leapfrog Hospital Survey, a voluntary instrument covering hospital and patient safety process and outcome measures. In 2004, Leapfrog added self-reported Safe Practices Score (SPS) measures3 built from 34 National Quality Forum-endorsed practices to reduce risk of patient harm in acute-care hospitals.4 Leapfrog SPS measures focus on implementing structures or protocols reflective of accountability, rather than objective outcomes. SPS initially included 27 measures and was trimmed to 8 in 2013 (Supplemental Table 1, Supplemental Digital Content 1, http://links.lww.com/MLR/B359). In 2012, SPS was bundled with other process and outcome measures to inform a more consumer-friendly composite Hospital Safety Score (HSS), which rates hospitals on a scale ranging from 0 to 4 and provides a single corresponding letter grade of “A” (best), “B,” “C,” “D,” or “F” (worst) (Supplemental Table 2, Supplemental Digital Content 2, http://links.lww.com/MLR/B360).5 Hospital self-reports on the 8 SPS measures are available for consumers to compare across hospitals on the HSS website6; they also account for a substantial portion of the HSS (22.6% of total score; 45% of “Process and Structural Measures” domain).
Over time, compulsory measures of hospital quality and patient safety were developed. In 2002, the Hospital Quality Alliance, a public-private partnership, formed to support hospital quality improvement and improve consumer health care decision-making.7 Their efforts created the Hospital Compare website,1 a consumer-facing website focused on improving consumer decision-making by providing hospital performance and safety metrics. Hospital Compare first mandated reporting in 2008, requiring hospitals to report patient satisfaction and mortality measures or face a 2% reduction in CMS’ annual payment update.8 Hospital Compare measures now include hospital-associated infections and complications, including central line–associated bloodstream infections (CLABSI), catheter-associated urinary tract infections (CAUTI), and the Agency for Healthcare Research and Quality (AHRQ) Patient Safety Indicators (PSI).
In 2012, Hospital Compare began reporting data from 2 new CMS value-based purchasing programs. The Hospital Readmission Reduction Program (HRRP) aims to decrease unplanned 30-day readmissions following select procedures for certain conditions.9 The Hospital-Acquired Conditions Reduction Program (HACRP) targets reduction in incidence of hospital-acquired conditions (HACs) including CLABSI, CAUTI, and serious complications of treatment.10 In 2015, hospitals whose HACs or readmissions during the evaluation period exceeded expected values could be penalized up to 1% (under HACRP) or 3% (under HRRP) of total hospital Medicare reimbursement.
It is unclear how well Leapfrog’s voluntary SPS correlates with more recent compulsory Medicare metrics displayed by Hospital Compare. Prior work demonstrated Leapfrog’s voluntary nature overrepresents “high-quality” hospitals,11 and tied Leapfrog-led implementation efforts with improved process quality and decreased mortality rates12 and surgical death13; however, SPS measures have shown no relationship with all-cause or surgical mortality14,15 or trauma outcomes, including hospital-associated infections.16 Given these mixed findings, this paper addresses 2 objectives: first, among hospitals reporting SPS, evaluate how well Leapfrog’s SPS correlates with compulsory outcomes and penalties for readmission and complications publicly reported on Medicare’s Hospital Compare; and second, among hospitals not reporting SPS, evaluate the potential impact of SPS on Leapfrog’s HSS grades using imputed SPS measures to simulate new HSSs.
For all analyses, we combined data from 4 sources: (1) the Spring 2014 Leapfrog HSS dataset, which includes hospital grades, SPS measures as reported in the 2013 Leapfrog Hospital Survey, and all other HSS components listed in Supplemental Table 2; Supplemental Digital Content 2 (http://links.lww.com/MLR/B360); (2) Hospital Compare data on CLABSI and CAUTI in 201317; (3) Hospital Compare data on penalties assessed under the HRRP and HACRP in 201517; and (4) hospital characteristics from the 2013 American Hospital Association (AHA) Survey Database.
Objective 1: Do Leapfrog SPS Measures Predict Publicly Reported Outcomes and Penalties?
Our predictor variables were Leapfrog SPS measures (Supplemental Table 1, Supplemental Digital Content 1, http://links.lww.com/MLR/B359) for hospitals that reported SPS measures. We selected 5 individual SPS measures (indicated in bold in Table 1) as representative of direct pathways from standards of care to study outcomes, as well as total SPS. AHA data were used to control for hospital characteristics: bed size (<50, 50–200, and >200 beds); ownership (public, private nonprofit, private for-profit); Council of Teaching Hospitals membership; and safety-net status, defined as ≥1 standard deviation more Medicaid patients than state average.
We examined 4 publicly reported outcome variables: CLABSI and CAUTI standardized infection ratios (SIRs), and penalization under HRRP or HACRP.
CLABSI and CAUTI SIRs
Hospital Compare CLABSI and CAUTI SIRs were reported to the National Healthcare Safety Network (NHSN) from April 1, 2012 to March 31, 2013. SIRs are risk-adjusted measures that divide the number of observed infections by the number of predicted infections, where the predicted number is calculated from CLABSI or CAUTI rates in a standard population over a baseline time period.18–20 SIRs >1.0 indicate more infections observed than predicted, whereas SIRs <1.0 indicate fewer observed than predicted.21
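As a minimal illustration of the ratio described above, the following Python sketch computes an SIR from an observed and a predicted infection count. The function name and the example counts are hypothetical, and the sketch takes the predicted count as given rather than reproducing the NHSN risk adjustment that generates it.

```python
def standardized_infection_ratio(observed: int, predicted: float) -> float:
    """Return observed infections divided by predicted infections."""
    if predicted <= 0:
        raise ValueError("predicted infections must be positive")
    return observed / predicted


# Hypothetical counts, for illustration only (not study data).
sir = standardized_infection_ratio(observed=6, predicted=8.0)
print(f"SIR = {sir:.2f}")  # 0.75, i.e. fewer infections observed than predicted
```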
The 2015 HRRP penalties covered readmissions from July 1, 2010 to June 30, 2013. Readmissions penalties are calculated via the readmissions adjustment factor (RAF), which incorporates a risk-adjusted excess readmission ratio and diagnosis-related group payments for all included conditions.22 The 2015 HAC penalties used CLABSI and CAUTI rates from January 1, 2012 to December 31, 2013, and PSI-90 from July 1, 2011 to June 30, 2013. HAC penalties were computed from the average decile of performance for the NHSN CAUTI and CLABSI rates, weighted at 65%, plus the decile of performance for the PSI-90, weighted at 35%.10 For both programs we examined a binary measure of penalization.
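The HAC weighting described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the CMS implementation: the function name and example deciles are hypothetical, deciles are assumed to run from 1 to 10, and the sketch does not handle hospitals with a missing measure.

```python
def total_hac_score(clabsi_decile: int, cauti_decile: int, psi90_decile: int) -> float:
    """Combine performance deciles into a total HAC score.

    The CLABSI and CAUTI deciles are averaged and weighted at 65%;
    the PSI-90 decile is weighted at 35%, as described in the text.
    """
    for decile in (clabsi_decile, cauti_decile, psi90_decile):
        if not 1 <= decile <= 10:
            raise ValueError("deciles are assumed to range from 1 to 10")
    infection_domain = (clabsi_decile + cauti_decile) / 2
    return 0.65 * infection_domain + 0.35 * psi90_decile


# Hypothetical deciles, for illustration only.
score = total_hac_score(clabsi_decile=4, cauti_decile=6, psi90_decile=8)
print(f"Total HAC score = {score:.2f}")  # 0.65 * 5 + 0.35 * 8 = 6.05
```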
To examine the relationship between Leapfrog SPS measures (individual and total) and CLABSI and CAUTI SIRs, we looked at bivariate correlations and used linear regression to evaluate the effect of SPS on outcomes, controlling for hospital characteristics. For penalties, we computed point-biserial correlations between SPS measures and penalty indicators, and then used binary logistic regression to evaluate effect of SPS on odds of penalization, controlling for hospital characteristics. All analyses were performed using Stata MP Version 14.123 and a 0.05 two-sided significance level.
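For readers unfamiliar with the point-biserial correlation, it is the Pearson correlation computed between a binary indicator and a continuous score. The Python sketch below shows the calculation on hypothetical values; the function name and data are illustrative assumptions, and the study's own analyses were run in Stata.

```python
from math import sqrt
from statistics import mean


def point_biserial(indicator, scores):
    """Pearson correlation between a 0/1 indicator and a continuous score."""
    if len(indicator) != len(scores):
        raise ValueError("inputs must have the same length")
    mean_i, mean_s = mean(indicator), mean(scores)
    cov = sum((i - mean_i) * (s - mean_s) for i, s in zip(indicator, scores))
    var_i = sum((i - mean_i) ** 2 for i in indicator)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / sqrt(var_i * var_s)


# Hypothetical data: penalty indicator (1 = penalized) and total SPS.
penalized = [1, 0, 0, 1, 0, 1, 0, 0]
total_sps = [430, 485, 470, 455, 485, 440, 485, 460]
print(f"r = {point_biserial(penalized, total_sps):.2f}")
```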
Objective 2: How Much Can Voluntary SPS Measures Impact HSS Grades?
Imputed SPS measures were our main predictors of interest. Because we were interested in their impact on HSS grades for hospitals that did not report them, 4 sets of SPS measures were imputed for these hospitals, based on the distribution of SPS measures among hospitals that did report: lowest SPS measures (first percentile); low (10th percentile); median (50th percentile); and highest (100th percentile). As control inputs, we also included hospital data as observed for all other HSS components listed in Supplemental Table 2, Supplemental Digital Content 2 (http://links.lww.com/MLR/B360) as provided in the HSS database.
Our dependent variable was overall HSS, which ranges from 0 to 4; and corresponding HSS grades, which range from “A” to “F.”
We simulated change in the HSS and corresponding grades after imputing SPS measures using the methodology reported by Leapfrog for their Spring 2014 HSS.24 HSS comprise weighted z-scores (trimmed at 99th percentile, or z=±5) across 2 domains: Process and Structural Measures (50% of total HSS); and Outcomes (remaining 50%). SPS measures account for 8 of 15 Process measures, or 22.6% of the total score. Hospitals that do not report SPS have other Process measures upweighted proportionally by Leapfrog. To simulate new scores imputing missing SPS measures at lowest, low, median, and highest levels, we converted the 8 SPS measures into z-scores, trimmed as appropriate, and recalculated weights for Process measure scores including SPS measures, before recalculating the Process domain score and subsequent total HSS. No changes were made to Outcome domain scores. Simulated scores for different values of SPS were then compared with original scores to evaluate change in score and letter grade.
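The recalculation steps above can be sketched as follows. This is a simplified illustration, not Leapfrog's published algorithm: the function names are hypothetical, the 8 SPS measures are collapsed into a single entry carrying 45% of the Process domain weight, the population standard deviation is an assumption, and the conversion from domain scores to the 0 to 4 HSS scale is omitted.

```python
from statistics import mean, pstdev

Z_TRIM = 5.0  # trim point for z-scores, as described in the text


def trimmed_z(value, reference):
    """z-score of `value` against `reference` values, trimmed to +/- Z_TRIM."""
    sd = pstdev(reference)  # population SD is an assumption of this sketch
    if sd == 0:
        return 0.0
    z = (value - mean(reference)) / sd
    return max(-Z_TRIM, min(Z_TRIM, z))


def weighted_domain_score(z_scores, weights):
    """Weighted mean of the available z-scores in one domain.

    Measures whose z-score is None are dropped, and the remaining
    weights are rescaled proportionally so that they sum to 1.
    """
    available = {name: z for name, z in z_scores.items() if z is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        raise ValueError("no measures available")
    return sum(weights[name] / total_weight * z for name, z in available.items())


# Hypothetical hospital that did not report SPS.
weights = {"other_process": 0.55, "sps": 0.45}
observed = {"other_process": 1.0, "sps": None}
print(weighted_domain_score(observed, weights))  # other measures upweighted

# Same hospital with a hypothetical imputed SPS z-score of -2.0.
imputed = {"other_process": 1.0, "sps": -2.0}
print(weighted_domain_score(imputed, weights))
```

In this toy case, the Process domain score falls from 1.0 to about -0.35 once the low imputed SPS is included, which mirrors the direction of change examined in the simulations.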
Supplemental Figure 1, Supplemental Digital Content 3 (http://links.lww.com/MLR/B361) illustrates the study flow diagram. In total, 2530 hospitals were included in the Spring 2014 HSS database. Of these, 2178 had AHA data; either CLABSI or CAUTI SIR; and either HRRP or HACRP penalty data. Of the 2178, 1098 hospitals (50.4%) provided SPS and were included in our objective 1 analyses; 1080 (49.6%) declined to report SPS and were used for objective 2 analyses.
The University of Michigan Institutional Review Board deemed this study exempt from oversight.
Distributional statistics for SPS measures (Table 1) show highly skewed distributions for all individual measures. For all but 1 measure (SPS #1), the median score is also the highest score, indicating that at least 50% of hospitals self-report a perfect score. First percentile values generally correspond to receipt of 1/3 of possible points for an individual measure, and 10th percentile values to 3/4 of possible points. Mean total SPS was 444.40; 213 hospitals (19.4%) reported a perfect 485.
With respect to hospital characteristics, outcomes and grades (Table 2), the 2178 hospitals included 279 (12.8%) teaching hospitals and 305 (14.0%) safety-net hospitals. Ownership was predominantly private, not-for-profit (70.3%); the majority had >200 beds (60.7%). Average CLABSI SIR across all hospitals was 0.55, similar to the national baseline of 0.54, and average CAUTI SIR was 1.03 compared with the national baseline of 1.07.25 Of note, NHSN SIRs analyzed here had baselines from 2008, with declines reflecting both improvements in care and NHSN definition changes. NHSN used 2015 data to rebaseline SIRs in January 2017.26 In total, 1875 hospitals (86.1%) received a penalty under HRRP in 2015, and 582 (26.7%) were penalized for HAC. Compared with hospitals declining SPS, those providing SPS were larger (P<0.001) and more for-profit (P=0.001). CAUTI and CLABSI SIRs and penalization rates did not vary significantly by SPS provision. However, hospitals that provided SPS were graded significantly higher than hospitals that declined; 510 (46.5%) hospitals providing SPS received an “A” grade, compared with 193 (17.9%) hospitals declining SPS (P<0.001).
Objective 1: Do Leapfrog SPS Measures Predict Publicly Reported Outcomes and Penalties?
Bivariate correlations between SPS measures and outcomes were consistently weak (range, −0.05 to 0.05, Supplemental Table 3, Supplemental Digital Content 4, http://links.lww.com/MLR/B362).
CLABSI and CAUTI SIRs
Figure 2A presents standardized regression coefficients and 95% confidence intervals from linear regression models predicting CAUTI and CLABSI SIRs, controlling for hospital characteristics (full model results in Supplemental Table 4, Supplemental Digital Content 5, http://links.lww.com/MLR/B363). Neither individual nor total SPS were significant predictors of CLABSI or CAUTI SIRs.
As sensitivity analyses, negative binomial models of observed infections were also estimated with an exposure for number of catheter days. These models also revealed no associations. We also compared the CAUTI/CLABSI SIRs self-reported in the Leapfrog Hospital Survey with these same hospitals’ CAUTI/CLABSI SIRs reported on Medicare’s Hospital Compare. Note that Leapfrog uses CLABSI and CAUTI SIRs reported in the Leapfrog Hospital Survey as the primary data source for the HSS, and Hospital Compare SIRs as a secondary data source. This analysis revealed similar CLABSI SIRs, but significantly lower CAUTI SIRs, even after accounting for Leapfrog’s trimming of extreme values, with a mean CAUTI SIR of 0.47 reported in the Leapfrog Hospital Survey, compared with 1.05 in Hospital Compare (Supplemental Figure 2, Supplemental Digital Content 6, http://links.lww.com/MLR/B364).
Figure 2B presents standardized odds ratios and 95% confidence intervals from binary logit models predicting penalization under HRRP or HACRP, controlling for hospital characteristics (full model results in Supplemental Table 5, Supplemental Digital Content 7, http://links.lww.com/MLR/B365). No SPS measures were significantly associated with penalization under HRRP, net hospital characteristics. One SPS (culture of measurement, feedback, and intervention) was significantly associated with penalization under HACRP, with a standard deviation increase in measure score decreasing odds of penalization by a factor of 0.87 (95% CI, 0.76–0.98). On average, this equates to a 2.7 percentage point decrease in probability of penalization. Sensitivity analyses used censored linear regression models to examine associations between SPS and HRRP RAF (range, 0.97–1.00) and HACRP total HAC score (range, 1–10). Correlations remained very small (range, −0.01 to 0.07) and only 1 SPS measure showed a significant association in either model (Supplemental Table 6, Supplemental Digital Content 8, http://links.lww.com/MLR/B366).
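To make the link between an odds ratio and a change in probability concrete, the Python sketch below applies the reported odds ratio of 0.87 to a single baseline probability. The function name is hypothetical, and the baseline of 26.7% (the share of all study hospitals penalized for HAC) is an assumption chosen for illustration; the estimate reported above is an average across hospitals with differing characteristics, so a single-baseline calculation will not match it exactly.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Return the probability implied by scaling the baseline odds."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline probability must be between 0 and 1")
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


baseline = 0.267  # assumed baseline: overall HACRP penalization rate
shifted = apply_odds_ratio(baseline, odds_ratio=0.87)
print(f"Decrease: {100 * (baseline - shifted):.1f} percentage points")
```

With this baseline, the calculation yields a decrease of roughly 2.6 percentage points, close to the average reported above.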
Objective 2: How Much Can Voluntary SPS Measures Impact Leapfrog’s HSS Grades?
With lowest SPS (first percentile; Fig. 3, Panel 1) imputed, hospitals saw scores decline by 0.8 points (of 4), on average. Grades declined by ≥1 letter grade for 1062 hospitals (98%), and very few hospitals (N=16; 1.5%) received a grade higher than D. Imputing 10th percentile SPS (Fig. 3, Panel 2) resulted in a 0.24-point average decline in score, with grades declining by ≥1 letter grade for 588 hospitals (54%). In contrast, grades improved by 1 letter grade for 9 hospitals (0.8%).
Imputing median SPS (Fig. 3, Panel 3) resulted in a small improvement of 0.16 points, on average, in HSS, which improved grades for 528 hospitals (49%) by ≥1 letter grade. Imputing highest SPS (Fig. 3, Panel 4) resulted in only marginally more improvement, raising scores by 0.18 points, on average, and improving grades by ≥1 letter grade for 586 hospitals (54%).
The Leapfrog Group has been a vanguard in developing and publicizing novel measures to inform patient choice. As the market of measures has grown more crowded, their niche is increasingly delineated by 2 proprietary measures: 8 National Quality Forum-inspired SPS measures; and the HSS and corresponding grade, with Leapfrog SPS as its sole proprietary component. This study reports 2 major findings. First, there is a lack of meaningful association between voluntary SPS measures and compulsorily reported patient outcomes and Medicare penalties for complications and readmissions. Second, the highly positively skewed voluntary SPS measures strongly impact the Leapfrog HSS beyond compulsory scores, so that imperfect SPS scores often result in lower grades.
Several mechanisms could underlie the lack of association between SPS and outcomes and penalties, yet lack of variation within SPS measures (Table 1) is responsible for much of the limited predictive ability. The observed lack of variation, meanwhile, could be due to selection effects; hospitals able to reliably report high scores may be more likely to volunteer. Alternatively, given that hospitals have a clear incentive to score themselves highly, participating hospitals may inflate their SPS reports, resulting in the skewed distributions and undermining the measures’ predictive value. As Leapfrog’s SPS focuses on processes and protocols linked to accountability (eg, protocols for handwashing for SP #19) rather than hard outcomes (eg, handwashing compliance), hospitals also have a strong incentive to produce protocol documents that meet Leapfrog documentation standards but may do little to impact clinical practice or patient outcomes.
Even with accurate data, however, SPS measures may not impact the outcomes highlighted in this study. Although prior work has argued that SPS measures are more likely to be associated with complications than mortality,14 hospital variation in validity of CLABSI and CAUTI reports potentially correlates meaningfully with SPS measures. For example, hospitals with better reporting might also have higher SPS, which could cancel out more conventional negative associations.
Our analyses also show that Leapfrog SPS measures, when provided, can substantially impact a hospital’s HSS and grade—however, again due to the highly skewed distributions of the SPS measures, on average, there is more potential for low scores to negatively impact a hospital’s grade than for high scores to improve a grade. Indeed, as most hospitals report perfect scores for most SPS measures, hospitals accurately reporting scores that fall in the lower half of the potential distribution end up with z-scores for these measures that are strongly negative (up to the trim point of −5). Given the composite weight of these measures—nearly ¼ of the total HSS—low (or even lower than perfect) SPS can take a hospital’s grade from “A” to “B,” or even “C.” For hospitals that are uncomfortable with or unable to report very high SPS, the current Leapfrog methodology thus presents a strong incentive against reporting SPS.
Alternatively, hospitals that improve SPS and/or report high, or even perfect, scores gain relatively modest advantages in their HSS. Perversely, there were 24 hospitals whose HSS declined after the highest SPS were imputed. This result is a function of the Leapfrog methodology converting highly skewed distributions into z-scores—in these cases, the most positive z-scores allowable by the SPS distribution were lower than the positive z-scores they had received for other Process measures; including SPS resulted in downweighting of these larger z-scores, and thus a lower grade. Leapfrog’s methodology, in tandem with the highly skewed SPS, results in a system that punishes hospitals whose scores fall at the lower end of the distribution far more significantly than it rewards those hospitals falling at the highest end.
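The asymmetry described above follows directly from how z-scores behave on a skewed distribution, as the Python sketch below illustrates. All values are hypothetical: the 10 scores, the use of the population standard deviation, and the hospital whose other Process measures average a z-score of 1.0 are assumptions for illustration, with SPS carrying 45% of the Process domain weight.

```python
from statistics import mean, pstdev

# Hypothetical self-reported scores for 10 hospitals on one measure
# (maximum 100); most report the maximum, as in the skewed SPS data.
scores = [100] * 8 + [90, 40]

mu, sd = mean(scores), pstdev(scores)  # population SD is an assumption


def z_score(value):
    """z-score of `value` relative to the hypothetical distribution."""
    return (value - mu) / sd


z_best = z_score(100)  # best attainable z-score, about 0.39
z_worst = z_score(40)  # lowest reporter, about -2.96

# A hypothetical hospital whose other Process measures average z = 1.0.
# Adding a perfect SPS at 45% of the domain weight lowers its domain score.
other_z, sps_weight = 1.0, 0.45
with_sps = (1 - sps_weight) * other_z + sps_weight * z_best

print(f"best z = {z_best:.2f}, worst z = {z_worst:.2f}")
print(f"Process score without SPS = {other_z:.2f}, with perfect SPS = {with_sps:.2f}")
```

In this toy example, a perfect report earns a z-score of only about 0.39 while the lowest report is penalized at about -2.96, and the hospital with strong other Process measures sees its domain score fall from 1.00 to about 0.73 even with a perfect SPS.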
Our study has several important limitations. First, we assess associations only among hospitals with all metrics of interest available; broader inclusion may have revealed more associations between SPS and outcomes. Second, we assess relationships between SPS and outcomes at 1 timepoint, thus ignoring potential for association over time, or correspondence between change in SPS and change in patient safety outcomes. Third, our simulations rely on an implied counterfactual that all other observed process and outcome measures would remain the same in presence of imputed levels of SPS.
Leapfrog has faced prior criticism for using methods that advantage HSS for hospitals participating in the Leapfrog Hospital Survey in ways unrelated to representations of valid hospital safety.27 This study revealed another way that Leapfrog Hospital Survey participation potentially advantaged hospitals. Rather than use Hospital Compare’s publicly reported CLABSI and CAUTI SIRs for the HSS for all hospitals, these SIRs were only used for hospitals who did not complete the 2013 Leapfrog Hospital Survey; participating hospitals were allowed to use self-reported rates instead. Our comparisons of these self-reported SIRs with the Hospital Compare SIRs found that while CLABSI SIRs were largely similar across data sources for hospitals participating in the Leapfrog Hospital Survey, self-reported CAUTI SIRs were substantially lower than Hospital Compare CAUTI SIRs. This resulted in an advantage for hospitals that participated in the Leapfrog Hospital Survey, as they received credit for a lower SIR; it also disadvantaged hospitals that did not participate in the Leapfrog Hospital Survey by artificially deflating the mean of the distribution with which these hospitals’ SIRs were compared.
Improving the Leapfrog HSS
Leapfrog’s mission to grade hospitals in a manner that is both methodologically rigorous and results in accessible comparisons is undoubtedly laudable. However, the lack of association between Leapfrog’s proprietary, and voluntary, SPS and the compulsory metrics reported on Medicare’s Hospital Compare website raises questions about the internal consistency of Leapfrog’s HSS. Recent press releases highlighting Fall 2016 Leapfrog grades28,29 illustrate the score’s 2 audiences: for consumers attempting to reconcile safety-related metrics, the HSS offers a comprehensive measure incorporating proprietary process measures and important outcomes; for hospital administrators, an “A” grade from Leapfrog offers consumer-friendly marketing opportunities. For both groups, however, the composite is only meaningful if it is internally consistent, that is, if process measures correlate in meaningful ways with important outcomes. For consumers, important outcomes reflect personal health needs and concerns; if SPS does not provide a direct pathway from experience to outcome, its value is unclear. For administrators, important outcomes are increasingly defined by policies that incentivize or penalize certain metrics; an SPS that adds more noise than signal to composite measures undermines any value-added proposition.
Some of the deficiencies of the Leapfrog HSS have straightforward remedies. For example, Leapfrog should use Hospital Compare’s CLABSI and CAUTI SIRs for all hospitals, rather than self-reported rates. Other deficiencies will require Leapfrog to align broader incentive structures with reporting accuracy, rather than opportunity for leniency. In the context of the HSS, where nearly all inputs now stem from compulsory, standardized measures, voluntary SPS self-reports represent a rare locus of hospital control.
Although Leapfrog currently incorporates methods for encouraging data accuracy, including requiring a letter of affirmation and flagging potentially erroneous or misleading reports,30 auditing processes are crucial for ensuring that variation in these measures reflects true differences in process best practice. Just as we would not expect drivers to turn themselves in for speeding, we should not expect hospitals to accurately self-report failure to protocolize safe practices. Leapfrog has recently implemented new efforts to externally validate data,30 which may help to incentivize accurate reporting. As a further step, Leapfrog should consider asking hospitals to report information about the survey completion process, including potential conflicts of interest. For example, which administrators spearheaded the Leapfrog survey response? What direct access to clinical practice do they have? And what stake (if any) do they have in the hospital’s grade? To the extent that mechanisms of safe practices go beyond minimally implemented protocols, Leapfrog may also want to consider adding more objective safe practice measures to their survey.
Finally, Leapfrog should ensure that “honest” hospitals are not unfairly disincentivized to report less-than-ideal SPS measures. Given the strongly skewed distributions observed in recent SPS data, methods other than z-scores should be considered for making data commensurate.
In dissecting Leapfrog’s Safe Practices Score measures and HSS and grades, our study finds little association between self-reported SPS measures and publicly reported outcomes and penalties data. Further, we find that Leapfrog’s current methodologies, in combination with strongly positively skewed self-reports of SPS measures, punish low SPS reports substantially more than they reward high SPS. These concerns cast doubt on the utility of SPS and, more generally, the HSS and grades.
2. The Leapfrog Group. The Leapfrog Group homepage. 2016. Available at: www.leapfroggroup.org/. Accessed March 30, 2016.
3. Austin JM, D’Andrea G, Birkmeyer JD, et al. Safety in numbers: the development of Leapfrog’s composite patient safety score for US hospitals. J Patient Saf. 2014;10:64–71.
4. Meyer GS, Denham CR, Battles J, et al. Safe Practices for Better Healthcare–2010 Update: A Consensus Report. Washington, DC: National Quality Forum; 2010.
11. Ghaferi AA, Osborne NH, Dimick JB. Does voluntary reporting bias hospital quality rankings? J Surg Res. 2010;161:190–194.
12. Jha AK, Orav EJ, Ridgway AB, et al. Does the Leapfrog program help identify high-quality hospitals? Jt Comm J Qual Patient Saf. 2008;34:318–325.
13. Birkmeyer JD, Dimick JB. Potential benefits of the new Leapfrog standards: effect of process and outcomes measures. Surgery. 2004;135:569–575.
14. Kernisan LP, Lee SJ, Boscardin WJ, et al. Association between hospital-reported Leapfrog Safe Practices Scores and inpatient mortality. JAMA. 2009;301:1341–1348.
15. Qian F, Lustik SJ, Diachun CA, et al. Association between Leapfrog safe practices score and hospital mortality in major surgery. Med Care. 2011;49:1082–1088.
16. Glance LG, Dick AW, Osler TM, et al. Relationship between Leapfrog Safe Practices Survey and outcomes in trauma. Arch Surg. 2011;146:1170–1177.
19. Dudeck MA, Horan TC, Peterson KD, et al. National Healthcare Safety Network (NHSN) report, data summary for 2009, device-associated module. Am J Infect Control. 2011;39:349–367.
20. Dudeck MA, Edwards JR, Allen-Bridson K, et al. National Healthcare Safety Network report, data summary for 2013, device-associated module. Am J Infect Control. 2015;43:206–221.
21. Centers for Disease Control and Prevention’s National Healthcare Safety Network. Bloodstream Infection Event January 2016 (central line-associated bloodstream infection and non-central line-associated bloodstream infection). 2016. Available at: www.cdc.gov/nhsn/PDFs/pscManual/4PSC_CLABScurrent.pdf. Accessed March 30, 2016.
23. StataCorp. Stata Statistical Software: release 14. College Station, TX: StataCorp LP; 2015.
26. Centers for Disease Control and Prevention. Paving the path forward: 2015 rebaseline. 2016. Available at: www.cdc.gov/nhsn/2015rebaseline/. Accessed December 22, 2016.
27. Hwang W, Derk J, LaClair M, et al. Hospital patient safety grades may misrepresent hospital performance. J Hosp Med. 2014;9:111–115.