At present, there is growing interest in the use of improvement collaboratives as a means of more rapidly achieving positive change, and several large-scale multicenter projects have begun. The evidence for the effectiveness of collaboratives as a means of accelerating improvement, however, has been mixed.3,14 The reasons for this lack of uniformity may include the highly variable nature of the collaboratives and the assortment of methods and strategies used. It is essential, therefore, that we begin to understand which collaborative methodologies are effective and in what contexts. This article presents the results of one such collaborative in reducing inpatient mortality, details the methods used, and discusses possible reasons for its successes.
On January 1, 2008, the Premier health care alliance launched a multiyear performance improvement collaborative focusing on Quality, Efficiency, Safety and Transparency (QUEST). The hypothesis was that, by harnessing the power of collaboration, improvement would occur at a more rapid pace. The framework in QUEST, through a strategic partnership with the Institute for Healthcare Improvement (IHI), used a specific improvement model8 that attributes lack of improvement in health care to a “failure of will, a failure of ideas, or a failure of execution.” The QUEST framework also used experience gained in an earlier initiative, the Premier Hospital Quality Incentive Demonstration (HQID), a 6-year project with the Centers for Medicare and Medicaid Services.9 That demonstration was successful in achieving its primary goal, adherence to evidence-based medicine (EBM)1,11; however, an impact on mortality has yet to be demonstrated.5,16,17
In this article, we test the hypothesis that participants in a structured collaborative were more successful in reducing mortality than hospitals that did not participate. To do so, we start with a comparison of risk-adjusted mortality trends between the 2 cohorts of hospitals, the initial “charter” members of QUEST and a group of non-QUEST hospitals that had access to the same software quality improvement tools. We also test the hypothesis using a multivariate model that isolates the effect of QUEST from other factors that may have impacted mortality.4,6,7,13
Collaborative Execution Framework
The methods used in the QUEST collaborative (described later) gave rise to the specific requirements for participation, namely, (1) commitment of senior leadership, including the CEO; (2) use of a standard set of data analytic products that enabled capture of measurement data; and (3) an agreement that all data would be transparent within the collaborative. The QUEST collaborative framework also required agreement on the specific measures and methods to be used as well as agreement on the definition of top performance targets. With regard to the mortality improvement target, participants elected to study risk-adjusted, all-cause, hospital-wide mortality. The participants examined 3 potential methods for risk adjusting the data and chose the method initially developed by CareScience,10,15 which adjusts for palliative care and comorbid conditions, among other factors. A target performance of an observed-expected (O/E) mortality ratio of 0.82 was chosen because it represented the 25th percentile (lowest quartile) of mortality in the baseline period.
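For illustration, the O/E ratio and a fixed percentile-based target can be sketched as follows. This is a minimal sketch with hypothetical values, not the study's data; the function names are our own.

```python
import numpy as np

def oe_ratio(observed_deaths, expected_deaths):
    """Observed/expected mortality ratio; values below 1.0 mean fewer
    deaths than the risk-adjustment model predicts."""
    return observed_deaths / expected_deaths

# Hypothetical baseline-period O/E ratios for eight hospitals
# (illustrative values only).
baseline_oe = np.array([0.78, 0.80, 0.85, 0.92, 0.95, 1.01, 1.05, 1.10])

# A fixed top-performance target set at the 25th percentile of the
# baseline distribution, analogous to how QUEST arrived at 0.82.
target = np.percentile(baseline_oe, 25)
```

Because the target is fixed in advance rather than recomputed each period, every participant can in principle reach it, a design point the Discussion returns to.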
Each participant received a report on a quarterly basis, highlighting the institution’s O/E value and distance from the goal, which also included a breakdown of clinical subgroups that represented the highest areas of opportunity for improvement. Participants also had access to an analytical tool that allowed them to explore data in great detail. Participants could see the performance of all fellow participants and could drill into the data to find top performers in any given area. The CEO of each institution was also provided with a yearly performance summary.
In addition, Premier staff examined the pooled data to determine the greatest opportunities for mortality reduction (conditions where the number of deaths greatly exceeded the model prediction). Through another aspect of the collaborative, staff also became aware of substantial variation in the approach to and documentation of secondary diagnoses and in particular palliative care.
To support QUEST hospitals in improving their performance in the delivery of evidence-based care, Premier provided several offerings, from educational calls, customized action plans, Web-based resources, to “sprints” and “mini collaboratives.” A sprint is a short-term, rapid-cycle improvement education series designed to drive and sustain change in specific indicators or processes of care. Mini collaboratives are more intensive 6- to 9-month improvement initiatives focused on a specific condition, disease state, or process of care.
Measuring the Collaborative’s Impact
All observational data were derived from a database maintained by Premier, a pooled hospital cross-section time-series sample consisting of approximately 36 million deidentified inpatient discharges from approximately 650 hospitals during a 6-year time frame.
We took 2 approaches to measuring the impact of the QUEST collaborative on mortality. The first approach, largely descriptive, tracked hospital mortality trends over 4 years since the start of the collaborative, comparing QUEST participants with other Premier hospitals with access to the same software tools but not participating in the collaborative (non-QUEST group). For the descriptive trend analysis, mortality was measured in each calendar quarter by comparing the observed mortality rate and the expected risk-adjusted mortality rate at the hospital level. The O/E ratios were tracked across a baseline period of third quarter of 2006 through the second quarter of 2007 and a performance period of the first quarter of 2008 through the last quarter of 2011.
The second approach was to conduct a multivariate analysis that uses a cross-section time-series regression model to isolate a QUEST effect from other factors that might explain both hospital effects and time trends. Table 1 summarizes the sample, giving the hospital counts by year, QUEST status, and cohort. Note that we have a full 2 years of non-QUEST data before the launch of QUEST in 2008. This formal inferential setting makes it possible to conduct hypothesis tests on the timing and strength of the QUEST effect.
Because coding practices can affect the expected mortality generated from the risk adjustment predictive model selected for QUEST, an analysis of coding practices was performed on the 2 cohorts to determine any factors that might be contributing to the observed differences. Non-QUEST hospitals were matched on bed size, urban/rural location, teaching status, and geographical region, and we examined the International Classification of Diseases, Ninth Revision, diagnosis data from the 141 hospitals in QUEST cohort and 141 hospitals in the matched non-QUEST cohort. Because palliative care code is considered an important marker for mortality risk, we specifically examined the frequencies of that code (V66.7).
In the multivariate analysis, the QUEST effect is inferred from a parametric estimation of a general regression model, which has the following functional form:

y_hq = β′x_hq + ε_hq

where y_hq is the O-E difference (O-E Diff) mortality rate of hospital h at quarter q. The vector x_hq includes hospital characteristics, evolving data control factors, and QUEST indicators. β is the vector of marginal effects of the independent variables on the O-E Diff rate, and ε_hq is the random error component of the model.
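A regression of this general form can be sketched on a synthetic panel as follows. This is a toy illustration with invented column names and simulated data, not the study's schema or estimation code; it fits the model by ordinary least squares with time dummies, a hospital trait, and a QUEST flag.

```python
import numpy as np
import pandas as pd

# Toy panel: one row per hospital-quarter (4 hospitals x 8 quarters).
rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "hospital": np.repeat(["A", "B", "C", "D"], 8),
    "year": np.tile(np.repeat([2008, 2009], 4), 4),
    "quarter": np.tile([1, 2, 3, 4], 8),
    "teaching": np.repeat([1, 0, 1, 0], 8),   # hospital characteristic
    "quest": np.repeat([1, 1, 0, 0], 8),      # collaborative membership flag
})
# Simulated O-E Diff outcome with a built-in QUEST effect of -0.2.
panel["oe_diff"] = (-0.2 * panel["quest"] + 0.05 * panel["teaching"]
                    + rng.normal(0, 0.05, len(panel)))

# Design matrix: intercept, year/quarter dummies, trait, and QUEST flag.
X = pd.get_dummies(panel[["year", "quarter"]].astype(str), drop_first=True)
X["teaching"] = panel["teaching"]
X["quest"] = panel["quest"]
X.insert(0, "intercept", 1.0)

beta, *_ = np.linalg.lstsq(X.to_numpy(dtype=float),
                           panel["oe_diff"].to_numpy(), rcond=None)
quest_effect = beta[list(X.columns).index("quest")]  # ~ -0.2 by construction
```

The coefficient on the QUEST flag plays the role of the fixed QUEST effect described in the model specifications below.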
In the model, hospital characteristics include bed size, teaching status, rural or urban location, and geographic area location. Data control factors are specified as yearly and quarterly (for seasonality) dummy variables. They are intended to capture the effect of general evolution of clinical practice and coding completeness during the study period. Descriptive statistics of the variables are in Table 2.
The QUEST effects on risk-adjusted mortality are modeled at 3 levels of parametric restriction. The first, most constrained model specification contains a fixed QUEST effect (specified as a binary QUEST flag) and a QUEST linear trend effect, represented by the number of quarters that have passed since a hospital joined QUEST. The flag is turned on for those quarters when the hospital participated in QUEST. For example, a hospital that joined QUEST in the first quarter of 2010 will get the flag turned on starting with that quarter and for all subsequent quarters of participation. If a hospital drops from QUEST, the flag is turned off.
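The flag-and-trend construction can be made concrete with a small helper. This is a sketch under our own naming assumptions (sequential quarter indices; `join_idx`/`drop_idx` are hypothetical parameters), not the study's implementation.

```python
def quest_indicators(quarter_idx, join_idx, drop_idx=None):
    """QUEST participation flag and linear trend term for one hospital-quarter.
    Quarters are sequential integers (e.g., 2008Q1 = 0); join_idx is the
    quarter the hospital joined, drop_idx the quarter it left, if any."""
    active = (join_idx is not None
              and quarter_idx >= join_idx
              and (drop_idx is None or quarter_idx < drop_idx))
    flag = 1 if active else 0
    # Trend counts quarters of participation; zero when not participating.
    trend = (quarter_idx - join_idx + 1) if active else 0
    return flag, trend
```

For a hospital joining in 2010Q1 (index 8), the 2010Q3 observation (index 10) gets flag 1 and trend 3; quarters before joining, or after dropping out, get (0, 0).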
In the second model specification, the linearity of the trend effect is relaxed by interacting the QUEST flag with annual time effects (yearly dummy variables). The third model specification is the least constrained, allowing for cohort effects, as well as flexible time effects by interacting the QUEST flag with the cohort, as well as the year. A hospital’s cohort is determined by the year it joined QUEST, that is, charter membership (starting in 2008), the “class” that started in 2009, and the class of 2010.
Another variant of the model introduces full hospital effects, which effectively removes the influence of all potential latent effects, thereby isolating the timing effects. This variant is applied to each of the 3 aforementioned model specifications—version “b” as distinct from the original version “a” described earlier. Version b is a type of Heckman specification2 to remove selection bias. In this setting, all hospital effects are represented by dummy variables, which replace all control traits, such as size, teaching status, location, and the like. Finally, version “c” of the model introduces hospital random effects to account for the correlation of observations over time within a hospital and the correlation of patients within a hospital. Hence, we allow for nonindependence of hospital disturbances over time, which, if treated as fixed, have the potential of inflating the significance of the QUEST effect. If the QUEST effect were purely based on self-selection into the collaborative, then the time effects would vanish in favor of sorting on hospital effects, whether fixed or random.
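The logic of the fixed-effects variant can be illustrated with the standard "within" transformation, which is one common way to absorb hospital effects (the paper's version b uses fully interacted hospital dummies, which is algebraically equivalent for this purpose). Data and names here are illustrative.

```python
import pandas as pd

def within_demean(df, group_col, value_cols):
    """Fixed-effects 'within' transformation: subtract each hospital's own
    mean, absorbing every time-invariant (observed or latent) hospital trait."""
    return df[value_cols] - df.groupby(group_col)[value_cols].transform("mean")

panel = pd.DataFrame({
    "hospital": ["A", "A", "B", "B"],
    "teaching": [1, 1, 0, 0],   # time-invariant trait
    "quest":    [0, 1, 0, 0],   # turns on mid-sample for hospital A only
})
dm = within_demean(panel, "hospital", ["teaching", "quest"])
```

After demeaning, the time-invariant trait is identically zero (so it cannot drive the estimate, which is why control traits are dropped in version b), while the QUEST flag retains within-hospital variation from which the timing effect is identified.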
There were 141 charter members in QUEST. Not all QUEST hospitals license data to be included in the Premier research database, and these were excluded. During the 4-year performance period, some charter members of QUEST dropped out of the collaborative. There were 136 QUEST charter members with at least one quarter of data between the third quarter of 2006 and the last quarter of 2011. There were 317 hospitals in the Premier database that did not join QUEST at any time during the baseline or performance period with at least one quarter of data in the same period.
QUEST charter member and non-QUEST Premier hospital characteristics are shown in Table 3.
The change in mortality during the baseline and performance periods for these 2 hospital cohorts is shown in Figure 1 as a 4-quarter moving average. The average O/E ratio for the baseline period was 0.98 for QUEST hospitals and 1.07 for non-QUEST hospitals. In the final year of the 4-year performance period, the QUEST cohort’s average O/E ratio had fallen to 0.65, compared with 0.75 for the non-QUEST cohort. A total of 40,859 deaths avoided were calculated for the QUEST hospital cohort compared with 38,115 for the non-QUEST cohort. Deaths avoided were computed by subtracting observed deaths from expected deaths in each calendar quarter and finding the total for the performance period.
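The deaths-avoided calculation described above reduces to a per-quarter difference summed over the period; a minimal sketch with hypothetical quarterly counts:

```python
def deaths_avoided(observed_by_quarter, expected_by_quarter):
    """Deaths avoided over a period: the sum of (expected - observed)
    deaths across calendar quarters, mirroring the study's calculation."""
    return sum(e - o for o, e in zip(observed_by_quarter, expected_by_quarter))

# Hypothetical three-quarter example: 10 + 15 + 7 = 32 deaths avoided.
total = deaths_avoided([90, 85, 88], [100, 100, 95])
```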
To compare results of non-QUEST hospitals with consistent data during a long period, only non-QUEST hospitals with data in every quarter of the baseline and performance period were examined separately. Imposing this constraint had no material effect on the results displayed in Figure 1.
The QUEST participants understood that expected mortality depends in part on the number of comorbid conditions coded in the discharge abstract. That raises the possibility that differences in O/E ratios between the QUEST cohort and the controls reflect (at least in part) differences in the rate of coding secondary diagnoses. Figure 2 shows that the rate of recording secondary diagnosis codes was indeed higher among QUEST hospitals at the outset, but over time, the non-QUEST facilities increased their use of secondary diagnosis codes at a faster pace than the QUEST cohort, and by the third quarter of 2011, the gap nearly closed. Figure 3 depicts the results of a paired t test of matched hospitals, which tests the null hypothesis that between the QUEST cohort and the non-QUEST controls, the difference in the number of secondary diagnoses per patient is zero (heavy horizontal line at 0.00). The point estimate of that difference in our sample is the boundary between the lower half (dash shaded region) of the 95% confidence interval and the upper half (stripe shaded region). Although the point estimate does not cross over the zero line until the middle of 2010, the confidence interval contains the zero line in all but 5 quarters, so in only those 5 quarters was the number of secondary codes per patient in each cohort statistically different.
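The paired comparison in Figure 3 amounts to a t test on the per-pair difference in secondary-diagnosis counts, with the null retained whenever the confidence interval spans zero. A self-contained sketch (illustrative differences; a fixed critical value of about 2 approximates the 95% level for moderate-to-large samples):

```python
import math

def paired_mean_diff_ci(diffs, t_crit=2.0):
    """Mean paired difference with an approximate 95% confidence interval.
    The null hypothesis of zero difference is retained whenever the
    interval contains 0, as in all but 5 quarters of the study."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                              # standard error
    return mean - t_crit * se, mean, mean + t_crit * se

# Hypothetical per-pair differences in secondary diagnoses per patient.
lo, mean, hi = paired_mean_diff_ci([0.3, -0.1, 0.2, 0.0, -0.2, 0.1])
```

Here the interval straddles zero, so the cohorts would not be judged statistically different in that quarter.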
Similar results were found in an analysis of the use of the palliative care code (V66.7), a code to which QUEST members were particularly attentive because it is a significant contributor to the risk of mortality. Despite its potential to differentiate QUEST participants, we found no statistical differences between the 2 matched comparison cohorts. Indeed, although the non-QUEST controls had a slightly lower use rate of the code at the beginning of the study period, they had a higher rate by the fourth quarter of 2010 (Fig. 4).
All 3 alternative specifications of the regression model demonstrated statistically significant QUEST effects for risk-adjusted mortality. The size of the effect varied depending on the model specification and on whether controls for latent hospital effects were introduced into the model. However, the estimated coefficients were highly consistent across the restrictions, as summarized in Table 4.
The results from models 1a, 2a, and 3a assume that latent effects do not bias the results, whereas models 1b, 2b, and 3b (the Heckman variation) are included in Table 4 to show how the introduction of fully interacted hospital effects changes the estimates of the QUEST effect. This model variant effectively removes the influence of potential latent effects, thereby isolating the timing effects and removing selection bias; as described earlier, all hospital effects are represented by dummy variables, which replace all control traits. The random-effects versions of the models are represented in specifications 1c, 2c, and 3c. In each of the 3 model specifications, the full fixed-effects and random-effects variants attenuate the QUEST effect to approximately half that of the more restricted model, which demonstrates that self-selection alone cannot explain all of the QUEST effect. More specific results for each model are described in the following sections.
The QUEST hospitals have a risk-adjusted mortality rate 0.18% lower than non-QUEST hospitals, assuming the same hospital mix. Given an overall mortality rate of approximately 2%, the absolute difference of 0.18 percentage points is a relative difference of approximately 9% in favor of QUEST hospitals. A similar but smaller effect was seen in models 1b and 1c (random hospital effects). Nevertheless, we found no statistically significant linear relationship between risk-adjusted mortality and the duration of a hospital’s participation in QUEST.
The interactive variables between the QUEST flag and year dummies allow the QUEST trend effect to be nonlinear. In each year, QUEST hospitals performed better than non-QUEST hospitals. In model 2a, this effect was statistically significant each year. However, there was no progressive QUEST effect over time in either the fixed- or random-effects versions (2b and 2c).
In this model, the QUEST effect is examined by each class. Charter members performed significantly better than non-QUEST hospitals in all 4 years, but there was no progressive QUEST effect over time. Both the 2009 and 2010 classes started with no significant difference from non-QUEST hospitals but ended by performing better in 2011 in model 3a, suggesting a strong lag effect for mortality reduction associated with QUEST membership. Interestingly, controlling for latent effects in models 3b and 3c significantly reduced the effect size of the coefficient for the class of 2010 in the final year of the study.
The study found that hospitals participating in Premier’s QUEST Collaborative reduced the O/E mortality ratio as much as 10% more than a matched group of non-QUEST Premier hospitals—a group committed to quality improvement with access to many of the tools QUEST participants had. The matching result was corroborated in the formal multivariate analysis.
We found no evidence that these improvements in O/E mortality could be attributed solely to improved coding and more precise documentation; rather, all hospitals’ “expected” rate of mortality is increasing across the board as documentation improves. This is likely due to pressure for accurate coding as hospital payments become increasingly tied to risk-adjusted outcomes.
In addition, focused discussions with QUEST members helped to identify factors that contributed to the success of the collaborative. Several themes have emerged, which correspond to our model for collaboration. (1) Building will—that all results and data are transparent to everyone in the collaborative has been cited as a means to provide a sense of urgency, and participants once complacent in the assumption that they were providing the highest possible care have often been confronted with a different reality. (2) Sharing ideas—because data are collected in a common format, participants and staff are able to identify those “islands” of top performance in specific domains that serve as models. (3) Collaborative execution—organized sprints and collaboratives provide a structured means to facilitate improvement much like the management systems of successful enterprises.
In addition, we have corroborating evidence that hospitals took concrete actions to lower their O/E ratios: observed mortality (the numerator, O) was driven down by a number of interventions implemented in a matter of months, such as advances in sepsis treatment and better discharge management that led to greater use of hospice care.
The results of the multivariate model established an undeniable and statistically significant “QUEST effect.” The question, however, is whether it is purely a selection effect, rather than the result of collaborative efforts to reduce mortality. Evidence for the latter emerged from controlling for the potential selection effect (QUEST participation greater for relatively higher-performing hospitals at the outset). Indeed, the ensuing statistical results revealed that the selection effect could not explain the QUEST effect. Although it is true that the median O/E mortality ratio started lower in the QUEST cohort, the individual hospital O/E ratios were widely distributed with many QUEST hospitals underperforming relative to peers. For mortality in the cohort to remain consistently lower than the control group, sustained and ongoing improvement was necessary.
Although these results do not rule out a selection effect, they do indicate that it is unlikely to be the entire explanation because the specification of our multivariate model explicitly captures the time effect across hospitals. In so doing, the QUEST effect parameter estimates control for individual hospital effects that would be the source of the selection effect. In other words, because we include data from the pre-QUEST period (2006 and 2007), our measure of the QUEST effect means at the very least that QUEST hospitals changed their behavior with the outset of QUEST in 2008.
The version of our model that includes fully interactive hospital fixed effects controls for all hospital-specific latent (unobserved) characteristics, within the bounds of our available data, which leaves only time effects to be explained. If selection explained the entire QUEST effect, the effect would vanish in that version of the model, but that did not happen. Approximately half of the QUEST effect remained. Hence, at least some of the QUEST effect is about what hospitals did during their participation in the QUEST collaborative.
Although the results from this study are compelling, they may not be applicable to all hospitals because hospital members of the Premier database have the resources required to be in the quality improvement alliance. In particular, they have specific analytic database capabilities, access to benchmarks, and sufficient staff to engage in the sprints and mini collaboratives. Not all hospitals have such resources. Although QUEST was designed to be scalable and adoptable by any hospital in the nation, a hospital’s lack of resources may hinder its ability to participate in such a collaborative.18 However, a recent study found no association between mortality and hospital margins, so participation may depend more on how such a program is prioritized.12
In this study, we examined only inpatient deaths and not deaths occurring within a 30-day period. Therefore, some of the reduction in mortality could arise from patients being allowed to die in other settings, such as home or hospice. We know that the approach to end-of-life care varies greatly, and in fact, some hospitals specifically addressed this.10 We feel that matching the needs and preferences of the patient to an appropriate end-of-life setting is itself an improvement in patient-centered and family-centered care.
There is much speculation about the reasons for improvement in QUEST hospitals. Many activities were available exclusively to QUEST members; however, most were of an educational nature and not directly interventional. It was not possible, therefore, to identify the exact activities that each QUEST member implemented. To that end, this study does not draw a direct causal pathway between these individual activities and outcomes (mortality). As we move forward with QUEST, we are focused on directly capturing the activities and interventions hospitals are participating in. We expect to determine in which contexts each of these becomes important.19
Given the relative improvement of the QUEST collaborative participants compared with their peers, this study has potential policy implications with regard to transparency, promotion of success tactics, providing a platform for structured improvements, and goal setting. This is evident when contrasting QUEST with our previous effort to establish a major improvement collaborative, HQID. Although QUEST shared many features of the Centers for Medicare and Medicaid Services/Premier HQID—commitment of senior leadership, collection of a common data set via standard tools, and transparency—it differed from HQID in several important ways, which we feel may have contributed to its success. Whereas HQID used an educational portal to post improvement ideas, QUEST actively sought out “islands of excellence” and actively promoted success tactics through a variety of mechanisms. In addition, whereas HQID relied for the most part on hospitals to structure their own improvement efforts internally, QUEST provided a structured platform for collaborative execution. Finally, HQID was structured as a “tournament,” with a relative (top quartile) performance goal, ensuring, by its nature, that 75% of the participants would not be top performers. QUEST used a fixed goal set in advance, providing a condition in which every participant could achieve the goal and fostering an environment of mutual assistance and collaboration. All of these factors may have played a role in improving mortality outcomes in QUEST; further research is needed to document the direct causal pathway.
1. Grossbart SB. What’s the return? Assessing the effect of pay-for-performance initiatives on the quality of care delivery. Med Care Res Rev. 2006;63:29S–48S.
2. Heckman J. Sample selection bias as a specification error. Econometrica. 1979;47:153–161.
3. Hulscher ME, Schouten LM, Grol RP, et al. Determinants of success of quality improvement collaboratives: what does the literature show? BMJ Qual Saf. 2013;22:19–31.
4. Jha A, Orav EJ, Epstein AM. The effect of financial incentives on hospitals that serve poor patients. Ann Intern Med. 2010:299–306.
5. Jha A, Joynt KE, Orav EJ, et al. The long-term effect of premier pay for performance on patient outcomes. N Engl J Med. 2012;366:1606–1615.
6. Kahn CN III, Ault T, Isenstein H, et al. Snapshot of hospital quality reporting and pay-for-performance under Medicare. Health Aff. 2006;25:148–162.
7. Kruse GB, Polsky D, Stuart EA, et al. The impact of hospital pay-for-performance on hospital and Medicare costs. Health Serv Res. 2012;47:2118–2136.
8. Langley GJ, Moen R, Nolan KM, et al. The Improvement Guide: A Practical Approach to Enhancing Organizational Performance. 2nd ed. San Francisco, CA: Jossey-Bass Publishers; 2009.
10. Kroch E, Johnson M, Martin J, et al. Making hospital mortality measurement more meaningful: incorporating advance directives and palliative care designations. Am J Med Qual. 2010;25:24–33.
11. Lindenauer PK, Remus D, Roman S, et al. Public reporting and pay for performance in hospital quality improvement. N Engl J Med. 2007;356:486–496.
12. Ly DP, Jha AK, Epstein AM. The association between hospital margins, quality of care, and closure or other change in operating status. J Gen Intern Med. 2011;26:1291–1296.
13. Mehrotra A, Damberg CL, Sorbero ME, et al. Pay for performance in the hospital setting: what is the state of the evidence? Am J Med Qual. 2009;24:19–28.
14. Nadeem E, Olin SS, Hill LC, et al. Understanding the components of quality improvement collaboratives: a systematic literature review. Milbank Q. 2013;91:354–394.
15. Pauly MV, Brailer DJ, Kroch EA. The corporate hospital rating project: measuring hospital outcomes from a buyer’s perspective. Am J Med Qual. 1996;11:112–122.
16. Ryan AM. Effects of the Premier hospital quality incentive demonstration on Medicare patient mortality and cost. Health Serv Res. 2009;44:821–841.
17. Ryan AM, Blustein J, Casalino LP. Medicare’s flagship test of pay-for-performance did not spur more rapid quality improvement among low-performing hospitals. Health Aff (Millwood). 2012;31:797–805.
19. Van Citters AD, Nelson EC, Schultz L, et al. Understanding context-specific variation in the effectiveness of the QUEST program: an application of realist evaluation methods. Podium presentation at the AcademyHealth Annual Research Meeting; Orlando, FL; June 18, 2012. Available at http://www.academyhealth.org/files/2012/sunday/citters.pdf. Accessed July 12, 2014.
Keywords: hospital quality; collaborative improvement; inpatient mortality

Copyright © 2015 Wolters Kluwer Health, Inc. All rights reserved.