
Rating the Raters: The Inconsistent Quality of Health Care Performance Measurement

Shahian, David M. MD; Normand, Sharon-Lise T. PhD; Friedberg, Mark W. MD, MPP; Hutter, Matthew M. MD, MPH; Pronovost, Peter J. MD, PhD

doi: 10.1097/SLA.0000000000001631

*Department of Surgery, Massachusetts General Hospital, Boston, MA

†Center for Quality and Safety, Massachusetts General Hospital, Boston, MA

‡Harvard Medical School, Boston, MA

§Department of Health Care Policy, Harvard Medical School, Boston, MA

¶Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA

||RAND Corporation, Boston, MA

**Department of Medicine, Brigham and Women's Hospital, Boston, MA

††Armstrong Institute for Patient Safety and Quality, Johns Hopkins University School of Medicine, Baltimore, MD

‡‡Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, MD.

Reprints: David M. Shahian, MD, Massachusetts General Hospital, 55 Fruit St., Bulfinch 284, Boston, MA 02114. E-mail:

Disclosures: David M. Shahian, MD, is a member of the National Quality Forum Executive Committee and Board of Directors. He has led cardiac surgery performance measurement and public reporting efforts in Massachusetts and nationally (with the Society of Thoracic Surgeons).

Sharon-Lise T. Normand, PhD, is Director of the Massachusetts Data Analysis Center (Mass-DAC), which produces public report cards for all non-federal hospitals performing CABG surgery and PCI for the Commonwealth of Massachusetts, and has provided methodological leadership to the national development of hospital and physician performance measurements to the Centers for Medicare and Medicaid Services, the American College of Cardiology Foundation, the American Heart Association, and the Society of Thoracic Surgeons.

Peter J. Pronovost, MD, PhD, FCCM, is the Johns Hopkins Medicine Senior Vice President for Patient Safety and Quality, and Director of the Armstrong Institute for Patient Safety and Quality. In these roles, Dr Pronovost reviewed internal performance data for a Johns Hopkins surgeon who had a high “Adjusted Complication Rate” on the ProPublica Surgeon Scorecard and met with this surgeon and his department director.

Funding sources: Internal.

The authors report no conflicts of interest.

More than a century after Dr Ernest Amory Codman of the Massachusetts General Hospital first advocated for health care transparency and public accountability, these principles have finally achieved widespread support. Report cards abound: the federal government, professional organizations such as the Society of Thoracic Surgeons1,2 and the American College of Surgeons, consumer groups, media companies such as Consumer Reports and US News and World Report, and “public interest” organizations such as ProPublica all produce reports on the quality of care provided by hospitals and physicians.

Now that the ethical and health policy arguments for transparency are broadly accepted, attention has focused on more practical questions, including selection of the most appropriate measures and establishment of reasonable standards that should be met before those metrics are used for public reporting and reimbursement.

In 1966, when Avedis Donabedian first proposed the triad of structure, process, and outcomes to measure health care quality, he observed that “Outcomes, by and large, remain the ultimate validators of the effectiveness and quality of medical care.”3 Most stakeholders now agree that outcomes measures (including clinical and patient-reported outcomes) best reflect what matters most to patients.

In that seminal article, Donabedian also enumerated the many inherent challenges of measuring health care outcomes, challenges that persist nearly 50 years later. Accurately and fairly assessing provider performance is a complex task (Table 1).4 It requires clinicians to identify relevant risk factors and endpoints; statisticians experienced in the methodological nuances of performance measurement; and experts in health literacy and consumer behavior to ensure that performance results are conveyed in ways that different stakeholders can understand and use.

Given the analytical complexity of health care outcomes measures and lack of widely accepted and enforceable standards, flawed or inconsistent rating methodologies are neither surprising nor rare.5–12 In some instances, different rating organizations have produced completely divergent ratings for the same hospital during the same rating period, and patients have no way of knowing which are more accurate and relevant to their needs.

Contrary to the aphorism that “any data are better than no data,” bad data or methodology can often be worse than no data, and can produce serious unintended consequences. They misclassify doctors and hospitals, misinform and confuse the public, and squander increasingly scarce resources. Inaccurate report cards can mislead patients into seeking care from providers falsely labeled as above average in performance, or steer patients away from truly excellent providers mistakenly identified as low performing. Providers incorrectly categorized as below average may divert resources to address alleged but nonexistent quality issues, whereas truly low-performing providers are lulled into complacency. Based on faulty reports, payers may reward low-performing providers or penalize true high performers. Ultimately, flawed report cards foster cynicism and distrust of all performance measurement.

Most measure developers want to do a good job. However, contrary to best practice in most scientific disciplines, the reliability, validity, and usefulness of health care performance rating methodologies are not always adequately verified before their results are published. Reports of provider outcomes often reach the public without the filter of comprehensive external review. Many health care performance measure developers do not submit their measures for publication in credible peer-reviewed journals or seek similar robust review. Some forgo traditional peer review and publish unreviewed or informally reviewed methodologies contemporaneously with their report cards, leaving no opportunity for prior vetting and improvement. Once ratings are published, retractions are rare, even when serious flaws are discovered.

Rigorous peer review could play an important role in improving the quality and utility of health care performance reports. This process, which is standard for credible scientific publications, provides an opportunity for independent experts to critically evaluate research methodology, results, and conclusions, often making suggestions that substantially improve the final work product. The integrity of the peer-review process is further enhanced through final adjudication by an editor or publisher who has relevant scientific expertise, full access to reviewer comments, and the ability to require authors to respond to valid reviewer concerns. When research is irrevocably flawed, the process is designed to prevent its publication.

The July 2015 ProPublica publication of surgeon outcomes for 8 procedures13 illustrates a report card that may have benefited from more systematic, external peer review. Notwithstanding ProPublica's interesting past work in investigative journalism, its surgeon report card led the organization into unfamiliar territory, the science of provider profiling, a highly specialized area in which it had little prior experience. ProPublica developed a new and idiosyncratic surgeon performance measure, deemed it acceptable based on feedback from reviewers it selected, and then published its ratings of nearly 17,000 surgeons, some of whom were individually discussed in an accompanying "Making the Cut" article. A critique published by the RAND Corporation identified serious shortcomings in ProPublica's scientific methodology, such as problematic construct validity (eg, ProPublica's "Adjusted Complication Rates," which exclude in-hospital complications and those not resulting in readmission, are not actually complication rates by any conventional understanding of that term); masking of between-hospital performance differences; inadequate risk adjustment (eg, the risk model coefficient for their generic comorbidity score is zero, indicating it has no effect in their model); inaccurate attribution of procedures to the appropriate surgeons (eg, in some instances to nonsurgeons or surgeons in the wrong specialties); and failure to calculate or require minimum levels of measurement reliability, thereby leading to potentially high rates of performance misclassification due to chance.14 It is also unclear whether quality and safety experts whose endorsements are cited by ProPublica actually conducted detailed reviews of its specific report card methodology, or were simply lending their general support to a new transparency initiative.

Conversely, 2 coauthors of this Surgical Perspective (DMS, PJP) did convey serious concerns about the methodology to ProPublica staff well in advance of the Scorecard's publication, but these issues were not resolved. We do not know how many other experts made similar comments or suggestions, or how ProPublica addressed them.
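The reliability concern in the RAND critique can be made concrete with a small simulation. The Python sketch below uses a common signal-to-noise definition of reliability for a provider's observed event rate, R = between-provider variance / (between-provider variance + p(1 - p)/n), and then estimates how often surgeons are placed in the wrong half of a ranking purely by chance. It is an illustrative sketch under stated assumptions, not ProPublica's or RAND's method: the 5% average complication rate, the 1-percentage-point spread of true rates, the case counts, and the function names are all hypothetical.

```python
import random


def reliability(between_var: float, p: float, n: int) -> float:
    """Signal-to-noise reliability of one provider's observed event rate.

    between_var: variance of true event rates across providers (signal).
    p: average event rate; p * (1 - p) / n approximates the binomial
       sampling variance of a rate based on n cases (noise).
    """
    noise_var = p * (1 - p) / n
    return between_var / (between_var + noise_var)


def top_half(values: list[float], rng: random.Random) -> set[int]:
    """Indices ranked in the upper (higher-rate) half; ties broken at random."""
    order = sorted(range(len(values)), key=lambda i: (values[i], rng.random()))
    return set(order[len(values) // 2:])


def misclassification_rate(n_cases: int, n_surgeons: int = 1000,
                           mean_rate: float = 0.05, sd_rate: float = 0.01,
                           seed: int = 0) -> float:
    """Share of simulated surgeons placed in the wrong half by chance alone.

    Hypothetical data: true rates ~ Normal(mean_rate, sd_rate), clipped to
    [0, 1]; each surgeon's observed rate is a binomial draw over n_cases.
    """
    rng = random.Random(seed)
    true_rates = [min(max(rng.gauss(mean_rate, sd_rate), 0.0), 1.0)
                  for _ in range(n_surgeons)]
    # Observed rate = events / cases, each case an independent Bernoulli trial.
    observed = [sum(rng.random() < r for _ in range(n_cases)) / n_cases
                for r in true_rates]
    true_upper = top_half(true_rates, rng)
    observed_upper = top_half(observed, rng)
    # A surgeon is misclassified if exactly one ranking puts them in the
    # upper half; the symmetric difference counts each such surgeon once.
    return len(true_upper ^ observed_upper) / n_surgeons


if __name__ == "__main__":
    for n in (20, 100, 1000):
        print(n, round(reliability(0.0001, 0.05, n), 3),
              misclassification_rate(n))
```

Under these assumed parameters, a surgeon with 20 cases has a reliability of about 0.04, and one with 1,000 cases about 0.68. The simulated misclassification rate is correspondingly close to a coin flip at 20 cases and much lower at 1,000. This pattern is why requiring a minimum level of reliability before publishing individual ratings matters.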

A century ago, describing hospital performance reports, EA Codman presciently observed: “… they are not used by anybody … they are too inaccurate and too diverse in plan and method of classification to be of service either for comparison one with another, or for large statistics made by adding them one to another.” Despite substantial progress in performance measurement, there are still few widely accepted standards, and the current mélange of unvalidated, inconsistent methodologies undermines rather than advances the goals of responsible public reporting.

We must promote development of the most scientifically credible performance measures and encourage their preferential use by the public. Several approaches may help. First, there is a role for nationally recognized measurement standards similar to those in the financial industry (eg, the Financial Accounting Standards Board). Second, professional societies and other responsible developers must accelerate their production of valid performance measures based on the best data and methodologies, thereby filling measure gaps that might otherwise be populated by flawed measures. Third, whether or not they are published in peer-reviewed journals, all high-stakes performance measures should be fully transparent and subjected to systematic, external review before they are made public. Methodological errors could then be identified and corrected before, rather than after, patients and other stakeholders have acted upon faulty information, and before the reputations of providers have been unfairly enhanced or impugned. If external review is not conducted through the more traditional pathway of scientific publication, then alternative, centralized peer-review entities could be established, much like central institutional review boards (IRBs) for research trials. Finally, submission of measures for consideration by the National Quality Forum, a multistakeholder private–public partnership, provides an exceptional opportunity for rigorous measure vetting and external endorsement.

Governmental agencies and professional organizations can play significant roles. The Centers for Medicare and Medicaid Services, the Agency for Healthcare Research and Quality, the Centers for Disease Control and Prevention, and the Food and Drug Administration all have expertise in health care quality and outcomes measurement and could collaborate to develop and disseminate appropriate standards. Measures not adhering to such standards would be viewed less favorably and their use would be discouraged. The American Heart Association promulgated standards for statistical models used for public reporting of health outcomes almost a decade ago,15 and recommended standards for public reporting have been published by the Surgical Quality Alliance16 and the American College of Cardiology Foundation.17

Patients and providers deserve valid and transparent performance measures, and hospitals and doctors should be accountable for the care they provide. Flawed measures, however, are not only meaningless but may actually harm patients, providers, and other stakeholders. The measurement enterprise must be held to the same high standards that we appropriately expect of health care providers.

1. Shahian DM, Edwards FH, Jacobs JP, et al. Public reporting of cardiac surgery performance: Part 1—history, rationale, consequences. Ann Thorac Surg 2011; 92:S2–S11.
2. Shahian DM, Edwards FH, Jacobs JP, et al. Public reporting of cardiac surgery performance: Part 2—implementation. Ann Thorac Surg 2011; 92:S12–S23.
3. Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q 1966; 44:166–206.
4. Normand S-LT, Shahian DM. Statistical and clinical aspects of hospital outcomes profiling. Statist Sci 2007; 22:206–226.
5. HANYS’ Report on Report Cards. Understanding publicly reported hospital quality measures. Available at: 2013. Accessed August 1, 2015.
6. Austin JM, Jha AK, Romano PS, et al. National hospital ratings systems share few common scores and may generate confusion instead of clarity. Health Aff (Millwood) 2015; 34:423–430.
7. Leonardi MJ, McGory ML, Ko CY. Publicly available hospital comparison web sites: determination of useful, valid, and appropriate information for comparing surgical quality. Arch Surg 2007; 142:863–868.
8. Berenson RA, Pronovost PJ, Krumholz HM. Achieving the Potential of Health Care Performance Measures. Robert Wood Johnson Foundation and Urban Institute. Available at: 2013. Accessed February 5, 2014.
9. Friedberg MW, Damberg CL. A five-point checklist to help performance reports incentivize improvement and effectively guide patients. Health Aff (Millwood) 2012; 31:612–618.
10. Rothberg MB, Morsi E, Benjamin EM, et al. Choosing the best hospital: the limitations of public quality reporting. Health Aff (Millwood) 2008; 27:1680–1687.
11. Rothberg MB, Benjamin EM, Lindenauer PK. Public reporting of hospital quality: recommendations to benefit patients and hospitals. J Hosp Med 2009; 4:541–545.
12. Rosenbaum L. Scoring no goal—further adventures in transparency. N Engl J Med 2015; 373:1385–1388.
13. Wei S, Pierce O, Marshall A. ProPublica surgeon scorecard. Available at: 2015. Accessed September 23, 2015.
14. Friedberg MW, Pronovost PJ, Shahian DM, et al. A methodological critique of the ProPublica Surgeon Scorecard. Santa Monica, CA: RAND Corporation. Available at: 2015. Accessed September 25, 2015.
15. Krumholz HM, Brindis RG, Brush JE, et al. Standards for statistical models used for public reporting of health outcomes: an American Heart Association Scientific Statement from the Quality of Care and Outcomes Research Interdisciplinary Writing Group: cosponsored by the Council on Epidemiology and Prevention and the Stroke Council. Endorsed by the American College of Cardiology Foundation. Circulation 2006; 113:456–462.
16. Surgical Quality Alliance. Surgery & Public Reporting: Recommendations for Issuing Public Reports on Surgical Care. Available at: 2014. Accessed February 6, 2014.
17. Drozda JP Jr, Hagan EP, Mirro MJ, et al. ACCF 2008 health policy statement on principles for public reporting of physician performance data: a Report of the American College of Cardiology Foundation Writing Committee to develop principles for public reporting of physician performance data. J Am Coll Cardiol 2008; 51:1993–2001.
Copyright © 2016 Wolters Kluwer Health, Inc. All rights reserved.