Health care quality measurement is an important accountability factor that is used for assessing provider quality and making provider payments.1 The National Quality Forum (NQF) has endorsed >700 quality measures, with many more in the pipeline.2 For many of these measures, collecting and reporting data is a complex, time-consuming, manual process.2 The adoption of electronic health records (EHRs) has long been viewed as the key to eliminating this major barrier to health care measurement.3 As the development and sophistication of the EHR grow,4–8 it has increasingly become a focus of measurement and measure developers.9–15
To date, retooling efforts have primarily focused on process measures.16,17 However, outcome measures, which quantify the end results of health care, are frequently of most interest to consumers and payers.18,19 The usefulness of any outcome measure depends on the quality of the risk adjustment model accompanying it. Risk adjustment is the process of taking patient risk factors (eg, comorbidities, severity of illness, physiological status) into account when comparing providers’ outcomes.20 Unfortunately, the clinical data necessary to develop robust outcome measures (eg, forced expiratory volume or carotid artery diameter occlusion) are often located in numerous places in the medical record, making data collection a labor-intensive, primarily manual process. Obtaining risk factors directly from EHRs may reduce burden and costs in the data abstraction and risk adjustment processes.
This study examines the ability of EHRs to capture the detailed clinical information used to assess quality of care in a clinical registry and compares hospital quality assessments based on EHRs with corresponding assessments based on clinical registry data for all nonfederal hospitals in the state. The measures of interest are the New York State Department of Health (NYS DOH)’s risk-adjusted in-hospital/30-day mortality (death in the hospital at any time in the index admission, or death within 30 days of surgery after discharge from the hospital) and hospital readmission within 30 days following discharge after a coronary artery bypass graft (CABG) procedure, both of which are used in New York’s Cardiac Surgery Reporting System (CSRS) annual reports.
Cardiac surgery outcomes were first evaluated by the Veterans Administration in the late 1980s,21 and shortly afterward, the NYS DOH began to publicly report hospital risk-adjusted mortality outcomes.22 New York’s CSRS was created in 1989 after the DOH noted that there were large interhospital variations in short-term CABG surgery mortality rates and that existing data offered no way to determine the extent to which these differences were related to the preoperative risk of patients versus the quality of cardiac care. Consequently, a clinical registry containing all major risk factors for short-term adverse outcomes was created to risk-adjust hospitals’ (and later surgeons’) outcomes. There is mandatory reporting of these risk factors as well as outcomes (in-hospital mortality, complications) by all nonfederal hospitals in the state. Out-of-hospital deaths are captured by matching registry data to vital statistics data. The data are also audited to ensure completeness and accuracy. Patient risk factors contained in the report form are listed in Appendix 1 (Supplemental Digital Content 1, http://links.lww.com/MLR/B745), and their definitions are contained in Appendix 2 (Supplemental Digital Content 2, http://links.lww.com/MLR/B746). A more detailed description of the system and its history is available elsewhere.23
In the 1990s, similar reports evaluating hospital and physician-level outcomes for valve surgery, pediatric cardiac surgery, and percutaneous coronary interventions were released, and all these reports continue to be published. There is evidence that mortality rates have declined more in New York than nationally, and several studies have concluded that this is related to public reporting.24–30 Moreover, numerous studies have used the New York registries to evaluate different ways of performing cardiac procedures and providing cardiac care.23
The goal of our analysis was to determine the feasibility of obtaining key data elements directly from the EHR to develop an EHR-based risk model for the CABG mortality and readmission measures. This assessment was conducted using guiding principles and criteria identified by the NQF’s eMeasure Feasibility Assessment Project.31 The federal government relies on NQF-defined measures as the best, evidence-based approaches to improving care, and the NQF framework/scorecard is a requirement for all electronic clinical quality measures that are submitted to NQF for endorsement consideration.
In keeping with NQF recommendations, a collaborative approach involving EHR vendors, individuals from the measure development community and health care providers was undertaken. The first phase of our feasibility assessment involved the engagement of vendors and quality measurement software developers. During this phase, we assessed the perceived feasibility of electronic data extraction for the required data elements. During the second phase of our feasibility assessment, focus groups were conducted with staff from a sample of New York State (NYS) hospitals to obtain their perspective on the feasibility of electronic extraction of data elements for the CABG measures and to determine the level of agreement between the hospital focus group participants and the vendors.
EHR vendors are responsible for implementing the specifications of eMeasures. An eMeasure is a health quality measure that is encoded in Health Quality Measures Format. Health Quality Measures Format is a standard for representing a health quality measure as an electronic document that enables consistency for quality measures by standardizing the structure, metadata, definitions, and logic of the measures.32 Leaders representing 2 leading EHR vendors (EPIC and McKesson) and a quality measurement software development company (Medisolv) participated in five 90-minute consultation meetings with project staff. Each expert provided extensive input with regard to the viability of electronically extracting the specific data elements required to risk-adjust the NYS CSRS CABG mortality and readmission rates. Discussions were structured around a framework designed to gather input on each of the data elements. The framework included vendor ratings for 3 high-level criteria33: data completeness (data availability, data extractability: “How likely is it that structured data is available and extractable?”), data correctness (data accuracy, data reproducibility: “How likely is it that structured data is accurate and reliable?”), and data normalization (critical for comparability across organizations: “How likely is it that the data to be encoded is in standard terminology?”) (Appendix 3, Supplemental Digital Content 3, http://links.lww.com/MLR/B747).
This framework was then incorporated into a data element feasibility scorecard. A scorecard is a useful tool to obtain uniform information about the feasibility of each of the data elements in the measure. Participants were asked to consider each data element in the measure and to think about the feasibility of obtaining each data element directly through the EHR with no manual abstraction. They were asked to assign a separate score for each of the 3 data quality criteria using a 5-point Likert-type scale, which was soon collapsed to 3 categories because there was not sufficient granularity to support 5 categories. The final categories were: Easy, Somewhat Difficult, and Very Difficult to extract directly from the EHR. A variable was determined to be Easy to abstract if it was scored as “likely” or “very likely” for all the feasibility criteria, Somewhat Difficult if it was scored as “somewhat likely” or “unlikely” for at least one feasibility criterion and Very Difficult if it was scored as “very unlikely” for at least one feasibility criterion. More points were assigned as the likelihood increased that the data element would currently be available in a structured format in an EHR that is extractable, accurate, reliable, and can be mapped to a standardized vocabulary.
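The collapsing rule described above can be illustrated with a minimal sketch. The function name, the dictionary keys, and the example ratings are hypothetical and do not come from the study's scorecard. The text does not say which category applies when an element has both a "very unlikely" rating and a "somewhat likely" or "unlikely" rating, so the sketch assumes that Very Difficult takes precedence.

```python
from typing import Dict

# Labels on the 5-point Likert-type scale named in the text.
EASY_RATINGS = {"likely", "very likely"}
MID_RATINGS = {"somewhat likely", "unlikely"}
HARD_RATINGS = {"very unlikely"}

# Hypothetical keys for the 3 data quality criteria.
CRITERIA = ("completeness", "correctness", "normalization")


def classify_element(ratings: Dict[str, str]) -> str:
    """Collapse the 3 criterion ratings for one data element into a category."""
    values = [ratings[c].strip().lower() for c in CRITERIA]
    allowed = EASY_RATINGS | MID_RATINGS | HARD_RATINGS
    for v in values:
        if v not in allowed:
            raise ValueError(f"unknown rating: {v!r}")
    # Assumption: "very unlikely" on any criterion outranks milder ratings.
    if any(v in HARD_RATINGS for v in values):
        return "Very Difficult"
    if any(v in MID_RATINGS for v in values):
        return "Somewhat Difficult"
    # Every criterion was rated "likely" or "very likely".
    return "Easy"


# Illustrative ratings only; not the vendors' actual scores.
example = {
    "completeness": "very likely",
    "correctness": "likely",
    "normalization": "somewhat likely",
}
print(classify_element(example))
```

An element is Easy only when all 3 criteria are rated favorably, so a single weak criterion is enough to move it to a harder category.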
When the vendors agreed that a data element was “easy” to extract (eg, weight or sex), it was not discussed further. For the other data elements, there were lengthy discussions with regard to the ability to obtain them from other electronic sources and the ability to convert them to a structured format. Eventually, these discussions resulted in a consensus on whether each of the other data elements was somewhat difficult or very difficult to obtain. As an example of a data element that is somewhat difficult (can be available and extractable from an EHR or modified EHR with some work), ejection fraction would require a LOINC (Logical Observation Identifiers Names and Codes) code to identify the test used to obtain it. If the code is not currently captured in a hospital’s EHR, the system could be modified to create an interface to capture it from other sources. This discrete field would then need to be mapped to the standardized terminology to encode the data.
Hospital Focus Groups
The next step consisted of presenting the vendors’ evaluations to the hospital focus group participants. Recruitment letters were sent, and calls and follow-up calls were made, to the Director or Chief of Cardiothoracic Surgery at each of the 40 NYS cardiac surgery hospitals. A convenience sample of 13 hospitals was obtained (these were the hospitals that agreed to participate). The primary contacts at each of the participating hospitals were asked to identify the staff member(s) most closely involved in data abstraction and/or submission to the NYS Cardiac Care Registry to participate in the focus groups. This snowball sampling approach resulted in 20 focus group participants across the 13 participating hospitals. Four 90-minute focus groups were conducted with the data coordinators, managers, and/or abstractors. Participants helped determine the viability of electronically extracting the specific data elements required to risk-adjust the NYS CSRS CABG mortality and readmission rates (Appendix 4, Supplemental Digital Content 4, http://links.lww.com/MLR/B748, contains the focus group discussion guide).
To compare the impact of risk-adjusting with the subset of risk factors that the vendors and hospital focus groups identified as being amenable for use in an EHR format with the entire set of risk factors available in the NY registry, the project utilized the methodology for assessing hospital risk-adjusted mortality and readmission rates for CABG surgery used by the NYS DOH described below.34 There were 40 hospitals in the 2010–2012 period (all hospitals with CABG surgery in the state), and the number of patients per year ranged from 8142 to 9421. Annual mortality rates ranged from 1.2% to 1.6% during this period, and readmission rates ranged from 14.4% to 15.0%.
Obtaining Risk-adjusted Outcome Rates
A description of the process used for obtaining risk-adjusted mortality and readmission rates is as follows:
- Identify all patient-related risk factors for the outcomes contained in the DOH CSRS. One of the patient-related risk factors, shock, is not included because patients with shock before CABG surgery are not reported publicly due to their exceedingly high risk of mortality. Also, identify the subsets of these data elements that are “easy” to obtain and the subset that are “somewhat difficult” to obtain from EHRs.
- Identify the presence or absence of the outcomes for each patient undergoing isolated CABG surgery (CABG surgery without any other major open-heart surgery in the same admission). In-hospital mortality outcomes are available in the registry and out-of-hospital deaths are obtained by matching with vital statistics data. Readmission outcomes are obtained by matching with the state’s hospital administrative database.
- Develop statistical models that predict, for each patient, the probability that the patient will experience each outcome, using backward logistic regression to obtain the subset of candidate risk factors that are independently significant. Cross-validation was used to identify a set of significant risk factors for half of the dataset, and these risk factors were then tested on the remaining half of the data. Risk factors that remained significant were then used for the entire dataset. For the registry model [New York Public Reporting Risk Model (NYPRRM)], the candidate risk factors were all risk factors available in the registry. For the EHR models, the candidate risk factors were the subset of data elements that were either “easy” to obtain from EHRs or “easy or somewhat difficult” to obtain using EHRs.
- The expected outcome rate (EOR) for each hospital based on the statewide experience is then obtained by summing the predicted probabilities of the outcome from the statistical models in the preceding step for all patients in that hospital, and then dividing by the number of patients undergoing CABG surgery in the hospital. Each hospital’s EOR is then contrasted with its observed outcome rate, which is the number of patients in the hospital who experienced the outcome divided by the total number of patients undergoing CABG surgery in the hospital. Multiplying the hospital’s observed/expected ratio by the statewide outcome rate yields the hospital’s risk-adjusted rate.
- Next, a 95% confidence interval is calculated for each hospital’s risk-adjusted rate to identify hospitals with risk-adjusted rates that are significantly higher and significantly lower than the statewide rate, referred to as high outliers and low outliers, respectively.
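The arithmetic in the last 2 steps can be sketched as follows. This is a minimal illustration rather than the DOH implementation: the function names and the toy numbers are hypothetical, and because the text does not specify how the 95% confidence interval is computed, the interval bounds are taken as given inputs.

```python
from typing import Sequence


def risk_adjusted_rate(
    outcomes: Sequence[int],      # 1 if the patient experienced the outcome, else 0
    predicted: Sequence[float],   # model-predicted probability for each patient
    statewide_rate: float,        # statewide outcome rate for the same period
) -> float:
    """Return (observed rate / expected rate) * statewide rate for one hospital."""
    n = len(outcomes)
    if n == 0 or n != len(predicted):
        raise ValueError("outcomes and predicted must be non-empty and equal in length")
    observed_rate = sum(outcomes) / n    # observed outcome rate
    expected_rate = sum(predicted) / n   # expected outcome rate (EOR)
    if expected_rate == 0:
        raise ValueError("expected rate is zero; the O/E ratio is undefined")
    return observed_rate / expected_rate * statewide_rate


def outlier_status(ci_low: float, ci_high: float, statewide_rate: float) -> str:
    """Classify a hospital from the 95% CI of its risk-adjusted rate.

    The CI bounds are inputs; how they are derived is not shown here.
    """
    if ci_low > statewide_rate:
        return "high outlier"
    if ci_high < statewide_rate:
        return "low outlier"
    return "not an outlier"


# Toy hospital with 4 patients; the values are illustrative only.
rate = risk_adjusted_rate([1, 0, 0, 0], [0.1, 0.1, 0.1, 0.1], statewide_rate=0.02)
print(round(rate, 4))
print(outlier_status(0.025, 0.09, 0.02))
```

A hospital whose observed rate equals its expected rate has an observed/expected ratio of 1, so its risk-adjusted rate equals the statewide rate.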
The relative ability of EHRs and New York’s clinical registry to obtain hospital risk-adjusted mortality and mortality outliers was then assessed by comparing outcomes obtained from registry models (with all registry risk factors as candidates) and models based only on registry risk factors that can be captured by EHRs. Mortality models were developed for each of 3 different years (2010, 2011, and 2012), and readmission models were developed for the 2 years with New York registry reports (2011, 2012).
For each of the models, comparative information was collected on global model fit (using the Bayesian Information Criterion and P-value for comparison between the full model and alternative model), discrimination (c-statistic and Brier score with their respective 95% confidence intervals), calibration (Hosmer-Lemeshow statistic), and reclassification performance [Net Reclassification Index (NRI) and net reclassification improvement] using the original DOH model as the gold standard.35 In addition to the measures of fit, the final models were evaluated at the hospital level to compare individual hospitals’ risk-adjusted rates and outlier status with those obtained from the NYPRRM.
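Two of these statistics have simple textbook definitions that can be illustrated directly. The sketch below is not the study's code: the function names and toy data are hypothetical, and it omits the confidence intervals the study reports. The c-statistic is computed as the share of event/non-event patient pairs in which the patient with the event has the higher predicted probability (ties count one half), and the Brier score as the mean squared difference between predicted probability and observed outcome.

```python
from typing import Sequence


def c_statistic(outcomes: Sequence[int], predicted: Sequence[float]) -> float:
    """Concordance over all event/non-event pairs; ties count as 0.5."""
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(outcomes: Sequence[int], predicted: Sequence[float]) -> float:
    """Mean squared error of predicted probabilities; lower is better."""
    if len(outcomes) == 0 or len(outcomes) != len(predicted):
        raise ValueError("outcomes and predicted must be non-empty and equal in length")
    return sum((p - y) ** 2 for y, p in zip(outcomes, predicted)) / len(outcomes)


# Toy data for 4 patients; the values are illustrative only.
y = [1, 0, 1, 0]
p = [0.8, 0.2, 0.4, 0.4]
print(c_statistic(y, p))
print(round(brier_score(y, p), 4))
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; a rank-based formula would be preferable for datasets of the size described here.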
Appendix 5 (Supplemental Digital Content 5, http://links.lww.com/MLR/B749) presents a feasibility matrix obtained after discussions with the vendors with a rating of how likely each candidate risk factor in the NY registry is to meet each of the 3 criteria required for capture of that data element in an EHR format. Appendix 5 (Supplemental Digital Content 5, http://links.lww.com/MLR/B749) contains ratings of the ease of directly capturing and extracting data elements for use in EHRs (easy, somewhat difficult, very difficult) based on the rating of elements in the feasibility matrix in Appendix 6 (Supplemental Digital Content 6, http://links.lww.com/MLR/B750). Of the 28 available data elements, 6 were then determined to be “easy” to obtain from EHRs, 10 were “somewhat difficult” to obtain and 12 were determined to be “very difficult” to obtain from EHRs. The “easy” data elements were used as candidate risk factors in the first EHR model (EHR1), and the “easy” plus the “somewhat difficult” data elements were candidate risk factors in the second (EHR2) model to be compared with the NYPRRM.
The focus group participants echoed many of the concerns and limitations voiced by the vendors related to the feasibility of electronic extraction of the data elements. They elaborated on the numerous challenges associated with the extraction of clinical data from the EHR, including the unstructured nature of many free text notes, comment boxes, and strings of text that could not be codified with a standard terminology. They also noted errors that could be introduced by automatic data capture such as mapping the wrong data fields. Overall, focus group participants agreed with the feasibility scores provided by the vendors and shared similar concerns around the viability of EHR extraction for many of the data elements.
Comparisons of the NYPRRM With the EHR Models
The model risk factors for the NYPRRM and the corresponding EHR models are presented in Tables 1 and 2. The EHR “Easy to Obtain” mortality model (EHR1) was developed using only age, body surface area, and renal dysfunction for 2010 and 2011, and these 4 risk factors plus sex in 2012. The EHR “Easy/Somewhat Difficult to Obtain” mortality model (EHR2) uses age, body mass index, renal dysfunction, and left ventricular ejection fraction in all 3 years, and 2 more risk factors (sex and previous valve operations) for the year 2012. The EHR1 and EHR2 models for readmission include age, sex, body surface area, and renal dysfunction for both years. The 2011 EHR2 model also includes ejection fraction, previous valve surgery and diabetes, and the 2012 EHR2 model also includes ejection fraction.
Tables 3 and 4 provide comparison statistics for the models. Both mortality models identified the same set of high and low outliers for all 3 years, but of the total of 23 outliers identified in all the readmission models in 2011 and 2012, only 12 were in common between EHR and corresponding NYPRRM (Table 5). For all mortality and readmission model comparisons, the c-statistics were higher for the NYPRRM, and all but 2 comparisons were significant, indicating better discrimination. The Hosmer-Lemeshow statistic demonstrated adequate fit for all the NYPRRM and most of the EHR models, and the Pearson correlation coefficients of the predicted probabilities for the NYPRRM and EHR1 models were all very high. The NRI values demonstrated that the NYPRRM had significantly better discrimination than the EHR models in all but one of the 8 comparisons. The Bayesian Information Criterion (BIC) was always lower for the NYPRRM, indicating that they are the preferred models. The Brier scores were significantly lower for the NYPRRM in all but one comparison, meaning that they have a better fit between observed and expected values.
Our study revealed that while EHR data could likely be used to capture some of the clinical data necessary for risk adjustment, for many risk factors the required data are often not available in structured formats with specific and precise definitions within the EHR. Therefore, a chart abstraction of many risk factors is currently necessary.
When aggregating the data at the hospital level, we found that both EHR models successfully identified the mortality outlier hospitals that were identified by the NYPRRM, but that of the 23 outliers identified by either readmission model in each of 2 years, only 12 outliers were in common. Also, the c-statistic and the NRI were always better and usually significantly better in the NYPRRM than in the comparison EHR models, indicating better discrimination at the patient level. Calibration (Hosmer-Lemeshow statistic) was better in the NYPRRM, but it was usually acceptable in all models. Moreover, other measures (Bayesian Information Criterion and Brier scores) indicated superiority of the NYPRRM.
It is very important to note that this study assessed the feasibility of using EHRs in lieu of a clinical registry to evaluate provider performance. We used a single clinical registry, and the results may have been different if other registries had been used. Nevertheless, it would appear that because of the detailed clinical definitions in most or all registries, the scarcity of detail in EHRs would compromise their ability to capture numerous risk factors. However, many other organizations (eg, Centers for Medicare and Medicaid Services, US News and World Report, Healthgrades) use administrative data that rely on ICD-10-CM diagnosis codes to risk-adjust outcomes, and these codes do not contain detailed clinical definitions. Consequently, there is a strong likelihood that EHRs would suffice as substitutes for risk-adjusted outcomes based on administrative data.36 However, it is debatable whether report cards based on administrative data provide assessments that are as accurate as their counterparts based on clinical registries or clinically enhanced administrative databases, as evidenced by some earlier studies conducted in New York.37,38 Nevertheless, EHR data have the potential to add other important information to clinical and administrative risk models, including laboratory data and pharmacy claims data.39,40 Other important advantages of EHR data are that they would save considerable resources required for data collection and would enable reports to be issued in a much more timely manner.
The ability to distinguish performance at the surgeon level was not evaluated as part of this study. Given the use of 1-year time windows for estimating the risk models, we were not able to test how well the risk models we developed performed at the physician level (physician-level assessments use a 3-year time window).
The comparisons of the models make the implicit assumption that the risk factors obtained from the registry and the corresponding risk factors that could be obtained from an EHR would be identical, with little measurement error. This is an untested assumption which would need to be verified. For example, the clinical registry data are subject to audits and it is not clear whether EHR data could be audited in a similar manner.
This study demonstrated that an EHR was unable to capture most of the clinically defined risk factors in the NYS CABG surgery risk model. Although the simplified model has the advantage of being more cost-effective to develop, it would have to be used in a data collection that involves chart abstraction for many risk factors. Even though the EHR is currently limited in its ability to capture clinical data, many outcomes report cards rely on administrative data, and it is likely that EHRs could capture those ICD-based risk factors. This should be the topic of a future study. Another future study of interest would be to use administrative data for clinical risk factors that cannot be captured by EHRs and to add this to EHR data for comparison with clinical registries in assessing risk-adjusted outcomes.
1. Baker DW, Chassin MR. Holding providers accountable for health outcomes. Ann Intern Med. 2017;167:418–424.
4. Jha AK, Ferris TG, Donelan K, et al. How common are electronic health records in the United States? A summary of the evidence. Health Aff. 2006;25:w496–w507.
5. Jha AK, DesRoches CM, Campbell EG, et al. Use of electronic health records in US Hospitals. N Engl J Med. 2009;360:1628–1638.
6. Jones SS, Adams JL, Schneider EC, et al. Electronic health record adoption and quality improvement in US Hospitals. Am J Manag Care. 2010;16:SP64–SP71.
7. Linder JA, Kaleba EO, Kmetik KS. Using electronic health records to measure physician performance for acute conditions in primary care. Med Care. 2009;47:208–216.
8. Kharrazi H, Gonzalez CP, Lowe TR, et al. Forecasting the maturation of electronic medical record functions among US hospitals: retrospective analysis and predictive model. J Med Internet Res. 2018;20:e10458.
9. Chan KS, Fowles JB, Weiner JP. Electronic health records and the reliability and validity of quality measures: a review of the literature. Med Care Res Rev. 2010;67:503–527.
10. Pipersburgh J. The push to increase the use of EHR technology by hospitals and physicians in the United States through the HITECH Act and the Medicare incentive program. J Health Care Finance. 2011;38:54–79.
11. Robinson EF, Cooley C, Schleyer AM, et al. Using an electronic medical record tool to improve pneumococcal screening and vaccination rates for older patients admitted with community-acquired pneumonia. Jt Comm J Qual Patient Saf. 2011;37:418–424.
12. Radecki RP, Sittig DF. Application of electronic health records to the Joint Commission’s 2011 National Patient Safety Goals. JAMA. 2011;306:92–93.
13. Persell SD, Wright JM, Thompson JA, et al. Assessing the validity of national quality measures for coronary artery disease using an electronic health record. Arch Intern Med. 2006;166:2272–2277.
14. O’Toole MF, Kmetik KS, Bossley H, et al. Electronic health record systems: the vehicle for implementing performance measures. Am Heart Hosp J. 2005;3:88–93.
15. Hatef E, Lasser EC, Kharrazi HHK, et al. A population health measurement framework: evidence-based metrics for assessing community-level population health in the global budget context. Popul Health Manag. 2018;21:261–270.
16. Benin AL, Vitkauskas G, Thornquist E, et al. Validity of using an electronic medical record for assessing quality of care in an outpatient setting. Med Care. 2005;43:691–698.
17. Goulet JL, Erdos J, Kancir S, et al. Measuring performance directly using the Veterans Health Administration electronic medical record. Med Care. 2006;45:73–79.
18. Clancy CM, Eisenberg JM. Outcomes research: measuring the end results of health care. Science. 1998;282:245–246.
19. Krumholz HM, Normand SLT, Spertus JA, et al. Measuring performance for treating heart attacks and heart failure: the case for outcomes measurement. Health Aff. 2007;26:75–85.
20. Iezzoni LI. Risk Adjustment for Measuring Health Care Outcomes. Vol 3. Chicago, IL: Academy Health Press; 2003.
21. Grover FL, Hammermeister KE, Burchfiel C. Initial report of the veterans administration preoperative risk assessment study for cardiac surgery. Ann Thorac Surg. 1990;50:12–26.
22. Hannan EL, Kilburn H Jr, O’Donnell JF, et al. Adult open heart surgery in New York State. An analysis of risk factors and hospital mortality rates. JAMA. 1990;264:2768–2774.
23. Hannan EL, Cozzens K, King SB III, et al. The New York State cardiac registries: history, contributions, limitations and lessons for future efforts to assess and publicly report health care outcomes. J Am Coll Cardiol. 2012;59:2309–2316.
24. Hazelhurst B, Sittig DF, Stevens VJ, et al. Natural language processing in the electronic medical record. Am J Prev Med. 2005;29:434–439.
25. Hannan EL, Samadashvili Z, Wechsler A, et al. The relationship between perioperative temperature and adverse outcomes following off-pump coronary artery bypass graft surgery. J Thorac Cardiovasc Surg. 2010;139:1568–1575.
26. Hannan EL, Samadashvili Z, Lahey SJ, et al. Predictors of post-operative hematocrit and association of hematocrit with adverse outcomes for coronary artery bypass graft surgery patients with cardiopulmonary bypass. J Cardiac Surg. 2010;26:636–646.
27. Hannan EL, Sarrazin MSV, Doran DR, et al. Provider profiling and quality improvement efforts in CABG surgery: the effects on short-term mortality among medicare beneficiaries. Med Care. 2003;41:1164–1172.
28. Hannan EL, Kumar D, Racz M, et al. New York State’s cardiac surgery reporting system: four years later. Ann Thorac Surg. 1994;58:1852–1857.
29. Hannan EL, Kilburn H, Racz M, et al. Improving the outcomes of coronary artery bypass surgery in New York State. JAMA. 1994;271:761–766.
30. Chassin MR, Hannan EL, DeBuono BA. Benefits and hazards of reporting medical outcomes publicly. N Engl J Med. 1996;334:394–398.
36. Kharrazi H, Chi W, Chang HY, et al. Comparing population-based risk-stratification model performance using demographic, diagnosis and medication data extracted from outpatient electronic health records versus administrative claims. Med Care. 2017;55:789–796.
37. Hannan EL, Kilburn H Jr, Lindsey ML, et al. Clinical versus administrative data bases for CABG surgery: does it matter? Med Care. 1992;30:892–907.
38. Hannan EL, Samadashvili Z, Cozzens K, et al. Appending limited clinical data to an administrative database for acute myocardial infarction patients: the impact on the assessment of hospital quality. Med Care. 2016;54:538–545.
39. Lemke KW, Gudzune KA, Kharrazi H, et al. Assessing markers from ambulatory laboratory tests for predicting high-risk patients. Am J Manag Care. 2018;24:e190–e195.
40. Chang HY, Richards TM, Shermock KM, et al. Evaluating the impact of prescription fill rates on risk stratification model performance. Med Care. 2017;55:1052–1060.
outcome measurement; electronic health record (EHR); risk adjustment; electronic clinical quality measures (eCQM)
Copyright © 2019 Wolters Kluwer Health, Inc. All rights reserved.