Haan, Constance K. MD, MS; Edwards, Fred H. MD; Poole, Betty; Godley, Melissa; Genuardi, Frank J. MD, MPH; Zenni, Elisa A. MD
The Accreditation Council for Graduate Medical Education (ACGME) has been working diligently to promulgate the concept that outcomes of medical education can and should be measurable, and that quantifiable improvements can then be applied to the processes of medical education. Furthermore, the ACGME is endeavoring to demonstrate that clinical patient outcomes are linked to educational outcomes. At the University of Florida College of Medicine–Jacksonville, we recognized that integrating competencies and assessment with learning and clinical care would require tailoring appropriately selected measures to the interests, priorities, and needs of individual programs, so that evaluation feedback would be meaningful to both faculty and residents or fellows. With this in mind, we developed a tiered system for identifying and applying appropriate measures of success across our graduate medical education (GME) programs.
ACGME core competencies have been incorporated into medical education curricula, goals and objectives, and evaluations since 2001.1 The core competencies are a key component of the Outcome Project, which is designed to shift the focus of GME program accreditation from structure and process to actual accomplishments, through assessment of program outcomes. Phase 3 of the Outcome Project entails full integration of the competencies and their assessment with learning and clinical care. Now that Phase 3 has begun (July 2006), medical educators are likely wondering what, exactly, they are expected to do to meet the ACGME requirements and to measure their success in doing so. In fact, many experienced educators have lamented that they have no idea how or where to start. So, how are educators to select the right clinical measures to reflect how faculty teach and how trainees learn? And what does excellence look like?
Each specialty and training program must identify what is appropriate and important to measure as a reflection of the quality of medical education and the quality of care in that particular specialty or program. Assessment of the quality of health care delivery goes by several names: quality measures, quality indicators, clinical outcomes, and performance measures, to name a few. Quality indicators may be either process measures (e.g., administration of aspirin and beta-blocker on admission for acute myocardial infarction, or administration of ventilator-associated pneumonia prophylaxis) or outcome measures (e.g., death and complication rates, average length of stay). In some instances, what matters cannot be measured directly, so proxy measures are used instead. For example, improvement in patient education and medication compliance may not be easily measured per se, but unplanned readmissions within 48 hours of discharge can be measured as a proxy, or representative, measure.
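To make the proxy concrete, the short Python sketch below computes an unplanned 48-hour readmission rate from a list of admissions. It is a minimal illustration under stated assumptions, not a validated measure specification: the `Admission` record, its fields, and the rule that every discharge counts in the denominator are hypothetical choices for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Admission:
    # Hypothetical record layout for illustration only.
    patient_id: str
    admitted: datetime
    discharged: datetime
    planned: bool = False  # True for scheduled (planned) admissions


def readmission_rate_48h(admissions: list[Admission]) -> float:
    """Fraction of discharges followed by an unplanned readmission within 48 hours."""
    window = timedelta(hours=48)
    by_patient: dict[str, list[Admission]] = {}
    for adm in admissions:
        by_patient.setdefault(adm.patient_id, []).append(adm)

    discharges = 0
    readmitted = 0
    for stays in by_patient.values():
        stays.sort(key=lambda a: a.admitted)
        # Pair each stay with the patient's next stay (None after the last one).
        for current, nxt in zip(stays, stays[1:] + [None]):
            discharges += 1
            if (
                nxt is not None
                and not nxt.planned
                and nxt.admitted - current.discharged <= window
            ):
                readmitted += 1
    return readmitted / discharges if discharges else 0.0
```

In practice, a hospital’s quality management staff would apply the exact inclusion and exclusion rules of the chosen measure definition; the 48-hour window is taken from the example above.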
However, program directors do not necessarily have to start from scratch in determining standards for measurable educational outcomes. A great deal of work has already been done at the local, specialty society, and national levels in the arena of quality measures and performance improvement. These endeavors form the foundation for national indicators, standards, and benchmarks of clinical outcomes. Until such standards are firmly established across the spectrum of health care, educators in specialties with identified gaps can consider the relevant data already being collected and studied within the system of care delivery. We present herein our methodology for selecting appropriate clinical indicators for measuring the quality of medical education, and a description of our process for incorporating measurable patient-care outcomes to drive and guide program improvement.
The University of Florida College of Medicine–Jacksonville Office of Educational Affairs and Graduate Medical Education Committee (GMEC) developed a tiered strategy for selecting clinical indicators. The goal of this strategy was to identify external, evidence-based measures that would demonstrate full integration of the ACGME competencies and their assessment with learning and clinical care.
The tiered strategy for selecting clinical indicators prioritizes measures for GME programs in the following sequence:
1. Align first and foremost with national benchmarked consensus standards when available.
2. Align with those quality indicators and standards recommended or selected by the national specialty society quality leaders.
3. Align with indicators and standards used by local, institutional, or regional quality initiatives.
4. Absent these standards with which to align, identify top-priority diagnostic and/or therapeutic categories for the specialty and then select appropriate process, outcome, or proxy measures to represent these specialty priority areas. Selection of measures is based on areas of high frequency or volume as well as high impact and cost.
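The prioritization above can be read as a simple ordered search. The Python sketch below is one hypothetical way to express it, assuming each program lists its candidate measures by tier and wants no more than five (the upper end of the three-to-five range program directors were asked to select); the tier labels, the `select_indicators` function, and the fill-from-higher-tiers-first rule are illustrative assumptions rather than a description of our actual process.

```python
from __future__ import annotations

# Tier order follows the prioritization list above (first = most preferred).
TIERS = (
    "national consensus standards",
    "national specialty society standards",
    "local, institutional, or regional initiatives",
    "program priority areas",
)


def select_indicators(
    available: dict[str, list[str]], maximum: int = 5
) -> list[tuple[str, str]]:
    """Pick up to `maximum` measures, exhausting higher-priority tiers first.

    `available` maps a tier label to the candidate measures in that tier.
    Returns (tier, measure) pairs in priority order.
    """
    selected: list[tuple[str, str]] = []
    for tier in TIERS:
        for measure in available.get(tier, []):
            if len(selected) >= maximum:
                return selected
            selected.append((tier, measure))
    return selected
```

A program with no national consensus standards but one specialty society measure would get that measure first, followed by its own priority-area measures.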
To begin, the ACGME Outcome Project was discussed in the GMEC and in other meetings with groups of program directors and with individual program directors. The initial emphasis was on the concept of linking quality education to quality health care delivery. With this in mind, the discussion turned to program directors’ specific questions about which external measures would be most appropriate and applicable to individual programs. In October 2006, the program directors and associate program directors of all GME programs selected three to five clinical indicators and identified data sources for them. In November 2006, data collection began for the indicators selected and the data sources identified to that point. The midyear resident evaluations for academic year 2006–2007 and the educational effectiveness evaluation carried out by each program in the spring of 2007 would, therefore, provide the first test of the data sources, of the mechanism for reporting the data to program directors, and of the application of outcomes in resident and programmatic evaluation.
Taking the first step beyond discussing the Outcome Project, program directors were urged to select three to five initial external measures for program and trainee evaluation. Beginning with a preliminary set of measures allowed faculty to test the measures’ applicability in teaching and learning environments. This initial challenge inspired the Office of Educational Affairs to organize existing measures and data into tiers that would guide the selection of measures. Program directors determined which tier would guide their selection on the basis of how advanced their specialty was in establishing evidence-based quality indicators. Determining the relevant tier is easier for some specialties than for others. For example, cardiovascular disease programs have well-established measures for the management of acute myocardial infarction and congestive heart failure from which to choose, whereas orthopedic surgery programs must select either measures that apply more broadly to health care in general (infection rates or patient satisfaction) or measures that represent local quality improvement endeavors.
All 23 programs on our campus were able to select appropriate measures on the basis of the tiered model. Examples of identified quality indicators from each tier are as follows:
1. National standards: National Quality Forum consensus standards for asthma care, diabetes care; Joint Commission core measures for care of acute myocardial infarction, congestive heart failure, community-acquired pneumonia
2. National specialty society standards: Surgical Care Improvement Project measures, American Gastroenterological Association Center for Quality in Practice recommendations
3. Local, institutional, or regional initiatives: Surgical Critical Care Medicine protocols and complication prophylaxis; pain assessment in emergency medicine
4. Program priority areas: vascular interventional radiology complications and accuracy of the side (left or right) documented in reports
Program directors were able to successfully apply the tiered process to clinical indicator selection, as displayed in Figures 1–4.
Next, the program directors were instructed to identify sources from which they could collect data to track clinical performance on the selected measures. The program directors required significant assistance with identifying data sources, as many, if not most, presumed that they would have to create their own manual data-collection processes and that each program would have to marshal the personnel and time to accomplish such a task. Program directors and faculty were often overwhelmed when considering quality measures because they did not know how, or by whom, the large volumes of available data were collected in hospitals and clinics. Further, they often had trouble seeing how data collection could be built into their daily work, or that, in many cases, it already was. An important part of beginning the data-collection process was orienting the program directors to the extent of data that already exist in the health care delivery system and connecting them to the appropriate data sources, especially appropriately constructed electronic data queries. In November 2006, faculty began clinical quality data collection on the basis of the indicators and data sources the program directors had identified.
Because neither medical education nor health care delivery is done in isolation, clinical outcomes used in resident evaluation should assess a resident’s performance as a reflection of his or her participation in the health care delivery team. The data collected for the selected clinical quality indicators provide additional inputs for resident assessment at both midyear and end-of-year evaluations. Here, program directors have struggled with data reports and analyses that do not identify the individual resident provider. In a separate initiative, our hospitals have moved from reporting quality measures at the department or clinical service level to reporting them at the individual faculty and staff level. However, without the ability to query an electronic medical record, performance data at the resident-specific level are currently not available. Tracking resident performance is further complicated by the lack of clarity in assigning responsibility for work and decisions within a team of residents. For example, if an intern writes an order for aspirin for a patient with acute myocardial infarction, who gets the credit and feedback: the intern who writes the order, or the senior resident who tells the intern to write it? To address this, we have begun to educate and guide program directors on how to use aggregate service data at the team level to help residents understand their individual performance and its improvement over time.
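One way to turn service-level data into resident-level feedback, as described above, is to pool the team’s results over the months a resident spent on the service. The sketch below assumes hypothetical monthly counts of eligible patients and of patients for whom a process measure (for example, aspirin on admission) was met; the record layout and function name are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MonthlyServiceData:
    # Hypothetical monthly summary for one service and one process measure.
    month: str     # e.g., "2006-11"
    met: int       # eligible patients for whom the measure was met
    eligible: int  # patients eligible for the measure that month


def team_rate_during_rotation(
    data: list[MonthlyServiceData], rotation_months: list[str]
) -> float | None:
    """Pooled service-level compliance for the months a resident was on service.

    Returns None when no eligible patients fall within those months.
    """
    months = set(rotation_months)
    met = sum(d.met for d in data if d.month in months)
    eligible = sum(d.eligible for d in data if d.month in months)
    return met / eligible if eligible else None
```

Pooling counts, rather than averaging monthly percentages, keeps a low-volume month from carrying as much weight as a busy one. The result describes the team’s performance while the resident was present, not the resident’s individual contribution.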
Programmatic improvements, such as curriculum modifications, are driven by clinical outcomes that fall below benchmark across residents. In this case, data for the selected clinical quality indicators provide additional inputs to the program’s annual educational effectiveness evaluation, as well as to the program assessments in the ACGME-required internal review at the midpoint of the accreditation cycle and the continuous quality improvement monitoring that follows it. Our institution’s process for tracking progress on issues identified at internal reviews and/or site visits has been expanded to include discussion of the program’s selected clinical measures. This discussion gives the program director feedback on the measures selected, the data collected, and the application of both in resident and program evaluation, as well as an opportunity to ask questions and get advice and assistance in integrating the clinical indicators into the educational process.
The Tiered Strategy for Indicator Selection
Selecting indicators from the first tier was preferred, but program directors could move through the four tiers, considering the availability of measures in each, to ensure that they selected the most widely agreed-on and appropriate indicators of success for their particular program or specialty. We describe each tier in detail below.
National consensus standards
Ideally, a set of clinical indicators for an educational program would be aligned with the national consensus standards already selected for a clinical specialty, major diagnostic group, or area of care. To start, a program may select a subset of indicators on the basis of national standards while program leaders identify data sources and data-collection processes and test and refine reporting methods to find those that work best for their program and institution.
Working with indicators that are consistent with known consensus standards serves several purposes. First, it puts the program in concert with other programs nationally, using the same definitions, criteria, and comparable benchmarks. Second, it positions the institution and its faculty to meet the data and reporting requirements of pay for performance. Third, it exposes trainees to the quality indicators, data feedback, and performance framework with which they will be working for much, if not all, of their professional lives. Therefore, part of our duty in training them is to give them the data analysis and quality improvement tools they will need for practice-based learning and systems-based practice.
The National Quality Forum (NQF) is a quasi-governmental organization that rigorously evaluates performance measures; its endorsement represents national acceptance and is regarded as the gold standard for performance measures. The NQF has already published consensus standards for one specialty (cardiac surgery) and one major diagnosis (adult diabetes), and cancer care consensus standards are under development. In addition, the NQF has endorsed quality consensus standards by location of care delivery: hospital care,2 ambulatory care,3 nursing home care, and home health care. Child health care measures, among others, are also under consideration.4
The AQA Alliance (formerly the Ambulatory Care Quality Alliance) is another national leadership entity involved in establishing performance standards. It evaluates each proposed set of performance measures, has the broadest array of stakeholders, and has the strong support of the Centers for Medicare and Medicaid Services (CMS) and the Joint Commission. If the AQA Alliance approves a set of performance measures, insurers have agreed to use that measure set in any quality initiative they develop, which ensures that physicians are not bombarded with different rating schemes and criteria from different insurers. The AQA Alliance has also formed a liaison with the Hospital Quality Alliance, which focuses entirely on quality measurement at the hospital level. Together, the two alliances form a group that meets regularly with the secretary of health and human services.
CMS is also now contributing to the identification of quality measures through the Physician Quality Reporting Initiative, its initial foray into identifying quality indicators that will be held up as national standards. This voluntary reporting initiative has been described as the precursor to “pay for performance.”5
National specialty society-selected measures
A good deal of work is underway at the national specialty society level to identify or develop standards or standardized indicators for quality of care, building on the evidence in the literature. Ideally, it is with the input and representation of the specialty societies that the NQF is able to endorse sound consensus standards that make good clinical sense and meet the needs and demands of other stakeholders, such as patients, payers, and accreditation bodies. So, when the NQF has not yet addressed indicators for the specialty, diagnostic area, or area of care pertaining to a given GME program, that program should look next to the national quality leadership within its own specialty society.
The American Medical Association Physician Consortium for Performance Improvement is charged with developing performance measures for the medical specialties. In contrast to the AQA Alliance, it consists entirely of physicians and American Medical Association staff. The consortium focuses on the science of performance measure development and guides each specialty society through the process of identifying fair and meaningful measures of quality.
The Surgical Quality Alliance (SQA) is the quality arm of the American College of Surgeons (ACS). Its purpose is to shepherd surgical specialty societies through the process of developing methods of quality measurement and applying those methods to improve quality. At present, all but two surgical specialties are represented on the SQA, and this organization also consists entirely of physicians and ACS staff.
Examples of specialty society leadership in quality measurement include, but are not limited to, the ACS and the American Gastroenterological Association.6,7 Other leadership bodies in the clinical specialty arena have also developed and tested quality indicators. A premier example is the Department of Veterans Affairs (VA) National Surgical Quality Improvement Program (NSQIP). The ACS is now collaborating with VA surgical leaders to build on the work done through NSQIP and to apply these quality indicators and standards beyond the VA.8
Local, institutional, or regional initiatives
In the absence of established national consensus standards and well-developed specialty society work on quality indicators and measurement standards, program and institutional leaders would do well to explore the quality- and performance-improvement endeavors in place at the local, institutional, or regional level.
In 2004, the University of Florida College of Medicine and Shands Health Care Corporation facilities established a formal agreement known as the Academic Quality Support Agreement. This alliance tracked and reported 69 indicators reflecting a broad spectrum of quality measures. These indicators reflect quality of care across inpatient and outpatient/ambulatory settings and across specialties; a number are interdisciplinary or shared, and a number apply to all physicians. The endeavor provided a platform to drive protocol development, standardization of care processes, and system efficiencies, and it also provided feedback on mortality and major morbidities for selected diagnoses and major procedures.
It is useful to investigate whether one’s institution already participates in a local or regional reporting effort that benchmarks performance against similar or nearby institutions. This is an appropriate place to start when national or specialty society standards do not exist. If program leaders are not aware of the institutional quality measures and audits underway, they should explore this with the institution’s quality management and compliance staff.
Or select what matters …
Should a program director be unable to identify clinical quality indicators through any of the aforementioned avenues, then it falls to the program director, with the assistance of fellow faculty and the designated institutional official, to select quality indicators for the program and specialty that make clinical “sense.”
The first step in selecting quality measures to represent an educational program is identifying the major diagnostic areas of the specialty: the top three to five high-frequency, high-risk, or high-volume features of the specialty. These features represent some of the major “must haves” of the training program, with respect to expectations for resident or fellow competence, accomplishment, and knowledge during training. After these top priorities have been identified, the faculty and program director can select appropriate process and outcome measures for them, or proxy measures where the desired measures cannot be obtained directly.
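For programs in the fourth tier, the ranking of candidate areas can be made explicit. The sketch below assumes the program has estimated annual volume, a risk rating, and cost for each candidate diagnostic area; the equal weighting, the scale-by-maximum normalization, and the `top_priority_areas` function are hypothetical choices, and faculty judgment would still decide the final list.

```python
from __future__ import annotations

# Hypothetical criteria; a program could add or reweight these.
CRITERIA = ("volume", "risk", "cost")


def top_priority_areas(areas: dict[str, dict[str, float]], n: int = 3) -> list[str]:
    """Rank candidate areas by an equally weighted, max-normalized score."""
    if not areas:
        return []
    # Scale each criterion to 0..1 by its largest value (1.0 avoids dividing by zero).
    maxima = {c: max(attrs[c] for attrs in areas.values()) or 1.0 for c in CRITERIA}
    scores = {
        name: sum(attrs[c] / maxima[c] for c in CRITERIA)
        for name, attrs in areas.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Normalizing by the maximum keeps a criterion measured in large units (such as annual volume) from swamping one measured on a small rating scale.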
Identify Data Sources and Data Collection Processes
In identifying appropriate data sources, program directors should assess the national or regional resources that are already available and, perhaps, already in use. If a validated, specialty-specific national or regional clinical database or registry exists, participating in it is paramount. Doing so provides a vehicle for validated data collection from which appropriately risk-adjusted clinical outcomes can be derived, and a dataset large enough for rigorous study and research. Another advantage of a large database or registry is the substantially greater potential for complete and validated data. Access to these data can support studies with sufficient statistical power to draw strong conclusions about the impact of care processes on outcomes of interest.
Many institutions and/or departments have internal quality audits and performance improvement endeavors that are already tracking and reporting selected quality measures. Most institutions and their quality management departments have extensive data collection and auditing processes already in place. It is important to realize that a program may already be collecting data for clinical quality assessment and review that can readily be applied to the educational mission as well.
Local or institutional data collection can be limited by the relatively small number of cases in the dataset, which makes it difficult to draw statistically significant conclusions about variation from the data fed back to programs. Where data are not available from an electronic database or health record, data collection is labor intensive, and data are often available only through an audit of a sample of patients’ records. This may simply be the best method currently available under the circumstances, but it must be recognized that such sampling provides only incomplete information on the performance of all caregivers involved in the measure, and that the resulting performance statistics are easily affected by how the sample is selected.
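The small-numbers problem can be shown with a confidence interval for a compliance rate. The sketch below uses the Wilson score interval, a standard interval for a binomial proportion (our choice for illustration; the article does not prescribe a statistical method), to compare the same 90% rate observed in 10 audited charts and in 100.

```python
from __future__ import annotations

import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Same observed rate (90%), different audit sample sizes.
for hits, charts in [(9, 10), (90, 100)]:
    low, high = wilson_interval(hits, charts)
    print(f"{hits}/{charts}: {low:.2f} to {high:.2f}")
```

The printed intervals are roughly 0.60 to 0.98 for 9 of 10 charts and 0.83 to 0.94 for 90 of 100: the same observed rate is far less informative when drawn from a small audit sample.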
Where clinical volume is inadequate to demonstrate satisfactory processes or outcomes, simulation may provide data for quality measures, either as an alternative to or in combination with clinical data. Simulation is emerging as a training tool and is increasingly being studied and validated for its effectiveness in training and in testing skills, judgment, and the teamwork aspects of quality performance.
Challenges of Implementation
Whose performance is really being measured?
Program directors commonly express concern about not being able to attribute a selected process or outcome quality measure directly to a particular resident or fellow. However, virtually all health care delivery is a team activity and, to varying degrees, relies on multiple stakeholders. This concept is reinforced by the study of one’s own microsystem of health care delivery9 and by the study and application of systems-based practice. In our experience, whether clinical outcomes and performance are discussed at the medical staff or faculty level or at the GME level, clinicians regularly discount or express dissatisfaction with data that are not reported at the individual physician level. Using aggregate data to study and improve the performance of the team as a whole is a paradigm that has yet to be embraced and taught.
Medical education does not occur in isolation, and most process and outcome measures represent the group milieu in which teaching and learning occur. GME, like clinical care delivery, involves teams and groups of various sizes and compositions to effect the delivery of each specialty’s care and to facilitate interaction and collaboration with other caregivers serving as consultants and on multidisciplinary care teams. It follows that quality measures applied to the educational process should also reflect the individual’s role as part of a team and microsystem, both of which are part of the clinical specialty learning process. Recognizing one’s role and responsibility in that team and microsystem also helps the physician value participation and leadership in the team, as well as contribution to and influence on the microsystem to drive improvement.
How do we effectively apply general or service data?
Even though practicing clinicians may have become familiar with quality measures and performance data feedback for their own practices in recent years, few have yet become used to tying those measures and data to the GME process. More than new measures and data, this will take a new way of thinking about the data we already have. It will require that we recognize and reinforce the connection between clinical care and the educational curriculum and evaluation process. This is especially true for broadly stated measures, such as patient satisfaction. Patient satisfaction reports by clinical service or hospital unit usually summarize patients’ responses to questions about physicians in general or as a group, rather than about each physician separately. Similarly, some key clinical indicators, such as the pain management indicator selected by medical oncology, are multifactorial, influenced by the activities of numerous types of providers: physicians, nurses, pharmacists, and therapists, to name a few. Though not resident specific, these types of indicators are still very useful to the GME evaluation process. Such indicators introduce residents to thinking about their individual responsibility for and contribution to systems-based practice and its measurement. At evaluation, the program director and resident or fellow have the opportunity to discuss the trainee’s development as a physician leader in improving the performance of care delivery.
Data Feedback and Utilization—Measuring What Matters
Once quality indicators are selected, data sources are identified, and data collection is underway, program directors must address the application of data feedback. In other words, how will the data be reported and used as part of educational evaluation in GME? In our experience, collected data have a twofold application to educational effectiveness evaluation.
First, we incorporate data feedback into the resident’s or fellow’s regular evaluation, which takes place at least every six months. The report on clinical outcomes gives the physician-in-training evidence, in the form of patient outcomes and satisfaction, of his or her performance in the six general competencies. Thus, performance evaluation extends beyond assessment of the trainee’s knowledge, work ethic, communication, and contributions to discussions and conferences. Providing clinical outcomes feedback to trainees begins to instill in them a sense of personal ownership of their role in those outcomes, and it also provides information on which practice-based learning and system performance improvement can and should be based. At each evaluation, besides assessing performance during a specific period, the program director and resident or fellow should be able to follow data trends over time to track improvement throughout training.
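Following trends across evaluation periods can be as simple as fitting a straight line to a trainee’s or team’s rate at each evaluation. The sketch below computes an ordinary least-squares slope in rate units per evaluation period; the function name and the assumption of equally spaced periods (for example, six-month evaluations) are ours.

```python
from __future__ import annotations


def trend_slope(rates: list[float]) -> float:
    """Least-squares slope of rate versus period index 0, 1, 2, ... (equal spacing assumed)."""
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two evaluation periods")
    mean_x = (n - 1) / 2          # mean of 0..n-1
    mean_y = sum(rates) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(rates))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx
```

For rates of 0.70, 0.80, and 0.90 at three successive evaluations, the slope is 0.10 per period. With small local samples, a single period-to-period change can reflect chance as much as improvement.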
Second, clinical outcomes provide a context in which the strength of a program’s curriculum can be assessed. It is critical to identify gaps in care. Measures that consistently miss their targets should signal weaknesses in the curricular plan, or in the venue and means by which a key portion of the curriculum (as reflected by the corresponding clinical measure) is presented. Additional or different educational processes can then be applied, for instance, additional didactic lectures on that topic of care, or simulation scenarios to enhance the educational experience and foster better integration of knowledge and judgment. Program-wide clinical indicator monitoring also identifies individuals who are struggling on multiple or all measures, and it can direct individualized counseling, remediation, and developmental assessment. The service- or team-level clinical outcomes measured while a resident is on a particular rotation provide the basis for individual resident feedback, even when the resident’s specific contribution to a measure cannot be quantified. Figure 5 displays both uses in programmatic evaluation, illustrating a need for curricular change identified by one measure that is low across multiple trainees, versus individual counseling and remediation when one trainee scores lower than others on multiple measures.
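The two patterns illustrated in Figure 5 can be separated mechanically. The sketch below assumes hypothetical per-trainee rates and program targets; the thresholds (a measure is flagged for curricular review when at least half of trainees fall below target, and a trainee is flagged for counseling when below target on two or more of the remaining measures) are illustrative assumptions, not values from our programs.

```python
from __future__ import annotations


def flag_gaps(
    scores: dict[str, dict[str, float]],
    targets: dict[str, float],
    measure_fraction: float = 0.5,
    trainee_min_misses: int = 2,
) -> tuple[list[str], list[str]]:
    """Split below-target results into curricular gaps and individual concerns.

    scores:  {trainee: {measure: observed rate}}
    targets: {measure: target rate}
    """
    curricular: list[str] = []
    for measure, target in targets.items():
        rated = [t for t in scores if measure in scores[t]]
        below = [t for t in rated if scores[t][measure] < target]
        # A measure missed by a large share of trainees points to the curriculum.
        if rated and len(below) / len(rated) >= measure_fraction:
            curricular.append(measure)

    individual: list[str] = []
    for trainee, rates in scores.items():
        # Count misses only on measures that are not program-wide gaps.
        misses = sum(
            1
            for measure, target in targets.items()
            if measure not in curricular and measure in rates and rates[measure] < target
        )
        if misses >= trainee_min_misses:
            individual.append(trainee)
    return curricular, individual
```

Measures already flagged as curricular gaps are excluded from the individual count so that a program-wide weakness is not mistaken for a single trainee’s difficulty.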
There is much work yet to do in refining the selection of optimal quality indicators and benchmarked targets. It is, therefore, important for physicians, both clinician leaders and education leaders, to ensure that they, or their specialty society representatives, have a “seat at the table” when CMS and/or the NQF determine their specialty’s consensus standards. It is imperative that physicians lead the process of selecting measures and definitions that make good clinical sense to practitioners and that measure what matters. It is far better to be a leader or participant in the process than a passive victim. Academic clinicians are now acting on behalf not only of themselves and their patients but also of the future providers they are training! This is the ultimate opportunity for clinicians to influence quality of care and quality improvement through health care advocacy and health policy.
The ongoing challenge for leaders and educators is to identify how a resident’s actions and judgment can be realistically linked with a patient outcome. We propose that this effort is an important aspect of orienting trainees to using data to monitor and improve care processes and outcomes throughout their careers. Furthermore, it is an important first step toward preparing medical trainees to “own their data,” as familiarity and facility with data will shape their lifelong practice-based learning, systems-based practice, data-driven clinical decision making, and maintenance of certification and, likely, eventually their reimbursement in the form of pay for performance. This will foster the integration of quality of care and quality improvement with resident practice-based learning and with faculty scholarship in clinical teaching. We must train not just for medical knowledge, but for medical practice.