Competency-based medical education (CBME) focuses on outcomes and emphasizes learner abilities.1 The Accreditation Council for Graduate Medical Education (ACGME) Milestones Project grew naturally from CBME to create “meaningful data on the performance that graduates must achieve before entering unsupervised practice.”2 Following implementation of the Next Accreditation System, residency programs must submit composite milestone data on residents to the ACGME every six months. However, the ACGME has provided little guidance on how these data should be collected or used in residency programs.
Crossley and Jolly3 described four principles of high-quality work-based assessments in medical education. First, rating scales should be aligned with clinician–assessor priorities. Second, global judgments of performance on a given activity should be sought, as these often provide better assessments than objective observation checklists. Third, assessment should focus on competencies central to the activity observed. Fourth, the chosen assessors should be the ones best placed to observe performance.
ten Cate and others introduced the concept of entrustability as a work-based assessment framework, defining the entrustable professional activity (EPA) as a crucial task every trainee must master prior to graduation.4,5 In practice in our internal medicine (IM) residency program, we found that using EPAs in direct work-based assessments met Crossley and Jolly’s first principle3 well but did not meet their other principles. Although our assessors easily adopted levels of entrustment as a rating scale, EPAs can be broad and therefore difficult to assess directly.6–9 Consider, for example, the IM end-of-training EPA “manage care of patients with acute common diseases across multiple care settings.”7 It would be difficult for any single assessor to observe residents managing multiple diseases across multiple settings. What rating should be given if an assessor observes a resident executing some of the competencies that compose this EPA, but not others? Should only attending physicians perform the assessments? Nurses, pharmacists, case managers, and peers could all likely assess resident performance, but is it ideal to ask them to judge management across settings?
Schuwirth and Van der Vleuten10 have suggested that “information of all assessment sources can be used to inform about all competency domains, and all competency domains are informed by various information sources.” To date, residency programs have found it challenging to gather and report competency-based assessment data in meaningful ways. To address these issues, we created a system of entrustment based on discrete work-based skills called observable practice activities (OPAs), which we have mapped to the ACGME and American Board of Internal Medicine (ABIM) reporting milestones for IM.11,12 Interprofessional assessors provide OPA entrustment ratings of resident performance that can be tracked longitudinally. Our aim was to operationalize the concepts inherent in competency-based assessment and entrustment, and to provide meaningful information to learners, programs, and reporting agencies. Here, we report the outcomes of the first three years of these efforts.
This is a descriptive study of the use of an OPA-based assessment system in the University of Cincinnati’s categorical IM residency program. Our program consists of approximately 92 categorical and preliminary residents who rotate in a large academic teaching hospital, affiliated ambulatory practices, and a Veterans Affairs medical center. The University of Cincinnati institutional review board granted an exemption for this work.
Assessment tool development
In 2011, we rewrote our residency program’s entire curriculum using OPAs as the basis of assessment.11,13 OPAs are discrete collections of knowledge, skills, and attitudes that can be observed and entrusted in daily practice. Content OPAs are specific to each rotation, discipline, and assessor type. For example, an intensivist may assess “perform intubation,” an endocrinologist may assess “titrate basal bolus insulin,” a nurse may assess “returns pages in a timely manner,” and a peer may assess “performs handoffs.” Process OPAs are conserved across rotations and experiences; these include “develop prioritized differential diagnoses” for attending physician assessors and “demonstrate respectful behavior to all members of the health care team” for nurse assessors. The number of OPAs is always in flux, as curricula for new rotations are created and existing ones are revised, but there were approximately 400 OPAs spread across more than 75 rotations as of May 2016. List 1 provides sample content and process OPAs for first-year residents in the general IM wards rotation. For the OPAs assessed in each rotation and differences by level of training, if any, see our curriculum.13
OPA performance is assessed using a five-point entrustment scale, where 1 = not entrusted (critical deficiencies), 2 = entrusted with direct supervision, 3 = entrusted with indirect supervision, 4 = entrusted with no supervision, and 5 = entrusted at aspirational level.
All OPAs are mapped directly to the ACGME/ABIM reporting milestones for IM.11,12 On average, each OPA is mapped to 3.31 milestones. Assessors generally use a desktop computer to complete evaluation forms. When an assessor rates performance on an OPA, the entrustment rating is automatically assigned to the mapped milestones, and this information is tracked longitudinally. We chose the five-point entrustment scale described above because we wanted to reduce ambiguity about what each level meant and because these five levels correspond to the five columns of the IM reporting milestones document.12 Every six months these data are linearly transformed from our five-point scale to the nine-point ACGME milestone rating scale for Clinical Competency Committee (CCC) review and reporting.
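The mapping and scale conversion described above can be sketched in code. This is an illustrative sketch only: the OPA names and milestone mappings below are hypothetical examples rather than the program's actual mapping table, and the exact linear transform the program uses is not specified in the text, so the function here simply maps the two scales' endpoints (1 → 1, 5 → 9) linearly.

```python
# Hypothetical mapping table: each OPA fans out to several reporting
# milestones (the program averages ~3.31 milestones per OPA).
OPA_TO_MILESTONES = {
    "titrate basal bolus insulin": ["MK-1", "PC-2"],
    "develop prioritized differential diagnoses": ["MK-1", "PC-1", "PBLI-1"],
}

def record_rating(opa, entrustment):
    """Fan a single OPA entrustment rating (1-5) out to its mapped milestones."""
    if not 1 <= entrustment <= 5:
        raise ValueError("entrustment must be on the 1-5 scale")
    return {milestone: entrustment for milestone in OPA_TO_MILESTONES[opa]}

def to_acgme_scale(rating):
    """Linearly transform a 5-point entrustment rating to the 9-point ACGME
    milestone scale. Assumes a simple endpoint-to-endpoint mapping
    (1 -> 1, 5 -> 9); the program's actual transform may differ."""
    return 2 * rating - 1
```

In this sketch, a single attending rating of 3 ("entrusted with indirect supervision") on "titrate basal bolus insulin" would contribute a 3 to both MK-1 and PC-2, and would report as a 5 on the nine-point scale.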
Attending physicians complete evaluations that consist of 8 to 10 content OPAs and 8 to 10 process OPAs; they are also required to provide narrative explanations of their entrustment decisions as well as a description of strengths and weaknesses noted in resident performance. Attending physicians review their entrustment decisions with residents as formative feedback at the midpoint and end of each rotation. Ideally, assessment data are entered into the electronic database by the final day of a given attending physician’s portion of the rotation. The data must be entered within 14 days to avoid the loss of teaching practice plan dollars.14 Attending physicians receive formal instruction on use of the entrustment rating system.
Peers, chief residents, fellows, nurses, case managers, social workers, and office staff members also provide assessments of residents. Their evaluation forms include three to eight content and/or process OPAs and require a narrative assessment of strengths and weaknesses. These assessors are asked to watch a video on use of the system, but no formal education or incentive program exists for them.
Residents review their own longitudinal numeric and narrative performance data in real time and during semiannual meetings with the program director. Residents analyze the data prior to these meetings, and both the resident and program director use the information to create formative improvement plans. The CCC uses these data for remediation, promotion, and reporting decisions.
Data were downloaded directly from MedHub (Medhub LLC, Dexter, Michigan) for all categorical and preliminary IM residents who were assessed during the entrustment system’s first 36 months (July 2012 through June 2015). Mean entrustment ratings were calculated for each residency month and were stratified by assessor type, resident, and milestone. Linear regression was performed to determine the relationship between month of residency and the mean rating for residents in that month of residency (monthly mean rating). This relationship between residency month and monthly mean rating was summarized using r2 and P values where appropriate. All analyses were completed using SAS version 9.4 (SAS Institute, Cary, North Carolina).
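The regression described above can be sketched as follows. This is a minimal illustration in Python (the authors used SAS 9.4), fitting monthly mean ratings against month of residency by ordinary least squares and summarizing the fit with r²; the monthly means below are invented toy data, not the study's.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: change in mean rating per month
    a = my - b * mx                    # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot           # proportion of variance explained
    return a, b, r2

# Invented monthly mean ratings over a 36-month residency, rising from
# roughly 2.5 to roughly 3.9 (the scale of means reported in the Results):
months = list(range(1, 37))
means = [2.46 + (3.92 - 2.46) * (m - 1) / 35 for m in months]
_, slope, r2 = linear_fit(months, means)
```

Note that, as the Results caution, an r² computed on monthly means describes the aggregate trend only; it does not account for variability across evaluators, evaluatees, rotations, or training levels.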
Global assessment outcomes
From July 2012 through June 2015 (36 months), assessors made 364,728 milestone assessments (mapped from OPA assessments) of 189 categorical and preliminary IM residents. Annualized over the course of a 36-month residency, this equals a mean of 83 different assessment encounters (i.e., 83 OPA assessment forms completed) per resident, including 46 (56%) from attending physicians and 37 (44%) from peers or allied health professionals. These encounters produced an annualized three-year mean of 3,987 mapped milestone assessments per resident, with 3,142 (79%) from attending physicians and 845 (21%) from peers or allied health professionals. (Attending physician assessment forms contain more OPAs than peer/allied health professional forms, as described above.) In addition, residents received an annualized three-year mean of 4,325 words of narrative assessment from all sources. Once the penalty for late faculty member evaluations was enacted, more than 80% were completed within 14 days of the end of the rotation.
Milestone assessment trends
Figure 1 shows the mean entrustment ratings for all milestones for each month of residency. Overall (middle curve), entrustment ratings rose from an average of 2.46 for residents in the first month of residency to 3.92 for residents in the 36th month of residency (r2 = 0.9252, P < .001). The r2 describes the relationship between the mean monthly rating and time without accounting for the variability due to multiple evaluators and evaluatees, differing rotations, and different levels of training. Attending physician entrustment ratings (bottom curve) rose from an average of 2.27 for residents in the first month of residency to 3.81 for residents in the 36th month of residency (r2 = 0.9524, P < .001). Peer/allied health professional entrustment ratings (top curve) rose from an average of 3.16 for residents in the first month of residency to 4.49 for residents in the 36th month of residency (r2 = 0.8411, P < .001).
Figure 2 displays the milestone entrustment data by ACGME core competency. In the first month of residency, medical knowledge (MK) and patient care (PC) milestones had the lowest mean entrustment ratings (MK = 2.23 and PC = 2.26); professionalism (PROF) and interpersonal and communication skills (ICS) had the highest (PROF = 2.76 and ICS = 2.63); and systems-based practice (SBP) and practice-based learning and improvement (PBLI) were in the middle (SBP = 2.59 and PBLI = 2.38). By the 36th month of residency, the mean entrustment ratings for PBLI (3.80), MK (3.81), and PC (3.84) were the lowest; the ratings for PROF (4.08), SBP (4.04), and ICS (4.01) were the highest. For discrete IM milestones, PROF-2 (“accepts responsibility and follows through on tasks”) had the highest mean entrustment rating (4.38), and PC-4 (“skill in performing procedures”) had the lowest (3.22) by the 36th month of residency (data not shown).
As Figures 1 and 2 illustrate, there were spikes in the mean entrustment ratings in the 13th and 25th months of residency; these correspond to the Julys of the second and third years of residency, respectively. In Figure 1, these spikes appear in the overall curve and the peer/allied health assessors curve, but not the faculty assessors curve. In Figure 2, they appear in the curves for the ICS, PROF, and SBP competencies.
Figure 3 shows aggregate entrustment data available during the study period separated by individual residents. Unlike the smooth aggregate line in Figure 1, individual residents’ lines varied greatly over time. There were fewer OPA-based assessments during months 17 to 29, secondary to a preexisting non-OPA-based assessment system used during an ambulatory long block.15
Table 1 shows the breakdown of the 364,728 mapped milestone assessments by milestone. MK-1 (“clinical knowledge”) was assessed the most (14.73%), followed by PC-2 (“develops and achieves comprehensive management plan for each patient”; 9.36%), PC-3 (“manages patients with progressive responsibility and independence”; 8.45%), and PROF-1 (“has professional and respectful interactions with patients, caregivers and members of the interprofessional team”; 8.20%). The milestones assessed the least were PC-4 (“skill in performing procedures”; 0.37%) and SBP-2 (“recognizes system error and advocates for system improvement”; 0.31%).
The most commonly assessed OPAs were process based, with “demonstrate shared decision-making with the patient,” “demonstrate empathy, compassion, and a commitment to relieve pain and suffering,” and “recognize the scope of his/her abilities and ask for supervision and assistance appropriately” collectively representing more than 16% of all assessments (data not shown). Among the OPAs rated at least 20 times for first-month residents during the study period, the OPA with the highest entrustment rating was “assist colleagues in the provision of duties” (mean = 3.23); many tied for lowest at a mean of 2.00 (e.g., “write initial admission orders for pancreatitis,” “manage and write ventilator orders”). Among OPAs rated at least 20 times for 36th-month residents during the study period, the OPA with the highest entrustment rating was “takes leadership role of teaching health care team” (mean = 4.67), and the OPA with the lowest rating was “manage arrhythmias” (mean = 3.00).
Entrustment of milestones appeared to rise progressively over time for the study cohort of residents. Although others have reported similar findings,16 our data show the association between entrustment and time on a large scale for a single residency. A goal of the ACGME Milestones Project was to create a “logical trajectory of professional development.”2 The trajectories seen in our data, along with the high r2 values, suggest that after enough data are collected, reliable predictions might be made about future performance (within the limits of the 36 months of residency). If true, these entrustment trajectories could be helpful in remediation, promotion, and reporting decisions.
In this system’s first 36 months, attending physicians’ OPA entrustment ratings were lower than those of peers/allied health professionals, but the entrustment curves for both assessor types show significant progression and high r2 values over time. The differences in entrustment levels may be a function of the raters themselves or of the questions posed to the raters. For example, Figures 1 and 2 show spikes in some competencies in July when new interns are assessing senior residents for the first time. Also, entrustment ratings for PROF and ICS consistently tracked higher than those for PC and MK. This may be because attending physicians rate more content OPAs than peer/allied health assessors do, and content OPAs are more heavily mapped to PC and MK milestones than to milestones in other competencies. It is also possible that PROF and ICS skills may be more highly developed than PC and MK skills immediately after medical school, thereby accounting for our findings. Peer and allied health professional assessors may have been more trusting than attending physicians or were rating OPAs that were easier to entrust. In addition, the differing ratings could be a reflection of the variable training in the use of our system that peer/allied health assessors received compared with attending physicians. Nevertheless, peer/allied health professional and faculty member ratings rose significantly and consistently over time, suggesting increasing entrustment.
As Figure 3 shows, individual resident entrustment curves demonstrated significant variability over time, although most progressed from lower to higher monthly mean ratings over the course of residency. One of the tenets of CBME is that success is not defined simply by dwell time but, rather, by competency attainment. Our data show that residents do not appear to progress uniformly over time, suggesting that our assessors are not simply providing set entrustment levels based on class position or time of year.
Most residents did not reach a rating of 4 (entrusted without supervision) on most OPAs, IM milestones, or competencies by the end of their residency. This again may be a function of types of assessors or OPAs, or it could be a reflection of the residents in our program. The ACGME has stated that residents do not need to reach independence on the milestones to graduate.12 It is not yet clear what combination of entrustment ratings indicates that competence has been reached, and our CCC must weigh the evidence and make these decisions.
As the data in Table 1 demonstrate, we emphasize assessment of certain milestones and competencies over others. These decisions were not deliberate but, rather, occurred organically as we created and mapped our curriculum to the IM reporting milestones.12 Using this information, we now have the opportunity to cover curricular gaps. For example, milestone SBP-2 (“recognizes system error and advocates for system improvement”) received just 0.31% of all assessments, and we could design experiences and OPAs that augment assessment opportunities for this milestone. We could also use overall OPA entrustment data to guide curriculum development. For example, if “manage arrhythmias” is the lowest-rated content OPA by the end of residency, we could design and test a curriculum that might lead to better entrustment of this skill.
Our study has several limitations. First, this is a single-institution study with only three years of data, and it may not be generalizable to other institutions or residency programs. Smaller programs or programs in specialties in which residents have significant exposure to consistent assessors may not need such a system or could get different results.
Second, although we see an association between entrustment and time, we have not determined the relationship between our intended construct and the thought processes of the assessors.17 It is possible that assessors simply assigned entrustment ratings based on the time of year and that these ratings do not reflect true entrustment of skills. The spikes shown in Figures 1 and 2 in the 13th and 25th months of residency (July of the second and third years, respectively) may be consistent with this phenomenon. On deeper analysis, however, these spikes appear to be related more to peer/allied health professional assessments than attending physician assessments (Figure 1), with the majority of each spike coming from first-year resident assessments of senior residents. Over the next part of the year, first-year resident assessments of senior residents tend to fall from this peak. In addition, ratings for MK and PC milestones, which are primarily assessed by attending physicians, did not show the spikes seen in the PROF, ICS, and SBP milestones (Figure 2). Another concern is that our system is based on direct observation of OPAs, and it is possible that assessors rated OPAs that they did not observe. There also have been examples of attending physicians failing to discriminate between skills and failing to justify their entrustment decisions in their narrative comments. In 2015, we instituted a program to provide feedback to physicians on the quality of the assessments made; however, the effect of such feedback needs to be studied.
Third, we have not yet accounted for sources of assessment bias18–23 likely present in the data, nor have we performed a generalizability theory analysis. Van der Vleuten and colleagues24 have suggested that one strategy to reduce the effect of assessment bias is to collect large amounts of low-stakes data from multiple sources. Although we collected a significant amount of information over the 36-month study period, it is not clear whether that amount of data is enough to minimize any bias present. In addition, a generalizability theory analysis could determine how many ratings of what type are needed to obtain a reliable estimate of performance at each point in time. Such an analysis is outside the scope of this initial study, but must be the next step in this work.
Fourth, the conversion of directly observed skills entrustment to milestone entrustment occurs as a result of our mapping decisions. Some argue that there is no need for the complexity that milestone mapping introduces. However, we have found this process to be essential to measuring performance on disparate OPAs over time. For example, it would be difficult to directly compare performance on the OPAs “manage a ventilator” and “titrate insulin” because the skills are different. However, we can measure and longitudinally track performance on the common milestones to which these OPAs are mapped. Also, although we had a strict and logical rubric for our mapping decisions,11 it is possible that different decisions could lead to different entrustment curves.
Fifth, although we create entrustment curves for each individual resident and use them in remediation, promotion, and reporting decisions, there is no gold standard for what ratings indicate that competence has been attained. Research is needed to connect this type of assessment data directly to knowledge-based testing results, patient care outcomes, and other external validity measures.
Finally, this system represents only one type of assessment data. If our CCC is not careful, it could simply pass the numbers to the ACGME with a false sense of completeness. Van der Vleuten and colleagues suggest that a robust program of assessment should contain data from multiple sources and types—the more data, the better.24–26 Complex skills like caring for patients are often best described using narrative information, and in our program, the CCC uses both the numerical and narrative data to make reporting, promotion, and graduation decisions.
This initial evaluation of our system demonstrates the measurement of entrustment over time on a large scale for an IM residency program. Entrustment of milestones rises over the 36 months of residency, and it appears that it may be possible to predict a single assessment point along the line with fairly good certainty based on the others. Further study is needed to understand the validity and generalizability of these data to provide meaningful feedback to residents, residency programs, and reporting agencies. Mapping OPAs to milestones allows us to determine which competencies we emphasize during training and gives us the opportunity to change our curriculum to cover gaps. Finally, data and processes such as ours may be helpful in the ongoing discussions regarding competency-based versus time-based training.
1. Frank JR, Snell LS, Cate OT, et al. Competency-based medical education: Theory to practice. Med Teach. 2010;32:638–645.
2. Nasca TJ, Philibert I, Brigham T, Flynn TC. The next GME accreditation system—rationale and benefits. N Engl J Med. 2012;366:1051–1056.
3. Crossley J, Jolly B. Making sense of work-based assessment: Ask the right questions, in the right way, about the right things, of the right people. Med Educ. 2012;46:28–37.
4. ten Cate O. Trust, competence, and the supervisor’s role in postgraduate training. BMJ. 2006;333:748–751.
5. ten Cate O, Scheele F. Competency-based postgraduate training: Can we bridge the gap between theory and clinical practice? Acad Med. 2007;82:542–547.
6. Driessen E, Scheele F. What is wrong with assessment in postgraduate training? Lessons from clinical practice and educational research. Med Teach. 2013;35:569–574.
8. Carraccio C, Burke AE. Beyond competencies and milestones: Adding meaning through context. J Grad Med Educ. 2010;2:419–422.
9. Caverzagie KJ, Iobst WF, Aagaard EM, et al. The internal medicine reporting milestones and the next accreditation system. Ann Intern Med. 2013;158:557–559.
10. Schuwirth LW, Van der Vleuten CP. Programmatic assessment: From assessment of learning to assessment for learning. Med Teach. 2011;33:478–485.
11. Warm EJ, Mathis BR, Held JD, et al. Entrustment and mapping of observable practice activities for resident assessment. J Gen Intern Med. 2014;29:1177–1182.
14. Luke RG, Wones RG, Galla JH, Rouan GW, Tsevat J, Dorfmeister JW. Development and implementation of a teaching practice plan in a department of medicine (1995–1998): Relative teaching units (RTU’s). Trans Am Clin Climatol Assoc. 1999;110:214–226.
15. Warm EJ, Schauer D, Revis B, Boex JR. Multisource feedback in the ambulatory setting. J Grad Med Educ. 2010;2:269–277.
16. Beeson MS, Holmboe ES, Korte RC, et al. Initial validity analysis of the emergency medicine milestones. Acad Emerg Med. 2015;22:838–844.
17. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: Theory and application. Am J Med. 2006;119:166.e7–166.e16.
18. Williams RG, Klamen DA, McGaghie WC. Cognitive, social and environmental sources of bias in clinical performance ratings. Teach Learn Med. 2003;15:270–292.
19. Downing SM, Haladyna TM. Validity threats: Overcoming interference with proposed interpretations of assessment data. Med Educ. 2004;38:327–333.
20. McManus IC, Thompson M, Mollon J. Assessment of examiner leniency and stringency (“hawk–dove effect”) in the MRCP(UK) clinical examination (PACES) using multi-facet Rasch modelling. BMC Med Educ. 2006;6:42.
21. Kogan JR, Hess BJ, Conforti LN, Holmboe ES. What drives faculty ratings of residents’ clinical skills? The impact of faculty’s own clinical skills. Acad Med. 2010;85(10 suppl):S25–S28.
22. Kogan JR, Conforti L, Bernabeo E, Iobst W, Holmboe E. Opening the black box of clinical skills assessment via observation: A conceptual model. Med Educ. 2011;45:1048–1060.
23. Kogan JR, Conforti LN, Iobst WF, Holmboe ES. Reconceptualizing variable rater assessments as both an educational and clinical care problem. Acad Med. 2014;89:721–727.
24. van der Vleuten CP, Schuwirth LW, Driessen EW, et al. A model for programmatic assessment fit for purpose. Med Teach. 2012;34:205–214.
25. Dijkstra J, Galbraith R, Hodges BD, et al. Expert validation of fit-for-purpose guidelines for designing programmes of assessment. BMC Med Educ. 2012;12:20.
26. van der Vleuten CP, Schuwirth LW, Driessen EW, Govaerts MJ, Heeneman S. Twelve tips for programmatic assessment. Med Teach. 2015;37:641646.