Driving Care Quality: Aligning Trainee Assessment and Supervision Through Practical Application of Entrustable Professional Activities, Competencies, and Milestones : Academic Medicine

Carraccio, Carol MD, MA; Englander, Robert MD, MPH; Holmboe, Eric S. MD; Kogan, Jennifer R. MD

Academic Medicine 91(2):p 199-203, February 2016. | DOI: 10.1097/ACM.0000000000000985


The purpose of medical education is to prepare a physician workforce that can provide high-quality care—defined as safe, effective, timely, efficient, equitable, and patient centered—without a need for ongoing supervision.1,2 Fifteen years after the shift to a graduate medical education (GME) system based on the Accreditation Council for Graduate Medical Education and the American Board of Medical Specialties’ core competencies, the United States still ranks last of 11 countries in overall health outcomes.3 Although the 2013 National Healthcare Quality Report4 demonstrates some improvement in the quality of the nation’s health outcomes, the pace of progress remains slow. As noted in the recent Institute of Medicine’s report2 on GME financing and governance, the U.S. health care system’s level of performance reflects, to some extent, the effectiveness of our training programs. For example, work by Asch et al5 demonstrates that the quality of clinical care provided in a training institution correlates with the quality of care delivered by those trainees in other practice settings up to 15 years after graduation.

Competency-based medical education (CBME), whose foundation is built on the competencies required by physicians to address the needs of patient populations being served, has been implemented across GME to address the quality gaps outlined above.6 Implementation of CBME, however, exposed the pervasive lack of meaningful measures to assess trainee performance, and assessment continues to be a major rate-limiting step in the advancement of CBME.6

Kogan et al7 have recently suggested that the challenges in learner assessments be reconceptualized as “both an educational and clinical care problem.” In so doing, they proposed the Accountable Assessment for Quality Care and Supervision (AAQCS) equation: Trainee performance × Appropriate supervision = Safe, effective patient-centered care.

AAQCS emphasizes that safe, effective, patient-centered care only results from the product of appropriate supervision (required level of intervention to ensure care quality) matched to the trainee’s level of performance (assessment of a trainee’s level of competence in a particular context). The major contribution of AAQCS is that it emphasizes that patient outcomes are the dependent variable of the assessment equation. That is, the fundamental purpose of assessing a trainee’s performance is lost when the impact of that performance on patients is not considered. Furthermore, the fundamental purpose of supervision is lost when its impact on patient care quality is not considered. Therefore, an important contribution of the AAQCS equation is that it demonstrates the interdependence of trainees and supervisors on care outcomes. The supervisory role requires active engagement in simultaneously identifying gaps in a trainee’s care and stepping in to fill those gaps so that patients receive high-quality care. When supervisors successfully perform this role, trainees get meaningful feedback for professional development that then enables the appropriate developmental changes in level of supervision.

The purpose of this article is to demonstrate the practical application of the AAQCS equation in the clinical learning environment through a unifying framework that integrates entrustable professional activities (EPAs), competencies, and milestones to inform faculty assessment of trainee performance and guide subsequent supervision decisions. By definition, EPAs are units of work that focus on outcomes of care, in contrast to competencies and milestones, which focus on trainee abilities.8 EPAs require the integration of competencies within and across domains. Making the decision to entrust a learner with a professional activity is verifying that the learner is capable of safely and effectively carrying out the professional activity without supervision. The unit of measure of the EPA, the outcome of the professional activity, aligns well with the dependent variable of care quality in the AAQCS equation: safe, effective, patient-centered care. Before we can demonstrate the practical application of the AAQCS equation using a unifying framework of EPAs, competencies, and milestones, we must first examine the challenges facing each of the independent variables—assessment of trainee performance and appropriate supervision—and the impact of their interdependence in affecting patient outcomes.

Assessment of Trainee Performance

A foundational strategy to assess trainees in CBME is workplace assessment.9,10 Defined as observing and assessing trainees in the authentic clinical environment, workplace assessment targets the “does” level of Miller’s pyramid, which builds an assessment perspective of learners beginning with a base of “knows” and moving through the levels of “knows how” and “shows how” to “does” at the pinnacle.11 Ideally, experts engage in direct observation of learners to assess their competence in the context of actual care delivery. However, researchers have found that even when several experts observe the same trainee in the same encounter, there is variability in their judgments about learner performance.9,12 There are many explanations for this variability. Assessors use variable standards by which they assess the trainee, such as comparing trainees with themselves (self as standard) or comparing trainees with what they would expect of others at a similar level of training (normative standard).13 Typical assessment tools ask raters to use numerical scales to assess learner performance, and error may result from the conversion of natural, narrative-based thought processes to numerical values.14 Other causes of rater variability include limitations in cognitive processes used by raters,15 cognitive errors,16 use of less literal descriptions and more inferences based on rater expertise,17 raters’ own clinical skills,13 and the context in which the judgment is made including social context.17,18

Appropriate Supervision

Although the purpose of workplace assessment is to assess trainee competence, another critical function is to inform the level of supervision a trainee requires.7 That is, workplace assessment should inform faculty of the gap between what a trainee is capable of doing and what is required for safe, effective patient-centered care. This concept is highlighted in the AAQCS equation by the second independent variable, appropriate supervision.

The appropriate supervision variable of the AAQCS equation highlights two points that deserve clarification: how to define “appropriate” supervision, and how to address the dual purpose of supervision—ensuring quality patient care and learner development on the trajectory toward unsupervised practice. Kennedy et al19 defined “clinical oversight” as “patient care activities performed by supervisors to ensure quality of care.” Based on this definition, the quality of care that the patient receives in a given encounter is dependent on the combined clinical competence of the trainee and the supervisor. The supervisor must be able to assess the edges of trainee competence and make decisions about what interventions are required to ensure seamless quality of care. Therefore, engaging in the appropriate level of supervision is highly dependent on faculty assessment skills. Given the challenges with reliability of workplace assessments, there is likely significant variability in matching the level of supervision to level of trainee performance on the basis of these assessments. Ultimately, this can adversely affect care quality in the short run with undersupervision leading to a negative care outcome in the given encounter and in the long run with oversupervision leading to trainees who are not ready to engage in unsupervised practice upon graduation.

When defining supervision, Kennedy20 has also suggested that “designing clinical supervision to maximize trainee learning will be the most powerful way to impact the quality of patient care in the long-term.” The quality of care that patients receive in future encounters with this trainee will not only be affected by the modeling of care done by the supervisor in a given encounter but the supervisor’s ability to support the professional development of the learner beyond competence to capability (the ability to demonstrate competence in new contexts).21 Vygotsky’s22 zone of proximal development suggests that the best learning occurs when trainees are challenged, with support, to work beyond the level at which they are comfortable performing (i.e., at the upper edge of their competence), which ideally calls for a supervisory approach that is matched to the developmental level of the trainee.23 This might include, for example, a directive approach for novices, a collaborative approach for advanced beginners, and a nondirective approach for trainees who are competent.23 Supervision needs to be close enough that learners are provided with informative feedback, balanced with enough autonomy to challenge their ability to advance along their trajectory from novice toward expertise.24

Unwanted variability in assessment of trainees risks negatively impacting both care quality and learner development. If faculty are not able to identify the gap between what a trainee can do and safe, effective, patient-centered care, they will not be able to engage in the specific supervision behaviors that fill the gaps in care while also promoting learner development. Through the remainder of this article we build a case for using a unifying framework that integrates EPAs, competencies, and milestones for assessment that potentially could significantly reduce unwanted variability.

Toward Safe, Effective, Patient-Centered Care: Developing Faculty to Assess and Supervise

Lessons from the literature

The literature suggests four important mechanisms to improve faculty assessment and supervision: faculty development, global assessments, narrative scripts, and effective supervision skills.

Faculty development.

Faculty development is critical for mitigating unwanted variability in assessment of trainee performance. Group methods and techniques that help faculty develop shared mental models and understanding of clinical competencies can help to eliminate some of the variability seen in performance assessments.25 In essence, these techniques are designed to provide a robust script of what behaviors to look for in assessing learner competence and how those behaviors align with predefined levels of performance. These techniques can lead to more accurate assessments and thus judgments about trainee performance and level of clinical intervention needed by the patient, as well as provide substrate for formative feedback to learners.

Use of global assessments.

Literature suggests that global assessments of overall judgment in the hands of experts are more reliable than checklists.26 Thus, replacing global rating forms with checklists is not a strategy for reducing unwanted variability. We are familiar with the experience of observing a learner who can complete all items on a checklist but cannot integrate the required tasks in a way that enables effective care delivery. An important lesson learned is that objectivity and reliability are not synonymous; subjective assessments in the hands of experts and under conditions of appropriate sampling are more reliable than checklists.26

Use of narrative scripts.

While some variability in rater assessments is unwarranted and due to error that is desirable to minimize, some variability contributes meaningful information to comprehensive learner feedback by providing different lenses through which raters see the same learner.9 Gingerich et al14 have drawn from literature on impression formation to explain how raters build unique narratives or scripts, based on their experiences with a variety of learners over time, to inform judgments about current learner performance. Much as physicians do in clinical reasoning, which relies on unique and continuously updated illness scripts and pattern recognition, raters draw on scripts stored in memory to inform impressions of new learners.27,28 These scripts exist as recollections of behaviors in the form of rich narratives. The uniqueness of an individual assessor’s experiences, and thus her or his scripts, creates variability that is not due to error but, rather, to perspective. Of note, the script of an isolated assessor may be quite different from those of the majority based either on perspective or an outdated script. If the former, the perspective can provide useful formative feedback but should not be the basis of a summative decision. If the latter, the feedback is not relevant. Either way, adequate sampling remains an essential requirement for ensuring reliability when subjective judgment is used in assessment.29

Effective supervision skills.

Despite the clear need, supervision of trainees is often inadequate. The literature illustrates the need for faculty development to help supervisors navigate their dual roles as clinicians ensuring care quality and educators ensuring learner development.20 Although there are a variety of supervision styles, there are fundamental principles underpinning essential, effective supervisory skills that need to be packaged for those charged with learner assessment.30 The principles of effective supervision must be learned and practiced, and supervisors require feedback on their performance as supervisors to support continuous improvement in the role.

Building on the past: Applying the AAQCS equation in the workplace through EPAs, competencies, and milestones

A unifying framework integrating EPAs, core competencies, and specialty milestones will address some of the challenges of trainee assessment and appropriate supervision. EPAs are the essential professional activities that describe a specialty.31,32 Holistic judgments regarding learner readiness to be entrusted with performing an EPA without supervision require a wide-angle lens for viewing that learner, ideally through the eyes of an expert witness observing the care that the learner delivers in the workplace. Work by Ginsburg et al33 suggests that some of the difficulty faculty have assessing competencies results from the fact that they cannot practically “observe and experience” an abstract individual competency in the authentic clinical setting. EPAs may prove more helpful in focusing faculty on what they are able to observe in the context of clinical oversight, thus enhancing their ability to provide meaningful assessments. Mapping EPAs to the essential competencies that inform each EPA sets the stage for integrating the behavioral descriptions of the milestones at each performance level, thus enabling the creation of a shared mental model of what a novice or advanced beginner, for example, looks like when engaged in the professional activity (Supplemental Digital Appendix 1, https://links.lww.com/ACADMED/A308). For example, Supplemental Digital Appendix 1 lists the competencies (left-hand column) and their progressive milestones34 (subsequent columns read from most novice performance on the left to most expert performance on the right) for the EPA “manage patients with acute, common diagnoses in ambulatory, emergency, or inpatient setting.” To minimize the influence of a label on the assessor’s choice of performance level, the columns are just given a level of development. 
Looking at the matrix, if one reads down the first column of milestones, one can integrate and synthesize the salient behaviors (see “Behaviors of an early learner”) as illustrated just after the bottom row of the table. Immediately following these behaviors is a vignette of what a learner would look like engaged in those behaviors as he or she carried out the given EPA in the workplace. The same exercise can be done for each subsequent column, so that reading down the right-most column would allow synthesis and integration of the behaviors of an expert performing this EPA, which can then be used to develop a clinical vignette of the expert learner. Thus, these vignettes are built on the essential competencies and their respective milestones that are required to perform the EPA, and provide a common mental model of the learner at each stage of development. Creation of this shared mental model, similar to performance appraisal training methods for faculty development,35–37 may be helpful in addressing the unwanted variability in judgment that occurs when different raters, even experts, observe the same learner in the same clinical encounter.12,25 We recognize that most learners will not match the behaviors for all milestones in a given column. For example, a learner may exhibit behaviors of the first milestone (novice) on one competency and the second milestone (advanced beginner) on another. This in no way diminishes the value of having a shared mental model for each level of performance. In fact, it provides rich substrate, not available from simple global ratings, for decisions about required levels of supervision and feedback that supports learner development to ensure care quality in future encounters.
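As a rough illustration of the matrix described above, the sketch below represents an EPA as rows of competencies and columns of developmental levels, then reads down a column and builds a mixed-level learner profile. It is a minimal sketch under stated assumptions: the competency names, the behavior strings, and the function names are hypothetical placeholders, not language from the milestones or from Supplemental Digital Appendix 1.

```python
# Hypothetical placeholder matrix for one EPA. Rows are competencies and
# columns are developmental levels (index 0 = earliest level). The strings
# are stand-ins, not actual milestone language.
EPA_MATRIX = {
    "competency_a": ["a: level 1 behavior", "a: level 2 behavior", "a: level 3 behavior"],
    "competency_b": ["b: level 1 behavior", "b: level 2 behavior", "b: level 3 behavior"],
    "competency_c": ["c: level 1 behavior", "c: level 2 behavior", "c: level 3 behavior"],
}


def behaviors_at_level(matrix, level):
    """Read down one column: the behaviors expected at a single level."""
    return [milestones[level] for milestones in matrix.values()]


def learner_profile(matrix, observed_levels):
    """Pair each competency with the behavior at the learner's observed level."""
    return {name: matrix[name][level] for name, level in observed_levels.items()}


def lagging_competencies(observed_levels):
    """Competencies observed below the learner's highest observed level."""
    top = max(observed_levels.values())
    return sorted(name for name, level in observed_levels.items() if level < top)


# A learner who sits at the second level on two competencies and the first
# level on the third, as in the mixed-column case described in the text.
observed = {"competency_a": 1, "competency_b": 1, "competency_c": 0}
profile = learner_profile(EPA_MATRIX, observed)
focus = lagging_competencies(observed)
```

Here `focus` names the competency on which this hypothetical learner lags, which is the kind of detail a supervisor might use to target oversight and feedback. How observed levels would translate into a level of supervision is left open, as it is in the text.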

The use of vignettes in learner assessment may actually add to assessment reliability in three ways. It may assist in aligning the format of assessment tools with the natural tendency to recall narrative descriptions when rating learner performance,17 reducing the cognitive load that is required to convert from the narrative descriptions to numerical values,38 and eliminating the errors that may result from translating descriptions to numerical values.14 Work by Regehr et al39,40 has demonstrated that faculty were better able to distinguish between levels of performance when matching trainee performance to standardized narrative vignettes that integrated competencies than when assessing learners on the basis of individual competencies. Predetermined standards for a given performance level were embedded within each vignette in the series, just as the competencies and their level-specific milestones are embedded in our vignettes.40

There are two potential benefits of linking EPAs to competencies and their milestones to improve care quality. First, there is a direct relationship between assessment of a trainee’s ability to perform a professional activity and the requisite level of supervision needed to ensure that the activity is safely and effectively carried out.32 Better assessment decisions of the trainee can better inform appropriate supervision to ensure care quality. Judiciously mapping EPAs to the competencies and milestones that are critical to entrustment decisions provides a zoom lens that allows a more granular view of learner behaviors that must be integrated to perform the EPA. Contrasting a trainee’s current behaviors with the desired behaviors represented within the developmental milestone narratives can more specifically inform supervision practices.

The second benefit of linking EPAs to competencies and their milestones is that it provides a learning road map that serves as the substrate for feedback, which can improve patient care quality in future care encounters. It is not enough to know that some learners need to be more closely supervised than their peers or that they cannot be entrusted when their peers can—supervisors must know why. The supervisor must be able to focus on the competencies that are causing the struggle to provide specific, behaviorally anchored feedback contrasting the learner’s current behaviors with the desired behaviors represented within the developmental milestone narratives. This enables the supervisor to support the professional development of the learner. The milestones can provide a vocabulary for supervisors challenged by breaking down global impressions into specific constructive actionable feedback. Milestones can provide learners and supervisors with a shared understanding of expected learner development. This is important, as Crossley41 recently highlighted that “credibility [in feedback delivery] is disrupted if the provider and recipient have no common language or map for discussing performance and progression.” Although the entrustment decision is a binary one (yes–no), describing levels of supervision leading up to entrustment has been proposed as a valuable framework for workplace assessment of trainees.32


The AAQCS equation brings care quality into the judgments about trainee performance. Scaffolding the independent variables of trainee performance and appropriate supervision with a unifying framework integrating EPAs, competencies, and milestones facilitates practical application of the equation in the clinical learning environment with the goal of improving the alignment between trainee assessment and appropriate supervision. The lack of a gold standard against which to assess both trainees and supervisor judgments adds to the complexity of the relationships of the variables in the equation. However, this unifying framework gives us a starting point from which to begin to address the complexities of trainee performance assessment and appropriate supervision as well as their interdependence in impacting care quality. Levels of supervision leading up to entrustment have been proposed, and one of the next challenges will be moving from implicit and circumstantial decisions about needed level of supervision to a more explicit and deliberate model based on EPAs, competencies, and milestones.

Although much work remains to be done, aligning the level of trainee supervision with the level of performance on the basis of EPAs, competencies, and milestones would guide assessors in the degree of intervention needed to fill trainee performance gaps in order to deliver safe, effective, patient-centered care. Our goal is to create an infrastructure that aligns level of supervision with a given milestone level. Importantly, only a competent clinical microsystem that supports the interdependent relationship between supervisor and supervisee can ultimately support the dependent variable in the AAQCS equation: safe, effective, patient-centered care.42


1. Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academies Press; 2001
2. Institute of Medicine. Graduate Medical Education That Meets the Nation’s Health Needs. Washington, DC: National Academies Press; 2014
3. Davis K, Stremikis K, Schoen C, Squires D. Mirror, Mirror on the Wall: How the U.S. Health Care System Compares Internationally. 2014 update. Washington, DC: Commonwealth Fund; 2014
4. Agency for Healthcare Research and Quality. 2012 National Healthcare Quality Report. Rockville, Md: Agency for Healthcare Research and Quality; 2013. AHRQ publication no. 13-0002. http://www.ahrq.gov/research/findings/nhqrdr/nhqr12/2012nhqr.pdf. Accessed May 4, 2015
5. Asch DA, Nicholson S, Srinivas S, Herrin J, Epstein AJ. Evaluating obstetrical residency programs using patient outcomes. JAMA. 2009;302:1277–1283
6. Carraccio CL, Englander R. From Flexner to competencies: Reflections on a decade and the journey ahead. Acad Med. 2013;88:1067–1073
7. Kogan JR, Conforti LN, Iobst WF, Holmboe ES. Reconceptualizing variable rater assessments as both an educational and clinical care problem. Acad Med. 2014;89:721–727
8. ten Cate O, Snell L, Carraccio C. Medical competence: The interplay between individual ability and the health care environment. Med Teach. 2010;32:669–675
9. Govaerts MJB, van der Vleuten CPM, Schuwirth LWT, Muijtjens AMM. Broadening perspectives on clinical performance assessment: Rethinking the nature of in-training assessment. Adv Health Sci Educ Theory Pract. 2007;12:239–260
10. Carraccio C, Wolfsthal SD, Englander R, Ferentz K, Martin C. Shifting paradigms: From Flexner to competencies. Acad Med. 2002;77:361–367
11. Miller GM. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9 suppl):S63–S67
12. Govaerts MJB, Van de Wiel MW, Schuwirth LWT, Van der Vleuten CPM, Muijtjens AMM. Workplace-based assessment: Raters’ performance theories and constructs. Adv Health Sci Educ Theory Pract. 2013;18:375–396
13. Kogan JR, Conforti L, Bernabeo E, Iobst W, Holmboe E. Opening the black box of clinical skills assessment via observation: A conceptual model. Med Educ. 2011;45:1048–1060
14. Gingerich A, Regehr G, Eva KW. Rater-based assessments as social judgments: Rethinking the etiology of rater errors. Acad Med. 2011;86(10 suppl):S1–S7
15. Gingerich A, Kogan J, Yeates P, Govaerts M, Holmboe E. Seeing the “black box” differently: Assessor cognition from three research perspectives. Med Educ. 2014;48:1055–1068
16. Yeates P, O’Neill P, Mann K, Eva KW. Effect of exposure to good vs poor medical trainee performance on attending physician ratings of subsequent performances. JAMA. 2012;308:2226–2232
17. Govaerts MJB, Schuwirth LWT, Van der Vleuten CPM, Muijtjens AMM. Workplace-based assessment: Effects of rater expertise. Adv Health Sci Educ Theory Pract. 2011;16:151–165
18. van der Vleuten CPM. When I say … context specificity. Med Educ. 2014;48:234–235
19. Kennedy TJ, Lingard L, Baker GR, Kitchen L, Regehr G. Clinical oversight: Conceptualizing the relationship between supervision and safety. J Gen Intern Med. 2007;22:1080–1085
20. Kennedy TJ. Towards a tighter link between supervision and trainee ability. Med Educ. 2009;43:1126–1128
21. Fraser S, Greenhalgh T. Complexity science: Coping with complexity: Educating for capability. BMJ. 2001;323:299–303
22. Vygotsky LS. Interaction between learning and development. In: Vygotsky LS, Cole M, eds. Mind in Society: The Development of Higher Psychological Processes. Cambridge, Mass: Harvard University Press; 1978:79–91
23. Barak M, Pearlman-Avnion S, Glanz J. Using developmental supervision to improve science and technology instruction in Israel. J Curric Supervision. 1997;12:367–382
24. Ericsson KA. An expert-performance perspective of research on medical expertise: The study of clinical performance. Med Educ. 2007;41:1124–1130
25. Holmboe ES. Direct observation by faculty. In: Holmboe E, Hawkins R, eds. Practical Guide to the Evaluation of Clinical Competence. Philadelphia, Pa: Mosby Elsevier; 2008
26. Regehr G, MacRae H, Reznick RK, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med. 1998;73:993–997
27. Bowen JL. Educational strategies to promote clinical diagnostic reasoning. N Engl J Med. 2006;355:2217–2225
28. Bordage G. Elaborated knowledge: A key to successful diagnostic thinking. Acad Med. 1994;69:883–885
29. Van der Vleuten CPM, Schuwirth LWT. Assessing professional competence. Med Educ. 2005;39:309–317
30. Kilminster S, Cottrell D, Grant J, Jolly B. AMEE guide no. 27: Effective educational and clinical supervision. Med Teach. 2007;29:2–19
31. ten Cate O, Scheele F. Competency-based postgraduate training: Can we bridge the gap between theory and clinical practice? Acad Med. 2007;82:542–547
32. Ten Cate O. Nuts and bolts of entrustable professional activities. J Grad Med Educ. 2013;5:157–158
33. Ginsburg S, McIlroy J, Oulanova O, Eva K, Regehr G. Toward authentic clinical evaluation: Pitfalls in the pursuit of competency. Acad Med. 2010;85:780–786
34. Carraccio C, Gusic M. The Pediatrics Milestone Project. Acad Pediatr. 2014;14(2 suppl):S1–S98
35. Woehr DJ, Huffcutt AI. Rater training for performance appraisal: A quantitative review. J Occup Organ Psychol. 1994;67:189–205
36. Hauenstein NMA. Training raters to increase the accuracy of appraisals and the usefulness of feedback. In: Smither JW, ed. Performance Appraisal: State of the Art in Practice. San Francisco, Calif: Jossey-Bass; 1998:404–442
37. Stamoulis DT, Hauenstein NMA. Rater training and rating accuracy: Training for dimensional accuracy versus training for rater differentiation. J Appl Psychol. 1993;78:994–1003
38. Durning SJ, Artino AR, Boulet JR, Dorrance K, van der Vleuten CPM, Schuwirth LWT. The impact of selected contextual factors on experts’ clinical reasoning performance (does context impact clinical reasoning performance in experts?). Adv Health Sci Educ Theory Pract. 2012;17:65–79
39. Regehr G, Regehr C, Bogo M, Power R. Can we build a better mousetrap? Improving the measures of practice performance in the field practicum. J Soc Work Educ. 2007;43:327–343
40. Regehr G, Ginsburg S, Herold J, Hatala R, Eva K, Oulanova O. Using “standardized narratives” to explore new ways to represent faculty opinions of resident performance. Acad Med. 2012;87:419–427
41. Crossley JG. Addressing learner disorientation: Give them a roadmap. Med Teach. 2014;36:685–691
42. Nelson EC, Batalden PB, Godfrey MM. Quality by Design: A Clinical Microsystems Approach. San Francisco, Calif: Jossey-Bass; 2007


© 2016 by the Association of American Medical Colleges