
Criterion-Based Assessment in a Norm-Based World: How Can We Move Past Grades?

Pereira, Anne G. MD, MPH; Woods, Majka PhD; Olson, Andrew P.J. MD; van den Hoogenhof, Suzanne PhD; Duffy, Briar L. MD; Englander, Robert MD, MPH

doi: 10.1097/ACM.0000000000001939


The paradigm shift from Flexner to competency-based medical education (CBME) is well under way.1 One result has been to place us at a crossroads in assessing student competence. Increasingly, we are trying to move toward competency-based assessment while continuing to rely on traditional grading systems. Assessment in a competency-based system compares learners’ performance against predefined outcomes (criterion-referenced), which may be at odds with assessment that compares learners with their peers’ mean performance (norm-referenced), the primary model in medical education and training for decades. For example, students’ performance on clerkships is often assessed in comparison with that of the other students rotating in the same clerkship or within a defined academic period. Similarly, while residency training programs may have a United States Medical Licensing Examination (USMLE) Step 1 cut score below which they will not interview candidates (criterion-referenced), the relative strength of a candidate’s USMLE Step 1 score is judged against the scores of other candidates for the same training program (norm-referenced). Thus, the USMLE Step 1 score may be used as a criterion-based “screening test” for selection for interview and as a norm-based assessment for the remainder of the residency selection process. This is particularly problematic in light of the standard error of measurement (5 points) and the standard error of difference (8 points) for Step 1, which together mean that scores differing by up to 20 points may not be significantly different.2
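The arithmetic behind that last point can be made explicit. The following is a sketch under the assumption of independent, approximately normal measurement errors, using the summary values above; the cited score interpretation guidelines report only the rounded figures:

```latex
% Standard error of the difference between two independently measured scores:
\mathrm{SED} = \sqrt{\mathrm{SEM}_A^{2} + \mathrm{SEM}_B^{2}}
             = \sqrt{2}\,\mathrm{SEM}
       \approx \sqrt{2}\times 5 \approx 7
       \quad\text{(reported, rounded, as 8)}

% 95\% confidence interval for a single observed score $x$:
x \pm 1.96\,\mathrm{SEM} \approx x \pm 10 \text{ points}

% Two such intervals overlap whenever the observed scores satisfy
\lvert x_A - x_B \rvert < 2 \times 1.96 \times \mathrm{SEM} \approx 20 \text{ points}
```

Under these assumptions, two Step 1 scores roughly 20 points apart still have overlapping 95% confidence intervals, which is why treating small score differences as meaningful distinctions between candidates is statistically dubious.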

The purpose of this article is to elaborate on the tensions between these two assessment frameworks and to identify, in broad strokes, a path to a better-defined and implementable criterion-referenced system of assessment that aligns with the emerging competency-based model of medical education. We first review the current graduate medical education (GME) model using competencies and their attendant milestones and the emerging use of the framework of entrustable professional activities (EPAs). Next, we describe the challenge of moving undergraduate medical education (UME) toward this model, with special attention to the fact that the residency selection process is based at least in part on norm-based assessment, particularly in the use of the USMLE Step 1 and Step 2 examinations, the Medical Student Performance Evaluation (MSPE, or “dean’s letter”), and medical school grades. Last, we propose a way forward for the UME community to successfully navigate this transition.

The GME Journey Toward Competency-Based Assessment

Since the development of the Accreditation Council for Graduate Medical Education (ACGME) and American Board of Medical Specialties’ (ABMS’s) Outcomes Project in 1999,3 the GME community in the United States has been on a journey from the structure-process model of education originally suggested by Flexner4 in 1910 to a CBME framework. The paradigm shift requires four steps: establishment of the requisite outcomes or competencies; development of expected performance levels for those competencies; creation and validation of a framework for assessing where learners are along that developmental continuum of performance; and measuring the intended and unintended results of the shift across the educational system.

Step 1: The Outcomes Project developed the requisite outcomes or competencies that, when combined, describe the knowledge, demonstrated skills, and attitudes of a physician ready to practice independently. The ACGME, in partnership with the ABMS, outlined six “core competencies,” or domains of competence, and the associated competencies that every practicing physician should demonstrate. The ACGME then tasked specialties with developing the specific competencies needed to reflect the outcomes desired within each specialty.

Step 2: The Milestones Project resulted in each specialty developing performance levels, or milestones, for each of the competencies. The goal of the ACGME in the development of these milestones was to “provide more explicit and transparent expectations of performance, support better self-directed assessment and learning, and facilitate better feedback for professional development.”5 Through national consensus building with key stakeholders including leaders in UME, residency program directors, and students and residents, each specialty developed measurable, behaviorally based milestones for their competencies. Similar to the well-known markers of the domains of development that are used in assessing child development, a milestone is “a defined, observable marker of an individual’s ability along a developmental continuum.”6 The milestones are intended to provide a developmental framework for marking the progression of residents and fellows along the educational continuum in the key components of physician competence within their specialty. Semiannually, each training program is required to report to the ACGME individual trainee performance and progress toward ability for independent practice in specialty subcompetencies.7

Step 3: The development of standardized, accurate, and practical assessments to measure learners’ progression throughout training may prove to be the most challenging in GME. At the core of this lies the struggle of moving from the norm-referenced model of assessment in our traditional structure-process educational model to the criterion-referenced model in CBME.

Step 4: The long-term outcomes of GME’s transition to criterion-based assessment on learner and patient outcomes are still unknown, as are potential unintended or unanticipated consequences. Future study is needed to measure the impact of this paradigm shift.

Despite many challenges, the GME community has made great progress over the past decade, particularly with the emergence of the milestones and EPAs or their derivatives such as observable professional activities.8 The ACGME’s mandate of milestone-based reporting drove much of this progress. As a result, most of the intended assessment in GME is criterion referenced, although one persistent area of normative assessment in GME centers on the average point in training at which each of the milestones is achieved based on historical performance. That said, the field continues to add studies that provide an ever-growing body of information around the validity of tools used in milestone assessment.9

The UME community is earlier in the transition to competency-based education and thus just beginning to face the challenges inherent in it.

Setting the Foundations for CBME in UME

The UME community faces many challenges in navigating this same transition toward a competency-based system of education, which can be examined in the same steps as the above GME process. With respect to the first step, delineation of the desired outcomes or competencies, we may be approaching consensus on the requisite competencies of a physician across the continuum of education, training, and practice with the publication of the Physician Competency Reference Set (PCRS).10 The PCRS is a compilation and synthesis of more than 150 competency lists for physicians across the education continuum, across countries, across specialties, and across health professions. The ACGME core competencies served as the framework for the PCRS and provided critical guidance as the reference set. The resulting 58 competencies in eight domains are the framework of the Association of American Medical Colleges’ (AAMC’s) curriculum inventory.

For the second step, development of the milestones that mark progression of competence, the AAMC has at least begun the process by creating milestones for medical students for all of the PCRS competencies that were considered critical to the performance of the Core EPAs for Entering Residency (Core EPAs).11 The milestones delineate two levels of performance, “pre-entrustable” and “entrustable,” corresponding to the novice learner and the learner capable of performing the EPA with indirect supervision. As we begin to implement the Core EPAs, educators are seeing the need for at least a third, intermediate milestone that would describe the advanced beginner, such as a learner who has completed her core clerkships and is moving to more advanced rotations.12

Addressing the third step, the framework for assessment, Core EPAs could function as a framework for assessing the foundational aspects of becoming a physician and being prepared to enter residency training. Students will additionally need to demonstrate school-mission-specific and specialty-specific competencies to optimize the transition to GME. The bottom line is that we now have a scaffolding to make the transition to a competency-based system of education—most important, a framework for assessment that is criterion referenced. Nevertheless, our UME community remains largely reliant on norm-based tools.

Challenges in the UME Journey Toward Competency-Based Assessment

While there are a number of salient differences in the assessment methodologies of GME and UME,1 three have been particularly problematic in the transition toward competency-based assessment in UME: our current reliance on norm-based assessment models, our reliance on proxy assessments rather than direct observation of learners, and the emphasis on summative rather than formative assessments.

In UME, many of our assessments are reliant on, and reinforcing of, norm-referenced standards. Examples include comparative performance on tests in the first two years, performance on the National Board of Medical Examiners’ subject (“shelf”) examinations,13 the USMLE examinations, and overall norm-based rankings of medical students in the MSPE or departmental letters of recommendation.14 The recent AAMC recommendations for a standard MSPE emphasize the importance of student ranking and of documentation of student grades compared with their peers, again reinforcing the emphasis on norm-referenced standards, since by necessity then students are compared with one another, not a criterion standard.15

A first major challenge in this transition that may prove a substantial obstacle is the degree to which our educational system from K–12 through college is steeped in norm-based assessment. For medical students, faculty, and educational systems, this tradition and the comfort of the status quo are two powerful drivers against change. Almost all medical students have known primarily norm-based assessment throughout their lives, and they have learned how to be highly successful in this system. Even if the educational system were to change, students fear that without grades they will not be able to separate themselves from their peers and stand out in a residency program selection process that is perceived as highly competitive. Indeed, until an approach is adopted for the residency selection process that considers applicants in a more holistic way, our students are likely to remain the most vocal opponents of a change to criterion-referenced assessments as the cornerstone of a competency-based educational system. As previously mentioned, many residency programs use grades and test scores as “screening tests” for even considering applicants.16 Students are concerned that residency programs must wade through a significant amount of norm-based information before even considering other, more holistic measures of student aptitude and performance that may be better predictors of their ultimate performance as physicians. Residency program directors are also likely to be wary of losing these “screening tests” that allow them to process the massive number of applications that each program receives.

For educators, there are also two powerful drivers for maintaining the status quo: The current system is all they have known, and they already have familiar assessment tools in use. A shift represents substantial work in developing new tools as well as in learning how to apply these tools. New tools alone are not sufficient without meaningful application by skilled assessors.17 Furthermore, faculty believe that the current measures are valid, even though evidence is strong that these measures (grades, USMLE Step examination scores) have little correlation with future clinical performance.18

A second challenge is competency-based assessment’s reliance on direct observation of learners “in the trenches,” while the traditional model has relied on proxy assessment (e.g., using multiple-choice questions or oral presentations to assess a learner’s history and physical exam skills). This challenge is pragmatic, as most clinical environments do not readily accommodate frequent direct observation of learners. Additionally, faculty need development in how to observe and assess performance on the desired outcomes.19 We will require new ways of thinking, including engaging our colleagues in systems design, to recreate clinical learning environments that allow simultaneous observation of learners and optimization of patient flow.

A third challenge is competency-based assessment’s emphasis on formative feedback while most of our traditional models focus on summative assessment. For example, while the Liaison Committee on Medical Education mandates midrotation feedback for clinical rotations, it is the final grade in the clerkship that will affect students’ competitiveness for the residency Match.20

In addition to these differences in assessment methods, the challenge of shifting from a traditional structure/process-based assessment system to a competency-based assessment system in UME is further complicated by the relative newness of the construct that CBME represents. The granular nature of the competencies may frustrate assessors, and in many instances attempts to measure learners’ development result in deconstructing the competencies to a measurable, but at times meaningless, level. Students may demonstrate all of the discrete tasks on a checklist, yet the complete checklist still may not be able to indicate whether the student is able to “put it all together to deliver care to the patient.” EPAs have emerged in response to this segmented, deconstructed assessment of an individual’s competence. EPAs by definition require the integration of competencies across domains to provide care in the clinical setting. The EPAs provide a framework for workplace-based assessment of the complex tasks our trainees must be capable of to function as physicians.

Current Forces Driving Competency-Based Assessment

While continuing to select applicants for interviews and make rank-order decisions based primarily on norm-based assessments, the GME community is beginning to demand a change to our UME assessment system. Residency program directors report that a lack of professionalism can be a common problem for interns.21 While a sufficiently high USMLE Step 1 score offers reassurance that residents will be able to pass their board certification examination, there is consensus that residents lacking competence in interprofessional communication, teamwork, professionalism, and patient-centered care are much more difficult to remediate and that their behavior is much more disruptive than that of residents who need additional support to develop their medical knowledge.22 Indeed, there is evidence that unprofessional behavior in medical school is associated with subsequent disciplinary action by a state medical board, and there is no evidence to our knowledge of a similar association with lack of medical knowledge.23 Once a threshold has been passed, residency program directors acknowledge the lack of difference in resident clinical performance based on USMLE scores. Nonetheless, the increasing number of residency applications per applicant has further emphasized the use of the USMLE Step 1 score for a purpose for which it is not designed.24

Solution: The Time Is Now to Embrace Criterion-Referenced Assessment

As a UME community we must develop a system that can provide meaningful information to the developing physician and the GME community as we prepare to hand over our students to residency programs. Standardized handover tools improve patient safety.25 An analogy can be drawn to improve the transition from UME to GME: If a standard series of assessments that measure each student’s development is used, we can create more specific, anticipatory, and individualized learning plans for GME.26 We know our current norm-based assessments are not predictive of performance as a physician, but until recently, we did not have established criteria that could be standardized across medical schools. With the advent of the PCRS, the related milestones for UME, and the Core EPAs as a framework for assessment, we are positioned to make a change. Indeed, by focusing on the requisite competencies as demonstrated by performance of the Core EPAs, we can expect more meaningful information to hand over to GME program directors. In addition, medical school graduates will be familiar with demonstrating competence through their work, will be better able to reflect on domains in which they need to further develop, and will expect (or even demand) formative feedback to help them on their developmental trajectory toward independent practice. Ideally, this work will prepare them for a career of lifelong learning and self-assessment.

For the UME community to successfully transition to competency-based assessment, we will need to build consensus across the UME–GME continuum. Once medical schools can reliably assure GME programs that the information they provide to residency programs about prospective candidates is accurate, precise, transparent, meaningful, and generalizable within and across schools, our community can move intentionally and successfully from norm-based to criterion-referenced assessment, thereby taking one more step toward realization of this much-needed paradigm shift.

Early UME Successes in the Switch to a Competency-Based System and the Use of Criterion-Referenced Assessment

Currently, there are several pilots of competency-based assessment and progression in UME. For example, four schools are involved in the AAMC Education in Pediatrics Across the Continuum (EPAC) pilot that is testing the feasibility of competency-based progression from UME to GME.27 In these schools, students transition directly to residency once they have been entrusted to perform the Core EPAs and successfully demonstrated advanced transition competencies as required by their school. While the generalizability of this pilot is yet to be determined, a number of features of this approach may be valuable to consider in developing effective criterion-based assessments. First, these assessments are heavily reliant on direct observation of learners performing definable and demonstrable behaviors, such as taking a specific type of history or performing a specific procedure. Second, the assessments must also be longitudinal to allow learners and educators to track progress toward competence. Third, learners must be engaged as stakeholders and advocates in their assessment and progression toward competence.

While the vast majority of U.S. medical schools have yet to move to competency-based progression, the conversation about competency-based assessment is becoming increasingly broad, robust, and action oriented. Many schools are working on competency-based assessments and pilot projects to determine where and how to build a CBME model. For example, Indiana University School of Medicine has been using a competency-based curriculum since 1999 and has systematically documented longitudinal results of student advancement.28 The current work in the AAMC’s Core EPA pilot will inform the development of assessment tools that can be shared across medical schools. Furthermore, there are some single-site examples of UME–GME student handovers that include specific information about students’ progression along predetermined competency milestones.29

Closing Words

Completing the paradigm shift to CBME in UME will require changing our assessment framework from norm referenced (comparing students’ performance with the mean or median performance of their peers) to criterion referenced (measuring student performance against a predetermined set of criteria). There are a number of factors at the system, student, and faculty levels driving a commitment to, and reinforcement of, the status quo despite the substantial evidence that our current reliance on standardized tests and grades has little, if any, correlation with clinical performance. We have overcome a major barrier to change by establishing the assessment criteria with the advent and general acceptance of the competencies and milestones in GME, as well as the PCRS and Core EPAs in UME. Now, we must do the hard work of developing assessments steeped in direct observation that learners and faculty can accept across the educational-training-practice continuum and that can be shown to predict clinical performance in a much more meaningful way than the norm-referenced assessments of the past.


1. Carraccio C, Wolfsthal SD, Englander R, Ferentz K, Martin C. Shifting paradigms: From Flexner to competencies. Acad Med. 2002;77:361–367.
2. United States Medical Licensing Examination. USMLE score interpretation guidelines. Updated June 2016. Accessed July 13, 2017.
3. Swing SR. The ACGME Outcome Project: Retrospective and prospective. Med Teach. 2007;29:648–654.
4. Flexner A. Medical Education in the United States and Canada. Washington, DC: Science and Health Publications, Inc; 1910.
5. Accreditation Council for Graduate Medical Education. Milestones. Accessed July 13, 2017.
6. Englander R, Frank JR, Carraccio C, Sherbino J, Ross S, Snell L; ICBME Collaborators. Toward a shared language for competency-based medical education. Med Teach. 2017;39:582–587.
7. Accreditation Council for Graduate Medical Education. Milestones by reporting date. Updated January 2015. Accessed July 13, 2017.
8. Warm EJ, Mathis BR, Held JD, et al. Entrustment and mapping of observable practice activities for resident assessment. J Gen Intern Med. 2014;29:1177–1182.
9. Park YS, Zar FA, Norcini JJ, Tekian A. Competency evaluations in the Next Accreditation System: Contributing to guidelines and implications. Teach Learn Med. 2016;28:135–145.
10. Englander R, Cameron T, Ballard AJ, Dodge J, Bull J, Aschenbrener CA. Toward a common taxonomy of competency domains for the health professions and competencies for physicians. Acad Med. 2013;88:1088–1094.
11. Association of American Medical Colleges. Core Entrustable Professional Activities for Entering Residency. Washington, DC: Association of American Medical Colleges; 2014. Accessed July 13, 2017.
12. Chen HC, van den Broek WE, ten Cate O. The case for use of entrustable professional activities in undergraduate medical education. Acad Med. 2015;90:431–436.
13. Torre D, Papp K, Elnicki M, Durning S. Clerkship directors’ practices with respect to preparing students for and using the National Board of Medical Examiners Subject Exam in medicine: Results of a United States and Canadian survey. Acad Med. 2009;84:867–871.
14. Lang VJ, Aboff BM, Bordley DR, et al. Guidelines for writing department of medicine summary letters. Am J Med. 2013;126:458–463.
15. Association of American Medical Colleges. Recommendations for revising the medical school performance evaluation (MSPE). Accessed February 26, 2017.
16. Katsufrakis PJ, Uhler TA, Jones LD. The residency application process: Pursuing improved outcomes through better understanding of the issues. Acad Med. 2016;91:1483–1487.
17. Holmboe ES, Sherbino J, Long DM, Swing SR, Frank JR. The role of assessment in competency-based medical education. Med Teach. 2010;32:676–682.
18. McGaghie WC, Cohen ER, Wayne DB. Are United States Medical Licensing Exam Step 1 and 2 scores valid measures for postgraduate medical residency selection decisions? Acad Med. 2011;86:48–52.
19. Gingerich A, Kogan J, Yeates P, Govaerts M, Holmboe E. Seeing the “black box” differently: Assessor cognition from three research perspectives. Med Educ. 2014;48:1055–1068.
20. National Resident Matching Program. Results of the 2016 NRMP program director survey. Accessed July 13, 2017.
21. Lyss-Lerman P, Teherani A, Aagaard E, Loeser H, Cooke M, Harper GM. What training is needed in the fourth year of medical school? Views of residency program directors. Acad Med. 2009;84:823–829.
22. Dupras DM, Edson RS, Halvorsen AJ, Hopkins RH Jr, McDonald FS. “Problem residents”: Prevalence, problems and remediation in the era of core competencies. Am J Med. 2012;125:421–425.
23. Papadakis MA, Hodgson CS, Teherani A, Kohatsu ND. Unprofessional behavior in medical school is associated with subsequent disciplinary action by a state medical board. Acad Med. 2004;79:244–249.
24. Prober CG, Kolars JC, First LR, Melnick DE. A plea to reassess the role of United States Medical Licensing Examination Step 1 scores in residency selection. Acad Med. 2016;91:12–15.
25. Starmer AJ, Sectish TC, Simon DW, et al. Rates of medical errors and preventable adverse events among hospitalized children following implementation of a resident handoff bundle. JAMA. 2013;310:2262–2270.
26. Warm EJ, Englander R, Pereira A, Barach P. Improving learner handovers in medical education. Acad Med. 2017;92:927–931.
27. Association of American Medical Colleges. Education in Pediatrics Across the Continuum (the EPAC project). Updated 2017. Accessed July 13, 2017.
28. Brokaw JJ, Torbeck LJ, Bell MA, Deal DW. Impact of a competency-based curriculum on medical student advancement: A ten-year analysis. Teach Learn Med. 2011;23:207–214.
29. Starmer AJ, O’Toole JK, Rosenbluth G, et al.; I-PASS Study Education Executive Committee. Development, implementation, and dissemination of the I-PASS handoff curriculum: A multisite educational intervention to improve patient handoffs. Acad Med. 2014;89:876–884.
Copyright © 2017 by the Association of American Medical Colleges