In February 2020, the National Board of Medical Examiners (NBME) and Federation of State Medical Boards (FSMB) announced that United States Medical Licensing Exam (USMLE) Step 1 reporting would change from a 3-digit numeric score to a pass/fail outcome, with an anticipated implementation no sooner than January 2022. This announcement followed several years of national debate amongst key stakeholders including residency program directors, medical student educators, learners, and participants in the Invitational Conference on USMLE Scoring. A key factor in the decision was “supporting the educational engagement and overall experience of medical students and … increasing the dialogue about how multiple assessments of competency could best be utilized by stakeholders in medical regulation and medical education.”1 The goal was to reduce the current overemphasis on USMLE performance while retaining the ability to use Step 1 for its primary purpose, medical licensure. The FSMB and the NBME viewed this change as one of the first important steps toward broader, large-scale changes to improve the transition from undergraduate medical education (UME) to graduate medical education (GME) and to “advance reliable and holistic assessment of the training of physicians.”1 It is from this point of view, advancing reliable and holistic assessment of medical students and their transition from UME to GME, that we write this Invited Commentary. We do so by using the frameworks of validity and growth/mastery mindset.
The Risk of Narrow Implementation
If the change to Step 1 scoring is narrowly implemented without additional changes in medical student assessment, we will fail to solve one of the key challenges in medical education: the problematic UME–GME transition that frustrates and overwhelms many stakeholders and increasingly functions like a lottery. The UME–GME transition can be characterized in the starkest terms as medical schools doing their best to advocate for their learners by making them look as desirable as possible, and as residency programs trying their best to weed out applicants as they sort through an increasingly unmanageable number of applications. Those of us who have held roles in both medical student and residency education have experienced both sides of this dilemma; we know firsthand the shortcomings of our current assessments and recognize our own reluctance to transmit learner information that is less than glowing. We witness the negative impact of assessment on our students, who view all assessments and feedback as high stakes. We have seen our medical students find themselves constantly performing to “look good,” often at the expense of engaging in meaningful learning. We have worked with interns and residents who lack the skills their medical schools suggested they had.
Implementing a narrow approach that involves only changing Step 1 scoring without other substantive modifications to assessment will likely result in program directors using readily available or easily obtained substitute metrics instead of the Step 1 numeric score to stratify residency applications. USMLE Step 2 Clinical Knowledge (CK) scores are the obvious replacement, and yet they come so late in medical school that students and their educators would wait years to know how students stack up against their peers heading into the residency application process. While Step 2 CK assesses knowledge relevant to clinical practice, few studies demonstrate its correlation with success as a practicing physician; more importantly, shifting the focus to this exam would perpetuate the same problems we have experienced with the emphasis on Step 1 scores for applicant screening. This is clearly not a desired direction or outcome. Other surrogate measures of good doctoring such as clerkship grades, Medical Student Performance Evaluation (MSPE) bottom-line ranks or adjectives, Alpha Omega Alpha Honor Medical Society selection, or even medical school name could easily carry greater weight in the selection of applicants. Addressing concerns about grade inflation and bias in evaluation and grading, along with concerns about recent shifts to pass/fail clerkship grading, would take on even greater importance and urgency.
If we fail to use this time to reimagine assessment, we are at risk for perpetuating the problems that prompted changes to Step 1 scoring, and we will undoubtedly face additional unintended consequences. Alternatively, medical educators can take advantage of the bold and brave step to shift to pass/fail scoring as an opportunity to innovate and redefine the purpose of assessment to better meet the needs of multiple key stakeholders.
An Opportunity to Capitalize on the Educational Impact and Catalytic Effect of Assessment
The medical education assessment community has historically focused on validity as one of the most essential hallmarks of well-designed assessment. Messick conceptualizes validity as an integrated evaluative judgment of the degree to which evidence and theoretical rationales support the inferences and actions made based on test scores and other methods of assessment.2 That is, does the assessment measure what you want to know? The consequences of an assessment are an important component of validity. Consequential validity represents the anticipated and unanticipated consequences of legitimate test interpretation. These include the ways that scores are used to promote and advance some students over others as well as unintended social effects, such as bias and teaching to the test. The consequential impact of Step 1 is broad and occurs both before the exam is taken and after exam scores are released. Preclerkship medical school curricular innovation has been stifled by the need to cover Step 1 content, and students’ energies are preferentially directed to Step 1 studying at the expense of important content “not on the test,” all with major detriments to student wellness. After the exam, Step 1 scores define both the specialty and residency program for which a student will be competitive.
In transitioning away from the 3-digit Step 1 score, the education community has an opportunity to leverage the construct of consequential validity to better meet the needs of various stakeholders. It is time to embrace assessment as an intervention rather than a static outcome. Historically, validity has been the priority when selecting assessment tools. However, the utility equation, used to select assessment instruments, recognizes not only the reliability, acceptability, and cost of assessment methods but also, importantly, the educational impact or effects of assessments.3 Educational effect is what happens when the assessment motivates those who take it to prepare in a fashion that benefits their learning. That is, there is a positive impact or consequence for students. Catalytic effect occurs when the assessment provides results and feedback that motivate all stakeholders to create, enhance, and support education; it drives the design and implementation of curricula and improves overall program quality. This catalytic effect reminds us that medical education ultimately serves the purpose not of resident selection but of patient care. That is, assessment has an impact on or consequence for students’ future learning and practice while also prompting educators to shape our educational programs to ensure that patients receive the highest-quality care.
Now is the time to imagine an education system that refocuses on the educational and catalytic effects of assessment. Educators should use this window of opportunity to prioritize tools and strategies that achieve desired educational and catalytic effects. What would this entail?
First is culture change, transforming our learning environments from a performance-oriented, fixed-mindset culture to one that is growth and mastery oriented.4 Learners would be rewarded for demonstrating not only what they know but also what they do not know. The latter is necessary for learning, continuous growth, and development into clinicians with the adaptive expertise to face new clinical problems throughout their careers. Just as program directors desire that their interns will seek help when needed, medical school assessment systems should value students identifying and sharing their learning needs.
Second, the orientation of assessment would need to shift from static to dynamic.5 Static assessment measures the skills and knowledge that a learner has gained from prior experiences, whereas dynamic assessment focuses on an individual’s ability to acquire skills or knowledge from the assessment. Dynamic assessment enables learners to use assessment results to identify their knowledge and skill gaps. This information is then used to catalyze additional learning. Assessment shifts from being an end point to informing the next phase of learning by helping learners identify what has been mastered, what is still not well understood, and what knowledge and skill gaps remain. In doing so, assessment is transformed to support growth and lifelong learning. This ideal directly contrasts with our current testing paradigm, in which a learner moves on after a test is taken rather than reengaging with the content that was not sufficiently mastered.
Third, a shift to dynamic assessment would emphasize the learner’s trajectory. Rather than simply focusing on single points in time, assessments would be used to demonstrate growth over time. After a series of assessments, teachers could identify learners whose trajectories indicate readiness for acceleration to new training opportunities, as well as those who need additional support to improve. Assessment portfolios would include multiple assessments to demonstrate not only what learners know but also how they learn and their learning trajectories.
Moving Up Miller’s Pyramid
The shift away from a 3-digit Step 1 score additionally provides us with the unique opportunity to continue to move key medical student assessments “up” Miller’s pyramid. Miller’s pyramid of assessment provides a framework for assessing clinical competence in medical education and assists clinical teachers in matching learning outcomes (clinical competence) with expectations of what a learner should be able to do at any stage.6 Step 1 scores focus on cognition and the assessment of knowledge—the base of Miller’s pyramid (what a learner knows). Although educators have used multiple assessment tools with students, the primacy of medical knowledge scores has driven grades and opportunities, relegating other assessments to seem less useful or valued. Moving away from 3-digit Step 1 scoring enables educators to prioritize assessment of key behaviors, such as communication and patient care skills, over what students know for a test. Assessment strategies such as objective structured clinical examinations and standardized patients can be used for learners to show what they can do, and can provide meaningful reports of this performance. Even more important, we can continue to improve our ability to use direct observation and work-based assessments to capture what our students do in daily patient care in authentic clinical settings (what the learner does—the highest level of authenticity on Miller’s pyramid).
Improving Transparency Throughout the UME–GME Transition
The UME landscape must continue its journey toward competency- and outcomes-based medical education. While knowledge-based assessments are important, assessment in general must broaden to include clear assessments of learners’ knowledge application, behaviors, and skills. We should focus on defining the skills that make for a successful intern or resident; assessing those would naturally guide teachers to devote time to teaching and coaching these relevant skills. Many of these skills have been defined through the Association of American Medical Colleges’ Core Entrustable Professional Activities for Entering Residency.7 We must continue to identify ways to support the observation of learners engaging in those skills, using these observations to provide feedback to learners. Our educational systems must enable the observation of our learners in multiple clinical contexts by multiple observers who are knowledgeable and skilled in evaluation and assessment.
MSPEs should be reformatted to focus on competencies required of all physicians rather than on performance and grades in individual, discipline-specific rotations. Additionally, letters of recommendation could be tailored by anticipated specialty, thereby prompting each specialty to define what is important for its practice to meet patient care needs. One example of this is standardized letters of evaluation, developed collaboratively within certain disciplines, which use a structured approach to capture students’ cumulative knowledge and skills toward the end of medical school in their specialty of choice.8 Clinical rotations should be designed so that assessors have the time and ability to spend more time on assessments that matter, and institutions need to provide the appropriate time and support for this to be realized. In doing so, we could obtain assessments that would both support a learner’s growth and provide valid and transparent data to inform the UME–GME transition. The onus would be on medical schools to share the assessments of their learners. In turn, schools would receive information about their graduates’ 3- and 6-month milestones, providing quality assurance data to drive their own undergraduate curricular improvement. Medical schools would need to be comfortable sharing their assessment data, and residency programs would need to be willing to receive a student who has identified areas requiring improvement. Schools that share accurate performance data must not be penalized.
Refocusing on the educational effect and catalytic effect of assessment enables us to achieve the shared goal of all stakeholders in the UME–GME transition—the pursuit of mastery to ensure that our patients receive the highest-quality care possible. Learners must be encouraged to be transparent about what they do not know rather than having to always demonstrate what they do know. We can then be teachers, and our learners can focus on learning, with the result that we train a cadre of physicians who are able to grow and improve to achieve mastery with teachers at their side. Assessment would no longer be confined to the silos of medical school, residency, and fellowship but would instead span the continuum of the educational journey, including lifelong learning in practice. Changing Step 1 score reporting should create urgency to catalyze needed change for assessment in medical education. Step 1 pass/fail scoring will not, in and of itself, solve these inherent problems without additional significant change to our assessment systems. The change in score reporting provides an opportunity to reinvigorate the educational and catalytic effects of assessment by transitioning to more outcomes-based UME that could enhance the UME–GME transition and draw focus to essential patient care skills.
1. United States Medical Licensing Examination. InCUS—Invitational Conference on USMLE Scoring. Change to pass/fail score reporting for Step 1. https://www.usmle.org/inCus. Accessed May 13, 2020.
2. Messick S. Meaning and values in test validation: The science and ethics of assessment. Educ Res. 1989;18:5–11.
3. Norcini J, Anderson MB, Bollela V, et al. 2018 Consensus framework for good assessment. Med Teach. 2018;40:1102–1109.
4. Dweck CS. Mindset: The New Psychology of Success. New York, NY: Random House Publishing Group; 2006.
5. Brown PC, Roediger HL, McDaniel MA. Make It Stick: The Science of Successful Learning. Cambridge, MA: Harvard University Press; 2014.
6. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9 suppl):S63–S67.
7. Englander R, Flynn T, Call S, et al. Toward defining the foundation of the MD Degree: Core Entrustable Professional Activities for Entering Residency. Acad Med. 2016;91:1352–1358.
8. Jackson JS, Bond M, Love JN, Hegarty C. Emergency Medicine Standardized Letter of Evaluation (SLOE): Findings from the new electronic SLOE format. J Grad Med Educ. 2019;11:182–186.