The abrupt suspension of the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS) exam in March 2020 due to the COVID-19 pandemic was an important safety measure for medical students, simulated patients, and staff at testing centers. 1 The January 2021 announcement of the decision to fully discontinue this licensing requirement 2 represents a notable change in the way medical students will be assessed at the national level. This change necessitates a thoughtful dialogue within the medical education community to ensure that the focus on, and quality of, teaching and assessing clinical skills does not diminish.
Step 2 CS was introduced in 2004 as a licensure requirement for students graduating from U.S. medical schools amid concerns that graduating students lacked key clinical skills. The exam was based on direct interactions with standardized (simulated) patients (SPs); it assessed students’ ability to gather relevant patient data through a medical interview and physical exam and to organize that information into a patient note, including evidence of clinical reasoning (i.e., differential diagnosis and plan). The exam scoring rubric separated the data gathering and interpretation content of the encounter from the communication and interpersonal skills students displayed while interacting with the SP. 3 Step 2 CS was closely studied, and modifications to the encounter challenges and scoring processes evolved over time. 4
The introduction of this new licensure requirement prompted medical schools to refocus attention on the instruction and assessment of students’ clinical skills and spurred the development and refinement of clinical testing at the medical school level. 5,6 Ecker and colleagues 7 reviewed the beneficial consequences of Step 2 CS in detail and offered suggestions for its enhancement. Although concerns were raised about the exam, especially regarding the cost, stress, and inconvenience to students, the assessment of students’ clinical skills through direct observation of their encounters with an SP contributed a critical component of licensure that was not otherwise attainable through written multiple-choice exams. 7
How should medical schools respond to the elimination of Step 2 CS? As we ponder next steps, it is important to consider the potential consequences of the exam’s cancellation, both negative and positive, as well as the resulting responsibilities that now devolve to our schools.
What Has Been Lost?
The elimination of Step 2 CS results in both losses and opportunities. The primary loss is of a rigorous national-level standardized screening process to determine minimal competency in graduating medical students’ key clinical skills, including data gathering, data interpretation, and communication. Local, medical school-based assessments are highly vulnerable to validity threats. 8 Construct underrepresentation can result from common practices such as using inappropriate assessment methods (e.g., written tests to assess physical exam skills) or insufficient and unsystematic sampling (e.g., drawing inferences from too few SP encounters). Construct-irrelevant variance can be due to challenges such as poorly written cases and checklists, inadequate standardization of SP case portrayal, insufficient rater training, and rater biases and errors. In addition, local pass/fail standards are variable and may be adjusted post hoc. Medical schools and faculty are understandably reluctant to downcheck or fail their own students, leading (among other reasons) to a persistent failure to fail. 9 On the other hand, the Step 2 CS failure rate in 2018–2019 was 5% for first-time students from U.S. MD-granting medical schools, 15% for students from DO-granting schools, and 23% for international medical graduates. 10 The inability to screen out those students who lack basic clinical skills constitutes a threat to patient safety during residency and beyond.
Also threatened by the elimination of Step 2 CS are the positive byproducts of a national high-stakes assessment. Step 2 CS motivated curriculum deans, faculty, and students to devote time, effort, and resources to clinical skills development, as evidenced by the growth in doctoring courses in the past 20 years. 7 Assessment drives the curriculum; will medical schools now infer that it is no longer important to rigorously teach and assess history-taking, physical exam, and communication skills? Will students conclude that it is no longer necessary to devote time and effort to learning these skills? Medical schools have committed substantial resources to human simulation (SP) programs to support doctoring courses and provide a practice Step 2 CS exam for students. Once available, SP programs frequently provide additional curricular programming, including activities that promote the acquisition of advanced skills such as difficult conversations, clinical reasoning, high-value care, and telehealth. Absent the need to prepare for a national high-stakes clinical skills exam, will budget support for these programs and their attendant curricular benefits continue?
What Might Be Gained?
The cancellation of Step 2 CS clearly reduces the financial burden on medical students and the stress of preparing for, traveling to, and participating in the exam. 11 There may be curricular benefits to the exam’s elimination as well. Many medical schools patterned their graduation competency clinical skills exams on Step 2 CS to familiarize their students with the USMLE format. Eliminating the need to practice for a high-stakes licensing exam affords the opportunity for local clinical skills assessments to grow beyond the scope and format of Step 2 CS, which focused only on a narrow subset of the clinical skills needed by interns: the ability to communicate with a patient and to gather and interpret data toward making an initial diagnosis of a new patient. SP encounters can be leveraged to assess other types of patient interactions and entrustable professional activities such as obtaining informed consent, transitioning care to another provider, and recognizing a patient requiring urgent care. Medical schools can experiment with different types of postencounter notes and postencounter challenges, such as allowing students to access resources used during clinical practice. Other formats can be explored, including group, longitudinal, and interprofessional encounters. Any of these options would serve to broaden the exam and enhance its content validity. Step 2 CS provided only a summative pass/fail decision; local assessments can provide much more detailed and informative feedback to guide future learning.
Meeting the Need
With the loss of a national-level exam, the responsibility for high-quality summative assessment of clinical skills must revert to medical schools. Patients, their families, and the licensing bodies sworn to protect them must be assured that graduating students can function safely and effectively to provide clinical care under appropriate supervision. Residency programs need to know that interns are prepared for their responsibilities and which interns need additional coaching and supervision (previously signaled by a history of having failed Step 2 CS, especially if repeatedly). 12 Students want to know that they are well prepared to assume a higher level of independent practice. Finally, medical school faculty, administrators, and accrediting bodies need to confirm that their curricula are indeed producing safe and capable physicians.
In meeting these needs, it is critical to identify assessment objectives that can be satisfied only through an encounter with a patient and those that can be accomplished through other means. For example, when assessing data gathering and interpretation in the context of a new patient, some clinical reasoning tasks (e.g., knowing what history and physical exam information to obtain, developing a differential diagnosis based on the patient, and identifying the appropriate diagnostic tests to be done immediately) might be assessed through well-constructed written tests, key-features exams, 13 or virtual patient programs. Patient encounters are uniquely suited to elicit and assess behaviors such as (1) whether the student is able to perform a physical exam and identify physical exam findings; (2) whether the student is sufficiently proficient to appreciate and interpret salient history and physical exam findings during the course of the encounter (i.e., go beyond rote questioning to obtain the specific data needed to support a working differential diagnosis); (3) whether the student has the cognitive capacity to relate to the patient effectively and empathically while obtaining the history and physical exam, rather than focusing their attention on thinking of what question to ask next; and (4) whether the student is able to sensitively modify their communication and interpersonal behavior to accommodate and respond to diverse patient communication styles, personalities, emotions, and needs.
The first of these goals can be accomplished only through a live, face-to-face, hands-on patient encounter. The second could potentially be accomplished through a sophisticated virtual patient program, although such programs require much study before being deployed for high-stakes decisions. The third and fourth require a live patient encounter but likely could be accomplished remotely. Focusing on the unique affordances of live SP encounters and designing assessment tools that target these affordances will allow this resource-intense assessment modality to be leveraged most efficiently and effectively.
The loss of Step 2 CS generates opportunities as well as challenges for medical schools and faculty. There now exists a unique impetus to broaden the scope of clinical skills exams and expand the toolbox of available assessment methods. It is imperative that we do so in a way that promotes the validity of high-stakes decisions and confidence in their results. This demands, and is an opportunity for, a concerted program of educational research. Table 1 identifies some next steps and related research agendas.
The USMLE has committed to “partner with the medical education and medical board community to better develop innovative ways to assess the breadth of clinical skills in medicine.” 2 The urgent need for medical schools to assess students’ clinical skills rigorously and affordably should spur local and national efforts to study and improve the validity and utility of a variety of assessment modalities; these could be used in conjunction with face-to-face human simulation to enrich understanding of students’ development. Workplace-based assessments 14 could be leveraged to provide information about students’ physical exam skills, communication skills, and professionalism, and screen-based simulations such as virtual patients could provide valuable insights about students’ clinical reasoning skills. The use of telesimulation (i.e., SP encounters conducted remotely 15) and simulated telehealth encounters has exploded during the COVID-19 pandemic; these could provide an additional source of ratings and feedback for students from the crucial perspective of the patient and enable more frequent and rigorous assessment of students who are located far from the main campus of the medical school.
These modalities are currently used primarily for low-stakes formative feedback and skills development. Effective use of these and other promising methods for the purpose of high-stakes assessment demands a rigorous analysis of their strengths and limitations, similar to the intensive study of human simulation conducted when SPs were first introduced. Because no single assessment method is perfect, a robust understanding of how to qualitatively and quantitatively aggregate longitudinal and episodic information from different modalities toward a final high-stakes summative decision is vital. Here too, it is essential to differentiate between aggregating data for the purpose of low- to medium-stakes formative assessments (e.g., semiannual evaluations) and the evidentiary requirements of a final, high-stakes summative decision that a student is indeed ready to function safely as an intern.
Far from becoming redundant within such an assessment system, the expertise offered by mature SP programs becomes more valuable than ever. Rigorous, repeated, and systematic assessment through direct observation of students’ clinical skills in a controlled environment constitutes a highly reliable source of information about what students can (and cannot) do in a clinical setting. However, conducting local assessments of clinical skills with the rigor to support high-stakes decisions is a challenging task. Educators need robust skills in writing SP cases, blueprinting exams, standardizing SP portrayals, training SP and/or faculty raters, maintaining quality assurance standards, monitoring and minimizing threats to exam validity, and providing actionable reports to students and faculty. Standards of Best Practice for human simulation, such as those promulgated by the Association of Standardized Patient Educators, 16 are key. The development of regional consortia, such as the Mid-Atlantic Consortium and the California Consortium for the Assessment of Clinical Competence, may assist in leveraging expertise and resources across medical schools. The use of telesimulation can enable trained SPs and SP training expertise to be shared across schools while increasing the diversity of available SPs. The use of remote observers and raters, based in consortium partner schools, may facilitate the development of rater expertise and help mitigate local pressures contributing to the “failure to fail.”
The demise of Step 2 CS carries both great risks and exciting opportunities for medical education in the United States. A rigorous summative assessment of clinical skills is critical for patient safety as medical students transition to residency. A strong mandate from the Liaison Committee on Medical Education and licensing bodies would affirm the value of clinical skills curricula and confirm that the responsibility for a credible summative assessment of clinical skills now lies squarely with medical schools. Robust human simulation (SP) programs, including regional and virtual consortia, can provide infrastructure and expertise for innovative and creative local assessments to meet this need. Novel applications of human simulation and traditional formative assessment methods can contribute to defensible summative decisions. The need to establish validity evidence for decisions based on these novel assessment methods creates a powerful opportunity for medical education research and scholarship, and we look forward to seeing the results of such studies presented in these pages.
1. Murphy B. Step 2 CS suspended, temporary assessment measures being weighed. American Medical Association. https://www.ama-assn.org/residents-students/usmle/step-2-cs-suspended-temporary-assessment-measures-being-weighed. Posted June 4, 2020. Accessed June 7, 2021.
2. United States Medical Licensing Examination. Work to relaunch USMLE Step 2 CS discontinued. https://www.usmle.org/announcements/?ContentId=309. Posted January 26, 2021. Accessed June 2, 2021.
3. Hawkins RE, Swanson D, Dillon G, et al. The introduction of clinical skills assessment into the United States Medical Licensing Examination (USMLE): A description of USMLE Step 2 Clinical Skills (CS). J Med Licens Discip. 2005;91:22–25.
4. Haist SA, Butler AP, Paniagua MA. Testing and evaluation: The present and future of the assessment of medical professionals. Adv Physiol Educ. 2017;41:149–153.
5. Hauer KE, Teherani A, Kerr KM, O’Sullivan PS, Irby DM. Impact of the United States Medical Licensing Examination Step 2 Clinical Skills exam on medical school clinical skills assessment. Acad Med. 2006;81(suppl 10):S13–S16.
6. Gilliland WR, La Rochelle J, Hawkins R, et al. Changes in clinical skills education resulting from the introduction of the USMLE Step 2 Clinical Skills (CS) examination. Med Teach. 2008;30:325–327.
7. Ecker DJ, Milan FB, Cassese T, et al. Step up—not on—the Step 2 Clinical Skills exam: Directors of Clinical Skills Courses (DOCS) oppose ending Step 2 CS. Acad Med. 2018;93:693–698.
8. Downing SM, Haladyna TM. Validity threats: Overcoming interference with proposed interpretations of assessment data. Med Educ. 2004;38:327–333.
9. Yepes-Rios M, Dudek N, Duboyce R, Curtis J, Allard RJ, Varpio L. The failure to fail underperforming trainees in health professions education: A BEME systematic review: BEME guide no. 42. Med Teach. 2016;38:1092–1099.
10. United States Medical Licensing Examination. 2020 Performance Data. https://www.usmle.org/performance-data/default.aspx#2020_step-2-cs. Accessed June 2, 2021.
11. Rajesh A, Desai TJ, Patnaik R, Asaad M. Termination of the USMLE Step 2 CS: Perspectives of surgical residents with diverse medical backgrounds. J Surg Res. 2021;265:60–63.
12. Paniagua M, Salt J, Swygert K, Barone MA. Perceived utility of the USMLE Step 2 Clinical Skills examination from a GME perspective. J Med Regul. 2018;104:51–57.
13. Bordage G, Page G. The key-features approach to assess clinical decisions: Validity evidence to date. Adv Health Sci Educ Theory Pract. 2018;23:1005–1036.
14. Kogan JR, Hatala R, Hauer KE, Holmboe E. Guidelines: The do’s, don’ts and don’t knows of direct observation of clinical skills in medical education. Perspect Med Educ. 2017;6:286–305.
15. Hess BJ, Kvern B. Using Kane’s framework to build a validity argument supporting (or not) virtual OSCEs [published online ahead of print April 9, 2021]. Med Teach. doi:10.1080/0142159X.2021.1910641.
16. Lewis KL, Bohnert CA, Gammon WL, et al. The Association of Standardized Patient Educators (ASPE) Standards of Best Practice (SOBP). Adv Simul (Lond). 2017;2:10.