The National Board of Medical Examiners and the Federation of State Medical Boards developed the three-step United States Medical Licensing Examination (USMLE) to provide “individual medical licensing authorities … a common evaluation system for medical licensure.”1 The USMLE Step 1 component, a one-day multiple-choice examination, is designed to ensure “mastery of the sciences that provide a foundation for the safe and competent practice of medicine in the present, as well as the scientific principles required for maintenance of competence through lifelong learning.”1
The primary consequence of the test result is binary. Candidates who pass the series of examinations are eligible for state licensure; those who fail are not. USMLE Step 1 examination results are reported with an equated numerical score with high reliability (designed to represent equivalent meaning over time) and a pass or fail determination. The passing score is based on a comprehensive synthesis of expert judgment about the test content and competency of examinees, further conditioned by information about the predicted decision accuracy. This synthesis is reviewed periodically; the review may result in an adjustment of the minimal passing score. The USMLE model, unlike many other professional licensing examinations in the United States and around the world, is designed to harvest a national consensus about core medical education content from a large “national faculty of medicine.”2 Basic scientists, clinical faculty, and practicing doctors are selected to be broadly representative of medical education and medical licensure in the United States; these experts create examination designs, write and review test questions, approve test forms, and set the minimum passing score. Questions are designed to explore students’ ability to demonstrate higher degrees of integrative knowledge and logic rather than the recall of facts. Experts in question development and validation ensure that the test material is reliable and reproducible and is a valid assessment of the content outline it is designed to test. In addition, there is some evidence that the results of licensing examinations may predict future clinical quality and outcomes.3–7
In addition to determining qualification for licensure, USMLE test results, particularly Step 1 scores, are often used in screening applicants for residency.8 Although scores on the USMLE steps correlate with one another and low scores on USMLE Step 1 correlate with failure on subsequent in-training and certification exams in many specialties,9 Step 1 was not designed to be a primary determinant of the likelihood of success in residency. We believe that many other factors are likely to be equally or more predictive of performance during residency. Studies of the predictive value of USMLE Step 1 scores for performance in residency are few, contradictory, and inconclusive. None provides direct evidence that USMLE Step 1 scores can identify the medical students most likely to succeed in developing the broad range of competencies expected during residency.
An Insufficient Tool for the Job
Despite its intended purpose, many residency program directors continue to use applicants’ USMLE Step 1 scores as a sole or primary filter for selecting candidates to interview,10 often disregarding the statistical characteristics of the score scale as they interpret what score difference might be meaningful. In general, the more competitive the residency discipline (e.g., orthopedic surgery, radiation oncology, dermatology, ophthalmology, and otolaryngology), the higher the USMLE Step 1 score needed to pass through the filter.
Like prescribing a drug off-label for purposes that have not been well studied, using USMLE examination scores for a purpose for which the test was not developed and has not been directly validated is ill advised. The USMLE pass score is intended to separate those with adequate knowledge from those with inadequate knowledge, not to infer substantial differences in knowledge between test takers. Yet scores from USMLE Step 1 have been adopted by many residency training programs as a major factor in offering interviews, often because program directors find it difficult to compare students from different medical schools given variations in curricula and assessments. Program directors often report that the flood of initial applicants for competitive residency positions requires some easy-to-apply mechanism to reduce the large applicant pool to a reasonably sized group for more careful scrutiny. The standardized USMLE Step 1 score is perceived to meet this need. On one hand, we pride ourselves on teaching medical students to use diagnostic tests for their designed purpose, to be critical thinkers, and to use evidence-based support to guide their decisions. On the other hand, programs may make career-changing decisions about medical school graduates by overweighting a screening test in a manner not supported by strong evidence and for which the test was not specifically designed.
Although many argue that these examinations’ results should be reported as pass or fail in order to reduce overreliance on scores, this approach could disadvantage those who use scores to help individual learners or their programs improve. For example, there is evidence of some correlation between USMLE and in-training exam performance that could help inform residents and their advisors about knowledge acquisition within the program.11 Given program directors’ current dependence on quantitative data for making decisions about applicants, a pass–fail USMLE may also redirect them to other scores and spur the creation of new knowledge-based exams with less utility than the USMLE.
Beyond concerns about inappropriately using the absolute USMLE score as a sole screen for residency applicants, there are additional unintended consequences of placing so much emphasis on applicants’ USMLE scores. We regularly learn of students who have decided to abandon their plans to apply to certain specialty areas because they believe that a USMLE score around the median means their application will not be considered in the initial screening process, often despite other achievements that might suit them for that specialty. Other students who have high scores are encouraged to pursue the more competitive specialties because they might otherwise “waste their intelligence” in the pursuit of a less demanding discipline!
Because students recognize the high stakes of USMLE, they prioritize learning what they believe to be important for the test during their preclerkship courses. They are emotionally stressed about perceived disconnects between what they need to learn for the test and what they need to know to care for their patients and prepare for lifelong learning. In an unpublished 2012 survey conducted for a reaccreditation visit, Stanford medical students expressed concern that curricular content did not match what they were expected to know to perform well on the USMLE test; the survey revealed that this mismatch caused anxiety in more than 70% of students. This concern and the challenges of work/life balance were tied as the most common sources of stress. It is widely recognized that stress experienced by medical students across the country contributes to the unacceptably high rates of reported burnout and depression (each reported by ~50% of students) and suicidal ideation (reported by ~10%).12–14 We believe that undue emphasis on USMLE Step 1, driven by its common role in screening for residency selection, contributes unnecessarily to this stress.
While the USMLE program has evolved in parallel with changes in medical education15 (e.g., by emphasizing application of knowledge in clinical context rather than isolated recall, by expanding focus on competencies other than medical knowledge, and by emphasizing choosing the best course of action in ambiguous situations), overemphasis on the exams, particularly on Step 1, risks distorting students’ perception of what is important. Equally, although the U.S. medical education system has been marked by innovation, it is possible that undue emphasis on USMLE Step 1 scores could distort faculty perceptions of the relative importance of the medical knowledge competency over the other five important general competencies, potentially creating an adverse impact on curriculum change. For example, students may consider a discussion of unsolved problems in medicine and how to approach what we do not know as a waste of valuable time, time that could be spent learning about what is known and what likely will be tested. Curricular reform, a critical part of quality improvement in medical education, must move away from a focus on the acquisition of testable facts. Although USMLE has evolved from testing knowledge to applying knowledge in solving clinically relevant problems, misperceptions about this evolution of USMLE could inhibit desirable curricular change.
Finally, there are educational and fiscal consequences attendant to what some have referred to as “Step 1 madness.” We do not dispute that study is valuable, and many students, at least in retrospect, value the synthesis of learning that accompanies their focused preparation for the examinations. However, no specific test preparation routines or materials have been shown to be superior; the most effective means of preparation likely depends on each test taker’s individual learning style.
Medical school deans of education tell us that students at most U.S. medical schools sequester themselves for four to nine (average six) weeks to study full-time for USMLE Step 1. If each of the approximately 20,000 U.S. medical students who take the examination each year devotes six weeks to studying for it, the total amount of time dedicated to studying is more than 2,000 person-years. That is quite the opportunity cost! Apparently, many students do not study from their school-specific syllabi or standard medical textbooks; rather, they use material purchased from third-party for-profit companies. The three most commonly used resources as reported by our students are First Aid USMLE Step 1 (the top-selling medical test preparation text on Amazon), Step 1 qBank from USMLE World, and Pathoma. Based on current pricing of these resources on Amazon, if every student in the United States were to purchase all three resources, we estimate that the total cost would exceed $7.5 million.
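The figures above follow from simple arithmetic. A minimal back-of-envelope sketch in Python, assuming roughly 20,000 examinees per year, six weeks of full-time study each, and a combined price of about $375 per student for the three resources (an assumed figure consistent with the $7.5 million total, not an exact quote from any vendor):

```python
# Back-of-envelope check of the opportunity-cost estimates above.
# All inputs are rough assumptions, not exact data.

STUDENTS_PER_YEAR = 20_000   # approximate U.S. examinees per year
STUDY_WEEKS_EACH = 6         # average weeks of dedicated study
WEEKS_PER_YEAR = 52

# Aggregate study time in person-years
person_years = STUDENTS_PER_YEAR * STUDY_WEEKS_EACH / WEEKS_PER_YEAR
print(f"Study time: ~{person_years:,.0f} person-years")  # ~2,308

# Aggregate cost if every student bought all three resources
COST_PER_STUDENT = 375       # assumed combined retail price
total_cost = STUDENTS_PER_YEAR * COST_PER_STUDENT
print(f"Resource cost: ${total_cost:,}")  # $7,500,000
```

At these assumed inputs, the totals land just above the 2,000 person-years and $7.5 million cited in the text.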
A More Rational Approach to Selecting Residency Applicants
What can be done to mitigate the unintended consequences of overreliance on USMLE Step 1 scores in screening residency applicants? We do not believe that the examinations need to be abandoned. Nonetheless, we do not believe that USMLE Step 1 scores should continue to be the major determining factor in the selection of graduating medical students for interview for graduate medical education positions. These scores do not measure many clinical aptitudes and skills, qualities of professionalism, or competencies specific to the planned training program. Although using numbers as a filter is a convenient way to screen large numbers of applications, USMLE Step 1 scores do not come close to reflecting the totality of attributes critically relevant to a candidate’s potential performance during residency training.
A more rational approach to selecting among residency applicants would give greater attention to other important qualities, such as clinical reasoning, patient care, professionalism, and ability to function as a member of a health care team. These attributes are typically considered by program directors in making ranking decisions when selecting residents. However, for measures of these skills to be valuable in comparing large numbers of initial applicants from many diverse medical education backgrounds for more extensive evaluation, we need more standardized modes of assessment and reporting that are readily sortable. Other components of a holistic review of candidates should be nationally normed as well; these might include research experience and accomplishments, community engagement, leadership roles, unique personal attributes, and diversity. Measures derived from performance during clinical rotations and recommendations from clinical faculty, residents, other health professionals, patients, and peers must be more generalizable and interpretable across institutions, and they need to be represented digitally in a form that can be used in a prediction equation for screening large numbers of applicants.
We further recommend that substantial weight in evaluating the relative merit of candidates for specialty training should be given to factors shown empirically to predict performance in the relevant specialty. This performance might include evaluation during the core clerkship, performance during specialty-specific subinternships and electives, and any additional specialty-specific activities (e.g., research) that the applicant may have conducted. If supported by research findings and shown to be generalizable, assessment of these types of activities would represent a measure of competencies much more relevant than the USMLE Step 1 scores alone.
We believe that the benefits of this proposed approach would be substantial. Residency programs would be able to select candidates for interview and in-depth evaluation on the basis of characteristics that specifically predict success. Students could prioritize the development of these skills because they would take a more reasoned approach to preparing for exams that reflect only a sample of the breadth and depth of relevant competencies. Schools might experience less pressure to design their curricula to mirror what will be tested and to set aside many weeks of potential education time for students to study from third-party resources. Without the pressure of test preparation, educational innovation could continue to flourish, schools could focus on their unique faculties and strengths, and students could better balance in-depth pursuit of their own academic interests with review of already-encountered material likely to be on the test.
These reforms could have far-reaching effects on medical education. We urge prioritization of the actions necessary to implement them: more rigorous study of the characteristics of students that predict success in residency, better assessment tools for competencies beyond those assessed by USMLE Step 1 that are relevant to success, and nationally comparable measures from those assessments that are easy to interpret and apply.
2. Dillon GF, Clauser BE, Melnick DE. The role of USMLE scores in selecting residents. Acad Med. 2011;86:793
3. Tamblyn R, Abrahamowicz M, Brailovsky C, et al. Licensing examination scores and primary care practice. JAMA. 1998;280:989–996
4. Tamblyn R, Abrahamowicz M, Dauphinee WD, et al. Association between licensure examination scores and practice in primary care. JAMA. 2002;288:3019–3026
5. Hess BJ, Weng W, Lynn LA, Holmboe ES, Lipner RS. Setting a fair performance standard for physicians’ quality of patient care. J Gen Intern Med. 2011;26:467–473
6. Holmboe ES, Wang Y, Meehan TP, et al. Association between maintenance of certification examination scores and quality of care for Medicare beneficiaries. Arch Intern Med. 2008;168:1396–1403
7. Norcini JJ, Blank LL, Arnold LR, Levine MA. The relationship between licensing examination performance and the outcomes of care by international medical graduates. Acad Med. 2014;89:1157–1162
8. Green M, Jones P, Thomas JX Jr. Selection criteria for residency: Results of a national program directors survey. Acad Med. 2009;84:362–367
9. Hamdy H, Prasad K, Anderson MB, et al. BEME systematic review: Predictive values of measurements obtained in medical schools and future performance in medical practice. Med Teach. 2006;28:103–116
11. Spurlock DR Jr, Holden C, Hartranft T. Using United States Medical Licensing Examination(®) (USMLE) examination results to predict later in-training examination performance among general surgery residents. J Surg Educ. 2010;67:452–456
12. Roberts LW. Understanding depression and distress among medical students. JAMA. 2010;304:1231–1233
13. Schwenk TL, Davis L, Wimsatt LA. Depression, stigma, and suicidal ideation in medical students. JAMA. 2010;304:1181–1190
14. Dyrbye LN, Massie FS Jr, Eacker A, et al. Relationship between burnout and professional conduct and attitudes among US medical students. JAMA. 2010;304:1173–1180
15. Swanson DB, Case SM, Kelley PR, et al. Phase-in of the NBME comprehensive Part I examination. Acad Med. 1991;66:443–444