To the Editor:
Much published work on the use of United States Medical Licensing Examination (USMLE) Step 1 scores for residency selection has provided evidence for content and criterion-related validity,1,2 but construct-related validity evidence is conspicuously missing. Construct validity is the degree to which a test measures what it is intended to measure; evidence for it includes indices of reliability, difficulty, and discrimination. This type of evidence is necessary to evaluate secondary uses of numeric Step 1 scores, yet it remains virtually absent from public discourse. Even after Step 1 moves to pass/fail reporting in 2022, understanding the role of construct-related evidence will remain crucial for licensing scores that are still reported numerically, such as Step 2 Clinical Knowledge (CK).
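For readers less familiar with these indices, the following Python sketch computes textbook classical-test-theory versions of them: item difficulty as the proportion of examinees answering an item correctly, item discrimination as the corrected item-total (point-biserial) correlation, and score reliability as Kuder-Richardson Formula 20 (KR-20). The function names and the toy response matrix are hypothetical and chosen only for illustration; these are generic definitions, not the procedures the USMLE program uses.

```python
"""Hypothetical sketch of classical item analysis.
Rows = examinees, columns = items; 1 = correct, 0 = incorrect.
Generic textbook indices, not the USMLE program's own methods."""
from statistics import mean, pstdev, pvariance


def item_difficulty(responses: list[list[int]]) -> list[float]:
    """Proportion of examinees answering each item correctly (p-value)."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def pearson(x: list[float], y: list[float]) -> float:
    """Population Pearson correlation; 0.0 if either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def item_discrimination(responses: list[list[int]]) -> list[float]:
    """Corrected item-total correlation: each item vs. the total on the other items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson Formula 20 reliability for dichotomously scored items."""
    k = len(responses[0])
    p = item_difficulty(responses)
    item_var_sum = sum(pj * (1 - pj) for pj in p)  # population item variances
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    toy = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(item_difficulty(toy))  # [0.8, 0.6, 0.4, 0.2]
    print(round(kr20(toy), 2))   # 0.8
```

KR-20 summarizes reliability for the scale as a whole; it says nothing about how precise scores are within a particular score range, which is the gap at issue in this letter.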
Step 1 was designed to discriminate well among examinees at the lower end of the scale (e.g., the passing score of 194 falls at roughly the fifth percentile). Construct-related evidence for this purpose is limited, but it does exist. For example, the test's reliability is highest around the cut score.3 However, even less construct-related evidence is available for the test's use in residency selection, and what exists is limited to reliability indices for the score scale as a whole. One of only 3 available indices, the "standard error of difference," hints at the test's primary purpose as a credentialing examination: across the score scale, 2 scores must be roughly 16 or more points apart to indicate a statistically meaningful difference between examinees' proficiencies.4 Concerningly, no evidence is publicly available regarding how this reliability changes in the score range of 209–249, where two-thirds of students currently score.4 Despite industry standards for transparency, such data remain undisclosed. The test may perform well in this region of the score scale, but until construct-related evidence is released for the purpose of residency selection, residency directors and other score users will not know how well, or how consistently, it does so.
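The arithmetic behind the standard error of difference (SED) can be sketched in a few lines of Python. Under the usual classical-test-theory assumption of independent measurement errors, the SED of two scores is the square root of the sum of their squared standard errors of measurement (SEMs), and a gap is conventionally treated as meaningful when it is at least about 2 SEDs. The function names, the z = 2 convention, and the SED of 8 points in the example are illustrative assumptions, the last chosen only so that 2 × SED reproduces the roughly 16-point threshold cited in the letter.

```python
import math


def sed_from_sems(sem_a: float, sem_b: float) -> float:
    """Standard error of difference, assuming independent measurement errors."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)


def scores_differ(score_a: float, score_b: float, sed: float, z: float = 2.0) -> bool:
    """True if the gap is at least z standard errors of difference.
    z = 2.0 is an assumed convention (about 95% confidence), not a USMLE rule."""
    return abs(score_a - score_b) >= z * sed


if __name__ == "__main__":
    ASSUMED_SED = 8.0  # hypothetical value: 2 x 8 = 16 points
    print(scores_differ(230, 240, ASSUMED_SED))  # False
    print(scores_differ(225, 241, ASSUMED_SED))  # True
```

With an assumed SED of 8, scores of 230 and 240 would not be distinguishable, whereas 225 and 241 would. If the SEM, and hence the SED, were smaller or larger within the 209–249 range than across the scale as a whole, the same calculation would yield a different threshold, which is why range-specific indices matter for residency selection.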
For Step 1 scores reported within the next 2 years, the USMLE program should release difficulty, discrimination, and reliability indices related to the test's use in residency selection. Additionally, Step 2 CK has the same published reliability indices as Step 1,4 so related validity issues may quickly become pressing after 2022. For the secondary use of any licensing examination, residency directors should have access to appropriate evidence for the score range in which they grant interviews, and to guidelines for comparing scores at each point on the score scale.
Andrea Malek Ash, MS
Doctoral student, University of Iowa, Iowa City, Iowa; ORCID: https://orcid.org/0000-0003-1749-2024.
1. McGaghie WC, Cohen ER, Wayne DB. Are United States Medical Licensing Exam Step 1 and 2 scores valid measures for postgraduate medical residency selection decisions? Acad Med. 2011;86:48–52.
2. Prober CG, Kolars JC, First LR, Melnick DE. A plea to reassess the role of United States Medical Licensing Examination Step 1 scores in residency selection. Acad Med. 2016;91:12–15.
3. Federation of State Medical Boards of the United States. Report on licensing examinations. http://www.fsmb.org/siteassets/advocacy/policies/report-on-licensing-examinations.pdf. Published 2001. Accessed May 28, 2020.
4. United States Medical Licensing Exam. USMLE score interpretation guidelines. https://usmle.org/pdfs/transcripts/USMLE_Step_Examination_Score_Interpretation_Guidelines.pdf. Updated January 31, 2020. Accessed May 28, 2020.