The United States Medical Licensing Examination (USMLE) Step 3 is the final step in the medical licensing sequence of examinations. Its purpose is to determine whether a physician “possesses and can apply the medical knowledge and understanding of clinical science considered essential for the unsupervised practice of medicine, with emphasis on patient management in ambulatory care settings.”1 As this examination measures residents’ clinical knowledge necessary for independent practice of medicine, it focuses on clinical, patient-based assessments. The Step 3 examination is completed after the prerequisite Step 1 and Step 2 examinations, on which examinees have demonstrated adequate mastery of basic science and clinical fundamentals, respectively, necessary to provide patient care under supervision.

Since the inception of the Step 3 examination, U.S. medical schools have regularly received summary information from the National Board of Medical Examiners (NBME) for their graduates who took the Step 3 examination and the number of their graduates who passed it compared to national benchmarks. Until 2002, however, individualized USMLE Step 3 scores were not provided to U.S. medical schools for their graduates. Hence, there are limited data available regarding correlates of USMLE Step 3 performance among U.S. allopathic medical school graduates.

Earlier studies pertaining to the NBME Part III examination (the predecessor to the USMLE Step 3 examination) documented strong positive correlations between performance on the Part III examination and scores on Parts I and II.2,3 Recent performance-data summaries published by the NBME documented examinee-performance differences on Step 3 similar to those for the Step 1 and Step 2CK (clinical knowledge) examinations. Pass rates were higher for U.S. and Canadian medical-degree program graduates and for U.S. and Canadian Doctor of Osteopathy program graduates compared to graduates of non-U.S.
or Canadian schools, and pass rates for first-time examination takers within each of these groups were higher compared to repeaters.1 Field of specialty for postgraduate training and length of time in postgraduate training have also been reported to impact USMLE Step 3 performance.4 In 2002, the NBME began providing individualized USMLE Step 3 scores to U.S. medical schools for their graduates who agreed to release this information. To date, however, the associations between Step 3 scores and other variables commonly used by medical schools in educational-outcomes assessments have not been fully explored. Since an important component of our educational-outcomes-assessment program is collection of data pertaining to our graduates’ progress through residency training and beyond, we undertook the present study to identify independent predictors of USMLE Step 3 performance among our cohort of U.S. medical school graduates.
All graduates from our institution in the classes of 2000 to 2003 for whom first-attempt, three-digit USMLE Step 3 scores were available were included in our analysis. We analyzed Step 3 scores in association with four measures of academic achievement during medical school: first-attempt USMLE Step 1 and Step 2 scores, third-year clinical clerkships’ grade point average (GPA), and Alpha Omega Alpha (AOA) election. In this report, “Step 2” refers to the examination currently identified by the NBME as Step 2CK (clinical knowledge). The GPA calculation was based on the grades received in each required clinical clerkship, weighted for the duration in weeks of the clerkship. Grades did not include performance on either Step 1 or Step 2.
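As a concrete illustration of the duration-weighted GPA described above, the sketch below computes a clerkship GPA weighted by clerkship length in weeks. The clerkship names, grades, and week counts are hypothetical examples, not the school's actual curriculum or grading data.

```python
# Hypothetical required clerkships: (name, grade on a 4-point scale, weeks).
# All values are illustrative, not taken from the study.
clerkships = [
    ("internal medicine", 4.0, 12),
    ("surgery",           3.0,  8),
    ("pediatrics",        4.0,  8),
    ("obstetrics",        3.0,  6),
    ("psychiatry",        4.0,  4),
    ("neurology",         3.0,  4),
]

# Each grade contributes in proportion to the clerkship's duration.
total_weeks = sum(weeks for _, _, weeks in clerkships)
gpa = sum(grade * weeks for _, grade, weeks in clerkships) / total_weeks
print(round(gpa, 2))  # -> 3.57
```

A simple unweighted mean of the six grades here would be 3.50; weighting by weeks shifts the GPA toward the grades earned in the longer rotations.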
Step 3 scores were also analyzed for associations with Medical Scientist Training Program graduation with MD/PhD degrees; gender; training in a general, broad-based residency (as defined by the NBME to include internal medicine, family practice, pediatrics, combined medicine–pediatrics, or emergency medicine4); and residency program-director performance evaluations for the first postgraduate year (PGY-1).
The program-director performance evaluation questionnaire is a 21-item instrument mailed near the end of PGY-1 training to evaluate our graduates’ preparedness for and performance in their residency program. Item responses use a five-point scale: inadequate (1), fair (2), good (3), excellent (4), and outstanding (5). We computed for each graduate a “program-director performance evaluation mean composite score” based on these program-director responses.5
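A minimal sketch of the composite calculation described above, assuming one completed questionnaire: the composite is simply the mean of the 21 item ratings. The ratings shown are invented for illustration.

```python
# Hypothetical responses to the 21 questionnaire items on the five-point
# scale (1 = inadequate ... 5 = outstanding); values are illustrative only.
ratings = [4, 3, 5, 4, 4, 3, 4, 5, 4, 3, 4,
           4, 5, 3, 4, 4, 4, 3, 5, 4, 4]
assert len(ratings) == 21  # one rating per questionnaire item

# Mean composite score for this graduate.
composite = sum(ratings) / len(ratings)
print(round(composite, 2))  # -> 3.95
```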
Chi-square tests measured associations between each of the categorical variables (gender, AOA election, specialty choice, MD/PhD graduation), and one-way analyses of variance tested for between-groups differences in GPA, USMLE Steps 1–3 scores, and program-director performance evaluation mean composite scores. Pearson product-moment correlations tested for the significance of associations among each of the continuous variables. Stepwise multiple linear regression analysis was used to identify significant predictors of Step 3 score among those variables that were significantly associated with Step 3 score in bivariate tests. Tests of significance were performed using SPSS version 13.0 (SPSS, Inc., Chicago, IL, 2004). All p values are two-sided. This study was approved by the Institutional Review Board at our institution.
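The stepwise regression described above was run in SPSS. As a rough illustration of how such variable selection works, the sketch below implements greedy forward selection on synthetic data, using adjusted R² as the entry criterion rather than SPSS's p-value-based entry and removal rules. All variable names and data are hypothetical, not the study data.

```python
import numpy as np

def adjusted_r2(y, X):
    """Ordinary least squares fit; adjusted R^2 penalizes extra predictors."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    n, p = X1.shape
    return 1.0 - (ss_res / (n - p)) * (n - 1) / ss_tot

def forward_stepwise(y, predictors):
    """Greedily add the predictor that most improves adjusted R^2;
    stop when no remaining predictor improves the fit."""
    selected, best = [], -np.inf
    while len(selected) < len(predictors):
        trials = {
            name: adjusted_r2(
                y, np.column_stack([predictors[s] for s in selected] + [x])
            )
            for name, x in predictors.items() if name not in selected
        }
        name = max(trials, key=trials.get)
        if trials[name] <= best:
            break
        selected.append(name)
        best = trials[name]
    return selected, best

# Synthetic data loosely shaped like the study's variables (illustrative only).
rng = np.random.default_rng(0)
n = 216
step2 = rng.normal(215, 20, n)                 # Step 2 score
gpa = 0.01 * step2 + rng.normal(3.0, 0.3, n)   # clerkship GPA, mildly correlated
noise = rng.normal(size=n)                     # a pure-noise predictor
step3 = 0.6 * step2 + 10 * gpa + rng.normal(0, 8, n)

predictors = {"step2": step2, "gpa": gpa, "noise": noise}
selected, model_r2 = forward_stepwise(step3, predictors)
print(selected, round(model_r2, 2))
```

With these synthetic effect sizes, the strongest predictor (here `step2`) enters first, and `gpa` enters because it retains independent predictive value after `step2` is in the model, mirroring the pattern reported in the Results.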
Two hundred ninety-five graduates in the classes of 2000 to 2003 took the USMLE Step 3 examination in 2002 or later, and 237 of these graduates chose to release their scores (237/295, 80%). The group of 58 graduates who took the Step 3 examination but did not release their scores did not differ academically from the group of 237 graduates who did release their scores on the basis of GPA, USMLE Step 1 or 2 score, or program-director performance evaluation mean composite score (data not shown, each p > .05). The group of 237 graduates in the classes of 2000 to 2003 for whom individualized USMLE Step 3 scores were available (237/444, 53%) comprised our study sample.
Descriptive statistics of the sample are shown in Table 1. The mean Step 3 score for the sample was 226 (range, 161–269). Availability of Step 3 scores differed by year of graduation, with scores available for 30% of the graduates (31/103) in the class of 2000, 66% (81/123) in the class of 2001, 67% (73/109) in the class of 2002, and 48% (52/109) in the class of 2003 (p < .001). However, mean Step 3 scores did not differ by year of graduation (p = .834); therefore, all four years of graduates were analyzed together as a single group. Step 3 scores also did not differ by gender (p = .891). Higher Step 3 scores were associated with AOA election (235 versus 224, p < .001) and residency training in a general, broad-based specialty (229 versus 222, p < .001). Lower Step 3 scores were associated with MD/PhD-program graduation (219 versus 227, p = .016). As shown in Table 2, the three measures of academic performance during medical school (GPA and USMLE Step 1 and Step 2 scores), as well as program-director performance evaluation mean composite scores of our graduates, were associated with each other and with Step 3 scores (each p < .001).
Variables that were significantly associated with Step 3 scores in bivariate analyses (AOA election, MD/PhD graduation, selection of a general broad-based specialty for residency training, USMLE Step 1 and 2 scores, GPA, and program-director performance evaluation mean composite scores) were included in a stepwise multiple linear regression model to identify independent predictors of Step 3 scores. In this model, which included 216 graduates with complete data (program-director questionnaires were not completed for 21 of 237 graduates), Step 2 score (Standardized Beta = .604, p < .001), selection of a general broad-based specialty for residency training (Standardized Beta = .206, p < .001), and GPA (Standardized Beta = .158, p = .003) each predicted Step 3 scores. The variables in this model accounted for 53% of the variance in Step 3 scores. Mean Step 2 scores did not differ significantly between graduates choosing or not choosing a broad-based specialty (p = .143).
Our study indicates that performance on the Step 3 examination, a measure of the clinical-science knowledge necessary for unsupervised practice of medicine, is associated with medical students’ academic achievement in clinical clerkships, their knowledge of clinical science as measured on the Step 2 examination, and the nature of their residency training following graduation: higher GPAs, higher Step 2 scores, and residency training in broad-based specialties were each associated with higher Step 3 scores. Our finding that Step 2 but not Step 1 scores independently predicted Step 3 performance is not unexpected and likely reflects the more similar clinically related content of the Step 2 and Step 3 examinations. Significant associations among measures of academic performance during medical school have been documented by others.6,7 However, our finding that Step 3 examination scores are independently predicted by third-year clerkship GPA provides external validation for our clinical-clerkship performance-assessment system and for the role of graduates’ clerkship experiences in preparing students to advance through the medical-education continuum toward fully independent practice.
In the bivariate analysis, Step 3 scores were significantly associated with another independent assessment of graduate performance, the program-director performance-evaluation mean composite scores. (Program-director evaluations are not likely influenced by Step 3 scores, since the NBME does not routinely send Step 3 scores to program directors.) Previous studies have shown that program-director evaluations were associated with academic performance during medical school, but none have shown a correlation between program-director evaluations and Step 3 scores, both of which are postgraduate assessments of performance.5–7 The relatively small magnitude of the correlation observed between these two measures (Table 2) suggests that the Step 3 examination and the items on the program-director performance evaluation questionnaire evaluate different aspects of graduate performance. As they provide additive rather than redundant information, the availability of both program-directors’ performance evaluations and individualized Step 3 scores enhances the scope of outcomes assessments for our graduates.
Gender-based differences in Step 1 performance (with women scoring lower than men) but not in Step 2 performance have been documented by the NBME.8 Our finding of similar Step 3 scores for men and women, along with findings of another single-school study, suggests that these gender differences do not persist in more clinically based, standardized postgraduate assessments.9
Our finding that residency training in a broad-based specialty independently predicts Step 3 performance is consistent with a recent NBME report demonstrating that higher Step 3 examination scores were associated with postgraduate training in broad-based specialties prior to taking the Step 3 examination.4 We utilized the NBME definition of “broad-based specialties” in our analysis so that we could interpret our results in the context of this large NBME study. However, our study differs from the NBME report in several respects. First, we included only those graduates who had chosen to take the USMLE Step 3 examination less than five years after graduation and release their scores to our medical school; the NBME study included graduates taking the examination more than seven years after graduation. Second, we determined specialty choice from graduates’ final residency placement for categorical/advanced residency training, rather than from graduate self-reports. Third, while over 70% of the graduates included in the NBME study pursued broad-based specialty training, only 50% of our graduates did so. Nevertheless, our findings are consistent with the conclusion of the NBME that clinical experience gained in residency training in broad-based specialties is associated with higher Step 3 scores, as this examination focuses on knowledge about a wide variety of patient problems.
After graduation, most U.S. medical students intend to enter advanced/categorical residency positions for the full period of residency training required for specialty board certification eligibility, rather than complete only a single year of training in order to obtain a permanent medical license and enter practice.10 For these graduates, the detailed “profile” information about Step 3 performance can provide important feedback about their professional development and facilitate their ongoing self-assessment as they advance through additional years of graduate medical education. Future research might focus on determining the extent to which USMLE Step 3 scores are associated with other clinically related measures of performance during residency training in a spectrum of specialties, as well as with physicians’ performance beyond residency (e.g., patient outcomes in medical practice and physicians’ achievement of specialty-board certification), to further clarify the role of the Step 3 examination as a professional-development assessment tool.
Our sample included a self-selected group of graduates from one medical institution who chose to take the Step 3 examination less than 50 months after medical school graduation and to release their scores. Therefore, their Step 3 performance may not be representative of the performance for our entire group of medical school graduates or of graduates of other medical institutions. While identification of factors that determine graduate decisions regarding examination timing and whether or not to release their scores is of interest, it was beyond the scope of our study.
Undoubtedly, the role of Step 3 will continue to evolve as the USMLE Step 2 clinical skills examination becomes part of the licensure process. Currently, however, individualized USMLE Step 3 performance data can provide medical schools with an additional means to externally validate their medical-education programs and to enhance the scope of outcomes assessments for their graduates as they advance along the medical education continuum beyond medical school.
2 Swanson DB, Case SM, Nungester RJ. Validity of NBME Part I and Part II scores in prediction of Part III performance. Acad Med. 1991;66(10 suppl):S7–S9.
3 Markert RJ. The relationship of academic measures in medical school to performance after graduation. Acad Med. 1993;68:S31–S34.
4 Sawhill AJ, Dillon GF, Ripkey DR, et al. The impact of postgraduate training and timing on USMLE Step 3 performance. Acad Med. 2003;78(10 suppl):S10–S12.
5 Andriole D, Jeffe DB, Whelan A. What predicts surgical internship performance? Am J Surg. 2000;75(10 suppl):S28–S30.
6 Alexander GL, Davis WK, Yan AC, Fantone JC III. Following medical school graduates into practice: residency directors’ assessments after the first year of residency. Acad Med. 2000;75(10 suppl):S15–S17.
7 Paolo A, Bonaminio GA. Measuring outcomes of undergraduate medical education: Residency directors’ ratings of first year residents. Acad Med. 2003;78:90–95.
8 Case SM, Swanson DB, Ripkey DR, et al. Performance of the Class of 1994 in the new era of USMLE. Acad Med. 1996;71(10 suppl):S91–S93.
9 Veloski JJ, Callahan CA, Xu G, et al. Prediction of students’ performances on licensing examinations using age, race, sex, undergraduate GPAs and MCAT scores. Acad Med. 2000;75(10 suppl):S28–S30.
10 Signer M. Results and Data 2004 Match. Washington, DC: National Resident Matching Program, 2004.
Moderator: Lee Manchul, MD
Discussant: Eric Holmboe, MD

© 2005 Association of American Medical Colleges