U.S. medical schools granting the MD degree use scores on Steps 1 and 2 of the United States Medical Licensing Examination (USMLE) as objective assessments of medical student performance. Indeed, achieving passing scores on all three Steps of the USMLE is a requirement for medical licensure for these graduates.1 While passing these examinations is not the primary focus of basic science and clinical education in medical school, it is an important curricular outcome.
Clinical clerkships incorporate various measures to assess student performance, including objective examinations. Many clerkships use the National Board of Medical Examiners (NBME) Clinical Subject (“shelf”) examinations as one component of the final clerkship examination. Although one may assume that the scores on subject examinations in core clerkships correlate with performance on the USMLE Step 2 Clinical Knowledge (CK) examination, the data supporting this assumption are limited.2–5 The literature addresses several other variables potentially associated with Step 1 and/or Step 2 CK scores such as clinical performance, Medical College Admission Test (MCAT) scores, medical school grade point average (GPA), and timing or length of clerkships. Several studies have also investigated correlations between subject exam scores and subtest scores of the USMLE examination. The only studies addressing the relationship between subject exam performance and overall USMLE scores are in family medicine and obstetrics and gynecology; these studies analyzed the correlations of the USMLE exam with their respective clerkship subject exam.2–4,6–8 Only one study considered the relationship between student performance on subject examinations across multiple clerkships and performance on Step 2 CK examinations; that study addressed the effect of clerkship timing and length.5 Correlating subject exam scores with performance on the USMLE examinations could be important to undergraduate medical educators interested in predicting outcomes on national standardized examinations, and to graduate medical educators for selecting residents.
The purpose of this study was to determine whether the NBME Clinical Subject Examination performances of medical students from our medical school’s six core clerkships correlated with their scores on the USMLE Steps 1 and 2 CK Examinations. We also sought to correlate these students’ GPAs for basic science courses (likely more relevant to Step 1) and clerkships (likely more relevant to Step 2 CK) with subject exam and USMLE performance. We used the theory of context specificity as a conceptual framework for our study. Context specificity9 argues that performance is uniquely tied to context and therefore performance on one subject examination (one context) would not be expected to be strongly correlated with performance on other subject examinations (other contexts) for a given student.
With this framework in mind, we posed several hypotheses:
- Performance on an individual clerkship subject exam will have a small-to-moderate correlation with performances on USMLE Step 1 and Step 2 CK examinations, particularly when considering all exams given over a clerkship year.
- A combination of primary care subject exam variables (Family Medicine, Internal Medicine, and Pediatrics Subject Examinations) will account for a moderate-to-large amount of variance in Step 2 CK scores, both because primary care topics permeate all required clerkships and also because the Step 2 CK examination features a large number of topics covered in primary care clerkships.
- Second-year cumulative GPA will correlate to a greater degree with the Step 1 score than the Step 2 CK score.
- The more clinically oriented third-year GPA will have a larger correlation with the Step 2 CK score than with the Step 1 score.
Study context and participants
As the United States’ only federal medical school, the F. Edward Hébert School of Medicine at the Uniformed Services University of the Health Sciences (USU) matriculates approximately 170 medical students annually. Upon matriculation, students are commissioned as military officers and, after graduation, most proceed to military-affiliated residency programs. At the time of this study (2011), USU offered a traditional four-year curriculum, with two years of basic science-focused courses followed by two years of clinically oriented education, including a year of core clerkship rotations during the third year.
Our study sample consisted of the 507 members of the 2008, 2009, and 2010 graduating classes. The core clerkship year consisted of six rotations: Family Medicine (6 weeks), Internal Medicine (12 weeks), Surgery (12 weeks), Psychiatry (6 weeks), Pediatrics (6 weeks), and Obstetrics and Gynecology (6 weeks).
All students at USU are required to pass USMLE Steps 1 and 2 CK examinations to graduate. Step 1 examinations are given at the end of the second year, and students typically complete Step 2 CK in the fall of the fourth year. Students who fail either Step are permitted to re-take the exam; study time and remediation resources are provided for students needing assistance. An NBME Clinical Subject Examination is given at the end of every core clerkship in the third year. The subject exam score is incorporated into the final grade for students in all clerkships, although the weight assigned to the subject exam varies by clerkship. A passing score on the subject exam is required for passing an individual clerkship. A student with a failing subject exam score is permitted to re-take the exam to achieve a passing clerkship grade. Failure on the re-take results in a remediation rotation specified according to clerkship policies.
Study variables and statistical analysis
We obtained students’ clerkship grades, GPAs, and USMLE Steps 1 and 2 CK exam final attempt scores as part of the Long-Term Career Outcome Study (LTCOS), which is a comprehensive program evaluation investigation of performance measures of USU medical students as well as the academic and clinical careers of graduates from USU. Two separate GPAs were included: the cumulative GPA for the first two years, corresponding primarily to basic science courses, and the third-year GPA, corresponding to core clerkship rotations.
The primary outcome variables were USMLE Step 1 and 2 CK exam scores. The explanatory variables were the NBME Subject Exam scores for each of the clerkships and the students’ mean GPAs in their second and third years. We first calculated descriptive statistics for the study variables and the Pearson correlations between USMLE scores and the explanatory variables listed above. We then conducted stepwise linear regression to further investigate the associations between the explanatory variables and the two primary outcome measures. Using context specificity as our conceptual framework, and based on the hypothesized substantial contribution of the primary care disciplines (family medicine, internal medicine, and pediatrics) to the Step 2 CK exam variance, we first entered the NBME Subject Exam scores of the primary care clerkships in the model, followed by the Subject Exam scores for the three remaining clerkships. Next, the cumulative second-year GPA was entered in the regression model of the Step 1 score, with the clerkship year GPA substituted for the cumulative second-year GPA in the regression model of the Step 2 CK score. Significance was set at P <.05. All statistical analyses were performed with SPSS Version 19.0 (IBM SPSS Statistics, IBM, Armonk NY; 2011). This study was approved by the USU Institutional Review Board.
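The block-entry order described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the SPSS procedure used in the study: the helper names (`r_squared`, `blockwise_r2`), the simulated variables, and the noise levels are all assumptions made for the example, and the printed numbers say nothing about the study's results.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (an intercept is added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals.var() / y.var()

def blockwise_r2(blocks, y):
    """Enter predictor blocks in the given order; return cumulative R^2 after each block."""
    entered, cumulative = [], []
    for block in blocks:
        entered.append(block)
        cumulative.append(r_squared(np.column_stack(entered), y))
    return cumulative

# Synthetic data only (hypothetical): a shared "ability" factor drives every score.
rng = np.random.default_rng(0)
n = 484
ability = rng.normal(size=n)
primary_care = ability[:, None] + rng.normal(size=(n, 3))      # three primary care exams
other_clerkships = ability[:, None] + rng.normal(size=(n, 3))  # three remaining exams
gpa = ability[:, None] + rng.normal(scale=0.5, size=(n, 1))    # one GPA variable
step_score = ability + rng.normal(scale=0.6, size=n)           # simulated outcome

cumulative = blockwise_r2([primary_care, other_clerkships, gpa], step_score)
increments = np.diff([0.0] + cumulative)  # additional variance explained by each block
for name, inc in zip(["primary care", "other clerkships", "GPA"], increments):
    print(f"{name}: +{inc:.3f}")
```

Because each block is credited only with the variance not already explained by earlier blocks, reordering the list passed to `blockwise_r2` changes the increments even though the final R² stays the same.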
Results
Of the 507 students across the three classes considered, we had complete data for a total of 484 students (95.5%). The demographic data for the 507 students are shown in Table 1. The overall descriptive statistics for the subject exams and the USMLE exams for the 484 included students are presented in Table 2. The mean cumulative second-year GPA for all students was 3.05 (SD = 0.47) and the mean third-year GPA was 3.18 (SD = 0.43).
Table 3 presents the bivariate correlations of the explanatory variables and USMLE scores. Correlation analysis between the composite (average) subject exam scores across all 6 clerkships and the USMLE Step 1 and 2 CK exams was also performed; the correlations with the USMLE Step 1 and Step 2 CK exams were quite strong (0.69 [P <.001] and 0.77 [P <.001], respectively). As shown, USMLE Step 1 scores had moderate-to-high positive correlations with all subject exam scores and with the cumulative GPA at the end of the second year, ranging from 0.46 (95% CI: 0.39, 0.53, P <.01) to 0.74 (95% CI: 0.70, 0.78, P <.01). USMLE Step 2 CK scores were also positively correlated with all explanatory variables, with correlations ranging from 0.51 (95% CI: 0.44, 0.57, P <.01) to 0.68 (95% CI: 0.63, 0.73, P <.01). In addition, NBME Clinical Subject Exam scores for the different clerkships were positively correlated with each other and with the cumulative second-year GPA and with the third-year GPA.
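The 95% confidence intervals quoted above appear consistent with the standard Fisher z-transformation applied to a Pearson correlation with n = 484. The sketch below shows that calculation; it is an assumption about how such intervals can be reproduced, not a description of the software procedure actually used, and the function name `pearson_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z-transformation."""
    z = math.atanh(r)                                # Fisher z of the sample correlation
    se = 1.0 / math.sqrt(n - 3)                      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

low, high = pearson_ci(0.46, 484)
print(f"r = 0.46, n = 484: 95% CI ({low:.2f}, {high:.2f})")
```

With r = 0.46 and n = 484, this gives approximately (0.39, 0.53), in line with the interval reported in the text.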
The results of the stepwise linear regression modeling are provided in Chart 1. When entered into the regression model first, NBME Subject Exam scores in the primary care clerkships explained 44% of the variance in USMLE Step 1 scores. The subject scores for surgery, psychiatry, and obstetrics and gynecology explained an additional 5% of variance. The second-year cumulative GPA was significantly associated with the Step 1 score after controlling for all NBME Subject Exam scores, explaining an additional 13% of variance. The adjusted R² of the final model was 0.62, indicating that the explanatory variables together were associated with 62% of the variance of the Step 1 score.
For the USMLE Step 2 CK score, 55% of the variance was explained by the subject scores in the primary care clerkships. The scores for the other three clerkships, when entered together as a block in the second step of the regression model, accounted for an additional 6% of variance. The cumulative second-year GPA explained only 3% additional variance beyond NBME scores, although there was still a significant association with the Step 2 CK score. For the regression model of the Step 2 CK score, the third-year GPA accounted for only 1% additional variance beyond NBME scores. The adjusted R² of the final model was 0.61, indicating the explanatory variables together were associated with 61% of the variance of the Step 2 CK score.
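For readers less familiar with the adjusted R² reported for the final models, the sketch below shows the standard adjustment for the number of predictors. The function name `adjusted_r2` and the inputs are illustrative assumptions: the example supposes seven predictors (six subject exam scores plus one GPA) and 484 cases, and uses a made-up unadjusted R² rather than a value from the study's output.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a linear model with p predictors (plus intercept) fit to n cases."""
    if n - p - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Illustrative input only (hypothetical unadjusted R^2 of 0.50).
print(round(adjusted_r2(0.50, n=484, p=7), 3))
```

With a sample of this size and only a handful of predictors, the penalty factor (n - 1)/(n - p - 1) is close to 1, so adjusted and unadjusted values differ by well under one percentage point.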
Discussion and Conclusions
Our results suggest that for our study sample, NBME Clinical Subject Examination scores in core clerkships were moderately-to-highly correlated with scores on the USMLE Step 1 and 2 CK exams. Our results also indicate that NBME Clinical Subject Exam scores in multiple clerkships correlated with one another. Considering that passing the USMLE is a necessary component for physicians to obtain licensure, and that a majority of medical schools across the country utilize the subject exams, we believe there is value in demonstrating a relationship between the USMLE exams and the subject exams. Indeed, our data would suggest the subject exams could be considered as “surrogate” exams for USMLE Step 2 CK exams. The relatively high degree of correlation between the subject exams and the USMLE exams provides a degree of validation for using the subject exams to measure students’ medical knowledge. Furthermore, the relationship between multiple subject exam scores with measures of knowledge such as the mean GPA or scores on the USMLE Step examinations is valuable, since performance on a single subject examination may not be representative of a student’s overall knowledge base. This suggests that performance on subject exams across the breadth of clerkships would likely be more indicative of overall knowledge.
The limited extant published data regarding correlations between subject exams and USMLE exams are summarized in Table 4. As shown, correlations between subject exams and USMLE scores range from weak (r = 0.18) to very strong (r = 0.66), depending on the discipline and which USMLE exam is considered. One might expect larger correlations between subject exam scores and the clinically oriented USMLE Step 2 CK exam.
These findings are consistent with the theory of context specificity; this is particularly true regarding the general range of correlations found between scores on individual subject exams and on the USMLE Steps, as well as the larger correlation indicated when we considered multiple subject examination scores using our composite variable (additional “contexts”). In general, based on the correlations shown in Table 4 and our own data, correlations were higher between scores on subject exams and those on the USMLE Step 2 CK exam compared to scores on Step 1, although the magnitude of the differences in correlations was small.
However, the moderate correlations between subject exams and Step 1 scores may be more indicative of students’ academic aptitudes; those who perform well on Step 1 may also generally perform well on subject exams and on the Step 2 CK exam as a reflection of their own knowledge base and test-taking skills. This premise may be supported by the moderate-to-strong correlations between the second-year cumulative GPA and both the Step 1 and Step 2 CK exam scores (0.74 and 0.68 respectively), and between Step 1 and Step 2 CK scores reported by us (0.69) and others (0.67 to 0.78).3–5 This concept is further supported by the relatively strong correlation between the second-year cumulative GPA and the third-year clinical GPA (r = 0.66). It is not surprising that students who perform well in the earlier years of undergraduate training would continue to perform well in clinical rotations and on objective assessments during the clinical years. Additionally, there may be a component of common-method variance; that is, spurious variance attributable to the measurement method (i.e., a knowledge test) rather than to the constructs the measures represent (i.e., medical knowledge and skill).10
We also demonstrated moderate-to-high correlations between all subject exam scores and the third-year GPA, and between the third-year GPA and scores on both USMLE exams, particularly Step 2 CK. As hypothesized, the third-year GPA was more highly correlated with the Step 2 CK score than with the Step 1 score.
The regression analyses clarified how the explanatory variables contributed to USMLE performance. Given the moderate-to-high correlations between subject exam scores and USMLE scores, one might anticipate that variance in USMLE scores would be mostly accounted for by variance in subject exam scores. Indeed, we found that the subject exam scores and GPA were associated with a sizable proportion of the variance in USMLE Step 1 and Step 2 CK scores (62% and 61%, respectively). As hypothesized, primary care subject exam scores were able to account for most of the explained variance in both Step 1 and Step 2 CK scores when entered into the regression model first; the further addition of the non-primary care scores (obstetrics and gynecology, surgery, and psychiatry) accounted for only an additional 5% of Step 1 and 6% of Step 2 CK variance. One possible explanation for primary care topics’ explaining a large proportion of the Step 2 CK variance is that primary care issues, while emphasized in primary care clerkships, are also addressed in non-primary care clerkships. As a result, primary care topics are covered to some degree across the entire clerkship year, and may be a significant component of the USMLE exam even if questions are specifically identified to address an obstetrics and gynecology or surgery topic, for example. Furthermore, many of the clinical scenarios considered during the basic science years are topics encountered in primary care, further reinforcing this content for students.
The second-year cumulative GPA also differed in its contribution to the variance in the Step exams, accounting for 13% additional variance for Step 1 but only 1% for Step 2 CK. Although the cumulative second-year GPA correlated with both Step exams, correlation with Step 1 was greater. Since the second-year GPA is reflective of a predominantly basic science curriculum, it might be expected to have a greater contribution to variance on Step 1, since Step 1 is a basic science-oriented exam.
The minimal contribution of the third-year GPA to the variance in Step 2 CK scores was, however, unexpected. We anticipated that the third-year GPA, reflective of the clinical clerkship year, would account for a sizeable proportion of variance in the Step 2 CK examination. It is possible that since the third-year GPAs are derived from individual clerkship final grades, which themselves are dependent on subject exam scores, the GPA in and of itself adds little to the effect of the subject exam scores.
Although the subject exams explained much of the variance in the USMLE exams, it is important to not over-interpret their contributions, as nearly 40% of variance was still unaccounted for. There are numerous other factors that could affect performance on the USMLE exam that were not measured in our study, such as test-taking strategies, fatigue, anxiety, and clerkship length and/or timing in the academic year. However, it is reassuring to know that subject exam performance does seem to be a reasonable predictor of USMLE exam outcomes.
The correlation of subject exam scores to Step exam performance could assist resident selection. Although a number of criteria may be used to select residents, performance on the Step exams, particularly Step 2 CK, is likely important, considering the necessity of passing Step 3 during residency training. However, Step 2 CK scores may not be available to program directors at the time of selection or ranking of student applicants. The fact that subject exams were correlated with Step 2 CK exam performance suggests that subject exam data could be useful to program directors in lieu of Step 2 CK exam scores in their consideration of applicant selection.
The correlation of the subject exam scores to the Step exams could also assist individual students in making educational plans. For those students having difficulty with one or more subject exams, a medical school may decide to offer, or mandate, specific study periods or formal preparation courses prior to taking the Step 2 CK exam. Theoretically, these proactive measures could decrease the potential for failure of the Step 2 CK exam.
There were several limitations to our study. Data were collected from a single institution; thus we cannot necessarily extrapolate our findings to other institutions. Additionally, our study addressed correlation and not causation. Although the variances found do lend themselves to estimates of prediction, only a prospective study would truly provide for analysis of predictive ability. We did not account for the effect of clerkship timing and/or length of clerkships on USMLE or subject exam scores. Published data are mixed as to the effect of these entities.5,6,11–25 Furthermore, it is important to note that the specific variables, and the order in which the variables were entered into the regression model, likely affected results relative to the percentage of variance in USMLE examinations explained by these variables. However, in order to specifically address the objective of determining the variance accounted for by a combination of primary care subject exams, we elected to enter variables in a specific order.
Strengths of this study include the large number of students considered, subject examination scores across multiple clerkships for three years, the correlations of subject exam scores with each other as well as with scores of USMLE exams and the GPA, and the consideration of the contribution of subject exam scores to the variance in Step 1 and 2 CK exams.
In summary, our findings strongly suggest that NBME Subject Exams exhibit moderate-to-large positive correlations with USMLE Step 1 and Step 2 CK exam scores, and that subject exams explain considerable variance in performance on USMLE exams. Considering the importance of USMLE performance in the progression of a physician’s career, it is reassuring to know that subject exams appear to provide a reasonably valid estimate of USMLE performance during undergraduate medical training.
Other disclosures: None.
Ethical approval: This study was approved by the Institutional Review Board of the Uniformed Services University of the Health Sciences.
Disclaimer: The opinions expressed in this article are those of the authors alone and are not to be construed as official or reflecting the view of the Department of Defense or the Uniformed Services University of the Health Sciences.
1. Federation of State Medical Boards. State-specific requirements for initial medical licensure. http://www.fsmb.org/usmle_eliinitial.html Accessed June 19, 2012
2. Myles T, Galvez-Myles R. USMLE Step 1 and 2 scores correlate with family medicine clinical and examination scores. Fam Med. 2003;35:510–513
3. Ogunyemi D, Taylor-Harris D. Factors that correlate with the U.S. Medical Licensure Examination Step-2 scores in a diverse medical student population. J Natl Med Assoc. 2005;97:1258–1262
4. Myles TD, Henderson RC. Medical licensure examination scores: relationship to obstetrics and gynecology examination scores. Obstet Gynecol. 2002;100(5 Pt 1):955–958
5. Ripkey DR, Case SM, Swanson DB. Identifying students at risk for poor performance on the USMLE Step 2. Acad Med. 1999;74(10 Suppl):S45–S48
6. Ogunyemi D, De Taylor-Harris S. NBME Obstetrics and Gynecology clerkship final examination scores: predictive value of standardized tests and demographic factors. J Reprod Med. 2004;49:978–982
7. Armstrong A, Dahl C, Haffner W. Predictors of performance on the National Board of Medical Examiners obstetrics and gynecology subject examination. Obstet Gynecol. 1998;91:1021–1022
8. Myles TD. United States Medical Licensure Examination step 1 scores and obstetrics-gynecology clerkship final examination. Obstet Gynecol. 1999;94:1049–1051
9. Durning SJ, Artino AR, Boulet JR, Dorrance K, van der Vleuten C, Schuwirth L. The impact of selected contextual factors on experts’ clinical reasoning performance (does context impact clinical reasoning performance in experts?). Adv Health Sci Educ Theory Pract. 2012;17:65–79
10. Podsakoff PM, MacKenzie SB, Lee JY, Podsakoff NP. Common method biases in behavioral research: a critical review of the literature and recommended remedies. J Appl Psychol. 2003;88:879–903
11. Metheny WP, Holzman GB. Student performance on the NBME Part II subtest and subject examination in obstetrics-gynecology. J Med Educ. 1988;63:456–462
12. Ripkey DR, Case SM, Swanson DB. Predicting performances on the NBME Surgery Subject Test and USMLE Step 2: the effects of surgery clerkship timing and length. Acad Med. 1997;72(10 Suppl 1):S31–S33
13. Clark KH, Jelovsek FR. Effect of clerkship timing on third-year medical students’ grades and NBME scores in an obstetrics-gynecology clerkship. Acad Med. 1992;67:865
14. Baciewicz FA Jr, Fagley J, Weaver M, Yeasting R, Thomford NR. The effect of surgery clerkship timing on fourth-year students’ surgery knowledge. Acad Med. 1990;65:543
15. Veloski JJ, Hojat M. Learning in medical school clerkships: the effects of time on comprehensive examination scores. Proc Annu Conf Res Med Educ. 1983;22:19–24
16. Whalen JP, Moses VK. The effect on grades of the timing and site of third-year internal medicine clerkships. Acad Med. 1990;65:708–709
17. Baciewicz FA Jr, Arent L, Weaver M, Yeastings R, Thomford NR. Influence of clerkship structure and timing on individual student performance. Am J Surg. 1990;159:265–268
18. Gary NE, Rosevear GC. Effect of reduction in length of third-year clerkships on students’ academic performance. J Med Educ. 1988;63:406–407
19. Vosti KL, Bloch DA, Jacobs CD. The relationship of clinical knowledge to months of clinical training among medical students. Acad Med. 1997;72:305–307
20. Edwards RK, Davis JD, Kellner KR. Effect of obstetrics-gynecology clerkship duration on medical student examination performance. Obstet Gynecol. 2000;95:160–162
21. Hampton HL, Collins BJ, Perry KG Jr, Meydrech EF, Wiser WL, Morrison JC. Order of rotation in third-year clerkships. Influence on academic performance. J Reprod Med. 1996;41:337–340
22. Reteguiz JA, Crosson J. Clerkship order and performance on family medicine and internal medicine National Board of Medical Examiners Exams. Fam Med. 2002;34:604–608
23. Whalen JP. Investigating whether timing of students’ third-year internal medicine clerkships affects their performances as seniors on the NBME examination. Acad Med. 1991;66:709
24. Smith ER, Dinh TV, Anderson G. A decrease from 8 to 6 weeks in obstetrics and gynecology clerkship: effect on medical students’ cognitive knowledge. Obstet Gynecol. 1995;86:458–460
25. Case SM, Ripkey DR, Swanson DB. The effects of psychiatry clerkship timing and length on measures of performance. Acad Med. 1997;72(10 Suppl 1):S34–S36
Reference cited only in Table 4
26. Spellacy WN, Dockery JL. A comparison of medical student performance on the obstetrics and gynecology National Board Part II examination and a comparable examination given during the clerkship. J Reprod Med. 1980;24:76–78