Papers: Testing Conditions

Using the NBME Self-Assessments to Project Performance on USMLE Step 1 and Step 2: Impact of Test Administration Conditions

Sawhill, Amy; Butler, Aggie; Ripkey, Douglas; Swanson, David B.; Subhiyah, Raja; Thelman, John; Walsh, William; Holtzman, Kathleen Z.; Angelucci, Kathy

Section Editor(s): Frye, Ann PhD

Purpose of Study

All medical students who wish to practice (allopathic) medicine in the United States must pass Step 1 and Step 2 of the United States Medical Licensing Examination (USMLE™). Step 1 and Step 2 are computer-based tests administered under secure conditions at Prometric® Test Centers worldwide. Step 1 is a 350-item computer-based test administered over eight hours, while Step 2 is a nine-hour computer-based examination consisting of 368 items. Approximately 64% of U.S. medical schools require a passing score on USMLE Step 1 for promotion to the third year. Additionally, passing scores on Step 1 and Step 2 are required for graduation by 17% and 57% of medical schools, respectively.1 Performance on Step 1 and Step 2 is also a major factor considered by residency programs in determining whom to interview and select for resident positions.2,3 Thus, from a student's perspective, Step 1 and Step 2 are clearly high-stakes examinations, making it useful for students as well as medical schools to be able to project likely Step 1 and Step 2 performance prior to taking the tests.

Since the inception of USMLE, the National Board of Medical Examiners (NBME®) has provided paper-and-pencil–based comprehensive examinations for the basic and clinical science disciplines to interested medical schools. The Comprehensive Basic Science Examination (CBSE) and the Comprehensive Clinical Science Examination (CCSE) share the same content coverage and item formats as their USMLE counterparts, but each contains fewer items. Because of the similarity between comprehensive examinations and their corresponding USMLE examination, students and schools commonly find the comprehensive examinations to be valuable tools for examinees preparing to take USMLE. The CBSE and CCSE are securely administered by medical schools and are commonly used to predict performance on USMLE and to identify students at risk for failing Step 1 and Step 2.4,5

In 2003, the NBME introduced a new series of Web-based self-assessment examinations. Like the medical school-administered comprehensive examinations, the Web-based self-assessments were designed to reflect the format and content of their analogous USMLE examination. The Comprehensive Basic Science Self-Assessment (CBSSA) consists of 200 items recently retired from the Step 1 item pool, while the Comprehensive Clinical Science Self-Assessment (CCSSA) contains 184 items recently retired from the Step 2 pool. In contrast to the Step examinations, the four-section CBSSA and CCSSA can be taken via the Web at any time and from any location, provided that the examinee's computer is Internet capable and meets system requirements.

Prior to official implementation of the CBSSA and CCSSA, examinees were given the opportunity to take the self-assessments free of charge for quality assurance purposes. Examinees were provided with vouchers permitting them to take the examinations during three-day field tests. Although the self-assessments did not have to be completed in one sitting, the free self-assessments could only be accessed and completed during the field test period. Thereafter, users paid a fee to take the self-assessment examinations.

Examinees elect to take the self-assessment test forms under one of two timing conditions: Standard-Paced, analogous to the one-hour-per-section timing of the Step 1 and Step 2 examinations (CBSSA 50 items/section; CCSSA 46 items/section), or Self-Paced, where examinees have up to four hours to complete each section. Regardless of the timing condition elected, within an assessment section, examinees are free to complete test items in any order, skip items, review responses, and change answers. Examinees are also permitted to exit and resume the assessment as frequently as they choose, provided that the allotted time for the section has not expired. Upon completion of the full self-assessment, examinees are given immediate feedback in the form of a performance profile, which includes a total score and a graphical profile (similar to those for Step 1 and Step 2) indicating general content areas of relative strength and weakness. The graphical profile defines a borderline level of performance in each of the content areas addressed by the individual self-assessment; the CBSSA includes information covered during basic science education courses, while the CCSSA includes information covered during the core clinical clerkships. Examinees may choose to use the performance profile provided by the self-assessment for further preparation for Step 1 and Step 2. Total scores on each self-assessment range from 200 to 800 and are scaled to have a (statistically projected) mean of 500 and a standard deviation of 100 in reference groups of first-time takers from U.S./Canadian schools (2001 Step 1 cohort, 2003 Step 2 cohort).
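The score scale described above can be illustrated with a short sketch. The text says only that the mean of 500 and standard deviation of 100 are statistically projected, so the plain linear transformation below is an assumption, not the NBME's documented procedure. The function name `scale_scores` and the reference-group statistics are hypothetical and chosen only for illustration.

```python
def scale_scores(raw_scores, ref_mean, ref_sd, lo=200, hi=800):
    """Linearly rescale raw scores so that the reference group would have
    mean 500 and SD 100, then clip to the reported 200-800 range.

    Assumption: a simple z-score transformation; the actual projection
    method is not described in the text.
    """
    scaled = []
    for raw in raw_scores:
        z = (raw - ref_mean) / ref_sd           # distance from reference mean in SD units
        scaled.append(min(hi, max(lo, round(500 + 100 * z))))
    return scaled


# Hypothetical reference-group statistics on a percent-correct metric.
REF_MEAN, REF_SD = 65.0, 10.0
print(scale_scores([65.0, 75.0, 50.0, 20.0], REF_MEAN, REF_SD))
```

Under these assumed reference statistics, a raw score at the reference mean maps to 500, and each reference-group standard deviation above or below the mean moves the scaled score by 100 points until it reaches the scale limits.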

This study was designed to examine the extent to which performance on the CBSSA and CCSSA can be used to predict performance on USMLE Step 1 and Step 2, respectively. Since the CBSSA and CCSSA are composed of retired items from the Step 1 and Step 2 item pools, it was expected that examinees performing well on the Web-based self-assessments would also perform well on USMLE Step 1 and Step 2. In addition, the performance of examinees taking the self-assessments under Standard-Paced conditions similar to Step 1 (50 items/hour) and Step 2 (46 items/hour) was expected to provide a more accurate basis for predicting performance on USMLE. We also hypothesized that paid assessments would provide a better basis for projecting USMLE scores than field tests provided free of charge, since examinees paying for assessments would likely be more motivated to perform consistently well across sections.


Method

The first set of subjects included 848 U.S. medical school students who took the Web-based version of the CBSSA between April and December 2003 and subsequently took the USMLE Step 1. The second set of subjects consisted of 308 U.S. medical students who completed the Web-based version of the CCSSA between October 2003 and January 2004 and subsequently took the USMLE Step 2. Subjects within each sample were eliminated if their pattern of performance on the self-assessment indicated that they had not made a serious attempt at completing all sections of the form, defined as scoring below chance levels of performance (less than 20% correct) on one or more sections. In addition, subjects were retained only if they had completed their self-assessment prior to their first attempt on Step 1 or Step 2. Within each of the samples, the subjects were divided into four subgroups based on their test administration conditions, a combination of the timing condition chosen and whether the self-assessment was paid or free. The four subgroups were (1) Standard-Paced Paid; (2) Self-Paced Paid; (3) Standard-Paced Free; and (4) Self-Paced Free.
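The inclusion rules and subgroup assignment described above can be sketched as follows. This is a minimal illustration and not the study's actual data-processing code; the `Examinee` record, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

CHANCE_LEVEL = 0.20  # below 20% correct on any section counts as a non-serious attempt


@dataclass
class Examinee:
    # Hypothetical record layout, for illustration only.
    section_prop_correct: List[float]  # proportion correct on each of the four sections
    standard_paced: bool               # True = Standard-Paced, False = Self-Paced
    paid: bool                         # True = paid assessment, False = free field test
    days_before_step: int              # > 0 if taken before the first Step attempt


def is_eligible(e: Examinee) -> bool:
    """Keep examinees who scored at or above chance on every section and who
    took the self-assessment before their first Step attempt."""
    serious = all(p >= CHANCE_LEVEL for p in e.section_prop_correct)
    return serious and e.days_before_step > 0


def subgroup(e: Examinee) -> str:
    """Label one of the four test administration conditions."""
    pacing = "Standard-Paced" if e.standard_paced else "Self-Paced"
    cost = "Paid" if e.paid else "Free"
    return f"{pacing} {cost}"


# Made-up examinees showing one retained and one excluded record.
kept = Examinee([0.70, 0.62, 0.81, 0.75], standard_paced=True, paid=True, days_before_step=11)
dropped = Examinee([0.70, 0.10, 0.81, 0.75], standard_paced=False, paid=False, days_before_step=44)
print(is_eligible(kept), subgroup(kept))
print(is_eligible(dropped), subgroup(dropped))
```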

Multiple regression analyses investigated performance on the associated USMLE Step as a function of (1) performance on the corresponding self-assessment (CBSSA or CCSSA); (2) the self-assessment timing condition elected by the examinee (Standard-Paced or Self-Paced); and (3) the cost of the self-assessment (paid or free).
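One way to picture such a model is as a separate regression line for each test administration condition. A regression that allows every condition its own intercept and slope yields the same fitted lines as fitting each condition separately, which is what the sketch below does. The helper names are hypothetical, the records are made-up numbers rather than study data, and the significance tests on intercepts and slopes are not reproduced.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_by_condition(records):
    """records: list of (condition, self_assessment_score, step_score).
    Returns {condition: (intercept, slope)}."""
    groups = defaultdict(lambda: ([], []))
    for cond, x, y in records:
        groups[cond][0].append(x)
        groups[cond][1].append(y)
    return {cond: fit_line(xs, ys) for cond, (xs, ys) in groups.items()}


def r_squared(records, lines):
    """Share of total Step score variance explained by the condition-specific lines."""
    ys = [y for _, _, y in records]
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (lines[c][0] + lines[c][1] * x)) ** 2 for c, x, y in records)
    return 1 - ss_res / ss_tot


# Synthetic illustration only: these are NOT data from the study.
records = [
    ("Standard-Paced Paid", 400, 198), ("Standard-Paced Paid", 500, 222),
    ("Standard-Paced Paid", 600, 239), ("Self-Paced Free", 400, 226),
    ("Self-Paced Free", 500, 233), ("Self-Paced Free", 600, 246),
]
lines = fit_by_condition(records)
for cond, (a, b) in lines.items():
    print(f"{cond}: intercept={a:.1f}, slope={b:.3f}")
print(f"overall R^2 = {r_squared(records, lines):.2f}")
```

Fitting each condition separately keeps the sketch dependency-free; a statistics package would normally be used to obtain the tests of whether intercepts and slopes differ across conditions.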


Results

Table 1 provides descriptive statistical information concerning the performance of study participants on the self-assessment examinations and USMLE Steps 1 and 2 for each test administration condition. Average Step 1 and Step 2 scores for study participants indicate that the ability level of the samples is slightly higher than that of the cohorts of first-time takers from U.S./Canadian schools in 2003. The CBSSA sample had a mean Step 1 score of 224 and a standard deviation (SD) of 22, compared with a mean of 217 and an SD of 20 for first-time takers from U.S./Canadian schools. The mean and SD of Step 2 scores for the CCSSA sample were 222 and 23, compared with 217 and 23 for first-time takers from U.S./Canadian schools.

Table 1. Descriptive Analyses of Observed Self-Assessment Scores with Observed and Predicted USMLE Scores by Test Administration Condition

Mean performance on the USMLE Steps varied markedly across the four test administration groups, as did mean performance on the CBSSA and CCSSA. For both self-assessments, higher mean scores were observed in the Paid groups. These differences between the Paid and Free test-takers could be due to differences in incentive between examinees who paid for the assessment and those who took it free of charge, or to differences in the length of time between the self-assessment and the subsequent Step administration. Free examinations were taken during the field trial prior to official implementation of the self-assessments. Therefore, the average length of time between the tests was generally longer for examinees in the Free conditions than in the Paid conditions (44 days for the CBSSA Free test-takers compared with 11 days for the Paid test-takers, and 32 and 9 days, respectively, for the CCSSA examinees).

The multiple regression model predicting Step 1 performance regressed Step 1 scores against CBSSA performance as a function of each test administration condition. This model explained 62% of the total variation in Step 1 scores and indicated statistically significant differences (p < .01) among both the intercepts and slopes for the lines defined by each of the test administration conditions. The analogous model for predicting Step 2 scores from CCSSA performance and test administration conditions explained 57% of the variance in Step 2 scores; statistically significant differences (p < .01) were observed in the intercepts, but not the slopes, for the test administration groups. The lack of statistical significance for the latter may reflect the relatively small sample sizes in some CCSSA groups and the resulting lack of power to detect such differences.

Figure 1 graphically depicts the results of the regression analyses for each sample. In Figure 1 (top), the relationship between CBSSA and Step 1 scores is strongest in the Standard-Paced Paid group (R² = .69) and weakest in the Self-Paced Free group (R² = .49). Across most of the CBSSA score range, a given self-assessment score maps to a higher predicted Step 1 score if the self-assessment was taken under Standard-Paced conditions. This suggests that the extra time allotted under the Self-Paced conditions improved performance on the CBSSA; however, such scores did not provide as accurate an estimate of future Step 1 scores as scores obtained under Standard-Paced conditions. Reading vertically across the four regression lines, an observed score of 500 on the CBSSA was associated with predicted Step 1 scores of 218, 221, 237, and 241 for the Self-Paced Paid, Standard-Paced Paid, Self-Paced Free, and Standard-Paced Free conditions, respectively. Thus, for a given CBSSA score in this range, a lower Step 1 score is expected for examinees in the Self-Paced Paid group, probably reflecting a high level of motivation to do well on the self-assessment, ample time for CBSSA completion, and the opportunity to look up answers to questions while taking the exam. In contrast, the same CBSSA score obtained by an examinee in the Standard-Paced Free condition corresponded to a higher expected Step 1 score, reflecting the likelihood that these examinees took the self-assessment earlier in the Step 1 preparation process under realistic Step 1-like pacing conditions.

Figure 1. (Top) Relationship between scores on CBSSA and Step 1 by test administration condition. (Bottom) Relationship between scores on CCSSA and Step 2 by test administration condition.

In the second sample, similar results were obtained for predicting Step 2 performance from CCSSA scores: the relationship was strongest in the Standard-Paced Paid group (R² = .74) and weakest in the Self-Paced Free group (R² = .40). Additionally, an observed score of 500 on the CCSSA was associated with predicted Step 2 scores of 204, 218, 224, and 230 for the Self-Paced Paid, Standard-Paced Paid, Self-Paced Free, and Standard-Paced Free conditions, respectively; these results parallel those for the CBSSA/Step 1 sample.
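To give a sense of scale, the Step scores associated with a self-assessment score of 500 can be compared across conditions. The short calculation below uses only figures reported in the text. Expressing the gap in units of the study samples' Step score standard deviations (22 for Step 1, 23 for Step 2) is an illustrative choice made here, not an analysis reported by the authors.

```python
# Step scores associated with a self-assessment score of 500, as reported in the text.
step_at_500 = {
    "Step 1 (from CBSSA)": {
        "Self-Paced Paid": 218, "Standard-Paced Paid": 221,
        "Self-Paced Free": 237, "Standard-Paced Free": 241,
    },
    "Step 2 (from CCSSA)": {
        "Self-Paced Paid": 204, "Standard-Paced Paid": 218,
        "Self-Paced Free": 224, "Standard-Paced Free": 230,
    },
}

# Standard deviations of Step scores in the two study samples, as reported in the text.
sample_sd = {"Step 1 (from CBSSA)": 22, "Step 2 (from CCSSA)": 23}


def spread(by_condition):
    """Gap between the highest and lowest Step score across the four conditions."""
    return max(by_condition.values()) - min(by_condition.values())


for exam, by_condition in step_at_500.items():
    gap = spread(by_condition)
    print(f"{exam}: gap = {gap} points, about {gap / sample_sd[exam]:.1f} sample SDs")
```

In both samples the gap between the lowest and highest condition is on the order of one sample standard deviation, which is why the administration condition matters when a self-assessment score is used to project a Step score.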


Discussion

Across test administration conditions, performance on the new CBSSA and CCSSA examinations provided accurate predictions of performance on USMLE Step 1 and Step 2, with the best predictions obtained when self-assessments were taken under Standard-Paced conditions. The difference in explained variance probably reflects the greater similarity of Standard-Paced conditions to the test administration conditions for Step 1 and Step 2, as well as the opportunity that self-assessment examinees have to use reference material under the Self-Paced conditions. Additionally, examinees who paid for the assessments, rather than taking them free of charge, tended to have self-assessment scores that provided more accurate estimates of their future Step 1 and Step 2 performance. These differences may be due to the length of time between the self-assessments and subsequent Step 1 or Step 2 administrations. Since the free administrations were available only in the initial release phase of the self-assessments, most of these examinees may have taken the self-assessment earlier in their preparation period for the associated Step examination.

Comparisons of the results of this study with previous research6,7 suggest that the self-assessments, taken under Standard-Paced Paid conditions, provide a more accurate basis for predicting Step 1 (R² = .62) and Step 2 (R² = .56) performance than NBME subject tests given by medical schools (R² for NBME basic science subject tests ranges from .35 for histology to .50 for pathology; for clinical science subject tests, it ranges from .28 for psychiatry to .55 for internal medicine). The predictive accuracy of the self-assessments could be due to (1) the relatively short time interval between the self-assessments and the Step administrations in this study, (2) the greater test length of the self-assessments, and/or (3) the greater similarity in content coverage to the Step examinations. Assuming these results are replicable in future studies, it appears that the performance profiles provided by the self-assessments should furnish prospective Step 1 and Step 2 examinees with an excellent basis for judging their readiness to sit for USMLE.


