A license to practice medicine in the United States is general in nature; it does not restrict a physician to practicing within a particular specialty. Consequently, the examination system that informs the licensing decision is designed to be broad-based, reflecting the general nature of the medical license. Medical school graduates, however, often gain experience in their specialty of interest before completing the licensure process. For persons pursuing a more specialized career than the general practice of medicine, the general nature of the license and their training experience can be at odds. Given this potential discrepancy, the likelihood of success in the examination process may vary as a function of the nature and amount of postgraduate training and clinical experience. Previous research has shown that candidates who receive postgraduate training that provides frequent contact with a broad range of patients under a variety of circumstances are likely to perform better on the final component of the medical licensing examination than are those whose training is more narrowly focused.1–4 This outcome supports the validity of the licensing examination. However, it also suggests that the clinical experience of the candidates may differentially affect their success in obtaining an initial license to practice medicine.5 While this should clearly not influence the type of postgraduate training that a candidate selects, it may be a consideration in determining when he or she should sit for the examination. Since the 1999 introduction of year-round, computer-based testing in the United States Medical Licensing Examination (USMLE), there has also been a significant increase in the percentage of examinees who wait until later in their postgraduate training experience to sit for Step 3, the final component of USMLE.
Designed in part to provide assistance to medical school graduates deciding when to sit for the licensing examination, this study examined the extent to which the length and type of postgraduate training affect performance on USMLE Step 3 after controlling for initial performance on Steps 1 and 2. It was expected that graduates receiving general, broad-based training would perform better on Step 3 than would those in more specialized training programs. Performance differences were expected to vary in proportion to increases in the amount of training (positive) and degree of specialization (negative). In other words, more training in a general, broad-based area was expected to increase performance, whereas more training in a specialized area was expected to lower it.
Participants for the study included 36,805 U.S. and Canadian medical school graduates who took the computer-based version of USMLE Step 3 for the first time between November 1999 and December 2002. Study inclusion criteria were (1) achievement of passing scores on Steps 1 and 2; (2) a self-reported graduation date between 1985 and 2002 on the Step 3 application; and (3) less than 85 months (seven years, one month) of postgraduate training. Analyses focused on subjects reporting postgraduate training in the following areas: anesthesiology, combined medicine–pediatrics, emergency medicine, family practice, internal medicine, obstetrics–gynecology, orthopedics, pathology, pediatrics, psychiatry, radiology, and surgery. There were more than 500 examinees in each of these areas.
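The three inclusion criteria can be read as a simple filter over examinee records. The sketch below is illustrative only: the `Examinee` record and its field names are hypothetical and are not taken from the study's data files.

```python
from dataclasses import dataclass


@dataclass
class Examinee:
    # Hypothetical fields, named for illustration only.
    passed_step1: bool
    passed_step2: bool
    graduation_year: int   # self-reported on the Step 3 application
    training_months: int   # months of postgraduate training completed


def meets_inclusion_criteria(e: Examinee) -> bool:
    """Apply the three inclusion criteria stated in the text."""
    return (
        e.passed_step1 and e.passed_step2       # (1) passing Steps 1 and 2
        and 1985 <= e.graduation_year <= 2002   # (2) graduation 1985-2002
        and e.training_months < 85              # (3) under 85 months of training
    )
```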
The dependent measure for the study was performance on USMLE Step 3; first-attempt scores on Steps 1 and 2 were used as covariates6 to control for differences in the characteristics of examinees entering different types of postgraduate training. Typically taken near the end of the second year of medical school, Step 1 assesses examinees’ understanding of basic science. Step 2, usually taken during the senior year of medical school, measures understanding of clinical science necessary to provide patient care under supervision. Successful completion of Steps 1 and 2, in addition to graduation from medical school, is required for eligibility to sit for Step 3, which focuses on readiness for the unsupervised practice of medicine.
Self-reported information on the type of postgraduate training and its start date was derived from Step 3 application forms; the start date was used to calculate the number of months of training completed before the first Step 3 administration. Descriptive statistical analyses investigated performance on Steps 1, 2, and 3 as a function of the type and length of residency training.
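As a rough illustration of the months-of-training calculation, the following sketch counts whole calendar months between a training start date and an examination date. The text does not say how partial months were handled, so the rounding rule here (count completed months only) is an assumption, and the function name is hypothetical.

```python
from datetime import date


def months_of_training(start: date, exam: date) -> int:
    """Whole calendar months completed between training start and the exam.

    Assumption: a month counts only once it is fully completed.
    """
    months = (exam.year - start.year) * 12 + (exam.month - start.month)
    if exam.day < start.day:
        months -= 1  # the final month is not yet complete
    return max(months, 0)  # never report negative training time
```

For example, under this rule a resident who started on July 1, 1999 and sat for Step 3 on July 15, 2000 would be credited with 12 months of training.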
A baseline group was defined by identifying first-time Step 3 examinees in family practice postgraduate training who took Step 3 after they had completed between nine and 15 months of training. This group was chosen because it was relatively large (n = 4,092) and because the nature of their postgraduate training seemed conceptually consistent with the focus and content coverage of USMLE Step 3. A regression analysis was performed, using Step 2 scores to predict Step 3 performance for members of the baseline group. The resulting regression equation from the baseline group was then used to predict Step 3 scores for all participants in the study, and unstandardized residuals were calculated by subtracting predicted Step 3 scores from actual Step 3 scores. These residuals were analyzed to determine the effect of type and length of postgraduate training on Step 3 performance, relative to the baseline group of family practice residents sitting after approximately one year of training. Similar analyses were performed using Step 1 (rather than Step 2) scores as predictors of performance in the baseline group.
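The baseline-residual procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the record layout is hypothetical, the toy scores are invented purely to exercise the functions, and the nine-to-15-month window is assumed to be inclusive at both ends.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def is_baseline(r):
    """Family practice residents with 9 to 15 months of training (inclusive, assumed)."""
    return r["program"] == "family practice" and 9 <= r["months"] <= 15


# Invented toy records; not study data.
records = [
    {"program": "family practice", "months": 10, "step2": 200, "step3": 210},
    {"program": "family practice", "months": 12, "step2": 220, "step3": 220},
    {"program": "family practice", "months": 14, "step2": 240, "step3": 230},
    {"program": "radiology", "months": 12, "step2": 220, "step3": 215},
]

# 1. Fit the prediction equation on the baseline group only.
baseline = [r for r in records if is_baseline(r)]
intercept, slope = fit_line([r["step2"] for r in baseline],
                            [r["step3"] for r in baseline])

# 2. Apply it to every examinee; residual = actual - predicted Step 3 score.
resid = [r["step3"] - (intercept + slope * r["step2"]) for r in records]
```

Because the equation is fitted on the baseline group alone, a residual of zero means "performed as a family practice resident with about one year of training and the same Step 2 score would be expected to perform."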
Table 1 provides counts, means, and standard deviations for Steps 1, 2, and 3 observed scores for each postgraduate training group. Overall, the performance of the entire study group on all three steps was similar to the performance of U.S. and Canadian first-time examinees taking the examinations during the study period. Differences in Step 3 performance were observed across training programs, with higher mean scores observed in programs where examinees receive general, broad-based training (combined medicine–pediatrics, emergency medicine, internal medicine, and family practice programs). Observed means on Steps 1 and 2 did not show this pattern, suggesting that the observed differences on Step 3 were not due to preexisting differences in the training groups.
The regression of Step 3 scores on Step 2 performance in the baseline group explained 56% of the variation in Step 3 scores; the analogous equation predicting Step 3 performance from Step 1 scores explained 40% of the variation. The actual equations are shown below.
Step 1 prediction equation: Predicted Step 3 score = 102.77089 + 0.53213 × (Step 1 score).
Step 2 prediction equation: Predicted Step 3 score = 97.95615 + 0.55733 × (Step 2 score).
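To make the two prediction equations concrete, the sketch below evaluates them for a hypothetical examinee. The coefficients are those reported above; the function names and the example scores (220 on Step 1 or Step 2, 215 on Step 3) are invented for illustration.

```python
def predict_step3_from_step1(step1_score: float) -> float:
    """Baseline-group equation using the Step 1 score as predictor."""
    return 102.77089 + 0.53213 * step1_score


def predict_step3_from_step2(step2_score: float) -> float:
    """Baseline-group equation using the Step 2 score as predictor."""
    return 97.95615 + 0.55733 * step2_score


def residual(actual_step3: float, predicted_step3: float) -> float:
    """Unstandardized residual: actual minus predicted Step 3 score."""
    return actual_step3 - predicted_step3


# Hypothetical examinee: Step 2 score of 220, actual Step 3 score of 215.
predicted = predict_step3_from_step2(220)    # about 220.57
example_residual = residual(215, predicted)  # about -5.57
```

A negative residual of this size would mean the examinee scored roughly five and a half points below what the baseline equation predicts for that Step 2 score.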
Results of the analyses of the unstandardized residuals are depicted graphically in Figure 1. Each line in the figure provides the regression of unstandardized residuals on months of training for the associated type of training program. Solid lines represent training programs offering broad, general training (in internal medicine, emergency medicine, family practice, combined medicine–pediatrics, and pediatrics). Dashed lines provide analogous information for training in the surgical specialties and other areas. Points above zero indicate performance better than that predicted for a family practice resident with approximately one year of training; points below zero indicate that predicted performance is worse than this.
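Each line in the figure corresponds to a separate regression of residuals on months of training within one program type. A minimal sketch of that grouping step follows; the input layout is hypothetical and the toy rows are invented, not study data.

```python
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_trends(rows):
    """Map each program to the (intercept, slope) of residual vs. months."""
    months_by_program = defaultdict(list)
    resid_by_program = defaultdict(list)
    for program, months, resid in rows:
        months_by_program[program].append(months)
        resid_by_program[program].append(resid)
    return {p: fit_line(months_by_program[p], resid_by_program[p])
            for p in months_by_program}


# Invented toy rows: (program, months of training, residual).
rows = [
    ("internal medicine", 6, -2.0),
    ("internal medicine", 12, 0.0),
    ("internal medicine", 18, 2.0),
    ("radiology", 6, -5.0),
    ("radiology", 12, -5.0),
    ("radiology", 18, -5.0),
]
trends = residual_trends(rows)
```

In this toy data the first program has a rising line that crosses zero at 12 months, while the second has a flat line that stays below zero, mirroring the two qualitative patterns the figure is described as showing.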
In the family practice group, residuals increase as the number of months of training increases. As one would expect, the mean residual is near zero for 12 months of training, reflecting the baseline group’s composition of family practice residents with approximately one year of training. A similar pattern of increasing residuals is seen for training in combined medicine–pediatrics, emergency medicine, internal medicine, and pediatrics, although the heights of the lines and the number of months of training associated with a mean residual of zero vary substantially. Residuals for surgery and for obstetrics–gynecology also increase with additional months of training, although these lines stay well below zero. The remaining (dashed) lines (for psychiatry, anesthesiology, radiology, and orthopedics) are all below zero and have flat or mildly negative slopes. This indicates that Step 3 performance for these groups is below the level expected for family practice residents with one year of training and comparable Step 2 scores, and that additional months of training are associated with little change in Step 3 performance. Analyses of residuals using the Step 1 prediction equation for the family practice baseline group yielded similar results.
Consistent with previous research, results of this study indicate that examinees who receive postgraduate training that provides exposure to a broad range of patient problems are likely to perform better on USMLE Step 3. This is true both in absolute terms and when controlling for differences in performance on Steps 1 and 2. In addition, examinees in these programs tend to do better on Step 3 as they receive additional training. In contrast, examinees who train in other programs tend to perform worse than expected when compared with a baseline group of family practice residents. This pattern of performance is consistent with a degree of mismatch between content coverage on the generalist-oriented Step 3 examination and the narrowed spectrum of patients seen during postgraduate training.
Contrary to expectations, after Step 2 performance is controlled for, Step 3 performance in more narrowly focused postgraduate training programs does not appear to decrease significantly as training progresses. It is unclear whether this pattern reflects the general nature of Step 3 test material or whether it is influenced by examinee strategies in preparing for the examination (e.g., more preparation by examinees who sit later). Overall, however, the pattern of results supports the validity of the Step 3 examination.
The study has several notable limitations. First, postgraduate training experiences were inferred from self-reported responses to questions on the Step 3 application. For examinees who switched from one training program to another before taking Step 3 (most likely a relatively small percentage), these inferences may be incorrect. Second, no information was available to describe examinees’ patient care experiences unrelated to postgraduate training (such as moonlighting). Such experiences may result in exposure to a broader spectrum of patient problems. Finally, because of the study design, causal inferences are not possible. For example, the generally negative residuals found in Figure 1 for some postgraduate training areas may reflect extensive preparation for Step 2 in order to obtain a position in a popular, highly selective training program.
In summary, the results of this study indicate that examinees who receive postgraduate training that provides exposure to a broad range of patient problems are likely to perform better on USMLE Step 3, and additional training in such programs tends to improve Step 3 performance. In addition, after controlling for Step 2 performance, performance on Step 3 for examinees in more narrowly focused postgraduate training programs remains below the baseline level but does not appear to decline significantly as training progresses. Overall, the pattern of results supports the validity of the Step 3 examination.
1. Gonnella JS, Veloski JJ. The impact of early specialization on the clinical competence of residents. N Engl J Med. 1982;306:275–7.
2. Xu G, Veloski JJ. A comparison of Jefferson Medical College graduates who chose emergency medicine with those who chose other specialties. Acad Med. 1991;66:366–8.
3. Dillon GF, Henzel TR, LaDuca A, Walsh WP. The influence of type of residency training and gender on an examination for medical licensure. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, California, April, 1995.
4. Dillon GF, Henzel TR, Walsh WP. The impact of postgraduate training on an examination for medical licensure. In: Scherpbier AJJA, van der Vleuten CPM, Rethans JJ, van der Steeg AFW, eds. Advances in Medical Education. Boston: Kluwer Academic Publishers; 1997:146–8.
5. Dillon GF, Henzel TR. The relationship between amount of postgraduate training and performance on a physician licensing examination. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Canada, April, 1999.
6. Swanson DB, Case SM, Nungester RJ. Validity of NBME part I and part II scores in prediction of part III performance. Acad Med. 1991;66(9 suppl):S7–9.