Extensive resources are devoted to preparing medical students to practice in the demanding world of medicine. While students' progress is extensively monitored during medical school, few medical schools have reported research relating medical school preparation to performance during residency education.1,2,3,4 There is growing recognition of the need for measurable outcomes of medical education. Graduates' performances in their residency programs provide one outcome that could be used to assess the quality of medical school educational programs. The purpose of this study was to examine residency program directors' assessments of our graduates early in their residency education and to explore the relationship between those ratings and the graduates' performance evaluations during medical school.
In the spring of 1997, the University of Michigan Medical School (UMMS), in Ann Arbor, Michigan, began a longitudinal follow-up program designed to collect residency directors' assessments of the performances of our graduates at the end of their first year of residency. This investigation was, in part, inspired by the Liaison Committee on Medical Education (LCME) statement that medical schools must evaluate the effectiveness of educational programs and document graduates' achievement, showing the extent to which institutional and program purposes are met.5 Initiation of this study coincided with the completion of an extensive and incremental curricular change. The goals of the change included more opportunities for clinical application of medical science and hands-on, active learning in the first two years. Extensive efforts were made to encourage collegiality and professionalism among students and to provide more frequent and earlier patient encounters, promoting a more humanistic, patient-centered approach to medical decision making. The evaluation system was also revised to pass/fail grading in the first year, with additional mechanisms implemented to ensure earlier and more frequent feedback to students from objective measures throughout the first two years of medical school.
An important goal of this research project was to validate the system used to assess students' performances in medical school by comparing the medical school's assessments with performance assessments of UMMS graduates early in their residency education. In particular, we wanted to assess the contributions of academic assessments at various intervals during medical school to ratings of residency performance across all students and by subgroups based on academic achievement, gender, and ethnicity.
To collect residency directors' ratings of our graduates' skills and abilities, we developed an instrument representing various domains of medical practice and aligned with the key goals of our revised curriculum. The seven domains included in the instrument were clinical judgment, patient management, clinical skills, professional qualities, humanistic qualities, oral and written presentation skills, and a final overall performance assessment question. The survey instrument used a five-point Likert-type response format (1 = poor, 2 = fair, 3 = good, 4 = very good, and 5 = excellent). Residency directors were also asked to make written narrative comments on the instrument. Our intention was to construct an instrument that would be self-explanatory and that could be completed in five minutes or less. Curriculum committee members approved the finalized survey.
The survey was mailed to the residency directors of the UMMS graduating classes of 1996, 1997, and 1998 in May of the graduates' first year of residency. Responses were categorized by each graduate's residency specialty type and by his or her program's affiliation with either a community-based or a university-based hospital.
Medical school assessments considered in the analyses included overall grade-point average (GPA) for the second medical science year (M2), U.S. Medical Licensing Examination (USMLE) Step 1 scores, overall GPA for the seven required clerkships in the third (clinical) year (M3), USMLE Step 2 scores, and a cumulative composite score at graduation. The composite score combined the GPA computed over all second-, third-, and fourth-year courses with a small fraction representing the USMLE Step 1 and Step 2 scores, using the formula: composite score = cumulative GPA + [(USMLE Step 1 + USMLE Step 2)/4,000].
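As an illustration, the composite formula amounts to the following calculation (the function name and example values are hypothetical, chosen only to show the arithmetic):

```python
def composite_score(cumulative_gpa, usmle_step1, usmle_step2):
    """Cumulative composite score at graduation: the GPA over all
    M2-M4 courses plus a small fraction from the two USMLE scores."""
    return cumulative_gpa + (usmle_step1 + usmle_step2) / 4000

# Hypothetical graduate: GPA 3.5, Step 1 score 220, Step 2 score 230.
# The USMLE terms add (220 + 230) / 4000 = 0.1125 to the GPA,
# giving a composite of 3.6125.
print(composite_score(3.5, 220, 230))
```

Note that with USMLE scores in the low-to-mid 200s, the examination terms contribute only about a tenth of a grade point, consistent with the "small fraction" described above.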
The structure of the instrument was assessed using principal-components factor analysis, and Cronbach's alpha was used to determine its internal consistency. Responses were initially analyzed by graduating class. Demographic, program, and academic achievement variables were compared to determine the representativeness of responses. Descriptive statistics for the individual survey items were compared by residency program affiliation, specialty subgroup, and graduate's gender. Because no differences were found among the individual graduating years, data from all three years were combined. Correlations were computed between measures of medical school performance and directors' ratings. One-way analysis of variance (ANOVA) was used to compare subgroup means, with post hoc tests for mean differences.
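For readers unfamiliar with these statistics, the two instrument checks described above (internal consistency and a single-factor structure) can be sketched as follows; the ratings matrix is illustrative, not the study's data:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Internal consistency of a respondents-by-items rating matrix."""
    x = np.asarray(ratings, dtype=float)
    k = x.shape[1]                        # number of items
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

def first_component_variance(ratings):
    """Proportion of variance captured by the first principal
    component of the item correlation matrix."""
    corr = np.corrcoef(np.asarray(ratings, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)   # sorted ascending
    return eigenvalues[-1] / eigenvalues.sum()

# Illustrative Likert ratings: 5 directors x 3 items.
ratings = [[4, 4, 5], [5, 5, 5], [3, 4, 3], [4, 5, 4], [5, 4, 5]]
print(cronbach_alpha(ratings), first_component_variance(ratings))
```

When items are highly intercorrelated, as in this study, alpha approaches 1 and the first component absorbs most of the variance, which is the single-factor pattern reported in the results.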
A single mailing of the survey instrument was sent to residency directors of 498 graduates of three consecutive graduating classes, and 338 (68%) were returned. The graduates represented by directors' responses were 61% men and 39% women. The residents' racial-ethnic subgroups were Asian (16%), underrepresented minority (15%), and white and all others (69%). The residents' specialty subgroups were primary care (50%), surgery and surgery subspecialties (27%), and all other specialties (23%). The 136 graduates not represented by directors' responses were statistically similar in distribution by gender, ethnicity group, average overall M2 GPA, average overall M3 clerkship performance GPA, and average USMLE Step 1 scores. The return rate from directors of surgery subspecialties was lower than the rates from the other residency specialty groups (chi-square = 10.2, p < .002).
Across all responses, the average ratings for individual survey items were above 4.0 (very good), with the highest average ratings given for the items assessing humanistic and professional qualities. Although the instrument covered several content areas, factor analysis of its seven items demonstrated a single factor explaining 74% of the variance in scores. Internal consistency of the items in the single factor was high (Cronbach's alpha = .94). These findings suggested that the survey was measuring a single global perception of each resident's performance. We therefore used the instrument's final item, “overall performance,” to represent directors' assessments in all further analyses.
Correlations were high (p < .001) among the individual grading indices during medical school (M2 GPA, M3 GPA, USMLE Step 1 scores, USMLE Step 2 scores, and overall cumulative composite score). (See Table 1.) The correlation between M3 clinical grades and the directors' overall performance item (r = .41) was stronger than that between the cumulative composite score at graduation and the overall performance item (r = .32). The correlation between M3 grade average and overall residency performance rating was nearly twice the magnitude of the correlations of M2 overall GPA, USMLE Step 1 scores, or USMLE Step 2 scores with overall residency performance. (See Table 1.) When we examined the correlations between the medical school assessment components and the seven individual domains of our instrument, the relationships were positive and statistically significant for all domains except one: humanistic qualities, as assessed by residency directors, was not related to overall M2 grades (r = .07, p = .12).
Another analysis examining the relationship of medical school grades to assessments of residency performance compared subgroups composed of thirds of the class, based on the overall composite score at graduation. On average, graduates who had been in the top third of their class were rated higher than those in the lowest third (see Table 2). This relationship held when top and lower thirds were defined by any of the medical school assessment components considered in our study (M2 overall GPA, USMLE Step 1, M3 overall GPA, and USMLE Step 2 scores). The greatest differences between groups were found when comparing thirds of the class based on M3 overall GPA. Directors' mean ratings of overall performance differed significantly between the lowest and middle thirds, and again between the middle and top thirds (p < .05).
Comparisons of our graduates' ratings by gender, by residency specialty (primary care; surgery and surgery subspecialties; all other specialties), and by residency program affiliation (community-based or university-based) showed no differences, on average, in overall residency performance. When race-ethnicity was considered, using the three subgroups of underrepresented minority students, Asian students, and all other students, no difference among subgroups was found in program directors' ratings of overall residency performance once cumulative composite scores at graduation were taken into account. Regardless of racial-ethnic subgroup, students in the top third of the class by cumulative composite score were rated higher by program directors than were students in the lowest third.
Concerns expressed at the onset of this project about residency directors' participation rates were dispelled. Our relatively high response rate without follow-up mailings is consistent with other researchers' experiences,3,4 and it provides evidence that residency directors are willing to assess graduates' performances and to provide feedback to medical schools regarding their graduates.
Finding that the survey measured essentially one dimension of our graduates' early residency performance was consistent with the findings of other studies.4 Unlike our medical school's composite index, which was computed from many individual measures, the program directors' ratings were single-item judgments. It is possible that the residency directors based their ratings on a single overarching impression of each graduate that spilled over into ratings of performance in all domains, rather than distinguishing individuals' strengths and weaknesses.6 Just as our medical school combined performance measures across multiple courses and learning experiences into an overall GPA index for our students, the residency directors in our study tended to make global assessments of the residents' performances rather than distinctions among the items in the survey.
We were encouraged that our graduates were rated, on average, as “very good” or higher by residency directors. The consistency of these ratings across specialty areas, and regardless of university- or community-based program affiliation, provided confirmation that our graduates are prepared for, and adaptable to, medical practice in a variety of settings.
As expected, positive and relatively high correlations were found among the grading components during medical school (M2 GPA, M3 composite scores, cumulative composite score, and USMLE Step 1 and Step 2 scores). Although low in magnitude, the correlations between residents' medical school grades and their residency directors' assessments were statistically significant. These findings support the relationship between medical school achievement and later performance, but academic performance in this study explained less than 20% of the variance in overall residency performance ratings. Academic assessments of this type do not appear to capture other important factors contributing to directors' assessments after graduation.3,4 The strength of the correlation between the M3 GPA and the residency directors' assessments may reflect a “method effect” of the ratings.6 Just as the residency directors made largely subjective assessments of our graduates, the majority of the overall grades for the required clerkships were derived from attendings' ratings of students' clinical performances. The number of residents in a program and the degree of familiarity between the residency director and the graduate may also have contributed to the rating patterns.
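The variance figure cited above follows from squaring the correlation coefficients; a quick check with the two strongest correlations reported in the results (r = .41 for M3 GPA and r = .32 for the composite score) shows both fall well under 20%:

```python
# Shared variance (r squared) for the reported correlations between
# medical school measures and directors' overall performance ratings.
for label, r in [("M3 clerkship GPA", 0.41),
                 ("cumulative composite score", 0.32)]:
    print(f"{label}: r^2 = {r * r:.3f}")
# M3 clerkship GPA: r^2 = 0.168
# cumulative composite score: r^2 = 0.102
```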
Combining data across three years achieved an “n” large enough to compare a variety of subgroups. We found that the students represented in the top thirds of their classes, for all academic measures in this study except the USMLE Step 1 scores, were rated higher by residency directors, on average, when compared with the students in the middle and lowest thirds of their classes. Average ratings based on thirds of the class by clerkship performance in the M3 year proved to be the most consistent with the residency directors' average ratings of our graduates' performances. Further, subgroup analyses showed that medical school performance accounted for the difference in program directors' ratings, regardless of a graduate's gender or race-ethnicity.
While it may be intuitive that quality of performance after medical school relies on quality of achievement before graduation, our findings provide evidence to support this. We were able to demonstrate a correspondence between students' performances in components of our school's evaluation system and residency directors' ratings of their subsequent performances. The relationship we found between academic achievement during medical school and performance in residency lends validity to the evaluation system utilized by our medical school, and supports the use of these postgraduate outcomes as measurements of educational programs. Identifying standardized, objective measures that could be utilized as an index of residency performance, similar to those used by our medical school evaluation system, might enhance the value of residency performance ratings as an educational outcome.
The findings of this study are additionally important in increasing our understanding of factors that do not appear to contribute to performance ratings in graduate education. Based on our data, once academic achievement was taken into account, graduates' specialty type, gender, and race-ethnicity did not contribute to residency performance ratings. Discovering and measuring contributing factors other than those included in our evaluation system is our challenge in medical education.
1. Hojat M, Borenstein BD, Veloski JJ. Cognitive and noncognitive factors in predicting the clinical performance of medical school graduates. J Med Educ. 1988;63:323–5.
2. Blacklow RS, Goepp CE, Hojat M. Further psychometric evaluations of a class-ranking model as a predictor of graduates' clinical competence in the first year of residency. Acad Med. 1993;68:295–7.
3. Dawson-Saunders B, Paiva REA. The validity of clerkship performance evaluations. Med Educ. 1986;20:240–5.
4. Yindral KJ, Rosenfeld PS, Donnelly MB. Medical school achievements as predictors of residency performance. J Med Educ. 1988;63:356–63.
5. Liaison Committee on Medical Education. Functions and structure of a medical school: Accreditation and the Liaison Committee on Medical Education—Standards for accreditation of medical education programs leading to the M.D. degree. Washington, DC: Association of American Medical Colleges; 1998:13.
6. Hull AL, Hodder S, Berger B, et al. Validity of three clinical performance assessments of internal medicine clerks. Acad Med. 1995;70:517–22.
Research in Medical Education: Proceedings of the Thirty-ninth Annual Conference. October 30 - November 1, 2000. Chair: Beth Dawson. Editor: M. Brownell Anderson. Foreword by Beth Dawson, PhD.