Grades on core clerkships are considered important predictors of future residency performance, yet the components of clinical grading and how they reflect student performance are complex and poorly understood.1–4 Clerkship grades derive from evaluations from clinical teachers and performance on standardized tests. Grades may also include other components (performance on objective structured clinical examinations and performance during simulations, presentations, oral exams, etc.). At most institutions, grades on clinical rotations, which are determined by performance evaluations in the clinical setting, constitute the largest contributor to overall clerkship grades.5,6
Although clearly important, clinical evaluations have shown poor reliability and validity across clerkships, and variance attributable to the student is uncertain.7–9 Reliable evaluation is challenging due to differences in structural and individual factors beyond a student’s performance. Evaluators have variable experience, training, and preferences.10,11 Additionally, structural factors such as setting, contact time, and level of observation can influence clinical evaluation.12,13 Furthermore, the competencies managed during daily ward performance may differ from those assessed in the grading process. A prior survey of clinical teachers identified that cognitive skills were prioritized for grading, whereas interpersonal skills and professionalism were more important for daily ward performance.14
Clerkship performance is now an increasingly emphasized component of residency selection, yet what is actually measured at the student–evaluator level remains unclear.15–17 Prior studies have examined characteristics of top-performing medical students in the clinical setting but predominantly reflect consensus opinions of evaluators and clerkship or program directors,14,18–20 rather than empirical data.
In this study, we used existing workplace performance evaluations to explore which structural and student-level components are associated with a clinical honors (or top grade) recommendation. Specifically, we examined the following questions: What evaluation components predict a clinical honors recommendation in 3 core clerkships (internal medicine [IM], pediatrics, and surgery)? Does the relative importance of components differ among those 3 disciplines?
Study design, setting, and participants
We conducted this study at the University of Alabama at Birmingham School of Medicine (UAB SOM), in metropolitan Birmingham, Alabama. UAB SOM enrolls approximately 190 students annually. Over half of the students remain in Birmingham to complete their clinical training; the remainder complete clerkship requirements at regional campuses around the state. We used data from existing student evaluations completed by faculty and resident physician educators at the Birmingham campus. We included all completed evaluations of medical students rotating through the third-year, 8-week IM, surgery, and pediatric clinical clerkships from July 2015 to June 2017. The Institutional Review Board at the University of Alabama at Birmingham deemed this study exempt.
IM comprises two 4-week rotations on inpatient general medicine wards at the university and Veterans Affairs hospitals. Pediatrics consists of 4 weeks on inpatient pediatric wards (including 1 week of newborn nursery) and 4 weeks of outpatient pediatric care (including 1 week of adolescent medicine). Surgery consists of 4 weeks of general surgery and 2 weeks each of specialty and subspecialty surgical services.
All included clerkships employ the same student evaluation that is distributed electronically. This UAB SOM clerkship evaluation consists of 19 items. Twelve items address student performance in the clinical workplace, and each of these is measured with 4 unique milestone-based ordinal descriptors. For example, the descriptors for interviewing skills are as follows:
- “Disorganized, incomplete, lacks focus”;
- “Organized; obtains basic history but points often missed including pertinent positive and negative review of systems (ROS)”;
- “Organized, usually complete including pertinent ROS, but often with extraneous information”; and
- “Excellent skills; thorough yet succinct and focused history.”
Evaluators may also select “Not observed” for any of the 12 items measuring student skills. Two situational items address interviewing and exam skills: evaluators indicate whether their assessment is based on direct observation and presentations or on presentations alone. Another situational item addresses whether visible signs of anxiety or awkwardness hamper a student’s presentation (yes, no). Two items (number of contact weeks and number of contact hours per week) address contact time. Finally, the last 2 items require evaluators to indicate whether the student has satisfactorily passed the rotation (yes, no, uncertain) and whether they recommend that the student receive clinical honors (yes, no, insufficient contact with student to make a recommendation).
The clinical honors recommendation is accompanied by the following prompt:
The UAB SOM recommends an Honors grade be given only to students with superior or outstanding achievement in all evaluable competencies (clinical skills, fund of knowledge, systems-based practice, practice-based learning, interpersonal and communication skills, and professionalism). This level of achievement would be expected from the top 20% of the class.
When evaluators recommend clinical honors, they are asked to provide a narrative justification. A grade of clinical honors is assigned when more than half of the faculty and senior resident evaluations recommend honors. The honors recommendation is based solely upon the student’s clinical performance since clinical evaluators do not have any knowledge of students’ performance on standardized testing.
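The assignment rule described above can be sketched in a few lines. This is a minimal illustration rather than the institution's actual procedure: the function name `assigns_clinical_honors` is hypothetical, and excluding "insufficient contact" responses from the count is an assumption, since the text does not say how those responses are treated.

```python
from typing import Iterable


def assigns_clinical_honors(recommendations: Iterable[str]) -> bool:
    """Return True when more than half of the counted evaluations recommend honors.

    Each entry is "yes", "no", or "insufficient" (insufficient contact).
    Assumption: "insufficient" responses are dropped before counting;
    the source does not specify how they are handled.
    """
    counted = [r for r in recommendations if r in ("yes", "no")]
    if not counted:
        return False  # no usable recommendations, so no honors grade
    yes_votes = sum(1 for r in counted if r == "yes")
    return yes_votes > len(counted) / 2
```

Under this sketch, a student with two "yes" and one "no" recommendation would receive clinical honors, whereas an even split would not, because the rule requires more than half.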
The Office of Undergraduate Medical Education provided coded, deidentified records of student clerkship evaluations. We used no student demographic information, and evaluator characteristics were not available.
The unit of analysis was each completed student evaluation. We used descriptive statistics (means, frequencies, and percentages) to describe the sample.
After describing the sample, we examined the constructs of the 12 items in the student evaluation form using exploratory factor analysis (principal-component factor). We retained factors with eigenvalues ≥ 1.0 and items with rotated factor loadings > 0.4. We assessed internal consistency reliability with Cronbach’s alpha, considering alpha > 0.9 to be excellent and alpha > 0.8 to be good.
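For readers less familiar with the reliability statistic, Cronbach's alpha can be computed from the item variances and the variance of the summed score. The sketch below uses the standard formula with Python's standard library. It is not the Stata routine used in the study, the function names are hypothetical, and the "below good" label for lower values is an assumption because the text defines only the two upper cutoffs.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items, each a list of scores over the same evaluations."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per evaluation is the sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def describe_alpha(alpha: float) -> str:
    """Label alpha using the cutoffs stated in the text (> 0.9 excellent, > 0.8 good)."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    return "below good"  # label for lower values is an assumption
```

Because the cutoffs are strict inequalities, an alpha of exactly 0.90 is labeled "good" rather than "excellent" in this sketch.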
Next, we performed bivariate analyses (unadjusted) to test the association between a clinical honors recommendation (main outcome) and the following:
- clinical performance (12 items, dichotomized to any with top/highest rankings on the ordinal scale vs all others),
- contact time (number of weeks and number of contact hours per week), and
- qualifying questions (final 2 questions, as described above).
To maintain a balanced number of evaluations within each variable and clerkship, we arbitrarily dichotomized contact time. We dichotomized number of weeks into 3 weeks or fewer vs more than 3 weeks for medicine and pediatrics and into 2 weeks or fewer vs more than 2 weeks for surgery. We similarly dichotomized contact hours per week into 30 hours or fewer vs more than 30 hours/week for medicine and pediatrics and into 20 hours or fewer vs more than 20 hours/week for surgery.
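The clerkship-specific cutoffs above can be expressed as a small lookup. This is an illustrative sketch only: the names `CONTACT_CUTOFFS` and `dichotomize_contact` are hypothetical, and "IM" is used for the clerkship the paragraph calls medicine.

```python
from typing import Dict, Tuple

# Cutoffs taken from the text: (weeks, hours per week) by clerkship.
CONTACT_CUTOFFS: Dict[str, Tuple[int, int]] = {
    "IM": (3, 30),
    "pediatrics": (3, 30),
    "surgery": (2, 20),
}


def dichotomize_contact(
    clerkship: str, weeks: float, hours_per_week: float
) -> Tuple[bool, bool]:
    """Return (more_weeks, more_hours); True means the value exceeds the cutoff."""
    week_cutoff, hour_cutoff = CONTACT_CUTOFFS[clerkship]
    return weeks > week_cutoff, hours_per_week > hour_cutoff
```

For example, a surgery evaluation reporting 3 contact weeks at 20 hours per week falls in the higher category for weeks but the lower category for hours.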
Then, to explore the relationship between a recommendation for clinical honors and independent variables, we employed multilevel logistic regression and generalized linear latent and mixed models (GLLAMM). Independent variables were the 12 items of clinical performance and the following structural factors:
- person evaluating (faculty or resident),
- contact time in weeks,
- number of contact hours per week,
- direct observation of the physical exam, and
- direct observation of interviewing.
Because students may receive evaluations from several educators during a clerkship, we used GLLAMM to account for multiple evaluations per student. We considered P values < .05 to be statistically significant. We conducted all analyses using Stata 11.2 (StataCorp LP, College Station, Texas).
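One common way to write a multilevel logistic model for evaluations clustered within students is a random-intercept specification. The form below is offered as an assumed sketch for orientation. The text does not report the exact model specification, and all symbols are introduced here for illustration.

```latex
\operatorname{logit} \Pr(Y_{ij} = 1 \mid \mathbf{x}_{ij}, u_j)
  = \beta_0 + \boldsymbol{\beta}^{\top} \mathbf{x}_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here $Y_{ij} = 1$ when evaluation $i$ of student $j$ recommends clinical honors, $\mathbf{x}_{ij}$ holds the clinical performance items and structural factors for that evaluation, and $u_j$ is a student-level random intercept that captures the correlation among evaluations of the same student. In this formulation, $\exp(\beta_k)$ is the adjusted odds ratio for the $k$th independent variable.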
During the study period, 3,947 evaluations were completed for 175 third-year medical students. Of these evaluations, 1,405 (35.6%) were from IM, 1,467 (37.2%) from surgery, and 1,075 (27.2%) from pediatrics. Tables 1 and 2 show the evaluator role, structural characteristics, passing grade recommendation, and clinical honors recommendation. About half of the evaluations in pediatrics (50.7%) occurred after just 1 week, whereas 70.7% of evaluations in IM and 44.5% in surgery occurred after 3 or more weeks (P < .001). Overall, 38.4% of evaluations recommended clinical honors, but the proportion differed among clerkships: 46.9% in IM, 35.8% in surgery, and 30.7% in pediatrics (P < .001). Residents completed more evaluations per student than faculty for the IM and surgery clerkships (all P < .001), but in pediatrics, faculty completed more evaluations per student than residents (3.7 vs 2.9, P < .001).
For all 3 clerkships, one factor explained most of the variance, and the factor comprised the same items: knowledge application, interviewing skills, physical exam skills, oral presentation skills, clinical reasoning, ward and clinic duties, and record keeping. An additional item—procedures—was included for surgery but not for IM and pediatrics due to low totals. This single factor explained 52% of the variance for IM (eigenvalue of 5.70), 61% of the variance for surgery (eigenvalue of 7.32), and 53% of the variance for pediatrics (eigenvalue of 5.85). For factor loadings, see Supplemental Digital Appendix 1 at https://links.lww.com/ACADMED/A996. The internal consistencies were good to excellent (IM, Cronbach’s alpha = 0.88; surgery, Cronbach’s alpha = 0.92; pediatrics, Cronbach’s alpha = 0.87).
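The link between the reported eigenvalues and the percentages of variance can be checked with simple arithmetic. When items are standardized, each contributes a variance of 1, so a factor's share of total variance is its eigenvalue divided by the number of items analyzed. The sketch below assumes 11 items for IM and pediatrics (the 12 performance items minus procedures) and 12 for surgery. These item counts are inferred from the paragraph above and are not stated explicitly in the text, and the function name is hypothetical.

```python
def proportion_of_variance(eigenvalue: float, n_items: int) -> float:
    """Share of total variance for one factor when items are standardized.

    With standardized items, each item contributes variance 1, so the
    total variance equals the number of items analyzed.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return eigenvalue / n_items


# Eigenvalues are from the text; item counts (11 or 12) are assumptions.
first_factor = {"IM": (5.70, 11), "surgery": (7.32, 12), "pediatrics": (5.85, 11)}
for clerkship, (eigenvalue, n_items) in first_factor.items():
    print(f"{clerkship}: {proportion_of_variance(eigenvalue, n_items):.0%}")
```

Under these assumed item counts, the computed shares round to the 52%, 61%, and 53% reported for the first factor. Because the published eigenvalues are themselves rounded, small discrepancies are possible when the same check is applied to other factors.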
The second factor comprised the following: response to feedback, patient interactions, dependability, and team interaction. This factor explained 12% of the variance for IM (eigenvalue of 1.26), 10% for surgery (eigenvalue of 1.21), and 11% for pediatrics (eigenvalue of 1.24). The internal consistency was good (IM, Cronbach’s alpha = 0.87; surgery, Cronbach’s alpha = 0.90; pediatrics, Cronbach’s alpha = 0.89).
For all 3 clerkships, more contact time—whether the contact time was in the number of weeks or in the number of hours per week (all P < .001)—increased the likelihood of a clinical honors recommendation (see Table 3). For example, in the IM clerkship, 58.3% of evaluations indicating contact time of ≥ 4 weeks received a clinical honors recommendation vs 21.8% of evaluations indicating just 1 week of contact time. In a stratified analysis of contact time, we observed the same finding for faculty and residents (data not shown). Residents were more likely than faculty to recommend clinical honors for all 3 clerkships (all P < .001; see Table 3). For all 3 clerkships, evaluators selected the highest ordinal score for all rating areas more frequently for those recommended for clinical honors than they did for students who did not receive an honors recommendation (see Table 4). Evaluators who directly observed the physical exam and interviewing skills of students were more likely to recommend clinical honors than evaluators who based their ratings only on patient presentations (see Table 4).
In the multivariable analysis, we adjusted for clustering effects at the student level since each student received multiple evaluations. We also adjusted for the following structural factors: person evaluating (faculty or resident), contact time in weeks, number of contact hours per week, direct observation of the physical exam, and direct observation of interviewing.
Of the 5 items most predictive of a clinical honors recommendation in each clerkship, 4 were common across the IM, surgery, and pediatrics clerkships: clinical reasoning, knowledge application, record keeping, and presentation skills. The remaining item was response to feedback in IM, team interaction in surgery, and dependability in pediatrics (see Figure 1 and Supplemental Digital Appendix 1 at https://links.lww.com/ACADMED/A996). The 2 items that best predicted a clinical honors recommendation in each clerkship were as follows:
- Clinical reasoning skills (odds ratio [OR] 2.8; 95% confidence interval [CI], 1.9 to 4.2; P < .001) and knowledge application (OR 2.7; 95% CI, 1.8 to 4.1; P < .001) for the IM clerkship;
- Presentation skills (OR 2.6; 95% CI, 1.6 to 4.2; P < .001) and record keeping (OR 1.9; 95% CI, 1.2 to 3.0; P = .010) for the surgery clerkship; and
- Knowledge application (OR 4.8; 95% CI, 2.8 to 8.2; P < .001) and dependability (OR 3.2; 95% CI, 1.4 to 7.5; P < .001) for the pediatric clerkship.
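To make the odds ratios above more concrete, an odds ratio can be translated into a change in probability once a baseline probability is chosen. The sketch below is purely illustrative: the function name is hypothetical, any baseline probability passed to it is an assumption rather than a value from the study, and adjusted odds ratios from a multilevel model apply conditionally on the other covariates and the student-level effect.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    new_odds = odds_ratio * baseline_probability / (1 - baseline_probability)
    return new_odds / (1 + new_odds)
```

For instance, with a hypothetical baseline probability of 0.30, an odds ratio of 2.8 (the IM estimate for clinical reasoning) corresponds to a probability of about 0.55. The same odds ratio implies a different absolute change at other baselines, which is why odds ratios should not be read as risk ratios.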
In this study, we used data from existing evaluations to explore which evaluation components predict a clinical honors recommendation in core clerkships and how these components vary across disciplines. Using factor analysis, we found that evaluation components organized into 2 factors—Cognitive characteristics (Factor 1) and Interpersonal characteristics (Factor 2)—as well as structural components. The 4 characteristics that best predicted clinical honors were consistent across disciplines: clinical reasoning, knowledge application, record keeping, and presentation skills.
Cognitive characteristics—Factor 1
Characteristics within Factor 1 represent the knowledge and skills shown to be valuable in clinical practice.21 Components such as clinical reasoning, knowledge application, and presentation skills are closely linked to cognition and reflect a student’s ability to synthesize information and demonstrate their thought processes to evaluators. These characteristics reflect traditional definitions of clinical competence and are thought to predict residency success.22–25
The top 4 characteristics were consistent across disciplines; however, within these 4 characteristics, knowledge application and clinical reasoning were prioritized within IM, presentation skills and record keeping within surgery, and knowledge application and presentation skills within pediatrics.
Interpersonal characteristics—Factor 2
Characteristics within Factor 2 represent a medical student’s skills interacting with others in the clinical setting. These include dependability and interactions with patients, team members, and evaluators (i.e., responding to feedback). Components of this factor represent updated interpretations of clinical competency that incorporate aspects of communication, professionalism, and attitude—all characteristics that are often prioritized by patients and correlate with improved patient satisfaction.26,27 The most important characteristic within this factor varied across disciplines: response to feedback was prioritized by IM, team interaction by surgery, and dependability by pediatrics.
We identified additional structural components that reflect the landscape and organization of the clinical learning environment. These components include the type of evaluator (resident or faculty), contact time, and whether clinical skills were directly observed. Such components are extrinsic to a student’s performance yet still influence clinical performance evaluation and thereby clerkship grading. Contact time contributed significantly across all disciplines and was particularly relevant within pediatrics and IM when a student worked with an evaluator for at least 3 weeks.
Improved understanding of clinical evaluation and its grading implications is critically important because of the growing emphasis on clerkship performance for residency selection. Despite trends toward competency-based assessment throughout undergraduate medical education (UME), we are not aware of any definitive normative means to stratify and rank the growing number of residency applicants.
Our study adds to the existing literature. First, we provide empirical evidence of what evaluators actually do when considering a clinical honors (or top grade) recommendation for medical students, which goes beyond the factors they report as important. Prior studies on high-performing clerkship students have predominantly addressed opinions of evaluators and clerkship directors. Wimmers and colleagues surveyed clinical teachers and identified 3 broad competency domains important for ward performance and grading: cognitive abilities, interpersonal skills, and professional qualities.14 Cognitive skills were prioritized for grading, but interpersonal skills and professionalism were more important for daily ward performance.14 Through our study, we identified similar factors and confirmed that evaluators prioritize the cognitive domain when considering implications for grading, such as an honors recommendation.
In a recent multi-institutional survey of clinical faculty, investigators found that clinical reasoning, ownership, curiosity, dependability, and high ethical standards were most important for recommending IM clerkship students for clinical honors.20 With the exception of clinical reasoning, these skills may not be fully captured by most medical schools’ current evaluation mechanisms despite their perceived importance. The study’s findings indicate that the most important domains entail clinical skills and aspects of communication and attitude,20 which overlap with, respectively, our Cognitive and Interpersonal domains.
Second, we demonstrate the influence of structural components, such as contact time, direct observation, and evaluator level, on the clinical performance evaluation. These findings build upon prior studies that highlight how factors external to a student can affect evaluation and grading.7
Observability remains an important part of clinical performance evaluation, yet not all clinical competencies are equally observable. Traditional clinical skill characteristics are more easily observed and scored than those that represent attitude, which may explain why Cognitive characteristics (Factor 1) better predicted clinical honors than Interpersonal characteristics (Factor 2). Evaluators who directly observed the physical exam and interviewing skills of students were more likely to recommend clinical honors than evaluators who based their ratings only on patient presentations (see Table 4). Prior studies have shown that direct observation happens infrequently in the clinical setting28,29; these prior findings, along with our data, have important implications for evaluating the competency of medical students.
We found that more contact time with evaluators significantly increased the likelihood that evaluators would recommend honors—across all clerkships. While the number of observations by evaluators is generally thought to improve the reliability of clinical evaluation, the optimal amount of time and observation to increase reliability and mitigate potential bias remains unclear.7
Finally, as most institutions employ residents in the clinical evaluation process, our finding that residents are significantly more likely than faculty to recommend clinical honors is important. The finding may explain the observed heterogeneity across disciplines in the overall recommendations of honors (46.9% in IM, 35.8% in surgery, 30.7% in pediatrics) since residents completed more evaluations in IM and surgery than in pediatrics. Residents have less experience and often less training than faculty when evaluating students. Additionally, residents often have more contact time with students than faculty, which may contribute to our finding that more contact time increased the likelihood of an honors recommendation.
Improved understanding of clinical evaluations is necessary and important because they are the largest determinant of clerkship grades and therefore influence academic advancement and residency match decisions. UME leaders should recognize the effect of structural components on clinical evaluation and advocate robust faculty and resident training and feedback to improve evaluation skills.30,31 Additionally, institutions should continue to scrutinize evaluation schemas to consider what components are missing or inappropriately present. Characteristics such as ownership, motivation, and curiosity have been prioritized by clinical teachers but may be inadequately captured by current rubrics.
First, our study was restricted to a single institution, which limits its generalizability; however, we intentionally included nonprocedural and procedural specialties with similar evaluation rubrics to compare differences. Second, we included only main campus evaluations because of variation in rotations (e.g., longitudinal clerkships) and grading processes at other sites. Third, we did not include several student- or evaluator-specific characteristics that have been associated with performance evaluation in recent investigations (e.g., gender, ethnicity, personality traits).12,32 Last, assessment of the structural components (contact time, direct observation, etc.) was self-reported by evaluators when they completed their evaluations of students.
In conclusion, this study provides empirical insight into the value faculty and residents place on evaluation components when determining a clinical honors recommendation for students within 3 core clerkships. The evaluation components included those specific to the student, as well as structural components of the clerkships. We believe an improved understanding of these structural and student-level components, which determine clinical grades and influence residency placements, will benefit students, faculty, and residency programs when setting, interpreting, and evaluating targets for achievement.
1. Amos DE, Massagli TL. Medical school achievements as predictors of performance in a physical medicine and rehabilitation residency. Acad Med. 1996;71:678–680.
2. Thompson RH, Lohse CM, Husmann DA, Leibovich BC, Gettman MT. Predictors of a successful urology resident using medical student application materials. Urology. 2017;108:22–28.
3. Raman T, Alrabaa RG, Sood A, Maloof P, Benevenia J, Berberian W. Does residency selection criteria predict performance in orthopaedic surgery residency? Clin Orthop Relat Res. 2016;474:908–914.
4. Bhat R, Takenaka K, Levine B, et al. Predictors of a top performer during emergency medicine residency. J Emerg Med. 2015;49:505–512.
5. Hemmer PA, Papp KK, Mechaber AJ, Durning SJ. Evaluation, grading, and use of the RIME vocabulary on internal medicine clerkships: Results of a national survey and comparison to other clinical clerkships. Teach Learn Med. 2008;20:118–126.
6. Kassebaum DG, Eaglen RH. Shortcomings in the evaluation of students’ clinical skills and behaviors in medical school. Acad Med. 1999;74:842–849.
7. Zaidi NLB, Kreiter CD, Castaneda PR, et al. Generalizability of competency assessment scores across and within clerkships: How students, assessors, and clerkships matter. Acad Med. 2018;93:1212–1217.
8. Alexander EK, Osman NY, Walling JL, Mitchell VG. Variation and imprecision of clerkship grading in U.S. medical schools. Acad Med. 2012;87:1070–1076.
9. Westerman ME, Boe C, Bole R, et al. Evaluation of medical school grading variability in the United States: Are all honors the same? Acad Med. 2019;94:1939–1945.
10. Yeates P, O’Neill P, Mann K, Eva K. Seeing the same thing differently: Mechanisms that contribute to assessor differences in directly-observed performance assessments. Adv Health Sci Educ Theory Pract. 2013;18:325–341.
11. Govaerts MJ, Schuwirth LW, Van der Vleuten CP, Muijtjens AM. Workplace-based assessment: Effects of rater expertise. Adv Health Sci Educ Theory Pract. 2011;16:151–165.
12. Lee KB, Vaishnavi SN, Lau SK, Andriole DA, Jeffe DB. “Making the grade:” Noncognitive predictors of medical students’ clinical clerkship grades. J Natl Med Assoc. 2007;99:1138–1150.
13. Fay EE, Schiff MA, Mendiratta V, Benedetti TJ, Debiec K. Beyond the ivory tower: A comparison of grades across academic and community OB/GYN clerkship sites. Teach Learn Med. 2016;28:146–151.
14. Wimmers PF, Kanter SL, Splinter TA, Schmidt HG. Is clinical competence perceived differently for student daily performance on the wards versus clerkship grading? Adv Health Sci Educ Theory Pract. 2008;13:693–707.
15. Green M, Jones P, Thomas JX Jr. Selection criteria for residency: Results of a national program directors survey. Acad Med. 2009;84:362–367.
16. Takayama H, Grinsell R, Brock D, Foy H, Pellegrini C, Horvath K. Is it appropriate to use core clerkship grades in the selection of residents? Curr Surg. 2006;63:391–396.
17. Fazio SB, Ledford CH, Aronowitz PB, et al. Competency-based medical education in the internal medicine clerkship: A report from the Alliance for Academic Internal Medicine Undergraduate Medical Education Task Force. Acad Med. 2018;93:421–427.
18. Goldie J, Dowie A, Goldie A, Cotton P, Morrison J. What makes a good clinical student and teacher? An exploratory study. BMC Med Educ. 2015;15:40.
19. Lipman JM, Schenarts KD. Defining honors in the surgery clerkship. J Am Coll Surg. 2016;223:665–669.
20. Herrera LN, Khodadadi R, Schmit E, et al. Which student characteristics are most important in determining clinical honors in clerkships? A teaching ward attending perspective. Acad Med. 2019;94:1581–1588.
21. Swing SR, Clyman SG, Holmboe ES, Williams RG. Advancing resident assessment in graduate medical education. J Grad Med Educ. 2009;1:278–286.
22. Hubbard JP, Levit EJ, Schumacher CF, Schnabel TG Jr. An objective evaluation of clinical competence. New technics used by the National Board of Medical Examiners. N Engl J Med. 1965;272:1321–1328.
23. Durning SJ, Cation LJ, Markert RJ, Pangaro LN. Assessing the reliability and validity of the mini-clinical evaluation exercise for internal medicine residency training. Acad Med. 2002;77:900–904.
24. Durning SJ, Dong T, Hemmer PA, et al. Are commonly used premedical school or medical school measures associated with board certification? Mil Med. 2015;180(4 suppl):18–23.
25. Kogan JR, Holmboe ES, Hauer KE. Tools for direct observation and assessment of clinical skills of medical trainees: A systematic review. JAMA. 2009;302:1316–1326.
26. Boissy A, Windover AK, Bokar D, et al. Communication skills training for physicians improves patient satisfaction. J Gen Intern Med. 2016;31:755–761.
27. Clever SL, Jin L, Levinson W, Meltzer DO. Does doctor-patient communication affect patient satisfaction with hospital care? Results of an analysis with a novel instrumental variable. Health Serv Res. 2008;43(5 pt 1):1505–1519.
28. Burdick WP, Schoffstall J. Observation of emergency medicine residents at the bedside: How often does it happen? Acad Emerg Med. 1995;2:909–913.
29. Howley LD, Wilson WG. Direct observation of students during clerkship rotations: A multiyear descriptive study. Acad Med. 2004;79:276–280.
30. Holmboe ES, Ward DS, Reznick RK, et al. Faculty development in assessment: The missing link in competency-based medical education. Acad Med. 2011;86:460–467.
31. Rojek AE, Khanna R, Yim JWL, et al. Differences in narrative language in evaluations of medical students by gender and under-represented minority status. J Gen Intern Med. 2019;34:684–691.
32. Sobowale K, Ham SA, Curlin FA, Yoon JD. Personality traits are associated with academic achievement in medical school: A nationally representative study. Acad Psychiatry. 2018;42:338–345.