In 2017, 43,157 medical students and graduates applied for 31,757 available postgraduate medical residency training positions.1 A key component of the residency application is the medical student performance evaluation (MSPE), also known as the dean’s letter, which provides comparative data about individual medical students relative to their peers. Two key pieces of MSPE data are records of the student’s preclinical and clinical clerkship performance and a summative comparison with his or her peers. The information in the MSPE is used by program directors (PDs) in selecting the next class of incoming residents.2
To achieve accreditation by the Liaison Committee on Medical Education (LCME), a U.S. medical school must have a multimodal system in place for evaluating medical student achievement and ensuring competency.3 However, methods for evaluating students are not uniform across medical schools. In response to concerns that clerkship evaluations are imprecise and highly variable, in 2002 the Association of American Medical Colleges (AAMC) suggested modifications to the MSPE. An AAMC task force defined the professional attributes expected of medical students and requested that schools provide detailed graphical data to allow interpretation of performance-based assessment across core clerkships.4,5 Despite these recommendations, prior studies have documented significant inter- and intraschool grading variability.4,6–8 In addition, concerns exist regarding grade inflation and the possibility that it may decrease the reliability of medical student evaluations.9–14
In 2012, a review of clerkship grading at U.S. medical schools by Alexander and colleagues reported significant heterogeneity of grading systems within the U.S. medical education system.4 Importantly, it was noted that 97% of students were awarded 1 of the top 3 grades, and less than 1% of students nationally failed any clerkship.4 The authors called for a shift toward more reliable, competency-based student assessment.4 More recently, in 2016, Hom and colleagues performed a comprehensive analysis of MSPEs and reported that nearly half of medical schools did not follow AAMC guidelines for MSPEs.8
Given the importance placed on medical school grades in the residency application process, this lack of standardization may unfairly penalize students from institutions with strict grading (i.e., few honors given) compared with those from institutions with more lenient grading, making applicant selection decisions more difficult. We therefore sought to build upon prior work, including that by Hom and colleagues,8 to describe and evaluate contemporary patterns of clerkship grade distributions across U.S. medical schools.4,8 We reexamined the 2 key features of MSPEs: core clerkship grades and “key words,” often used to convey class rank. We hypothesized not only that medical schools still fail to provide adequate data to allow for accurate and reliable interpretation of an applicant’s overall performance but also that the overall percentage of students receiving the highest grade in a given rotation continues to increase.
Electronic Residency Application Service (ERAS) data submitted to the Mayo Clinic Departments of Urology and Orthopedic Surgery for the 2016–2017 Match cycle were reviewed. Eligible schools were LCME-accredited MD-granting medical schools in the United States and Puerto Rico with students anticipating graduation in 2017. When a school had more than 1 campus, we included each campus for which separate data were provided. Using a methodology similar to that of Alexander and colleagues, we collected MSPE data related to each school’s description of clerkship evaluations as well as summative performance data for analysis.4 One MSPE per school was reviewed, and institutional data were extracted. Because institutional data are identical in each MSPE from a given school, we reviewed the first MSPE listed in ERAS when more than 1 MSPE per school was available. All individual and medical school identifiers were removed, leaving only deidentified data for analyses. Because data were not linked to any individuals, this report was deemed exempt from oversight by the Mayo Clinic Institutional Review Board.
For each school, we identified the type of grading system used both for the preclinical years and for core clinical clerkships. Core clerkships included internal medicine, surgery, pediatrics, obstetrics–gynecology, psychiatry, neurology, and/or family medicine and generally occurred during the third year of medical school. We did not collect or analyze data pertaining to elective rotations. Schools were classified by the number of grades available. For example, we categorized honors/near-honors/pass/fail as a 4-tier system, while A/B/C/D/F was considered 5-tier. We then determined the percentage of each class within a given tier (M.E.W., R.B., C.B.). Some schools explicitly provided numerical percentage data, while others provided graphical distributions. When graphs were provided, we estimated the specific percentages for each tier. A sample of these estimates was checked to confirm accuracy and interrater reliability (M.E.W.). The majority of schools provided data for all tiers, with a few providing details only for their top group. In this situation, only data from within the provided tiers were included.
Similarly, we reviewed data provided regarding comparative performance in the final (summative) paragraph of MSPEs. Information regarding internal ranking (yes/no), performance ranking (quartiles, quintiles, etc.), and the use of key words describing overall performance was extracted. We recorded the percentage of students receiving each key word designation when this was provided.
For additional analyses, we grouped schools into 4 regions by the AAMC GEA (Group on Educational Affairs) geographic regions: Northeast, Southern, Central, and Western.15 School rank was based on the 2017–2018 U.S. News and World Report (USNWR) research ranking of medical schools.16 We obtained data on number of enrollees and graduates from the AAMC website.17,18 Raw number of students receiving a failing grade was calculated from enrollment data and reported percentages.
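As a minimal sketch of the last step above, the raw count of failing students can be reconstructed by multiplying class enrollment by the failure percentage a school reports; the figures below are invented for illustration and are not study data.

```python
# Hypothetical illustration: recovering raw counts of failing grades
# from enrollment data and a reported failure percentage.
# All numbers below are invented, not taken from the study.

def failing_count(enrollment: int, fail_pct: float) -> int:
    """Estimate the number of failing students given class size and
    the reported failure percentage (expressed as 0-100)."""
    return round(enrollment * fail_pct / 100)

# e.g., a school graduating 150 students that reports a 2% failure rate
print(failing_count(150, 2.0))  # -> 3
```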
To assess the year-to-year correlation of grades, we randomly selected 20 schools that provided complete grade distribution data for 2016–2017. We then extracted core clerkship grade data from 2017–2018 ERAS applications from those schools for comparison.
We assessed inter- and intraschool variability by comparing the percentages of students placed into the top tier. Continuous features were summarized with medians and interquartile ranges (IQRs), with means and standard deviations (SDs) provided where appropriate. We summarized categorical features with frequency counts and percentages. Univariate comparisons were made using the Student t test and 1-way ANOVA for means, and the Wilcoxon rank-sum and Kruskal-Wallis tests for medians. Multivariable regression was performed using a generalized linear model to determine variables associated with the percentage of students receiving the highest available grade. We performed all statistical analyses using the JMP software package (SAS Institute, Inc., Cary, North Carolina). All tests were 2-sided, with P < .05 considered statistically significant.
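The authors performed these analyses in JMP; as a dependency-free sketch of the kind of rank-based univariate comparison described, the snippet below computes a Mann-Whitney U (Wilcoxon rank-sum) statistic comparing top-grade percentages between two hypothetical groups of schools. The sample values are invented for demonstration.

```python
# A minimal sketch of a rank-based two-group comparison, assuming two
# hypothetical samples of "percentage of students receiving the top grade."
# The study itself used JMP; these values are invented.

def mann_whitney_u(a, b):
    """Return the Mann-Whitney U statistic for sample `a` versus `b`.

    U counts, over all pairs, how often an observation in `a` exceeds
    one in `b`; ties contribute 0.5 each.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

top20_pcts = [40, 46, 35, 38, 42]   # hypothetical top-20 schools
other_pcts = [32, 27, 31, 33, 30]   # hypothetical remaining schools
print(mann_whitney_u(top20_pcts, other_pcts))  # -> 25.0 (complete separation)
```

In practice one would convert U to a P value (e.g., via a normal approximation or an exact table); the statistic alone shows how the rank comparison works.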
Data were available for 136/140 (97%) MD-granting U.S. medical schools with enrolled students anticipated to graduate in 2017. We included 1 additional campus because it used a different grading scheme from that of the primary campus, for a total of 137 schools. Of the 137 schools, 81 (59%) were public and 56 (41%) were private (Table 1). All but 1 school (136; 99%) provided information regarding their grading system and the terminology used. Most (116/137; 85%) provided complete data defining the proportion of students receiving each specific grade within their grading system. Of the remaining schools, 7 (5%) provided only a partial breakdown of the grade distribution, typically limited to those receiving the top grade, while 14 (10%) did not provide any grade distribution at all. Two schools (1.5%) used a pass/fail system for clinical grading; neither provided grade distributions. The majority (113/137; 83%) of grading schemes included failure as the lowest tier, and 97/113 of these schools (86%) provided the percentage of students failing a rotation.
For preclinical years, pass/fail grading was most commonly used (65/137; 47%), generally with no internal ranking. During the clinical years, a 4-tier system was most commonly used for clerkship grading (59/137; 43.1%); however, the number of tiers used ranged from 2 to 7 (median, 4; IQR, 3–4), and at least 19 different schemes were identified (Supplemental Digital Appendix 1, available at http://links.lww.com/ACADMED/A700).
Across all rotations, the highest available grade was achieved by a median of 33% of students (IQR, 24%–46%) and a mean of 35.9% (SD, ± 16.8). Within individual core clerkships, the median percentage of students achieving the highest available grade ranged from 30% to 40%, but the range across schools (5%–97%) was broad (Figure 1). Failure was uncommon, with only 32.8% (45/137) of schools reporting any student failing a rotation. Of the estimated 125,211 clerkship grades given, approximately 438 were failures (0.35%). Nationally, for the 2016–2017 application cycle, the 2 most common rotations in which a student received a failing grade were family medicine (83/17,688; 0.47%) and surgery (81/18,766; 0.46%).
Overall, schools ranked in the top 20 by 2017 USNWR research rank awarded the highest core clerkship grade to a higher median percentage of students than schools outside the top 20 (40% vs 32%, P < .001). For specific rotations, top 20 schools had higher median percentages of top grades in surgery (35% vs 27%, P = .01), pediatrics (40% vs 31%, P = .03), and family medicine (46% vs 33%, P = .04). Interestingly, schools in the top 20 were also less likely to provide complete grade distributions (13/20 [65%] vs 102/117 [87%], P = .01). There were also significant differences by rotation (P < .001), school region (P = .03), number of grading tiers (P < .001), and school type, with public schools giving a lower median percentage of top grades than private schools (32% vs 36%, P = .01) (Supplemental Digital Appendix 2, available at http://links.lww.com/ACADMED/A700). There was no difference based on total enrollment (P = .4), years since the school was established (P = .9), or whether the lowest tier was failure (P = .5).
We then created a generalized linear model to determine associations with higher percentages of students receiving the top grade (Table 2). In this model, USNWR top 20 ranking remained significantly associated with a higher percentage of students receiving the top grade (P < .001). Additionally, students were more likely to receive the top grade during their psychiatry rotation (P < .001), at schools in the Southern region (P < .001), and at schools using a 4- or 5-tier grading system (P < .001). Students were less likely to receive the top grade during the surgery rotation (P = .002) or if they attended a school in the Western region (P < .001).
Finally, we looked at the summative performance paragraph of the MSPE and evaluated the methodology for ranking students. Of the MSPEs we evaluated in 2016–2017, 37 of 137 (27%) did not rank students. Of the remaining 100 schools (percentages not reported because number and percentage are identical), most (94) used 1 ranking method, while 6 used 2. The majority (57) summarized student performance using adjectives, while 26 ranked by tertiles (5), quartiles (16), or quintiles (5). Finally, 12 provided a numeric ranking, while the remaining 5 used another type of ranking, such as the number of honors (2), sextiles (1), an internal assessment (1), or “standard deviation.” For the schools using 2 methods of ranking, 4 used descriptive adjectives (key words) and 2 used another system as their secondary ranking method.
Among the 61 schools employing key word adjectives, the median number used was 5 (IQR, 4–5), and 32 different adjectives were used to describe students in various ranking tiers (Table 3). Most (43/61; 71%) provided the distribution of students for at least the highest tier, and the median percentage of students in the highest tier was 22% (IQR, 16%–26%).
In assessing year-to-year variability, we found that the percentage of students receiving the highest grade in the 2016–2017 Match cycle was highly correlated with receiving the highest grade in the 2017–2018 Match cycle (R = 0.8, P < .001) (Supplemental Digital Appendix 3, available at http://links.lww.com/ACADMED/A700).
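The year-to-year check above is a Pearson correlation between top-grade percentages in consecutive cycles; a small self-contained sketch follows, with invented values standing in for the 20 sampled schools.

```python
# A minimal sketch of the year-to-year consistency check: Pearson's r
# between the percentage of students receiving the top grade in two
# consecutive cycles. Values below are invented for illustration.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

pct_2016 = [20, 30, 35, 40, 50]   # hypothetical top-grade % in 2016-2017
pct_2017 = [22, 28, 37, 42, 48]   # same schools, 2017-2018
print(round(pearson_r(pct_2016, pct_2017), 3))  # high positive r (~0.98)
```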
The purpose of medical school grades is to indicate a baseline level of student competence, thus ensuring that minimum standards are met as well as distinguishing levels of performance.19 Despite reports published in 2012 by Alexander and colleagues and by Hom and colleagues in 2016 calling for systematic changes to increase consistency and transparency of medical student grading, we find that little progress has been made.4,8 We examined medical school grading patterns available from 1 residency program from the 2016–2017 application cycle and found that significant variability persisted in the type of grading systems used by institutions, as well as in the number of students receiving the highest grade in a particular clerkship. In fact, the percentage of students receiving the highest grade continues to rise. In 1987–1988, the mean number of students receiving honors in a 4-tier system was 21.7%, and this rose to 25.2% for 1995–1996.12 This subsequently increased to 33.9% in 2009–2010.4 We find that this trend has continued in 2016–2017, with 37.4% of students in a 4-tier system receiving honors.
In medical school, core clinical clerkships are the first time when students are subjectively evaluated on their skills as physicians—that is, bedside manner, interaction with a medical team, ability to follow up on test results, and ability to perform basic procedural skills. Recently, we reviewed urology residents at the Mayo Clinic from 2000 to 2011 to identify which medical student application materials were associated with high-performing residents.20 The strongest feature associated with an “excellent” overall resident was an honors grade in all core clinical clerkships.20 Similarly, in evaluating predictors of Accreditation Council for Graduate Medical Education competency-based performance, Tolan and colleagues found that the number of honors grades received was significantly associated with overall resident quality based on core competency scores.21 Taken together, these data suggest that medical student grades, especially in the core clinical clerkships, are predictive of resident performance and facilitate selection of the strongest candidates for the future medical field.
However, the variability in clerkship grading between institutions and the continued increase in students receiving honors may present difficulties for PDs reviewing applications. In 2009, Green and colleagues surveyed PDs evaluating selection criteria for residency.2 At that time, urology PDs rated clerkship grades and United States Medical Licensing Examination (USMLE) Step 1 scores equally as the top factor in selecting residents.2 However, in 2015, Weissbart and colleagues reported that among urology PDs, USMLE scores remained the most important factor in selecting applicants, while clerkship grades, including the surgery grade, were considered less important.22 One may hypothesize that over time, less emphasis is being placed on clerkship grades because of persistent grade variability and concerns of grade inflation.
Grade inflation, defined as a “greater percentage of excellent scores than student performances warrant,”12 has been well documented at the undergraduate level, particularly among prestigious schools.23,24 Likewise, it has been identified as an issue in medical student clerkship grading. Surveys of internal medicine PDs have shown that a rising proportion admit to passing students who should have failed (18% in 2004, 38% in 2009). Up to 55% of clerkship directors have reported that grade inflation occurs within their clerkship,9,25,26 due in part to the belief that it assists students in obtaining better residency positions.19 Other explanations for grade inflation fall into 1 of 4 categories—students, assessors, student–assessor relationship, and grading instruments.14 For example, students may pressure assessors to give good grades regardless of performance, while inexperienced assessors have been reported to have more difficulty giving negative feedback.9,14 Conversely, nontenured assessors rely on student evaluations for continued employment, while those who are tenured may give higher grades to avoid dealing with appeals.14 The continued trend of increasing grades may be due to generational differences between learners as well.27 Twenge found that contemporary students scored higher on traits such as assertiveness, self-liking, and high expectations as well as some measures of stress, anxiety, and poor mental health compared with prior generations of students.27 These traits may result in more students pressuring assessors for good grades or appealing poor grades. In addition, we found that students at a USNWR top 20 school were more likely to receive a grade in the highest tier. The reasons for this are unclear but may be due to pressure on such schools to help their students obtain prestigious residencies or, conversely, may indicate a higher caliber of students at these institutions. 
It is also possible that awareness of grade inflation reinforces the tendency to elevate grades for a higher proportion of students, grading them not against each other but against the perception of a medical student standard. While we do not have specifics regarding the framework each institution uses for grading, it is possible that schools in the USNWR top 20 use a different approach for measuring performance (i.e., criterion based vs normative based).28
We also found that schools using a 4- or 5-tier system gave the highest grade more often than schools using other rubrics. This is consistent with previous reports, though admittedly surprising, as one might expect that the diversity of grading options would allow for better granularity and dispersion of students across tiers.4,19 Alexander and colleagues reported that students attending a school that used a 4-tier grading system were most likely to receive the highest grade,4 and despite hypothesizing that increasing the number of available tiers would reduce grade inflation, Fazio and colleagues also found that with more tiers available, there was a trend toward higher grades.19
Following the 2002 recommendations from the AAMC to modify and improve the MSPE, Shea and colleagues examined institutional compliance in 2005 and found that only 17% of programs provided clear, comparative data on overall student performance.29 Hom and colleagues reported some improvement by 2014, with 51% of schools providing complete information about clerkship grades and key word distributions.8 We found continued improvement, with 60% of schools reporting complete information on comparative performance and key word distributions. However, there continues to be significant variability in how schools report these data; we found at least 32 different adjectives used for comparison purposes. The information needed to interpret this variability is not always readily available. Even if all schools were to provide complete data, without standardization the data would remain difficult to interpret and apply.
In September 2016, the AAMC MSPE task force issued specific recommendations for revision of the MSPE to improve its usefulness during the residency selection process.30 In regard to the summative paragraph, members recommended that a final adjective be included only if a school-wide comparison is included.30 Our findings support this recommendation given the ongoing lack of standardized reporting of comparative performance. As this recommendation is incorporated into current practice, future studies to assess its impact will be needed.
Here, we have presented what is, to our knowledge, the most current description of the state of clinical clerkship grading at LCME-accredited U.S. medical schools, including variables associated with receiving higher clerkship grades. Our findings show not only persistent variability in grading between institutions but also higher grades overall compared with reports from 10 and 20 years ago.12 As grades are an integral component of residency applications, more consistency is needed to maintain their utility and better differentiate applicants. We believe that a nationally standardized grading system may aid PDs when comparing residency applications.4,19
We recognize that our study is limited by the fact that our sample consisted of MSPEs from medical schools from a single application year; thus, the grades may reflect that year’s students rather than the school’s grading tendencies. However, we did assess a small sample of schools from consecutive years and found that grading was highly consistent over 2 years, indicating that these trends are likely independent of class composition. In addition, we did not study how each school defines an honors student, which likely varies by institution.
Overall, we documented significant institutional variation in clinical grading practices at U.S. medical schools but year-to-year stability within schools. For some core clerkships, as many as 97% of students received the highest grade, which diminishes the ability to distinguish applicants. This study is a contemporary analysis of current grading schemes at U.S. medical schools. While we noticed improvement in the summative paragraph following AAMC MSPE task force recommendations compared with our observations of previous applications, the wide distribution of grading patterns and the highly variable percentage of students receiving the highest clerkship grade warrant further efforts at standardization and consistency among institutions. A standardized approach to reporting clinical performance may allow for better comparison of medical student applicants.
2. Green M, Jones P, Thomas JX Jr. Selection criteria for residency: Results of a national program directors survey. Acad Med. 2009;84:362–367.
3. Liaison Committee on Medical Education. Functions and structure of a medical school: Standards for accreditation of medical education programs leading to the MD degree. http://lcme.org/publications/#Standards. Accessed May 21, 2019.
4. Alexander EK, Osman NY, Walling JL, Mitchell VG. Variation and imprecision of clerkship grading in U.S. medical schools. Acad Med. 2012;87:1070–1076.
5. Association of American Medical Colleges, American Association of Orthopaedic Medicine. A guide to the preparation of the medical student performance evaluation. http://www.aamc.org/members/gsa/mspeguide.pdf. Published 2002. [No longer available.]
6. McLeod PJ. So few medical schools, so many clerk rating systems! CMAJ. 1992;146:2161–2164.
7. Plymale MA, French J, Donnelly MB, Iocono J, Pulito AR. Variation in faculty evaluations of clerkship students attributable to surgical service. J Surg Educ. 2010;67:179–183.
8. Hom J, Richman I, Hall P, et al. The state of medical student performance evaluations: Improved transparency or continued obfuscation? Acad Med. 2016;91:1534–1539.
9. Cacamese SM, Elnicki M, Speer AJ. Grade inflation and the internal medicine subinternship: A national survey of clerkship directors. Teach Learn Med. 2007;19:343–346.
10. Grover S, Swisher-McClure S, Sosnowicz S, et al. Grade inflation in medical student radiation oncology clerkships: Missed opportunities for feedback? Int J Radiat Oncol Biol Phys. 2015;92:740–744.
11. Roman BJ, Trevino J. An approach to address grade inflation in a psychiatry clerkship. Acad Psychiatry. 2006;30:110–115.
12. Speer AJ, Solomon DJ, Fincher RM. Grade inflation in internal medicine clerkships: Results of a national survey. Teach Learn Med. 2000;12:112–116.
13. Weaver CS, Humbert AJ, Besinger BR, Graber JA, Brizendine EJ. A more explicit grading scale decreases grade inflation in a clinical clerkship. Acad Emerg Med. 2007;14:283–286.
14. Donaldson JH, Gray M. Systematic review of grading practice: Is there evidence of grade inflation? Nurse Educ Pract. 2012;12:101–114.
19. Fazio SB, Torre DM, DeFer TM. Grading practices and distributions across internal medicine clerkships. Teach Learn Med. 2016;28:286–292.
20. Thompson RH, Lohse CM, Husmann DA, Leibovich BC, Gettman MT. Predictors of a successful urology resident using medical student application materials. Urology. 2017;108:22–28.
21. Tolan AM, Kaji AH, Quach C, Hines OJ, de Virgilio C. The Electronic Residency Application Service application can predict Accreditation Council for Graduate Medical Education competency-based surgical resident performance. J Surg Educ. 2010;67:444–448.
22. Weissbart SJ, Stock JA, Wein AJ. Program directors’ criteria for selection into urology residency. Urology. 2015;85:731–736.
24. Rojstaczer S, Healy C. Where A is ordinary: The evolution of American college and university grading, 1940–2009. Teachers Coll Rec. July 2012;114:1–23.
25. Bowen RE, Grant WJ, Schenarts KD. The sum is greater than its parts: Clinical evaluations and grade inflation in the surgery clerkship. Am J Surg. 2015;209:760–764.
26. Fazio SB, Papp KK, Torre DM, Defer TM. Grade inflation in the internal medicine clerkship: A national survey. Teach Learn Med. 2013;25:71–76.
27. Twenge JM. Generational changes and their impact in the classroom: Teaching Generation Me. Med Educ. 2009;43:398–405.
28. Durning SJ, Hemmer PA. Commentary: Grading: What is it good for? Acad Med. 2012;87:1002–1004.
29. Shea JA, O’Grady E, Morrison G, Wagner BR, Morris JB. Medical student performance evaluations in 2005: An improvement over the former dean’s letter? Acad Med. 2008;83:284–291.