Letters of recommendation (LORs) are a time-honored aspect of the application process for undergraduate and graduate medical training. Despite this long tradition, many researchers and educators have questioned the utility of LORs for this purpose. In his classic article from 30 years ago, “Fantasy land,” Friedman1 described how LORs are universally inflated and, as a result, become “useless” as predictors of future performance.
Even with this skepticism, LORs continue to be used for the purpose of medical education selection.2–9 Given their widespread use, it is important to understand the degree to which LORs are a value-added step to the application process, with value being defined as benefit divided by cost. LORs would be beneficial if they could reliably predict educational outcomes, and indeed, two single-institution studies suggest this is possible.10,11 Cullen and colleagues10 showed that the overall rating in the LORs for internal medicine residency applications predicted professionalism scores in internship. Stohl and colleagues11 found that comments about patient care, medical knowledge, and interpersonal and communication skills were more common in the LORs of top- versus bottom-rated obstetrics–gynecology residency graduates. Additional studies also have suggested positive associations with outcomes, but these studies have significant design limitations.12–15
Other authors, however, have found LORs to have no predictive validity.16–19 In terms of cost, the addition of LORs to the application packet is not only taxing for the letter writer but is also a burden on the admissions committee members who must read and interpret each letter. For example, our medical school received 2,778 applications for the 2013 class. If faculty took five minutes to read a single candidate’s LORs, it would take 231 hours to read all of the letters.
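The reading-time estimate above can be reproduced with a short calculation. This is only an illustrative sketch; the function name `total_reading_hours` is hypothetical, and the inputs are the figures quoted in the text (2,778 applications, five minutes per candidate).

```python
def total_reading_hours(n_applications: int, minutes_per_applicant: float) -> float:
    """Total faculty reading time in hours for one admissions cycle."""
    return n_applications * minutes_per_applicant / 60.0


# Figures quoted in the text: 2,778 applications, 5 minutes per candidate's LORs.
hours = total_reading_hours(2778, 5)
print(f"{hours:.1f} hours")  # 231.5, which the text reports as 231 hours
```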
Despite the significant investment of time and resources in producing and interpreting LORs, there is limited empirical evidence about how LORs should be used in the selection of medical students, if at all. The purpose of the present study was to determine whether LORs submitted for application to one medical school could predict the students who would become the top and the bottom of the class at graduation. In doing so, we hoped to inform medical school admission committees about how LORs might best be used.
We retrospectively studied three consecutive graduating classes (2007–2009) of the Uniformed Services University of the Health Sciences (USU) in Bethesda, Maryland. USU matriculates 170 students per year. In each class, the top 27 students are elected into the Alpha Omega Alpha (AOA) Honor Medical Society, which is approximately 16% of the graduating class, as per AOA regulations; these students embody desirable characteristics beyond grade point average (GPA) such as leadership among peers and professionalism.20 For a comparison group, we designated the 27 students with the lowest cumulative GPA (which includes preclinical and clinical performance) for each class as the “bottom of the class” graduates. We chose this extreme groups approach21 to bolster power and thereby minimize the number of LORs that required coding. For each student, we selected the first three LORs from their medical school application packet. Some packets contained letters that combined LORs from multiple authors into one document. If a section of this document was clearly written by one author, then we considered that section as a single letter for our study. We excluded portions of such documents written by premedical-school committees (sometimes referred to as “committee letters”) because these were infrequent and our goal was to focus solely on the individual-author LOR. As a result of these LOR selection criteria, 10 students in this study had only two LORs.
Rating form development and rating of LORs
Using the available literature, we created a list of characteristics used in interpreting LORs.10,12,22–29 We then added additional characteristics based on our experience as medical educators interpreting LORs for medical school as well as graduate medical education applications. Additionally, we interviewed five faculty members (who were not otherwise involved with the study) with experience reviewing LORs for medical school applications to delineate additional important characteristics. Next, we consolidated these characteristics to produce a single list, with the goal of including as many characteristics as possible. From this comprehensive list, we developed a data abstraction form that we tested and refined through an iterative process to produce the final form.
The final rating form contained 76 items, 45 of which were based on the literature review. These included 12 general characteristics: total sentence count27,29; number of sentences without the student’s name or personal pronoun (assessing for “filler” sentences); number of sentences about the author’s experience evaluating students; presence of typographical or grammatical errors; gender or name errors (i.e., the author copied a previous letter but didn’t change the name/personal pronoun); use of letterhead; use of an official recommendation form; whether the right to see the letter was waived26,27,29; whether the author indicated the student asked him or her to write the letter; number of sentences that were clearly original (i.e., not adapted from a previous LOR, classified as zero, one, or two or more)22,25,29; whether the reader was invited to contact the author26,29; and whether the contact information was explicitly provided.26,29 We also collected 13 author characteristics12,24,25: academic rank (assistant professor, associate professor, professor)25; graduate medical education position; physician; science professor; nonscience professor; employer/supervisor; nurse; current military affiliation; relative; friend; or clergy.
Next, we assessed 17 student characteristics: specific examples (by name) from outside work/school, exposure to the medical field, and volunteer experience; any comments about intellectual ability (analytic ability, learning ability, etc.),28,29 interpersonal skills (caring, consideration of others, etc.),26–29 and character (conscientious, honesty, maturity, work ethic, etc.)12,26–29; mention of awards (dean’s list, honor roll, etc.); how well the author knew the student (whether or not the word “knew” was used, which was then classified as very well, fairly well, or only slightly); whether the author was enthusiastic to write the letter (e.g., “I’m excited to write this letter for Jane”); the context of the observations (patient care, classroom [including number of courses, course level, and associated lab], and personal life); and whether the observations were mostly from lab experience. We also assessed for 18 comparative rankings: the presence and classification of semiquantitative overall ranking of the student (classified as best, one of the best, better than peers, at the level of peers, and below peers); the presence and the percentile of a quantitative overall ranking; any qualitative or quantitative overall ranking10,22,24,26,27,29; the presence and classification for a denominator for the overall ranking (classified as specific number, estimated denominator, nonspecific, or time span alone)10,22,26; any positive overall description of student (e.g., “Bob is an outstanding student”)24; comparison to graduate students; any comment and explanation ranking the student at the level of peers or below peers27; any other comments that were nonpositive (and explanation); whether the author spontaneously stated that the student would be accepted to his or her own institution; and whether an employer/supervisor spontaneously stated he or she would accept the student for continued employment or promotion in role. 
For the final category, we abstracted 16 summary characteristics: any “I recommend” statement22,24; the presence of descriptor(s) with the “recommend” statement (absolutely, with confidence, enthusiastically, without hesitancy, highest, highly, in strongest possible terms, without reservation, strongly, unqualified, wholeheartedly)24; whether the descriptor was combined with the term “very”; the number of descriptors; and whether the USU recommendation form was used, and if so, the ranking circled on the USU form (enthusiastically recommend, recommend, recommend with reservations, or do not recommend).
To blind the coders, we deidentified the LORs prior to rating them by removing all names, contact information, and school affiliations; this left only an identification code that could be linked back to the student’s AOA status. Using the rating form, two of three investigators (K.J.D., C.D.M., and G.R.) independently coded each LOR, with disagreements resolved by consensus. After all coding was complete, we linked the LOR data to each student’s undergraduate GPA, average Medical College Admission Test (MCAT) score,30 and AOA status.
Our null hypothesis was that LOR characteristics would not differ between AOA and bottom of the class students. To ensure an adequate sample size, we wanted at least 75 students per group, which would provide 90% power to detect a difference of 30% or larger between groups, assuming a prevalence of 30% for the characteristic and a significance level of .01. We used each LOR (not the individual student) as the unit of analysis. We chose this approach because many LORs had missing data (e.g., numerical comparative rating), which made creating an average score for each LOR characteristic for each student problematic. For each LOR characteristic, we screened for a bivariate association with student group (AOA versus bottom of the class) using the chi-square test, Fisher exact test, or Student t test as appropriate. We employed a significance level of α = .01, given the multiple comparisons. To control for confounding, we used logistic regression with AOA versus bottom of the class student as the dependent variable and undergraduate GPA (by quartile), average MCAT score (by quartile), and all LOR factors with a P value < .05 on bivariate analysis as the independent variables. We performed a sensitivity analysis on comparative rankings and summary statements using the author’s academic rank (any academic rank versus none and professors versus all others). We used STATA 11.2 statistical software (StataCorp LP; College Station, Texas) for all calculations. The USU institutional review board approved the study.
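The sample size reasoning above can be illustrated with a standard normal-approximation power calculation for comparing two proportions. This is a sketch under stated assumptions, not the authors' actual calculation: it reads "a prevalence of 30%" and "a difference of 30%" as 30% in one group versus 60% in the other, assumes a two-sided test, and uses a textbook approximation rather than whatever routine was used in the study. The function name `two_proportion_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1: float, p2: float, n_per_group: int,
                         alpha: float = 0.01) -> float:
    """Approximate power of a two-sided z test comparing two proportions.

    Uses the usual normal approximation and ignores the negligible
    probability of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled) and under the alternative.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    z = (abs(p1 - p2) - z_crit * se_null) / se_alt
    return NormalDist().cdf(z)


# ASSUMPTION: 30% versus 60% is one reading of the text, not a stated design value.
for n in (75, 81):  # 75 = stated minimum; 81 = 27 students x 3 classes
    print(n, round(two_proportion_power(0.30, 0.60, n), 3))
```

Under these assumptions the approximation gives a power of roughly 0.88 with 75 students per group and about 0.91 with 81 per group, which is in the neighborhood of the 90% stated in the text. A different reading of the 30% figures would give a different answer.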
We identified 437 LORs (AOA = 214, bottom of the class = 223, Table 1). The average LOR was 18 sentences long (SD 8 sentences, about 1 page long), most were on letterhead, almost half used an official recommendation form, and most students waived their right to see the LOR. Surprisingly, over a third contained a typographical or grammatical error. All general characteristics were statistically similar between AOA and bottom of the class students.
Slightly more than half of the authors across both groups of students had an academic rank, with roughly half of them being professors (Table 2). Most authors were science professors, and one-third were employers or supervisors. Relatively few were physicians (14%) or nurses (2%). Of all the author characteristics, only employer/supervisor differed between student groups, as AOA students’ LORs were more likely to have been written by an employer/supervisor (36% versus 22%, P = .001).
Nearly every author commented about the student’s intellectual ability, interpersonal skills, and character (Table 3). Most observations were based on classroom experience. The only student characteristic that was different between student groups was the author description of how well they knew the student, with AOA students more likely than bottom of the class students to be classified as being known “very well” by the author (41% versus 22%, P = .003).
Most comparative rankings were similar between student groups (Table 4). Quantitative rankings were essentially identical, with the average rank being the top 11th percentile. Authors who were professors or held any academic rank were more likely to provide a quantitative comparative ranking, but their ranking did not predict AOA status. AOA students were more likely to be labeled as the “best” (e.g., “X is the best student I have had in ANY course,” 41% versus 17%, P = .01), and this was not different by academic rank status. Although rare (n = 12), LORs in which the employer/supervisor spontaneously stated that he or she would promote or give an expanded role to the applicant were only for AOA students (P = .003). For example, an author wrote about a volunteer teaching aide who would become AOA: “Would I hire X to work with me? Absolutely!” In contrast, bottom of the class students were more likely to have nonpositive comments (13% versus 6%, P = .005). To illustrate, an author wrote about one student who, despite this, went on to become AOA: “His early academic career is spotted with withdrawals and marginal grades.”
Summary statements, defined as sentences starting with “I recommend,” were of no value in differentiating performance (Table 5). The most common descriptor with “I recommend” was “highly recommend.” Neither the particular descriptor with “I recommend” (absolutely, highly, etc.) nor the number of descriptors was different between student groups. As above, the results were not different for professors or those with any academic rank.
Three variables remained significant when controlling for undergraduate GPA and MCAT score (Table 6). A semiquantitative rating of “the best” compared with peers (OR 2.5, 95% CI 1.1–5.6, P = .02) and the author being an employer/supervisor (OR 1.9, 95% CI 1.2–3.0, P = .01) were associated with an increased likelihood of graduating AOA. Conversely, having a nonpositive comment (OR 0.47, 95% CI 0.22–1.0, P = .05) decreased the likelihood of graduating AOA. Of note, the “accept for promotion” variable described above could not be included in the logistic regression model because it was true only in AOA students and therefore predicted the outcome perfectly.
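As a reading aid, the reported odds ratios, confidence intervals, and P values can be checked for internal consistency. The sketch below assumes Wald-type intervals that are symmetric on the log-odds scale, which is a common default for logistic regression output but is not stated in the text. It recovers an approximate standard error from each reported 95% CI and derives the corresponding two-sided P value. The function name `wald_p_from_ci` is hypothetical, and because the inputs are rounded, the results are approximate.

```python
import math
from statistics import NormalDist


def wald_p_from_ci(odds_ratio: float, ci_low: float, ci_high: float,
                   level: float = 0.95) -> tuple[float, float, float]:
    """Back out the log-odds SE, z statistic, and two-sided P from an OR and its CI.

    ASSUMPTION: the interval is a Wald interval, exp(log(OR) +/- z_crit * SE).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Values as reported in the text (rounded by the authors).
reported = {
    "rated 'the best'": (2.5, 1.1, 5.6),
    "employer/supervisor author": (1.9, 1.2, 3.0),
    "nonpositive comment": (0.47, 0.22, 1.0),
}
for name, (or_, low, high) in reported.items():
    se, z, p = wald_p_from_ci(or_, low, high)
    print(f"{name}: SE={se:.2f}, z={z:.2f}, P={p:.3f}")
```

Under this assumption, the derived P values land close to the reported ones (.02, .01, and .05) once rounding is taken into account. In particular, an upper confidence limit of 1.0 for the nonpositive-comment variable corresponds to a P value near .05.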
As the medical education community seeks to improve the value of assessment tools in admissions practices, it is important to use tools that can reliably predict outcomes. Students, faculty, and admissions committees, in aggregate, spend a great deal of time requesting, writing, and using LORs, respectively; yet there is no data-driven guidance on how to accomplish this task efficiently and effectively. Our study provides data to inform this process. For students who were accepted to medical school, we have shown that very few aspects of LORs submitted in support of the medical school application packet were associated with whether or not a student graduated at the top of the class. Despite these few associations, LORs do have some value. In particular, our findings suggest that medical school admissions committees should consider giving higher priority to applicants whose LORs rate them as the “best” among their peers, are written by an employer or supervisor, and possibly include comments about promotion. Further, admissions committees might downgrade, but not necessarily eliminate, applicants with LORs that include nonpositive comments. Although our study thoroughly examined the LORs of students accepted to medical school, we did not examine LORs of students who were rejected. This is an important distinction, and future research might attempt to determine whether the most important purpose of LORs is to reject applicants with overtly negative narratives.
Only one previous report has studied LORs for medical school application, to our knowledge. This single-school study showed that “careful interpretation” of LORs from medical school admissions weakly predicted a variety of preclinical and clinical outcomes, but this study neither controlled for other variables (e.g., GPA) nor was it able to be replicated in the following year.14 Thus, our study is the first to provide any concrete guidance on this important issue.
Previous authors have made recommendations for faculty when writing LORs for graduate medical education,22,26,29,31 but our findings provide only limited support for the characteristics that might differentiate between high- and low-performing students. For example, we attempted to quantify the construct of “depth of understanding” by determining the number of original sentences, which we defined as sentences that clearly could not have been copied from a previous LOR about another student. In practice, agreement on this item proved difficult to achieve, and we found it did not predict performance. Perhaps another method of determining depth of understanding would be able to show a difference between high and low performers. We also did not find a difference in quantitative numerical ranking, but did find that the top category of semiquantitative ranking, “the best,” was associated with top performers. Lastly, the summary statement of “I recommend” in our study had no predictive validity. It may be that summary statements need to include a comparison to peers, as others have advocated.26,29,31
Although it is intuitive that the “best” students would go on to become AOA students and that nonpositive comments would be more likely attributed to students at the bottom of the class, we can only speculate as to why having an LOR from an employer or a supervisor is associated with medical school success. Perhaps weaker students spend all of their efforts on their class work and therefore have little time left for outside employment. Another possibility would be that stronger students are able to impress employers or supervisors enough to be hired more often than weaker students. Our finding of a recommendation for “promotion” only in LORs for AOA students supports this theory, as these would presumably be the most impressive students.
Our findings support the notion that readers should not try to search for hidden meanings beyond being labeled “the best,” nonpositive comments, and a recommendation for “promotion.” We found no relationship between performance and other characteristics, such as author enthusiasm in the opening paragraph (e.g., “I am thrilled to write this letter for Jane”), grammatical errors (postulating that an author might subconsciously produce a sloppy letter to indicate a weak student), or the absence of any comparison with peers.
Our findings provide guidance for future research. First, we have shown how LORs might be used to help identify successful students, but we only found one item related to weaker students. Perhaps replication of our work in the context of the entire application packet might help in developing a more robust “scoring system” that could identify these students. Second, if the implications of our study become widely implemented, will applicants and authors try to “game” the system, thereby potentially devaluing the LOR as an admissions tool? Following contents of LORs over time would be useful here, particularly if a standardized LOR for application to medical school comes to fruition.32
The present study had several limitations. First, the data come from a single institution. Second, as USU is a school for military and public health physicians, our applicants may be different from applicants to other medical schools, and LOR authors may have tailored their letters for USU. However, fewer than 25% of the LOR authors were currently affiliated with the military, making the authors quite similar to those found at other medical schools. Third, we did not attempt to control for other applicant characteristics beyond GPA and MCAT scores. Fourth, it may be that LORs only predict performance on selected attributes (e.g., integrity, resilience, empathy) instead of global performance, though it would be difficult to develop an LOR scoring system for these attributes. Lastly, we did not review the LORs from any rejected applicants. This is an important limitation of our study that restricts the inferences we can make about the value of LORs for admissions decisions.
In conclusion, we found that most characteristics of LORs for medical school application did not predict students’ performance in medical school as measured by top or bottom of the class status. Medical school admissions committees might use LORs written by employers or supervisors and those labeling students as the “best” among peers to rank candidates more strongly. Conversely, students with LORs containing nonpositive comments might be placed lower on the priority list. Future research would be useful to place these aspects in the context of other parts of the admissions packet, to help determine their true predictive power.
Acknowledgments: The authors would like to thank Mr. Allen Kay and Ms. Danielle Fenton for their assistance.
1. Friedman RB. Sounding board. Fantasy land. N Engl J Med. 1983;308:651–653
2. Bajaj G, Carmichael KD. What attributes are necessary to be selected for an orthopaedic surgery residency position: Perceptions of faculty and residents. South Med J. 2004;97:1179–1185
3. Bernstein AD, Jazrawi LM, Elbeshbeshy B, Della Valle CJ, Zuckerman JD. An analysis of orthopaedic residency selection criteria. Bull Hosp Jt Dis. 2002;61:49–57
4. Crane JT, Ferraro CM. Selection criteria for emergency medicine residency applicants. Acad Emerg Med. 2000;7:54–60
5. DeLisa JA, Jain SS, Campagnolo DI. Factors used by physical medicine and rehabilitation residency training directors to select their residents. Am J Phys Med Rehabil. 1994;73:152–156
6. Grantham JR. Radiology resident selection: Results of a survey. Invest Radiol. 1993;28:99–101
7. Green M, Jones P, Thomas JX Jr. Selection criteria for residency: Results of a national program directors survey. Acad Med. 2009;84:362–367
8. Wagoner NE, Gray GT. Report on a survey of program directors regarding selection factors in graduate medical education. J Med Educ. 1979;54:445–452
9. Wagoner NE, Suriano JR, Stoner JA. Factors used by program directors to select residents. J Med Educ. 1986;61:10–21
10. Cullen MW, Reed DA, Halvorsen AJ, et al. Selection criteria for internal medicine residency applicants and professionalism ratings during internship. Mayo Clin Proc. 2011;86:197–202
11. Stohl HE, Hueppchen NA, Bienstock JL. The utility of letters of recommendation in predicting resident success: Can the ACGME competencies help? J Grad Med Educ. 2011;3:387–390
12. Brothers TE, Wetherholt S. Importance of the faculty interview during the resident application process. J Surg Educ. 2007;64:378–385
13. Hayden SR, Hayden M, Gamst A. What characteristics of applicants to emergency medicine residency programs predict future success as an emergency medicine resident? Acad Emerg Med. 2005;12:206–210
14. Rippey RM, Thal S, Bongard SJ. A study of the University of Connecticut’s criteria for admission into medical school. Med Educ. 1981;15:298–305
15. Schaider JJ, Rydman RJ, Greene CS. Predictive value of letters of recommendation vs questionnaires for emergency medicine resident performance. Acad Emerg Med. 1997;4:801–805
16. Boyse TD, Patterson SK, Cohan RH, et al. Does medical school performance predict radiology resident performance? Acad Radiol. 2002;9:437–445
17. Chole RA, Ogden MA. Predictors of future success in otolaryngology residency applicants. Arch Otolaryngol Head Neck Surg. 2012;138:707–712
18. Clemente M, Michener MW. The dean’s letter of recommendation and internship performance. J Med Educ. 1976;51(7 pt 1):590–592
19. Leichner P, Eusebio-Torres E, Harper D. The validity of reference letters in predicting resident performance. J Med Educ. 1981;56:1019–1021
21. Preacher KJ, Rucker DD, MacCallum RC, Nicewander WA. Use of the extreme groups approach: A critical reexamination and new recommendations. Psychol Methods. 2005;10:178–192
22. DeZee KJ, Thomas MR, Mintz M, Durning SJ. Letters of recommendation: Rating, writing, and reading by clerkship directors of internal medicine. Teach Learn Med. 2009;21:153–158
23. Dirschl DR, Adams GL. Reliability in evaluating letters of recommendation. Acad Med. 2000;75:1029
24. Fortune JB. The content and value of letters of recommendation in the resident candidate evaluative process. Curr Surg. 2002;59:79–83
25. Greenburg AG, Doyle J, McClure DK. Letters of recommendation for surgical residencies: What they say and what they mean. J Surg Res. 1994;56:192–198
26. Keim SM, Rein JA, Chisholm C, et al. A standardized letter of recommendation for residency application. Acad Emerg Med. 1999;6:1141–1146
27. Larkin GL, Marco CA. Ethics seminars: Beyond authorship requirements—ethical considerations in writing letters of recommendation. Acad Emerg Med. 2001;8:70–73
28. O’Halloran CM, Altmaier EM, Smith WL, Franken EA Jr. Evaluation of resident applicants by letters of recommendation: A comparison of traditional and behavior-based formats. Invest Radiol. 1993;28:274–277
29. Wright SM, Ziegelstein RC. Writing more informative letters of reference. J Gen Intern Med. 2004;19(5 pt 2):588–593
30. Zhao X, Oppler S, Dunleavy D, Kroopnick M. Validity of four approaches of using repeaters’ MCAT scores in medical school admissions to predict USMLE Step 1 total scores. Acad Med. 2010;85(10 suppl):S64–S67
31. Lang VJ, Aboff BM, Bordley DR, et al. Guidelines for writing department of medicine summary letters. Am J Med. 2013;126:458–463