Reliability in Evaluating Letters of Recommendation

Dirschl, Douglas R. MD; Adams, George L. MD

Letters of recommendation are considered an important source of information about residency applicants,1 but no published study documents the reliability of rating these letters. This study measured the interobserver reliability of evaluating letters of recommendation for applicants to an orthopedics training program.


Three letters of recommendation (from a chair of orthopedics, an orthopedist, and a non-orthopedist) were collected from the application files of 58 orthopedics residents who completed training at the University of North Carolina at Chapel Hill School of Medicine, 1983–1987. The applicant's name, author's name, and author's institution were all masked.

Six orthopedics faculty members rated the content of each letter as “outstanding,” “average,” or “poor,” the same method of evaluation that faculty use to rate such letters. Interobserver reliability was expressed using the kappa statistic.2 Agreement by kappa value was characterized as follows: <0 = poor; 0–.20 = slight; .21–.40 = fair; .41–.60 = moderate; .61–.80 = substantial; and .81–1.00 = almost perfect.2
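The report does not state how a single kappa value was obtained from six raters. One plausible reading, assumed here and not confirmed by the source, is that Cohen's kappa was computed for each pair of raters and the pairwise values were averaged. The Python sketch below illustrates that assumption; the function names and the example ratings are hypothetical, and the agreement labels follow the ranges quoted above.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(a) | set(b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in labels)
    if expected == 1.0:
        # Kappa is undefined when both raters use one identical label
        # throughout; this sketch reports 1.0 because they never disagreed.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over every pair of raters (assumed approach)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def agreement_label(kappa):
    """Map a kappa value to the descriptive ranges used in the text."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings of five letters by three raters (not study data).
example_ratings = [
    ["outstanding", "average", "average", "poor", "outstanding"],
    ["outstanding", "outstanding", "average", "average", "outstanding"],
    ["average", "average", "average", "poor", "outstanding"],
]
example_kappa = mean_pairwise_kappa(example_ratings)
print(round(example_kappa, 2), agreement_label(example_kappa))
```

Applied to the values reported in this study, `agreement_label(0.17)` and `agreement_label(0.20)` fall in the "slight" range and `agreement_label(0.28)` falls in the "fair" range.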


The mean kappa values were .17 ± .03 (mean ± standard error) for letters from a chair, .28 ± .03 for letters from an orthopedist, and .20 ± .04 for letters from a non-orthopedist. The differences in kappa values were not statistically significant (p = 0.15).

Of the residents whose letters were evaluated, eight had performed at a superior level in the residency program and eight had performed at an inferior level. There was no significant difference in kappa values for rating letters for either the eight superior or eight inferior residents (p > 0.5). While the superior and inferior groups had equal percentages of letters rated “poor” (15%), the superior group had a greater percentage of letters rated “outstanding” (33% versus 18%).


No previous investigation has established benchmarks to determine acceptable assessments of letters of recommendation. Using the kappa statistic, this preliminary study found slight interobserver reliability.

Although many faculty believe the identity of the author of a letter is important in determining the letter's value, the observers in this study focused solely on each letter's content. Some faculty believe that references to an applicant's positive or negative attributes are the most important information to glean from a letter. However, the study of content yielded only slight interobserver reliability.

Residency programs should be aware of the potential for interobserver variability in the interpretation of letters of recommendation. While use of an objective scoring system for letters might improve reliability, it is possible that much of the variability is due to the letters themselves. Authors tend to give glowing reports, making the letters less discriminating (the phenomenon of range restriction) and the author's true opinion of the applicant less clear. Continued assessment of the reliability of rating letters is necessary.


