The use of peer evaluation has a long history of identifying successful physicians, perhaps more so than grades and standardized exam scores. Peer evaluation is used to guide patient referrals, hospital privileges, grant and manuscript reviews, and promotion and tenure decisions. Medical student peer evaluation has long been suggested as a better predictor of performance during residency training than medical school grades or faculty evaluations,1,2 and it seems to be stable from year to year.3 One study does note, however, that peer ratings are not necessarily better than other measures of achievement, such as grade point average (GPA) and faculty clinical performance evaluations.4
Peer evaluation has been suggested to be especially useful in measuring interpersonal skills.5 Studies of correlations between self, peer, and faculty evaluations in medical education generally show low correlations between self and other evaluations, and moderate to high correlations between peer and faculty evaluations.6–9 We have previously reported that peer evaluation of professional competence is best correlated with measures of cooperative learning and, to a much lesser degree, with medical school GPAs and performance on National Board of Medical Examiners licensing exams, and it is not correlated with admission data.10 This is an important point because it suggests that medical students who demonstrate cooperative rather than competitive learning behaviors are later recognized by their peers as being more professionally competent and are predicted to be better doctors.
There are two common methods of peer evaluation. The most common method is for peers to rate one another’s performance for specific abilities or behaviors using four- to nine-point Likert scales, thus providing specific information about each subject. Formative peer evaluation using peer ratings forms can be an effective tool for providing feedback and influencing student behavior, especially for cooperative learning and interpersonal skills.5,11,12
The second peer evaluation method is peer nomination, in which students are asked to nominate a limited number of classmates who best fit various situations.1,10 Although peer nomination can be complex to design, administer, and analyze, it is the simplest method for the raters, and it has been suggested to be more reliable than peer ratings.1,4,13 The distribution of a limited resource built into a peer nomination form allows better discrimination of top achievers and avoids the Lake Wobegon phenomenon, that is, peer evaluators cannot rate most of their classmates as above average. Although it is less useful for student feedback, because it only provides information for a small subset of subjects, the power and utility of peer nomination are that this peer assessment method effectively discriminates extremes.
Although some investigators have suggested that medical student peer evaluation may be weakened by social biases,14,15 others have reported that medical students are capable of providing reliable and valid evaluations of their peers’ performance.1,13 Interestingly, Arnold and coworkers4 compared results obtained when peer evaluation of medical students was performed primarily for research purposes versus as an assessment tool for promotion. Their results suggested that peer evaluation in the context of formal assessment provides ratings that are internally consistent, unbiased, and valid.
The first goal of this study was to assess the reliability of a revised peer nomination form using a multi-institution study design to investigate possible relationships between rankings of humanism, professionalism, clinical competence, and interpersonal skills. The second goal was to test the hypothesis that peer nomination could be used to identify students who demonstrated humanism (e.g., caring, altruism, respect, and empathy). The third goal was to compare the use of factor analysis with a less complex method of obtaining student rankings. On the basis of our findings, we suggest that the numbers of peer nominations students receive for selected items should be used by medical schools as an important addition to their medical student evaluation program.
Peer nomination forms
We adapted peer nomination forms from a form originally developed at Case Western Reserve University School of Medicine16 and used at the University of Florida for more than 25 years.10 In these surveys, medical students are asked to identify the top three classmates that best fit different scenarios. Sample form items addressing student attributes of humanism and community service were pilot tested at the University of Florida College of Medicine in 2002 (class of 2003), and preliminary data were presented at an Arnold P. Gold Foundation Barriers Conference in January 2003 (data not shown). The wording of peer nomination items was discussed by conference participants, and the 12-item form A was developed (Table 1). Form A included several items used previously at the University of Florida (items one, two, four, six, seven, and nine), thus providing a basis for comparison with historical data. As described below, results from the administration of the 12-item form at three medical schools were used to identify a subset of items to develop a simpler, six-item form B (Table 1).
Participating medical schools
The three medical schools participating in this study were two public institutions (University of Florida College of Medicine and University of Michigan Medical School) and one private institution (Tulane University School of Medicine). They constitute a convenience sample, as collaborators were sought from attendees at the January 2003 Barriers Conference. The coinvestigators had long-standing interests in the teaching and assessment of humanism and/or professionalism and had plans to use the peer nomination results in the selection of exemplar students for awards at their respective schools. The peer nomination items, deidentification of student data, and data analysis were approved by the institutional review boards at all three institutions.
Surveys at all three schools were administered in 2003 (form A) and again in 2004 (form B). At Michigan, the survey period was during the middle of students’ fourth year, as the results were also used to select one senior student for an award. The survey period was in the middle of the third year at Tulane and during the summer before the fourth year at Florida. The results were used in dean’s letters and for selection of students to the Gold Humanism Honor Society at Florida and Tulane. Peer nomination forms were Web-based at Florida and Michigan, and paper forms were administered at Tulane. The cover sheet or instructions stated that (1) the form was being used for educational research purposes, (2) the peer nomination method typically results in recognition for about 15% of the class10 but does not discriminate among the other 85%, and (3) when peer nomination results were mentioned in dean’s letters of recommendation at Florida, those students were often accepted by residencies that, on the basis of standard criteria such as class rank, would not otherwise have accepted them. Class sizes and response rates for both forms are summarized in Table 2.
We used factor analysis of the resulting datasets to analyze relationships among peer evaluation survey items and to identify latent variables, or factors, that account for correlations among the survey items. Factor analysis also provided each student a score for each factor identified, allowing a ranking of students for each characteristic. Factor analysis of the numbers of nominations each student received for each survey item was done using the SAS factor procedure and principal-components analysis (SAS Institute Inc, Cary, NC). A default eigenvalue of 1.0 was used to determine major statistical factors, which were then correlated with medical student behavioral characteristics as defined by the relevant survey items. The Harris–Kaiser rotation was used to determine standardized regression coefficients, which were used to plot rotated factor patterns. The SAS score procedure was used to provide a listing of student scores for each factor. The students were then sorted in descending order of score to determine their rankings for each behavioral characteristic.
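The eigenvalue cutoff described above can be illustrated with a minimal Python sketch. It is not the SAS procedure used in this study: it performs an unrotated principal-components decomposition of the item correlation matrix with NumPy and omits the Harris–Kaiser rotation and the scoring step. The function name `principal_factors` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def principal_factors(counts, min_eigenvalue=1.0):
    """Unrotated principal components of nomination counts.

    counts: array of shape (n_students, n_items); entry [s, i] is the
    number of nominations student s received for survey item i.
    Every item column must vary across students.
    """
    counts = np.asarray(counts, dtype=float)
    # Correlation matrix among survey items (columns).
    corr = np.corrcoef(counts, rowvar=False)
    # eigh suits symmetric matrices; it returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Retain only components whose eigenvalue exceeds the cutoff.
    keep = eigenvalues > min_eigenvalue
    # Loadings: eigenvectors scaled by the square root of the eigenvalue.
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    # Proportion of total variance carried by each component.
    explained = eigenvalues / eigenvalues.sum()
    return eigenvalues, loadings, explained
```

Because the eigenvalues of a correlation matrix sum to the number of items, an eigenvalue above 1.0 marks a component that carries more variance than any single standardized item, which is the usual rationale for this cutoff.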
Rankings based on counting nominations
Peer evaluation scores were calculated for each student using a personal computer spreadsheet (Microsoft Excel) as sums of the numbers of nominations for subsets of survey items that represented each student characteristic—that is, form B, items one (emergency) and four (residency) for clinical competence (factor one), items two (caring) and six (listening) for caring (factor two), and item three (community) for community service (factor three). Student rankings were determined by simply sorting scores in descending order.
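The counting method described above was carried out in a spreadsheet; the same arithmetic can be sketched in Python. The item-to-characteristic mapping follows form B as stated in the text, but the ballot layout (one dictionary per respondent, mapping item number to a list of nominated classmates), the function names, and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import Counter

# Form B items contributing to each characteristic, as described in the text.
FORM_B_ITEMS = {
    "clinical_competence": (1, 4),  # emergency, residency
    "caring": (2, 6),               # caring, listening
    "community_service": (3,),      # community
}

def count_scores(ballots, items_by_trait=FORM_B_ITEMS):
    """Sum the nominations each student received, per characteristic.

    ballots: iterable of dicts mapping item number -> list of nominee IDs
    (hypothetical layout; items left blank may simply be absent).
    """
    scores = {trait: Counter() for trait in items_by_trait}
    for ballot in ballots:
        for trait, items in items_by_trait.items():
            for item in items:
                scores[trait].update(ballot.get(item, ()))
    return scores

def rank(scores_for_trait):
    """Sort students by descending score; ties broken by ID (an assumption)."""
    return sorted(scores_for_trait, key=lambda s: (-scores_for_trait[s], s))
```

Students who receive no nominations for a characteristic do not appear in that characteristic's ranking, which matches the observation that the method does not discriminate among the remainder of the class.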
Twelve-item peer nomination form A
The numbers of students and response rates for each school for the 12-item form are shown in Table 2. The response rates at the two schools at which the peer nomination form was introduced for the first time (Michigan and Tulane) were 46% to 50%, compared with nearly full participation at Florida, where a similar form has been administered to rising seniors for decades.
Factor analysis was used to identify major factors addressed by the 12-item form and the relative contributions of each item to each factor. Two major factors (eigenvalues >1) were identified at all three medical schools.
The weighting of peer nomination items for each of the three factors is shown graphically in Figure 1, part A. Consistent with earlier studies,10 items dealing with emergency situations (item one), accepting data (item six), and best choices for residencies (item nine) accounted for most of the variance (37%–61%) (Table 2). These items contributed the most to factor one, defined as clinical competence. Items related to interpersonal skills, such as the classmates who are the most likeable (item two) or who would be called on to discuss a personally disturbing event (item seven), accounted for 13% to 26% of the variance (Table 2) and weighted on factor two. Interestingly, most of the new items designed to address behavioral traits related to humanism also contributed to factor two, including items related to caring, giving bad news, respect, and listening (items 3, 8, 10, and 12). Thus, factor two includes items related to interpersonal skills and humanism (e.g., caring, empathy, respect, and listening) and is hereafter referred to simply as caring.
A third major factor was identified at Tulane, although the eigenvalues at Florida and Michigan approached a value of 1 (0.86 and 0.94, respectively). One survey item dealing with community service defined this third factor, accounting for 7% to 9% of the variance at all three medical schools (Table 2). The higher significance of this factor at Tulane may be related to the medical curriculum requirements for service learning at Tulane.17 It should be noted that the wording of this item (service to communities) was designed to be intentionally vague so that medical students might think of community service in a broad fashion, ranging from service at a local free clinic to medical mission trips, or whatever opportunities might be available through their medical school.
The items about which classmates will make the best all-around doctors (item 4) and who they would want as the doctor for themselves or loved ones (item 11) captured elements of both clinical competence (factor one) and caring (factor two), as the data points fall between the factor one and factor two axes (Figure 1, part A).
Six-item peer nomination form B
Most of the items in form A displayed remarkably similar factor characteristics at all three medical schools, as illustrated by the clustering of data points in Figure 1, part A. Some items, however, were not consistent across all three medical schools, such as those related to likeability, discussing personally disturbing events, and respect. On the basis of these observations, a subset of six items was selected that best defined each of the three characteristics. This six-item form B (Table 1), consisting of two clinical competence items, two caring items, one community service item, and one overall evaluation item (own doc), was administered at all three medical schools during calendar year 2004.
Factor analysis of form B again revealed two major factors at all three institutions: clinical competence (factor one) and caring (factor two). Community service represented a minor third factor at all three medical schools, with eigenvalues in the range of 0.62 to 0.97. The weighting of peer nomination items for each of the three factors is shown graphically in Figure 1, part B.
Consistent with the results using form A, clinical competence accounted for the most variance when using form B (Table 2) and was defined by items about emergency situations (item one) and the best choices for residencies (item four). Caring was defined by items related to caring (item two) and listening skills (item six). Community service (item three) was defined by the same item as in form A. The item about who students would want as their own doctor (item five) again captured elements of both clinical competence and caring (Figure 1, part B).
The factor analysis procedure can be used to score individual students for each characteristic, and such scores can be sorted in descending order to arrive at a ranking. We have reported previously that such scores distinguish up to about 15% of a medical school class and do not differentiate among the remainder of the class.10 An alternative approach to obtaining student rankings is simply to count the number of nominations each student receives for the items relevant to the characteristic, that is, items one and four for clinical competence, items two and six for caring, and item three for community service. As shown in Figure 2, using this simpler approach the top 15% of the class can be distinguished at all three medical schools for all three factors using the six-item form B. As shown in Chart 1, which compares the student rankings at all three institutions for the three characteristics, the ranking obtained by counting nominations is very similar to the student ranking obtained by the more complex factor analysis. For all three competencies at all three institutions, counting nominations identified 85% to 90% of the same students as ranking based on factor analysis. Furthermore, more than half of the top 10 students and greater than 90% of the top five students are ranked identically.
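One way to quantify how closely two rankings agree on the students they distinguish is to compare the membership of their top groups. The Python sketch below shows such a comparison; it is an illustration only, not necessarily the calculation behind the percentages reported above. The function names, the 15% default, and the rounding-up of the group size are assumptions.

```python
import math

def top_group(ranking, fraction=0.15):
    """Return the leading fraction of a ranked list (at least one student)."""
    size = max(1, math.ceil(len(ranking) * fraction))
    return ranking[:size]

def overlap_fraction(ranking_a, ranking_b, fraction=0.15):
    """Share of ranking_a's top group that also appears in ranking_b's."""
    top_a = set(top_group(ranking_a, fraction))
    top_b = set(top_group(ranking_b, fraction))
    return len(top_a & top_b) / len(top_a)
```

Here `ranking_a` might hold the ranking from counted nominations and `ranking_b` the ranking from factor scores for the same characteristic, each as a list of student identifiers in descending order. This measure ignores the order within the top group, so it says nothing about whether individual students hold identical positions in both rankings.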
It should be noted that, consistent with previous studies, the groups of medical students distinguished by each of the factors are different.10 There was an average overlap of about 20% among the clinical competence, caring, and community service groups at Florida and Tulane, whereas at Michigan the overlaps were somewhat higher, at about 40% (data not shown).
The one survey item (own doc) that has characteristics of both clinical competence and caring (Figure 1) can also be used to distinguish about 15% of a class (data not shown). When this group was compared with those identified by the three individual factors, about half of them were also distinguished by clinical competence and by caring, and about a third by community service (data not shown). About 10% of the medical students distinguished by the own doc survey item were not identified in any of the other three groups (data not shown).
Two major characteristics are consistently identified by factor analysis of peer ratings by medical students, residents, and physicians.5,10–13,18–23 The first factor is usually defined as medical knowledge or technical skills, and the second factor is usually defined as interpersonal skills or patient relationships, suggesting that physicians in training and in practice may organize their perceptions of their peers’ performance into these two major categories.21
The peer nomination format has been noted to be especially useful for identifying and rewarding students with outstanding performance in domains of professionalism.24 Our findings for peer nomination at three medical schools revealed the same pattern of two major factors. Factor analysis of data from both peer nomination forms revealed that clinical competence (factor one) still accounts for the most variance and is characterized by peer recognition of classmates who would perform best in an emergency situation, have the most reliable data, and be the best choices for residency positions. Similarly, the second characteristic identified (factor two) by both peer nomination forms was related to interpersonal skills.
One of the goals of this study was to determine whether medical student traits related to humanism and professionalism in medicine, such as caring, respect, and altruism, would be identified as an independent factor(s) in peer assessment, and to investigate possible relationships between rankings of humanism, professionalism, clinical competence, and interpersonal skills. Four of the new peer nomination items in the 12-item form related to humanism or professionalism factored with interpersonal skills. Thus, humanism could not be identified as an independent characteristic. This is consistent with the problem that even faculty have difficulty in recognizing characteristics associated with professionalism.25 A better description of factor two may be simply caring, because to be perceived as caring, one must have both good interpersonal skills and humanistic intent.
Three of the peer nomination items tested in the 12-item form did not fit the typical two-factor model (Figure 1). The new item related to dedication to community service (item five on form A; item three on form B) defined a third independent factor, albeit with lower significance. It is interesting to note that students nominated by their peers for commitment to community service during medical school can sometimes be identified by their prior community service activities,26 although they may be less likely to be in the top quartile of their class at graduation.17 The peer nomination items asking medical students who they would want as the doctor for themselves or a loved one (own doc item 11 on form A; item 5 on form B), or which classmates will make the best all-around doctors (item four on form A), have characteristics of both factor one and factor two. It is interesting that this intuitively obvious fact emerges from the data. The own doc question also distinguished about 15% of each class, including about 10% of the students not identified by the three individual factors, suggesting that this survey item may allow additional students to be recognized.
Remarkably similar results were obtained at all three medical schools (two public, one private) participating in this study. Use of a previous version of the peer nomination survey at the University of Florida yielded similar results during a period of more than 20 years. In addition, about 70% of the more than 60 medical schools with chapters of the Gold Humanism Honor Society are now using form B, or slightly modified versions of it, as one of the selection tools for recognition of medical student exemplars (A.F. Sole, N. Wagoner, S.O. Gold, personal communication, 2007). When all of these considerations are taken together, this peer nomination survey seems to be both generalizable and acceptable.
The use of factor analysis to obtain student rankings has been a barrier to widespread use of peer evaluation. Here, we present a much simpler method. Simply counting the number of times a student has been nominated for items relevant to clinical competence, caring, and community service yields a ranking of students very similar to that obtained by factor analysis. This approach should make the use of peer evaluation data much more accessible.
There are barriers to the implementation of peer evaluation in addition to those addressed in this paper—that is, the lack of reliable, validated instruments and availability of a simple analysis system. For example, peer evaluation requires prior peer interaction to be reliable. Fortunately, the curricula at many medical schools seem to be moving away from many hours spent in lecture to more time spent in small-group learning. Another possible barrier is the context in which peer evaluation data are collected; however, medical students have been shown to be “willing to engage in peer assessment of professionalism provided the appropriate institutional support, anonymity, faculty oversight, timely assessment, counseling or commendation of peers, and protection for the student evaluator are present.”27 This may explain why the percentages of students participating in this study at the three institutions varied considerably (40%–99%). Florida students are exposed to peer evaluation throughout the curriculum. A variation of the peer nomination forms has been in use for decades at Florida, and resulting student rankings (top 10%–15%) are used for dean’s letters of recommendation. This inclusion has helped students be accepted by residencies that, on the basis of standard criteria such as class rank, may not have accepted them otherwise.10 More recently, peer nomination has been used as a selection tool for the Florida chapter of the Gold Humanism Honor Society. In contrast, this was the first experience for Tulane and Michigan students with such peer nomination forms. In addition to being part of this research study, Tulane students were told that results would be used in medical student performance evaluation (MSPE) letters and for selection of students to the Gold Humanism Honor Society, whereas at Michigan the results were used to select one student for one award.
The reluctance medical students may have to participate in peer assessment, for instance, because of concerns about confidentiality or how the data will be used, may be reduced in a supportive environment that is attentive to learner improvement and professionalism.24,27 As stated by Epstein,28 “Peer assessments depend on trust and require scrupulous attention to confidentiality.” Students’ interest in participating depends on the context of the peer assessment within the academic environment, and whether it covers issues about which they believe they are particularly perceptive.
In conclusion, we have described a six-item peer nomination form that can be used by medical schools to discern outstanding students in the areas of clinical competence, caring, and community service, as well as which medical students other students would want as a doctor for themselves or a loved one. At least the top 15% can be identified by a simple count of number of times they have been nominated by classmates. Such peer nomination results provide medical schools with a reliable tool to identify exemplars in these areas of medical competence and facilitate the selection of medical students for honors (e.g., Gold Humanism Honor Society). Inclusion of peer nomination results in MSPE letters may be especially important now that assessment of professional attributes relative to peers and compatibility with peers are requirements for the MSPE, and should help some students obtain their most desired residency positions. Widespread use of such peer evaluation and increased student awareness of its potential impact on residency selection may promote more cooperative student behavior and less unhealthy competition in medical schools.
The authors would like to thank their many colleagues for helpful discussions at meetings sponsored by the Arnold P. Gold Foundation. They also thank Drs. John Paling and Norma Wagoner for their professional competence and caring comments about the manuscript. This research project was supported by a grant entitled “Evaluation of Peer Rankings of Professional Competence by UF Medical Students” from the Arnold P. Gold Foundation, and by the College of Medicine Chapman Education Center.
1 Kubany A. Use of sociometric peer nominations in medical education research. J Appl Psychol. 1957;41:389–394.
2 Korman M, Stubblefield RL. Medical school evaluation and internship performance. J Med Educ. 1971;46:670–673.
3 Lurie SJ, Nofziger AC, Meldrum S, Mooney C, Epstein RM. Temporal and group-related trends in peer assessment amongst medical students. Med Educ. 2006;40:840–847.
4 Arnold L, Willoughby L, Calkins V, Gammon L, Eberhart G. Use of peer evaluation in the assessment of medical students. J Med Educ. 1981;56:35–42.
5 Schumacher CF. A factor-analytic study of various criteria of medical examinations. J Med Educ. 1964;39:192–196.
6 Risucci DA, Tortolani AJ, Ward RJ. Ratings of surgical residents by self, supervisors and peers. Surg Gynecol Obstet. 1989;169:519–526.
7 Calhoun JG, Woolliscroft JO, TenHaken JD, Wolf FM, Davis WK. Evaluating medical student clinical skill performance: relationships among self, peer and expert ratings. Eval Health Prof. 1988;11:201–212.
8 Morton JB, MacBeth WA. Correlations between staff, peer and self assessments of fourth-year students in surgery. Med Educ. 1977;11:167–170.
9 Sullivan ME, Hitchcock MA, Dunnington GL. Peer and self assessment during problem-based tutorials. Am J Surg. 1999;177:266–269.
10 Small PA Jr, Stevens CB, Duerson M. Issues in medical education: basic problems and potential solutions. Acad Med. 1993;68:S89–S98.
11 Verhulst SJ, Colliver JA, Paiva RE, Williams RG. A factor analysis study of performance of first-year residents. J Med Educ. 1986;61:132–134.
12 Davis JK, Inamdar S. Use of peer ratings in a pediatric residency. J Med Educ. 1988;63:647–649.
13 Linn BS, Arostegui M, Zeppa R. Performance rating scale for peer and self assessment. Br J Med Educ. 1975;9:98–101.
14 Frank HH, Katcher AH. The qualities of leadership: how male medical students evaluate their female peers. Hum Relat. 1977;30:403–416.
15 Kane JS, Lawler EE. Methods of peer assessment. Psychol Bull. 1978;85:555–586.
16 Brozgal JL. Evaluation of the Clinical Performance of Medical Students in an Experimental Program [dissertation]. Cleveland, Ohio: Case Western Reserve University; 1957.
17 Brush DR, Markert RJ, Lazarus CL. The relationship between service learning and medical student academic and professional outcomes. Teach Learn Med. 2006;18:9–13.
18 Ramsey PG, Wenrich MD, Carline JD, Inui TS, Larson EB, LoGerfo JP. Use of peer ratings to evaluate physician performance. JAMA. 1993;269:1655–1660.
19 Ramsey PG, Carline JD, Blank LL, Wenrich MD. Feasibility of hospital-based use of peer ratings to evaluate the performances of practicing physicians. Acad Med. 1996;71:364–370.
20 Thomas PA, Gebo KA, Hellmann DB. A pilot study of peer review in residency training. J Gen Intern Med. 1999;14:551–554.
21 Arnold L. Assessing professional behavior: yesterday, today, and tomorrow. Acad Med. 2002;77:502–515.
22 Norcini JJ. Peer assessment of competence. Med Educ. 2003;37:539–543.
23 Dannefer EF, Henson LC, Bierer SB, et al. Peer assessment of professional competence. Med Educ. 2005;39:713–722.
24 Arnold L, Stern D. Content and context of peer assessment. In: Stern DT, ed. Measuring Medical Professionalism. New York, NY: Oxford University Press; 2006:175–194.
25 Ginsburg S, Regehr G, Lingard L. Basing the evaluation of professionalism on observable behaviors: a cautionary tale. Acad Med. 2004;79(10 suppl):S1–S4.
26 Blue AV, Basco WT, Geesey ME, Thiedke CC, Sheridan MEB, Elam CL. How does pre-admission community service compare with community service during medical school? Teach Learn Med. 2005;17:316–321.
27 Shue CK, Arnold L, Stern DT. Maximizing participation in peer assessment of professionalism: the students speak. Acad Med. 2005;80(10 suppl):S1–S5.
28 Epstein RM. Assessment in medical education. N Engl J Med. 2007;356:387–396.
© 2007 Association of American Medical Colleges