Faculty Evaluation of Surgery Clerkship Students

Important Components of Written Comments


Section Editor(s): Hemmer, Paul MD


Correspondence: Margaret A. Plymale, MSN, RN, Department of Surgery, Education Office, University of Kentucky College of Medicine, Lexington, KY 40536.

Subjective evaluation forms are almost universally used in grading clerkship students' clinical performances. The forms usually consist of a list of clinical performance and personal characteristics on which faculty evaluate student performance using a numeric scale. In addition, space for written comments is usually available on the evaluation form for the faculty to elaborate on the assigned numeric grade and ratings of student performance. Based on faculty numeric ratings of performance, previous research has identified two performance characteristics that best differentiate an “A” level performance from a “B” level: oral presentations and fund of knowledge.1 It is not clear, however, how written comments by faculty are reflected in a student's clinical grade.

Previous research provides little information about the importance of specific written comments to the grade assigned by the preceptor. In one recent study of a medicine clerkship, the authors analyzed written evaluations and verbal comments to determine how well each method identified deficiencies in medical students' funds of knowledge.2 In another study, the authors compared written and verbal comments to detect students' deficiencies in professionalism.3 In both studies, the authors found that formal verbal evaluation sessions among faculty best identified the students' deficiencies in question. Although verbal feedback sessions may be most productive, they may not be practical because it is difficult to gather the faculty of a large department.

Elzubeir and Rizk surveyed students, interns, and residents in an attempt to identify the characteristics that those groups found most important in their role models.4 All respondents identified faculty and physician teachers as role models. The most important factors identified by the three groups were personality, teaching, and clinical skills.4 In the reverse relationship of teacher to student, however, it is less obvious which student characteristics teachers consider important, as expressed through their written comments.

This study is part of a larger effort to determine the characteristics of student performances that faculty consider most important in assigning grades. The specific purpose of this study was to determine the categories of written comments that differentiate students with higher preceptor grades from students with lower preceptor grades.


At our institution, the surgery portion of our combined medicine/surgery clerkship is eight weeks in duration, and consists of a four-week general surgery block and two two-week subspecialty blocks. A few students elect to remain on a single surgery subspecialty service for four weeks rather than complete two subspecialty blocks. At the end of each of the rotations, the faculty preceptor evaluates the student's performance by completing an evaluation form. These evaluations consist of an assigned numeric grade; ten ratings of specific performance characteristics, such as clinical reasoning skills and professionalism; and written comments. Cumulative faculty evaluations in this format account for 50% of the surgery clerkship course grade.

The grading scale used in our clerkship is the same as that used for the National Board of Medical Examiners (NBME) Surgery Shelf Examination, in the sense that a score of “66” on the NBME shelf examination is interpreted in the same way as a preceptor rating of “66” (both would be a low “B”). The scale is: “A” = 77–86; “B” = 66–76; “C” = 61–65; “U” = 56–60; and “E” = 20–55.
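The grade bands above can be expressed as a small lookup. The following is a minimal sketch for illustration only: the names `GRADE_BANDS` and `letter_grade` are hypothetical, and treating the bands as inclusive integer ranges is an assumption based on how the scale is written.

```python
# Letter-grade bands as stated in the text.
# Assumption: scores are integers and each range is inclusive.
GRADE_BANDS = [
    ("A", 77, 86),
    ("B", 66, 76),
    ("C", 61, 65),
    ("U", 56, 60),
    ("E", 20, 55),
]


def letter_grade(score: int) -> str:
    """Return the letter grade for a numeric preceptor rating or NBME score."""
    for letter, low, high in GRADE_BANDS:
        if low <= score <= high:
            return letter
    raise ValueError(f"score {score} is outside the 20-86 scale")


# A rating of 66 falls at the bottom of the "B" band.
print(letter_grade(66))
```

Because the same bands apply to both sources, one function can label a preceptor rating and an NBME shelf examination score alike.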

All of the written comments provided by faculty on student evaluations from August 2000 through November 2002 were transcribed. Written comments on each evaluation were masked by changing the student's name to “he” or “she.” Otherwise, no changes were made to the comments. Two of the authors (MP, MD) read the comments to “generate initial categories.”5 Based on this analysis, the comments could be classified into 22 categories. These categories are listed in Table 1.



After the categories were identified, two of the authors (MP, MD) then reviewed each comment and classified it into one of the 22 categories. A comment was coded “+1” if it was positive, and “−1” if it was negative. If the faculty member did not comment on the category for a student, it was coded as “0” (no comment). Both investigators reviewed the evaluation comments without being aware of the students' assigned preceptor grades or their scores on the NBME Surgery Shelf Examination.

For analytical purposes, we viewed these categories as unordered nominal scales. Inter-rater agreement was assessed by Cohen's kappa. The salience of each of the 22 comment categories to the faculty preceptor's grade was determined by one-way analyses of variance in which the comment category (−1, 0, +1) was used as the classification variable, and the preceptor grade was the dependent variable. Tukey's b post-hoc tests were used to determine the exact patterns of differences. The importance of the mean differences (effect sizes) of the three groups was determined by η².
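For one comment category, the analysis of variance and its effect size can be sketched as follows. This is an illustration under stated assumptions, not the authors' software or code: the function name `one_way_anova` is hypothetical, the grades are made up, η² is computed as the between-group sum of squares divided by the total sum of squares, and the Tukey's b post-hoc step is not shown.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, eta_squared) for a one-way ANOVA.

    `groups` maps a comment code (-1, 0, +1) to the list of preceptor
    grades on evaluations that received that code. Empty groups are
    ignored, e.g. a category in which no negative comment was made.
    """
    samples = [g for g in groups.values() if g]
    all_values = [x for g in samples for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
    # Variation of individual grades around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in samples for x in g)
    ss_total = ss_between + ss_within

    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / ss_total
    return f_stat, eta_squared


# Made-up grades for illustration only; these are not data from the study.
example = {-1: [58, 62, 60], 0: [76, 78, 80], 1: [78, 80, 82]}
f_stat, eta_sq = one_way_anova(example)
print(round(f_stat, 2), round(eta_sq, 3))
```

The sketch assumes at least two non-empty groups and some within-group variation; otherwise the F ratio is undefined.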


We analyzed 331 faculty evaluations, which represented 120 surgery clerkship students and covered a time period of four rotations. A total of 121 of the evaluations were completed by general surgery faculty (oncology, trauma, vascular, gastrointestinal, and VA general surgery). The remaining 210 evaluations were completed by subspecialty surgery faculty representing the following disciplines: otolaryngology, urology, neurosurgery, orthopedics, plastic surgery, pediatric surgery, cardiothoracic surgery, and transplant surgery. The 331 evaluations represent an average of 2.8 evaluations per student, which is the expected number given that most students complete three surgical rotations, although a small percentage rotate on only two surgical services. Of the 331 evaluations, 217 (66%) contained written comments.

Table 1 presents the kappa values for inter-rater agreement, ranked by level of agreement. Degree of agreement for kappa values can be interpreted according to the following parameters: values > .75 indicate excellent agreement beyond chance; values of .40–.75 indicate fair to good agreement beyond chance; and values < .40 indicate poor agreement beyond chance.6 Agreement between the two investigators (MP, MD) in classifying comments was excellent for 12 of the categories, good for eight categories, and poor for two categories. Overall, we judged the level of agreement to be satisfactory, and used the category ratings of one of the investigators (MP) in subsequent analyses.
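As an illustration of the agreement statistic, Cohen's kappa for two raters' codes in a single category can be computed as sketched below. The function names `cohens_kappa` and `agreement_label` and the sample codes are hypothetical, and the labels simply apply the bands quoted above; this is not the authors' code.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' nominal codes over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Label a kappa value using the bands quoted in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


# Made-up codes (-1, 0, +1) from two raters; not data from the study.
rater_1 = [1, 1, 0, 0, -1, 0, 1, 0]
rater_2 = [1, 1, 0, 0, -1, 0, 0, 0]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2), agreement_label(kappa))
```

In this made-up example the raters agree on seven of eight items, yet kappa is lower than the raw 88% agreement because much of that agreement would be expected by chance.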

Table 2 presents the mean faculty grade for each of the three rating level groups (−1, 0, +1) for each of the 22 comment categories. Also included in Table 2 are the p values from the analyses of variance indicating the significance of the differences among the mean grades by comment rating level (−1, 0, +1). There were significant differences among the three rating level groups on 13 of the 22 comment categories. In all but one case, there was no difference between the mean grades for students in the positive-comment and no-comment groups (Tukey's b post-hoc test). Logically enough, students who received negative comments in any of these 13 categories received lower preceptor grades than the others. It appears that the difference between receiving a positive comment and receiving no comment in a category is not directly related to obtaining a higher preceptor grade. The exception was in the category of general comments, such as “nice guy.” For this category there was no comment with a negative rating, and students who received a comment with a positive rating had a lower average preceptor grade than did students with no comment in this category.



Effect sizes are represented as η². The η² values can be interpreted using the following criteria: values of .01–.05 indicate a small effect, values of .06–.13 indicate a medium effect, and values of .14 and higher indicate a large effect.7 As can be seen in Table 2, the largest effect sizes were seen in the comments for the following four categories: (1) overall performance (p < .001, η² = .20); (2) clinical reasoning skills (p < .001, η² = .19); (3) prepared for and participates in patient care activities (p < .001, η² = .14); and (4) fund of knowledge (p < .001, η² = .14).
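The interpretation criteria can be applied mechanically, as in the sketch below. The function name `effect_size_label` is hypothetical. Two details are assumptions rather than anything stated in the text: values are rounded to two decimals before comparison, because the quoted bands leave gaps such as between .05 and .06, and values below .01 are labeled "negligible".

```python
def effect_size_label(eta_squared: float) -> str:
    """Label an eta-squared value using the bands quoted in the text."""
    # Assumption: compare at two-decimal precision, as the values are reported.
    value = round(eta_squared, 2)
    if value >= 0.14:
        return "large"
    if value >= 0.06:
        return "medium"
    if value >= 0.01:
        return "small"
    # Assumption: the text gives no label below .01.
    return "negligible"


# The four largest effect sizes reported in the text.
reported = {
    "overall performance": 0.20,
    "clinical reasoning skills": 0.19,
    "prepared for and participates in patient care activities": 0.14,
    "fund of knowledge": 0.14,
}
for category, eta_sq in reported.items():
    print(category, effect_size_label(eta_sq))
```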

For each of the four categories showing large effect sizes, the mean preceptor grades showed the most important differences for evaluations including a negative comment rating (−1). For example, the mean preceptor score for evaluations containing negatively rated overall performance written comments was 51.3, compared with a mean score of 78.2 for evaluations with no overall performance written comment and 79.9 for evaluations with a positively rated comment.

Discussion and Conclusions

In this study we attempted to identify characteristics of student performances in a surgery clerkship that faculty preceptors deemed important, based on the written comments of the faculty. We found that written comments in the categories of overall performance, clinical reasoning skills, preparation for and participation in patient care activities, and fund of knowledge had the strongest relationship to the assigned grades.

Previous research identified two performance characteristics that best differentiated “A”-level performance from “B”-level performance based on faculty numeric ratings of performance characteristics. These characteristics were oral presentations and fund of knowledge.1 In the current study, a large effect size was seen for fund of knowledge and a medium effect size was seen for written and/or oral skills based on the written comments, results which support the earlier findings.

As previously mentioned, the 331 surgery clerkship student evaluations that were used in the current study were provided by a diverse group of surgery faculty. An area for further study includes analysis of the comment data used in this study by surgery discipline to determine whether there are important differences found by discipline. It is possible that characteristics of student performance that are important to a subspecialty faculty preceptor may not have the same importance to a general surgeon preceptor. For this same reason, it would be important to analyze written comments from internal medicine clerkship faculty preceptors to determine whether there are differences among categories found to be important to assigned grades.

Written comments provided by surgery clerkship faculty in student evaluations previously had not been analyzed for the relationship of the comments to grades assigned. The findings of this study are being used in conjunction with other data to develop a more meaningful evaluation form for our clerkship students.

Some of the comment categories lacked discriminating power because no comment in those categories received a negative rating. This was true for professionalism, work ethic, and intelligence. Thus, while these are important categories of performance to assess, most students perform satisfactorily in these areas. Note that in all but one case, it was the negative-comment group that determined whether a comment category was discriminating or not. Thus, these categories are reasonable to include in an evaluation form even though it may be the rare student who will demonstrate deficits in these areas.

The importance of written comments on a clerkship student's evaluations from the student's viewpoint should not be overlooked. Because of ever-increasing clinical and administrative duties, faculty frequently lack adequate time to provide formative feedback to medical students during a busy surgical rotation, particularly when a student spends as little as two weeks on a surgical subspecialty service. Written comments provided by the faculty on end-of-rotation evaluations can be very important in helping students understand their performances for that rotation. However, it is important to understand the meaning, or lack of meaning, of particular comments to the student's clinical grade. Students may read comments that seem positive, such as “was a very organized, hard-working student,” and not understand why their grades do not directly reflect what appeared to be positive comments. While organization and work ethic are not to be discarded as important characteristics, factors such as fund of knowledge, clinical reasoning ability, and active preparation and participation in patient care activities appear to be given more weight by surgery faculty in assigning students' grades.


1. Pulito A, Donnelly MB, Plymale MA. Important factors in faculty evaluation of medical student performance. Paper presented at the 2001 Annual Meeting of the Association for Surgical Education, Nashville, TN, April 2001.
2. Hemmer PA, Pangaro L. The effectiveness of formal evaluation sessions during clinical clerkships in better identifying students with marginal funds of knowledge. Acad Med. 1997;72:641–3.
3. Hemmer PA, Hawkins R, Jackson JL, Pangaro LN. Assessing how well three evaluation methods detect deficiencies in medical students' professionalism in two settings of an internal medicine clerkship. Acad Med. 2000;75:167–73.
4. Elzubeir MA, Rizk DEE. Identifying characteristics that students, interns and residents look for in their role models. Med Educ. 2001;35:272–7.
5. Strauss A, Corbin J. Basics of Qualitative Research. 2nd ed. Thousand Oaks: Sage, 1998.
6. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York: John Wiley & Sons, 1981.
7. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.
© 2002 by the Association of American Medical Colleges