Arnold, Louise PhD; Shue, Carolyn K. PhD; Kalishman, Summers PhD; Prislin, Michael MD; Pohl, Charles MD; Pohl, Henry MD; Stern, David T. MD, PhD
The assessment of medical students’ professional behavior, reflecting their aspiration to and application of the principles of excellence, humanism, accountability, and altruism,1 continues to challenge medical educators.2 Measuring competency in professionalism is difficult, in part because no single perspective can be identified as the “gold standard.” Perceptions of instructors, peers, and patients all have relevance for an assessment of students’ professional behavior.3,4 However, peers are one of the most important sources for comprehensive assessment.4–6
Prior research on peer assessment reveals its potential as a measure of medical students’ professional behaviors.6–8 Past work shows that students have been able to distinguish between their peers’ professionalism, on the one hand, and their technical knowledge and skills, on the other.9 Other work indicates that peer assessments among medical students are reasonable measures of students’ interpersonal skills, ability to communicate, relationships with patients, and humanism.9–12 Students’ responses to rating scales have displayed internal consistency and interrater reliability.13,14 Furthermore, because of students’ frequent and close contact with each other in a variety of contexts not available to faculty,6 their observations of peers’ professionalism may be superior to those of their instructors in some respects.15 Students in two Midwestern schools have agreed with that contention,6 and indeed another study showed that students provided insight into medical students’ performances that was distinct from information derived from course grades and instructors’ evaluations.16
However, other research has raised issues about medical students’ peer assessments. Their psychometric properties can be problematic.6,13,14,17,18 Further, reactions to participating in peer assessment range from negative to indifferent to positive. Some medical students declined to participate in research on peer assessment9 or were wary of participation.19 Other students expressed concerns about whistle blowing in considering peer assessment as part of self-regulation of the profession20 and lacked confidence in the usefulness of peer assessment.10 They said they worry about the impact their assessments could have on their relationships with each other.21
Yet, other studies report student acceptance of peer assessment.8,22 Medical students have expressed interest in the process.9 They have said they would be willing to participate in peer assessment as long as their concerns were addressed.21 In fact, research has documented that students have provided feedback to peers,8,13,22,23 and in one study, a little more than half of the students found peer assessment helpful, and more than a third said they would include peer feedback in their learning plans.8 Identifying the characteristics of peer assessment that affect its acceptance among students and thereby its psychometric integrity seems warranted.
Moreover, the literature suggests that the institutional and social contexts surrounding the assessment of professionalism are critical to its success.3,24 This generalization particularly applies to peer assessment of professionalism.6 For example, one study found that the number of negative peer assessments generated in small groups depended on the kind of faculty leadership exercised in the group.13 Further grounds, then, exist for exploring the conditions that encourage or inhibit medical students’ participation in peer assessment of professionalism if its full potential is to be achieved among medical students. Accordingly, we embarked on just such a research program.
Three of us (L.A., C.K.S., and D.T.S.) first held focus groups in two schools to gather students’ thoughts about the characteristics of an ideal peer-assessment system.21 Students suggested that at least some features of peer assessment would depend on whether they observed a peer who had acted unprofessionally or professionally. We then used this and other results to develop a survey instrument that we pilot tested in two schools to systematically identify characteristics of a system for assessing peers’ professionalism. Students’ response rates in one of these schools were strong enough to permit analysis of the data. The conditions that students at that school identified as encouraging their participation in peer assessment25 were as follows: (1) if a peer acted unprofessionally, an impartial counselor should receive the student's assessment, and the peer should receive corrective instruction, but without penalty; and (2) if a peer acted professionally, she or he should receive the assessment directly, and it could be used to select members of a professionalism honor society. Further, in instances of both unprofessional and professional behavior, the process should be anonymous and should not disrupt the relationship between the student evaluator and the peer. Students’ preferences for system characteristics differed little by year.
Continuing this line of investigation, we (all authors of the present report) sought to examine whether the results obtained at one school, which has a six-year combined degree program, would generalize to students in more traditional four-year schools. Therefore, to determine which characteristics of a peer-assessment system would prevent or encourage honest assessment of peers’ professionalism, we posed the following research questions. Do students from different schools and year levels prefer different characteristics of a peer-assessment system? Or can a single system be designed that targets the majority of students’ preferences? What would be the characteristics of the resulting preferred system(s)?
The study sites for this research were four U.S. medical schools. Two are private schools in the northeast region of the United States, as classified by the Association of American Medical Colleges (AAMC). They have comparatively large student bodies, with enrollments of 524 and 925 at the time of the study. Neither of these schools uses peer assessment. The other two are public schools in the western region. Their student enrollments are comparatively smaller, with 297 and 369 students during the study period. Both of these schools make limited use of peer assessment. At one of those schools, tutorials in the early phases of the curriculum include peer assessment, but at the instructor's discretion. At the other school, first- and second-year Introduction to Clinical Medicine courses include formative peer review of students’ communication skills.
Students signed consent forms but did not identify themselves on the completed survey questionnaire. The questionnaire (available at http://www.umkc.edu/medicine/omer/projects/index.htm), administered either electronically or on paper, elicited the opinions of first-, second-, third-, and fourth-year students at the beginning of the academic year 2004–2005.
The questionnaire design and items were based on findings from focus groups21 and a pilot study.25 According to those findings, the type of peer behavior students assessed (i.e., unprofessional or professional behavior) could influence their preferences for characteristics of peer assessment. Thus, the questionnaire contained 23 questions that asked students to rate circumstances that would prevent or encourage their honest assessment of a classmate when the peer acted unprofessionally, and then to rate the same circumstances when the classmate acted professionally. The circumstances were who receives the peer assessment, the timing of the assessment, its anonymity, whether peer assessment should be optional or required, the setting of the assessment, and its implications for the assessed classmate and for the student who provides the assessment.
Another set of 12 questions addressed more general characteristics of a peer-assessment system that could affect students’ willingness to offer an honest evaluation of a peer, regardless of the type of peer behavior they assessed. These asked who does the assessment, the nature of the behavior to be reported, institutional support for peer assessment, and educational programs on professionalism.
We examined the face validity of the survey items and then studied their construct validity by principal-component-factor analysis with Varimax rotation using standard criteria for factor retention (eigenvalues of > 1.0, factor loadings of > .40, and a minimum of three items per factor). We selected a standard technique, Cronbach alpha, to see whether the items of each resulting factor were sufficiently intercorrelated to ensure consistent measurement of a common latent variable.26
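The reliability check described above can be illustrated with a minimal sketch of Cronbach alpha, computed as k/(k − 1) × (1 − sum of item variances / variance of summed scores). The function name `cronbach_alpha` and the ratings below are hypothetical and are not study data; the study itself ran its analyses in SPSS.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of respondents, each a list of item ratings."""
    k = len(rows[0])  # number of items loading on the factor
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 6 respondents x 3 items (illustrative only).
ratings = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 4, 5],
    [5, 5, 4],
]

alpha = cronbach_alpha(ratings)
print(f"Cronbach alpha = {alpha:.3f}")
```

Alpha approaches 1 as the items of a factor vary together, and factors with few items tend to yield lower values.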
Next, we analyzed the survey data by using descriptive and inferential statistics to detect potential differences among institutions and among students from each year level. We first explored the interaction effects among students from the four schools and four year-levels on their preferred characteristics of peer assessment. To do this, we used multivariate analysis of variance, treated school and year as independent variables, and treated as a dependent variable each questionnaire item that described a circumstance that could promote or inhibit honest peer assessment. In the absence of significant interaction effects, we studied the main effects of each student's school and year on the preferences, and we followed up, as needed, by using post hoc tests to ascertain which school or year contributed to the significant difference in students’ preferences for each item. Because of the large sample size in this study, we set the alpha level at ≤ .01; or, in cases when we made a large number of comparisons, we applied a Bonferroni adjustment. Also, we computed an effect size, where applicable, as a way to evaluate the practical significance of a statistically significant result to discern meaningful differences. According to Cohen,27 eta-squared values generally range from .01 to .09 in the social sciences. We set the criterion of eta-squared at ≥ .03 to detect small to large effect sizes while minimizing the possibility of spurious results. We used SPSS, version 13.0.1, to run all the statistical analyses.
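The two-part decision rule described above (an alpha criterion plus an effect size criterion) can be sketched as follows. This is an illustration under stated assumptions, not the SPSS procedure used in the study. The classical one-way eta-squared (between-groups sum of squares divided by total sum of squares) stands in for whatever effect size SPSS reported. The function names, the number of comparisons, and the ratings are all hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha after a Bonferroni adjustment."""
    return alpha / n_comparisons


def eta_squared(groups):
    """Classical one-way eta-squared: SS_between / SS_total."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


def is_meaningful(p_value, eta_sq, alpha=0.01, min_eta_sq=0.03):
    """True only if both the alpha and the effect size criteria are met."""
    return p_value <= alpha and eta_sq >= min_eta_sq


# Hypothetical ratings of one item by four year levels (not study data).
by_year = [[4, 5, 4, 5], [4, 4, 3, 5], [3, 4, 4, 3], [3, 3, 4, 3]]
effect = eta_squared(by_year)
print(f"eta-squared = {effect:.3f}")

# Hypothetical: 23 comparisons, one per questionnaire item.
print(f"adjusted alpha = {bonferroni_alpha(0.01, 23):.5f}")
```

Requiring both criteria means that, with a sample this large, a difference that is statistically significant but very small is not treated as meaningful.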
All portions of this research were reviewed and approved by all participating schools’ institutional review boards before student recruitment and data collection.
A total of 1,661 of the 2,115 students surveyed (overall response rate = 78%) returned completed questionnaires. The individual school response rates ranged from 69% to 87%, and the year response rates ranged from 58% to 93%. These response rates suggest that a representative sample of students from each school and from each year level participated in the study.
The questionnaire items have face validity, because they were generated from focus group discussions,21 reviewed by focus group participants, and altered accordingly. The items also have a measure of construct validity, as indicated by principal-component-factor analysis, and they tended to load on factors that reflected the categorization built into the questionnaire itself. The six factors obtained from the analysis of students’ preferences for characteristics of peer assessment when assessing unprofessional peer behavior explained 46% of the variance. They were who receives the assessment (Cronbach alpha = .812), institutional support (alpha = .795), whether the assessment should be optional or required plus its setting (alpha = .779), implications for the classmate (alpha = .587), other (alpha = .503), and no anonymity/anonymity (alpha = .414).
The four factors obtained from the analysis of students’ preferences for characteristics of peer assessment when assessing professional peer behavior explained 45% of the variance. They were who receives the assessment (Cronbach alpha = .914), optional/required plus setting (alpha = .840), institutional support (alpha = .795), and implications for the classmate (alpha = .797).
Preferred characteristics for peer assessment of various behaviors
Unprofessional behavior: preferred characteristics, by school and year.
We used multivariate analysis of variance to test for an interaction effect between the independent variables of school and year on students’ preferences for characteristics of peer assessment when they assessed peers’ unprofessional behavior. Although the interaction effect met the alpha criterion for significance (described in the Method section), the effect sizes for the items that contributed to the overall significant interaction effect did not meet the effect size criterion.
The main effect of school on students’ preferences for characteristics of a peer-assessment system when assessing unprofessional peer behavior also met the alpha criterion for significance. However, the effect size criterion was not met. Mean ratings of peer assessment characteristics across schools appear in Table 1 (second column).
The main effect of year on students’ preferences for characteristics of a peer-assessment system when assessing unprofessional peer behavior was significant (F = 2.236; df = 2, 1,567; P < .001). Only 3 of the 23 items contributed to the significant main effect: (1) the course instructor, attending, or course director should receive the evaluation (eta-squared = .03), (2) peer evaluation was a part of all classes (eta-squared = .03), and (3) peer evaluation was only a part of classes with small-group work (eta-squared = .05). Table 1, columns three through six, presents the year means for each peer-assessment system characteristic and details regarding the post hoc Tukey test.
Professional behavior: preferred characteristics, by school and year.
The multivariate analysis of variance to determine whether there was an interaction effect between the variables of school and year on students’ preferences for characteristics of peer assessment of professional behavior did not reveal a significant interaction effect.
The main effect of school on students’ preferences was not significant. Mean ratings of characteristics across schools appear in Table 2 (second column).
The main effect of year on students’ preferences for characteristics of peer assessment was significant (F = 2.538; df = 2, 1,559; P < .001). As with the assessment of unprofessional behaviors, very few items contributed to the significant main effect of year. The four contributing items were (1) who would receive the peer assessment (a faculty group [eta-squared = .03]), (2) when the peer assessment was done (at the end of year [eta-squared = .03]), and in which settings it would occur: (3) small groups (eta-squared = .03) and (4) outside of classes/rotations (eta-squared = .03). Year means for each peer-assessment-system characteristic and details regarding the post hoc Tukey test are presented in Table 2, columns three through six.
Other preferred characteristics, by school and year.
A multivariate analysis of variance did not reveal a significant interaction effect between school and year on students’ preferences for other characteristics of peer assessment, that is, those that apply regardless of whether students were assessing unprofessional or professional behavior of peers. The main effects of both school and year met the alpha criterion for significance; however, neither met the effect size criterion. Mean ratings for each of the categories are reported in Table 3.
A single system for peer assessment
Overall, there were no interaction effects between school and year on students’ preferences for characteristics of peer assessment, according to the criteria we had established. By the same criteria, the main effect of school on students’ preferences for peer-assessment characteristics was not significant. The main effect of students’ year level on their preferences was significant for the assessment of both unprofessional and professional behavior of peers. Generally, the differences in students’ preferences by year were differences of magnitude, with first-year students holding stronger preferences for a particular characteristic than did older students; for the vast majority of peer-assessment characteristics, there were no year-level differences. Examination of the bolded means in Tables 1 and 2 indicates that the year differences found do not show conflicting views about the most desired and least desired system characteristics. In light of these results, we turned to the question of building a system for peer assessment of students’ professionalism that would address student preferences and that could be used by students across schools and year levels.
A model system
As Table 4 shows, a model for a peer-assessment system built on student preferences drawn from our pooled data from the four schools and four year-levels would contain the following characteristics. Regardless of whether students were asked to assess unprofessional or professional peer behavior,
* peer assessment would clearly be 100% anonymous and would be designed so that the relationship between the student evaluator and the classmate would not be disrupted;
* it would be a part of all clinical rotations and occur immediately after the classmate behaved either unprofessionally or professionally (Table 4);
* students would be asked to assess both positive and negative professional behaviors of peers who were at the same year level (Table 3); and
* each student would receive an annual summary of strengths and weaknesses identified by peers (Table 3).
At the same time, guidance from the students regarding whether peer assessment should be required or optional is less straightforward. Even though students prefer it to be required (Table 4), the mean rating falls toward the middle of the rating scale, and students at different year levels did not consistently favor required peer assessment over optional peer assessment (Tables 1 and 2).
In addition, the model system needs to embrace differing characteristics related to who receives the peer assessment and the consequences it holds for classmates, depending on whether the assessment task focuses on unprofessional or professional peer behavior (Table 4). An impartial counselor or student advocate should be appointed to receive peer assessments of unprofessional behavior. In contrast, arrangements should be made so that the classmate herself or himself receives the peer assessment of professional behavior. In the same way, implications of peer assessment for the classmate should be tailored to whether the classmate has behaved unprofessionally or professionally (Table 4). In the former instance, the classmate should receive instruction to correct the negative behavior that students report; in the latter case, peer assessments should be used to select members for a professionalism honor society. These distinctions suggest, in short, that the system should have a formative effect when students report unprofessional peer behavior and a summative effect when students report professional peer behavior.
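The distinction drawn above between handling reports of unprofessional and professional behavior can be summarized as a small lookup table. This is only an illustrative sketch of the preferences described in the text; the class, field, and function names are hypothetical and do not describe any implemented system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handling:
    recipient: str    # who receives the peer assessment
    consequence: str  # what the assessment leads to for the classmate
    use: str          # "formative" or "summative"


# Preferences as summarized in the text; the labels are illustrative.
HANDLING = {
    "unprofessional": Handling(
        recipient="impartial counselor or student advocate",
        consequence="instruction to correct the reported behavior",
        use="formative",
    ),
    "professional": Handling(
        recipient="the classmate",
        consequence="input to selection for a professionalism honor society",
        use="summative",
    ),
}


def route_assessment(behavior_type: str) -> Handling:
    """Return the preferred handling for a report of the given behavior type."""
    if behavior_type not in HANDLING:
        raise ValueError(f"unknown behavior type: {behavior_type!r}")
    return HANDLING[behavior_type]


for kind in ("unprofessional", "professional"):
    h = route_assessment(kind)
    print(f"{kind}: goes to {h.recipient}; {h.use} use")
```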
Moreover, according to the students, the system of peer assessment would be set into an institutional context that offered the following kinds of support (Table 3):
* Faculty would clearly define what expected professional behavior is.
* Students would receive training in how to give and receive peer feedback.
* Faculty and residents would evaluate their peers and thereby offer role modeling of peer assessment.
* Importantly, the school would take appropriate action regarding the behavior of students in light of information it received about the behavior from peers.
In addition, students need to believe that the environment for peer assessment will be safe, as indicated by their preferences for anonymity and implications for the student evaluator (Table 4).
Tables 3 and 4 also offer insight into characteristics of peer assessment that students least preferred. When assessing either unprofessional or professional behavior of peers, students by and large thought that the peer assessment should not occur at the end of the year, that the name of the evaluator should not be on the assessment the classmate received, that the assessment should not include behavior outside of classes and rotations, that negative behavior should not be the sole focus of the assessment, and that more advanced students should not assess younger students.
Our findings delineate characteristics of a peer-assessment system that students across the schools and years we studied would find acceptable. They echo some aspects of practical experience with successful peer-assessment programs for medical students reported in the literature. For many years, Small and McCormack at the University of Florida College of Medicine have been using an anonymous peer-nomination system to identify both clinical and professional competence of peers.7 An example of the formative use of peer assessment occurs in that institution's microbiology course, where students in small groups are asked to provide anonymous complimentary and critical feedback to peers in a system that is purely formative—and never seen by faculty. In the fourth year of medical school, an anonymous peer-nomination survey is used in a summative fashion to identify students for a humanism honor society. According to an e-mail communication from McCormack on June 22, 2006, evidence of the generalizability to other schools of this approach to peer assessment is forthcoming, because more medical schools are using this instrument to identify students for a similar group.
Students at the University of Missouri–Kansas City School of Medicine have used peer evaluation in a longitudinal internal medicine clerkship13 and an obstetrics–gynecology clerkship for over 30 years. Although the peer assessments were initially anonymous, the school, with input from students themselves, moved to having students sign their assessments to avoid capricious or malicious comments. The loss of anonymity of the peer reports brought other difficulties, including disruption of relations between peers, negotiations among peers to change a negative assessment, and student teams agreeing that they would rate each other positively. These developments show that a lack of anonymity in peer assessment can ultimately change the quality and characteristics of the data collected.
Dannefer et al8 studied a protocol for peer assessment of work habits and interpersonal habits, including respect and trustworthiness. Students completed these evaluations at the end of the second year and were assured that they were both anonymous and only formative. These investigators found that this instrument was psychometrically sound and produced results that about half of the students found valuable. The anonymous and formative nature of the protocol suggests this process might be acceptable to students in our sample.
That our present study is largely consistent with results of our pilot study in one school25 provides preliminary evidence for the generalizability of the results presented in this report. We argue that the context in which peer assessment occurs can significantly influence students’ willingness to participate honestly in the assessment process. The students themselves, across the schools, endorsed the importance of contextual factors—such as institutional support and appropriate action in light of peers’ assessments—by awarding some of their highest ratings to questionnaire items about context.
Although we expected to find differences among schools regarding students’ preferences for characteristics of peer assessment that could be related to contextual differences, it seems that student preferences across schools were relatively uniform. These findings, then, support the contention that a single system for all schools may be appropriate. Perhaps this is because peer assessment of professionalism is still not part of the mainstream institutional culture. In some of the schools we studied, there is no peer assessment; in others, it occurs in a few clerkships or courses, or as a one-time event for a specific purpose. These uses notwithstanding, peer assessment is not yet a key component of the students’ longitudinal assessment processes or professional development. Further, this study may have tapped into societal attitudes regarding peer assessment and whistle blowing, which would account for the relative lack of school and year differences. As more institutions focus on professionalism and peer assessment and make it a principal component of the educational process as well as a visible part of the medical school culture, more differences among institutions may emerge. Future research should study cultural changes within medical schools that adopt peer assessment of professionalism to determine whether students’ preferences for system characteristics change.
Further work on understanding several anomalies in the results is also warranted. In particular, we do not understand the dynamics underlying students’ preferences that negative peer assessments should be used formatively whereas positive peer assessments should be used summatively. Are these preferences consonant with societal attitudes toward peer assessment, and/or faculty members’ tendency to ignore unprofessional behavior and reward professional behavior, and/or the medical profession's stance28 to use self-regulation to protect its members? Similarly, the social function of students’ strong preference that peer assessment of not only unprofessional but also professional behavior should not disrupt their relationships with each other could usefully be explored. Additional work could well be directed toward refining the survey instrument itself for use by other schools that might want to establish their students’ preferences for peer assessment. A generalizability study of the sources of variance in the data elicited by the survey, including type of students, students’ training year, type of school, and questionnaire items, could be instructive. It could offer a more elegant way of ascertaining reliability of the instrument. It could potentially remedy an issue of the current approach, specifically, low reliabilities of items for 3 of the 10 factors, attributable in all likelihood to the small numbers of items constituting those factors. However, the current approach did yield high reliabilities of items that loaded on the other seven factors.
The results of this study must be interpreted within the constraints of the following limitations. In this work, the characteristics of a system encouraging honest assessment of peers’ behavior are statistically derived and should be verified by students themselves. As with all survey research, the responses students gave may reflect their notions of social desirability or their need to criticize the prevailing circumstances surrounding professional behavior and its assessment. A next step to address that limitation could entail a revised peer-assessment system incorporating student recommendations derived from this work, along with an examination of students’ attitudes and behaviors before and after the new system was implemented. Finally, the data are cross-sectional; a longitudinal study of the development of overall views toward peer assessment and preferences for the characteristics of peer assessment would be instructive.
In summary, we found that regardless of medical school and year, most students would be accepting of a system for peer assessment of professionalism if schools can do the following:
* Create an environment conducive to peer assessment of professionalism:
1. Engage faculty in learning to define and report professional behavior.
2. Help students learn to provide peer feedback.
3. Create policies and procedures by which schools can promptly and appropriately respond to observations of unprofessional and professional behavior of students. Stress the formative use of peer assessment while holding students responsible for repetitive and/or serious unprofessional behavior and acknowledging and rewarding professional behavior.
4. Create a system of peer assessment that includes all levels of trainees and physicians.
* Craft a peer-assessment system that meets students’ needs:
1. Create a system that is 100% anonymous.
2. Provide immediate feedback.
3. Assess in all clinical rotations.
4. Allow for reporting of both unprofessional and professional behavior.
5. Provide an annual summary report for each student.
6. Link peer assessment in medical school with the types of peer assessment that students will meet later in their careers as physicians in training and practice.
With these elements in place, we believe a system of peer assessment of professionalism to be possible. We encourage those schools interested in including peer assessment as part of an overall protocol to base their system on these principles and to follow student acceptability of their assessments closely. Whether peer assessment should be required or optional remains unresolved; it is an issue that deserves careful study when a school goes about instituting peer assessment among medical students.
This project was funded in part by a National Board of Medical Examiners (NBME) Edward J. Stemmler, MD, Medical Education Research fund grant, NBME project number IG35-0304 “Towards Assessing Professional Behaviors of Medical Students through Peer Observation: A Multiinstitutional Study.”
The project does not necessarily reflect NBME policy, and NBME support provides no official endorsement.
1 Arnold L, Stern DT. What is medical professionalism? In: Stern DT, ed. Measuring Medical Professionalism. Oxford, England: Oxford University Press; 2006:15–37.
2 Arnold L. Responding to the professionalism of learners and faculty in orthopedic surgery. Clin Orthop Relat Res. 2006;448:205–214.
3 Ginsburg S, Regehr G, Hatala R, et al. Context, conflict, and resolution: a new conceptual framework for evaluating professionalism. Acad Med. 2000;75(10 suppl):S6–S11.
4 Arnold L. Assessing professional behavior: yesterday, today, and tomorrow. Acad Med. 2002;77:502–515.
5 Embedding Professionalism in Medical Education: Assessment as a Tool for Implementation. Report from: Invitational Conference, Association of American Medical Colleges and the National Board of Medical Examiners; 2002; Baltimore, Md.
6 Arnold L, Stern DT. Content and context of peer assessment. In: Stern DT, ed. Measuring Medical Professionalism. Oxford, England: Oxford University Press; 2006:175–194.
7 Small PA Jr, Stevens CB, Duerson MC. Issues in medical education: basic problems and potential solutions. Acad Med. 1993;68(10 suppl):S89–S93.
8 Dannefer EF, Henson LC, Bierer SB, et al. Peer assessment of professional competence. Med Educ. 2005;39:713–722.
9 Linn BS, Arostegui M, Zeppa R. Performance rating scale for peer and self assessment. Br J Med Educ. 1975;9:98–101.
10 Helfer RE. Peer evaluation: its potential usefulness in medical education. Br J Med Educ. 1972;6:224–231.
11 Schumacher CFJ. A factor-analytic study of various criteria of medical examinations. J Med Educ. 1964;39:192–196.
13 Arnold L, Willoughby L, Calkins V, Gammon L, Eberhart G. Use of peer evaluation in the assessment of medical students. J Med Educ. 1981;56:35–41.
14 Panszi S, Gruppen L, Grum C, Stern DT. What do peers know about professionalism? Presented at: Research in Medical Education Conference, Group on Educational Affairs, Association of American Medical Colleges Annual Meeting; November 1, 2000; Chicago, Ill.
15 Hundert EM, Douglas-Steele D, Bickel J. Context in medical education: the informal ethics curriculum. Med Educ. 1996;30:353–364.
16 Kubany AJ. Use of sociometric peer nominations in medical education research. J Appl Psychol. 1957;41:389–394.
17 Sullivan ME, Hitchcock MA, Dunnington GL. Peer and self assessment during problem-based tutorials. Am J Surg. 1999;177:266–269.
18 Morton JB, Macbeth WA. Correlations between staff, peer, and self assessments of fourth-year students in surgery. Med Educ. 1977;11:167–170.
19 Liske R, Ort RS, Ford AG. Clinical performance and related traits. J Med Educ. 1964;39:69–80.
20 Rennie SC, Crosby JR. Students’ perceptions of whistle-blowing: implications for self-regulation. A questionnaire and focus group survey. Med Educ. 2002;36:173–179.
21 Arnold L, Shue CK, Kritt B, Ginsburg S, Stern DT. Medical students’ views on peer assessment of professionalism. J Gen Intern Med. 2005;20:819–824.
22 Cuddy P, Oki J, Wooten J. Online peer evaluation in basic pharmacology. Acad Med. 2001;76:532–533.
23 Rudy DW, Fejfar MC, Griffith CH III, Wilson JF. Self- and peer assessment in a first-year communication and interviewing course. Eval Health Prof. 2001;24:436–445.
24 Regehr G. The persistent myth of stability: on the chronic underestimation of the role of context in behavior. J Gen Intern Med. 2006;21:544.
25 Shue CK, Arnold L, Stern DT. Maximizing participation in peer assessment of professionalism: the students speak. Acad Med. 2005;80(10 suppl):S1–S5.
27 Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Mahwah, NJ: Lawrence Erlbaum; 1988.
28 Freidson E. Profession of Medicine: A Study of the Sociology of Applied Knowledge. New York, NY: Dodd, Mead, and Co.; 1970.