GINSBURG, SHIPHRA; REGEHR, GLENN; HATALA, ROSE; MCNAUGHTON, NANCY; FROHNA, ALICE; HODGES, BRIAN; LINGARD, LORELEI; STERN, DAVID
During medical school, students are taught the knowledge, skills, and attitudes required to become competent physicians. Knowledge and skills are rigorously evaluated by written and oral exams, standardized patient scenarios, and ward evaluations. However, evaluation of behaviors, including professionalism, is often implicit, unsystematic and, therefore, inadequate. This is problematic for several reasons. First, medical schools are doing a disservice to future postgraduate training programs, as well as to society, by not explicitly and accurately evaluating this area during medical school. It is recognized that more complaints against physicians to medical societies relate to unprofessional conduct than to lack of knowledge or poor technical skills.1 Yet students who display unprofessional behavior may not be identified in the current system, and will be promoted academically on the basis of adequate performance on tests of knowledge and skills alone.2,3
Second, we are doing a disservice to our students by not providing explicit feedback in this domain, thereby missing valuable opportunities to bring about awareness and improvement. The American Board of Internal Medicine, in its report “Project Professionalism,” discussed the problem of erosion of professionalism during medical training. While knowledge and skills improve markedly over the four years of medical school, there is ample anecdotal evidence, and substantial quantitative evidence, that professional behaviors can diminish over this period.4,5,6 There appears to be an unrealistic expectation that students will arrive at medical school lacking in knowledge and skills, but with a full complement of appropriate behaviors that require no further attention. However, all students are vulnerable to lapses in professional behavior and can benefit from explicit, systematic attention in this domain. The focus of medical education in the past century was on knowledge and skills. For the future of medicine, attention to the teaching and evaluation of professionalism is vital.
While this need to evaluate professionalism effectively has been recognized for some time, traditional methods of addressing the problem have not been particularly successful, for several reasons. The traditional approach to this issue has involved the identification and definition of the attitudes and concepts that make up professionalism (such as altruism, accountability, excellence, duty, honor, integrity, and respect). Evaluation methods that rely on such abstract and idealized definitions lead us to discuss people, rather than their behaviors, as being honest or dishonest, professional or unprofessional. This implies that professionalism represents a set of stable traits.
Interestingly, a large literature exists that suggests the opposite. Many studies in personality psychology have shown that the presence of specific personality traits does not predict behavior.7,8 For example, in one study of psychiatry residents, Minnesota Multiphasic Personality Inventory testing revealed serious personality disorders in the two individuals who eventually lost their licenses for professional misconduct.7 However, several other participants showed the same personality traits, yet had no difficulty reported in 15 years of follow-up. Thus, evidence suggests that the identification of specific traits does not allow us to predict an individual's behavior.
There are several reasons why this issue is important when discussing the evaluation of professionalism. Stable trait measures do not take into account that the behaviors enacted often involve an effort to resolve a conflict between two (or more) equally worthy professional or personal values. For example, it is easy to say that one must always tell the truth, and that one must always protect patient confidentiality. However, these values may occasionally come into conflict, and the ultimate choice the student makes will depend on the specifics of the situation.9,10
In addition, professional behaviors are known to be highly context-dependent.10,11 One can imagine a basically honest person lying to a patient given a particular context. This does not automatically mean that that person is dishonest, and therefore unprofessional. Certainly in social situations, a decision to always tell the full truth would be considered highly inappropriate.
Although the issues of conflict and context are separate at a theoretical level, in day-to-day practice they are likely to interact. One study has shown that 87% of physicians surveyed indicated that deception is acceptable on rare occasions, for example, if the patient would be harmed by knowing the truth, in order to circumvent “ridiculous rules,” or to protect confidentiality.12 Yet, when two specific professional values are in conflict, it is not always predictable which of the two values will take precedence. For example, while it is sometimes appropriate to lie in order to protect patient confidentiality, there are circumstances in which it would be considered more appropriate to break confidentiality rather than tell a lie. As one participant stated, honesty is “usually” the best policy, but everything is taken on a case-by-case basis, and any actions taken depend on the specifics of the people and the situation.12 Traditional ways of evaluating professionalism do not make allowances for these gray areas.
Another element of evaluating professionalism involves the process of resolving the conflict. The ultimate choice an individual makes, manifested as the behavior witnessed, does not tell us how he or she arrived at the decision. We know nothing of whether the student recognized the professional “values” that were in conflict, or why the student chose to act in that particular way. So while focusing on behaviors rather than personality or character traits is important, we must also attempt to understand the process that led to the behavior.
Thus, if we do not include conflict, context, and the process of resolution in our evaluation methods, we might not be able to conduct the most reliable, valid, and appropriate evaluation of these behaviors.
Another reason for the lack of success of traditional approaches is that evaluators have not been willing to identify an individual as unprofessional for actions that appear to be relatively minor. Thus, lapses in professional behavior tend to be ignored or suppressed, due to an understandable reluctance to apply the broad, harsh label of “unprofessional.”13 In one study, clinician supervisors admitted and demonstrated their reluctance to give negative feedback regarding unprofessional behavior, even though in interviews they had stated strongly that they would do so.14 Even if faculty have this willingness, they have been found to have “difficulty in identifying problems, an inability to verify problems, and fear of litigation” that inhibit their reporting of behavioral problems.2
This outcome arises, in part, from the fact that educators and researchers have traditionally focused on this problem from an abstract perspective. The definitions and subcategories of the broader concept of professionalism describe the idealized person, the “consummate professional,” with no room for mistakes. With this theoretical basis, if someone tells a lie, even for a “good” reason, he or she could be suddenly labeled “dishonest,” and therefore, “unprofessional.” The only thing left for the evaluator to decide, then, is how unprofessional the individual is. This top-down focus on professionalism as an abstraction rather than a bottom-up focus on professionalism as a set of actions in context, therefore, is flawed.
This paper elaborates on the issues around this problem. First, we review the literature on the types of evaluation instruments used for measuring professionalism in medical education. We then outline fundamental conceptual deficiencies that exist in this literature. We argue that the three most important missing components are: consideration of the contexts in which unprofessional behaviors occur, the conflicts that lead to these lapses, and the reasons students make the choices they make. We then propose strategies for resolving these issues.
We conducted searches through Medline, PsycLIT, and ERIC for literature published over the past 20 years. We included studies that contained original research on the topic of assessment or evaluation of professionalism in medical education, or included instruments to measure professional behavior, professionalism, humanism, behaviors, values, and attitudes. After initial articles were identified, bibliographies were used to identify additional references, and experts in the field were consulted for missing but relevant papers. This process uncovered few studies addressing specific efforts to evaluate professionalism. There was an abundance of articles calling for new and better methods of evaluation, and arguments for why this is so important and neglected. Some papers dealt with certain aspects of professionalism, for example, ethics, communication skills, interpersonal skills, and humanistic behavior, but they did so without extrapolation to the larger notion of professionalism. These studies were included if they highlighted difficulties in evaluating professionalism or provided new insights or solutions, and contained original research.
Evaluations by Faculty Supervisors. In 1979, the AAMC interviewed approximately 500 clerkship directors about “problem students.” They identified 21 types of problem students, and then asked how often each type of problem was seen, and how difficult the problem was. Among the results from the University of Washington School of Medicine, researchers found that “noncognitive” issues (e.g., bright but poor interpersonal skills) were “frequent and difficult,” but that the very disturbing ones (e.g., cannot be trusted, manipulative) were seen only infrequently.15 Though this survey was done many years ago, it provides an early glimpse of faculty's concerns about the professional behaviors of students. Since then, various other studies have analyzed approaches used by faculty in the evaluation of professionalism, including global rating scales, in-training evaluations, and encounter cards.
Ward rating forms, completed by the physician-supervisor, are the most commonly used instruments. In addition to assessing medical knowledge and clinical skills, many of these forms have a single global item to assess professional behavior, which may be subject to extensive rater bias.16,17 A study by Woolliscroft et al. highlights some of the problems of using this type of assessment. The authors found that, using a questionnaire, faculty could assess the humanistic qualities of internal medicine residents, at least for the item “doctor-patient relationships.”18 However, it would take 20–50 faculty members per resident to achieve acceptable reproducibility, which calls into question the utility of this instrument. This also suggests that the trait “doctor-patient relationships” is probably not stable, but rather may be subject to context bias. Different evaluators might see different behaviors or make different interpretations. In a related study, Johnson found that physicians' and nurses' evaluations of intensive care unit residents correlated highly with respect to all criteria except the assessment of humanistic qualities, further highlighting the importance of context.19
To compensate for the problem of infrequent observations, systems have been developed that encourage the repeated observation and documentation of the performances of medical trainees (often on a daily or weekly basis).20,21 This allows for the assessment of knowledge, skills, or professional behaviors with reasonable interrater reliability and construct validity. Such real-time evaluations permit early intervention, facilitate feedback, and guide remediation. However, in a study of encounter cards in the evaluation of anesthesia residents, despite numerous negative comments by supervisors, only 1% of the comments were found to be about unprofessional behaviors.22 Further, those residents who received these negative comments were only rarely rated overall as “performing below level” by their supervisors, despite their all having had critical incident reports and scoring lower on objective testing. This, again, highlights the difficulties faculty have in documenting unprofessional behavior.
Faculty can, in fact, be trained to accurately observe and assess specific behaviors. One group developed a reliable assessment of a very specific set of humanistic skills (e.g., introduced self to the patient, acknowledged the agenda from the last visit) by asking faculty to view videotapes of residents' interactions with patients.23 However, even if faculty can identify problematic behavior in a reliable way, they are often reluctant to record it. Burack, using a rigorous qualitative method, demonstrated that faculty have a marked reluctance to respond unambiguously to behaviors that indicate negative attitudes towards patients.14 In interviews, faculty stated that they would not tolerate “this sort of behavior” and would “definitely lay down the law” if such behavior were observed. However, in practice they usually did not respond at all, or did so in such a way as to require interpretation by the learner. The feedback can then be misinterpreted to be permissive. As explanations for this dichotomy, clinicians reported their sympathy for the learners' stress, as well as the possible penalties educators can face for giving negative feedback, such as receiving bad teaching evaluations and being open to personal and legal risks. They felt that if the observed behavior is only a lapse, and the learner is fundamentally “good,” corrective feedback might discourage or frustrate the resident. Conversely, for fundamentally “bad” residents, corrective feedback is seen as futile.
Therefore, methods that exist for faculty evaluation of professional behavior are problematic. Evaluations cannot be kept at theoretical, abstract, or definitional levels, because scales built that way have poor reliability. Numerous observations in various contexts need to be made, but attending physicians are present for only a small proportion of the time. In addition, even when lapses in professional behavior are identified, there is great reluctance to report them.14
Nurses and Patients. Some of the reluctance faculty have in evaluating professional behavior results from potential conflict in their roles as teacher, mentor, and evaluator. Other groups, such as patients24,25 or nurses,18,26,27 may not be subject to these conflicts. In addition, these other groups may see the students and residents more often and in different contexts. Woolliscroft's study included groups of nurses and patients; unfortunately, the patients' ratings were not reliable, and it would have required up to 50 patients' assessments to achieve a reproducible estimate of professional behavior.18 Nurses achieved good reproducibility with ten to 20 assessments per resident, but this amount may still be impractical. Because professional behavior is so context-specific, it is not surprising that only low to modest correlations exist between ratings by these different assessors. Also, nurses and patients may face different kinds of pressures that could deter their unbiased reporting of unprofessional behaviors; for example, a patient may be reluctant to jeopardize the continuity of a relationship with a physician even though it is problematic. In addition to highlighting some of the difficulties in evaluating professional behavior, Woolliscroft et al.'s study provides a good example of an attempt to triangulate results as a measure of validity.
Peer Evaluation. Peers are in a good position to evaluate each other's professional behaviors because of frequent, close, and varied contact. Thus, the use of peer assessment of professional behaviors may solve many of the problems described for faculty's assessment. However, several problems remain and some new problems may arise through the use of peer assessment.
On a positive note, there is some suggestion that medical students' peer evaluations may be the best measures of interpersonal skills available.28,29,30 Thomas et al. reported a pilot study of peer review in residency training using a ten-item questionnaire.31 The items on the form clustered into two domains: “technical skills” and “interpersonal skills,” which included humanistic behaviors. Of particular interest is this study's finding that intern peer evaluations of a composite “professionalism” domain correlated well with faculty evaluations of the same dimension (r = .57, p < .05). An interesting modification of a ranking system that forces students to discriminate among their peers based on certain dimensions of professionalism has been described.32 The authors suggest that such a system enables identification of the top 10–15% of the class, but it is not helpful in discriminating among the rest, perhaps because the students were asked for only positive nominations on the peer-evaluation form.
On the other hand, peers, like faculty, seem to have a difficult time discriminating the abstract dimensions of professionalism from each other and from other skills. For example, in a study of peer assessment of professional dimensions, Arnold found very high internal consistency (coefficient alpha) across the dimensions, suggesting a strong halo effect in the ratings of the separate dimensions.29 Further, scores were highly correlated with more knowledge-based measures such as the National Board of Medical Examiners' exams (Parts I and II) and grade-point average, suggesting that dimensions other than professionalism were also contributing to the scores. Also, as with faculty ratings, it would appear that a fairly large number of ratings are necessary to obtain stable measures across raters.33,34 Interestingly, the numbers of negative peer evaluations generated in the small groups depended upon the kind of faculty leadership exercised in each group.29 This constitutes yet another example of the importance of context and social climate in peer (and other) assessment methods.
In fact, the social climate of peers assessing peers may have negative consequences. That is, while some studies report positive reception of peer feedback, others report marked resistance to peer evaluation even though the evaluations were anonymous and for research purposes only.31,35 Helfer found that senior medical students were more accepting of peer evaluations than were junior students, who lacked confidence in the usefulness of the system.30 Van Rosendaal found that residents worried that the process would undermine their work and personal interrelationships.35
In summary, peer evaluations hold promise for evaluating professionalism. However, before they are likely to be very useful, many of the same problems facing faculty's evaluation of professionalism will have to be solved, and evaluation systems must be developed that will overcome the reluctance of peers to rate one another.
Self Evaluation. Several early studies were conducted that involved self-reports of attitude changes during medical training. To varying degrees, these students reported increases in certain attitudes, such as cynicism; were more concerned about making money; or felt that their ethical principles had become eroded or lost.5,6,36,37 Some positive attitudes increased as well, for example, concern for patients, and helpfulness.5 More recently, Clack studied gender differences in medical graduates' self-assessments of personal attributes and found that women generally felt more confident than men in possessing nine of the 16 “ideal” attributes listed.38 These studies indicate that our understanding of students' attitudes, some of which may reflect aspects of “professionalism,” can benefit from self-report questionnaires. However, these studies are comparing groups and trends, not assessing the qualities of individuals. The utility of self-reporting for these purposes might be much more severely limited.
Most studies of self-assessment in medicine focus on the assessment of knowledge and skills rather than on professional behavior, but they generally conclude that self-assessment is quite inaccurate.28,39 If physicians are inaccurate at self-assessment in relatively concrete domains (e.g., knowledge), they are likely to have even greater difficulty in a domain such as professionalism, which is less well defined and more socially value-laden. A recent line of research, for example, introduced a model of self-assessment described as the relative ranking technique, in which each participant ranks a set of skills relative to each other from the skill that needs the most work to the one that needs the least.40,41 Despite some success as a self-assessment tool in the relatively constrained domain of interviewing skills, the technique was far less useful when applied to residents' self-assessments of the standard components of a ward assessment form. In this context, the authors discovered that although residents were quite willing to say they need “the most work” with their surgical skills, or to improve their knowledge base, all residents responded that they needed “the least work” in colleague and/or team relationships.41 It appears that when statements are value-laden and abstract (as in issues of professionalism), the bias of social desirability is strong, and self-assessment becomes distorted and potentially misleading.
It is apparent that the use of self-assessment in the evaluation of professionalism is difficult. The methods used do not take context into account, making them somewhat threatening. Perhaps a relative ranking system could be attempted that included only elements of professionalism, such as interpersonal skills, communication skills, respect, and integrity. However, it would still be unlikely for a student to say he or she needs more work with honesty. Again, behaviors rather than abstract definitions would need to be incorporated to overcome this limitation. Until further research is done to better understand the nature of self-assessment, its utility for assessing professional behaviors is likely to be limited to formative evaluations and the setting of personal goals.
Standardized Patients. There is an extensive body of literature on objective structured clinical examinations (OSCEs) and standardized patients (SPs) and their importance in the evaluation of clinical skills. There is no literature specific to the role of either in the evaluation of professionalism or professional behaviors within medicine; however, there are areas in which issues of professionalism and professional behaviors are touched on indirectly.
Using an adaptation of the American Board of Internal Medicine's Patient Satisfaction Questionnaire, Klamen et al. found that SPs could reliably identify some of the professional characteristics of the doctor-patient interaction, including using understandable language and encouraging patients to ask questions.24,42 By contrast, Schnabel et al. asked SPs to assess empathy, interpersonal skills, and patient satisfaction on a 13-item checklist used in a senior-medical-student OSCE, and found that up to 20 ratings were needed to generate reliable measures.43 At the extreme, research conducted using OSCE stations to assess students' skills in dealing with ethical issues concluded that 41 stations would be required to achieve good reliability, even if the content domain were narrowed down to one specific ethical dilemma.44,45,46
At least in part, the difficulty with using OSCE scenarios is the ambiguity with which the concepts are defined on the evaluation form. For example, one set of forms used such anchors as “major problems in demeanor or ethical standards resulting in inadequate ability to deal with the patient's problems” and “actions taken may harm the patient.”47,48 In both instances, unacceptable behaviors are not specified, and judgment is left up to the examiner. On a related note, Arnold suggests that the OSCE, as it now exists, does not discriminate between ethical analysis of a problem and communication skills.49
Another issue with SPs' assessment is the problem of artificiality. Norman, for example, reported on the experience with a physicians' remediation program that uses standardized patient scenarios.50 SPs in a simulated office practice, as well as in standard OSCE stations, were asked to rate physicians' interpersonal skills during each encounter. Compared with the office simulations, the OSCE stations had a low reliability and were felt to be “artificial.” This may increase the likelihood that students in this setting might act as they should rather than as they would. On the other hand, one study has reported several professional lapses in the context of a psychiatry OSCE (the most extreme case involving a student's placing a fleeing SP in a headlock for the purpose of restraint).51 Hodges et al. argue that if stations are more demanding, they may very well discriminate effectively in terms of professional dimensions. Similarly, Vu et al. suggested that SPs' ratings were highly reliable and valid when compared with comments real patients would be expected to make regarding the behaviors they witnessed.52
Again, it is apparent that context is important. Methods of assessment that are more true to life may be more useful than those that involve obviously artificial situations. Students may be aware that there is a professionalism station and respond with actions they assume are on the checklists. It would be interesting to include values conflicts in SP scenarios to specifically assess the students' awareness of the professional values that are involved, and to evaluate their responses. In such a case, there may be more than one right answer, so the students' thought processes about their actions may be more important than the behaviors they actually display. The low reliability of OSCEs, even when limited to specific dimensions of professionalism, is concerning, and many authors have concluded that the greatest utility of this type of assessment may be in the formative evaluation of students.
Longitudinal Observations. More recently, researchers have developed systems for assessing students' professionalism that are triggered by the observation of problematic student behaviors.1,2,4 The evaluation instrument is a specific form that is completed by a clerkship director or faculty member when a student exhibits unprofessional behavior during a rotation. When more than one form has been completed for a specific student, a meeting between an academic committee and the student occurs and remediation is instituted. These systems are based on the concept that students' professional behaviors must be assessed longitudinally, across numerous clinical rotations. Both studies describing this evaluation tool have been qualitative descriptions of systems that are in place, and further reliability and validity studies are anticipated. Such systems are very promising, despite a lack of rigorous evaluation, and may work well for identifying those students with significant lapses in professional behavior. However, in their present state, they may not prove as useful as a method of evaluating all students. The important advance these authors have made is their acknowledgement that labeling a student as “unprofessional” carries a greater negative connotation than simply recording examples of unprofessional behavior.
Discussion: Future Directions in the Evaluation of Professional Behavior
It should be apparent from the preceding discussion that evaluating professionalism in medical students and residents has proved to be a difficult task. The definition-driven abstract way of thinking about professionalism creates a dichotomy for faculty: either apply a harsh label, or let the lapse go. We know from previous research that faculty are much more likely to let the lapse go, which effectively suppresses discussion, feedback, and attempts at remediation.14
On the other hand, evaluation methods that consider behaviors, rather than individuals, as professional or unprofessional become much less threatening and would be more likely to gain acceptance by faculty and students. The studies reported by Papadakis et al. and Phelan et al. provide two good examples of such systems.1,2 Perhaps these methods will decrease faculty's reluctance to report behaviors that should lead to remediation; this can only help in promoting students' professional development. As developed, these evaluation forms are intended to identify and document serious lapses in professional behavior, which fortunately occur in only a few students. Future research might focus on ways to make these forms useful in the evaluation of all students. However, it is likely that some barriers to their use would still exist; for example, faculty would still have to decide what constitutes a major or minor infraction. These limitations might be minimized if the behavior is placed in a context (of the person, the situation, the harm caused to others), a fair process of review is used, and reasonable judgment is applied.53 Then, any decision made would be justifiable and well supported. Arnold and colleagues use a hybrid of the behavioral and abstract in their measurement tool by attaching behavioral descriptors (such as “I have seen residents refer to patients in derogatory terms”) to abstract dimensions of professionalism (such as “respectfulness”), which is an interesting potential step in this direction.54
We have also argued that professional behavior is much more context-dependent than has usually been acknowledged. All physicians are exposed to situations that challenge their abilities to act professionally, and medical students and residents are no different. In fact, they may be more vulnerable to lapses in professional behavior because of the nature of their training and environment. It is crucial to be aware of the specific context in which a behavior occurs before attempting to evaluate it. For example, Christakis et al. found that the teaching students had received on ethical dilemmas seemed to lack real-life relevance and related more to the context of a practicing physician.55 Focus groups described different dilemmas, which were unique to a third-year student's experience. They highlighted the conflicts between education, patient care, wanting to be a team player, and fear of a poor evaluation. One overriding feature was the construct of authority: students lack it and are wary of challenging it, which often puts them into conflict.
It may be necessary to study these behaviors in context more closely to determine their frequency and severity. Since we know that faculty, nurses, students, and residents all see different aspects of professionalism in students, it would be important to gain the perspectives of each of these groups in order to be comprehensive. One way could be to involve each of these groups in focus-group discussions, to determine what they consider to be professional and unprofessional behaviors. Their unique perspectives would help in the design of instruments used in all forms of student assessment. Another technique could be to use an anonymous encounter card system to collect information from students, residents, faculty, and nurses, about what behaviors are actually occurring. This may provide us with a more comprehensive set of behaviors on which to base future evaluation methods.
Conflict has also long been identified as a critical component of professional development, and is found as a dominant element in some measures of professional behavior.9,10,11 Although such paper-and-pencil instruments are limited by their artificial nature, some researchers have found that professional behavior can best be identified at the time that students are grappling with these conflicts. One potential implication of this finding is that students could be placed in a situation that involves a conflict of values, for example, with a standardized patient. The behaviors the students display, based on the choices they make, could be evaluated. What might be even more informative is an evaluation of the thought process a student goes through to arrive at his or her ultimate choice.
Alternatively, students could be asked to write about professional conflicts they have encountered.56 The language or text from these experiences could be subjected to linguistic or rhetorical analysis to uncover the underlying values of individual students and explore how these values affect the resolution of professional conflicts. Lingard and Haber's studies use a rhetorical framework to explore how the structural patterns of case presentations inform medical students' developing attitudes towards patients and colleagues.57,58 The authors demonstrate that a rhetorical analysis of discourse patterns can reveal critical relationships between the stories novices learn to tell about patients and the decisions they make about how to act on behalf of and in relation to them. Other studies in a similar vein reinforce the potential usefulness of this method.59,60,61 However, the texts that students generate may suffer from the same sense of artificiality that affects OSCE stations, and research in this area would have to be designed to take this issue into account.
It is unrealistic to think that one evaluation instrument could capture all that is important in the complex domain of professionalism. As with all high-stakes evaluations, reliability, which depends in part on sample size, is important. No student should receive a grade on his or her knowledge of cardiology from a single-item test; similarly, no student should receive a grade on professionalism without adequate sampling of the domain. Some of the measures outlined above have large sample sizes and are likely to be more useful (peer evaluation, encounter cards), while others rely on a single report or a few reports (SP scenarios, ward evaluations). While the latter may be useful for outliers, the former are more useful for the larger group of students who experience only occasional lapses in professional behavior. It is certain that more than one measurement technique would need to be used, and the greatest validity may result from triangulating results from different sources.
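The dependence of reliability on sample size noted above can be illustrated with the Spearman-Brown prediction formula, a standard psychometric result that this article does not itself invoke. The following is a minimal sketch, assuming the observations are parallel (equally reliable and interchangeable). The function name `spearman_brown` and the single-observation reliability of 0.20 are hypothetical, chosen only for illustration, and are not data from any study cited here.

```python
def spearman_brown(single_reliability: float, n_observations: int) -> float:
    """Predicted reliability of the average of n parallel observations.

    Spearman-Brown prediction formula: n*r / (1 + (n - 1)*r), where r is the
    reliability of a single observation.
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("single_reliability must lie between 0 and 1")
    if n_observations < 1:
        raise ValueError("n_observations must be at least 1")
    r = single_reliability
    n = n_observations
    return n * r / (1 + (n - 1) * r)


if __name__ == "__main__":
    # Hypothetical single-report reliability of 0.20 (an assumption).
    for n in (1, 4, 10, 20):
        print(n, round(spearman_brown(0.20, n), 2))
```

Under these assumptions, a measure whose single report has a reliability of 0.20 would need about 16 comparable observations to reach 0.80, which is one way to see why instruments that sample many encounters are likely to serve better for the larger group of students than instruments built on a single report.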
Future efforts at understanding professionalism, and future methods of evaluating professionalism, must focus on behaviors rather than personality traits or vague concepts of character. Our understanding and evaluation must include context and conflict in order to be relevant and valid. Ideally, methods of evaluation should include elements of peer assessment and self-assessment, which are becoming required elements in the continuing professional development of all practicing physicians. Finally, we should attempt to understand what drives students to demonstrate occasional lapses in professional behavior, in order to develop effective teaching and remediation in this domain.
1. Papadakis MA, Osborn MC, Cooke M, Healy K. A strategy for the detection and evaluation of unprofessional behavior in medical students. Acad Med. 1999;74:980–90.
2. Phelan S, Obenshain S, Galey WR. Evaluation of the non-cognitive professional traits of medical students. Acad Med. 1993;68:799–803.
3. Hunt DD, Scott CS, Phillips TJ, Yergan J, Greig LM. Performance of residents who had academic difficulties in medical school. J Med Educ. 1987;62:170–6.
4. Project Professionalism. Philadelphia, PA: American Board of Internal Medicine, 1995.
5. Wolf TM, Balson PM, Faucett JM, Randall HM. A retrospective study of attitude change during medical education. Med Educ. 1989;23:19–23.
6. Feudtner C, Christakis DA, Christakis NA. Do clinical clerks suffer ethical erosion? Students' perceptions of their ethical environment and personal development. Acad Med. 1994;69:670–9.
7. Garfinkel PE, Bagby RM, Waring EM, Dorian B. Boundary violations and personality traits among psychiatrists. Can J Psych. 1997;42:758–63.
8. Graham JR. The MMPI: A Practical Guide. Second edition. New York: Oxford University Press, 1987.
9. Oser FK. Moral education and values education: the discourse perspective. In: Wittrock MC (ed). Handbook of Research on Teaching. New York: Macmillan Publishing Company, 1986:917–41.
10. Stern DT. Hanging out: teaching values in medical education [dissertation]. Stanford, CA: Stanford University, 1996.
11. Rezler AG, Schwartz RL, Obenshain SS, Lambert R, Gibson JM, Bennahum DA. Assessment of ethical decisions and values. Med Educ. 1992;26:7–16.
12. Novack DH, Detering BJ, Arnold R, Forrow L, Ladinsky M, Pezzullo JC. Physicians' attitudes toward using deception to resolve difficult ethical problems. JAMA. 1989;261:2980–5.
13. Stern DT. Practicing what we preach? An analysis of the curriculum of values in medical education. Am J Med. 1998;104:569–75.
14. Burack JH, Irby DM, Carline JD, Root RK, Larson EB. Teaching compassion and respect: attending physicians' responses to problematic behaviors. J Gen Intern Med. 1999;14:49–55.
15. Hunt DD, Carline J, Tonesk X, Yergan J, Siever M, Loebel JP. Types of problem students encountered by clinical teachers on clerkships. Med Educ. 1989;23:14–8.
16. Hunt DD. Functional and dysfunctional characteristics of the prevailing model of clinical evaluation systems in North American medical schools. Acad Med. 1992;67:254–9.
17. Gray JD. Global rating scales in residency education. Acad Med. 1996;71(10 suppl):S55–S63.
18. Woolliscroft JO, Howell JD, Patel BP, Swanson DB. Resident-patient interactions: the humanistic qualities of internal medicine residents assessed by patients, attending physicians, program supervisors, and nurses. Acad Med. 1994;69:216–24.
19. Johnson D, Cujec B. Comparison of self, nurse, and physician assessment of residents rotating through an intensive care unit. Crit Care Med. 1998;26:1811–6.
20. Rhoton MF. A new method to evaluate clinical performance and critical incidents in anesthesia: quantification of daily comments by teachers. Med Educ. 1989;23:280–9.
21. Brennan B, Norman GR. Use of encounter cards for evaluation of residents in obstetrics. Acad Med. 1997;72(10 suppl):S43–S44.
22. Rhoton MF. Professionalism and clinical excellence among anesthesiology residents. Acad Med. 1994;69:313–5.
23. Beckman H, Frankel R, Kihm J, Kulesza G, Geheb M. Measurement and improvement of humanistic skills in first-year trainees. J Gen Intern Med. 1990;5:42–5.
24. Klamen DL, Williams RG. The effect of medical education on students' patient-satisfaction ratings. Acad Med. 1997;72:57–61.
25. Klessig J, Robbins AS, Wieland D, Rubenstein L. Evaluating humanistic attributes of internal medicine residents. J Gen Intern Med. 1989;4:514–22.
26. Matthews DA, Feinstein AR. A new instrument for patients' ratings of physician performance in the hospital setting. J Gen Intern Med. 1989;4:14–22.
27. Butterfield PS, Mazzaferri EL. A new rating form for use by nurses in assessing residents' humanistic behavior. J Gen Intern Med. 1991;6:155–61.
28. Linn BS, Arostegui M, Zeppa R. Performance rating scale for peer and self assessment. Br J Med Educ. 1975;9:98–101.
29. Arnold L, Willoughby L, Calkins V, Gammon L, Eberhart G. Use of peer evaluation in the assessment of medical students. J Med Educ. 1981;56:35–42.
30. Helfer RE. Peer evaluation: its potential usefulness in medical education. Br J Med Educ. 1972;6:224–31.
31. Thomas PA, Gebo KA, Hellmann DB. A pilot study of peer review in residency training. J Gen Intern Med. 1999;14:551–4.
32. Parker PA Jr., Stevens CB, Duerson MC. Issues in medical education: basic problems and potential solutions. Acad Med. 1993;68(10 suppl):S89–S93.
33. Ramsey PG, Carline JD, Blank L, Wenrich MD. Feasibility of hospital-based use of peer ratings to evaluate the performance of practicing physicians. Acad Med. 1996;71:364–70.
34. Ramsey PG, Wenrich MD, Carline JD, Inui TS, Larson EB, LoGerfo JP. Use of peer ratings to evaluate physician performance. JAMA. 1993;269:1655–60.
35. Van Rosendaal GMA, Jennett PA. Resistance to peer evaluation in an internal medicine residency. Acad Med. 1992;67:63.
36. Flaherty JA. Attitudinal development in medical education. In: Rezler A (ed) The Interpersonal Dimension in Medical Education. New York: Springer, 1985:147–82.
37. Testerman JK, Morton KR, Loo LK, Worthley JS, Lamberton HH. The natural history of cynicism in physicians. Acad Med. 1996;71(10 suppl):S43–S45.
38. Clack GB, Head JO. Gender differences in medical graduates' assessment of their personal attributes. Med Educ. 1999;33(2):101–5.
39. Jankowski J, Crombie I, Block R, Mayet J, McLay J, Struthers AD. Self-assessment of medical knowledge: do physicians overestimate or underestimate? J Royal Coll Phys London. 1991;25:306–8.
40. Regehr G, Hodges B, Tiberius R, Lofchy J. Measuring self-assessment skills: an innovative relative ranking model. Acad Med. 1996;71(10 suppl):S52–S54.
41. Harrington JP, Murnaghan JJ, Regehr G. Applying a relative ranking model to the self-assessment of extended performances. Adv Health Sci Educ. 1997;2:17–25.
42. PSQ Project Co-Investigators. Final Report on the Patient Satisfaction Questionnaire Project. Philadelphia, PA: American Board of Internal Medicine, 1989.
43. Schnabel GK, Hassard TH, Kopelow ML. The assessment of interpersonal skills using standardized patients. Acad Med. 1991;66(10 suppl):S34–S36.
44. Singer P, Cohen R, Robb A, Rothman A. The ethics OSCE. J Gen Intern Med. 1993;8:23–8.
45. Singer P, Robb A, Norman G, Turnbull J. Performance-based assessment of clinical ethics using an OSCE. Acad Med. 1996;71:495–8.
46. Smith SR, Balint JA, Krause K, Moore-West M, Viles PH. Performance-based assessment of moral reasoning and ethical judgment among medical students. Acad Med. 1994;69:381–6.
47. Reznick RK, Regehr G, Yee G, Rothman A, Blackmore D, Dauphinee D. Process-rating forms versus task-specific checklists in an OSCE for medical licensure. Acad Med. 1998;73(10 suppl):S97–S99.
48. Medical Council of Canada. Information Pamphlet: Qualifying Examination Part II. Ottawa ON: Medical Council of Canada, 1999:11.
49. Arnold RM. Assessing competence in clinical ethics: are we measuring the right behaviors? J Gen Intern Med. 1993;8:52–4.
50. Norman GR, Davis D, Lamb S, Hanna E, Caulford P, Kaigas T. Competency assessment of primary care physicians as part of a peer review program. JAMA. 1993;270:1046–51.
51. Hodges B, Turnbull J, Cohen R, Bienenstock A, Norman G. Evaluating communication skills in the OSCE format: reliability and generalizability. Med Educ. 1996;30:38–43.
52. Vu NV, Marcy ML, Verhulst SJ, Barrows HS. Generalizability of standardized patients' ratings of their clinical encounter with fourth-year medical students. Acad Med. 1990;65(10 suppl):S29–S30.
53. Irby DM, Milam S. The legal context for evaluating and dismissing medical students and residents. Acad Med. 1989;64:639–43.
54. Arnold EL, Blank LL, Race KEH, Cipparone N. Can professionalism be measured? The development of a scale for use in the medical environment. Acad Med. 1998;73:1119–21.
55. Christakis DA, Feudtner C. Ethics in a short white coat: the ethical dilemmas that medical students confront. Acad Med. 1993;68:249–54.
56. Szauter K, Boisaubin E. Professional ethical dilemmas of medical students during the medicine clerkship. Oral abstract presented at the 38th Annual Research in Medical Education Conference, Washington, DC, October 1999.
57. Lingard L, Haber RJ. What do we mean by “relevance”? A clinical and rhetorical definition with implications for teaching and learning the case-presentation format. Acad Med. 1999;74(10 suppl):S124–S127.
58. Lingard LA, Haber RJ. Teaching and learning communication in medicine: a rhetorical approach. Acad Med. 1999;74:507–10.
59. Arluke A. Social control rituals in medicine: the case of death rounds. In: Dingwall R, Heath C, Reid M, Stacey M (eds). Health Care and Health Knowledge. London, U.K.: Croom Helm, 1977:108–25.
60. Caldicott CV. What's wrong with this medical student today? Dysfluency on inpatient rounds. Ann Intern Med. 1998;128:607–10.
61. Stern DT, Caldicott CV. Turfing: patients in the balance. J Gen Intern Med. 1999;14:243–8.
Research in Medical Education: Proceedings of the Thirty-ninth Annual Conference. October 30 - November 1, 2000. Chair: Beth Dawson. Editor: M. Brownell Anderson. Foreword by Beth Dawson, PhD.