The ability to communicate successfully and to accurately appraise one’s performance during interaction is vital for effective practice in medicine, yet notoriously difficult to teach and assess. The Accreditation Council for Graduate Medical Education has recognized interpersonal and communication skills as a competency equal in importance to clinical skill and medical knowledge.1
Methodologies highly recommended by the Accreditation Council for Graduate Medical Education to teach and assess interpersonal and communication skills include multirater feedback, standardized patients, role play, and self-assessment.2 As each method possesses different strengths and weaknesses, it would be desirable to combine several to achieve a more holistic approach. This article describes an assessment approach, inspired by 360-degree evaluation, which combines simulation methodology with multirater and self-assessment.
Drawing from the field of 360-degree assessment, multirater feedback calls into question the hierarchical approach characteristic of traditional assessment3 and seeks to capture the same broad perspective derived by assessing differing professional groups4,5 while combining it with a more flexible usage, allowing for different combinations of raters and differing timescales for the evaluative process.6 This approach is growing as a means of assessment in medical education, with application in fields as diverse as surgery,7,8 anesthesia,9 internal medicine,5 and emergency medicine.10 In this last study, raters of different disciplines were found to have different foci of assessment in rating the same videotaped material, further supporting the value of the multirater approach.
An additional benefit to multirater evaluation is its ability to assess self-appraisal by means of gap analysis. Gap analysis compares global scores obtained from other participants in the evaluation process with the self score, giving the evaluator the ability to discern potentially important discrepancies in self-perception.4 To date, no studies have evaluated gap analysis in healthcare environments, although many studies did include self-assessment.9,11–13
The open and nonlongitudinal nature of the multirater approach makes it an ideal instrument to assess self-appraisal in the simulated environment. Current frameworks treat self-appraisal as a global, longitudinal entity,14–17 but this approach has led to poor results when self-appraisal is assessed. A recent systematic review by Davis et al,18 and a further study by Lockyer et al examining self-appraisal in the context of a longitudinal study,19 concluded that self-assessment abilities are stable perceptions, unlikely to change over time without intervention, and usually quite poor. On the basis of this information, Eva et al suggest that these current constructs of self-appraisal, which center on longitudinal reflection over the course of training, are inadequate.20,21 They further argue that the instability of self-appraisal as a global, longitudinal construct is due to the context-dependent nature of that skill and, thus, to ask if a trainee is “good” at self-appraisal over time may not be useful. Rather, self-appraisal should be construed as an “in-the-moment” process of determining whether one has the skills to deal with a specific event. This situational approach to self-appraisal is well suited to the immersive but transient nature of the simulated environment.
We developed a multirater communication skills assessment instrument with gap analysis for use in enacted family meetings conducted as part of the Program to Enhance Relational and Communication Skills (PERCS). PERCS addresses the teaching and assessment of communication skills through realistic enactments of practitioner-patient and practitioner-family difficult conversations. The program was developed by the Institute for Professionalism and Ethical Practice at Children’s Hospital, Boston, Massachusetts with the goal of providing an educational atmosphere where difficult conversations in healthcare can be broached in a safe environment. The pedagogical approach of the program is based on relational learning and reflective practice,22 which creates a safe atmosphere for discussion and self-appraisal. The aim of this pilot study was 3-fold: (1) to study the use of multirater methodology during simulated difficult healthcare conversations; (2) to explore the use of gap analysis for the assessment of self-appraisal; and (3) to assess the feasibility of this combined approach.
Structure of the Simulated Environment
PERCS offers day-long, simulation-based training courses in communication skills and relational abilities, designed to address difficult conversations in healthcare.22 The program offers a brief didactic presentation detailing different approaches to communicating difficult news, supplemented with videotapes depicting patient and family perspectives. It then proceeds with a series of realistic enactments, typically lasting 15 minutes, each followed by debriefing and group discussion. Enacted case scenarios include a 5-year-old child hospitalized in the pediatric intensive care unit (PICU) with significant neurologic injury after a near-drowning event, a 17-year-old girl with relapsed acute myelogenous leukemia, and a premature infant on high ventilator settings who has recently suffered a grade IV intraventricular hemorrhage. Family members are portrayed by professional actors, who are provided training regarding the scenarios and then asked to improvise in real time according to their understanding of the situation during the encounter.22
Interdisciplinary participants with varying levels of experience enroll in the courses, with an average attendance of 10 to 15 per session. Interdisciplinary teams of 2 to 3 practitioners (eg, physician and nurse, social worker or other psychosocial provider) participate in each encounter, engaging the actors-family members in a discussion of their child’s clinical status, while the other participants view the proceedings on closed-circuit television. Each interaction is followed by a debriefing and discussion led by faculty facilitators. The debriefings are reflective in nature, focusing on global issues that arise within the conversation. The actors-family members join the group to give feedback and participate in the discussion as “ethical understudies,” representatives of actual families.22
To develop the multirater communication skills instrument, we reviewed the literature for currently existing instruments to measure communication skills. Our goal was to locate an instrument of reasonable length with acceptable psychometric properties that could be adapted for both clinician and patient/actor use. We used several review articles in which the properties of multiple assessment instruments were compared in a systematic fashion.23,24 Twelve instruments were reviewed and examined.25–36 The Kalamazoo Consensus Statement Essential Elements Communication Checklist was chosen27 due to its reasonable length (1 page), high measures of internal consistency (Cronbach’s alpha of 0.88),24 and ease of adaptability. The Kalamazoo Essential Elements framework includes seven dimensions (ie, competencies) of communication: building a relationship, opening the discussion, gathering information, understanding the patient’s perspective, sharing information, reaching agreement on problems and plans, and providing closure.37 In a slightly adapted version of the original Kalamazoo assessment instrument,27 each competency is evaluated by a 5-point Likert scale (1 = poor, 2 = fair, 3 = good, 4 = very good, 5 = excellent). These descriptors were provided to raters on the instrument.
We added two additional dimensions (demonstrates empathy and communicates accurate information), not explicitly stated in the original Kalamazoo Checklist, to expand our measures of communication skills during the delivery of difficult information. The Kalamazoo Consensus Statement, an evidence-based framework of the essential elements of physician-patient communication, was designed to facilitate the development, implementation and evaluation of communication-related teaching, assessment, and standards in medical education. Although empathy was considered as a dimension during the development of the Kalamazoo framework, it was implicitly included in the build a relationship competency which is considered an ongoing task in all encounters. We chose to add an explicit empathy item to our multirater communication skills instrument to enhance our ability to evaluate difficult conversations, particularly those involving end-of-life care and the delivery of terminal diagnoses, encountered in the PERCS Program. This choice was supported by several articles dealing with end-of-life communication in which empathy emerged as a primary consideration for families,38–40 and by the Toronto Consensus Statement, a document reviewed by the Kalamazoo Consensus participants, that also includes empathy as a vital component of effective doctor-patient interaction.41 The communication of accurate information item was added because of the concern that an individual might engage well in a difficult conversation, might open effectively, exchange information, listen, and relate empathically to a patient or their family, but still not convey an accurate, direct picture of the clinical situation. The ability to provide accurate information is crucial during the conveyance of life-threatening diagnoses and when addressing end-of-life issues. 
Although not based on a particular model of delivering difficult news, this dimension speaks to the importance of truth-telling in the medical encounter.42–44
To counteract the potential inflation of absolute Likert scores (eg, halo effect), two forced choice rankings were also appended to our modified instrument. The first of these items asked the rater to identify the trainee’s strongest three communication competencies, and the second asked for the three communication competencies most in need of improvement. After each forced choice question, space was provided to give reasons why the choices had been made.
A similar version of the instrument was also established for family/actor use, with language at a sixth grade reading level. Reading level was determined by Microsoft Word’s calculated Flesch-Kincaid Grade Level. As with the clinician version, care was taken to maintain the instrument’s original structure and content. Modifications were made with the assistance of a coauthor and member (ER) of the original Kalamazoo Consensus Statement Group.
Physician subjects were recruited from among the neonatal intensive care unit (NICU) and PICU fellows at Children’s Hospital, Boston, Massachusetts. At present, all PICU fellows are required, and all NICU fellows are strongly encouraged, to enroll in the PERCS courses. All fellows were approached for inclusion in the study; the exclusion criterion was prior participation in the PERCS program within the past 6 months. Each subject was evaluated after one realistic enactment with the professional actors.
This study was conducted as a descriptive, prospective case series. Before the PERCS course, each subject was asked to participate in one of the interdisciplinary realistic enactments described earlier. Subjects were typically accompanied in these encounters by a member of the nursing staff or another psychosocial provider also enrolled in the course. Subjects then participated in the debriefing and discussion with faculty facilitators, other participants, and the actors themselves. All course participants present were then asked to complete the communication skills instrument and the subject was asked to complete a self-evaluation. All instruments were completed simultaneously.
Course participants were also asked to complete a set of demographic and evaluative items regarding their experience with the new instrument. Questions on this evaluation included the amount of time needed to complete the instrument, a yes/no question asking whether use of the instrument had changed the rater’s perceptions of the PERCS program, and a question asking whether the instrument added to, detracted from, or did not change their perception of the realistic enactment and why. This study was approved by the Institutional Review Board of Children’s Hospital Boston.
Cronbach’s alpha was calculated using the Likert scale data for the original seven dimensions of the instrument and for the nine dimensions comprising the modified instrument to determine whether the two additional dimensions affected the instrument’s internal consistency. We further confirmed the unidimensionality of the instrument by performing a factor analysis. Average Likert scale data derived from different respondent groups (physician, actor/family, nursing, psychosocial providers, and self) were then compared using Spearman’s rank correlation coefficient to determine whether those groups perceived the interaction differently. This comparison was performed for each of the seven enactments. All statistical analyses were performed using SPSS.
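For readers who wish to reproduce these two statistics outside SPSS, the following is a minimal Python sketch and not the analysis code used in this study. It assumes that each completed instrument is stored as a list of item scores (one per dimension) and that the between-group comparison correlates two groups' mean scores across the dimensions; both the data layout and the function names (cronbach_alpha, average_ranks, spearman_rho) are illustrative assumptions.

```python
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of instruments, each a list of k item scores."""
    k = len(ratings[0])
    item_columns = list(zip(*ratings))  # one column of scores per dimension
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in ratings])  # variance of total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

In this sketch, spearman_rho would be called once per pair of rater groups within an enactment, for example with the physicians' and nurses' mean scores on each dimension; the significance testing performed in SPSS is not reproduced here.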
Instrument Scoring and Gap Analysis
For the Likert scale items, a dimension was defined as in need of improvement if the mean score for all raters fell below 3 (good) on the 1 to 5 scale. No criterion for strength was applied to the Likert items, as scores typically clustered in the 4 (very good) to 5 (excellent) range, making exceptional strengths difficult to distinguish.
Because all multirater instruments previously piloted in the medical literature employed Likert scales only, the interpretation of the forced choice questions required the development of a novel analytic approach. The forced choice items were assessed by calculating the net score for each dimension (defined as the number of times a dimension was indicated as a strength minus the number of times it was indicated as in need of improvement). This result was then divided by the number of instruments completed, resulting in a net percentage of strength or weakness. Significance was defined as a net percentage greater than 40% or less than −40%. This cutoff was determined by examining the relative numbers of significant dimensions identified using 50%, 40%, and 30%. Given that the primary intent of this instrument was to determine areas of absolute and relative strength and need for improvement that could be used for focused debriefing and formative feedback, ±40% seemed the most reasonable criterion to generate a meaningful number of significant dimensions per meeting. Our goal in this process was to produce an index of relative strength or need for improvement that could be used to generate feedback.
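The net percentage calculation described above can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the code used in the study: each completed instrument is assumed to be recorded as a pair of lists (the three dimensions chosen as strengths and the three chosen as needing improvement), and the function name, the dimension labels, and the example data are hypothetical.

```python
from collections import Counter


def net_percentages(instruments, cutoff=0.40):
    """Net forced-choice score per dimension, as a fraction of instruments.

    instruments: list of (strengths, needs) pairs, each a list of dimension names.
    Returns {dimension: (net_fraction, label)}.
    """
    n = len(instruments)
    strength_counts = Counter(d for strengths, _ in instruments for d in strengths)
    need_counts = Counter(d for _, needs in instruments for d in needs)
    results = {}
    for dim in set(strength_counts) | set(need_counts):
        net = (strength_counts[dim] - need_counts[dim]) / n
        if net > cutoff:  # strictly greater than +40%
            label = "strength"
        elif net < -cutoff:  # strictly less than -40%
            label = "need for improvement"
        else:
            label = "not significant"
        results[dim] = (net, label)
    return results


# Hypothetical data: five raters, three strengths and three needs each.
example = [
    (["empathy", "relationship", "opening"], ["perspective", "closure", "agreement"]),
    (["empathy", "relationship", "sharing"], ["perspective", "closure", "gathering"]),
    (["empathy", "opening", "accuracy"], ["perspective", "agreement", "relationship"]),
    (["empathy", "relationship", "accuracy"], ["perspective", "closure", "gathering"]),
    (["sharing", "relationship", "opening"], ["empathy", "closure", "agreement"]),
]
summary = net_percentages(example)
```

With these invented ratings, "empathy" is chosen as a strength four times and as a need once, giving a net of 3/5 (60%), whereas "gathering" is chosen as a need twice, giving exactly −40%, which does not meet the strict cutoff.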
Little research has been done to identify what represents a “significant” gap. One approach used in the business sector defined 0.5 as a cutoff for significance, assuming a 5-point Likert scale (10% difference in score).45 Our approach defined as significant a gap of ±1 or more with respect to the self-reported score for each dimension. This more conservative approach was justified given the lack of literature on gap analysis in the healthcare context. Areas of potential self-under-appraisal were defined as those with a gap of +1 or more. Areas of potential self-over-appraisal were defined as those with a gap score of −1 or less. Global instrument data for a typical subject are depicted in Figure 1.
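The gap classification can be sketched in Python as follows. This is a minimal illustration rather than the study's implementation, and it assumes that the gap for each dimension is the mean observer score minus the self-reported score, a sign convention inferred from the definitions above (a positive gap indicating self-under-appraisal). The function name and example scores are hypothetical.

```python
from statistics import mean


def gap_analysis(observer_scores, self_scores, threshold=1.0):
    """Classify each dimension by the gap between observers and self.

    observer_scores: {dimension: [Likert scores from the other raters]}
    self_scores:     {dimension: the subject's own Likert score}
    Assumed convention: gap = mean observer score - self score.
    """
    results = {}
    for dim, self_score in self_scores.items():
        gap = mean(observer_scores[dim]) - self_score
        if gap >= threshold:  # observers rate higher than the subject does
            label = "self-under-appraisal"
        elif gap <= -threshold:  # subject rates higher than the observers do
            label = "self-over-appraisal"
        else:
            label = "no significant gap"
        results[dim] = (gap, label)
    return results


# Hypothetical scores for one subject on three dimensions.
gaps = gap_analysis(
    {"empathy": [5, 4, 5], "perspective": [3, 2, 3], "closure": [4, 4, 4]},
    {"empathy": 3, "perspective": 4, "closure": 4},
)
```

In this invented example the subject rates their own empathy at 3 while observers average about 4.7, so the dimension is flagged as potential self-under-appraisal.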
Seven fellows (5 PICU and 2 NICU) were assessed during seven simulated family meetings. Each meeting occurred as a scheduled part of the day-long PERCS educational courses. A total of 108 PERCS participants served as raters. Individual PERCS courses had between 11 and 18 participants and faculty facilitators (median 16). Within each program, between 2 and 4 participants (median 3) were physicians, between 1 and 6 (median 4) were nurses, between 5 and 7 (median 6) were psychosocial providers, and between 1 and 3 (median 2) were professional actors portraying patient family members.
Statistical analysis of internal instrument consistency revealed a Cronbach’s alpha of 0.84 for the original seven Kalamazoo dimensions, and 0.87 for the nine dimensions of the modified communication skills instrument. Factor analysis indicated that all nine dimensions of the modified instrument contributed to a single measured construct.
Relationship Between Disciplinary Groups Surveyed
All of the realistic enactments showed a diverse pattern of correlation between rater group scores. These are displayed in Table 1. Although 86% of meetings had at least one significant correlation, in no meeting did the proportion of significant correlations exceed 50%. Only two (29%) of the enactments showed significant correlation between self-ratings and the ratings of other participants.
Of the seven surveyed enactments, 57% had average Likert scores of 3 or above across the nine dimensions of communication studied. The remaining 43% had one dimension (“Understands the patient/family’s perspective” in each case) scored below 3. Forced choice data showed three to five significant strengths or needs for improvement per enactment based on participant ratings. A total of 30 such dimensions were found distributed across all seven meetings.
A total of 24 significant self-over-appraisals and self-under-appraisals were found. Nine (38%) correlated with a dimension of communication also identified as a significant strength or need for improvement by course participants. The global relationships between these are depicted in Figure 2.
The communication skills instrument was generally well received. Average time to complete the instrument was 7 minutes (±2.7). Of the 108 PERCS participants who completed the instrument, 73 (68%) felt that their overall learning experience was unaffected by the instrument, 27 (25%) felt that the instrument was a positive addition to the learning environment, and only 6 (5%) felt that the instrument detracted from their experience. Two participants did not answer either question.
Those who felt that completing the instrument enhanced their own learning stated that the instrument stimulated critical reflection on the encounter, improved recall of otherwise overlooked events that occurred during the encounter, and spurred them to consider constructive means of delivering feedback. Those few who felt that completing the instrument detracted from the PERCS learning environment were concerned about a perceived “judgmental” nature to the instrument, over-focus on one individual to the detriment of the group, loss of a gestalt sense of the interaction, and difficulty administering the instrument within the time constraints of the program.
Our findings suggest that a multirater communication skills instrument, in combination with an analysis of the gaps between the perceptions of the trainee and those of observers from different professional groups, is a useful means to generate formative feedback and to potentially promote trainees’ insight in the area of communication and relational skills. The general perception of the instrument was neutral (68% of raters) to positive (25% of raters), and the increased sense of critical reflection and attention to generating constructive feedback mentioned by several of the raters gives evidence of how, in a best case scenario, such an instrument could enhance the debriefing process. Completion time for the instrument was reasonable at less than 10 minutes. Our modification of the Kalamazoo Essential Elements instrument to include two additional dimensions did not significantly alter the internal consistency of the instrument as evidenced by Cronbach’s alpha and factor analysis.
When correlation coefficients were calculated between rater groups within each encounter, most realistic enactments showed substantial discrepancy between subgroup scores (physicians, nurses, psychosocial providers, actors/families, and self). We interpret this as possible evidence of the value of the multirater method in obtaining novel information, as the responses of members of one discipline were not necessarily predictive of the responses of other disciplines. Although such an interpretation is supported by previous studies of the 360-degree methodology in medicine,12,46 further qualitative research is needed to explore the nature of the information gained by multirater evaluations.
The Likert scale component of the instrument seemed to suffer from some degree of skew toward high scores. Reasonably high (3 = good or above) scores were common. Although low Likert scores seem to have meaning, it is difficult to determine whether scores of 4 (very good) or 5 (excellent) represent true strengths or merely “average” scores given characteristically high ratings. It is interesting that, for those enactments where Likert scores achieved significance, the same dimension (Understands the patient/family’s perspective) was repeatedly identified as needing improvement. This finding could indicate a specific sensitivity of the Likert scale regarding this dimension and/or the need for more effective teaching in this area. These findings supported our decision to include the forced choice questions, as information useful for feedback would have been more difficult to derive from Likert scale results alone.
Despite our conservative approach to gap measurement, many significant areas of self-over-appraisal and self-under-appraisal were noted, some of which corresponded with significant strengths or needs for improvement as detected by other components of the communication skills instrument. If significant gaps do correspond to self-over-appraisal or self-under-appraisal, then correlating these scores with patterns in the rest of the data represents the most fruitful means of interpretation. In one approach to interpretation, shown in Table 2, significant strengths that correlate with areas of self-under-appraisal represent areas of unrecognized strength, whereas strengths that correlate with areas of self-over-appraisal represent known strengths not subject to active reflection. Areas needing improvement that are also under-appraised represent known weaknesses subject to active reflection. Of greatest concern in terms of feedback are areas where self-over-appraisal is paired with a need for improvement. These areas likely represent those in which significant communication deficiencies are exacerbated by a lack of self-awareness, and are thus unlikely to change without intervention. In previous studies, feedback has been cited as less useful, and potentially inflammatory, in situations where large gaps exist between self-assessment and the group average,13,47 yet these are arguably the situations where feedback, education, and mentoring are most needed.
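The interpretive scheme described in this paragraph amounts to a small lookup table, sketched below in Python as an illustration only. The category wording paraphrases the text above rather than reproducing Table 2, and the function name and argument labels are hypothetical.

```python
# Keys pair a multirater finding with the direction of the self-appraisal gap.
# The wording of each category is a paraphrase of the interpretation in the text.
INTERPRETATION = {
    ("strength", "under-appraisal"): "unrecognized strength",
    ("strength", "over-appraisal"): "known strength not subject to active reflection",
    ("need for improvement", "under-appraisal"): "known weakness subject to active reflection",
    ("need for improvement", "over-appraisal"): (
        "need for improvement with limited self-awareness (greatest concern for feedback)"
    ),
}


def interpret(finding, gap_direction):
    """Return the interpretive category, or None if the pair is not in the scheme."""
    return INTERPRETATION.get((finding, gap_direction))
```

A dimension flagged as a significant strength by observers and as self-under-appraised by gap analysis would therefore be reported back to the trainee as an unrecognized strength.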
This study has several limitations, including a small sample size, and a focus on end-of-life situations that may limit its overall generalizability to larger educational contexts. Because the communication skills instrument was administered after the initial debriefing had already occurred, the additional issue of rater independence must be raised. This issue was considered at the beginning of the study, but was counterbalanced by the need to preserve the educational context of the simulation, debriefing, and discussion as initially structured. We were not able to calculate inter-rater reliability of the instrument due to the small sample size and uneven number of raters per meeting. Further work will be necessary to establish the psychometric properties of our modified instrument, and the relationship between the situational results generated by the instrument within the PERCS context and global communication competency.
With regard to the forced choice questions, it is important to recognize the provisional nature of the tests for significance used in this study. We did not interpret the results of these questions in isolation, instead treating them as a relative scale of strength and need for improvement that required contrast with the absolute Likert numbers to attain their proper meaning. Few statistical data exist regarding the appropriate means of interpreting questions of this nature, and it will be necessary to confirm our approach by both statistical and qualitative means as more data are obtained. Further statistical and qualitative work is also needed to confirm the optimal significance cutoffs for the gap analysis.
To comprehensively assess communication skills, it is necessary that a variety of approaches be used. This can be facilitated by combining simulation methodology with multirater assessment and self-appraisal. The simulated environment provides an immersive, engaging, and safe context in which an interaction can take place, whereas the multirater instrument quantifies the individual components of communication displayed during the interaction. The use of gap analysis contributes further insight by assessing situational self-appraisal abilities and correlating them with patterns of strength and need for improvement. We believe that this approach will allow for the provision of more comprehensive and thorough formative feedback on relational and communication skills than was previously possible.
2. Rider EA, Keefer CH. Communication skills competencies: definitions and a teaching toolbox. Med Educ
3. Peiperl M. Getting 360-degree Feedback Right. Harv Bus Rev
4. Lockyer J. Multisource feedback in the assessment of physician competencies. J Contin Educ Health Prof
5. Violato C, Marini A, Toews J, et al. Feasibility and psychometric properties of using peers, consulting physicians, co-workers, and patients to assess physicians. Acad Med 1997; 72(10 Suppl 1):S82–S84.
6. Foster C, Law M. How many perspectives provide a compass? differentiating 360-degree and multi-source feedback. Int J Sel Assess
7. Violato C, Lockyer J, Fidler H. Multisource feedback: a method of assessing surgical practice. BMJ
8. Lockyer J, Violato C, Fidler H. Likelihood of Change: A Study Assessing Surgeon Use of Multisource Feedback Data. Teach Learn Med
9. Lockyer JM, Violato C, Fidler H. A multi source feedback program for anesthesiologists. Can J Anaesth
10. Xiao Y, MacKenzie C, Orasanu J, et al. Information acquisition from audio-video-data sources: an experimental study on remote diagnosis. The LOTAS Group. Telemed J
11. Potter TB, Palmer RG. 360-degree assessment in a multidisciplinary team setting. Rheumatology (Oxford)
12. Wood J, Collins J, Burnside ES, et al. Patient, faculty, and self-assessment of radiology resident performance: a 360-degree method of measuring professionalism and interpersonal/communication skills. Acad Radiol
13. Weigelt JA, Brasel KJ, Bragg D, et al. The 360-degree evaluation: increased work with little return? Curr Surg 2004; 61:616–626; discussion 27–28.
14. Epstein RM. Mindful practice. JAMA
15. Westberg J, Jason H. Fostering learners’ reflection and self-assessment. Fam Med
16. Arnold L, Willoughby TL, Calkins EV. Self-evaluation in undergraduate medical education: a longitudinal perspective. J Med Educ
17. Epstein RM, Hundert EM. Defining and assessing professional competence. JAMA
18. Davis DA, Mazmanian PE, Fordis M, et al. Accuracy of physician self-assessment compared with observed measures of competence: a systematic review. JAMA
19. Lockyer JM, Violato C, Fidler HM. What multisource feedback factors influence physician self-assessments? A five-year longitudinal study. Acad Med 2007; 82(suppl 10):S77–S80.
20. Eva KW, Regehr G. Knowing when to look it up: a new conception of self-assessment ability. Acad Med 2007; 82(suppl 10):S81–S84.
21. Eva KW, Regehr G. Self-assessment in the health professions: a reformulation and research agenda. Acad Med 2005; 80(suppl 10):S46–S54.
22. Browning DM, Meyer EC, Truog RD, et al. Difficult conversations in health care: cultivating relational learning to address the hidden curriculum. Acad Med
23. Boon H, Stewart M. Patient-physician communication assessment instruments: 1986 to 1996 in review. Patient Educ Couns
24. Schirmer JM, Mauksch L, Lang F, et al. Assessing communication competence: a review of current tools. Fam Med
25. Lang F, McCord R, Harvill L, et al. Communication assessment using the common ground instrument: psychometric properties. Fam Med
26. Frankel R, Stein T. Getting the most out of the clinical encounter: the four habits model. J Med Pract Manage
27. Rider EA, Nawotniak RH, Smith GD. A Practical Guide to Teaching and Assessing the ACGME Core Competencies. Marblehead, MA: HCPro, Inc; 2007.
28. van Thiel J, Kraan HF, Van Der Vleuten CP. Reliability and feasibility of measuring medical interviewing skills: the revised Maastricht History-Taking and Advice Checklist. Med Educ
29. Kalet A, Pugnaire MP, Cole-Kelly K, et al. Teaching communication in clinical clerkships: models from the Macy Initiative in Health Communications. Acad Med
30. Spruill T. Approaches to Competency Assessment. 23rd Annual Forum on Behavioral Sciences in Family Medicine. 2002 Sept 30, Chicago.
31. Makoul G. The SEGUE Framework for teaching and assessing communication skills. Patient Educ Couns
32. Novack D, Dube C, Goldstein M. Teaching medical interviewing: a basic course on interviewing and the physician-patient relationship. Arch Intern Med
33. Cohen DS, Colliver JA, Marcy MS, et al. Psychometric properties of a standardized-patient checklist and rating-scale form used to assess interpersonal and communication skills. Acad Med 1996; 71(suppl 1):S87–S89.
34. Epstein RM, Dannefer EF, Nofziger AC, et al. Comprehensive assessment of professional competence: the Rochester experiment. Teach Learn Med
35. Lipner RS, Blank LL, Leas BF, et al. The value of patient and peer ratings in recertification. Acad Med 2002; 77(suppl 10):S64–S66.
36. Stewart M. The impact of patient-centered care on outcomes. J Fam Pract
37. Participants in the Bayer-Fetzer Conference on Physician-Patient Communication in Medical Education. Essential elements of communication in medical encounters: the Kalamazoo Consensus Statement. Acad Med
38. Gillotti C, Thompson T, McNeilis K. Communicative competence in the delivery of bad news. Soc Sci Med
39. Meyer EC, Ritholz MD, Burns JP, et al. Improving the quality of end-of-life care in the pediatric intensive care unit: parents’ priorities and recommendations. Pediatrics
40. Finlay I, Dallimore D. Your child is dead. BMJ
41. Simpson M, Buckman R, Stewart M, et al. Doctor-patient communication: the Toronto consensus statement. BMJ
42. Surbone A. Truth telling. Ann N Y Acad Sci
43. Resnik DB. Ethical dilemmas in communicating medical information to the public. Health Policy (Amsterdam, Netherlands)
44. Blackhall LJ, Frank G, Murphy S, et al. Bioethics in a different tongue: the case of truth-telling. J Urban Health
46. Joshi R, Ling FW, Jaeger J. Assessment of a 360-degree instrument to evaluate residents’ competency in interpersonal and communication skills. Acad Med
47. Brett JF, Atwater LE. 360 degree feedback: accuracy, reactions, and perceptions of usefulness. J Appl Psychol