The ability to understand and demonstrate professionalism is integral to the practice of medicine.1 More than a decade ago, Papadakis and colleagues2 showed that physicians who appeared before five state medical boards for professional misconduct had higher rates of unprofessional behavior in medical school than their peers, emphasizing the need for early identification, management, and remediation of professionalism lapses. Although the Liaison Committee on Medical Education standard (Element 3.5) requiring schools to identify and correct violations of professional standards was implemented in 2008,3 Ziring and colleagues4 found, years later, wide variations in defining, identifying, and remediating lapses among 93 medical schools. They also identified a major barrier to the effective management of lapses: faculty reluctance to report them.
Several possible reasons for faculty reluctance to report professionalism lapses have been suggested. One reason is a lack of conceptual clarity and consensus about the definition of professionalism in medical education.5 Three dominant frameworks with different discourses and definitions currently exist, contributing to the lack of a unified mental model.6 Another is that evaluators are often reluctant to report relatively minor lapses7,8 or to fail underperforming trainees for fear of harming the trainees’ prospects (e.g., of getting into a “good” residency).9–11 Lack of faculty development and training also may limit reporting. Ziring and colleagues4 found that, while 93.5% (87/93) of schools had policies/expectations that faculty address professionalism lapses directly with students, fewer than half had any formal faculty development for this role. Finally, competing priorities for faculty, such as time spent on clinical tasks and completing electronic medical records, may make reporting feel burdensome.12
Although the literature on faculty reluctance to report professionalism lapses is conceptually rich, empirical evidence is limited. We conducted this study to address this gap in knowledge by collecting and analyzing faculty perceptions at three U.S. medical schools and one Canadian medical school. Using data from multiple medical schools offered greater generalizability of results and the potential for faculty development interventions that are salient, focused, and specific.
We conducted a mixed-methods study using an innovative asynchronous approach, group concept mapping, to identify perceived barriers to reporting medical students’ professionalism lapses. The study was conducted at the Cleveland Clinic Lerner College of Medicine, Drexel University College of Medicine (DUCOM), Indiana University School of Medicine (IUSOM), and the University of Ottawa Faculty of Medicine, from June 2015 to January 2016. Institutional review board approval was obtained at all institutions.
Process and participants
Group concept mapping combines “qualitative (item collection) and quantitative methods (multi-dimensional scaling and hierarchical cluster analysis).”13 Initially introduced for program planning and evaluation, this methodology is now an established educational research tool.13–18 Compared with focus groups, group concept mapping has the advantage of generating online asynchronous data from participants, thereby allowing broad geographic participation.
Participants first nominate ideas (virtual brainstorming). Next, researchers prune the list to reduce redundancy and add items that were not already identified but were felt to be important, maximizing the range of items included. Participants then sort the ideas (i.e., for similarity) and rate them (i.e., for level of agreement). Twenty to thirty participants are optimal for generating valid results from sorting, whereas the average number of participants in the rating task is 81.77, according to a meta-analytical review of 69 group concept mapping studies.19 Next, multidimensional scaling and hierarchical cluster analysis are used to aggregate the individual inputs from participants and to generate patterns. A key characteristic of group concept mapping is its reliance on visual representations during this step, which enables data structures to be analyzed and interpreted as spatial relationships. See Figure 1 for an overview of these five steps in the group concept mapping process.
We crafted an initial prompt and demographic queries to use in our study over the course of three hour-long conference calls. The prompt read: “You are supervising a medical student who demonstrates a professionalism lapse. What do you consider to be the barriers to reporting this student?” The prompt was left sufficiently broad to maximize variations in responses by faculty.
To generate a list of faculty involved in medical student education who could participate in our study, we contacted via e-mail department chairs and clerkship directors at each school and requested the contact information for faculty in supervisory roles. A total of 431 physicians (range 100–117 per school) received an e-mail inviting them to participate in step 1 (brainstorming responses to the prompt). Participants were assured of anonymity and given a link to the brainstorming page of a web-based tool hosted by Concept Systems GlobalMax (Concept Systems Incorporated, Ithaca, New York), which we used for data collection and analysis. A reminder e-mail was sent two weeks later. Because responses were anonymous, written consent was not obtained, and responding to the prompts was considered to be consent in accordance with the institutional review board protocol approval we received. No incentives for participation were offered.
A total of 184 participants took part in step 1 (brainstorming) and generated 191 unique statements in response to the prompt (42.7% response rate) over a three-week period from June to July 2015. The statements were independently reviewed and thematically coded by three members of the project team (D.D., D.Z., H.L.). Redundancies were removed by consensus, reducing the number of unique statements to 45. A gap analysis to ensure maximum variation identified 10 additional statements, bringing the total number of statements to 55.
Next, all 431 initially identified physicians were sent e-mail invitations to participate in the sorting and/or rating process online. Invitees could sort only, rate only, or complete both tasks. Reminder e-mails were sent 2, 4, and 11 weeks later.
Each participant sorted the statements into similarly themed clusters using the following rules: (1) statements can be placed into only one group; (2) each statement cannot be placed into its own group; and (3) all statements cannot be placed into a single group. Participants then rated each statement on a Likert scale from 1 (strongly disagree) to 5 (strongly agree) to indicate their agreement with that statement as a barrier to faculty reporting medical students’ professionalism lapses. This sorting and rating occurred from October 2015 to January 2016.
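The three sorting rules above can be expressed as a simple check on one participant's piles. The following is a minimal sketch, not the logic of the web-based tool used in the study: the function name `validate_sort`, the data layout (pile label mapped to statement numbers), and the reading of rule 2 as "not every statement may sit alone in its own pile" are assumptions made for illustration.

```python
from typing import Dict, List, Set


def validate_sort(piles: Dict[str, List[int]]) -> List[str]:
    """Return the rule violations in one participant's sort (empty list if valid).

    `piles` maps a hypothetical pile label to the statement numbers placed in it.
    """
    problems: List[str] = []

    # Rule 1: a statement may appear in only one pile.
    seen: Set[int] = set()
    duplicates: Set[int] = set()
    for members in piles.values():
        for statement in members:
            if statement in seen:
                duplicates.add(statement)
            seen.add(statement)
    if duplicates:
        problems.append(f"statements in more than one pile: {sorted(duplicates)}")

    non_empty = [members for members in piles.values() if members]

    # Rule 2 (assumed reading): not every statement may be left in a pile of its own.
    if len(non_empty) > 1 and all(len(members) == 1 for members in non_empty):
        problems.append("every statement was placed in its own pile")

    # Rule 3: the statements may not all be placed into a single pile.
    if len(non_empty) == 1:
        problems.append("all statements were placed in a single pile")

    return problems
```

For example, `validate_sort({"process": [1, 2], "time": [3]})` returns an empty list, whereas a sort that puts every statement into one pile returns a single violation message.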
Demographic questions collected participants’ (1) institutional affiliation, (2) discipline of practice, (3) gender, (4) years since graduation from medical school, and (5) years of experience supervising medical students.
We performed multidimensional scaling and hierarchical cluster analyses and generated a point map positioning each of the 55 statements (step 4). Members of the project team (H.L., D.Z., D.D.) also determined the ideal number of clusters (themes) by checking different potential solutions provided by the Ward hierarchical cluster analysis until the best fit for the data was achieved. The best fit used six clusters; we labeled the clusters by consensus. According to a pooled analysis of group concept mapping studies,19 the stress value used to check goodness of fit should be in the range of 0.205 to 0.365; ours was 0.248.
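Before multidimensional scaling can be run, the individual sorts have to be aggregated. In group concept mapping this is commonly done with a statement-by-statement co-occurrence matrix, in which each cell counts how many participants placed two statements in the same pile. The sketch below illustrates only that aggregation step, under stated assumptions: the function name `cooccurrence_matrix` and the toy data are hypothetical, and the scaling and Ward clustering themselves were performed in the study by the Concept Systems software, not by this code.

```python
from itertools import combinations
from typing import Dict, List


def cooccurrence_matrix(sorts: List[Dict[str, List[int]]], n_statements: int) -> List[List[int]]:
    """Count, for each pair of statements, how many participants piled them together.

    Statements are numbered 1..n_statements; matrix indices are 0-based.
    """
    matrix = [[0] * n_statements for _ in range(n_statements)]
    for piles in sorts:
        for members in piles.values():
            unique = sorted(set(members))
            for s in unique:
                matrix[s - 1][s - 1] += 1  # a statement always co-occurs with itself
            for a, b in combinations(unique, 2):
                matrix[a - 1][b - 1] += 1
                matrix[b - 1][a - 1] += 1  # keep the matrix symmetric
    return matrix


# Hypothetical toy data: two participants sorting four statements.
toy_sorts = [
    {"pile 1": [1, 2], "pile 2": [3, 4]},
    {"pile 1": [1, 2, 3], "pile 2": [4]},
]
similarity = cooccurrence_matrix(toy_sorts, 4)
```

In this toy example, statements 1 and 2 were piled together by both participants, so `similarity[0][1]` is 2, while statements 1 and 4 were never piled together, so `similarity[0][3]` is 0. Higher counts translate into shorter distances on the point map.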
The position of the clusters in the point map relative to one another reflected how often statements were sorted into similar themes. Clusters with greater distance between them represented distinctly sorted themes, whereas more closely positioned clusters indicated more highly related themes. Mean cluster ratings were determined by averaging the mean statement ratings within each cluster. The importance of a cluster was determined by the number of highly rated statements it contained and was illustrated by layering. Clusters with more highly rated statements were visually represented with more layers (the maximum was five layers based on the Likert rating scale we used). Cluster size was also important—smaller clusters indicated closely related statements more frequently sorted into similar piles by participants, and larger clusters contained statements that were related but not to the same degree.
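The mean cluster rating described above is an average of averages: each statement's ratings are averaged first, and those statement means are then averaged within the cluster. A minimal sketch follows; the function name `cluster_means`, the data layout, and the toy ratings are hypothetical and are not study data.

```python
from statistics import mean
from typing import Dict, List


def cluster_means(ratings: Dict[int, List[int]], clusters: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the mean statement ratings within each cluster.

    `ratings` maps a statement number to the 1-5 ratings it received;
    `clusters` maps a cluster label to the statement numbers it contains.
    """
    statement_means = {s: mean(values) for s, values in ratings.items()}
    return {
        label: float(mean(statement_means[s] for s in members))
        for label, members in clusters.items()
    }


# Hypothetical toy data: three statements rated by three participants.
toy_ratings = {1: [3, 4, 5], 2: [2, 3, 4], 3: [1, 2, 3]}
toy_clusters = {"cluster A": [1, 2], "cluster B": [3]}
toy_means = cluster_means(toy_ratings, toy_clusters)
```

Here the statement means are 4, 3, and 2, so cluster A averages 3.5 and cluster B averages 2.0. Because every statement carries equal weight, a cluster's mean is not affected by how many participants rated each statement.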
The correlation of agreement with demographic variables was computed and illustrated using a pattern ladder match, with r representing the Pearson product–moment correlation between pairs of clusters as a measure of congruence. The individual statement ratings were analyzed further using mean ratings by selected subgroups. This information was compiled into a simple correlation graph that enabled us to compare the highest- and lowest-rated statements by selected demographic subgroups.
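The Pearson product–moment correlation behind a pattern match can be computed directly from two subgroups' mean cluster ratings. The sketch below is a plain implementation of the standard formula, offered as an illustration; the function name `pearson_r` is an assumption, and the software used in the study may compute the value differently in detail.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / (spread_x * spread_y)
```

Passing one subgroup's six mean cluster ratings as `x` and the other subgroup's as `y` gives the r reported on a ladder graph. A value near 1 means the two subgroups ordered the clusters similarly, even if one subgroup rated every cluster somewhat higher than the other.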
Participation rates varied by step: brainstorming (184/431; 42.7%), sorting (48/431; 11.1%), and rating (83/431; 19.3%). Although response rates varied by school for each step, we found no significant differences in the statement ratings between schools, so we aggregated the data for analysis. We identified six thematic clusters in participants’ responses to the prompt and ranked them from highest to lowest based on their mean statement ratings:
- Uncertainty about the process (3.18)
- Ambiguity about the “facts” (3.13)
- Effects on the learner (3.09)
- Time constraints (3.07)
- Fear of retribution (2.80)
- Responsibility for reporting (2.62)
Differences among the top four rated clusters were nonsignificant, but those between the highest-rated cluster and the two lowest were significant (P < .005). A three-dimensional representation of the clusters and the number of layers each contained is summarized in the cluster rating map (see Figure 2).
Mean statement ratings for all 55 statements are shown in Chart 1. Although the overall mean rating was highest for cluster 1, “uncertainty about the process,” the most highly rated individual statements were in cluster 2, “ambiguity about the ‘facts.’” These highly rated statements were (1) “if the event was not witnessed by me personally” and (2) “lack of information about the student as to whether this is a pattern of behavior.”
We also compared ratings by participant gender. Of the 83 raters, 52 were female (62.7%). The pattern ladder match in Figure 3 compares the mean cluster ratings by gender. A pattern ladder match graphically illustrates the pairwise relative ratings (of the thematic clusters) by group. The highest level of gender-based agreement was in cluster 1, “uncertainty about the process” (females 3.25; males 3.08). The lowest-rated cluster by both groups was cluster 6, “responsibility for reporting” (females 2.62; males 2.55). Although females rated cluster 3, “effects on the learner,” higher than males (females 3.10; males 3.07), this difference was not significant. The overall Pearson product–moment correlation coefficient for agreement between genders was 0.98.
Because agreement regarding “uncertainty about the process” was strongest for both genders, we conducted additional analyses of the individual statements (see Figure 4, x-axis = females, y-axis = males). Statements in the upper right quadrant were rated highest by both genders, while those in the lower left quadrant were rated lowest. For example, statement 6, “the need to have corroborating evidence in case of a challenge or appeal,” and statement 12, “easier to warn the student directly and avoid a formal report,” had the highest mean statement rating (3.67) by both genders among the 12 statements in the “uncertainty about the process” cluster.
Another demographic subgroup analysis we conducted was based on years of experience supervising medical students. We compared participants with less than 5 years of supervisory experience (21/83; 25.3%) versus those with more than 20 years (18/83; 21.7%). Those with more than 20 years gave the highest mean rating (3.15) to cluster 2, “ambiguity about the ‘facts,’” while those with less than 5 years rated cluster 1, “uncertainty about the process,” highest (3.36). The difference was nonsignificant (P > .05), and the overall correlation of agreement was high (r = 0.90).
An additional subgroup analysis by highest and lowest medical school response rates (IUSOM: 32/83 [38.6%]; DUCOM: 9/83 [10.8%]) revealed a high degree of correlation (r = 0.76). The subgroup analysis comparing participants who were 10 years or less from medical school graduation (18/83; 21.7%) with those who were more than 10 years (65/83; 78.3%) was nonsignificant and highly correlated (r = 0.86). Lastly, the subgroup analysis comparing participants with surgical training (12/83; 14.5%) versus those with nonsurgical training (71/83; 85.5%) revealed no significant differences in ratings. Overall correlation for all statement ratings was r = 0.75.
The mean statement ratings for the 10 statements we added to the original brainstorming results varied from 1.83 to 3.40. The lowest-rated statement among these was “not my responsibility” (1.83).
We were interested in exploring the gap between conceptual and empirical frameworks for understanding faculty reluctance to report medical students’ professionalism lapses. Our analysis revealed that faculty of both genders across four schools identified “uncertainty about the process” as the most significant barrier. Individual statement ratings in this cluster demonstrated that, while faculty recognized an individual’s responsibility for reporting, lack of information about students’ behavior in other educational contexts made the task more difficult and nuanced. Similarly, lack of access to and continuity of information from setting to setting, which is an administrative or organizational function, was a challenge in terms of knowing whether a lapse was an isolated event or part of a pattern of unprofessional behavior.
The cluster “uncertainty about the process” represents several different concerns as evidenced by the mean statement ratings in this category. First, the concern about corroborating evidence in case of a challenge or appeal may be partly myth and partly real. For example, in the case of an apparent professionalism lapse that was not witnessed firsthand (e.g., three students turn in identical answers to a take-home quiz), faculty willingness to report the lapse may waver without corroboration that it was a case of cheating and not an unlikely, but possible, coincidence. Without independent corroborating evidence, the veracity of a faculty member’s report can be challenged, especially if the incident was not witnessed by others or was not part of a larger pattern. Second, professionalism lapses vary in severity, and it is not always clear what the thresholds and consequences for reporting are. As a matter of policy to clarify this issue at IUSOM, for example, there is a two-tiered system consisting of a Professionalism Concern (handled informally and does not appear on the student’s transcript) and an Isolated Deficiency (requires a hearing before the student promotions committee and appears on the student’s transcript).20 Third, not knowing or controlling the process after a report is made creates a potential burden in terms of time, effort, and outcomes. It also creates an interpersonal barrier in terms of anxiety over whether reporting a professional lapse could turn into a legal battle. With such uncertainty about the process, the path of least resistance is to avoid reporting.
In other clusters, we found greater variability in terms of the strength of faculty endorsement of individual statements. For example, although time was identified as a barrier, there was less consensus over what aspects of time were most significant, the [institutional] process of reporting or the [interpersonal] lack of time to discuss the lapse immediately with the student. A similar split between institutional and interpersonal processes affected highly rated statements in cluster 3, “effects on the learner.” By contrast, cluster 2, “ambiguity about the ‘facts,’” included interpersonal concerns that were quite similar—being a primary observer of the event and knowing whether the behavior was isolated or not.
Cluster 5, “fear of retribution,” was one of the lower-rated themes, with avoiding conflict before and after reporting being the dominant concerns. Although we cannot distinguish the source of these fears (e.g., retribution from students or superiors), conflict avoidance is known to be a major barrier to reporting in highly bureaucratized environments, like the military21 and medicine, surgery in particular.22,23 Faculty development in communicating across authority gradients and “stopping the line” might be useful in addressing this barrier and could be translated from quality and safety initiatives to the professionalism reporting environment.24 Evidence of strong agreement between male and female faculty in rating “fear of retribution” below the other barriers is reassuring.
While the cluster rankings are important, the individual statement ratings provide additional insights. For example, faculty rated “not my responsibility” and “not my problem” lowest among the statements, indicating that they believed that responsibility for reporting was their job. Yet, among the highest-rated statements was “lack of information about the student as to whether this is a pattern of behavior,” which illuminates the need for greater contextual longitudinal information to enhance faculty reporting. However, the risks and benefits of forward-feeding information about students continue to be debated.25–27
Our subgroup analyses by demographic variables revealed no significant differences in ratings by group and instead showed a high degree of correlation. This finding supports the uniformity of dominant faculty concerns about reporting professionalism lapses that can be addressed with faculty development efforts broadly rather than requiring more specific approaches based on demographics, such as years since medical school, years supervising medical students, or discipline of practice.
This study has several limitations. First, it was based on convenience sampling. Larger, more representative samples, including those faculty with fewer educational responsibilities, will be needed to gain a clearer picture of faculty attitudes toward reporting professionalism lapses. Second, the study may have been subject to selection bias with certain types of faculty volunteering to participate. We partly addressed this potential bias by using multiple sites in the United States and Canada. Third, participants may not reflect the faculty body as a whole, with potential overrepresentation of those who had a previous negative experience or those with an interest in professionalism. Fourth, the four medical schools we included are in urban locations, and their faculty may not reflect the faculty at other medical schools. Fifth, a small percentage of invited faculty completed the sorting and rating steps. However, similar group process approaches, like focus groups involving small numbers of participants, are considered reasonable, and 82 raters are considered sufficient to generate valid conclusions for group concept mapping.19 Finally, the prompt was kept deliberately vague to elicit a broad range of responses. With more information, participants might have answered differently, creating a different concept map.
Despite these limitations, our findings are important indicators of faculty-perceived barriers to reporting professionalism lapses. Acknowledging and understanding these perceptions are critical to developing intervention strategies to improve the process. These findings demonstrate the complexity that underpins faculty decisions to report lapses. On the one hand, it would be easy to “blame” faculty for not doing a better job of reporting. On the other hand, our findings suggest that reporting is more complex and nuanced than a simple binary choice to report or not, and it involves interpersonal and organization-level considerations. These data can help focus and enhance our interventions related to this essential element of medicine as we move forward.
The findings from this study suggest several next steps. First, the failure to report professionalism lapses is both an individual and a systems problem and should be addressed as such. At the individual level, it will be important to ensure that policies and procedures are clearly stated and that there are sufficient faculty development programs to help implement and sustain these efforts. At the systems level, crafting effective reporting programs, developing and pilot testing systems approaches (similar to error reporting), and engaging faculty will be important. Finally, dialogue among faculty, students, and administrators about definitions, expectations, and the evaluation of professionalism, including the criteria for reporting lapses and the consequences that follow, would help clarify the process for all concerned. Placing the challenges of reporting on more empirical footing represents a first step in designing interventions that clarify and strengthen faculty and institutional commitment to professionalism as a cornerstone of medical education and practice.
The authors would like to acknowledge the important support provided by Meredith MacKay, MA.
1. Rabow MW, Remen RN, Parmelee DX, Inui TS. Professional formation: Extending medicine’s lineage of service into the next century. Acad Med. 2010;85:310–317.
2. Papadakis MA, Hodgson CS, Teherani A, Kohatsu ND. Unprofessional behavior in medical school is associated with subsequent disciplinary action by a state medical board. Acad Med. 2004;79:244–249.
3. Liaison Committee on Medical Education. Functions and structure of a medical school: Standards for accreditation of medical education programs leading to the MD degree. http://lcme.org/publications/#Standards. Published March 2017. Accessed February 5, 2018.
4. Ziring D, Danoff D, Grosseman S, et al. How do medical schools identify and remediate professionalism lapses in medical students? A study of U.S. and Canadian medical schools. Acad Med. 2015;90:913–920.
5. Birden H, Glass N, Wilson I, Harrison M, Usherwood T, Nass D. Defining professionalism in medical education: A systematic review. Med Teach. 2014;36:47–61.
6. Irby DM, Hamstra SJ. Parting the clouds: Three professionalism frameworks in medical education. Acad Med. 2016;91:1606–1611.
7. Ginsburg S, Regehr G, Hatala R, et al. Context, conflict, and resolution: A new conceptual framework for evaluating professionalism. Acad Med. 2000;75(10 suppl):S6–S11.
8. Phelan S, Obenshain SS, Galey WR. Evaluation of the noncognitive professional traits of medical students. Acad Med. 1993;68:799–803.
9. Dudek NL, Marks MB, Regehr G. Failure to fail: The perspectives of clinical supervisors. Acad Med. 2005;80(10 suppl):S84–S87.
10. Cleland J, Knight L, Rees C, et al. Is it me or is it them? Factors that influence the passing of underperforming students. Med Educ. 2008;42:800–809.
11. Monrouxe LV, Rees CE, Lewis NJ, Cleland JA. Medical educators’ social acts of explaining passing underperformance in students: A qualitative study. Adv Health Sci Educ Theory Pract. 2011;16:239–252.
12. Sinsky C, Colligan L, Li L, et al. Allocation of physician time in ambulatory practice: A time and motion study in 4 specialties. Ann Intern Med. 2016;165:753–760.
13. Trochim W, Kane M. Concept mapping: An introduction to structured conceptualization in health care. Int J Qual Health Care. 2005;17:187–191.
14. Trochim WM. An introduction to concept mapping for planning and evaluation. Eval Program Plann. 1989;12:1–16.
15. Kane M, Trochim WM. An introduction to concept mapping. In: Bickman L, Rog DJ, eds. Concept Mapping for Planning and Evaluation. Thousand Oaks, CA: Sage Publications Inc.; 2007.
16. Novak JD. Concept mapping: A useful tool for science education. J Res Sci Teach. 1990;27:937–949.
17. Sutherland S, Katz S. Concept mapping methodology: A catalyst for organizational learning. Eval Program Plann. 2005;28:257–269.
18. Hynes H, Stoyanov S, Drachsler H, et al. Designing learning outcomes for handoff teaching of medical students using group concept mapping: Findings from a multicountry European study. Acad Med. 2015;90:988–994.
19. Rosas SR, Kane M. Quality and rigor of the concept mapping methodology: A pooled study analysis. Eval Program Plann. 2012;35:236–245.
20. Frankel RM. Professionalism. In: Feldman MD, Christensen JF, eds. Behavioral Medicine: A Guide for Clinical Practice. 3rd ed. Columbus, OH: McGraw-Hill Companies, Inc.; 2008.
21. Mengeling MA, Booth BM, Torner JC, Sadler AG. Reporting sexual assault in the military: Who reports and why most servicewomen don’t. Am J Prev Med. 2014;47:17–25.
22. Wild JR, Ferguson HJ, McDermott FD, Hornby ST, Gokani VJ; Council of the Association of Surgeons in Training. Undermining and bullying in surgical training: A review and recommendations by the Association of Surgeons in Training. Int J Surg. 2015;23(suppl 1):S5–S9.
23. Marshall P, Robson R. Preventing and managing conflict: Vital pieces in the patient safety puzzle. Healthc Q. 2005;8(spec no):39–44.
24. Haig KM, Sutton S, Whittington J. SBAR: A shared mental model for improving communication between clinicians. Jt Comm J Qual Patient Saf. 2006;32:167–175.
25. Cleary L. “Forward feeding” about students’ progress: The case for longitudinal, progressive, and shared assessment of medical students. Acad Med. 2008;83:800.
26. Cohen GS, Blumberg P. Investigating whether teachers should be given assessments of students made by previous teachers. Acad Med. 1991;66:288–289.
27. Cox SM. “Forward feeding” about students’ progress: Information on struggling medical students should not be shared among clerkship directors or with students’ current teachers. Acad Med. 2008;83:801.