Colliver, Jerry A. PhD; Conlee, Melinda J.; Verhulst, Steven J. PhD; Dorsey, J. Kevin MD, PhD
Empathy is arguably the most important psychosocial characteristic of a physician engaged in patient care. “Empathy” refers to a caregiver's cognitive and vicarious understanding of the patient as a person—an understanding that is thought to generate confidence and trust in the doctor–patient relationship and to promote effective treatment and healing. Self-report instruments have been used to measure physician empathy,1–3 and a number of studies have examined factors related to empathy in medical school and residency training.4–14
The disturbing conclusion of these studies is that empathy declines during medical education. One study says it “provides empirical evidence to show that empathy declines in medical school by using a new psychometrically sound tool developed specifically to measure empathy in patient care situations.”8 Another paper notes, “A decline in empathy during medical training has been reported in North American medical students and is regarded as a cause for concern.”9 The depth of this concern is reflected in journal article titles, such as “Is there hardening of the heart during medical school?”14 and “Vanquishing virtue,”15 as well as in statements like, “The relevant question is not how to create (humane qualities) but how it comes about that medical education destroys them.”16 This thinking is summed up in the following: “Both anecdotal reports and research studies point to significant negative shifts in students' attitudes toward patients between the preclinical and clinical years.”17
Several authors have speculated that the decline in empathy is a result of a self-protective cynicism that blunts the pain and suffering encountered during clinical interactions with patients and their families.12,14,15 However, others criticize the “medical school selection process, arguing that the premedical treadmill gives precedence to science majors who have high grades and test scores, and who demonstrate personality characteristics such as detachment and competitiveness” and argue that this process “undervalues qualitative or affective aspects of the applicants' characters and accomplishments.”15 Others question whether “the decline is reflective of the prevalent teaching methods and modifiable with better methods or is an unavoidable psychological effect of the acculturation process into the medical profession.”12 The anxiety and frustration generated by the purported decline are reflected in one article's subtitle: “How can we stop the rot?”17
These studies showing a decline in empathy during medical school and residency training raise serious questions about medical education practice—questions that call for a further critical reexamination of the studies and their results to determine the extent of the decline and the seriousness of the problem. That is, did the results of these studies show large declines in self-reported empathy with serious implications for patient care that call for corrective action? Or were the declines small and inconsequential, showing little of practical significance? And, could the observed declines possibly be accounted for by common confounding factors (i.e., constant biases),18,19 such as low and differential response rates?
The primary purpose of this study, then, was to address the strong reaction to the conclusions of these empathy studies by reexamining their results to obtain a meaningful picture of the magnitude of the decline reported in the group of studies—to determine just how serious the problem really is. We started by transforming the reported results back to the units of the original rating scales, to base conclusions on the scale actually rated by students and residents and to make results more interpretable by reporting them in the metric of the original anchors that give meaning to the ratings. Second, we reexamined the relationship between empathy ratings and response rates to see whether response bias was a plausible threat to the validity of the empathy decline conclusion.20,21 Finally, in this report we discuss self-report empathy instruments and consider whether such assessment instruments are valid indicators of empathy in patient care.
Beginning in 2008, we obtained copies of those articles that are routinely cited as showing a decline in empathy during medical education and examined their references for further articles. We then conducted searches on Google Scholar, PubMed, and Web of Science, first using the phrase “empathy in medical education,” followed by various other search terms that included “empathy.” From these results, we examined article abstracts published from 2000 to 2008 to identify recent studies that examined change in empathy over time in medical students and residents. In addition, we hand searched the tables of contents of four medical education journals (Academic Medicine, Advances in Health Sciences Education, Medical Education, and Teaching and Learning in Medicine) and identified titles containing the word “empathy” published from 2000 to 2008. We then examined the reference lists of all of these articles for further related publications. We eliminated articles that did not directly examine empathy over time (i.e., intervention evaluations based on previous claims of empathy decline and editorials regarding the state of medical education in light of studies showing decline). For example, one large-scale study that compared empathy ratings throughout training and into practice reported that about half of the respondents had “participated in classes or sessions about empathy,”22 which confounds the usual medical education with this special training. Through this process we identified 11 articles that examined change in empathy over time at various stages of medical training, and these articles became the focus of the present study.4–14
Three authors (J.A.C., M.J.C., S.J.V.) carefully read the 11 studies identified by the search and checked the results extracted from them. This involved checking the accuracy of entries in Table 1 and in the text based on that information. For Study 8 (Table 1), we estimated the response rates given the n values reported for each training level, based on the assumption that each class consisted of one-fourth of the total 214 respondents.
Three self-report instruments were used to measure empathy in the 11 studies reviewed: the Interpersonal Reactivity Index–Empathy Concern subscale (IRI-EC),1 the Jefferson Scale of Physician Empathy (JSPE),2 and the Balanced Emotional Empathy Scale (BEES).3 The IRI-EC subscale consists of seven items rated on a five-point scale from 0 (does not describe me well) to 4 (describes me very well). The total scores are summed ratings that range from 0 to 28. The JSPE consists of 20 Likert scale items rated on a seven-point scale from 1 (strongly disagree) to 7 (strongly agree), with total (summed) scores that range from 20 to 140. The BEES has 30 items rated on a nine-point scale from −4 (very strongly disagree) to +4 (very strongly agree), with total (summed) scores that range from −120 to +120.
The purpose of this review was primarily descriptive and involved determining the means, sample sizes, and response rates for each year of training reported in the 11 studies. The means of sums reported in the papers were transformed back to the units of the original rating scales, so results could be reported as mean ratings in the metric based on the original anchors. This was accomplished by dividing the means of sums by the number of items in the scale. The transformation required dividing the means of sums by 7 (items) for IRI-EC, by 20 for JSPE, and by 30 for BEES. For each study, we report the size (n) of the target sample (all eligible students or residents) and, for each level of training, the mean rating and the response rate (percentage of the sample who responded). Surprisingly, the three scales provide labels for only the extreme anchors and not for the midrange anchors, which are needed to give narrative meaning to the mean ratings. Consequently, for the purposes of this presentation, we inferred labels for the midrange anchors that corresponded to the observed mean ratings, based on the extreme labels and the general pattern used to label such anchors, as follows: for IRI-EC, 3 = “describes me well”; for JSPE, 4 = “neutral” and 6 = “agree”; and for BEES, 1 = “somewhat agree.”
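The back-transformation described above is simple arithmetic: dividing a mean of summed item ratings by the number of items returns the score to the original per-item rating scale. The following sketch illustrates this; the scale definitions (item counts and summed-score ranges) come from the instrument descriptions above, while the function name and example summed means are ours, purely for illustration, and are not data from the reviewed studies.

```python
# Per-instrument scale definitions: (number of items, min summed score, max summed score).
SCALES = {
    "IRI-EC": (7, 0, 28),     # items rated 0-4
    "JSPE": (20, 20, 140),    # items rated 1-7
    "BEES": (30, -120, 120),  # items rated -4 to +4
}

def mean_rating(sum_mean: float, scale: str) -> float:
    """Transform a mean of summed item ratings back to a mean per-item rating."""
    n_items, lo, hi = SCALES[scale]
    if not lo <= sum_mean <= hi:
        raise ValueError(f"{sum_mean} is outside the possible range for {scale}")
    return sum_mean / n_items

# Hypothetical summed means, chosen only to show the arithmetic:
print(mean_rating(115.0, "JSPE"))   # 5.75 on the 1-7 scale, just below 6 = "agree"
print(mean_rating(21.0, "IRI-EC"))  # 3.0 on the 0-4 scale, 3 = "describes me well"
```

Reporting the quotient rather than the sum places the result directly on the scale the respondents actually rated, so it can be read against the anchor labels.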
Mean empathy ratings
On average, the 11 studies in Table 1 showed a decline in mean empathy ratings, a change of about −0.2 points from the first year of training to the last. Specifically, the mean ratings changed on average −0.2 points for the 5-point IRI-EC, −0.2 points for the 7-point JSPE, and −0.3 points for the 9-point BEES. Across studies, the changes in mean ratings over time varied from a 0.1-point increase (Study 6) to a 0.5-point decrease (Study 9). For the most part, mean ratings throughout medical school and residency on the 5-point IRI-EC subscale (Studies 1–4) clustered around 3 = “describes me well.” Mean ratings on the 7-point JSPE Likert scale (Studies 5–9) were a little below 6 = “agree.” And, on the 9-point BEES scale (Studies 10 and 11), mean ratings were slightly above 1 = “somewhat agree.” (One exception: on the JSPE for MBChB students in Study 6, the three mean ratings were right at 4 = “neutral.”) In addition, visual inspection of Table 1 shows that the mean ratings for medical school were quite similar to those for residency, starting at about the same level and ending at the same level. For IRI-EC, the ranges were from 3.2 to 3.0 for medical school and 3.2 to 2.8 for residency. For JSPE, the ranges were 6.2 to 5.3 and 5.9 to 5.7, respectively. (BEES mean ratings were available only for medical school, so comparison with residency was not possible.)
Response rates were reported (or estimated) by year of training for four studies (Studies 1, 2, 8, and 9). The response rates were considerably higher for the first level of training and then declined on average by about 26 percentage points for the last level of training. Visual inspection of Table 1 shows that, in general, higher mean ratings were associated with higher response rates and lower mean ratings with lower response rates. For example, Study 9, which used the seven-point JSPE scale, reported the largest change in mean ratings of empathy with a decline of 0.5 points: Mean ratings declined from 5.8 for premedical school with a response rate of 96% to 5.3 for fourth-year students with a response rate of 59%. In addition, the overall response rates for two other studies (Studies 6 and 8) were 46% and 61%, and for the three longitudinal studies that used paired analyses (Studies 3, 4, and 5), response rates were 56%, 69%, and 80%, which suggests that response bias is a plausible explanation of the small differences in mean ratings.
In summary, the findings are consistent across studies, showing little or no change in self-ratings of empathy that can convincingly be attributed to medical training. The changes in mean empathy ratings ranged across studies from a 0.1-point increase in empathy (Study 6) to a 0.5-point decrease (Study 9), with an average of about a 0.2-point decline for the 11 studies. Even the largest decline (Study 9) was only from 5.8 (below 6 = “agree”) to 5.3 (above 5 = “somewhat agree”), showing little of practical or theoretical significance, especially given that response rates also declined from 96% to 59%. Also, mean ratings in medical school were similar to those in residency, showing no decline from medical school to residency. The results do not warrant the strong, disturbing conclusion that there is a serious decline in empathy due to medical education, with the implication that something must be done about it. The results do not show that medical education destroys humane qualities16 and causes a hardening of the heart,14 nor do they reveal a rot that must be stopped.17
The results illustrate the advantage of scoring based on means. The mean ratings are on the same scale rated by students and residents and provide a direct interpretation in terms of the labels of the anchors. The sums of ratings reflect both the ratings and the number of items, which magnifies differences between scores and makes differences appear more important than they are. For example, a one-point difference between a rating of 2 = “somewhat agree” and 3 = “agree” on the BEES scale becomes a 30-point difference between scores based on sums (60 versus 90).3 The use of mean ratings versus sums makes it clear that the empathy changes reported in the studies we reviewed are just a fraction of the distances between two neighboring anchors.
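The magnification effect described above can be made concrete: when ratings are summed, a fixed per-item difference grows linearly with the number of items. In this sketch, the item counts (30 for BEES, 20 for JSPE) come from the instrument descriptions; the function name is ours, for illustration only.

```python
def sum_score_gap(per_item_gap: float, n_items: int) -> float:
    """A per-item difference of d on a k-item scale becomes a d * k
    difference between summed scores."""
    return per_item_gap * n_items

# BEES has 30 items, so a 1-point gap between neighboring anchors
# becomes a 30-point gap in summed scores (e.g., 60 versus 90).
print(sum_score_gap(1.0, 30))  # 30.0

# Conversely, the average reported decline of about 0.2 points per item
# corresponds to only a 4-point summed-score decline on the 20-item JSPE.
print(sum_score_gap(0.2, 20))  # 4.0
```

Seen this way, a summed-score decline that looks sizable is only a fraction of the distance between two neighboring anchors on the scale respondents actually rated.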
On the other hand, the small declines in mean empathy ratings reported in Table 1 (average decline of −0.2 points; range from +0.1 points to −0.5 points) might be thought to establish the weaker conclusion that empathy declined “just slightly” because of medical education. However, response bias is a very plausible alternative explanation of the small changes in mean ratings for the 11 studies. The weaker conclusion that empathy declined just slightly entails the assumption that the nonrespondents were a random or representative sample of all students or residents, in which case there would be no response bias. However, it is just as reasonable to argue the opposite: that the nonrespondents were burned out and less empathic (which would minimize the observed decline)—or that nonrespondents were generally more empathic and felt no need to repeat their positive responses on readministrations of the same empathy instrument (which would increase the observed decline).
In brief, the observed small decline does not establish the weaker conclusion that empathy declined just slightly. The rival explanations cannot be ruled out—the case is still open. Studies of empathy are needed in which researchers take steps to ensure that response rates are high (approaching 100%) and remain high throughout medical school and residency to address this threat to validity and provide a more convincing test of the weak effect hypothesis. But, as they are, the results of the studies do not constitute a cause for concern, nor do they warrant a call for action. Why, then, is the purported decline of empathy during medical training so readily accepted? Perhaps the answer lies in the subject of these studies as well as the medical educators who read them.
Empathy self-report instruments
Another concern about these studies is that they are based on instruments that assess student and resident self-reports of empathy. But empathy—like beauty—would seem to be in the eye of the beholder—the patient, not the caregiver. And yet, to our knowledge, none of these empathy instruments has been validated by determining empirically its relationship with patients' perceptions of the caregivers' empathy. What are needed are studies of the correlations of self-ratings on the self-report empathy instruments with real or standardized patients' assessments of caregiver empathy in actual or simulated clinical encounters.23 Patients' perspectives on empathy and its increase or decline seem critical to determining whether the confidence and trust needed for effective treatment and healing have been established. In keeping with the current evidence-based medical education initiative, at some point even correlations with the ultimate outcomes of effective treatment and healing must be demonstrated to document the predictive value of the empathy ratings for clinical practice. It seems reasonable that self-reports of empathy might decline for a number of reasons that have nothing to do with the patient's experience of empathic concern and effective clinical care. For example, trainees might confuse loss of empathy with feeling ill-prepared for their new responsibilities, or with guilt over a lack of compassion for patients who “brought on” their own illness through factors under their control, such as smoking or weight; or they might trade empathy for the need to get through the day's work in the time allotted. Whatever the reasons, the observed changes in mean self-ratings in medical school and residency are so small that they would not seem to be predictive of diminishing practice outcomes.
The fundamental problem here is that the self-report empathy instruments are basically self-assessments and suffer from the same problems as discussed extensively in recent critiques of self-assessment.24–27 As the authors of these critiques note, the problem is that “studies have consistently shown that the accuracy of self-assessment is poor”24 and that “the preponderance of evidence suggests that physicians have a limited ability to self-assess.”25 If students, residents, and physicians are unable to accurately self-assess knowledge and skills, as has been shown, it seems reasonable to question whether they can accurately assess something more indefinite—like empathy. Also, the self-report instruments treat empathy as a “trait” that is either present or absent rather than as a “state” that manifests itself in varying ways across encounters with different patients with different problems, backgrounds, and personalities.
A related question about the three empathy instruments used in the 11 studies reexamined here is whether they measure the same thing. Do they mean the same thing by “empathy”? Clearly, they differ in terms of their operational definitions, which involve differences in their items, the wording of the items, the number of items, the rating scales, the anchors, etc. So, do these instruments measure the same construct? One study examined the correlations between scores on two of the three instruments—IRI-EC and JSPE—and reported r = 0.41 for n = 193 medical students and r = 0.40 for n = 41 residents.2 Not only do these correlations raise doubts about whether the instruments measure the same construct, they bring into question the validity of using either of these instruments to assess the change in empathy throughout medical education given the low to moderate interinstrument correlations.
Empathy is intuitively an important consideration in medical practice and the care of patients. The concept of empathy, however, is elusive, theoretically and operationally.28 Theoretically, empathy has been defined in part by metaphor and contrast. Metaphorically, for example, empathy has been described as “putting oneself in the other's shoes” or “standing in the patient's shoes.” Empathy is commonly contrasted with sympathy, whereby empathy is said to refer more to a cognitive understanding of a patient's situation and feelings, and sympathy is used to refer to a sharing and feeling of the patient's emotions. Operationally, statements or items are developed that express the theoretical meanings based on metaphor and contrast and that, in the case of the three instruments used in the 11 studies reviewed here, ask respondents to self-report their opinions about themselves on each item. But, as mentioned above, instruments could also be developed for patients or standardized patients—or even for expert observers trained to capture empathy as a contributor to clinical practice. It would not be surprising to find that self-ratings of empathy differ from real or standardized patient ratings and that both differ from ratings by a trained expert (especially given that self-ratings differ from instrument to instrument). One study examined the relationship between self-reported empathy on the JSPE administered before clerkships and residency directors' ratings of empathic behavior three years later and found a weak, nonsignificant difference (two-tailed) in mean supervisor ratings (0.5 SDs) between extreme groups on the JSPE (>2.0 SDs difference).29
When it comes to training empathic physicians, medical educators might be better served by shifting their focus from the elusive empathy concept and concentrating more on good interpersonal behaviors to better achieve the sought-after psychosocial characteristic important to the doctor–patient relationship. That is, this aspect of the “hidden curriculum”30 may better serve medical education by being actively modeled by teachers. A recent intriguing article entitled “Etiquette-based medicine”31 suggests that “patients may care less about whether their doctors are reflective and empathic than whether they are respectful and attentive” and suggests that “pedagogically an argument could be made for etiquette-based medicine to take priority over compassion-based medicine.” This etiquette-based approach to the psychosocial aspect of the doctor–patient relationship would be “built on a base of good manners” and would emphasize behavior over feelings and attitudes, because behavior is easier to change and may have “more immediate benefits.”31
The reputation of medical education research has been questioned for some time.32–35 Critics say the research “lacks methodological rigour” and call for randomization and greater control.34 However, given that research in medical education is primarily and necessarily quasi-experimental, we have argued that the problem is not the methodological flaws per se but, rather, the failure of researchers, in light of the flaws, to critically interpret the research and its results in reaching and stating their conclusions.21 In quasi-experimentation terms, threats to validity must be ruled out or used to qualify the conclusions. This seems to be the problem with the empathy decline studies. The problem is not the low or differential response rates per se but, rather, the researchers' failure to acknowledge those threats to validity in reaching a research conclusion. The problem is compounded here by a failure to critically examine the empathy measurements (sums of ratings, self-ratings, and small differences) to see exactly what the numbers mean and take this into account in interpreting research findings. Even without the possibility of response bias, are self-report measures of empathy indicative of the quality and effectiveness of the physician–patient interaction in a clinical encounter?
Summary and conclusions
Our reexamination of these studies revealed that their results do not warrant the strong, disturbing conclusion that empathy declines during medical education. At best, the results show a very weak decline in mean ratings, only about 0.2 points on average, which is trivial for all practical purposes. And even this weak decline is questionable because of the possibility of response bias across levels of training, which could plausibly account for such small effects. Moreover, the self-report instruments involve self-assessment, and it is not clear what these instruments measure, whether they measure the same thing, or whether what they measure is indicative of the effectiveness of patient care. With apologies to Mark Twain,36 reports of the decline of empathy are greatly exaggerated.
An earlier version of this paper was presented at the Rush Medical University, Department of Medical Education Grand Rounds, Chicago, Illinois, February 12, 2009.
1 Davis M. A multidimensional approach to individual differences in empathy. JSAS Catalog of Selected Documents in Psychology. 1980;10:85.
2 Hojat M, Mangione S, Nasca TJ, et al. The Jefferson Scale of Physician Empathy: Development and preliminary psychometric data. Educ Psychol Meas. 2001;61:349–365.
3 Mehrabian A. Manual for the Balanced Emotional Empathy Scale (BEES). 1996 [unpublished; available from Albert Mehrabian, 1130 Alta Mesa Road, Monterey, CA 93940].
4 Bellini LM, Baime M, Shea JA. Variation of mood and empathy during internship. JAMA. 2002;287:3143–3146.
5 Bellini LM, Shea JA. Mood change and empathy decline persist during three years of internal medicine training. Acad Med. 2005;80:164–167.
6 Rosen IM, Gimotty PA, Shea JA, Bellini LM. Evolution of sleep quantity, sleep deprivation, mood disturbances, empathy, and burnout among interns. Acad Med. 2006;81:82–85.
7 Stratton TD, Saunders JA, Elam CL. Changes in medical students' emotional intelligence: An exploratory study. Teach Learn Med. 2008;20:279–284.
8 Hojat M, Mangione S, Nasca TJ, et al. An empirical study of decline in empathy in medical school. Med Educ. 2004;38:934–941.
9 Austin EJ, Evans P, Magnus B, O'Hanlon K. A preliminary study of empathy, emotional intelligence and examination performance in MBChB students. Med Educ. 2007;41:684–689.
10 Mangione S, Kane GC, Caruso JW, Gonnella JS, Nasca TJ, Hojat M. Assessment of empathy in different years of internal medicine training. Med Teach. 2002;24:370–373.
11 Sherman JJ, Cramer BS. Measurement of changes in empathy during dental school. J Dent Educ. 2005;69:338–345.
12 Chen D, Lew R, Hershman W, Orlander J. A cross-sectional measurement of medical student empathy. J Gen Intern Med. 2007;22:1434–1438.
13 Newton BW, Savidge MA, Barber L, et al. Differences in medical students' empathy. Acad Med. 2000;75:1215.
14 Newton BW, Barber L, Clardy J, Cleveland E, O'Sullivan P. Is there hardening of the heart during medical school? Acad Med. 2008;83:244–249.
15 Coulehan J, Williams PC. Vanquishing virtue: The impact of medical education. Acad Med. 2001;76:598–605.
16 Downie R. Towards more empathic medical students: A medical student hospitalization experience. Med Educ. 2002;36:504–505.
17 Spencer J. Decline in empathy in medical education: How can we stop the rot? Med Educ. 2004;38:916–918.
18 Shadish WR, Cook TD, Campbell DT. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin; 2002.
19 Colliver JA, Kucera K, Verhulst SJ. Meta-analysis of quasi-experimental research: Are systematic narrative reviews indicated? Med Educ. 2008;42:858–865.
20 Colliver JA, Markwell SV. Research on problem-based learning: The need for critical analysis of methods and findings. Med Educ. 2007;41:533–535.
21 Colliver JA, McGaghie WC. The reputation of medical education research: Quasi-experimentation and unresolved threats to validity. Teach Learn Med. 2008;20:101–103.
22 DiLalla LF, Hull SK, Dorsey JK. Effect of gender, age, and relevant course work on attitudes toward empathy, patient spirituality, and physician wellness. Teach Learn Med. 2004;16:165–170.
23 Colliver JA, Willis M, Robbs RS, Cohen DS, Swartz MH. Assessment of empathy in a standardized-patient examination. Teach Learn Med. 1998;10:8–11.
24 Ward M, Gruppen L, Regehr G. Measuring self-assessment: Current state of the art. Adv Health Sci Educ Theory Pract. 2002;7:63–80.
25 Colliver JA, Verhulst SJ, Barrows HS. Self-assessment in medical practice: A further concern about the conventional research paradigm. Teach Learn Med. 2005;17:200–201.
26 Eva KW, Regehr G. Self-assessment in the health professions: A reformulation and research agenda. Acad Med. 2005;80(10 suppl):S46–S54.
27 Davis DA, Mazmanian PE, Fordis M, Harrison RV, Thorpe KE, Perrier L. Accuracy of physician self-assessment compared with observed measures of competence: A systematic review. JAMA. 2006;296:1094–1102.
28 Stepien KA, Baernstein A. Educating for empathy. J Gen Intern Med. 2006;21:524–530.
29 Hojat M, Mangione S, Nasca TJ, Gonnella JS, Magee M. Empathy scores in medical school and ratings of empathetic behavior in residency training 3 years later. J Soc Psychol. 2005;145:663–672.
30 Hafferty FW, Franks R. The hidden curriculum, ethics teaching and the structure of medical education. Acad Med. 1994;69:861–871.
31 Kahn MW. Etiquette-based medicine. N Engl J Med. 2008;358:1988–1989.
32 Kaestle CF. The awful reputation of education research. Educ Res. 1993;22:23–31.
33 Lurie SJ. Raising the passing grade for studies of medical education. JAMA. 2003;290:1210–1212.
34 Todres M, Stephenson A, Jones R. Medical education remains the poor relation. BMJ. 2007;335:333–335.
35 Norman G. Editorial—How bad is medical education research anyway? Adv Health Sci Educ Theory Pract. 2007;12:1–5.
36 Twain M. The report of my death. In: DeVoto B, ed. Mark Twain in Eruption: Hitherto Unpublished Pages About Men and Events. New York, NY: Capricorn Books; 1968.