Cognitive errors made by doctors while diagnosing cases form a substantial part of preventable mistakes.1,2 Research has shown that cognitive errors are often a result of faulty reasoning rather than a lack of knowledge.1,2 Developing physicians’ ability to avoid pitfalls in clinical reasoning is crucial for patient safety, but this requires a better understanding of the sources of faulty reasoning processes.
Faulty reasoning has been frequently attributed to cognitive biases associated with nonanalytical reasoning.3,4 Physicians mostly generate diagnostic hypotheses based on recognition of similarities between a case at hand and scripts of diseases or examples of previous patients stored in memory.5–7 This “pattern recognition” is largely unconscious, effortless, and usually efficient. Yet it has been said to open the door for cognitive biases,3,4 such as the confirmation bias (i.e., a tendency to search for evidence that confirms rather than refutes initial hypotheses), or the availability bias, which leads physicians to overestimate the likelihood of a diagnosis when it comes to mind easily.8,9
Showing that errors may result from bias, however, does not help explain the underlying reasoning mechanisms. It explains what was wrong in physicians’ reasoning but not why they engaged in faulty reasoning despite having the knowledge to solve the problem. Thus, to increase our understanding of why such errors occur (and ultimately find ways to prevent them), the mechanisms underlying flawed clinical reasoning need to be investigated in more detail.
A recent study conducting a post hoc analysis of errors made by physicians under experimental conditions suggested that one such potential mechanism might be the influence of “salient distracting features” (SDFs).10 Certain features in a patient’s history may be salient because they are strongly associated with a particular disease and, therefore, tend to catch physicians’ attention, triggering pattern recognition. When such features, despite being salient, are in fact unrelated to the current problem, the physician may generate an incorrect initial hypothesis based on that salient feature. For example, the information about a previous diagnosis of reflux esophagitis and use of omeprazole may become salient in a patient presenting with retrosternal pain but would turn out to be unrelated to the present problem when the other findings indicate that the patient has, in fact, acute viral pericarditis. Overcoming the salience of distracting features is probably easy for simple cases, but complex cases may require the physician to carefully analyze the entire set of findings to recognize other, less salient cues as relevant and to consider alternative diagnoses. When a SDF fully grabs the physician’s attention, he or she may be barred from reflecting on the less salient cues, which often seems to happen.1,4,11 This kind of error is said to be the result of “premature closure”: The physician terminates reasoning about a case prematurely because the automatic diagnostic response triggered by the SDF presents itself as so self-evident that it is difficult to escape from. Because the response to a SDF is automatic, an important consequence for the present study is that those who fall victim to premature closure would be expected to spend less time on a case, indicate less effort, and exhibit more confidence in their diagnosis, relative to those who are not carried away by a SDF and approach the case analytically.
Our study experimentally investigated the role of such SDFs in simple cases (frequent, typical, nonambiguous) and complex cases (rare, atypical, ambiguous), by presenting residents in internal medicine with case descriptions that contained no such features, contained SDFs early in the description, or contained SDFs late in the description. The latter condition was included because we assumed that under such circumstances no premature closure was to be expected and that, therefore, there would be no negative effect of SDFs on diagnostic accuracy. We hypothesized that the presence of SDFs would lead residents to make more diagnostic errors in the direction suggested by the SDF, spend less time on a case, and report more confidence in the decision and less effort in reaching it. In addition, we assumed that these effects would emerge with complex but not simple cases, because simple cases are less cognitively demanding, and physicians might therefore be less prone to distraction.
Internal medicine residents from the Erasmus Medical Centre, Erasmus University Rotterdam, were invited to participate in the study, which took place during a regular educational activity in March 2012. All 98 residents enrolled for the activity were invited for the study, and those who volunteered were tested on the same occasion. Participation was voluntary, without any financial compensation. We obtained oral consent from the volunteers, after explaining their tasks, because the nature of the study prevented prior disclosure of its objectives. Participants were debriefed later. The ethics committee of the Department of Psychology, Erasmus University Rotterdam, approved the study.
Material and procedure
We used 12 written clinical cases in the study (List 1). Each case consisted of a brief description of a patient’s medical history, signs and symptoms, and findings from physical examination and lab tests. The cases were prepared by two board-certified specialists in internal medicine who had worked together with one of us (S.M.) on previous studies with internal medicine residents.12,13 Both specialists had more than 15 years of clinical practice working in large teaching hospitals. They prepared the cases based on their real patients, and all cases had a confirmed diagnosis. Prior to the present study, the cases were evaluated by two other experts, including one of us (J.S.), to ensure that they were appropriate for the local context. Six cases were complex—that is, consisting of diseases that are not frequently seen or atypical presentations of diseases. The other six cases were simple, consisting of typical presentations of frequent diseases. All cases had been previously used in studies with internal medicine residents,9,12–14 and the distinction between complex and simple cases was supported by the mean diagnostic accuracy scores obtained in those studies.
We created three versions of each case: without a SDF, with a SDF presented at the beginning of the case, and with a SDF presented at the end of the case. To prepare the second and the third versions of the case, a sentence displaying a SDF—that is, a feature that is strongly associated with a particular disease but turns out to be unrelated to the present problem—was added to the original version of the case, either relatively early or towards the end of the description. In other words, the cases were the same in all versions, presenting exactly the same findings, except for the presence and location of a SDF. (See Appendix 1 for an example.)
In a within-subjects design, each participant diagnosed the 12 cases in the three different formats: 2 simple cases without SDF, 2 simple cases with SDF at the beginning, 2 simple cases with SDF at the end, 2 complex cases without SDF, 2 complex cases with SDF at the beginning, and 2 complex cases with SDF at the end. Which cases were diagnosed in which format was counterbalanced across participants (to ensure that any effects of SDF were not an artifact of a particular case). To avoid sequence effects, we presented the cases in random order in a booklet, and participants were instructed to read the case and write down the most likely diagnosis for the case, as accurately and quickly as possible. Twenty minutes were allocated for the diagnosis of the cases, an amount of time that proved to be sufficient to solve the cases in previous studies.9,12 A digital clock was visible in the room, and participants were requested to write down the time immediately before they started reading the case and after they made the diagnosis for each case. No information about the correctness of their diagnoses was provided to participants.
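The counterbalancing described above can be illustrated with a short sketch. The text does not specify the actual assignment scheme, so the rotating (Latin-square style) rule, the case labels (`S1`..`S6`, `C1`..`C6`), and the function name `assign_formats` below are assumptions made only for illustration.

```python
from collections import Counter

# The three case formats used in the study.
FORMATS = ("no_sdf", "sdf_early", "sdf_late")


def assign_formats(participant_index, simple_cases, complex_cases):
    """Return {case_id: format} for one participant.

    Hypothetical scheme: cases are grouped in pairs, each pair gets one
    format, and the pairing rotates from one participant to the next.
    """
    rotation = participant_index % len(FORMATS)
    assignment = {}
    for cases in (simple_cases, complex_cases):
        for position, case_id in enumerate(cases):
            pair = position // 2  # 0, 0, 1, 1, 2, 2
            assignment[case_id] = FORMATS[(pair + rotation) % len(FORMATS)]
    return assignment


# Hypothetical case labels: six simple and six complex cases.
simple = [f"S{i}" for i in range(1, 7)]
complex_ = [f"C{i}" for i in range(1, 7)]

plan = assign_formats(0, simple, complex_)
per_format = Counter(plan[case] for case in simple)
```

Under this scheme every participant sees two simple and two complex cases per format, and over any three consecutive participants each case appears once in each format.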
After diagnosing the 12 cases, participants were presented the cases again, one by one, and were requested to assess, for each case, how confident they were with their diagnosis and how much effort they had invested to diagnose the case (the latter being a measure of experienced cognitive load).15 Their assessments were to be provided as percentages ranging from 0 to 100.
We first classified the diagnoses provided by the participants according to their accuracy. We used the confirmed diagnosis of each case to evaluate the diagnoses given by the participants as fully correct (1), partially correct (0.5), or incorrect (0). A diagnosis was considered correct whenever the core diagnosis was cited. When the diagnosis was not the correct one but could be considered a constituent of the diagnosis, we judged it as partially correct. Finally, responses that did not fall into one of these categories were classified as incorrect. For example, on the case of acute viral pericarditis, “pericarditis” was judged as correct, “pleuritis” as partially correct, and “esophageal reflux” as incorrect. We scored the participants’ responses according to a scoring grid that was prepared by experts in internal medicine, based on the scoring of diagnoses that were provided by participants on the same cases in previous studies.9,12,14 In those previous studies, the two experts showed high interrater agreement (84% or higher), and in cases of discrepancies the scoring was discussed and agreed on. The scoring grid that we created for this study listed all the diagnoses that had been given by participants in the previous studies for each case, along with the score that was agreed on by the two raters in the prior studies. Because the diagnoses provided by participants in the present study were the same as those given in previous studies (or differed only slightly), scoring based on the grid was a straightforward procedure, conducted by one author (S.M.). Nevertheless, as an additional check, another coauthor (K.B.) independently and blindly scored the responses from the present study and agreed with the scoring grid (and the first author) on 94% of the scores.
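As a minimal sketch of how such a grid-based scoring could be applied, the lookup below uses the three example responses given for the pericarditis case, with the 1 / 0.5 / 0 scores from the scheme above. The dictionary layout and the function name `score_diagnosis` are hypothetical; the actual grid covered all 12 cases and was applied by hand.

```python
# Illustrative excerpt of a scoring grid: case -> {response: score}.
# Only the pericarditis example from the text is included.
SCORING_GRID = {
    "acute viral pericarditis": {
        "pericarditis": 1.0,       # core diagnosis cited
        "pleuritis": 0.5,          # partial credit
        "esophageal reflux": 0.0,  # incorrect
    },
}


def score_diagnosis(case, response, grid=SCORING_GRID):
    """Look up the agreed score for a response, ignoring case and outer spaces."""
    entries = grid[case]
    key = response.strip().lower()
    if key not in entries:
        # Responses missing from the grid would need expert judgment.
        raise KeyError(f"No agreed score for {response!r} on case {case!r}")
    return entries[key]


scores = [
    score_diagnosis("acute viral pericarditis", r)
    for r in ("Pericarditis", "pleuritis", "Esophageal reflux")
]
```

Raising an error for responses not in the grid, instead of defaulting to 0, is a design choice of this sketch: it keeps unlisted diagnoses visible for manual review.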
In a subsequent analysis, we classified the incorrect diagnoses as either associated with the SDF (i.e., the disease to which the SDF was strongly associated) or not associated with the SDF (i.e., other errors not likely to result from the SDF). Note that versions of the cases with SDF were created by adding to the neutral version a feature strongly associated with a particular disease (e.g., tuberculosis on the case presented in Appendix 1). There was, therefore, an a priori definition of the diagnosis that would be considered associated with the SDF in each case, and participants’ responses were classified according to this a priori definition.
For each participant and for each condition of the experiment, diagnostic accuracy was expressed as the sum of scores. On the basis of this score, we computed the proportion of correct diagnoses produced for each type of case. To test our hypothesis that diagnostic accuracy would be affected by SDFs in the beginning, we performed a repeated-measures analysis of variance (ANOVA) with “presence of SDF” (without SDF, with SDF in the beginning, or with SDF near the end) and “case complexity” (simple cases or complex cases) as within-subjects factors on the proportions of correct diagnoses. To the extent possible, parametric results were also evaluated using nonparametric methods and provided confirmatory results. Post hoc paired t tests were performed to compare the proportion of correct diagnoses made for simple and complex cases in each SDF condition.
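For readers less familiar with the follow-up tests, the sketch below computes a paired t statistic for two within-subject conditions using only the Python standard library. The proportions are invented for illustration and are not data from this study; the function name `paired_t` is likewise hypothetical. The analyses reported here were run in PASW, not with this code.

```python
import math
import statistics


def paired_t(xs, ys):
    """Return (t, degrees of freedom) for a paired comparison of xs and ys."""
    if len(xs) != len(ys):
        raise ValueError("Paired samples must have the same length")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up proportions of correct diagnoses for four participants
# in two conditions (for example, no SDF versus early SDF).
no_sdf = [1.0, 0.5, 1.0, 0.75]
early_sdf = [0.5, 0.5, 0.25, 0.5]

t_value, df = paired_t(no_sdf, early_sdf)
```

The resulting t value would then be compared against a t distribution with `df` degrees of freedom to obtain the two-tailed P value.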
We conducted a second analysis into the nature of the errors made: To what extent were these errors SDF related? For each participant, the number of SDF-related incorrect diagnoses was summed for both simple and complex cases and for the three conditions of the experiment. We then computed the proportion of SDF-related mistakes as a function of the total number of errors. The hypothesis that the errors were significantly associated with the SDF was tested by performing a second repeated-measures ANOVA, with “presence of SDF” and “case complexity” as within-subjects factors, on the proportions of incorrect SDF-related diagnoses.
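The proportion described above can be sketched as follows. Two points are assumptions of this sketch and are not stated in the text: only responses scored 0 are counted as errors, and a cell in which a participant made no errors is treated as undefined (`None`). The data and the function name `sdf_error_proportion` are illustrative.

```python
def sdf_error_proportion(responses):
    """Proportion of a participant's errors in one cell that match the SDF diagnosis.

    `responses` is a list of (score, sdf_related) pairs, where `sdf_related`
    marks whether the response was the a priori SDF-associated diagnosis.
    """
    errors = [related for score, related in responses if score == 0]
    if not errors:
        return None  # assumption: undefined when no errors were made
    return sum(1 for related in errors if related) / len(errors)


# Made-up responses of one participant to the two complex cases with an early SDF.
cell = [(0, True), (0, False)]
proportion = sdf_error_proportion(cell)
```

How cells without errors are handled affects the repeated-measures analysis, so any reimplementation would need to settle that point explicitly.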
We analyzed the average time spent in diagnosing the cases, as well as participants’ confidence and effort judgments, using a repeated-measures ANOVA with “presence of SDF” and “case complexity” as within-subjects factors.
For all comparisons, significance level was set at P < .05 (two-tailed). PASW 18.0 (SPSS Inc, Chicago, Illinois) for Mac was used for the statistical analyses.
Seventy-two internal medicine residents (mean [SD] age, 29.2 [2.6] years, 49 female) participated in the study. Half of the residents were in the first year of their training, and the other half were in the second year. Because no difference between the performances of the two subgroups was found, we grouped them together for the analyses.
Table 1 presents the proportions of correct diagnoses produced for simple and complex cases under the three conditions of the experiment. The repeated-measures analysis showed, as expected, a significant main effect of case complexity (F[1, 71] = 84.21, P < .001, ηp² = 0.54), with participants’ diagnostic performance being lower on complex than on simple cases. It also showed a significant main effect of SDF (F[2, 142] = 8.19, P < .001, ηp² = 0.10) and a significant interaction effect between SDF and case complexity (F[2, 142] = 6.56, P = .002, ηp² = 0.09). Follow-up t tests to clarify this interaction showed that for the simple cases, diagnostic accuracy did not differ significantly between the different levels of the experiment. For complex cases, however, diagnostic accuracy was significantly lower when participants encountered a SDF early in the case description than when a SDF was not present (P < .001). The decrease in performance was substantial: Participants produced around 58% fewer accurate diagnoses when they encountered a SDF early in a complex case. When the SDF was present at the end of complex cases, however, diagnostic performance was not negatively affected; there was no significant difference between diagnostic accuracy for cases with a SDF at the end and cases without a SDF (P = .13). Diagnostic accuracy for cases with a SDF in the beginning was also significantly lower than when a SDF was present at the end of the case (P < .001).
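The partial eta squared values reported alongside each F test can be recovered from the F statistic and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to two of the tests reported above; the function name is our own, and recomputed values may differ in the last digit from reported ones because the F values are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Main effect of case complexity on accuracy: F[1, 71] = 84.21
complexity_effect = round(partial_eta_squared(84.21, 1, 71), 2)

# Main effect of SDF on accuracy: F[2, 142] = 8.19
sdf_effect = round(partial_eta_squared(8.19, 2, 142), 2)
```

Both recomputed values agree with the effect sizes reported for these two tests (0.54 and 0.10).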
To check the extent to which the decrease in diagnostic accuracy was indeed caused by the SDFs, we computed the proportions of SDF-related errors as a function of the total number of errors (see Table 2). There was again a main effect of case complexity (F[1, 71] = 5.80, P = .02, ηp² = 0.08), with more SDF-related errors being made on complex cases. And, again, there was a significant main effect of SDF (F[2, 142] = 13.49, P < .001, ηp² = 0.16) as well as a significant interaction effect (F[2, 142] = 3.97, P = .02, ηp² = 0.05). Follow-up t tests to clarify this interaction showed that, on complex cases, participants provided significantly more incorrect diagnoses associated with the SDF when those features came early in the case than when they were not present (P = .002) or when they came near the end of the case (P = .01). On simple cases, however, the presence of a SDF, whether at the beginning or at the end of the case, led to higher proportions of diagnoses associated with the SDF than when the SDFs were absent (both P < .001). It is also interesting to note that even when SDFs were not present, participants made a considerable number of SDF-related mistakes.
There was a significant main effect of case complexity on time spent in diagnosing the cases (F[1, 71] = 63.59, P < .001, ηp² = 0.47) and a significant interaction effect between SDF and case complexity on time spent in diagnosing (F[2, 142] = 4.71, P = .011, ηp² = 0.06) (Table 3). Participants spent more time diagnosing the cases with SDF in the beginning than cases with no SDF (P < .01) or with the SDF at the end (P = .01). Regarding the analyses of participants’ judgments about the cases, there was a significant main effect of case complexity on participants’ confidence in their diagnosis (F[1, 71] = 61.77, P < .001, ηp² = 0.47) and on effort invested to diagnose the case (F[1, 71] = 71.30, P < .001, ηp² = 0.50). Participants had less confidence in the accuracy of their diagnoses and reported having invested more effort in complex than in simple cases. There was no significant main effect of SDF, nor was there a significant interaction effect.
Our findings suggest that SDFs indeed can decrease physicians’ diagnostic performance, at least if cases are complex and if the SDFs are encountered early in the case description. That SDFs have a special role to play in the production of diagnostic errors is particularly apparent in the findings reported in Table 2. When SDFs are found in the beginning of a case, a significantly greater proportion of errors can be directly attributed to their presence. The adverse effect of a SDF encountered in the beginning of a case was substantial. Compare effects of the two conditions in which SDFs were presented. Although under these conditions the cases were exactly the same, except for the location of the SDF, the number of accurate diagnoses produced when the SDF was present in the beginning of the case was half that of those produced when the SDF was at the end of the case. These findings are partially in line with findings from a study by Eva and Cunnington.16 They also found a primacy effect: Information presented early in a case has a stronger influence on the diagnosis than information presented later in the case. Because the purpose of their study was to study order effects on diagnosis per se, they demonstrated bias as a result of primacy. The particular contribution of our study is that we have shown that such bias can lead to mistakes. Even if information included in a case is in fact unrelated to the present patient’s problem (see the example in Appendix 1), it can cause considerable error.
The question now is, What mechanism can be held responsible for our findings? Superficially, we seem to have demonstrated that the presence of SDFs early in a case can lead to premature closure and, therefore, to mistakes. Such interpretation finds support in studies of human perception. These studies have shown that attention tends to be directed, without the observer’s awareness, to certain features of a task either because they are perceptually salient (e.g., in terms of color or contrast) or because they have become salient to observers on the basis of their experience with similar problems.17–19 Similarly, it is likely that some features in a case tend to become salient for a physician who has had experiences with patients with similar histories or when the clinical significance of a particular finding is known. When salient features that strongly point towards a particular illness script are encountered early in a case, that script may become highly activated in working memory, blocking access to scripts pointing in other directions. This is what premature closure is about. Note, however, that premature closure is considered an automatic process that, as we outlined earlier, takes little time and is largely effortless. Our prediction was that if SDFs lead to premature closure, less time and effort would be spent on a case, but confidence in the diagnosis would be higher. However, we found that participants spent more time on cases with early SDFs, seemed to invest more effort, and were less confident (although the latter two were only marginally significant). These findings suggest that cases with early SDFs do not induce less analytical (or automatic) processing but, on the contrary, provoke more analytical processing. How can we reconcile this finding with the idea of premature closure?
Keep in mind that 28% of the mistakes were produced by the presence of early SDFs, which implies that 72% were mistakes unrelated to SDFs. We therefore tentatively propose a twofold model of the influence of SDFs on diagnostic reasoning. For some doctors, an early SDF may lead to premature closure (with associated automaticity and reduced time on task), but for other physicians, who are less susceptible to premature closure, an early SDF may lead to ambiguity and confusion, with longer processing times as a consequence. Because we were not able to distinguish these two groups (and in fact assume that it is the doctor–case interaction that produces premature closure rather than a stable characteristic of a particular group of doctors), we could not test this proposition directly. Future research should clarify this issue.
In training environments and in everyday clinical practice, SDFs may be easily generated, for example, by patients’ comorbidities or previous medical history, and physicians might frequently need to overcome the influence of SDFs. Whether they can be taught to do that is still to be determined. Nevertheless, continuing education programs for practicing physicians and clinical teachers who want to make their students and residents more able to minimize the potentially adverse influence of salient features might consider teaching techniques to counteract automaticity, such as deliberate reflection on cases that are perceived to be complex or to contain possible SDFs. Deliberate reflection on to-be-solved cases has been shown to help improve diagnostic performance and counteract bias.9,13,15 Other tools, such as the use of checklists20 and cognitive forcing strategies,21 have been proposed to prevent excessive reliance on intuitive judgments and may also be helpful, but these tools still need to be tested.21–23
The limited experience of the participants may restrict generalization of findings to older physicians, who may not be so prone to be distracted by salient features. Expertise, a factor not addressed in the present study, may make many cases easier for more experienced doctors, which might reduce the influence of SDFs. Nevertheless, there will always be nonstraightforward cases, even for experienced doctors. Moreover, experienced doctors are known to rely more on pattern recognition and to have more difficulties restructuring initial reasoning.16,24 For example, in the aforementioned study by Eva and Cunnington,16 older doctors were more subject to the primacy effect than younger ones. It may, therefore, even be that the negative influence of SDFs among more experienced doctors is more likely and potentially more serious. Females were overrepresented in our sample, which restricts generalization to male doctors, though we are unaware of studies showing gender differences in proneness to bias in diagnostic reasoning. This study was conducted under laboratory conditions and used written cases, though based on real patients; it could be argued that real clinical problems are likely to provide physicians with more cues that may help in correcting initially wrong judgments. On the other hand, in clinical settings there are also other conditions present (e.g., time pressure) that might exacerbate the effect we found. Finally, in our study, participants were requested to provide the most likely diagnosis for the case, which involves considering possible alternatives but choosing a single diagnosis. In real clinical practice, doctors can postpone a decision while searching for more information. A wrong initial diagnosis, therefore, does not equate to an error, because it could be corrected later on. However, studies on diagnostic errors have shown that this repair, in fact, does not occur as frequently as it should. Indeed, faulty verification of an initial diagnosis has been shown to be the major cause of diagnostic errors.1
Besides exploring whether expertise affects susceptibility to SDFs by including experienced physicians as participants, future studies should investigate the mechanisms underlying the effect of SDFs—for example, by using eye tracking to study physicians’ attention allocation to SDF and other features while reading the cases. Future research should also test the effectiveness of different strategies for overcoming the adverse influence of SDFs (e.g., deliberate reflection or cognitive forcing strategies).
This study has shed some light on the mechanisms underlying faulty reasoning and diagnostic errors, suggesting that flaws may derive from difficulties in overcoming the influence of features that are salient but unrelated to the problem encountered early in the case. Studies on such mechanisms may contribute to our understanding of diagnostic reasoning and ultimately to the design of approaches to help physicians avoid flaws in clinical reasoning.