Clinicians sometimes make mistakes while diagnosing a patient. These mistakes often result from flawed reasoning, faulty data gathering, or poor interpretation.1,2 Indeed, a recent study found that such cognitive factors contributed to 74% of the diagnostic errors in internal medicine.3
Because cognitive errors emerge from inadequate reasoning processes, their prevention requires a better understanding of the mental processes through which clinicians arrive at diagnoses. Research has shown that one of two main modes of reasoning, nonanalytical or reflective, tends to underlie diagnostic decisions.4,5 Nonanalytical reasoning is based on the more or less immediate recognition of similarities between the case under review and either “illness scripts” (i.e., cognitive structures containing clinically relevant information about a disease) or actual patients seen previously, both retrieved from memory. Largely beyond conscious control, such pattern recognition characterizes expert physicians' reasoning in routine situations.5,6 At the opposite pole of the reasoning spectrum, reflective reasoning comprises effortful, conscious analysis of features exhibited by a case.4,7
The nonanalytical mode of reasoning allows practitioners to efficiently solve most of the problems encountered in professional practice. However, it may introduce biases in diagnostic reasoning, thereby provoking errors.8,9 This seems to occur particularly with unusual or complex problems, which would require from physicians a more reflective approach.2,4,7
Potential benefits of this turn to reflection in troublesome situations have been reaffirmed by studies on reflective practice in medicine. Recent studies have indicated that reflective practice in medicine has a multidimensional structure that comprises the willingness and ability to thoroughly explore the problem at hand while simultaneously examining one's own reasoning. When engaged in reflection for solving a case, physicians tend to more carefully consider case findings, search for alternative diagnoses, and examine their own thinking.10,11 A recent study indicated that reflective reasoning positively affected diagnoses of complex cases, whereas it made no difference in the diagnosis of simple cases.12 Diagnostic decisions would, therefore, benefit from adjusting reasoning approaches to the demands of the situation. However, as experienced physicians tend to reason in a highly automatic way, how would they recognize when a problem requires further reflection? Empirical studies have shown that physicians sometimes shift to reflective approaches,11,13,14 but conditions that actually trigger reflection are not known. In addition, little is known about how physicians process cases while in a reflective mode.
Overview of the Two Experiments
In this report, we describe two combined experiments aimed at studying contextual factors leading physicians to shift from automatic to reflective approaches, and the effects of reflective reasoning on diagnostic accuracy. Regarding the first issue, we investigated whether physicians who were diagnosing cases in a context that they perceived as problematic would be more likely to use reflective reasoning than would physicians diagnosing the same cases in a nonproblematic context. In both experiments, we induced the perception of cases as being problematic by informing the participants, under one experimental condition, that other physicians had failed to accurately diagnose the cases they would see. Under the other experimental condition, participants did not receive this information, again in both experiments.
In Experiment 1, we used a diagnosis task (medical residents were asked to diagnose a case) and a decision task (they were subsequently required to decide as quickly as possible whether or not concepts presented on a computer screen were related to the case). The speed of decisions (explained below) was considered an indication of the extent to which the participants had processed the case analytically. When reflectively diagnosing a case, physicians will more extensively process the findings presented in the case and consider possible explanations and alternative hypotheses. Clinical findings, causal mechanisms, and plausible diagnoses will therefore become part of the physicians' mental representation of the case much more extensively than when they arrived at a diagnosis through pattern recognition. Consequently, these concepts will be more easily accessible in memory after reflection. If the physicians subsequently encounter these concepts in a decision task for judging whether the concept was “related” or “nonrelated” to the case, they will be able to decide more quickly15 than if they had diagnosed the case through automatic reasoning. Therefore, shorter response times indicate that the particular finding, mechanism, or diagnosis had been reflected on earlier.
All residents diagnosed the same cases, half of the cases within a “problematic context” and half within a “nonproblematic context.” Cases were selected from the most difficult ones used in a previous study.14 We hypothesized that a “problematic context” would trigger reflection, provoking more careful analysis of presenting features, causal mechanisms, and alternative hypotheses.11,16 This would make these concepts more easily accessible in memory, thereby speeding up decisions within the “problematic context.” Because the cases were complex, we predicted that reflection triggered by the problematic context would lead to higher diagnostic accuracy.
In Experiment 2, conducted to verify the findings of Experiment 1, residents diagnosed cases within a “problematic” and a “nonproblematic” context while thinking aloud. If problems perceived as difficult do, in fact, trigger reflection, we expected participants to spend more time in diagnosis within the problematic context. Furthermore, as participants tend to more extensively analyze signs and symptoms, possible explanations, and alternative diagnoses while reflectively processing the cases, protocols of cases solved in the problematic context would contain more clinical findings and more inferences based on these findings than those obtained in the nonproblematic context.
Method of Experiment 1
Experiment 1 used a within-subject design, with all participants performing under both experimental conditions (i.e., the problematic and the nonproblematic contexts). The independent variable was the context within which the case was diagnosed (problematic context versus nonproblematic context). The dependent variables were diagnostic accuracy, response times, number of errors, and participants' ratings of their confidence in their diagnoses, the degree of reflection required during diagnosis, case complexity, and the frequency of having seen similar cases.
Ethical approval to carry out both experiments was provided by the Committee for Ethics on Research of the School of Public Health of the State of Ceará, accredited by the Brazilian National System of Ethics on Research.
Participants and setting
The participants were 20 second-year internal medicine residents from three teaching tertiary hospitals in the Brazilian State of Ceará (12 men, 8 women; mean age = 27.10, SD = 1.77). All 24 residents from the three teaching hospitals were invited to voluntarily participate in the study, and informed consent was obtained from those who volunteered.
Material and procedure
The material consisted of 10 clinical cases with the following diagnoses:
- Acute bacterial endocarditis
- Inflammatory bowel disease
- Acute viral hepatitis
- Acute bacterial meningitis
- Deficiency of vitamin B12
- Addison disease
- Celiac disease
- Acute myeloid leukemia
- Acute appendicitis
Case descriptions contained contextual information, complaints, findings from physical examinations, and test results, as in the following example of a case of acute bacterial endocarditis.
A 27-year-old man presents with fever, shaking chills, cough, arthralgia, and headache of four days' duration. The patient denies nausea, vomiting, diarrhea, or dysuria. He smokes 20 cigarettes a day, drinks five bottles of beer daily, and has a recent history of illicit drug use (intravenous cocaine). He is HIV seropositive. There is no history of previous hospital admissions or surgical procedures. Family history: His mother was treated for pulmonary tuberculosis five years ago.
Temperature: 39°C; blood pressure: 120/80; pulse: 114; respirations: 18. Eyes and oral cavity without abnormalities. Lung examination: Bilateral rhonchi, rales at the left lung base. Heart examination: Regular rhythm, systolic murmur (2+/6) at the left sternal border, with no radiation. Abdominal examination: Normal. There was no edema. Neurological examination: No abnormal signs. Ophthalmoscopy: Without abnormalities.
White blood cell count = 18,000/mm3 (85% segs, 10% bands, 5% lymphs)
Hematocrit = 38%
Platelet count = 170,000/mm3
PPD = 12 mm
Urinalysis = 10 red blood cells, protein (+/4)
Chest X-ray: Bilateral nodular infiltrates
Cases were selected from a set of cases used in a previous study with internal medicine residents.14 Because reflection is not expected in response to routine problems, only those cases that were shown to be difficult in the previous study (i.e., those cases in which the diagnostic score was below the mean) were selected.
All participants diagnosed the same 10 cases. First, five cases were diagnosed without additional information. After that, participants were informed that the five subsequent cases had been previously seen by experienced physicians, who failed to diagnose them accurately. Presentation order of the cases was randomized for each participant in such a manner that each case was diagnosed equally often in each experimental condition. To ensure familiarity with the procedure, participants first diagnosed two sample cases.
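The counterbalancing constraint described above can be met in more than one way, and the exact scheme used is not reported here. The following is a minimal sketch of one possible scheme, assuming that participants are handled in pairs and that the second member of each pair receives the mirror image of the first member's split. The function name `counterbalanced_schedule`, the pairing strategy, and the seed are illustrative assumptions, not the authors' procedure.

```python
import random

def counterbalanced_schedule(cases, n_participants, seed=0):
    """Return one (nonproblematic_cases, problematic_cases) tuple per participant.

    Hypothetical scheme: participants are taken in pairs, and the second
    member of each pair gets the mirrored split, so that every case occurs
    equally often in each condition across the whole sample.
    """
    cases = list(cases)
    if n_participants % 2 or len(cases) % 2:
        raise ValueError("this sketch assumes even numbers of participants and cases")
    rng = random.Random(seed)
    half = len(cases) // 2
    schedule = []
    for _ in range(n_participants // 2):
        shuffled = rng.sample(cases, len(cases))
        first, second = shuffled[:half], shuffled[half:]
        # First member: `first` is diagnosed before the failure information
        # is given, `second` after it.
        schedule.append((first, second))
        # Second member: same split with the conditions swapped and the
        # order within each block reshuffled.
        schedule.append((rng.sample(second, half), rng.sample(first, half)))
    return schedule
```

With 10 cases and 20 participants, this scheme places each case in the problematic block for exactly 10 participants and in the nonproblematic block for the other 10.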
Before the experiment, concepts to be judged in each case were generated by asking eight recent graduates from the residency to diagnose the cases while thinking aloud. Concepts used by at least 70% of these physicians were selected to be presented to participants. This procedure was required to ensure that the concepts that were used were, at least in principle, consistent with the participants' knowledge.15
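The 70% selection rule can be illustrated with a short sketch. It assumes that each think-aloud session has already been reduced by hand to a set of concept labels; the function name `select_concepts` and the example labels in the usage note are hypothetical.

```python
from collections import Counter

def select_concepts(protocols, threshold_percent=70):
    """Keep concepts mentioned in at least `threshold_percent` of the protocols.

    `protocols` holds one collection of concept labels per think-aloud
    session; a concept is counted at most once per session.
    """
    counts = Counter(c for concepts in protocols for c in set(concepts))
    total = len(protocols)
    # Integer comparison avoids floating-point issues at the threshold.
    return sorted(c for c, n in counts.items() if n * 100 >= threshold_percent * total)
```

With eight physicians, the 70% threshold corresponds to a concept being used by at least six of them, because five of eight is 62.5%.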
Three types of concepts were presented: findings literally stated in case descriptions (e.g., signs and symptoms), inferences, and filler terms unrelated to the case. Inferences are not presented in the case description but are generated on the basis of information existing in the description. See List 1 for examples of concepts used.
Participants were tested individually. Cases were presented to them one at a time. Participants were asked to study the case for three minutes and provide a diagnosis. Subsequently, they had to decide whether a concept presented on a computer screen was related to the case. A concept was considered related to the case when (a) it was literally stated in the case description or (b) it was not literally stated in the description but expressed inferences about, for instance, pathophysiological mechanisms or alternative diagnoses that could reasonably be made based on the case description.15 Participants were asked to make their decisions as quickly and accurately as possible, by pressing the key “?/” for Yes and the key “Z” for No. The computer registered the response times and the correctness of responses automatically. Twenty concepts were presented for each case: 5 that were literally stated findings, 5 that were inferences, and 10 fillers that were terms unrelated to the case. Presentation order of concepts for each case was randomized for each participant.
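The structure of the decision task for a single case can be sketched as follows. This is an illustration only, not the software used in the experiment: the function names, the key constants, and the placeholder concept labels are assumptions, and response-time measurement is omitted.

```python
import random

YES_KEY, NO_KEY = "/", "z"  # stand-ins for the "?/" and "Z" keys

def build_trials(literal, inferences, fillers, rng):
    """Return a shuffled list of (concept, concept_type, is_related) trials."""
    trials = [(c, "literal", True) for c in literal]
    trials += [(c, "inference", True) for c in inferences]
    trials += [(c, "filler", False) for c in fillers]
    rng.shuffle(trials)  # fresh random order per participant and case
    return trials

def is_correct(trial, key):
    """Yes is correct for related concepts only; No is correct for fillers only."""
    _, _, is_related = trial
    if key not in (YES_KEY, NO_KEY):
        raise ValueError(f"unexpected key: {key!r}")
    return (key == YES_KEY) == is_related

# Placeholder concepts for one case: 5 literal findings, 5 inferences, 10 fillers.
trials = build_trials(
    literal=[f"finding {i}" for i in range(5)],
    inferences=[f"inference {i}" for i in range(5)],
    fillers=[f"filler {i}" for i in range(10)],
    rng=random.Random(1),
)
```

Half of the 20 trials therefore call for a Yes response and half for a No response, so a participant cannot do well by favoring one key.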
At the end of the experiment, participants saw each case again and were asked to state in percentages (1) how confident they were about each diagnosis, (2) the extent to which they reflected while diagnosing it, (3) how complex they considered the case (for example, 100% would have meant “extremely complex”), and (4) how often they had seen a similar case among the cases they had encountered in the last three months.
Diagnostic accuracy was independently rated by two experts in internal medicine (J.C.P. and J.M.C.F.) on a scale ranging from zero (completely incorrect) to four (completely correct). Raters were not aware of the condition in which the case had been solved. The interrater agreement was 88%. Remaining differences were resolved by discussion. For each participant, data from cases diagnosed in each experimental condition were collapsed. Means for diagnostic accuracy in the two conditions were calculated, and a paired t test was performed. For each condition, mean response time and number of errors for each concept type were calculated. These data were analyzed using repeated-measures analysis of variance with context (problematic versus nonproblematic) and concept type (literally stated findings versus inferences) as within-subject factors. Effect size was calculated for main effects and interactions. Post hoc paired t tests were performed for comparisons across conditions. Finally, mean scores of the participants' confidence in their own diagnoses, extent of reflection, case complexity, and frequency of seeing similar cases were calculated for each condition. Paired t tests were performed to analyze these data. Significance level was set at P < .05 for all tests. Data were analyzed using SPSS for Windows 13.1.
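For readers less familiar with the paired t test, the statistic can be sketched in a few lines. The analyses reported here were run in SPSS; the function below is only an illustration using the Python standard library, and the function name and the scores in the usage note are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(condition_a, condition_b):
    """Return (t, degrees of freedom) for paired per-participant means."""
    if len(condition_a) != len(condition_b):
        raise ValueError("each participant needs one value per condition")
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    # t is the mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

For example, `paired_t([3, 4, 5, 6], [1, 3, 2, 4])` operates on the four within-participant differences 2, 1, 3, and 2 and returns the t statistic with 3 degrees of freedom.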
Results of Experiment 1
Accuracy of diagnosis was significantly higher when participants diagnosed cases within the problematic context (mean = 2.78; SD = 0.72) than in the nonproblematic context (mean = 2.20; SD = 0.74), as indicated by the paired t test (t = 2.76, P < .01).
Table 1 presents participants' mean response times to decide on the relatedness of two types of concepts to the cases. Participants were faster in judging both concept types after they diagnosed cases within a problematic context. Analysis of variance showed a significant main effect of context (F[1,19] = 6.42, P < .05, partial η2 = 0.252) and a significant main effect of concept type (F[1,19] = 10.33, P < .01, partial η2 = 0.352). There was no significant interaction effect (F[1,19] = 0.86, P = .37, partial η2 = 0.043).
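As a consistency check, partial η2 can be recovered from each F statistic and its degrees of freedom using the standard identity below. The worked values use the rounded F statistics reported above, so small discrepancies in the third decimal place are to be expected.

```latex
\[
\eta_p^2 = \frac{F \cdot df_{\mathrm{effect}}}{F \cdot df_{\mathrm{effect}} + df_{\mathrm{error}}}
\]
\[
\text{context: } \frac{6.42}{6.42 + 19} \approx 0.253, \qquad
\text{concept type: } \frac{10.33}{10.33 + 19} \approx 0.352, \qquad
\text{interaction: } \frac{0.86}{0.86 + 19} \approx 0.043
\]
```

The first value differs from the reported 0.252 only in the last digit, which is compatible with rounding of the F statistic.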
There was no significant main effect of context (F[1,19] = 1.28, P = .27) and concept type (F[1,19] = 1.93, P = .18) on number of errors, and there was no significant interaction effect (F[1,19] = 0.02, P = .88).
Table 2 shows the participants' mean ratings for the four different types of evaluations of the cases. Paired t tests showed significant differences between the two conditions for case complexity (t = 3.00, P < .01), extent of reflection (t = 2.23, P < .05), and frequency of encountering similar cases (t = 4.61, P < .05).
Discussion of Experiment 1
Experiment 1 aimed at exploring the influence of contextual factors on diagnostic reasoning and types of knowledge used when physicians process cases in a reflective mode. On the basis of previous studies on medical expertise4,13 and reflective practice,11 it was hypothesized that physicians would shift from predominantly automatic to mainly reflective reasoning when perceiving cases as problematic. Reflection would entail further analysis of presenting clinical features and consideration of additional hypotheses, leading to more extensive processing of the findings in the case and more inferences. Because cases were quite complex for the participants involved in the study, reflection was expected to improve diagnostic performance.
Our findings are consistent with these predictions. All participants solved the same cases, but half of the cases were diagnosed after participants had been informed about previous diagnostic failures. Under this condition, diagnostic accuracy was significantly higher. That is, the same cases, when perceived as more problematic, were diagnosed more accurately. This may be explained by differences in participants' modes of reasoning between the two conditions. A set of findings suggests a shift toward reflection within the problematic context. Participants were faster in judging both literally stated findings and inferences under this condition. The shorter time needed to decide whether terms expressing signs and symptoms, causal mechanisms, and diagnostic hypotheses were related to the previously diagnosed case indicates that these concepts were more extensively used while residents were diagnosing the case. Under the problematic condition, therefore, residents apparently analyzed clinical features, pathophysiological processes, and alternative hypotheses consciously, at least to some extent, instead of diagnosing cases predominantly by automatic reasoning. Furthermore, participants evaluated cases differently in that the same cases were considered to be more complex, to have required more reflection, and to be less familiar when they were solved within the problematic context.
Although not at variance with earlier findings,11,12 our conclusions about the role of reflection in explaining the findings can only be tentative. Because these conclusions about the participants' reasoning processes were inferred from performance on the decision task, alternative explanations cannot be excluded. For instance, rather than being generated by different reasoning strategies, differences in response times could have emerged from events occurring only after the diagnosis was reached. Some of the concepts provided in the decision task could have led participants to consider alternative hypotheses, which would, in turn, have sped up subsequent decisions. To verify whether the findings of Experiment 1 indeed could be attributed to differences in diagnostic reasoning, a second study, Experiment 2, was conducted in which participants were asked to think aloud while diagnosing cases. This methodological approach is generally considered an appropriate way to check whether hypothesized processes actually occur.17
Method of Experiment 2
In the second experiment, the independent variable was the context within which the case was diagnosed (problematic context versus nonproblematic context). Dependent variables were diagnostic accuracy, time needed to diagnose the case, and number of propositions in think-aloud protocols. A proposition corresponds to a meaningful idea unit in the text and consists of two concepts linked by a qualifier, such as causation, specification, temporal information, or location.18 For example, the fragment “the complaints started four days ago” is considered one proposition because it consists of a word expressing temporal information (“started”) linking the concepts of “complaints” and “four days ago.” All participants performed under both experimental conditions.
Participants and setting
The participants were 18 second-year internal medicine residents (11 men, 7 women; mean age = 26.89, SD = 1.60) from the same settings as Experiment 1, recruited by a similar process.
Material and procedure
Two clinical cases (stomach carcinoma and pseudomembranous colitis) were used. They were selected from the same source of cases used for Experiment 1.
Participants were tested individually, and they diagnosed both cases. They were instructed to think aloud while studying a case and providing a diagnosis. They received the first case without additional information. Before receiving the second case, participants were informed that it previously had been incorrectly diagnosed by an experienced physician. The order in which the two cases were presented was counterbalanced across participants. The time to diagnose the case was calculated from the moment at which participants started to read the case until the diagnosis was stated. The sessions were tape recorded and transcribed.
Diagnostic accuracy was rated by the same procedure used in Experiment 1. Interrater agreement was 90%. Disagreements were resolved by discussion. The think-aloud protocols were analyzed by means of propositional analysis.18 Propositions in each protocol were counted and, by matching them against the propositions in the case description, classified into three categories: literal (or paraphrased) propositions, low-level inferences, or high-level inferences. Low-level inferences are based on only one proposition in the text. High-level inferences result from a combination of propositions in the case description and consist of references to causal mechanisms or plausible diagnoses. For example, for the previously presented case of endocarditis, fever would be a low-level inference (based on one finding in the case), and bacterial infection would be a high-level inference (based on more than one finding).
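The three-way classification can be expressed as a simple decision rule once a human rater has annotated each protocol proposition. The sketch below assumes two hypothetical annotation fields, `restates_case` and `n_source_findings`; the coding itself was done by hand, and the sketch simplifies the definition of high-level inferences to the number of case propositions combined.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposition:
    text: str
    restates_case: bool      # literal or paraphrased from the case description
    n_source_findings: int   # case propositions the idea is derived from

def classify(p):
    """Map one annotated proposition to its category."""
    if p.restates_case:
        return "literal"
    if p.n_source_findings <= 1:
        return "low-level inference"
    return "high-level inference"

def count_categories(protocol):
    """Count propositions per category for one think-aloud protocol."""
    return Counter(classify(p) for p in protocol)

# Hypothetical fragment of a protocol for the endocarditis example case.
protocol = [
    Proposition("temperature is 39 degrees", True, 1),
    Proposition("fever", False, 1),
    Proposition("bacterial infection", False, 3),
]
```

Applied to this fragment, `count_categories(protocol)` yields one proposition in each of the three categories, mirroring the fever and bacterial infection examples given above.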
Two of the investigators (S.M. and R.M.J.P.R.) independently scored a subset of protocols. An interrater agreement of 88% was found. As the procedure was shown to be reliable, the remaining protocols were analyzed by a single judge.
Data from cases solved within each condition were collapsed. Mean diagnostic accuracy and processing time in each condition were calculated and were analyzed using paired t tests. A repeated-measures analysis of variance was performed with context (problematic; nonproblematic) and proposition category (literal proposition; low-level inference; high-level inference) as within-subject factors. Effect size was calculated for main effects and interaction. Post hoc paired t tests were performed for comparisons across conditions. Significance level was set at P < .05 for all tests. Data were analyzed using SPSS for Windows 13.1.
Results of Experiment 2
Diagnostic accuracy for cases diagnosed within the problematic context was higher (mean = 2.89; SD = 1.64) than in the nonproblematic context (mean = 2.67; SD = 1.75), but this difference was not significant (t = 0.345, P = .73).
Time needed to diagnose cases within the problematic context (mean = 198.22; SD = 84.54) was significantly longer (t = 2.57, P < .05) than in the nonproblematic context (mean = 164.17; SD = 44.10).
Protocols of cases diagnosed within the problematic context contained more literal findings and high-level inferences (Table 3). Analysis of variance showed a main effect of context (F[1,17] = 8.26, P < .05, partial η2 = 0.327) and a main effect of proposition category (F[1,17] = 376.03, P < .001, partial η2 = 0.957). A significant interaction effect was also found (F[1,17] = 12.40, P < .001, partial η2 = 0.422). Post hoc t tests showed significant differences between means of the number of literal propositions (t = 3.57, P < .01) and the number of high-level inferences generated (t = 2.31, P < .05) while diagnosing the cases in the two conditions.
Discussion of Experiment 2
In Experiment 2, participants diagnosed cases (within a problematic and a nonproblematic context) while thinking aloud. Cases perceived as problematic were expected to trigger reflection, which would lead residents to spend more time on diagnosis, to analyze the clinical features more extensively, and to consider more alternative hypotheses. The results support these predictions: participants spent more time on diagnosis and produced protocols with more literal propositions and high-level inferences when diagnosing the same cases within a problematic context. These protocol findings indicate that participants analyzed signs and symptoms more extensively (more literal propositions) and generated more pathophysiological explanations and diagnostic hypotheses (more high-level inferences). These results are consistent with the findings of Experiment 1. Differences in diagnostic accuracy were not significant, possibly because of the small number of cases used.
Discussion of Both Experiments
The findings of the two experiments that we conducted strongly suggest that physicians shift to reflective diagnostic reasoning when they perceive the case to be solved as problematic, irrespective of whether the particular case is really problematic or not. When the study participants engaged in reflection, they analyzed the presenting clinical features more thoroughly, thought more about possible underlying mechanisms, and considered more alternative hypotheses. In addition, reflective reasoning was shown to improve diagnostic performance in Experiment 1.
The results illustrate the value of reflective reasoning for diagnosing difficult problems. Both experiments used cases that internal medicine residents had difficulties diagnosing in a previous study.14 Therefore, the cases may be considered complex to participants of a similar level of expertise. A recent study indicated that reflection improved diagnoses of complex problems, whereas it did not affect diagnoses of simple cases.12 The present findings provide additional support for this positive effect of reflection. Whereas nonanalytical reasoning allows clinicians to efficiently solve routine problems, unusual or complex problems would be more accurately diagnosed by reflective reasoning. Clinical judgments would, therefore, benefit from adapting reasoning strategies to situational demands, moving toward more reflective approaches as the level of uncertainty or complexity of the task increases.7 This conclusion deserves attention, because other studies have indicated that physicians differ in their ability to make the shift to reflective reasoning.19,20
Previous studies have suggested that case characteristics, such as complexity, may lead physicians to reflective reasoning.14 The present studies indicate that even the subjective perception of complexity alone may trigger reflection. Information on previous diagnostic failures by others apparently generated a feeling that the cases had something unusual or disconcerting about them, compelling participants to process them in a different mode. In line with the literature on reflective practice,16 reflection may be provoked by a difficulty not entirely definite but sufficient to generate a sense of uneasiness.
The experiments also shed some light on how physicians process cases through reflective reasoning. Literally stated findings and inferences were more prominent in a resident's case representation when the case was seen as problematic than when it was diagnosed in the nonproblematic context. This suggests that clinical features, knowledge of causal mechanisms, and plausible diagnoses play a different role in nonanalytical and reflective reasoning. Whereas nonanalytical reasoning relies mostly on pattern recognition,5,6 reflective reasoning apparently involves more extensive consideration of signs and symptoms in order to generate hypotheses and/or to check their predictions against presenting features. Reflection also involved analysis of pathophysiological processes, probably to explain manifestations and to establish causal connections and relationships between features,21 as well as an effort to search more thoroughly for alternative hypotheses besides the ones initially considered for the case.
The findings also support the notion that physicians cannot be seen as neutral observers who objectively identify and interpret features in clinical problems.22 It was quite impressive to observe the extent to which physicians' evaluations of a case, including its complexity and the frequency with which it had been encountered, varied depending on whether the participant had been informed that colleagues had missed the diagnosis. A diversity of factors apparently influences clinical judgments, potentially generating faulty reasoning. Our results suggest that more emphasis should be given, in medical education, to the development of the ability to flexibly combine nonanalytical and reflective approaches. This might include acquiring knowledge on clinical reasoning strategies and their consequences, enhancing awareness of factors possibly influencing one's own judgments, and developing the ability to critically appraise one's own thinking and decisions.2,9,10 The present studies point at contextual factors that may trigger reflection and at some processes that constitute reflective reasoning. How these findings could be used for improving clinical education is still to be explored.
The methodological approach adopted in Experiment 1, besides being fairly new in medical expertise research, has an important strength: it allows the investigator to make inferences about participants' reasoning without altering their natural thinking processes. However, both studies also have some limitations. The participants involved in the experiments cannot be considered highly experienced doctors, and the findings may not be entirely valid for expert clinicians. Both studies used small samples and were conducted under experimental conditions, perhaps restricting generalization of the findings to real clinical settings. Finally, the methodological approach adopted has demonstrated the shift to reflective reasoning in a problematic context and a change in the types of knowledge used. We cannot say yet whether physicians would turn back to nonanalytical reasoning if they subsequently encountered simple cases. Moreover, other elements suggested as constituents of reflective reasoning,10,11 such as the ability to critically scrutinize one's own thinking, were not explored and require further investigation.
The authors are grateful to the medical residents who chose to dedicate their time to participate in the study, and to Dr. José Gerardo Paiva for his support in organizing the meetings with the participants.
The study was funded by a grant provided by Erasmus University Rotterdam, Rotterdam, the Netherlands, to the first author.