ELSTEIN, ARTHUR S.
Geoff Norman has presented an extremely rich and stimulating paper that surveys many important themes. In my response to his article,1 I shall not comment on the connections he seeks between psychology, philosophy, and neuroscience, because this attempted synthesis is well beyond my area of expertise. Instead, my discussion focuses on two other issues: the status of research on the psychology of clinical problem solving, and the connections between this research and decision psychology, the framework in which I have worked for the last 20 years. Then I consider the implications of this work for improving the quality of health care decisions.
Status of Research on the Psychology of Clinical Problem Solving
Several schemes have been put forth to explain how diagnostic reasoning is accomplished, including diagnostic categorization by instance-based recognition,2 prototypes,3,4 propositional networks,5,6 forward reasoning or pattern matching,7 and generating competing hypotheses.8 Evidence supporting each of these models is available in the literature. How can this be? Norman argues that no single representation of the process or of the organization of knowledge accounts for all of the phenomena investigators have encountered. Each account is correct sometimes, because individuals adapt their strategies to the demands of the task, including the demands of the experimenter. This implies that experiments designed to test particular hypotheses have also, in some sense, been designed to validate the hypotheses or beliefs of the investigators.
Norman and I agree that problem solvers are adaptive creatures, and we must be careful about concluding that any one account of their behavior will explain all phenomena. He and his collaborator, Henk Schmidt, put it well: “There is more than one way to solve a problem.”9 Viewing problem solvers as adaptive thinkers trying to cope with complexity does not attribute malicious intent either to investigators or to research subjects. On the contrary, it harks back to Newell and Simon,10 who argued that because of the limitations of working memory, complex tasks are represented in simplified problem spaces, and that consequently understanding problem solving is significantly advanced by understanding that cognitive representation. Their view was quite radical for its time, for the concept of a problem space really committed us to the study of what we now call problem representations or mental models.
Different mental models might be employed by different subjects, or the choice might depend on the task. It follows that a hierarchical organization of medical knowledge, with general concepts at the top and specific instances at the bottom, is a plausible representation and is partially correct. So are propositional networks, with their nodes and connections, symptom-by-disease matrices, and semantic networks. In most studies employing each of these frameworks, the model finds reasonable support in the data. Norman argues that this fit occurs because the subjects, whether medical students, residents, or more experienced physicians, figure out how to adapt to the demands of the task, and these demands usually ask them to behave in ways that provide evidence for the models. This view owes much to Rosenthal's research on demand characteristics.11 Within the domain of cognitive studies of medical reasoning, I am not aware of studies that test the fits of different cognitive models to the same set of data, so we do not know which would fit the data best or how often each model is used. Studies to test competing models can and should be designed.
Several prominent investigators in the field of medical cognition have used verbal reports of subjects thinking aloud either while solving a diagnostic problem or retrospectively to construct representations of the problem-solving process. Norman notes that “propositional networks are disturbingly idiosyncratic and not apparently reproducible.”1 I cannot entirely endorse his view that “all of these concept architectures are produced on the fly at retrieval, in order to satisfy the expectations of the researcher.”1 It is at least plausible that these “architectures,” like other blueprints, are plans for a constructive process: if one follows a blueprint and a house or office building results, we should not be surprised. The plan was designed to lead to that output.
Still, his caution is warranted. We should not unhesitatingly embrace verbal reports as the solution to the problem of elucidating cognitive processes. Too much cognitive processing goes on beneath the level of verbal report. And we agree that, to the extent that subjects adapt to the demands of the experimenter, they are likely to tell us what they think we want to hear. These objections imply that research that relies on verbal reports for basic data is not as likely to lead to “truth” as we would like to believe, and that we should move away from thinking-aloud methods back to traditional experimental psychology: the researcher should observe the relationship between the stimulus and the subject's response, and ignore or distrust verbalizations about the task. The subject's response may be verbal, such as a diagnosis or a probability estimate, but a scientific explanation of the thought process should not be based on responses to such questions as “How did you know that?” or “Why do you think this is so?” If these questions are used, we should treat the explanations and justifications as data, not as true accounts of the operations of the subjects' minds.
Both Norman and I have taken these cautionary thoughts to heart over the years. Consequently, we have moved away from thinking-aloud accounts as the primary data source and toward more traditional experimental methods (for examples, see references 12 and 13). We have done this despite knowing that experimental studies will be criticized by clinicians on the grounds that they lack clinical verisimilitude and may not generalize to real clinical settings. A thoughtful clinician will surely ask this question about our work: “Even if I concede that physicians behave as you have shown in this experimental setting, what reason is there to believe that they would behave similarly when dealing with real patients?” Anticipating this question, Norman and his colleagues have worked extensively with visual stimuli, such as radiographs and ECG tracings, that are unquestionably part of the real clinical world.14,15 But this strategy raises the question, “Do the results apply to non-visual stimuli, such as are obtained in taking a good history?” My colleagues and I have done some research using case vignettes16,17,18 that does not use thinking aloud to study clinical reasoning. One objection raised to our findings in those studies relates to motivational factors: clinicians are not motivated to do their best with hypothetical cases and would do “better” with real patients. I think it unlikely that clinical problem solving will be better in complex environments, with many distractions, than in simplified laboratory settings, but I concede that, just as with pharmaceutical research, laboratory findings should be verified in the “real world.” Nobody ever said that doing good research would be easy.
Norman noted that my own research program moved in a different direction after 1980, from a focus on clinical “problem solving” to “decision making.” What is the difference? For over two decades, much of the research on the psychology of decision making has been dominated by statistical decision theory, a model of idealized rationality under uncertainty. Behavioral decision research has concentrated on identifying systematic departures from this model, and these departures are viewed as “errors.” The research has shown that while decision theory may be an account of ideal rationality, it is not a description of how people actually make judgments and choices under uncertainty. In short, limited rationality has its impact on both decision making and problem solving. The psychological processes that produce these errors are called “heuristics and biases.” Indeed, the entire line of research has come to be identified by this term.19,20
Norman argues that there is not much point in identifying cognitive heuristics and biases that violate the rules of statistical decision theory, since people are not trying to reach conclusions using these principles. To quote: “An evident implication is that there is little to be gained in demonstrating that humans are suboptimal Bayesians or algorithm-appliers; they are suboptimal because they are using a substantially different basis for computation.”1
In my judgment, Norman has misunderstood the research agenda of decision psychology and its implications for medical education. The study of clinical diagnostic reasoning from the problem-solving point of view implies one thinks of diagnosis as categorization. The research questions then center around issues such as, “What categories does the problem solver know? What features justify placing the case in one category or another?” These are questions about the knowledge base and feature recognition and interpretation. From the decision-making standpoint, clinical diagnosis is opinion revision with imperfect information, and treatment choice is about how best to balance benefits and harms. Risk and uncertainty are everywhere. The aims of the research are to identify the processes people use in making complex judgments and choices under these conditions, and to ask whether their behaviors are consistent with Bayes' theorem (for diagnostic reasoning) and maximizing expected utility (for treatment choices). If behavior is not consistent with these principles, and if we find these principles sensible and appealing, we might well wonder what kind of educational program could be developed to improve our decision making. Therefore, there is just as much point to studying cognitive heuristics and biases as there is to studying the roles of instances and prototypes in categorization. Indeed, the role of instances in categorization can be seen as a special case of base-rate neglect or of treating irrelevant data as strong evidence: in reality, some of the cues associated with the instance have likelihood ratios close to 1.0 (the decision-theoretic definition of irrelevant), but are treated as if they are meaningful, say >10.0. Using two very different theoretical frameworks, both of us have thrown some light on how the mind works, and we have shown that human inference can be improved upon. 
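The decision-theoretic point about instances and irrelevant cues can be made concrete. In the odds form of Bayes' theorem, posterior odds equal prior odds multiplied by the likelihood ratios of the observed cues; a cue whose likelihood ratio is near 1.0 should leave the diagnostic probability essentially unchanged, while treating that same cue as if its likelihood ratio were 10.0 inflates the probability dramatically. The sketch below illustrates this with hypothetical numbers (the 10% prior and the specific ratios are illustrative assumptions, not values from the studies cited):

```python
def revise_odds(prior_prob, likelihood_ratios):
    """Bayesian opinion revision in odds form:
    posterior odds = prior odds x product of the cues' likelihood ratios.
    Returns the posterior probability."""
    odds = prior_prob / (1.0 - prior_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)  # convert odds back to a probability

# Hypothetical disease with a 10% prior (pretest) probability.
# An irrelevant cue (likelihood ratio ~1.0) should leave it at 10%:
p_irrelevant = revise_odds(0.10, [1.0])   # stays at 0.10

# Treating the same cue as strong evidence (likelihood ratio 10.0)
# pushes the probability above 50% -- the inflation described above:
p_inflated = revise_odds(0.10, [10.0])

print(p_irrelevant, p_inflated)
```

The contrast between the two outputs is the quantitative sense in which over-weighting an instance-specific cue amounts to treating irrelevant data as strong evidence.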
To improve clinical decision making, it seems to me that decision theory is at least as promising as the study of categorization processes. I still think that a general strategy applicable to a wide range of clinical situations would be very useful in helping people to think straight. Norman referred to the finding of content specificity, discovered in my early research in this area.8 Given this fact, the need for a general approach to sound thinking is even greater than we had previously suspected.
What is the evidence that clinicians at times need help in thinking about complex problems? Two related bodies of evidence, from cognitive psychology and from health services research, support this claim. From cognitive psychology, we have a series of lessons and findings about limited rationality. Health services researchers have provided a growing body of literature on practice variation (for example, see references 21 and 22), which has repeatedly shown that something besides hard science is involved in many medical decisions, both diagnostic and therapeutic, and that these variations are not necessarily rational responses to differences between patients.
How Can We Improve the Quality of Clinical Practice?
Interestingly, in the past 20 years, two related decision technologies have arisen that deal precisely with these issues: evidence-based medicine (EBM) and decision analysis (DA). Both offer to the medical community ways of quantifying the evidence, dealing with uncertainty and error in the evidence, and systematically weighing the risks and benefits of alternative treatment strategies. The rapid dissemination of these principles may be attributed in part to the diligence and enthusiasm of their devotees, but it cannot be entirely explained by their efforts. The zeitgeist or cultural climate had to be ready. In my view, psychological research on problem solving and decision making has contributed to these developments by showing that expert clinical judgment was not as expert as we had believed it to be, that knowledge transfer was more limited than we had hoped it would be, and that judgmental errors were neither limited to medical students nor eradicated by experience. EBM and DA offer approaches for dealing with these problems, and that is why they are making headway in clinical medicine. Clinical practice guidelines, which are intended to improve the overall quality of care, are another, related, approach to these issues, and the problems encountered in their dissemination and implementation have been widely discussed.23,24,25,26
The reactions to these approaches suggest that the tension between theory and practice will remain. All theories and models are simplifications of reality. They abstract particular features in order to provide a reasonably coherent account of how things work and to guide action. That is precisely why they are useful. Models are not reality, however, and theory is not practice. Consequently, physicians often mistrust the adequacy of scientific accounts or guidelines based on evidence, despite the necessity of relying upon them. Because general principles will never be able to account for all concerns in clinical cases, there will always be room for judgment, applying general principles on a case-by-case basis.
Encomium: Let Us Now Praise…
Geoff left his comments about the connection to Michigan State University (MSU) and its College of Human Medicine for the close of his remarks, and I follow his example. How fortunate that many years ago, Geoff Norman came to MSU and joined our small group of scholars. We were not aware that we were doing classic work that would be argued and discussed and revisited for a generation. Who could possibly have thought that? Yet, if there was ever a golden era of research in medical education, it was there and then. We have made some progress, and we have had a wonderful run. When I think of that medical school and its faculty and students back in the 70s, the wonderful line from Shakespeare's Henry V always comes to mind: “We few, we happy few, we band of brothers.” How appropriate that in Jack Maatsch's memory we have come together to discuss some issues that concerned him and to celebrate that happy band!
1. Norman GR. The epistemology of clinical reasoning: perspectives from philosophy, psychology, and neuroscience. Acad Med. 2000;75(10 suppl):S127–S133.
2. Brooks LR, Norman GR, Allen SW. Role of specific similarity in a medical diagnostic task. J Exp Psychol Gen. 1991;120:278–87.
3. Bordage G, Zacks R. The structure of medical knowledge in the memories of medical students and general practitioners: categories and prototypes. Med Educ. 1984;18:406–16.
4. Bordage G, Lemieux M. Semantic structures and diagnostic thinking of experts and novices. Acad Med. 1991;66(9 suppl):S70–S72.
5. Patel VL, Evans DA, Groen GJ. Biomedical knowledge and clinical reasoning. In: Evans DA, Patel VL (eds). Cognitive Science in Medicine. Cambridge, MA: MIT Press, 1989.
6. Patel VL, Evans DA, Kaufman DR. A cognitive framework for doctor-patient interaction. In: Evans D, Patel V (eds). Cognitive Science in Medicine. Cambridge, MA: MIT Press, 1989:257–312.
7. Patel VL, Groen G. Knowledge-based solution strategies in medical reasoning. Cogn Sci. 1986;10:91–116.
8. Elstein AS, Shulman LS, Sprafka SA. Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, MA: Harvard University Press, 1978.
9. Norman GR, Schmidt HG. The psychological basis of problem-based learning: a review of the evidence. Acad Med. 1992;67:557–65.
10. Newell A, Simon HA. Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall, 1972.
11. Rosenthal R. Experimenter Effects in Behavioral Research. New York: Appleton-Century-Crofts, 1966.
12. Hatala R, Norman GR, Brooks LR. Influence of a single example on subsequent electrocardiogram interpretation. Teach Learn Med. 1999;11:110–7.
13. Elstein AS, Christensen C, Cottrell JJ, Polson A, Ng M. Effects of prognosis, perceived benefit and decision style upon decision making in critical care. Crit Care Med. 1999;27:58–65.
14. Norman GR, Brooks LR, Coblentz CL, Babcook CJ. The correlation of feature identification and category judgments in diagnostic radiology. Mem Cogn. 1992;20:344–55.
15. Regehr G, Cline J, Norman GR, Brooks L. Effects of processing strategy on diagnostic skill in dermatology. Acad Med. 1994;69:S34–S36.
16. Christensen C, Heckerling PS, Mackesy-Amiti ME, Bernstein LM, Elstein AS. Pervasiveness of framing effects among physicians and medical students. J Behav Decis Making. 1995;8:169–80.
17. Bergus G, Chapman GB, Gjerde C, Elstein AS. Clinical reasoning about new symptoms in the face of pre-existing disease: sources of error and order effects. J Fam Pract. 1995;27:314–20.
18. Chapman GB, Bergus GR, Elstein AS. Order of information affects clinical judgment. J Behav Decis Making. 1996;9:201–11.
19. Elstein AS. Heuristics and biases: selected errors in clinical reasoning. Acad Med. 1999;74:791–4.
20. Kahneman D, Slovic P, Tversky A (eds). Judgment Under Uncertainty: Heuristics and Biases. New York: Cambridge University Press, 1982.
21. Vehvilainen AT, Kumpusalo EA, Takala JK. They call it stormy Monday—reasons for referral from primary to secondary care according to the days of the week. Br J Gen Pract. 1999;49:909–11.
22. Lim LL, Heller RF, O'Connell RL, D'Este K. Stated and actual management of acute myocardial infarction among different specialties. Med J Aust. 2000;172:208–12.
23. Greco PJ, Eisenberg JM. Changing physicians' practices. N Engl J Med. 1993;329:1271–4.
24. Asch DA, Hershey JC. Why some health policies don't make sense at the bedside. Ann Intern Med. 1995;122:846–50.
25. Cabana MD, Rand CS, Powe NR, et al. Why don't physicians follow clinical practice guidelines? A framework for improvement. JAMA. 1999;282:1458–65.
26. Poses RM. One size does not fit all: questions to answer before intervening to change physician behavior. Joint Comm J Qual Improvement. 1999;25:486–95.
Research in Medical Education: Proceedings of the Thirty-ninth Annual Conference. October 30 - November 1, 2000. Chair: Beth Dawson. Editor: M. Brownell Anderson. Foreword by Beth Dawson, PhD.