To Think Is Good: Querying an Initial Hypothesis Reduces Diagnostic Error in Medical Students

Coderre, Sylvain MD, MSc; Wright, Bruce MD; McLaughlin, Kevin MB ChB, PhD

doi: 10.1097/ACM.0b013e3181e1b229
Clinical Reasoning

Purpose Most diagnostic errors involve faulty diagnostic reasoning. Consequently, the authors assessed the effect of querying initial hypotheses on diagnostic performance.

Method In 2007, the authors randomly assigned 67 first-year medical students from the University of Calgary to two groups and asked them to diagnose eight common problems. The authors presented the same primary data to both groups and asked students for their initial diagnosis. Then, after presenting secondary data that were either discordant or concordant with the primary data, they asked students for a final diagnosis. The authors noted changes in students' diagnoses and the accuracy of initial and final diagnoses for discordant and concordant cases.

Results For concordant cases, students retained 84.2% of their initial diagnoses and, when they changed a diagnosis, were as likely to move to a correct as to an incorrect final diagnosis (6.9% versus 8.9%, P = .3); the accuracy of initial and final diagnoses did not differ: 85.9% versus 84.0% (P = .4). By contrast, for discordant cases, students retained only 23.3% of initial diagnoses, change was almost invariably from incorrect to correct (76.3% versus 0.4%, P < .001), and final diagnoses were more accurate than initial diagnoses: 80.7% versus 4.8% (P < .001). Overall, the accuracy of final diagnoses did not differ between concordant and discordant cases (P = .18).

Conclusions These data suggest that querying an initial diagnostic hypothesis does not harm a correct diagnosis but instead allows students to rectify an incorrect diagnosis. Whether querying initial diagnoses reduces diagnostic error in clinical practice remains unknown.

Dr. Coderre is assistant dean, Undergraduate Medical Education, University of Calgary, Calgary, Alberta, Canada.

Dr. Wright is associate dean, Undergraduate Medical Education, University of Calgary, Calgary, Alberta, Canada.

Dr. McLaughlin is assistant dean, Undergraduate Medical Education, University of Calgary, Calgary, Alberta, Canada.

Correspondence should be addressed to Dr. McLaughlin, Office of Undergraduate Medical Education, University of Calgary, Health Sciences Centre, 3330 Hospital Drive NW, Calgary, Alberta, Canada T2N 4N1; telephone: (403) 220-4252; e-mail:

Despite impressive advances in medical science, physicians still burden 15% of patients with the wrong diagnosis.1–3 Multiple factors lead to diagnostic error, but three-quarters of all cases involve faulty reasoning; premature closure—defined as “failure to consider other possibilities once an initial diagnosis has been reached”—is the most persistent error type.3,4

When physicians are diagnosing patients, two cognitive processes may be operating.5,6 The first is subconscious and is variably referred to as automatic information processing, nonanalytic reasoning, or pattern recognition. This rapid process focuses primarily on the presenting complaint and contextual information—age, gender, clinical setting, disease-specific risk factors, etc.—and generates a diagnostic hypothesis based on similarity to previously encountered cases.7 The second is analytic information processing, through which the brain consciously examines the findings of the case and considers a list of diagnoses that might explain these findings.8 As physicians accrue clinical experience, automatic processing predominates. Indeed, some consider rapid diagnosis without ruminating on possible alternatives a hallmark of expertise.7,9,10 Interestingly, the process through which physicians use heuristics to immediately and confidently reach the right diagnosis is often referred to as “expertise,” but when they reach the wrong diagnosis it becomes premature closure!11–13 And automatic processing is not unique to experts—even novice learners frequently solve clinical problems using pattern recognition alone, albeit with less success than experts.14

Intuitively, one would expect performance to improve with experience. Certainly, some physicians do continue to improve throughout their careers, truly earning their “expert” designations. But the performance of many, if not most, physicians remains stable or declines over time.15 Ericsson16 argues that automatic processing may be partly to blame: rather than conferring expertise, it may actually retard the attainment of expertise. He describes the typical trajectory whereby a physician quickly attains a satisfactory level of performance and then maintains that level with decreasing effort over time, until the action becomes automatic. In the four-stages-of-competence model, this represents the stage of “unconscious competence.”17 But once any skill, including diagnosing, becomes automatic, the action is no longer under conscious control and is inaccessible to deliberate improvement. Ericsson and others argue that experts prevent their actions from becoming fully automated, thus retaining the ability to reflect on and change their actions.16,18 Others have suggested that this expertise is the fifth stage of competence, that is, “reflective competence.”17 Similarly, having studied experts in several fields, Klein19 concludes that experts make “recognition-primed” decisions, through which they examine their initial selection by mental simulation (or metacognition), considering and reconsidering the consequences of their actions.

As medical educators, we strive to mold the cognitive processes of our students, so how should we handle this double-edged sword of automatic processing? Fearing premature closure, we could abandon it altogether, but prior research has suggested that performance declines when participants try to diagnose without forming an initial hypothesis.20 Alternatively, we could try to improve automatic processing, although this might prove challenging because it is inaccessible to introspection. The third option is to recognize premature closure as a lack of inquiry and to encourage students to reflect on and interrogate their diagnoses; after all, this is what expert physicians do.

Instinctively, medical educators may believe that processing more information and considering alternative diagnostic possibilities should improve diagnostic performance; however, the data on this subject are murky. Whereas Ark and colleagues21,22 observed improved performance using both automatic and analytic information processing, Coderre and colleagues14 found that when novices diagnosed using pattern recognition alone, their performance was at least as good as when they used analytic processing; still others have documented reduced performance when processing additional information.23–25 Recognizing the complexity of this topic, we tried, with this study, to answer a simple question: When a student has already generated an initial hypothesis, does querying this hypothesis through processing of additional data improve diagnostic performance? We considered two scenarios: (1) beginning with a correct initial hypothesis and (2) beginning with an incorrect initial hypothesis. We predicted that querying would preserve an initially correct hypothesis and improve an initially incorrect one.

Method

Participants

Our participants were first-year medical students at the University of Calgary (Calgary, Alberta, Canada). We have a three-year undergraduate program, the first year of which includes combined systems courses in gastroenterology, hematology, cardiology, and respirology. In 2007, we invited all first-year students from the graduating class of 2009 (n = 142) to participate in our study after they completed these courses. We informed students of the purpose of our study (although not our hypotheses) and obtained consent from each student prior to enrollment, and the research ethics board at the University of Calgary fully approved our study protocol. The Office of Undergraduate Medical Education provided funding to allow us to donate Can$30 for each participating student to the class of 2009 graduation fund.

Study content and design

For our content, we chose four common clinical presentations that our students had previously encountered in their first-year courses: chest pain, jaundice/abnormal liver enzymes, dyspnea, and anemia. We created two different cases for each clinical presentation. For each case, we had a single set of “primary data” (see description below) and two versions of the “secondary” dataset: one concordant and one discordant. In concordant cases, the primary and secondary data (see description below) supported the same diagnosis, so the initial diagnostic hypothesis and final diagnosis should be the same. Conversely, the primary and secondary data supported different diagnoses in discordant cases, and for each of these the secondary data supported the correct final diagnosis. Although there was considerable overlap between discordant and concordant secondary data, they differed with respect to critical data elements. This is illustrated by the case in Figure 1, in which the patient has risk factors for both primary sclerosing cholangitis and viral hepatitis. In the concordant version, the cholestatic pattern of liver function abnormality makes primary sclerosing cholangitis the most likely diagnosis, whereas the hepatitic pattern in the discordant case is inconsistent with this diagnosis, making viral hepatitis more likely.

Figure 1

When participants enrolled, we gave them each a sequential study number that we had randomly allocated to one of two groups. The groups differed only in the version of each case that they received: If Group 1 received the discordant version of Case 1, Group 2 received the concordant version of the same case. We kept the same sequence of cases throughout, so that both groups completed all eight cases in the same order, and each completed four discordant and four concordant cases.
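
The allocation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names, the balanced split between groups, and the choice of which cases Group 1 sees in discordant form are all assumptions, since the text specifies only that the two groups received opposite versions of each case and that each group completed four discordant and four concordant cases in the same order.

```python
import random

# Eight cases, presented to everyone in the same fixed order.
CASES = list(range(1, 9))


def build_version_maps(discordant_for_group1):
    """Return the case version seen by each group.

    `discordant_for_group1` is a hypothetical choice of the four cases that
    Group 1 receives in discordant form; Group 2 receives the mirror image.
    """
    group1 = {
        case: "discordant" if case in discordant_for_group1 else "concordant"
        for case in CASES
    }
    group2 = {
        case: "concordant" if version == "discordant" else "discordant"
        for case, version in group1.items()
    }
    return {1: group1, 2: group2}


def allocate_study_numbers(n_numbers, seed=None):
    """Randomly pre-allocate sequential study numbers 1..n to group 1 or 2.

    A balanced split is assumed here; the source does not say whether the
    allocation was balanced.
    """
    rng = random.Random(seed)
    groups = [1 + (i % 2) for i in range(n_numbers)]
    rng.shuffle(groups)
    return dict(zip(range(1, n_numbers + 1), groups))
```

For example, `build_version_maps({1, 3, 5, 7})` gives Group 1 the discordant version of the odd-numbered cases and Group 2 the discordant version of the even-numbered cases, so every case is seen in both forms across the cohort.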

We began each case by presenting only the primary dataset and asking for an initial diagnostic hypothesis. We then provided the secondary dataset and instructed our participants to consider these additional data before deciding on their final diagnosis. We did not restrict time; instead, our participants could take as much time as needed to complete each case. Figure 1 illustrates our study design.

Primary and secondary data

For each case, the primary dataset comprised the presenting complaint (e.g., a symptom, sign, or abnormal test result) and the patient's age and gender, any enabling conditions, and the clinical setting. Typically, physicians process these data automatically to generate an initial diagnostic hypothesis.9,26,27 Secondary datasets comprised the features classically used in analytic processing and include all relevant positive and negative historical and examination findings, as well as the results of any further investigations (laboratory tests, chest X-ray reports, EKG reports, etc.).8 Figure 1 shows the primary and secondary datasets for Case 1.

Evaluation of diagnoses

Physicians with domain expertise initially prepared the eight cases we used in this study. We then discussed in detail whether the data for each case were consistent with the final diagnosis, and we revised the data until we concurred that these supported a single best diagnosis for each case. Subsequently, when rating our participants' performance, we assigned a score of 1 if a diagnosis agreed with our single best diagnosis and a score of 0 if it did not.
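
The scoring rule, together with the three kinds of change reported in the results (retained, changed to correct, changed to incorrect), can be expressed as a short Python sketch. The function names and the string normalization are assumptions made for illustration; in the study the authors rated the diagnoses themselves, and the text does not describe how wording variants were handled.

```python
def normalize(diagnosis):
    """Hypothetical normalization: ignore case and surrounding whitespace."""
    return diagnosis.strip().lower()


def score(diagnosis, best_diagnosis):
    """Return 1 if the diagnosis agrees with the single best diagnosis, else 0."""
    return 1 if normalize(diagnosis) == normalize(best_diagnosis) else 0


def classify_change(initial, final, best_diagnosis):
    """Label one response as retained, changed_to_correct, or changed_to_incorrect."""
    if normalize(initial) == normalize(final):
        return "retained"
    if score(final, best_diagnosis) == 1:
        return "changed_to_correct"
    return "changed_to_incorrect"
```

Using the discordant version of the case in Figure 1 as an example, a student who starts with primary sclerosing cholangitis and finishes with viral hepatitis scores 0 on the initial diagnosis and 1 on the final diagnosis, and the response is labeled `changed_to_correct`.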

Statistical analyses

We used descriptive statistics to convey the proportion of initial and final diagnoses that corresponded to the single best diagnosis for concordant and discordant cases. We used a Fisher exact test to compare these proportions between concordant and discordant cases, and McNemar's discordant pair analysis to compare the direction of change of diagnoses. For all our analyses we used Stata version 8.0 (College Station, Texas).
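
For readers unfamiliar with the two tests, the following Python sketch shows one way to compute them using only the standard library. It is an illustration under stated assumptions rather than a reproduction of the Stata analysis: the exact (binomial) form of McNemar's test and the "sum of tables no more probable than the observed one" rule for the two-sided Fisher test are choices made here, and any counts passed to these functions in this sketch are hypothetical, because the article reports percentages rather than raw counts.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test on the two discordant-pair counts.

    b: pairs that changed in one direction (e.g., incorrect to correct)
    c: pairs that changed in the other direction (e.g., correct to incorrect)
    Under the null hypothesis each discordant pair is equally likely to go
    either way, so min(b, c) follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

In the design described here, McNemar's test would take the numbers of responses that changed to correct and to incorrect within one case type, and the Fisher test would take a table of correct and incorrect diagnoses for concordant versus discordant cases.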

Results

From our intended sample of 142, 67 students (47%) completed the study. For concordant cases, students retained 84.2% of their initial diagnoses, and when diagnoses were changed they were equally likely to be replaced by a correct or incorrect final diagnosis (6.9% versus 8.9%, respectively, P = .3). We recorded no difference in the accuracy of students' initial and final diagnoses: 85.9% (95% confidence interval [CI] 82.8–88.8) versus 84.0% (CI 80.7–87.2), respectively (P = .4). These data are shown in Figure 2.

Figure 2

By contrast, for the discordant cases, students retained only 23.3% of their initial diagnoses, and change was almost invariably from an incorrect to a correct diagnosis (76.3% versus 0.4% changing from correct to incorrect, P < .001). For these cases, final diagnoses were more accurate than initial diagnoses: 80.7% (CI 77.0–84.2) versus 4.8% (CI 3.0–6.7), respectively (P < .001) (see Figure 2).

Overall, there was no difference in the accuracy of the final diagnosis for concordant and discordant cases (84.0% versus 80.7%, P = .18).
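
The reported percentages can be cross-checked against one another with a little arithmetic. The sketch below assumes that every response that changed to an incorrect diagnosis started from a correct one, so that the net change in accuracy equals the percentage changed to correct minus the percentage changed to incorrect; the article does not state this explicitly for concordant cases. The variable and function names are illustrative only.

```python
# Percentages as reported in the results above.
reported = {
    "concordant": {"initial": 85.9, "final": 84.0, "to_correct": 6.9, "to_incorrect": 8.9},
    "discordant": {"initial": 4.8, "final": 80.7, "to_correct": 76.3, "to_incorrect": 0.4},
}


def implied_change(row):
    """Net accuracy change implied by the direction-of-change percentages."""
    return round(row["to_correct"] - row["to_incorrect"], 1)


def observed_change(row):
    """Net accuracy change computed from initial and final accuracy."""
    return round(row["final"] - row["initial"], 1)


for case_type, row in reported.items():
    print(case_type, implied_change(row), observed_change(row))
```

Under that assumption the two calculations should agree to within rounding of the published figures, which is a quick way to confirm that the retention, change, and accuracy percentages describe the same set of responses.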

Discussion

Diagnostic errors remain commonplace, but their declining incidence in recent years fuels optimism that doctors can minimize them, if not eliminate them altogether.2 Graber and colleagues3,4 propose a taxonomy for diagnostic error that not only explains why errors occur but also suggests who should be responsible for reducing them. So, while awaiting a health care system devoid of “system-related” errors, we as physicians and medical educators should address the “cognitive” causes of diagnostic error, among which premature closure is the perennial prime suspect.3,28

Recognizing that premature closure typically arises from overreliance on automatic processing, Croskerry6,29 advocates restoring diagnostic reasoning to consciousness, where it is open to reflection and intervention. Notably, making diagnostic reasoning a conscious process does not eliminate automatic processing; rather, doing so merely allows the application of further knowledge or clinical rules to challenge initial diagnostic hypotheses.30

Here, we sought to investigate the effectiveness of querying an initial hypothesis in reducing diagnostic error. Querying fits the description of a “generic cognitive strategy” designed to expose reasoning to introspection and manipulation.29 Acknowledging the equipoise in the literature, which suggests that analytic processing can both help and hinder diagnostic reasoning, we designed our study to observe the effect of querying both when the initial diagnostic hypothesis is correct and when it is incorrect. Overall, our results suggest that querying does not harm a correct initial diagnosis; most often, our students recognized their diagnosis as correct and retained it. But querying seemed to rescue students from an incorrect initial diagnosis. In this latter situation, most students recognized and abandoned their incorrect initial hypotheses, switched to the correct diagnosis, and, ultimately, achieved the same level of performance as they did in the cases when they started with a correct initial diagnosis.

Our findings may seem straightforward—querying offers a potentially large benefit to diagnostic performance with little associated risk—but they do differ from those of other studies. LeBlanc and colleagues31,32 found that students and residents tended to stick to their initial diagnostic hypothesis whether correct or incorrect. But, in these studies, participants identified clinical features and diagnosed patients from a photograph, and the clinical data were limited to “one to two lines,” which may have been insufficient to allow for effective querying of an initial hypothesis. Yet, more information is not necessarily better. Kulatunga-Moruzi and colleagues24 showed that obliging participants to process a comprehensive list of clinical features—both for and against the correct diagnosis—also reduced performance. Although seemingly contradictory, this finding is not surprising given the limited capacity of working memory.33 Consistent with Kulatunga-Moruzi's findings, others have also observed a decline in performance as information increased.23,25 Kulatunga-Moruzi and colleagues24 subsequently confirmed that processing fewer pieces of data, all supporting the correct diagnosis, improves performance. Thus, much of the discrepancy in the literature on the benefits, or otherwise, of querying may relate to the quantity and quality of the data provided to participants.

Recently, several groups have reported findings consistent with a gain in diagnostic performance associated with querying. Although their methodologies and terminologies differed, each group studied the effect of metacognition, whether in the form of “analytic processing” or “reflective practice,” on diagnostic performance. In two studies involving novices, Ark and colleagues21,22 showed that training how to diagnose using both automatic and analytic processing was superior to either alone. McLaughlin and colleagues34 found that residents retained all of their correct diagnoses and were able to correct two-thirds of their incorrect diagnoses when given the opportunity to query an initial hypothesis. Similarly, Mamede and colleagues35 and Eva and colleagues36 determined that combining automatic and analytic processes could improve diagnostic performance on, respectively, difficult cases or cases that were biased toward an incorrect diagnosis. In these two studies, as in ours, querying improved incorrect initial diagnostic hypotheses but did not significantly alter correct initial diagnostic hypotheses.

Our study has some important limitations. We considered only four clinical presentations, so our results may not be generalizable to other clinical domains. Unlike in typical psychology experiments, we did not control the amount of information that our participants processed (they could process as little or as much of the secondary data as they wished) or how long they took to provide their final diagnosis. We intentionally adopted this naturalistic approach to decision making because manipulating the amount of data the students processed might have introduced a performance bias24 and because the surest way to induce premature closure is to rush participants into diagnosing. Finally, we studied only one group of learners, first-year medical students, at only one university. What works for medical students at the University of Calgary may not work for experienced physicians24,37 or other learners elsewhere.

Conclusions

In this study, we have added to the literature on querying an initial diagnostic hypothesis as an effective strategy to reduce diagnostic error. Although contradictory studies exist, the balance of the literature, including this contribution, now seems to support metacognitive strategies as a way to improve diagnostic performance. However, whether questioning initial diagnoses will reduce diagnostic error in clinical practice remains unknown. In each of the studies that found improved performance when physicians or physicians-in-training queried a diagnostic hypothesis, the study protocol prompted them to do so. Can they do this spontaneously in the clinical setting? Also, studies thus far have been limited to novices, students, or residents. Experienced physicians make most final decisions; are these clinicians—who are unconsciously competent (and occasionally incompetent)—ready for metacognition?

Acknowledgments:

The authors would like to thank Shirley Marsh and the administrative staff from the Office of Undergraduate Medical Education at the University of Calgary (Calgary, Alberta, Canada) for their support with this study.

Funding/Support:

The Office of Undergraduate Medical Education at the University of Calgary (Calgary, Alberta, Canada) provided funding for this research.

Other disclosures:


Previous presentations:

The authors presented these results at the 2008 (August 30 to September 3) Association for Medical Education in Europe meeting in Prague, Czech Republic.

References

1 Elstein AS. Clinical reasoning in medicine. In: Higgs J, Jones MA, eds. Clinical Reasoning in the Health Professions. Boston, Mass: Butterworth-Heinemann; 1995:50–58.
2 Shojania KG, Burton EC, McDonald KM, Goldman L. Changes in rates of autopsy-detected diagnostic errors over time. JAMA. 2003;289:2849–2856.
3 Graber ML, Franklin N, Gordon R. Diagnostic error in internal medicine. Arch Intern Med. 2005;165:1493–1499.
4 Graber ML, Gordon R, Franklin N. Reducing diagnostic errors in medicine: What's the goal? Acad Med. 2002;77:981–992.
5 Sloman SA. The empirical case for two systems of reasoning. Psychol Bull. 1996;119:3–22.
6 Croskerry P. A universal model of diagnostic reasoning. Acad Med. 2009;84:1022–1028.
7 Norman GR, Brooks LR. The non-analytical basis of clinical reasoning. Adv Health Sci Educ. 1997;2:173–184.
8 Eva KW. What every teacher needs to know about clinical reasoning. Med Educ. 2005;39:98–106.
9 Custers EJ, Boshuizen HP, Schmidt HG. The influence of medical expertise, case typicality, and illness script component on case processing and disease probability estimates. Mem Cognit. 1996;24:384–399.
10 Eva KW, Norman GR. Heuristics and biases—A biased perspective on clinical reasoning. Med Educ. 2005;39:870–872.
11 Bordage G. Why did I miss the diagnosis? Some cognitive explanations and educational implications. Acad Med. 1999;74(10 suppl):S138–S143.
12 Berner ES, Graber ML. Overconfidence as a cause of diagnostic error in medicine. Am J Med. 2008;121(5 suppl):S2–S3.
13 Croskerry P, Norman G. Overconfidence in clinical decision making. Am J Med. 2008;121(5 suppl):S24–S29.
14 Coderre S, Mandin H, Harasym PH, Fick GH. Diagnostic reasoning strategies and diagnostic success. Med Educ. 2003;37:695–703.
15 Choudhry NK, Fletcher RH, Soumerai SB. Systematic review: The relationship between clinical experience and quality of health care. Ann Intern Med. 2006;142:260–273.
16 Ericsson KA. Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Acad Med. 2004;79(10 suppl):S70–S81.
17 Four stages of competence. Wikipedia. Available at: Accessed March 23, 2010.
18 Epstein RM. Mindful practice. JAMA. 1999;282:833–839.
19 Klein G. Sources of Power: How People Make Decisions. Cambridge, Mass: The MIT Press; 1998.
20 Norman GR, Brooks LR, Colle CL, Hatala RM. The benefit of diagnostic hypotheses in clinical reasoning: Experimental study of an instructional intervention for forward and backward reasoning. Cogn Instr. 1999;17:433–448.
21 Ark TK, Brooks LR, Eva KW. Giving learners the best of both worlds: Do clinical teachers need to guard against teaching pattern recognition to novices? Acad Med. 2006;81:405–409.
22 Ark TK, Brooks LR, Eva KW. The benefits of flexibility: The pedagogical value of instructions to adopt multifaceted diagnostic reasoning strategies. Med Educ. 2007;41:281–287.
23 Redelmeier DA, Shafir E. Medical decision making in situations that offer multiple alternatives. JAMA. 1995;273:302–305.
24 Kulatunga-Moruzi C, Brooks LR, Norman GR. Using comprehensive feature lists to bias medical diagnosis. J Exp Psychol Learn Mem Cogn. 2004;30:563–572.
25 Dijksterhuis A, Bos MW, Nordgren LF, van Baaren RB. On making the right choice: The deliberation-without-attention effect. Science. 2006;311:1005–1007.
26 Hobus PP, Schmidt HG, Boshuizen HP, Patel VL. Contextual factors in the activation of first diagnostic hypotheses: Expert–novice differences. Med Educ. 1987;21:471–476.
27 Schmidt HG, Norman GR, Boshuizen HP. A cognitive perspective on medical expertise: Theory and implication. Acad Med. 1990;65:611–621.
28 Kohn LT, Corrigan JM, Donaldson MS, eds; Committee on Quality of Health Care in America; Institute of Medicine. To Err Is Human: Building a Safer Health System. Washington, DC: National Academy Press; 2000.
29 Croskerry P. Cognitive forcing strategies in clinical decision making. Ann Emerg Med. 2003;41:110–120.
30 Rasmussen J, Jensen A. Mental procedures in real-life tasks: A case study of electronic trouble shooting. Ergonomics. 1974;17:293–307.
31 LeBlanc VR, Norman GR, Brooks LR. Effect of a diagnostic suggestion on diagnostic accuracy and interpretation of clinical features. Acad Med. 2001;76(10 suppl):S18–S20.
32 LeBlanc VR, Brooks LR, Norman GR. Believing is seeing: The influence of a diagnostic hypothesis on the interpretation of clinical features. Acad Med. 2002;77(10 suppl):S67–S69.
33 Miller GA. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychol Rev. 1956;63:81–97.
34 McLaughlin K, Heemskerk L, Herman R, Ainslie M, Rikers RM, Schmidt HG. Initial diagnostic hypotheses bias analytic information processing in non-visual domains. Med Educ. 2008;42:496–502.
35 Mamede S, Schmidt HG, Penaforte JC. Effects of reflective practice on the accuracy of medical diagnoses. Med Educ. 2008;42:468–475.
36 Eva KW, Hatala RM, LeBlanc VR, Brooks LR. Teaching from the clinical reasoning literature: Combined reasoning strategies help novice diagnosticians overcome misleading information. Med Educ. 2007;41:1152–1158.
37 Ayers P, Chandler P, Sweller J. The expertise reversal effect. Educ Psychol. 2003;38:23–31. Available at:∼chopin/references/tig/kayluga_ayres.pdf.pdf. Accessed March 23, 2010.
© 2010 Association of American Medical Colleges