The development of students’ diagnostic competence, a primary goal of medical education, remains a major challenge for every clinical teacher. Findings from research on physicians’ diagnostic reasoning1,2 have not yet been sufficient to establish a basic understanding of how clinical diagnosis should be taught. There is very little empirical evidence about effective strategies that clinical teachers might use to foster the development of students’ diagnostic reasoning.3,4 We therefore set out to study the effects of three different instructional strategies, used during practice with clinical cases, on students’ diagnostic competence.
Successfully reasoning to diagnose a clinical problem results from complex interactions between a clinician’s expertise and several context-related and case-related factors.2 Nevertheless, there is little doubt about the centrality of knowledge in the diagnostic process. Diagnostic performance depends on having in memory an extensive and well-organized knowledge base with a rich collection of mental representations of diseases.2,5 Particularly important are “illness scripts”: mental representations of a disease that combine the scenario of a patient with that disease, the relationships between its signs and symptoms, its causal mechanisms, and the conditions under which the disease is likely to occur.6,7 Expert physicians have in memory a rich collection of illness scripts, either as general representations of disease categories or as examples of previously seen patients.1,8 Such scripts are activated, usually early in the clinical encounter, by cues in a patient’s history, leading to the generation of diagnostic hypotheses, which are subsequently verified by matching additional information collected from the patient with the expected elements in the script.8,9 Accurate diagnostic performance therefore depends largely on the number and richness of the illness scripts a physician has stored in memory.
Students’ knowledge about diseases gradually takes the format of illness scripts during their clinical years as they repeatedly apply knowledge acquired in earlier stages of their training to solve clinical problems.7,8 To foster the development of illness scripts, clinical teachers are advised to ensure that their students practice with many examples of clinical problems displaying a broad variety of diseases in their different presentations.2,10 Such advice, however, may not be sufficient. Even if students are offered similar opportunities for practice, these learning experiences can lead to different outcomes depending on how students deal with the problem.2 Students may notice different features or think about different aspects of a problem and extract different insights from the problem-solving experience. Teaching strategies have been proposed to guide students’ practice with clinical cases so that they benefit more from it, but their effectiveness remains largely unknown.3,10 In clinical teaching, students are conventionally asked to consider alternative diagnoses in addition to the first hypothesis generated for a case. This may not be, however, the most effective approach to foster enrichment of illness scripts. In a recent study, students who followed a procedure for “structured reflection” on cases during practice learned more—that is, made better diagnoses when they encountered new examples of the same diseases in the future—than students who simply generated a differential diagnosis.11 By encouraging students to match a patient’s presentation to each diagnosis they considered for a case, reflection might have led to enriching students’ illness scripts for the disease, consequently enhancing their competence to diagnose new exemplars of the same disease in the future. If structured reflection during practice with clinical scenarios has such an effect, it would be a powerful tool for clinical teaching. 
However, to accept structured reflection as a suitable candidate for the toolbox of the clinical teacher, these findings need to be replicated and extended, and the mechanisms through which reflection acts on learning should be explored.
This study investigated whether reflection while practicing with clinical cases leads to better future performance on diagnosing not only new cases of the diseases practiced but also cases of “adjacent” diseases—that is, diseases that were not among the cases seen during practice but that are alternative diagnoses for them. If reflection indeed operates by enriching illness scripts, then one would expect this to affect scripts about such adjacent diseases as well, which would not happen if one merely thought of that alternative diagnosis during differential diagnosis but did not reflect on it. We investigated this question in an experimental study with fourth-year medical students. In the learning phase, participants diagnosed clinical cases by following different procedures depending on the condition to which they were assigned: generating a single diagnosis, generating a differential diagnosis, or structured reflection. No differences between the conditions in initial diagnostic performance were expected. In a test after one week, participants from all conditions were asked to diagnose different exemplars of the same diseases seen in the learning phase as well as cases of adjacent diseases. Our study differs from the prior study, which showed the benefits of reflection for learning,11 in two important ways. First, we only used a delayed test, not an immediate test; this was done to prevent any possible differential effects that initial testing might have had in the instructional conditions (see research on the “testing effect”).12 Second, the test in the prior study only required students to diagnose new examples of the same diseases that they had encountered in the learning phase, whereas in our study the test comprised new, adjacent diseases—that is, diseases that were not seen during the learning phase but might have been considered as differential diagnoses for the cases. 
This distinction is crucial because it allows for exploring the mechanisms through which reflection affects learning. We expected reflection to lead to restructuring of the mental representations of the diseases considered during the diagnostic process, and, therefore, we hypothesized that students who reflected while diagnosing the cases during the learning phase would outperform students from the other conditions in the diagnostic test one week later, not only on the different cases of the diseases encountered in the learning phase but also on the new, adjacent diseases.
The study was an experiment consisting of two stages: a learning phase, which consisted of practicing with diagnosing clinical cases by following different instructions depending on the condition to which the participant was assigned; and a diagnostic test, administered one week later. Participants’ previous experience with the diseases present in the test was evaluated before the study.
Ethical approval for the study was provided by the research ethics committee of the UNIFENAS Medical School.
The participants were fourth-year medical students at the UNIFENAS Medical School, Belo Horizonte, Brazil. The school has a six-year, problem-based learning curriculum, with clerkships in the two final years. There are two entries per year, in February and August, with the academic year structured in two terms of one semester each. The study took place in February and August 2012, with the students who had entered the seventh semester of the program in those months. Because the initial analyses showed no differences between the two groups of students, the data were aggregated, and the analyses proceeded with a single group. Students from the seventh semester of the program were considered eligible for the study because, at this point in the curriculum, they have already been exposed, in tutorial groups and other didactic activities, to knowledge about the to-be-tested diseases but not yet to many patients with these diagnoses. They were not expected, therefore, to have well-developed illness scripts of the diseases. All 180 eligible students were invited to voluntarily participate in the study, which took place during a regular educational activity. Written consent was obtained from all volunteers, who did not receive any compensation.
The study used two sets of written clinical cases for the learning phase and the test after one week (List 1). Each case consisted of a short description of a patient’s medical history, symptoms, findings from physical examination, and lab tests (see Appendix 1 for an example of a case). Three board-certified internists (A.M., R.F., J.P.) first independently prepared the cases. Subsequently, we jointly reviewed each case until a consensus was reached that the clinical presentation supported one single most likely diagnosis, which was considered the correct response for the case.
The study comprised three sessions with the participants, conducted in three subsequent weeks. In the first session, we invited eligible students to participate in the study, and those who agreed completed the written consent form, a background information form, and the tool for assessment of previous clinical experience (see below). One week later, the learning phase took place, followed by the test in the subsequent week (see diagram of the study in Figure 1).
Assessment of previous clinical experience.
One week before the learning phase, participants self-reported their clinical experience with each of the diseases they would encounter in this study, which were embedded in a longer list to avoid specific priming. They did so by using a five-point scale ranging from 1 (“I never saw a clinical case of this disease”) to 5 (“extensive, I have seen several clinical cases of this disease”).
The learning phase aimed at simulating practice with clinical cases. It consisted of diagnosing seven clinical cases: four cases of criterion diseases (two cases of acute myocardial infarction, two cases of choledocholithiasis) and three filler cases. The cases were presented in a booklet, one per page, in random order.
Participants were randomly assigned to one of three experimental conditions that differed regarding the instructions to be followed to diagnose the cases. In the single-diagnosis condition, we asked 36 participants to read the case and write down the most likely diagnosis for the case, trying to be as fast as possible but without compromising accuracy. They were asked to solve a word puzzle after having diagnosed the case, a procedure that has been used to reduce the chance that participants would engage in reflection about the case just seen,11,13 and that is required to allow for a distinction between the experimental conditions. We also asked 35 students in the differential-diagnosis condition to read the case and write down the most likely diagnosis. Subsequently, they were asked to think of alternative diagnoses they would consider if their initial diagnosis turned out to be incorrect, write them down, and then decide on the final diagnosis. After that, they were requested to move to the next page to solve a word puzzle. In the reflection condition, 39 students were also asked to read the case and write down its most likely diagnosis. They were then requested to follow a structured procedure to reflect on the case: (1) list the findings in the case that support the initial diagnosis; (2) list the findings that speak against this diagnosis; (3) list the findings that would be expected to be present if the diagnosis were true but were absent in the case; (4) list alternative diagnoses they would consider if the initial diagnosis proved to be incorrect; and (5) follow the same procedure (steps 1–3) for each alternative diagnosis. Finally, they were asked to draw a conclusion by ranking the diagnoses in order of likelihood and selecting the most likely diagnosis for the case. We gave no information about the correct diagnoses or about participants’ performance during the learning phase. Feedback was provided only after the test one week later.
We allocated a maximum of seven minutes to work on each case in all conditions, an amount of time that has been shown to be sufficient in a previous study.11 If participants diagnosed the case in less time, they could work on the puzzle until the time was up. The researcher controlled the time and informed the participants when they could move to the next case.
Diagnostic performance test.
The test, which we administered one week later, consisted of a new set of nine clinical cases (see List 1): one novel exemplar of each criterion disease seen in the learning phase (i.e., acute myocardial infarction and choledocholithiasis), four cases of new diseases that were not among the cases studied in the learning phase but were plausible alternative diagnoses in patients with those presentations (i.e., stable angina and gastroesophageal reflux disease for the case of acute myocardial infarction, and acute viral hepatitis and hemolytic anemia for the case of choledocholithiasis), and three filler cases. The cases were presented in a booklet, one per page, in randomized order, and we asked participants to read each case and give the most likely diagnosis for the case. Time to diagnose each case was not restricted, but students were informed, before starting, that the whole session would last 40 minutes.
The diagnoses provided by the participants in the learning phase and on the test were evaluated as correct, partially correct, or incorrect, and scored as 1, 0.5, or 0, respectively. We considered a diagnosis correct whenever the core diagnosis of the case was provided (e.g., “hepatitis” in the case of acute viral hepatitis). When the core diagnosis was not mentioned but one component of the diagnosis was given, the diagnosis was evaluated as partially correct (e.g., “gallstones” in the case of choledocholithiasis). Diagnoses that did not fall into either of these categories were considered incorrect. We first transcribed the participants’ responses from the booklets to sheets of paper, and three internists (A.M., J.P., R.F.) then evaluated the responses for each case without being aware of the experimental condition under which the diagnoses had been given. We agreed on the scores attributed to 94% of the diagnoses and resolved discrepancies through subsequent discussion.
For each participant, we summed the scores obtained on the four cases of the criterion diseases in the learning phase and computed the mean diagnostic accuracy score (range 0–1) for each experimental condition. To obtain the overall performance in the test, we summed, for each participant, the scores obtained on all the cases (excluding the filler cases) in the test. Subsequently, we separately summed, for each participant, the scores obtained on the two new cases of the previously studied diseases (i.e., the new exemplars of acute myocardial infarction and choledocholithiasis) and on the four new diseases that had not been seen in the learning phase but were plausible alternatives (i.e., stable angina, gastroesophageal reflux disease, acute viral hepatitis, and hemolytic anemia) (see List 1). Mean diagnostic accuracy scores obtained on the two types of cases (i.e., previously studied diseases and new diseases) were computed for each experimental condition.
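The scoring rule described above can be sketched in a few lines of code. This is purely illustrative: in the study the scoring was done by the three internists, not automatically, and the function names and keyword matching below are our own assumptions for the sketch.

```python
# Illustrative sketch of the scoring scheme: 1 (correct), 0.5 (partially
# correct), or 0 (incorrect). Hypothetical keyword matching; the study's
# actual scoring was performed by three blinded internists.

def score_diagnosis(response: str, core: str, partial_terms: set) -> float:
    """Score one written diagnosis against the case's correct response."""
    response = response.lower()
    if core.lower() in response:
        return 1.0   # core diagnosis mentioned: correct
    if any(term in response for term in partial_terms):
        return 0.5   # only a component of the diagnosis mentioned
    return 0.0       # neither: incorrect

def mean_accuracy(scores: list) -> float:
    """Mean diagnostic accuracy (range 0-1) over a set of case scores."""
    return sum(scores) / len(scores)

# Example: for a choledocholithiasis case, "gallstones" counts as partial
s = [score_diagnosis("choledocholithiasis", "choledocholithiasis", {"gallstone"}),
     score_diagnosis("gallstones", "choledocholithiasis", {"gallstone"})]
print(mean_accuracy(s))  # 0.75
```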
Because the normality assumption was not met for our data, we used nonparametric tests. We performed Kruskal–Wallis tests (significance level .05) with experimental condition (single-diagnosis, differential-diagnosis, or structured reflection) as the between-subjects factor on the proportion of correct diagnoses made in the learning phase, to test the hypothesis that the groups would not differ in diagnostic performance in that phase. The same test was performed on the proportion of correct diagnoses obtained on all the cases in the test, on the proportion of correct diagnoses made on the previously studied diseases, and on the proportion of correct diagnoses made on the new diseases. These analyses tested the hypotheses that structured reflection would foster learning not only of the diseases practiced during the learning phase but also of their alternative diagnoses, thereby leading to better performance on both types of cases in the diagnostic test than that obtained by the other conditions. Whenever the Kruskal–Wallis test reached the significance level, we performed Mann–Whitney tests (one-tailed) for comparisons between the experimental conditions (Bonferroni correction applied; significance level set at .0167).
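For readers who wish to reproduce this kind of between-condition analysis, the Kruskal–Wallis H statistic can be computed as in the following sketch. In practice one would use a statistics package (e.g., scipy.stats.kruskal); this minimal pure-Python version omits the tie correction, and the accuracy scores shown are invented for illustration, not the study’s data.

```python
# Minimal sketch of the Kruskal-Wallis H statistic (no tie correction).
# Group data below are invented, NOT the study's results.

def ranks(values):
    """Midranks for a flat list (ties receive the average rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend over the run of tied values
        avg = (i + j) / 2 + 1            # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_h(groups):
    """Kruskal-Wallis H across the given groups of scores."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    r = ranks(pooled)
    h, pos = 0.0, 0
    for g in groups:
        rg = r[pos:pos + len(g)]         # ranks belonging to this group
        pos += len(g)
        h += len(g) * (sum(rg) / len(g) - (n + 1) / 2) ** 2
    return 12 / (n * (n + 1)) * h

# Invented mean accuracy scores for three hypothetical conditions
single = [0.25, 0.5, 0.5, 0.25]
differential = [0.5, 0.5, 0.75, 0.25]
reflection = [0.75, 1.0, 0.75, 1.0]
print(round(kruskal_h([single, differential, reflection]), 2))  # 6.96
```

The resulting H is compared against a chi-square distribution with (number of groups − 1) degrees of freedom; the pairwise follow-up comparisons would then use one-tailed Mann–Whitney tests with the Bonferroni-adjusted threshold of .0167, as described above.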
Of the initial 115 volunteers (115/180; response rate = 64%), 2 were excluded because of missing values, and 3 outliers were removed after the exploratory data analyses, leaving 110 participants (final response rate = 61%). Their mean age was 23.79 years (SD = 4.85), and 68 were female.
The self-reported experience with the to-be-tested diseases did not differ among conditions, H(2) = 1.03, P = .61.
Table 1 shows the mean diagnostic accuracy scores obtained by students from each experimental condition in the learning phase and in the test. The three groups did not significantly differ in diagnostic performance in the learning phase, H(2) = 5.32, P = .067, although the differential-diagnosis condition performed marginally better than the other two groups.
The instructional strategy adopted to solve the clinical cases during the learning phase, however, significantly affected overall diagnostic performance in the test, H(2) = 18.70, P < .001. The students who reflected on the cases during the learning phase outperformed those who provided a single diagnosis (U = 302, P < .001) and those who generated a differential diagnosis (U = 432.50, P = .003). No significant difference emerged between students from the single-diagnosis and the differential-diagnosis conditions (U = 532, P = .13).
When the analysis included only the novel exemplars of the previously studied diseases, there was also a significant effect of the strategy used during the learning phase on diagnostic performance, H(2) = 17.91, P < .001. Students from the reflection group performed better than the two other groups when diagnosing different cases of the same diseases that they had seen in the learning phase (reflection versus single-diagnosis: U = 323, P < .001; reflection versus differential-diagnosis: U = 490.50, P = .014). The difference in diagnostic performance between the single-diagnosis and the differential-diagnosis conditions did not reach significance (U = 477, P = .031).
The analysis including only the cases of new diseases—that is, diseases that were not among the cases solved during the learning phase but that would be reasonable alternative diagnoses for those cases—showed a significant effect of the strategy followed during the learning phase on diagnostic performance in the test, H(2) = 6.86, P = .03. Students who reflected on the cases during the learning phase performed better when diagnosing cases of new diseases than students who generated only a single diagnosis (U = 496, P = .01) or a differential diagnosis (U = 493.50, P = .015). Again, no significant difference emerged between the single-diagnosis and the differential-diagnosis conditions (U = 623, P = .47).
We compared the effects of three different strategies adopted during practice with clinical cases—structured reflection, providing a single diagnosis, or generating differential diagnoses—on learning to diagnose. Our findings show that structured reflection was more effective in fostering learning than simply providing a diagnosis or a differential diagnosis. Students who reflected on the cases in the learning phase performed better than students from the other two conditions in a diagnostic test after one week, despite a marginal difference in favor of the differential-diagnosis condition in the learning phase. The effect of reflection on diagnostic performance was substantial and appeared both on novel exemplars of the same diseases and on diseases that were not among the cases seen in the learning phase but that were plausible alternative diagnoses for them.
The learning phase did not involve any teaching about the to-be-diagnosed diseases, and feedback about performance was provided only after completion of the study. Because students were not exposed to any “new” knowledge, where might the observed effect of reflection come from? Our explanation is that reflecting while diagnosing the cases in the learning phase restructured mental representations of diseases that the students already had in mind. Reflection required students to match in detailed fashion the patient’s signs and symptoms with the illness script of the disease initially considered as a possible diagnosis, evaluating the degree to which findings were compatible with the presentation typically associated with that disease and identifying discrepancies. Subsequently, the same analysis was performed for alternative diagnoses. While comparing and contrasting the various possible illness scripts against the patient’s findings, students were likely to have noticed variations of typical presentations of a disease and/or to have identified critical features that discriminate between alternative diagnoses, ending up with enriched scripts that made them better equipped to diagnose the cases in the test. Similar effects of contrastive learning approaches have indeed been found in research in other domains.14
The suggestion that reflection helped by fostering the restructuring of illness scripts is supported by the positive effect of reflection on the diagnosis of diseases that were not among the cases studied in the learning phase but that are alternative diagnoses in patients with those presentations. If reflection had simply facilitated remembering previously studied diseases, the effect would have occurred only on the novel exemplars of the diseases presented in the learning phase. It seems that the distinctiveness of a practiced disease, newly acquired as a result of detailed reflection, also helps students distinguish between other diseases that share signs and symptoms with it. Note that the differential-diagnosis condition also involved consideration of alternative diagnoses during the learning phase, but this approach brought no benefits for learning to diagnose the diseases seen in the learning phase or their alternative diagnoses.
Strategies for fostering students’ diagnostic competence have been much discussed. The pertinent literature has focused on ways to improve diagnostic performance through designing more effective teaching,10,15,16 but these proposals are seldom accompanied by evaluation of their effectiveness.3,10 The few empirical studies usually consist of a learning phase, in which the diagnosis of a disease is taught, followed immediately by a test that measures performance on the diagnosis of novel exemplars of the same disease studied in the learning phase.17–20 To our knowledge, existing studies did not, therefore, investigate learning by measuring performance in diagnosing novel cases after a delay. Moreover, these studies exclusively investigated diagnostic performance on novel exemplars of the same diseases taught during the learning phase. We took a different approach, investigating the effects of different ways of practicing with clinical problems, without any added teaching, on learning the diagnosis not only of the studied diseases but also of their alternative diagnoses, as measured by future performance.
Providing students with opportunities to practice with many examples of patients’ problems in a broad variety of clinical presentations has been recognized as a primary requirement for fostering students’ diagnostic competence.2,3 Following this advice may prove challenging because, in many countries, the likelihood that students will encounter the necessary diversity of patients tends to decrease.2,21 Conventional practice with real patients, which obviously involves the development of additional competences, therefore needs to be complemented by practice with different formats of simulated clinical scenarios. For both types of practice, with real patients and with simulated scenarios, the instructional strategy of structured reflection that we have demonstrated can be relatively easily adapted and may prove a powerful additional tool for clinical teachers. It should be noted that the strategy proved beneficial even without any additional teaching or feedback (either verbal or written), which means it does not require much investment in faculty training, teaching hours, or preparation of study materials.
This study had some limitations. First, although the one-week interval between learning and test is commonly chosen in learning studies,12,22,23 we cannot yet ensure that the positive effects of reflection on learning would last longer. On the other hand, the use of the structured reflection procedure in real clinical teaching would probably comprise many more opportunities to practice than the limited practice (two cases for each disease) in our study, and such extended practice would tend to increase and sustain the effect of reflection. Second, in the learning phase, students in the reflection condition may have spent more time analyzing the cases than those in the other conditions, and we cannot exclude the possibility that differences in time on task affected the results, as spending more time is inherent to reflection. It should be noted, however, that this time was not spent studying new information, as none was provided, but consisted only of reflecting on the case on the basis of one’s preexisting knowledge. Finally, the study was conducted with students at a particular level of training, and we cannot ensure that students at other levels would also benefit from structured reflection; more advanced students would likely require more complex clinical cases for learning to be fostered, but this issue requires future research.
In sum, we investigated the effects of structured reflection on students’ diagnostic competence while practicing with clinical cases. Our findings suggest that reflection while diagnosing patients’ problems fosters the enrichment of relevant illness scripts and can be, therefore, a useful tool for clinical teachers to help students benefit most from practice with real or simulated scenarios. How to make this tool even more powerful requires further investigation. Possibly, combining reflection with example-based learning (i.e., studying modeling examples or worked examples), which is a powerful instructional technique24 that has not yet been widely researched in medical education,25,26 would be a promising avenue to be explored by future research.
Acknowledgments: The authors are grateful to the students who dedicated their time to participate in the study.
1. Norman G. Research in clinical reasoning: Past history and current trends. Med Educ. 2005;39:418–427
2. Eva KW. What every teacher needs to know about clinical reasoning. Med Educ. 2005;39:98–106
3. Kassirer JP. Teaching clinical reasoning: Case-based and coached. Acad Med. 2010;85:1118–1124
4. Reilly BM. Inconvenient truths about effective clinical teaching. Lancet. 2007;370:705–711
5. Eva KW, Neville AJ, Norman GR. Exploring the etiology of content specificity: Factors influencing analogic transfer and problem solving. Acad Med. 1998;73(10 suppl):S1–S5
6. Schmidt HG, Boshuizen HPA. On acquiring expertise in medicine. Educ Psychol Rev. 1993;5:205–221
7. Schmidt HG, Norman GR, Boshuizen HP. A cognitive perspective on medical expertise: Theory and implication. Acad Med. 1990;65:611–621
8. Schmidt HG, Rikers RM. How expertise develops in medicine: Knowledge encapsulation and illness script formation. Med Educ. 2007;41:1133–1139
9. Charlin B, Boshuizen HP, Custers EJ, Feltovich PJ. Scripts and clinical reasoning. Med Educ. 2007;41:1178–1184
10. Parsell G, Bligh J. Recent perspectives on clinical teaching. Med Educ. 2001;35:409–414
11. Mamede S, van Gog T, Moura AS, et al. Reflection as a strategy to foster medical students’ acquisition of diagnostic competence. Med Educ. 2012;46:464–472
12. Butler AC. Repeated testing produces superior transfer of learning relative to repeated studying. J Exp Psychol Learn Mem Cogn. 2010;36:1118–1133
13. Mamede S, Schmidt HG, Rikers RM, Custers EJ, Splinter TA, van Saase JL. Conscious thought beats deliberation without attention in diagnostic decision-making: At least when you are an expert. Psychol Res. 2010;74:586–592
14. McKenzie CRM. Taking into account the strength of an alternative hypothesis. J Exp Psychol Learn. 1998;24:771–792
15. Bowen JL. Educational strategies to promote clinical diagnostic reasoning. N Engl J Med. 2006;355:2217–2225
16. Custers EJ, Stuyt PM, De Vries Robbé PF. Clinical problem analysis (CPA): A systematic approach to teaching complex medical problem solving. Acad Med. 2000;75:291–297
17. Hatala RM, Brooks LR, Norman GR. Practice makes perfect: The critical role of mixed practice in the acquisition of ECG interpretation skills. Adv Health Sci Educ Theory Pract. 2003;8:17–26
18. Papa FJ, Oglesby MW, Aldrich DG, Schaller F, Cipher DJ. Improving diagnostic capabilities of medical students via application of cognitive sciences-derived learning principles. Med Educ. 2007;41:419–425
19. Eva KW, Hatala RM, Leblanc VR, Brooks LR. Teaching from the clinical reasoning literature: Combined reasoning strategies help novice diagnosticians overcome misleading information. Med Educ. 2007;41:1152–1158
20. Kulatunga-Moruzi C, Brooks LR, Norman GR. Teaching posttraining: Influencing diagnostic strategy with instructions at test. J Exp Psychol Appl. 2011;17:195–209
21. Wimmers PF, Schmidt HG, Splinter TA. Influence of clerkship experiences on clinical competence. Med Educ. 2006;40:450–458
22. Woods NN, Brooks LR, Norman GR. It all makes sense: Biomedical knowledge, causal connections and memory in the novice diagnostician. Adv Health Sci Educ Theory Pract. 2007;12:405–415
23. Woods NN, Brooks LR, Norman GR. The role of biomedical knowledge in diagnosis of difficult clinical cases. Adv Health Sci Educ Theory Pract. 2007;12:417–426
24. Van Gog T, Rummel N. Example-based learning: Integrating cognitive and social-cognitive research perspectives. Educ Psychol Rev. 2010;22:155–174
25. Bjerrum AS, Hilberg O, van Gog T, Charles P, Eika B. Effects of modelling examples in complex procedural skills training: A randomised study. Med Educ. 2013;47:888–898
26. Kopp V, Stark R, Fischer MR. Fostering diagnostic knowledge through computer-supported, case-based worked examples: Effects of erroneous examples and feedback. Med Educ. 2008;42:823–829
Appendix 1 Example of a Case Used in a Study of 110 Fourth-Year Medical Students and Their Diagnostic Competence, UNIFENAS Medical School, Belo Horizonte, Brazil, 2012
A 47-year-old female patient presented to the outpatient clinic with complaints of severe epigastric and right upper abdominal pain. The patient was born in Belo Horizonte, where she lives. She is single and works as a salesperson. The pain started three weeks earlier, radiated to the back, and was not related to meals. One week after the first episode of pain, she noticed jaundice, choluria, and fecal hypocholia and started having pruritus. The patient reported chills in the night before presenting to the outpatient clinic, but she did not check her body temperature. She denied vomiting or weight loss, but she reported a couple of episodes of nausea after meals that resolved spontaneously over the past two months. The patient was a social drinker and denied smoking. She had no history of previous surgeries.
On physical examination the patient presented jaundice (3+/4); her temperature was 37.5°C, blood pressure 120/70 mm Hg, pulse 82 bpm, and respiratory rate 16/min. Heart and lung examination showed no abnormalities. The abdomen was nondistended, with normal bowel sounds, and mild tenderness to palpation in the right upper abdomen.