LaRochelle, Jeffrey S. MD; Durning, Steven J. MD, PhD; Pangaro, Louis N. MD; Artino, Anthony R. PhD; van der Vleuten, Cees PhD; Schuwirth, Lambert PhD
Clinical reasoning is at the heart of what a physician does.1 It is an example of complex learning that requires the integration, coordination, and application of knowledge and experience to arrive at a diagnosis and treatment plan specific to a patient’s circumstances and preferences.1,2 Many courses in clinical education present elements of clinical reasoning through a variety of instructional formats. The most common instructional format for developing clinical reasoning is the use of patient cases, ranging from low authenticity (paper cases or lecture discussions of patient presentations) to high authenticity (from video reenactments to standardized patient [SP] encounters to actual patient encounters).3–5 Despite the wide use of these various teaching methods, the medical education community has limited data on which methods are most effective.6 Furthermore, the education literature both suggests that not all learners are the same and identifies characteristics of learners who are low performers (novice learners) and learners who are high performers (expert learners). For example, novice learners often classify problems on a very superficial level, whereas expert learners are able to delve deeper into the structure of a problem.7 In addition, novice learners often draw on poorly organized previous knowledge, whereas expert learners have a hierarchical organization to their previous knowledge.8 In this report, we address whether more authentic instructional formats of presenting clinical cases to students are more effective for content mastery, and we further explore whether this may be true only for students functioning as expert learners.
Many medical educators presume that highly authentic instructional formats can be more effective in teaching clinical reasoning and other skills.9–12 In fact, many curricular reform efforts in U.S. medical schools have proposed providing real patient encounters earlier in the curriculum.13 In theory, these more authentic instructional formats provide needed context for learning and lead to greater student motivation.14,15 Aside from medical students often preferring more authentic instructional formats, there are limited data to indicate that these methods lead to improved educational outcomes. In addition, many medical students in the preclerkship period are probably not expert learners and may require more attention to basic concepts, which can be lost in the added clinical detail implicit in highly authentic instructional formats.16
In a previous study, we demonstrated no statistically significant improvement in educational outcomes when an entire class of preclinical students was taught with more authentic instructional formats.17 However, we did find a trend toward better educational outcomes with increasingly authentic instructional formats, indicating that authenticity may have a small positive impact but that we may have had an inadequate sample size to capture this effect. In the present study, we sought to further investigate the relationship between the authenticity of instructional format and preclerkship students’ educational outcomes. On the basis of our previous data, we hypothesized that a larger sample size would provide adequate power to detect a statistically significant improvement in preclerkship educational outcomes with more authentic instructional formats. We further hypothesized that expert learners would benefit more than novice learners from instructional formats with higher authenticity. Others have identified preclerkship grade point average (GPA) as an important predictor of success in the clerkship, and students performing in the bottom third of their class according to preclerkship GPA have been found to be at a significantly higher risk for delayed graduation and failure on the United States Medical Licensing Examination.18,19 Consistent with educational literature, we stratified our learners into three tertiles—top (expert), middle (intermediate), and bottom (novice) groups—using first-year GPA as a proxy measure for learner ability. We believe this study could identify a particular cohort, through a readily identifiable marker of knowledge and experience (GPA), who might benefit from early clinical experiences with high authenticity, thus better directing our limited resources for teaching in the preclerkship period.
The Uniformed Services University of the Health Sciences, F. Edward Hébert School of Medicine (USU), is the United States’ only federal medical school. During the time of the study (November 2007 to May 2009), USU required two years of preclerkship education, which included a yearlong course in the second year, Introduction to Clinical Reasoning (ICR), followed by two years of clerkship education. The ICR course exposes students to a series of common symptoms, laboratory findings, and syndromes in medicine. Students read in advance of mandatory small-group sessions on each subject area, using whatever resources they deem appropriate. The course is a blend of didactic and problem-based learning curricula and uses a case-based reasoning approach.
Within the normal context of the ICR course, we selected three ICR small-group subject areas/clinical problems to test our hypothesis—abdominal pain, anemia, and polyuria. The standard educational format for the ICR course includes an instructor presenting the content using paper cases, followed by a small-group discussion led by a faculty preceptor. Interdepartmental leadership at USU developed the specific learning goals for each of the cases.
We conducted a two-year, prospective, randomized, crossover study to determine the effect of instructional format on second-year medical students’ performance. All second-year students enrolled in the ICR course at USU over the period of the study (338 total) were eligible to participate, and we randomly assigned participants to one of three main study cohorts using a random number generator (see Figure 1). The course directors and the individual small-group preceptors were blinded regarding student participation in this study. The institutional review board at USU approved this study, and all students who participated signed an informed consent form.
We chose three instructional formats—paper case, DVD presentation of a doctor and patient conveying the case, and a live SP recreation of the case—to present the three selected subject areas/clinical problems. Three cases were presented per subject area (abdominal pain, anemia, and polyuria) during the 90-minute small-group session (i.e., 30-minute discussion per case). Each small group was exposed to identical content for each clinical subject area but was assigned to a different instructional format (paper, DVD, or SP; see Figure 1). Once assigned to a group, students were not able to switch groups. In this way, participants in each group received instruction for each subject area in only one of the three possible instructional formats but were serially exposed to all subject areas and all instructional formats. We gave all students copies of the paper cases to ensure the consistency of case content across all instructional formats. Additionally, we gave all small-group preceptors specific teaching points to discuss regardless of instructional format to further ensure consistency of content regardless of either preceptor expertise or instructional format.
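The crossover assignment described above can be sketched in code. This is a minimal illustration, not the study's actual procedure: the helper names `assign_cohorts` and `crossover_schedule`, the round-robin dealing, and the specific rotation of formats are assumptions (the real allocation is shown in Figure 1, and the actual cohorts differed in size).

```python
import random

FORMATS = ["paper", "DVD", "SP"]
SUBJECTS = ["abdominal pain", "anemia", "polyuria"]


def assign_cohorts(student_ids, n_cohorts=3, seed=0):
    """Randomly assign each student to one cohort (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled students round-robin; this simplification yields
    # near-equal cohort sizes, unlike the unequal cohorts in the study.
    return {sid: i % n_cohorts for i, sid in enumerate(shuffled)}


def crossover_schedule():
    """Assumed Latin-square rotation: cohort c sees subject s in format (c + s) mod 3."""
    return {
        cohort: {
            subject: FORMATS[(cohort + s) % len(FORMATS)]
            for s, subject in enumerate(SUBJECTS)
        }
        for cohort in range(len(FORMATS))
    }


if __name__ == "__main__":
    for cohort, plan in crossover_schedule().items():
        print(cohort, plan)
```

Under this rotation every cohort meets each subject area once and each format once, and every subject area is taught in all three formats across the cohorts, which is the property the crossover design relies on.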
We collected baseline demographic and academic data for all participants, and we achieved effective randomization throughout all the small groups using a random number generator. As described above, we stratified students into tertiles by first-year GPA to distinguish novice, intermediate, and expert learners.18,19 Furthermore, students are routinely separated by USU into tertiles according to overall GPA for the purposes of academic distinction (e.g., deans’ letters identifying a student as in the top third, middle third, or bottom third of his or her class), indicating the institutional significance of these groups. We collected both graded outcome measures within the normal context of the ICR course (an objective structured clinical exam [OSCE] and an essay exam) and added a nongraded outcome (video quiz) for this study. The OSCE is a six-station exam assessing a student’s ability to conduct a focused history and physical on a patient presenting with a common medical problem. Students are evaluated on the OSCE by both an SP using a standard checklist and by clinical faculty using a global rating form. The essay exam primarily assesses clinical reasoning skills and consists of three booklets, each containing three cases. Students must select one case from each booklet for a total of three cases to complete. One of the three booklets contained only cases relevant to the subject areas of our study, so that all students were required to respond to one study case. The video quiz required students to watch three DVD cases, one from each study subject area, and provide short answer responses related to a problem list and differential diagnosis, in addition to further questions to pursue with the patient. We selected these outcome measures to match the instructional formats in our study, and each outcome measure contained three unique study cases so that all students were exposed to all nine study cases without repetition between outcome measures.17
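The tertile stratification by first-year GPA can be illustrated with a short sketch. The function name `gpa_tertiles`, the rank-based cut points, and the tie handling (ties keep their input order) are assumptions for illustration; the article does not state how boundary cases were resolved.

```python
def gpa_tertiles(gpas):
    """Label each student 'top', 'middle', or 'bottom' by GPA rank.

    `gpas` maps a student id to a first-year GPA (hypothetical input format).
    """
    # Highest GPA first; sorted() is stable, so ties keep their input order.
    ranked = sorted(gpas, key=gpas.get, reverse=True)
    n = len(ranked)
    labels = {}
    for rank, sid in enumerate(ranked):
        if rank < n / 3:
            labels[sid] = "top"
        elif rank < 2 * n / 3:
            labels[sid] = "middle"
        else:
            labels[sid] = "bottom"
    return labels


if __name__ == "__main__":
    example = {"a": 3.9, "b": 3.5, "c": 3.1, "d": 2.9, "e": 2.5, "f": 2.0}
    print(gpa_tertiles(example))
```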
Because each outcome measure had a different total score and we collected data across two academic years, we reported scores on all outcome measures as the percentage of total points and standardized those scores using a Z transformation (mean of 0 and standard deviation of 1) to facilitate a comparison of teaching methods across outcome measures and academic years. We used analysis of variance on cohort demographic data and outcome measures to determine the effectiveness of our randomization procedure. Because the three cohorts had differing numbers of students, we performed post hoc analysis using the Tukey HSD (honestly significant difference) test as a more conservative method to determine the statistical significance of a single instructional format among the numerous other comparisons. We used the Student t test for comparisons of fewer than three groups and the chi-square test for proportional data. We completed all statistical analyses using SPSS version 12 (Chicago, Illinois).
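The score standardization can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the sample standard deviation (n − 1 denominator) is assumed, and the scores passed in are assumed to come from a single outcome measure in a single academic year. The article does not specify these details.

```python
from statistics import mean, stdev


def percent_scores(raw_scores, total_points):
    """Convert raw scores to percentages of the total points available."""
    return [100.0 * raw / total_points for raw in raw_scores]


def z_transform(scores):
    """Standardize scores to mean 0 and (sample) standard deviation 1."""
    m = mean(scores)
    sd = stdev(scores)  # assumption: sample SD with an n - 1 denominator
    if sd == 0:
        raise ValueError("cannot standardize scores with zero variance")
    return [(score - m) / sd for score in scores]


if __name__ == "__main__":
    # Made-up raw scores on a hypothetical 50-point exam.
    print(z_transform(percent_scores([30, 40, 50], total_points=50)))
```

Standardizing within each measure places a 50-point exam and a 200-point exam on the same scale, so scores can be pooled across outcome measures and years.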
A total of 233 of 338 eligible students (69%) initially signed up to participate in our study over both academic years. However, we did not include 5 of those students in our study because they were repeating the ICR course, leaving us with 228 participants (67%). We found no significant differences among the main study cohorts by gender, GPA, or representation within each GPA tertile, indicating that our randomization was effective (see Table 1). Further, we found no significant differences in overall outcome performance based on assignment to a particular study cohort. Students in the top GPA tertile performed significantly better on study outcomes than those in either the middle or bottom tertiles (Z scores of 0.118, −0.038, and −0.201, respectively), and students in the middle GPA tertile performed significantly better on study outcomes than those in the bottom tertile (see Table 1). Additionally, female students scored higher than their male counterparts (Z scores of 0.045 versus −0.055, P = .055) on combined study outcomes (see Table 1), but this difference did not reach statistical significance. This difference in scores between genders was not explained by either differences in first-year GPA or gender distribution in GPA tertiles.
Comparisons across instructional formats revealed no significant differences for any of the educational outcomes individually or combined. However, we identified trends toward improving scores for each individual educational outcome and for the combination of all outcomes with increasing authenticity of the instructional format (see Figure 2).
To better assess the impact of the instructional formats on the learner, we stratified the combined educational outcomes into tertiles according to students’ first-year GPA as described previously. Students in the top tertile outperformed students in the bottom two tertiles regardless of instructional format (see Figure 3). Although the differences were not significant, scores for students in the top tertile improved as the authenticity of the instructional format increased (Z scores of 0.023, 0.188, and 0.138 for the paper, DVD, and SP formats, respectively), whereas scores for students in the middle tertile appeared to decrease (0.043, −0.119, and −0.045, respectively). Students in the bottom tertile did significantly worse than those in both the top and middle tertiles on the paper cases (−0.374 versus 0.023 and 0.043, respectively), with effect sizes of 0.397 and 0.417, respectively (see Figure 3). Surprisingly, scores for students in the bottom tertile improved as the authenticity of the instructional format increased, bringing the group’s performance level with that of the middle tertile (see Figure 3).
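The reported effect sizes for the paper cases match the simple difference between the tertiles' mean Z scores. The sketch below reproduces that arithmetic. The interpretation is an assumption: the article does not state how the effect sizes were computed, and a raw difference in Z-score means approximates a standardized mean difference only if the within-group standard deviation is close to 1. The names `PAPER_MEAN_Z` and `z_gap` are hypothetical.

```python
# Mean Z scores on the paper-case format, as reported in the text.
PAPER_MEAN_Z = {"top": 0.023, "middle": 0.043, "bottom": -0.374}


def z_gap(group_a, group_b, means=PAPER_MEAN_Z):
    """Difference in mean Z score between two tertiles (group_a minus group_b)."""
    return round(means[group_a] - means[group_b], 3)


if __name__ == "__main__":
    print("top vs bottom:", z_gap("top", "bottom"))
    print("middle vs bottom:", z_gap("middle", "bottom"))
```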
Despite the increased sample size and additional power to detect differences, we could not demonstrate that increased authenticity of the instructional format resulted in improved learner performance on the educational outcomes that we measured. Although these findings seem to argue against the prevailing belief that more authentic instructional formats are superior, there are several possible explanations consistent with current educational theories.
First, contemporary educational theories do not agree on the impact of authenticity on learning and demonstrate that the novice learner may not always benefit from highly authentic instructional formats. Cognitive load theory (CLT) addresses limitations in human cognitive architecture and the instructional strategies that can be used to overcome these limitations. Its central tenet is that humans can only hold and manipulate a few pieces of information (typically 7 ± 2) in their short-term or working memory.20 To manipulate information more effectively, it has to be stored in the long-term memory.20 In CLT, three types of cognitive load are distinguished: intrinsic load, extraneous load, and germane load. Intrinsic load refers to the interaction between the nature of the to-be-learned materials and the level of expertise of the learner. It largely depends on the number of elements that must be simultaneously processed in the working memory. For example, solving an anagram of nine letters (tnicriisn to intrinsic) is more complicated than solving an anagram of three (hte to the). Extraneous load refers to the attention-requiring elements of the problem that are not innate to the problem itself but that surround it and can be caused by instructional formats that require learners to search for information. As such, extraneous load can be altered by instructional interventions, and highly authentic instructional formats often lead to a significant increase in extraneous load for the novice learner.21 For example, students may become overly concerned with the human aspect of an encounter (e.g., patient emotions, personal fears), subsequently overwhelming the learning experience. Finally, germane load is associated with processes that are directly relevant to learning, such as the construction of mental models and automation, and is considered essential for learning.
In other words, germane load involves the use of what a student has learned and his or her working memory to build more advanced schema, which allows the student to move from novice to expert. However, if the intrinsic and extraneous loads are too much, then there will be few cognitive resources left for germane processes and, ultimately, learning. Thus, the increase in the cognitive load that accompanies more authentic instructional formats may have mitigated the positive impact that these formats had on the educational outcomes we measured, and future studies would benefit from directly measuring the cognitive load of each instructional format.
In juxtaposition to CLT, one might consider that the degree of a student’s emotional engagement when working on or with a particular instructional format may be as important (or potentially even more important) in overall learning as cognitive load.22 Theories of emotional engagement posit that the more engaging and relevant the information is to the trainee, the more motivated the trainee will be to learn, leading the trainee to dedicate more effort and cognitive resources to the task (thus improving learning). Therefore, cases presented as live encounters with SPs should be a superior instructional format to cases presented in a paper format using only key features. Viewed from this theoretical perspective, our data suggest both that there may be a complex interaction of competing forces that impact students’ performances on educational outcomes and that simply increasing authenticity is not a panacea. Further, an important theoretical and practical question for medical educators is how to balance the potential impact of cognitive load and emotional engagement to best foster the development of clinical reasoning.
Our second hypothesis was supported, though not proven, and is consistent with CLT; specifically, higher-performing students did demonstrate a trend toward a positive response to more authentic instructional formats. Although this positive response within the top tertile was not statistically significant, there was a clear difference when compared with the lower tertiles, where the middle tertile went from an equal standing with the top tertile to performing significantly worse as authenticity increased. This finding suggests that higher-performing students may be better equipped to handle the potentially higher cognitive load associated with more authentic instructional methods, whereas that same high cognitive load may limit the performance of students in the middle and bottom GPA tertiles. Of note, the students in the bottom tertile performed dramatically worse on the paper cases—with effect sizes near 0.4—in comparison with the top and middle tertiles. However, the scores of the students in the bottom tertile actually improved (to the level of the middle tertile) with increases in the authenticity of the instructional format, which would counter the main tenets of CLT. However, there is evidence that novice learners do benefit from a redundancy, or repetition, of material in different formats.23 Because all learners in our study received the paper case as a reference regardless of instructional format, the novice learners may have had the structure, or beneficial repetition, that they needed to engage the more authentic encounters and, subsequently, raise their level of performance.
Our findings have several implications. First, providing instruction in the form of paper cases alone may leave struggling students at a disadvantage compared with their higher-performing classmates; the addition of more authentic instructional formats may have a positive impact on this group. Given limitations in faculty time and other human resources, it may not be feasible to modify the curriculum to meet the specific needs of individual students. However, medical schools should consider presenting both authentic encounters (video presentations may be of sufficient benefit) to better engage all students and organized paper case versions. In this way, students will have an initial reduction in element interactivity (intrinsic load) followed by the progressive introduction of more highly authentic instructional formats.24 Second, we recommend caution when introducing clinical experiences early in the curriculum without well-defined learning objectives, because it could be expected that actual clinical patients, the most authentic instructional format, could introduce an even higher cognitive load and potentially hinder students’ gaining the essential skill of clinical reasoning. The potential benefits from authentic instruction, such as in the formation of a professional identity and the development of interpersonal skills, might be better separated from clinical reasoning skills acquisition. This separation would allow for a more robust development in all skill sets by appropriately mitigating cognitive load, which is especially important as many medical schools shorten the length of the preclerkship period and encourage earlier exposure to clinical encounters. If educators are to use highly authentic instructional formats (e.g., seeing actual patients), they should both give learners clear guidelines regarding how to focus on relevant aspects of the clinical case and limit the element interactivity for these complex encounters.
Our study has several limitations. First, it was a single-institution study looking at only three subject areas within the context of the ICR course. A larger sampling of students from other institutions and across more subject areas may further elucidate the relationship between instructional format and educational outcomes. Second, we used the cognitive-based educational outcomes that are currently available within the ICR course and may have missed other potential benefits associated with providing more authentic instructional encounters. We do not have specific reliability and validity data for these outcome measures; however, there is a clear distinction between the GPA tertiles based on these outcome measures, which provides some evidence of validity (see Table 1). Third, it is likely that both cognitive load and emotional engagement change with increasing authenticity of the instructional format. However, we did not directly measure cognitive load or emotional engagement in this study. Further work should attempt to capture both of these constructs in an attempt to verify the interpretations we have provided here. Fourth, we recognize that there is no consensus in the community on how to best subdivide study participants into novice, intermediate, and expert learners and that using different-sized groups (quartiles, quintiles, etc.) may have affected our results. However, we believe that our use of tertiles was both institutionally relevant and consistent with previous studies in the medical education literature. Finally, we were able to tightly control the instructional formats and standardize the information provided to students within the study; however, we did not seek to control the study method or materials that students accessed outside the ICR course, which may have had an influence on the educational outcomes that we measured.
Our study has several strengths, including our use of a prospective, randomized design, across two academic years with nearly 70% of all medical students enrolled in the ICR course participating. We also used multiple educational outcome measures that matched the instructional formats we studied. Our finding that higher-performing students benefit from more authentic instructional formats is consistent with both CLT and emotional engagement theory. Taken together, our findings suggest that the tenets of cognitive load and emotional engagement theories function together in a complex manner and that there may be some benefit to tailoring preclerkship clinical education on the basis of students’ ability.
Other disclosures: None.
Ethical approval: This study was reviewed and approved by the institutional review board at the Uniformed Services University of the Health Sciences, F. Edward Hébert School of Medicine.
Disclaimer: The opinions expressed in this report are solely those of the authors and do not reflect the official policies of the Department of Defense, the United States Navy, the United States Air Force, or other federal agencies.
1. Higgs J, Jones M, Loftus S, Christensen N. Clinical Reasoning in the Health Professions. 3rd ed. Philadelphia, Pa: Elsevier; 2008
2. Cervero RM. Effective Continuing Education for Professionals. San Francisco, Calif: Jossey-Bass; 1988
3. Carter MB, Wesley G, Larson GM. Lecture versus standardized patient interaction in the surgical clerkship: A randomized prospective cross-over study. Am J Surg. 2006;191:262–267
4. Davidson R, Duerson M, Rathe R, Pauly R, Watson RT. Using standardized patients as teachers: A concurrent controlled trial. Acad Med. 2001;76:840–843
5. van Zanten M, Boulet JR, Norcini JJ, McKinley D. Using a standardised patient assessment to measure professional attributes. Med Educ. 2005;39:20–29
6. Ericsson KA. Deliberate practice and acquisition of expert performance: A general overview. Acad Emerg Med. 2008;15:988–994
7. Sweller J. Cognitive load during problem solving: Effects on learning. Cogn Sci. 1988;12:257–285
8. Eylon B, Reif F. Effects of knowledge organization on task performance. Cogn Instr. 1984;1:5–44
9. Littlewood S, Ypinazar V, Margolis SA, Scherpbier A, Spencer J, Dornan T. Early practical experience and the social responsiveness of clinical education: Systematic review. BMJ. 2005;331:387–391
10. Dornan T, Littlewood S, Margolis SA, Scherpbier A, Spencer J, Ypinazar V. How can experience in clinical and community settings contribute to early medical education? A BEME systematic review. Med Teach. 2006;28:3–18
11. Dammers J, Spencer J, Thomas M. Using real patients in problem-based learning: Students’ comments on the value of using real, as opposed to paper cases, in a problem-based learning module in general practice. Med Educ. 2001;35:27–34
12. Howe A, Anderson J. Involving patients in medical education. BMJ. 2003;327:326–328
13. Smithson S, Hart J, Wass V. Students’ hopes and fears about early patient contact: Lessons to be learned about preparing and supporting students during the first year. Med Teach. 2010;32:e24–e30
14. van Merriënboer JJG, Kirschner PA, Kester L. Taking the load off a learner’s mind: Instructional design for complex learning. Educ Psychol. 2003;38:5–13
15. van Merriënboer JJ, Sweller J. Cognitive load theory in health professional education: Design principles and strategies. Med Educ. 2010;44:85–93
16. Kirschner PA, Sweller J, Clark RE. Why minimal guidance during instruction does not work: An analysis of the failure of the constructivist, discovery, problem-based, experiential, and inquiry based teaching. Educ Psychol. 2006;41:75–86
17. La Rochelle JS, Durning SJ, Pangaro LN, Artino AR, van der Vleuten CP, Schuwirth L. Authenticity of instruction and student performance: A prospective randomised trial. Med Educ. 2011;45:807–817
18. Gonnella JS, Erdmann JB, Hojat M. An empirical study of the predictive validity of number grades in medical school using 3 decades of longitudinal data: Implications for a grading system. Med Educ. 2004;38:425–434
19. Roop SA, Pangaro L. Effect of clinical teaching on student performance during a medicine clerkship. Am J Med. 2001;110:205–209
20. Sweller J, van Merriënboer JJG, Paas FGWC. Cognitive architecture and instructional design. Educ Psychol Rev. 1998;10:251–296
21. Sweller J. Cognitive load theory, learning difficulty, and instructional design. Learn Instr. 1994;4:295–312
22. Bransford JD, Brown AL, Cocking RR. How People Learn: Brain, Mind, Experience, and School. Washington, DC: National Academy Press; 2000
23. Yeung AS, Jin P, Sweller J. Cognitive load and learner expertise: Split-attention and redundancy effects in reading with explanatory notes. Contemp Educ Psychol. 1998;23:1–21
24. Pollock E, Chandler P, Sweller J. Assimilating complex information. Learn Instr. 2002;12:61–86