Recent literature has defined certain cognitive biases and cognitive errors that physicians are likely to encounter in everyday practice. The practitioner of emergency medicine is at high risk for these errors because of multiple factors including high decision density, high levels of diagnostic uncertainty, high patient acuity, and frequent distractions. Some authors have suggested that instructing physicians in “cognitive forcing strategies” or “metacognition” will help reduce the amount of cognitive error in medical practice.1–6
On the whole, previous efforts to teach general critical thinking skills have not been very successful, perhaps because individuals are unlikely to improve much in this area regardless of educational effort. One group of authors showed that even after three years of medical education there was little improvement on the Watson-Glaser Critical Thinking Appraisal.7 Researchers have repeatedly found that expertise, in contrast to critical thinking skill, appears to be content specific. Teaching clinical problem solving has had mixed success because success with one clinical problem does not seem to translate into a higher likelihood of success with a different one.8
This does not mean that teaching metacognition, or “thinking about thinking,” is not a useful endeavor. Previous authors have noted the need for further exploration of this topic.9,10 On a theoretical level, previous models such as Schön’s concept of “reflective practice” have been well regarded for quite some time,11 but that concept has not been substantially put into practice and studied in application. More recently, cognitive psychologists have focused on knowledge organization and on the issues of context and transfer mentioned earlier. If expertise is context specific, perhaps the transfer of clinical strategies in and around a particular context can be taught. This could be labeled context-specific metacognition. In effect, the effort to use cognitive forcing strategies in emergency medicine is just that.
The question then becomes how best to teach context-specific metacognition. Alternatively, one might ask, “What is the most efficient way to create expertise?” Many residency programs in emergency medicine use the case presentation format in several varieties. The educational impact of these vicarious experiences is difficult to assess. It is unclear whether these case presentations translate into more developed mental concepts of disease prototypes or whether they create problems by showing only the extreme presentations of common diseases or the rare and interesting cases. Is the resident able to infer from these specific instances some of the more general strategies for dealing with related cases? Does the resident learn more from successful patient encounters (real or simulated) or from mistakes made during patient encounters? All residency programs stress clinical exposure to many individual instances of patient contact, but if the resident does not see enough of certain critical problems, he or she may be left with incomplete training.
It was with these questions in mind that we undertook a pilot study using high-fidelity human patient simulation. The study examines the use of very difficult cases experienced in the human patient simulation lab. The test case was designed so that most of the residents would fail to manage it correctly, thus creating a negative instance. Our hope was that the negative instance would make residents highly receptive to the educational concepts. The intention was for participants to learn the specific cognitive error they encountered, the general class of error in which it can be categorized, and more general cognitive error theory. This could be described as inductive reasoning catalyzed by negative instantiation.
We chose a qualitative methodology for this project for several reasons. First, the number of residents available to us was small, so most quantitative analyses would lack the sensitivity to capture the learning experience. Second, the concepts being taught are highly complex, and it is unclear whether there are, or ever will be, assessment tools capable of reliable and valid measurement of this level of clinical skill. Third, this project is meant in part to be a theory-building exercise: we did not set out with any certainty as to the effectiveness of this educational intervention, nor did we feel that all the nuances of performing this type of instruction could be perfected on the first try.
Setting and Population
Participants were seven postgraduate year-two (PGY2) residents and eight postgraduate year-three (PGY3) residents in an emergency medicine osteopathic residency. This study took place at Lehigh Valley Hospital in Allentown, Pennsylvania, during academic year 2002–03. Because this was an educational endeavor with no risk of harm to real patients or subjects, we received a waiver of Institutional Review Board review. All participants did give written consent to participate and be videotaped in the simulation lab as well as oral consent to participate in the posteducation interview.
The residents were exposed to the scenario one at a time. The scenario was a difficult case, and it was anticipated that most of the residents would not successfully manage the patient. The scenario had built-in cognitive error traps that the resident was meant to fall into; these are discussed later in the scenario details. The scenario was standardized as much as possible so that each resident had a similar amount of time, prompts, actors, and so on. At the end of the scenario each resident had an approximately five-minute debriefing with the lead investigator during which the technical critical actions and the cognitive errors were briefly discussed. The main debriefing was a CD-ROM containing a PowerPoint presentation with audio: a didactic presentation on succinylcholine (15 minutes) and one on cognitive forcing strategies (30 minutes). The cognitive forcing strategies debriefing included information about the specific cognitive errors in the lab and a more general discussion of cognitive biases and cognitive forcing strategies.
The interview and survey were developed from a list of approximately 20 questions that we, the authors, had formulated about the potential impact of the project. These questions were then prioritized, and those that could be easily answered with a Likert or rank-order response scale were organized into an eight-question, one-page written instrument (see Appendix). The remaining questions were developed into an 11-question interview (List 1). Surveys were anonymous; the only identifying information was the date the survey was completed and the residency year of the respondent. Responses were entered into a database and frequencies were tabulated for each response. Residents were interviewed by a PhD-level ethnographer (LMD) with the 11-question (15-minute) interview. All interviewees were informed that the interview transcripts would be de-identified prior to being evaluated by the lead investigator so as not to bias their responses. The interviews were then transcribed, de-identified, and placed into an NVivo qualitative research database.
Interview transcripts were analyzed for content and theme by the principal investigator (WFB) and the ethnographer together. The transcripts were first read through and themes identified; 35 different themes (or nodes) were settled on for use in a coding scheme (see List 2). The transcripts were then reviewed by four reviewers, two of whom were emergency physicians (WFB and GCB) and two of whom were not (LMD, DCA). One emergency physician and one nonphysician (the ethnographer) were very familiar with the project and with cognitive forcing strategies; the other two reviewers were less familiar with both. The reviewers coded the dialogue using the coding scheme and were allowed to apply more than one node per interviewee comment. If three of the four coders agreed on a code for a comment, that comment was kept for the next stage of analysis, which quantified the number of comments for each node.
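The three-of-four agreement rule described above can be sketched as a simple tally over coder assignments. This is an illustrative reconstruction, not the study's actual analysis code; the node names are hypothetical.

```python
from collections import Counter

def consensus_codes(coder_assignments, min_agreement=3):
    """Return the codes that at least `min_agreement` coders
    independently assigned to the same passage."""
    counts = Counter()
    for assignment in coder_assignments:  # one set of nodes per coder
        counts.update(set(assignment))    # each coder counted once per code
    return {code for code, n in counts.items() if n >= min_agreement}

# Four hypothetical coders labeling one passage (node names illustrative):
coders = [{"stress", "videotaping"}, {"stress"}, {"stress", "realism"}, {"stress"}]
print(consensus_codes(coders))  # {'stress'}
```

Only "stress" survives the filter here, since the other nodes were each applied by a single coder.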
The Survey System sample size calculator was used to determine sample size needed for adequate representation of the 225 coded transcript responses (passages) using 95% confidence level and 10% confidence interval.12 Sixty-four (28.4%) of the passages were then randomly selected for intercoder reliability analysis. Coder agreement on interpretation of response themes was analyzed using a kappa statistic.13
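The sample-size calculation above can be approximated with the standard Cochran formula plus a finite-population correction. The Survey System calculator's exact internals are not published in this report, so the figure below (68) is a sketch that lands close to, but not exactly on, the 64 passages reported; small implementation differences between calculators are common.

```python
import math

def required_sample(population, z=1.96, margin=0.10, p=0.5):
    """Cochran's sample-size formula with finite-population correction.

    z      -- z-score for the confidence level (1.96 for 95%)
    margin -- desired confidence interval (here +/-10%)
    p      -- assumed response proportion (0.5 is the conservative choice)
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)       # correct for finite N
    return math.ceil(n)

print(required_sample(225))  # 68 with these defaults
```

With 225 passages, a 95% confidence level, and a 10% interval, the formula calls for roughly 64-68 passages, consistent with the study's random selection of 64.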
The scenario chosen involved a hypothetical 67-year-old female dialysis patient with renal failure. The Medical Education Technologies, Incorporated (METI) Human Patient Simulator was used as the simulation platform. The information given to the resident is that a 67-year-old woman presents to the emergency department severely short of breath. She is unable to speak more than a few words; she mainly says, “I can’t breathe” and otherwise gives little history. Her medication list is available from the paramedic if the resident asks for it. She denies chest pain. The patient weighs 180 lb and is five feet six inches tall, with a short neck, and she looks as if she will have a moderately difficult airway. IV, O2, and monitor must be initiated by the resident. Vital signs are BP 180/110, P 120, RR 25–30, pulse oximetry 85% on room air. The physical exam reveals bilateral rales and an S3 heart sound. There is a modeled arteriovenous shunt on one arm, which the resident must find on physical examination.
The resident could briefly try nitroglycerine, morphine, angiotensin-converting enzyme inhibitors, nebulizer treatments, and bilevel positive airway pressure or continuous positive airway pressure (which the patient does not tolerate); these improve the numbers slightly, but not quickly enough. The resident must decide to support the airway because the patient’s mental status begins to deteriorate and respiratory failure is ensuing. The resident may use midazolam or etomidate to sedate the patient for intubation; if no paralytic is used, the resident is informed that the patient gags and cannot be intubated. This leads into the first cognitive error trap, which might be viewed as a faulty hypothesis, an omission error, or both. The faulty hypothesis is to presume the pulmonary edema is from a cardiac condition and not consider other causes such as renal failure. The omission error is failing to examine the patient for a shunt or to ask about renal function prior to giving succinylcholine. The resident then gives or does not give succinylcholine to a patient with a potassium level of 6.5 mEq/L (the resident is not made aware of the laboratory value until the end of the scenario).
After giving succinylcholine, the resident notices that the patient is developing a wide complex arrhythmia on the rhythm monitor. A classic widened QRS complex from hyperkalemia now appears on the electrocardiogram. For the few who avoided giving succinylcholine, the patient was still made to develop a wide complex rhythm after a delay so that the scenario could lead into Part 2. The second cognitive error trap also involves faulty hypothesis generation: the resident must recognize that the wide complex arrhythmia is due to an electrolyte imbalance (high potassium) when the differential also includes hypoxia and cardiac ischemia. In treating the wide tachycardia on the monitor, the resident should assess hemodynamic stability; choosing to shock or to give antiarrhythmics does not resolve the rhythm. At this time the resident may obtain a chest x-ray, which shows pulmonary edema. Laboratory results, available in 15 minutes, show a potassium level of 6.5 mEq/L. Receiving the laboratory results allows residents to finish the scenario if they have not yet diagnosed and treated hyperkalemia.
The written survey results are presented in Table 1. The simulation scenario with cognitive forcing strategies debriefing was ranked second only to patient care for the overall group. When separated by classes, the PGY3s also ranked simulation with cognitive forcing strategies second, but the PGY2s ranked supervised patient care first, simulation labs without cognitive strategies second, and simulation with cognitive forcing strategies tied for third with lecture/grand rounds. The group reported very little prior exposure to cognitive forcing strategies, with a mean rating of 1.27 (1 = no exposure, 5 = extensive exposure). The remaining questions were based on a Likert scale of 1–5, with 1 = disagree completely, 3 = no opinion, and 5 = agree completely. Question 3 was designed to measure attitudes toward working with our own emergency department nurses in future simulation labs; the response was mixed, with a mean of 3.33. Question 4, regarding the stress caused by attending presence in the simulation lab, was also mixed, with a mean of 2.8. Questions 5, 6, and 7 addressed the educational effectiveness of the succinylcholine information, the hyperkalemia information, and the cognitive strategies. All were viewed favorably, with means of 4.73, 4.6, and 4.3, respectively.
The interview transcripts were analyzed with the coding scheme outlined in List 2. It should be noted that a theme’s being mentioned less often does not imply that the opposite of that theme is true; a comment may have been mentioned fewer times for many reasons (see Discussion). Overall kappa was 0.634, indicating substantial intercoder agreement on interpretation of response themes.14 Nineteen (30%) of the 64 passages had kappa values equal to 1.00, indicating perfect agreement among the four coders. Passages agreed upon by three of the four coders are listed by node in order of descending frequency in Table 2.
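The kappa statistic corrects raw percent agreement for the agreement expected by chance. With four coders the study presumably used a multi-rater variant (e.g., Fleiss' kappa or averaged pairwise values); the sketch below shows the simpler two-rater Cohen form for illustration, with hypothetical ratings and node names.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labeling the same passages."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # chance agreement: product of each rater's marginal frequencies
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two hypothetical raters coding four passages (node names illustrative):
rater1 = ["stress", "stress", "realism", "cfs"]
rater2 = ["stress", "realism", "realism", "cfs"]
print(round(cohens_kappa(rater1, rater2), 3))  # 0.636
```

Values above roughly 0.6 are conventionally read as substantial agreement, which is why the study's overall kappa of 0.634 supports the reliability of the coding scheme.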
Example comments from the interviews are listed with their coding nodes in Table 3. The comments were selected because they were prototypical or, in some cases, represented contrasting views. Some of the most revealing insights came from the question “What did you learn from the scenario?” because it was open-ended, and from the question “Can you explain the concept of cognitive forcing strategies?” because it asked residents to explain a complex concept in their own words. Some residents made comments about using mental checklists, which can be considered a form of heuristics. Several made comments about “stepping back” and reassessing one’s thought process, which we felt constituted a clear metacognitive strategy.
Based on the lead investigators’ analysis of the interviews, there were some differences between the PGY3s and PGY2s. Five of the eight PGY3s, but only one of the seven PGY2s, were able to explain the concept of cognitive forcing strategies, or metacognition. When asked what they learned overall, more of the PGY3s commented on cognitive strategies or heuristic techniques (six of eight), whereas the PGY2s commented on the more concrete knowledge gained about succinylcholine (five of seven); only one PGY2 mentioned cognitive strategies.
Regarding error and its use as a training tool, 14 of 15 residents viewed the experience of making an error in the simulation lab as positive. Several also commented favorably on the opportunity to make errors without injuring patients. The themes of mistakes causing reflection or motivation recurred throughout most of the interviews. Although many residents commented that videotaping of the scenario was stressful, none specifically mentioned the error in management as a source of stress.
Realism was often noted to be as good as possible given the limitations of the simulator, which was missing certain physical examination cues that might have made the diagnosis more obvious in real life. A few residents commented that the nurse was unrealistically helpful during the lab. The interview results correlated well with the written survey regarding the perceived value of the experience: all of the residents described the overall experience as positive in the interview, which would explain why they ranked it second only to direct patient care on the written survey.
Creating and analyzing the qualitative interview posed several interesting challenges. We made some of the interview questions (see List 1) open-ended and nondirected to capture the residents’ perceived experience of this new educational modality (e.g., “What did you learn from this scenario?”). Others were designed to assess the impact of our particular trial of cognitive forcing strategies instruction (e.g., “Can you explain the concept of cognitive forcing strategies?”), and still others addressed perceptions and preferences regarding the debriefing. The coding scheme was a collaborative effort, but certainly could lose some subtleties of the interview content. As mentioned earlier, the lack of a response does not necessarily indicate that the opposite of that response is true. For example, this was the residents’ first introduction to cognitive forcing strategies, and they might have commented more had an introductory lecture taught them the terminology of the subject. Apparent lack of understanding may therefore reflect limited exposure rather than failure of the educational technique; in some cases residents seemed to understand the concept but had difficulty verbalizing that understanding.
The quantification of the comments agreed upon by multiple reviewers brought out some other points. In both our subjective and quantitative analyses of the comments, the simulation lab was clearly a positive experience for the residents, and the knowledge gained about succinylcholine was felt to be both valuable and effectively transmitted. Sixty-seven percent of the quantified comments acknowledged the lab as stressful. Closer examination of those comments, however, indicates that it was primarily the videotaping that was considered stressful, and that this stress seemed to diminish as the scenario progressed. The feeling of being evaluated, despite reassurances that they were not being graded, also contributed to stress. About half of the participants commented that they would want to review videotapes of their performance, either with or without an attending physician present. Those who wanted feedback primarily wanted one-on-one, face-to-face comments from the attending physician.
The quantitative analysis of the comments revealed that 33% of individuals discussed cognitive forcing strategies in some form. This was less than the lead investigator’s subjective estimate of 47% and reflects the difficulty reviewers had in agreeing on when a cognitive strategy was being demonstrated or discussed. In the lead investigator’s subjective analysis, the PGY3s appeared to focus on the cognitive points of the case, whereas the PGY2s focused on the knowledge about succinylcholine. This was useful information from an instructional standpoint, because it suggests the same scenario can be used for multiple postgraduate years by simply changing the learning objectives: for PGY2s the objectives could focus on knowledge acquisition and pattern recognition, while PGY3s could address overall case management and cognitive strategies.
The CD-ROM was used as the main debriefing tool for several reasons. First, we wished to ensure that all the residents received the same didactic knowledge and the same explanation of cognitive forcing strategies. Second was logistics: by allowing each resident to complete the CD-ROM independently, we did not tie up an instructor for 15–30 minutes at the end of each scenario. Third, once refined, the CD-ROM and scenario become a stable, reusable piece of curriculum material that allows the lab to run with minimal attending input and training. The format also accommodates both those who learn best by reading and those who learn best by listening. The majority of residents who commented on the CD-ROM felt it was effective, and several noted that it allowed one to replay portions of the debriefing easily.
As mentioned before, many residents expressed interest in videotape review. Future efforts at this type of education will be well served by a digital video recording format for several reasons. Most important, digital video would allow an instructor who had witnessed a training exercise to access quickly the exact point in the exercise he or she wished to discuss. Currently, even with digital videotape, the several minutes of rewinding and fast-forwarding needed to find these key points is an unacceptable use of faculty time. With the advent and falling cost of hard-drive video recorders and DVD recorders, this issue should be resolved. Digital media also make storage and review for research purposes much more convenient.
Limitations and Future Questions
This study was limited by several factors. First, the sample of residents was small and represented only one emergency medicine residency program. Qualitative analysis at this level of detail is often performed on small samples (fewer than 30) because of the labor intensity of interview acquisition and analysis; we felt the richness of the data was worth the risk of limited broader validity. Second, the single scenario used had not undergone rigorous prior testing for construct validity. However, the scenario was not designed to show a gradation in performance from medical student to resident to attending; rather, it was designed to force all individuals into the error trap, which it did with the exception of one member from each postgraduate year tested. In that sense it showed at least some measure of reproducibility. The scenario had face validity, in that it was created with the help of emergency medicine specialists drawing on their experience and was agreed upon as appropriate by other emergency medicine specialists.
The direction of simulation curricula in emergency medicine continues to evolve. Some institutions are building cases around the crisis resource management model drawn from aviation and anesthesia. Others are focusing on a well-rounded medical curriculum at both the medical student and resident levels. We submit a third possibility: concentrating on known sources of medical error drawn from data in the patient safety arena. These might constitute systems errors manifesting at the bedside, procedural complications, or individual physician cognitive errors that are known to be problematic. By focusing on known high-risk or problematic cases, we may begin to build a more reliable and safe health care system. Quantifying the impact of this effort may be difficult, but quantification should not delay its evolution.
Our pilot data suggest that metacognitive strategies can be taught to upper-level emergency medicine residents, as evidenced by their ability to discuss these strategies after a simulation and debriefing. More junior residents may derive value from experiencing similar scenarios but may focus on knowledge acquisition. Our cohort of residents ranked this experience second only to direct patient care in educational effectiveness, which suggests that this topic and means of instruction warrant further study. Future efforts will be needed to quantify the educational effectiveness of metacognitive instruction in the setting of simulation and debriefing.
The authors acknowledge several people for their important contributions to this report: Gina Sierzega, MA, for her assistance in manuscript preparation; Christopher Sarley for his assistance in developing the PowerPoint-with-audio CD-ROMs; and the staff of the George E. Moerkirk Emergency Medicine Institute for their support of the project, which was supported by the Leonard Parker Pool Healthcare Trust via a Center for Educational Development and Support Fellowship Grant (grant number 9200103, “Learning to Serve and Innovate”).