Providing emergency care to children presenting with acute, life-threatening illness poses many challenges. Pediatric emergencies are rare, and severe illnesses present differently in children than in adults. Improvements in medical care have reduced the prevalence of severe childhood illness, producing an environment where experienced providers and trainees encounter life-threatening childhood illness infrequently, even in busy tertiary care centers. In addition, most severely ill children present to general emergency departments rather than tertiary care pediatric facilities.1 This means the providers most likely to care for children with severe illness have limited experience in delivering pediatric emergency care.2,3 Developing and implementing new methods to train emergency physicians is necessary to ensure high-quality care for children in acute care situations.
Medical educators have used high-fidelity human patient simulation to address experiential deficits in medical education4,5 and in clinical pediatrics.6 Simulation allows educators to provide standardized learning experiences in contrast to the uneven quality of most clinical education. Simulation-based education focuses on the learner without risk to patients. It also fulfills the need for the deliberate practice that is essential for clinical skill acquisition and maintenance.7
Medical accreditation bodies acknowledge the value of simulation in clinical education. The Accreditation Council for Graduate Medical Education includes simulation in its suggestions for clinical competence evaluation.8 The Residency Review Committee for Emergency Medicine similarly endorses simulation, stating that procedural competency may come from “clinical practice or laboratory simulation” (emphasis added).9
We reasoned that a simulation-based, pediatric emergency medicine (EM) curriculum targeting EM residents would result in improved resident performance in simulation-based, pediatric-focused assessment. This report describes the development of a pediatric EM curriculum, consisting of six instructional and three evaluation simulation scenarios. We also report the results of a controlled evaluation study of curriculum outcomes. This report includes outcome data on the reliability of scores derived from checklist rating instruments and discusses the validity of inferences the scores permit.
We conducted this three-year project in two phases. Phase 1, the development phase (March 2005 to August 2006), involved generating and pilot-testing the curriculum and its associated assessment instruments. Phase 2, the validation phase (August 2006 to April 2007), addressed curriculum implementation and evaluation via a randomized, controlled trial.
Phase 1: Curriculum development
We began developing the curriculum by convening a group of pediatricians and education experts who framed the curriculum and set its goals. The primary goal of this development and research group was to provide EM learners a systematic approach to seriously ill children with poorly defined medical conditions. The group used the “ABCDE” (Airway, Breathing, Circulation, Disability, Exposure/Environment) mnemonic for care of the ill child as employed in the Pediatric Advanced Life Support (PALS) resuscitation course as a foundation for this systematic approach.10 We developed a content map to guide case construction.11 The content map covered general assessment domains (airway, breathing, circulation) and specific clinical specialties (e.g., cardiology, pulmonary, endocrinology). We designed cases to represent pediatric EM problems with similar features and to address learning objectives in the content map. This approach balances subject areas in instruction and evaluation. On the basis of this process, we selected four clinical problems to address—(1) infant in shock, (2) tachycardia, (3) altered mental status, and (4) trauma—in the simulation-based pediatric emergency curriculum. We formatted the curriculum as a set of six instructional and three evaluation case scenarios. The simulation-based pediatric EM curriculum is shown in Chart 1.
During curriculum development and testing (Phase 1), we assessed 10 EM senior residents using a set of four pediatric simulation instruction and evaluation cases created in a previous project.6 This baseline assessment occurred early in the developmental phase, 15 months before Phase 2. We conducted this assessment to calibrate the difficulty of the new case set and to inform our evaluation instruments.
Each curriculum case contains a simulation flow diagram, scripts for use by actors playing a nurse and a parent or historian, and an instructor’s guide. The instructor’s guides contain case learning objectives and key discussion points. (Case materials are available at http://pediatrics.patientsimulation.net/emsct/.) Each case underwent iterative expert review, first by academic faculty from general and pediatric EM and then by medical education faculty. Three months before Phase 2, we pilot-tested the instructional cases with a new group of five senior pediatric residents. Feedback from both study personnel and residents inspired revisions that improved case flow and content. Academic faculty reviewed and approved the final products, completing the instructional case development.
We developed our evaluation instruments in parallel with the instructional cases. The development and research group designed measures having key actions (e.g., orders normal saline bolus) within the cases that are congruent with the educational facets (“ABCDE” concepts and approach to the ill child). Evaluation cases are similar but not identical to instructional cases (Chart 1). This purposely overlapping approach assesses learner ability to generalize from instructional cases to variant evaluation cases: cardiomyopathy, beta-blocker overdose, and motor vehicle collision. We used the same iterative development and pilot-testing cycle for the evaluation case development as was used for the instructional cases. The cases included dichotomous, unweighted checklists containing 37 to 61 items designed using the method described by Stufflebeam.12
Each case runs about 15 minutes and is followed by a 15- to 30-minute structured debriefing and a short break to prepare the next case. The six instructional scenarios take approximately six hours to complete. A facilitator and two associates (a nurse and a parent or paramedic) manage the simulation lab. A simulation technician runs all cases from an adjacent room in communication with the facilitator. The debriefing format includes formal content provided by a single facilitator (M.D.A.) using PowerPoint slides to ensure consistent content for all learners. The facilitator also provides specific feedback about individuals’ performances and reinforces the central “ABCDE” theme during each debriefing.
We trained a nurse and a nurse practitioner to be our evaluation raters. Their training included a review of the case content and objectives, practice rating sessions, and a discussion. Rating practice sessions included observing and scoring videotaped pilot group performances. Expert faculty then reviewed the practice session videos with the raters to provide raters with feedback about their scoring and to help clarify unclear items. Raters also participated in the final cycle of instrument revision.
Phase 2: Curriculum evaluation
We used a two-group, randomized trial design with a wait-list control condition.13 Figure 1 depicts the CONSORT participant flow diagram for the study.14
Study participants (N = 81) were categorical EM residents from two large tertiary care residency programs located in Chicago, Illinois: Northwestern University (NU) and the University of Illinois at Chicago (UIC). The NU EM program lasts four years. The UIC EM program covers three years. All 81 residents were required to participate in all facets of the educational program. We obtained the residents’ consent to include their data in research reports, and we compensated them for their time with a $150 or $200 stipend and parking reimbursement. The IRBs of both universities and the participating hospitals approved this study.
Simulator and setting.
We used a high-fidelity human patient simulator, the Laerdal SimBaby mannequin (Laerdal Medical Co., Wappingers Falls, New York). The infant simulator has respiratory movement, heart sounds, and peripheral pulses. The mannequin can be intubated endotracheally, ventilated, and given intraosseous and intravenous access. It can also measure and respond to participant interventions (e.g., automatically producing a change in cardiac rhythm after detecting a defibrillation shock). A patient monitor displays physiological parameters. For all cases, a simulator operator adjusts physiological parameters (e.g., respiratory rate) and monitored values (e.g., pulse oximetry) based on a predetermined case flow diagram. A physician–facilitator and a nurse are present at all instructional and evaluation sessions to provide additional prescripted stimuli as needed and to increase the realism of the case.
The simulation laboratory is located in the Center for Simulation Teaching and Research (CSTAR) at Evanston Northwestern Hospital, Evanston, Illinois, formerly an affiliate of NU and one of the training sites for the NU EM residency program. CSTAR provides a controlled environment for medical education and evaluation. For this program, we modified CSTAR’s simulation space to look like an emergency department treatment area, including props and equipment (e.g., patient monitor, crash cart, gurney, equipment carts, a defibrillator) as environmental cues.
We began implementing the curriculum in August 2006 and completed the evaluation for this study by April 2007. We assigned residents randomly with stratification by program and postgraduate year (PGY) to either the intervention group or the wait-list control group (Figure 1). The intervention group received the instructional curriculum during 1 of 10 teaching sessions scheduled across a six-week period from August to September 2006. We asked the residents to select a training session from among these 10 dates during this period to suit their schedules. Therefore, session size depended on trainee scheduling. Two to six residents at various training levels participated in each session.
Participants worked in teams of two to three residents; if the team was larger than three, we split the team in two. When more than one team participated in a session (which occurred in all but four instances), the teams took turns either actively participating in the simulation or closely observing the other team via closed-circuit television. We directed the active team to choose a team leader before each case.
We evaluated all participants in one-hour individual sessions covering the three evaluation cases during a 12-week period beginning in October 2006. We assigned evaluation case order randomly. Each evaluation case lasted no longer than 20 minutes. As in the instructional sessions, a nurse, a historian/parent, and the facilitator were in the simulation lab with the participant. Evaluation cases followed a script, and we provided additional information using the scripted content only. We directed the participant to think aloud during each case. At the end of each case, we asked the participant to give a telephone summary to simulate transfer to a large pediatric hospital. This transfer sign-out allowed the participants to voice any internal frames of reference (e.g., differential diagnostic thought processes or reasons for specific management decisions) that they had not verbalized during patient care. We did not give feedback after this first evaluation session because we did not want feedback to confound the effectiveness of the intervention. All residents completed a questionnaire that asked about their previous simulation and clinical care experiences as well as specific training courses such as PALS or Advanced Pediatric Life Support (APLS). We specifically asked this last question because some variation in exposure to these courses existed among residents between and within training programs.
We conducted instructional sessions for the wait-list control group during nine sessions beginning in January 2007, and we evaluated all participants a second time beginning in March 2007. Again, we provided no feedback during evaluation sessions.
After initial enrollment and randomization, five participants in each study arm could not, for scheduling or clinical load reasons, attend their group’s education session. This occurred despite multiple notifications to the participants and the endorsement of both training program directors. In each case, we made a swap with a member of the other group.
We provided feedback only after the second cycle of evaluation. Feedback included a review of videotaped performance using a form that provided the resident’s item-by-item responses for each session and a comparison both with his or her PGY group and with all participants. We offered all graduating seniors one-on-one feedback before graduation. All other residents received either individualized or group debriefing.
Primary outcome measures were summary checklist scores obtained on each of the three evaluation cases during the two evaluation sessions. We calculated the percentage of checklist items completed correctly for use as a summary score.
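As a minimal illustration of the summary score, the percentage of checklist items completed correctly can be computed as follows; the item names and responses below are hypothetical, not items from the study instruments:

```python
# Toy sketch of the summary score calculation: the percentage of
# dichotomous checklist items completed correctly in one case.
def summary_score(checklist):
    """`checklist` maps each dichotomous item to True/False."""
    return 100 * sum(checklist.values()) / len(checklist)

# Hypothetical rater responses for a single evaluation case
case_checklist = {
    "assesses airway": True,
    "orders normal saline bolus": True,
    "checks glucose": False,
    "reassesses after intervention": True,
}
print(summary_score(case_checklist))  # 75.0
```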
Evaluation sessions involved two raters, both blind to participating residents’ study group assignments, who viewed each simulation either live via closed-circuit television or on videotape. Each rater completed the case checklist independently. The raters recorded scores using computer-based forms to reduce data errors. We encouraged the raters to request alternate camera angles or to ask facilitators clarifying questions via headset radios.
We calculated our statistical power on the basis of the maximum possible number of participants in the two programs (81). We expected a 20% posttraining improvement to occur, with standard deviations similar to those in previous work.5 In this analysis, 41 residents per group were sufficient to detect this difference with 80% power.
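The sample-size figure can be checked with the standard two-group normal approximation; the effect size below (Cohen's d ≈ 0.62) is our assumption, chosen to reproduce the reported 41 residents per group at 80% power:

```python
# Back-of-the-envelope sample-size check for a two-group comparison
# using the normal approximation: n per group = 2 * (z_a + z_b)^2 / d^2.
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate residents needed per group for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_beta = NormalDist().inv_cdf(power)           # power term
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.62))  # 41 per group, matching the reported figure
```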
Our data analysis involved four steps: (1) estimating rater reliability using the Case 2 intraclass correlation coefficient (ICC),15 (2) evaluating within-group and (3) between-group differences, and (4) assessing statistical interactions using a mixed-model analysis of variance (ANOVA). Mixed models provide more appropriate effect estimates than traditional approaches such as multivariate ANOVA.16 We used mixed-effects models to assess the effect on summary score of four variables:
- group assignment,
- previous PALS or APLS training,
- rater, and
- training program.
We conducted all analyses following an as-treated approach, and we performed our statistical analyses with Stata 9.2 (College Station, Texas) and SAS 9.1 (Cary, North Carolina).
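As a rough illustration of the reliability estimate, the Case 2 ICC (two-way random effects, absolute agreement, single rater) can be derived from a two-way ANOVA decomposition of the rating matrix. The sketch below uses hypothetical scores, not study data:

```python
# Minimal sketch of a Case 2 ICC computation (two-way random effects,
# absolute agreement, single rater) from a subjects-by-raters matrix.
from statistics import mean

def icc_2_1(scores):
    """`scores` is a list of rows (one per subject), each holding one
    summary score per rater."""
    n = len(scores)       # subjects
    k = len(scores[0])    # raters
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical percentage scores from two raters for five residents
ratings = [[62, 65], [48, 50], [71, 70], [55, 60], [80, 78]]
print(round(icc_2_1(ratings), 2))  # 0.97: high interrater agreement
```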
Of 81 residents, 77 (95%) completed at least one assessment and 69 (85%) completed both assessments. Residents reported limited experience providing emergency care for children before the study. Twenty-one percent (16/77) reported having observed at least one infant arrest, with 8% (6/77) reporting that they played a management role. Forty-two percent (32/77) reported having observed at least one arrest of an older child (aged ≥1 year), with 19% (15/77) reporting that they played a management role. Experience with infant arrest was not associated with more postgraduate training. Table 1 provides other participant characteristics.
Resident performance data are presented in Table 2. Within-group comparisons evaluate the change in the individual study group (intervention or wait-list control) scores across evaluation sessions. For the cardiomyopathy and beta-blocker overdose cases, we found a statistically significant improvement in the wait-list control group’s performance between sessions. The intervention group showed no difference for these two cases between sessions but did show significant improvement in the motor vehicle collision case.
Between-group comparisons evaluate the difference between the intervention and wait-list control group scores within each evaluation session (evaluation session 1 and evaluation session 2). We found no difference in scores between study groups at session 1. At session 2, there was a statistically significant difference in summary scores between the intervention and wait-list control groups for the cardiomyopathy and beta-blocker overdose cases but not for the motor vehicle collision case. The group × session interaction was significantly associated with the cardiomyopathy and beta-blocker overdose cases, reinforcing the contrast between the first session, in which training showed no effect, and the second, in which the wait-list control improved.
Reliability coefficients ranged from 0.78 to 0.89 for session 1 and from 0.83 to 0.92 for session 2, indicating high levels of interrater agreement. Reliability coefficients were consistent across case and session, as shown in Table 2.
Table 3 shows case scenario performance by PGY. Scores improved significantly with increasing experience, although the improvement was small. There were no significant differences in scores associated with previous PALS or APLS training or between training programs (NU versus UIC).
Reallocation of participants did not change data analysis outcomes. Reanalysis using an intention-to-treat approach or with the exclusion of reallocated individuals left results unchanged.
The objective of this study was to develop a simulation-based pediatric EM curriculum and to evaluate curriculum effectiveness using a randomized trial. Our evaluation instruments produced reliable and valid data for the study. Results from the evaluation phase demonstrate a limited effect from the instructional intervention, with statistically significant yet modest improvement in second-session scores for two of three evaluation cases. We believe our study outcomes provide valuable lessons, chiefly because the impact was limited. These lessons can inform educators when planning simulation-based medical education programs.
What factors led to our findings? This study has two main components: (1) curriculum development and delivery and (2) simulation-based assessment. We consider how facets of each component may have contributed to our outcome.
Curriculum development and delivery
Our development process produced six instructional cases, yielding a simulation-based pediatric emergency curriculum that was both sensible and clinically grounded, the result of participation and extensive critique by panels of pediatric and general emergency physicians as well as medical educators. Each case scenario was followed by a structured debriefing led by a single instructor (M.D.A.) with six years of experience in simulation-based instruction. We reviewed and pilot-tested the debriefing material in the same manner as the cases to ensure uniform content for all participants.
We decided to use a highly structured debriefing format rather than the learner-centered format typically employed in scenario debriefing. Learner-directed debriefing, by design, lacks consistency across multiple instructional sessions because different groups are likely (in fact, are encouraged) to focus on different topics of interest. We standardized scenario debriefing to reduce variation and boost the consistency of study results.
We also withheld feedback within evaluation cases until after the second evaluation session. This ensured that we could attribute any measured learning improvements to the instructional intervention. Immediate postsimulation feedback is a feature of the deliberate practice model.7 Use of an instructor-directed learning style and delayed feedback may have weakened the overall intervention’s impact. We made these decisions to narrow the focus of our study to the instructional intervention.
The decision to reduce the frequency and intensity of simulation-based instruction to conform to participants’ schedules was the key variable that affected program outcomes. We set out to provide a strong, focused intervention, following the work of Cordray and Pion17 and McGaghie and colleagues.18 These authors demonstrate a dose–response relationship between the intensity of instructional treatments and learning outcomes. Further, we intended to provide repeated sessions over time to allow for distributed and deliberate practice.7,18 However, as the project advanced, achievement of instructional and evaluation research goals at a site distant from residency education conflicted with resident availability. Both residency programs signed on with the intent to fully participate, and to the greatest extent possible, each program did so. However, after repeated conversations with the program directors, we decided that the single-day format was the only practical approach that could be implemented. Even with this abbreviated format, scheduling was still compromised by missed appointments, late arrivals, and last-minute changes. We believe that the single training session did not provide an instructional dose strong enough to produce a meaningful effect.
We also set out to translate work on acquisition of specific skills (e.g., Wayne and colleagues’5 research on Advanced Cardiac Life Support) to a broader clinical curriculum. We started with a specific set of goals, produced a content map, and then developed representative cases. We had not anticipated the unequal amounts of time required to cover each case in the debriefing. One case addressing diabetic ketoacidosis required more instructional effort because our participants were unfamiliar with how management of this illness differs between children and adults. Spending extra time reviewing this material may have distracted our participants from the focus of our efforts, the ABCDEs of initial management. This problem again reminded us of the importance of Cordray and Pion’s17 work—that is, an intervention must be strong and focused to be effective.
Meaningful performance assessment stems from rigorous measures that yield reliable data and permit valid research inferences. Our instruments demonstrated a high degree of data reliability across raters. The rigorous checklist development process, with baseline and pilot testing after expert critique, supports content validity.19 The trend of increasing scores with higher levels of training demonstrates discriminant validity.
The instructional intervention produced modest gains following the second training session. This pattern of educational gains later in the academic year (group × session interaction) was also noted in work by Butter and colleagues.20 This pattern suggests that learners need to acquire basic skills as a prerequisite to more advanced training. Such research outcomes are also a marker for an insufficiently robust instructional intervention.21 Motor vehicle collision scores improved for both groups between the first and second assessments in contrast with the other two cases. This may reflect the higher volume of trauma care that EM residents provide compared with acute medical care. Trauma management varies less between adult medicine and pediatrics compared with medical management.
The overall scores were lower than our developmental work led us to expect. Low scores were not the result of excessively difficult items missed by many participants, which would have suggested that items were unfair or unrealistic. Nor did the participants achieve all key items while missing only less important items. Although we neither weighted our checklist nor chose key items a priori, we did not detect participants completing key items at a different rate than other items. The lower scores do not reduce the validity of the data for our population; they simply demonstrate that performance fell below expectations.
Another program development decision warrants attention. Previous work suggested that trainees may not transfer learning when presented with modified evaluation cases.22 Thus, we chose cases that were linked but not identical to the instructional cases (Chart 1). Case goals for the instruction and evaluation case were highly concordant, preserving the ABCDE focus of our curriculum. We felt that had the results shown a positive outcome, this additional step would support the generalizability of our findings. Instead, using linked cases rather than assessing performance on the same cases may also have contributed to our limited results.
Our group runs two active simulation programs, serving a pediatric and an adult tertiary care facility with large faculty, staff, and trainee populations. As simulation has grown, so have the requests for training. We frequently receive requests to support low-intensity or single-day sessions, even though we express concerns that these events do not effectively produce significant, sustained learning. This study provides concrete evidence of the pitfalls of insufficient training.
In a systematic review of the simulation-based education literature, Issenberg and colleagues23 described 10 aspects of simulation interventions that lead to effective learning.
- Feedback is provided during the learning experience;
- Learners engage in repetitive practice;
- Simulation is integrated into an overall curriculum;
- Learners practice with increasing levels of difficulty;
- Simulation is adaptable to multiple learning strategies;
- Simulation provides for clinical variation;
- Simulation learning occurs in a controlled environment;
- Simulation provides individualized learning;
- Outcomes or benchmarks are clearly defined and measured;
- The simulation is a valid and appropriate learning tool.

We conducted a well-designed, rigorous intervention using assessment tools that produce reliable and valid data. However, because of a variety of local constraints, this study fell short on the first two items: providing (1) focused feedback and (2) consistent, deliberate practice. Consequently, the study did not demonstrate robust learning outcomes. Given this experience, we suggest another item for effective simulation use:
- Commit sufficient resources (time, staff, and learner availability) to ensure the success of a simulation-based instructional effort.
Leaders of simulation programs, curriculum planners, and medical organizations should recognize that limited efforts, while easier to implement and less costly in time and resources, are likely to produce limited educational gains. Simulation-based medical education cannot be done “on the cheap.”21
Researchers should not use our results as an excuse to limit future investigation of narrowly focused simulation interventions in an effort to ensure clear educational outcomes. Doing so will limit simulation to the sidelines of medical education, to be used for training of procedures and resuscitation algorithms. Instead, we hope this work serves as a call to design and conduct new research that will demonstrate successful approaches to integrating simulation-based education into the general medical education curriculum.
This project produced a valid, reproducible pediatric EM instructional and evaluation curriculum resulting in modest educational gains only for residents trained later in the academic year. More frequent and focused instructional intervention is required, with the commensurate increase in time and effort, to achieve substantial performance improvements.
This work was supported by funding from the Department of Health and Human Services, Health Resources and Services Administration’s Maternal and Child Health Bureau under its Targeted Issue Grant Program (CFDA 93.127).