Review Article

Comparative Effectiveness of Technology-Enhanced Simulation Versus Other Instructional Methods

A Systematic Review and Meta-Analysis

Cook, David A. MD, MHPE; Brydges, Ryan PhD; Hamstra, Stanley J. PhD; Zendejas, Benjamin MD, MSc; Szostek, Jason H. MD; Wang, Amy T. MD; Erwin, Patricia J. MLS; Hatala, Rose MD, MSc

Simulation in Healthcare: The Journal of the Society for Simulation in Healthcare 7(5):p 308-320, October 2012. | DOI: 10.1097/SIH.0b013e3182614f95

INTRODUCTION

Heightened sensitivity to patient safety has led to enhanced safeguards, standardization of best practices, and recognized need for new models for training health professionals. Authors have proposed that technology-enhanced simulation affords health professionals the opportunity to train in an environment that does not compromise patient safety1 and allows educators to structure experiences to encourage deliberate practice, targeted assessment, feedback, and reflection.2

In recent years, the volume of research on technology-enhanced simulation training has grown substantially, and evidence syntheses are increasingly important. A synthesis focused on the comparative effectiveness of technology-enhanced simulation in relation to other active educational interventions would help educators, administrators, and learners judge the effectiveness of simulation activities and determine what makes them more or less effective. Although several systematic reviews2–6 have provided useful syntheses, these reviews included few or no studies making comparison with nonsimulation instruction3–6 or failed to make such distinctions.2 Most opted not to conduct quantitative pooling (meta-analysis) to derive best estimates of effect. The 2 reviews that did use meta-analysis focused on narrow topical areas—one on deliberate practice training in comparison with no intervention6 and the other on virtual reality training for laparoscopic surgery.5 In a more recent review of 609 studies,7 we found that, in comparison with no intervention, technology-enhanced simulation is associated with large effects on outcomes of knowledge, skills, and behaviors and moderate effects on patient care. However, in that review, we did not include studies making comparison with other active interventions. Thus, the need remains for a comprehensive synthesis focused on comparisons between simulation-based training and nonsimulation instruction.

To address this gap, we sought to identify and quantitatively summarize all studies involving health professions learners that compared technology-enhanced simulation with another educational modality. We recognized that differences between simulation training and other training could be due not only to the instructional modality but also to other features of instructional design (eg, a difference between interventions in the quantity or quality of feedback). Thus, in addition to estimating the overall effect, we planned to explore for each outcome the influence of selected instructional design features.

METHODS

This review was planned, conducted, and reported in adherence to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) standards of quality for reporting meta-analyses.8 Our methods have been described in detail previously7; we summarize them briefly in the next sections.

Questions

We sought to answer 2 questions:

  1. What is the effectiveness of simulation technologies for training health professionals in comparison with other instructional modalities? Based on the educational principles of cognitive load, situated learning, and motor learning,9,10 we anticipated that simulation training would have negligible benefits in comparison with alternate training modalities for outcomes of knowledge, but that for task-oriented skills, behaviors, and patient effects, the results would favor simulation with a small to moderate effect. We also anticipated that satisfaction would be higher for simulation training.
  2. How do outcomes vary for selected instructional design variations? Based on the strength of the theoretical foundation (ie, those most likely to influence outcomes) and currency in the field, we prospectively focused on 4 instructional design features. We expected that lower extraneous cognitive load,9 enhanced feedback,2,11 learning in groups (as in problem-based learning12 or team training,13 vs. alone), and spending more time in learning activities14 would be associated with improved outcomes.

Study Eligibility

We define technology-enhanced simulation as an educational tool or device with which the learner physically interacts to mimic an aspect of clinical care for the purpose of teaching or assessment.7 Simulation technologies encompass diverse products including computer-based virtual reality simulators, high-fidelity and static mannequins, plastic models, live animals, inert animal products, and human cadavers.

We included studies published in any language that investigated the use of technology-enhanced simulation to teach health professions learners at any stage in training or practice, in comparison with another instructional modality, using outcomes of reaction (satisfaction), learning (knowledge or skills in a test setting), behaviors (in practice), or effects on patients (see Appendix 4 for definitions). Computer-based virtual patients15 and human patient actors (standardized patients) did not qualify as technology-enhanced simulation but did count as comparison interventions.

Study Identification

An experienced research librarian (P.J.E.) developed a strategy to search MEDLINE, Embase, CINAHL, PsycINFO, Scopus, ERIC, and Web of Science (see Appendix Box 1, Supplemental Digital Content, https://links.lww.com/SIH/A70, which contains the full search strategy). This search had no beginning date cutoff and was updated on May 11, 2011. We added to the screening pool all articles published in 2 journals devoted to health professions simulation (Simulation in Healthcare and Clinical Simulation in Nursing) since their inception and all articles cited in several published reviews of simulation. Finally, we searched for additional studies in the reference lists of 190 articles meeting inclusion criteria.

Study Selection

Working independently and in duplicate, we screened all titles and abstracts for inclusion. In the event of disagreement or insufficient information in the abstract, we reviewed the full text of potential articles independently and in duplicate and resolved conflicts by consensus. Chance-adjusted interrater agreement for study inclusion, determined using intraclass correlation coefficient (ICC), was 0.69.

Data Extraction

Using a detailed data abstraction form, we abstracted data independently and in duplicate from each article, resolving conflicts by consensus. We abstracted the same data elements for both the simulation and nonsimulation interventions. We abstracted information on the training level of learners, clinical topic, instructional design, study design, outcomes, and methodological quality. Interrater agreement (ICC) for the instructional design features selected for subgroup analyses was as follows: cognitive load, 0.56; feedback, 0.47; group learning, 0.70; and time spent learning, 0.68. Although lower cognitive load indicates superior instructional design, we reversed the scoring of this variable so that higher scores indicate superior design. We coded additional simulation features (ICC ranging from 0.37 to 0.73 as reported previously7) including repetitive practice, curriculum integration, range of task difficulty, multiple learning strategies, clinical variation, individualized learning, mastery learning, distributed practice (whether learners trained on 1 or >1 day), number of task repetitions, and alignment between training and assessment (ICC, 0.83). We planned to abstract information on simulation fidelity but found this construct difficult to operationalize with high reliability and therefore dropped this variable. We graded methodological quality using the Medical Education Research Study Quality Instrument (MERSQI)16 and an adaptation of the Newcastle-Ottawa Scale (NOS) for cohort studies17,18 as described previously.7
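
For readers unfamiliar with the ICC, the following minimal Python sketch computes a one-way random-effects ICC for a set of items rated in duplicate. The specific ICC model used in the review is not stated, so this is a generic illustration of chance-adjusted agreement, not the authors' exact calculation.

def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) for agreement among raters.

    ratings: one tuple per rated item (e.g., per study), one entry per
    rater, such as [(1, 1), (0, 1), (1, 1), ...] for 2 raters.
    Illustrative only; the ICC model used in the review is not reported.
    """
    n = len(ratings)                    # number of items rated
    k = len(ratings[0])                 # number of raters
    grand = sum(sum(r) for r in ratings) / (n * k)
    item_means = [sum(r) / k for r in ratings]
    # Between-item and within-item mean squares (one-way ANOVA)
    msb = k * sum((m - grand) ** 2 for m in item_means) / (n - 1)
    msw = sum((x - m) ** 2 for r, m in zip(ratings, item_means) for x in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)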

Because the results of simulation training may vary for different outcomes, we distinguished outcomes using Kirkpatrick’s classification19 and abstracted information separately for reaction (satisfaction), learning (knowledge and skills, with skills further classified as time, process, and product measures), behaviors with patients (time and process), and results (patient effects). We also abstracted information on monetary and time costs associated with training development and maintenance.

Data Synthesis

For each reported outcome, we calculated the standardized mean difference (Hedges g effect size) using methods detailed previously.7 If we could not calculate an effect size using reported data, we requested additional information from authors via e-mail.
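
For illustration, the sketch below shows one standard way to compute a Hedges g standardized mean difference and its approximate variance from group summary statistics; the inputs are hypothetical, and the authors' exact computational approach is the one detailed in their earlier report.7

import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges g standardized mean difference for two independent groups.

    m1, sd1, n1: mean, SD, and sample size for the simulation group;
    m2, sd2, n2: the same for the comparison group (hypothetical inputs).
    """
    # Pooled standard deviation
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                          # Cohen d
    j = 1 - 3 / (4 * (n1 + n2) - 9)             # small-sample (Hedges) correction
    g = j * d
    # Approximate variance of g, used later for inverse-variance weighting
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g

# Example with made-up checklist scores (simulation vs. other modality)
g, var_g = hedges_g(78.0, 10.0, 20, 72.0, 11.0, 20)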

We used the I2 statistic20 to quantify inconsistency (heterogeneity) across studies. I2 estimates the percentage of variability across studies not due to chance, and values higher than 50% indicate large inconsistency. We used random-effects models to pool weighted effect sizes. Some studies included more than 2 groups (eg, simulation, lecture, and standardized patient). For these, we selected the 2 most effective interventions for the main analysis and then performed sensitivity analyses substituting the other intervention(s) to see how this influenced results. We planned subgroup analyses based on study design (randomized vs. nonrandomized), total quality score, and the selected instructional design features (cognitive load, feedback, group learning, and learning time). We performed sensitivity analyses, excluding studies that assessed skill outcomes using the same simulator as was used for training and studies that used P value upper limits or imputed SDs to estimate the effect size.
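
The following sketch illustrates random-effects pooling and the I2 statistic using the commonly applied DerSimonian-Laird estimator; the review does not specify which random-effects estimator was used, so the choice of estimator here is an assumption for illustration only.

def random_effects_pool(effects, variances):
    """Pool per-study effect sizes with a DerSimonian-Laird random-effects model.

    effects, variances: per-study Hedges g values and their variances.
    Returns the pooled effect, its standard error, and I2 (in percent).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                        # fixed-effect weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran Q, then the DerSimonian-Laird between-study variance tau2
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = 100.0 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    # Random-effects weights incorporate the between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = (1.0 / sum(w_re)) ** 0.5
    return pooled, se, i2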

Although funnel plots can be misleading in the presence of inconsistency,21 we used these along with the Egger asymmetry test22 to explore possible publication bias. When we found asymmetric funnel plots, we estimated a revised pooled effect size using trim-and-fill analysis, although this method also has limitations when inconsistency is present.23
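
As a simplified illustration of the Egger asymmetry test, the sketch below regresses each study's standardized effect on its precision and tests whether the regression intercept differs from zero; it is a generic implementation of the published test, not the authors' code.

import math
from scipy import stats

def egger_test(effects, variances):
    """Simplified Egger regression asymmetry test.

    Regresses each study's standardized effect (g / SE) on its precision
    (1 / SE); an intercept that differs from zero suggests funnel plot
    asymmetry (possible publication bias).
    """
    se = [math.sqrt(v) for v in variances]
    y = [e / s for e, s in zip(effects, se)]    # standardized effects
    x = [1.0 / s for s in se]                   # precisions
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Standard error of the intercept from ordinary least squares
    mse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_intercept = math.sqrt(mse * (1.0 / n + xbar ** 2 / sxx))
    t_stat = intercept / se_intercept
    p_value = 2 * stats.t.sf(abs(t_stat), n - 2)
    return intercept, p_value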

We used SAS 9.2 (SAS Institute, Cary, NC) for all analyses. Statistical significance was defined by a 2-sided α of 0.05. Determinations of clinical significance emphasized Cohen effect size classifications (<0.2 = negligible, 0.2–0.49 = small, and 0.5–0.8 = moderate).24
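
A trivial helper mirroring these Cohen thresholds might look as follows; labeling values above 0.8 as "large" follows Cohen's convention,24 although the text above lists only the negligible-to-moderate categories.

def classify_effect(g):
    """Label an effect size magnitude using the Cohen thresholds cited above."""
    g = abs(g)
    if g < 0.2:
        return "negligible"
    if g < 0.5:
        return "small"
    if g <= 0.8:
        return "moderate"
    return "large"   # above 0.8, per Cohen's convention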

RESULTS

Trial Flow

We identified 10,297 articles using our search strategy and 606 from our review of reference lists and journal indices. From these, we identified 92 studies comparing simulation training with another instructional modality and reporting an eligible outcome (Fig. 1). We received additional information from authors of 2 studies. Table 1 summarizes key study features, and Appendix Table 2 (Supplemental Digital Content, https://links.lww.com/SIH/A70, which contains the citation and detailed information for each included study) provides a complete listing of references with additional information. Of these 92 studies, 7 were multi-arm studies that included a comparison with no intervention, and the no-intervention results were reported previously.7

FIGURE 1: Trial flow.

TABLE 1: Description of Included Studies

Study Characteristics

Investigators used technology-enhanced simulations to teach topics such as emergency resuscitation, gastrointestinal endoscopy, laparoscopic surgery, physical examination, and teamwork. They most often made comparison with lectures (n = 24), real or standardized patients (n = 22), and small-group discussions (n = 16). Nearly half the articles (n = 45) were published in or after 2008, and 1 article was published in Chinese. Learners included student and practicing physicians, nurses, emergency medical technicians, dentists, chiropractors, and veterinarians, among others.

Most simulation interventions (63/92) provided for repeated practice (eg, multiple scenarios or cases), whereas only 35 other-modality interventions did so. Indeed, most key instructional design features were more common in the simulation interventions than in the other-modality interventions, including cognitive interactivity (60 vs. 25), feedback (13 vs. 4), and mastery learning (4 vs. 3). Two simulation interventions involved live human standardized patients (“hybrid” simulation). Table 2 summarizes the prevalence of additional instructional design features.

TABLE 2: Prevalence of Instructional Design Key Features

Of 165 outcomes reported in these 92 studies, 136 (82%) were objectively determined (eg, faculty ratings or simulator/computer scoring). Twenty studies reported satisfaction, and 42 reported knowledge outcomes. Seventy-six outcomes assessed skills in a training setting, including time to complete the task, process measures (such as global performance ratings, checklist scores, efficiency scores, minor errors, or self-reported confidence), and task products (such as procedural success or integrity of an anastomosis). Of these 76 outcomes, 48 (63%) were assessed using the same simulator as was used in the simulation intervention (eg, the simulation group practiced on a mannequin, and both groups were assessed using that mannequin). Time and process measures of behaviors with real patients (eg, instructor ratings of competence, completion of key elements per unit time, and medication errors) were reported in 7 and 11 studies, respectively. Nine studies reported direct effects on patients such as procedural success, patient satisfaction, and major complications. Among the 61 studies reporting skill, behavior, or patient effect outcomes, the assessment context was more often aligned with the instructional approach in the simulation intervention [36 (59%)] than in the comparison intervention [19 (31%)].

Study Quality

Table 3 summarizes the methodological quality of included studies. The number of enrolled participants ranged from 6 to 429, with a median of 40 (interquartile range, 28–78). Seventy-one studies were randomized. Only a minority of studies reported validity evidence to support the interpretations of outcome scores, with 37% providing content evidence, 36% providing internal structure evidence, and 8% reporting relations with other variables. Loss of more than 25% of participants after enrollment, or failure to report follow-up, occurred in 6 (30%) of 20 studies reporting satisfaction, 13 (31%) of 42 studies reporting knowledge, 1 (7%) of 14 studies reporting time measures of skills, 7 (13%) of 52 studies reporting process measures of skills, 1 (10%) of 10 studies reporting product measures of skills, and 1 (9%) of 11 studies reporting process measures of behaviors. Assessors were blinded to the study intervention for 85 (52%) of the 165 outcome measures. The mean (SD) quality score was 3.9 (1.2) for the NOS (maximum, 6 points) and 12.9 (1.7) for the MERSQI (maximum, 18 points).

TABLE 3: Quality of Included Studies

Meta-Analysis

Meta-analyses demonstrated that technology-enhanced simulation training was associated with better outcomes than other instructional modalities (Figs. 2–9), with small to moderate effect sizes that were statistically significant for satisfaction, knowledge, process skills, and product skills. Inconsistency was large in all analyses, indicating that results varied substantially from study to study.

FIGURE 2: Meta-analysis: satisfaction (n = 20). Positive numbers favor the simulation intervention. P interaction values reflect paired or 3-way comparisons (treatment-subgroup interactions) among bracketed subgroups. Studies are classified according to relative between-intervention differences in key instructional methods; namely, did the simulation intervention have more (Sim > Other), less (Sim < Other), or the same (Sim = Other) degree of selected instructional design enhancements. Participant groups are not mutually exclusive, and thus, no statistical comparison is made. Some features could not be discerned for all studies; hence, some numbers do not add to the total N. CAI, computer assisted instruction; SP, standardized patient; VP, computer-based virtual patient.

FIGURE 3: Meta-analysis: knowledge (n = 42). See legend of Figure 2 for interpretive information.

FIGURE 4: Meta-analysis: time skills (n = 14). See legend of Figure 2 for interpretive information.

FIGURE 5: Meta-analysis: process skills (n = 51). See legend of Figure 2 for interpretive information.

FIGURE 6: Meta-analysis: product skills (n = 11). See legend of Figure 2 for interpretive information.

FIGURE 7: Meta-analysis: time behaviors (n = 7). See legend of Figure 2 for interpretive information.

FIGURE 8: Meta-analysis: process behaviors (n = 11). See legend of Figure 2 for interpretive information.

FIGURE 9: Meta-analysis: patient effects (n = 9). See legend of Figure 2 for interpretive information.

To explore this variation, we performed subgroup analyses that clustered and compared studies according to the relative presence of a given instructional design element (eg, feedback): was this design element stronger in the simulation intervention, stronger in the comparison intervention, or similar in both? These subgroup analyses confirmed for most outcomes the expected pattern of larger effect size when the simulation intervention incorporated stronger instructional methods (more feedback, time on task, group learning, or minimization of cognitive load) and smaller effect size when the comparison intervention incorporated a stronger instructional design. However, these interactions were statistically significant for only a minority of analyses.
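
As an illustration of how such a treatment-subgroup interaction might be tested, the sketch below (building on the random_effects_pool helper in the Data Synthesis section) pools each subgroup separately and compares the subgroup estimates with a between-subgroup Q statistic; the authors' exact interaction test is not reported, so this approach is an assumption.

from scipy.stats import chi2

def subgroup_interaction(groups):
    """Between-subgroup (treatment-subgroup interaction) test.

    groups: dict mapping a subgroup label (e.g., 'Sim > Other') to a
    (effects, variances) tuple for the studies in that subgroup.
    Each subgroup is pooled with the random_effects_pool helper above,
    and the subgroup estimates are compared with a Q statistic referred
    to a chi-square distribution with (number of subgroups - 1) df.
    """
    pooled = {label: random_effects_pool(e, v)[:2] for label, (e, v) in groups.items()}
    weights = {label: 1.0 / se ** 2 for label, (est, se) in pooled.items()}
    overall = sum(w * pooled[lab][0] for lab, w in weights.items()) / sum(weights.values())
    q_between = sum(w * (pooled[lab][0] - overall) ** 2 for lab, w in weights.items())
    df = len(groups) - 1
    return pooled, q_between, chi2.sf(q_between, df)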

To explore the possible impact of biased skill outcome assessment or imprecise effect size estimation, we performed sensitivity analyses excluding the disputable measures; these analyses did not alter study conclusions. Finally, to explore the possibility that small studies showing nonsignificant differences remain unpublished (publication bias), we analyzed funnel plots. When we found an asymmetric funnel plot (suggesting possible publication bias), we performed trim-and-fill analyses that attempt to compensate for the unpublished studies. In all cases, these trim-and-fill analyses yielded results similar to the original.

Satisfaction

Twenty studies (with 1362 participants providing data) compared simulation versus other-modality training using outcomes of satisfaction or preference. The pooled effect size for these outcomes was 0.59 [95% confidence interval (CI), 0.36–0.81; P < 0.001], indicating statistically significantly greater satisfaction with simulation. The magnitude of this effect is considered moderate.24 However, we also found large inconsistency among studies, with individual effect sizes ranging from −0.19 to 2.15 and an I2 of 81%. A visually asymmetric funnel plot suggested possible publication bias. Assuming this asymmetry reflects publication bias, trim-and-fill analyses provided a lower pooled effect size of 0.49 (95% CI, 0.22–0.76).

Figure 2 shows the results of subgroup analyses for satisfaction outcomes. For the 16 studies with randomized group assignment, the pooled effect size was 0.60. Looking at the instructional design subgroups, the effect size was highest (greater satisfaction) for 4 studies in which the simulation intervention provided more feedback than the comparison (“Sim > Other” in Fig. 2), intermediate for 12 studies in which feedback was equal (“Sim = Other”), and lowest for 1 study in which the comparison provided greater feedback (“Sim < Other”), but these differences were not statistically significant (no treatment-subgroup interaction). However, we did find a statistically significant treatment-subgroup interaction (Pinteraction = 0.009) with learning time (longer time associated with higher satisfaction).

Knowledge

Forty-two studies (2607 participants) reported knowledge outcomes, which yielded a small but statistically significant pooled effect size of 0.30 (95% CI, 0.16–0.43; P < 0.001), favoring simulation. An I2 of 84% indicated large inconsistency, with individual effect sizes ranging from −0.6 to 1.4. The Egger test suggested an asymmetric funnel plot. Again, assuming this asymmetry reflects publication bias, trim-and-fill analyses provided a pooled effect size of 0.18 (95% CI, 0.04–0.33) that, while still statistically significant, was negligible in magnitude. The 32 randomized trials had a pooled effect size of 0.29. We found significant treatment-subgroup interactions with cognitive load (Pinteraction = 0.01), feedback (Pinteraction = 0.04), and learning time (Pinteraction = 0.01), all following the predicted pattern.

Skill: Time

Fourteen studies (501 participants) reported the time required to complete the task in a simulation setting. The pooled effect size of 0.33 (95% CI, 0.00–0.66; P = 0.05) reflects a small and statistically borderline association favoring simulation. Again, inconsistency was large (I2 = 73%), and effect sizes ranged from −1.2 to 1.13. Funnel plot analyses did not suggest publication bias. The 11 randomized trials had a pooled effect size of 0.40. Subgroup analyses revealed a significant interaction favoring higher feedback (Pinteraction < 0.001). Sensitivity analyses limited to the 7 studies that assessed outcomes using a different simulator than was used for training (ie, transfer of skills to a new setting) revealed a slightly higher effect size [0.36 (95% CI, 0.02–0.69)].

Skill: Process Measures

Fifty-one studies (2341 participants) reported skill measures of process (eg, global ratings or efficiency). The pooled effect size of 0.38 (95% CI, 0.24–0.52; P < 0.001) reflects a small yet statistically significant association favoring simulation but with large inconsistency among studies (I2 = 79%). Effect sizes ranged from −1.59 to 2.8. Inspection of the funnel plot suggested possible publication bias; trim-and-fill analyses provided a lower pooled effect size of 0.25 (95% CI, 0.10–0.41). The 42 randomized trials had a pooled effect size of 0.39.

Effect sizes were higher for interventions with longer learning time (Pinteraction = 0.02). There was also a significant interaction with feedback (Pinteraction = 0.03), but the direction of effect was inconsistent. Sensitivity analyses limited to the 16 studies that assessed outcomes using a different simulator than was used for training revealed a slightly lower effect size [0.34 (95% CI, 0.02–0.67)].

Skill: Product Measures

Eleven studies (475 participants) evaluated the products of learners’ performance such as procedural success or the quality of a finished product. Again, we found a moderate pooled effect size of 0.66 (95% CI, 0.30–1.02; P = 0.002) and large inconsistency (I2 = 68%). Effect sizes ranged from 0.03 to 1.8. Trim-and-fill analyses in response to an asymmetric funnel plot yielded a lower pooled effect size of 0.43 (95% CI, 0.00–0.85). The 9 randomized trials had a pooled effect size of 0.69.

Subgroup analyses revealed a significant interaction favoring higher feedback (Pinteraction < 0.001) and group over individual learning (Pinteraction = 0.02). Studies with higher MERSQI quality scores were associated with higher effect sizes (Pinteraction < 0.001). Sensitivity analyses limited to the 5 studies that assessed outcomes using a different simulator than was used for training revealed a somewhat lower effect size [0.56 (95% CI, 0.01–1.11)].

Behavior

Seven studies (171 participants) used a measure of time to evaluate behaviors while caring for patients, yielding a moderate pooled effect size of 0.56 (95% CI, −0.07 to 1.18; P = 0.08) and an I2 of 73%. Effect sizes ranged from −0.55 to 1.78. The funnel plot was symmetric. The 6 randomized trials had a pooled effect size of 0.65. There were no significant subgroup interactions.

Eleven studies (515 participants) reported other learner behaviors while caring for patients. The association approached a large effect size [0.77 (95% CI, −0.13 to 1.66; P = 0.09)], with an I2 of 94%. Effect sizes ranged from −1.26 to 2.75. The 8 randomized trials had a similar pooled effect size of 0.75. We found no statistically significant associations in any of the planned subgroup analyses. The funnel plot suggested possible asymmetry; trim-and-fill analyses yielded a slightly higher effect size of 0.87 (95% CI, 0.02–1.72).

Effects on Patient Care

Finally, 9 studies (494 participants) reported effects on patients. For these outcomes, simulation training was associated with a small pooled effect size of 0.36 (95% CI, −0.06 to 0.78; P = 0.09). Inconsistency was large (I2 = 70%), and effect sizes ranged from −1.37 to 1.51. The 7 randomized trials had an effect size of 0.53. The funnel plot was symmetric. Effect sizes were higher for interventions with longer learning time (Pinteraction = 0.03).

Additional Sensitivity Analyses

We used P value upper limits (eg, P < 0.01) and imputed SDs to estimate 9 effect sizes each. Sensitivity analyses excluding these 18 effect sizes yielded pooled estimates virtually identical to those of the full sample, except for time skills and process behaviors, which were slightly higher (pooled estimates, 0.42 and 0.88, respectively), and time behaviors, which was slightly lower (pooled estimate, 0.46). Three studies reported 2 or more eligible comparison interventions, and we included the best-performing comparison in main analyses. Sensitivity analyses substituting the results from the other intervention(s) did not alter the results. Excluding studies with low follow-up also showed no appreciable difference.

Costs

Although several studies reported the price of the simulator, only 5 studies reported costs associated with the comparison training.25–29 In each case, the simulation training was more costly (in money or faculty time) but also more effective than the alternate approach (see Appendix Table 3, Supplemental Digital Content, https://links.lww.com/SIH/A70, which contains detailed cost and outcomes information for the 5 studies reporting comparative costs).

DISCUSSION

As in clinical medicine, comparative effectiveness research in education illuminates the relative benefits of 2 treatment approaches. We found that technology-enhanced simulation training, in comparison with other instructional modalities, is associated with higher learning outcomes. Pooled effect sizes were small to moderate in magnitude24 for nearly all abstracted outcomes, and differences were statistically significant for satisfaction, knowledge, process skills, and product skills. These results largely paralleled our anticipated results, with the pooled effect size smallest for knowledge outcomes and progressively higher for skill and then behavior outcomes. The simplest explanation for these findings is that the benefit of hands-on practice (and thus simulation training) is greater for higher-order outcomes. In support of this explanation, standardized patients and real patients had effects similar to technology-enhanced simulation for all outcomes except process measure of skills, whereas lecture, small-group discussion, and video training were frequently inferior. Alternatively, these findings could reflect an artifact of testing—if the assessment method conferred an advantage to the group receiving simulation training. Although we cannot exclude this entirely, it seems less likely given the persistent benefit after excluding studies that used the same simulator for training and testing. A third explanation is that investigators who invest effort into measuring higher-order outcomes (product measure of skills and behaviors) also invest effort into creating highly effective simulation training (implementation bias30).

Subgroup analyses suggested that strong instructional design, rather than simulation training per se, is at least partially responsible for the observed effects. Higher feedback and learning time, group work, and lower extraneous cognitive load all demonstrated with reasonable consistency a pattern of higher effect sizes when simulation provided more of these features than the comparison intervention, and lower effect sizes when the comparison provided more of these. However, subgroup analyses are by nature hypothesis generating, and these associations may not be causal. In addition, although the direction of effect was reasonably consistent, these interactions were usually not statistically significant. Finally, for most subgroup analyses, the effect favored the simulation arm even when the nonsimulation intervention had a stronger instructional design.

Limitations and Strengths

We used intentionally broad inclusion criteria to present a comprehensive overview of the field and achieve adequate statistical power for subgroup analyses. However, in so doing, we pooled outcomes across diverse clinical topics and instructional designs. This likely contributed to the inconsistency in meta-analysis results and limits the inferences drawn.

Literature reviews are necessarily constrained by the quantity and quality of available evidence. The NOS and MERSQI quality scores are higher than those found in previous reviews,16,18,31 and 71 studies were randomized. However, sample sizes were small, sample representativeness was rarely addressed, outcome validity evidence was infrequently presented, and many reports failed to clearly describe key features of the context, instructional design, or outcomes. Poor reporting in the original studies likely contributed to suboptimal interrater agreement during screening and data abstraction. Many studies reported multiple measures of the same outcome (eg, >1 process measure of skills); however, we followed a consistent approach in selecting measures to minimize bias, and the direction of results was usually congruent within a study. We were unable to account for between-study variation in instructor ability (expertise bias) or the quality of course implementation. We also cannot exclude the possibility that participants in either arm engaged in learning extraneous to the interventions under study.

Subgroup analyses should be interpreted cautiously. Such analyses not only require accurate reporting of instructional design features and correct data abstraction but also reflect between-study comparisons (rather than within-study comparisons) and are susceptible to numerous limitations.32

As noted previously, skill outcomes assessed using the training simulator would artificially favor simulation. Behaviors and patient effects would not be susceptible to such bias, but for these outcomes the width of the CIs (which admit the possibility of no effect) suggests uncertainty in the impact on actual patient care.

Several analyses suggested possible publication bias. Although the pooled effect sizes adjusted using trim-and-fill analyses would not substantially alter the conclusions, we do not know the extent of this problem (if it exists at all) or the accuracy of the adjusted analyses. Unfortunately, no methods exist to detect or adjust for publication bias with certainty.33

Our review has several additional strengths, including an exhaustive literature search led by an experienced reference librarian and unrestricted regarding time and language; explicit inclusion criteria encompassing a broad range of learners, outcomes, and study designs; duplicate, independent, and reproducible data abstraction; rigorous coding of methodological quality; and focused analyses. Although funnel plots are limited in the presence of large inconsistency,21 these did not suggest that publication bias substantially affected our conclusions.

Comparison With Previous Reviews

The present review complements our recent meta-analysis showing that simulation training is associated with large positive effects in comparison with no intervention.7 Although several other reviews have addressed simulation in general2 or in comparison with no intervention,5,6 we are not aware of previous reviews focused on comparisons of simulation training with other instructional modalities. This comprehensive and quantitative synthesis thus represents a novel and important contribution to the field.

Our finding of small to moderate effects favoring simulation in comparison with other instructional modalities contrasts with parallel reviews of Internet-based instruction18 and computer-based virtual patient simulations,15 in which comparisons with noncomputer instruction found no significant difference.

Implications

We see at least 3 important implications. First, although technology-enhanced simulation training is associated with improved outcomes in comparison with other instructional modalities, the costs of both interventions were rarely reported, making it impossible to comment on the true comparative value of simulation training. Future research should carefully document actual costs,34 including equipment, space, time, and salaries for development and maintenance, and opportunity cost (ie, what is replaced when simulation training is introduced?).

Second, the merits of simulation likely vary for different educational objectives, as suggested by the larger effect sizes for skill and behavior outcomes. As has been previously proposed,35 deliberate alignment of objectives and instructional modalities will likely enhance efficiencies for both cognition (eg, optimal management of cognitive load) and costs (time and monetary investment). For example, educators might use less costly interventions (eg, lecture or Web-based learning) for nonskill objectives and reserve simulation for later stages of instruction. Our analyses provide preliminary empirical support for such alignments, and this warrants further study. Given the difficulty in isolating the effect of the modality (ie, simulation) from the instructional design (eg, learning time, feedback, or repetitions), direct quantitative comparisons of simulation and nonsimulation interventions are problematic.36,37 Rigorous qualitative studies will thus play an important complementary role.

Third, once an educator has decided that simulation training is the ideal approach for a given objective, the salient question is how to use simulation effectively. Between-study subgroup analyses provide only weak evidence to answer such questions. We believe that theory-based, adequately powered, and carefully designed comparisons of different simulation training methods (simulation-simulation research) will best clarify38 evidence-based approaches to simulation design.39,40

REFERENCES

1. Ziv A, Wolpe PR, Small SD, Glick S. Simulation-based medical education: an ethical imperative. Acad Med 2003; 78: 783–788.
2. Issenberg SB, McGaghie WC, Petrusa ER, Lee Gordon D, Scalese RJ. Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review. Med Teach 2005; 27: 10–28.
3. Sutherland LM, Middleton PF, Anthony A, et al. Surgical simulation: a systematic review. Ann Surg 2006; 243: 291–300.
4. Sturm LP, Windsor JA, Cosman PH, Cregan P, Hewett PJ, Maddern GJ. A systematic review of skills transfer after surgical simulation training. Ann Surg 2008; 248: 166–179.
5. Gurusamy K, Aggarwal R, Palanivelu L, Davidson BR. Systematic review of randomized controlled trials on the effectiveness of virtual reality training for laparoscopic surgery. Br J Surg 2008; 95: 1088–1097.
6. McGaghie WC, Issenberg SB, Cohen ER, Barsuk JH, Wayne DB. Does simulation-based medical education with deliberate practice yield better results than traditional clinical education? A meta-analytic comparative review of the evidence. Acad Med 2011; 86: 706–711.
7. Cook DA, Hatala R, Brydges R, et al. Technology-enhanced simulation for health professions education: a systematic review and meta-analysis. JAMA 2011; 306: 978–988.
8. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: the PRISMA statement. Ann Intern Med 2009; 151: 264–269.
9. van Merriënboer JJG, Sweller J. Cognitive load theory and complex learning: recent developments and future directions. Educ Psychol Rev 2005; 17: 147–177.
10. Bradley P, Postlethwaite K. Simulation in clinical learning. Med Educ 2003; 37 (Suppl 1): 1–5.
11. van de Ridder JM, Stokking KM, McGaghie WC, ten Cate OT. What is feedback in clinical education? Med Educ 2008; 42: 189–197.
12. Dolmans DH, Schmidt HG. What do we know about cognitive and motivational effects of small group tutorials in problem-based learning? Adv Health Sci Educ Theory Pract 2006; 11: 321–336.
13. McGaghie WC, Issenberg SB, Petrusa ER, Scalese RJ. A critical review of simulation-based medical education research: 2003–2009. Med Educ 2010; 44: 50–63.
14. Cook DA, Levinson AJ, Garside S. Time and learning efficiency in Internet-based learning: a systematic review and meta-analysis. Adv Health Sci Educ Theory Pract 2010; 15: 755–770.
15. Cook DA, Erwin PJ, Triola MM. Computerized virtual patients in health professions education: a systematic review and meta-analysis. Acad Med 2010; 85: 1589–1602.
16. Reed DA, Cook DA, Beckman TJ, Levine RB, Kern DE, Wright SM. Association between funding and quality of published medical education research. JAMA 2007; 298: 1002–1009.
17. Wells GA, Shea B, O’Connell D, et al. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. 2007. Available at: http://www.ohri.ca/programs/clinical_epidemiology/oxford.htm. Accessed February 29, 2012.
18. Cook DA, Levinson AJ, Garside S, Dupras DM, Erwin PJ, Montori VM. Internet-based learning in the health professions: a meta-analysis. JAMA 2008; 300: 1181–1196.
19. Kirkpatrick D. Revisiting Kirkpatrick’s four-level model. Train Dev 1996; 50: 54–59.
20. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003; 327: 557–560.
21. Lau J, Ioannidis JPA, Terrin N, Schmid CH, Olkin I. The case of the misleading funnel plot. BMJ 2006; 333: 597–600.
22. Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315: 629–634.
23. Terrin N, Schmid CH, Lau J, Olkin I. Adjusting for publication bias in the presence of heterogeneity. Stat Med 2003; 22: 2113–2126.
24. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum; 1988.
25. Limpaphayom K, Ajello C, Reinprayoon D, Lumbiganon P, Graffikin L. The effectiveness of model-based training in accelerating IUD skill acquisition. A study of midwives in Thailand. Br J Fam Plann 1997; 23: 58–61.
26. de Giovanni D, Roberts T, Norman G. Relative effectiveness of high- versus low-fidelity simulation in learning heart sounds. Med Educ 2009; 43: 661–668.
27. Nunnink L, Welsh AM, Abbey M, Buschel C. In situ simulation-based team training for post-cardiac surgical emergency chest reopen in the intensive care unit. Anaesth Intensive Care 2009; 37: 74–78.
28. Delasobera BE, Goodwin TL, Strehlow M, et al. Evaluating the efficacy of simulators and multimedia for refreshing ACLS skills in India. Resuscitation 2010; 81: 217–223.
29. Petscavage JM, Wang CL, Schopp JG, Paladin AM, Richardson ML, Bush WHJ. Cost analysis and feasibility of high-fidelity simulation based radiology contrast reaction curriculum. Acad Radiol 2011; 18: 107–112.
30. Cook DA, Beckman TJ. Reflections on experimental research in medical education. Adv Health Sci Educ Theory Pract 2010; 15: 455–464.
31. Reed DA, Beckman TJ, Wright SM, Levine RB, Kern DE, Cook DA. Predictive validity evidence for Medical Education Research Study Quality Instrument scores: quality of submissions to JGIM’s Medical Education Special Issue. J Gen Intern Med 2008; 23: 903–907.
32. Oxman A, Guyatt G. When to believe a subgroup analysis. In: Hayward R, ed. Users’ Guides Interactive. Chicago, IL: JAMA Publishing Group; 2002. Available at: http://www.jamaevidence.com/abstract/3347922. Accessed November 1, 2010.
33. Montori VM, Smieja M, Guyatt GH. Publication bias: a brief review for clinicians. Mayo Clin Proc 2000; 75: 1284–1288.
34. Zendejas B, Wang AT, Brydges R, Hamstra SJ, Cook DA. Cost: the missing outcome in simulation-based medical education research: a systematic review. Surgery 2012 Aug 9 [Epub ahead of print]. doi: 10.1016/j.surg.2012.06.025.
35. Cook DA, Triola MM. Virtual patients: a critical literature review and proposed next steps. Med Educ 2009; 43: 303–311.
36. Friedman C. The research we should be doing. Acad Med 1994; 69: 455–457.
37. Cook DA. The research we still are not doing: an agenda for the study of computer-based learning. Acad Med 2005; 80: 541–548.
38. Cook DA, Bordage G, Schmidt HG. Description, justification, and clarification: a framework for classifying the purposes of research in medical education. Med Educ 2008; 42: 128–133.
39. Cook DA. One drop at a time: research to advance the science of simulation. Simul Healthc 2010; 5: 1–4.
40. Weinger MB. The pharmacology of simulation: a conceptual framework to inform progress in simulation research. Simul Healthc 2010; 5: 8–15.

APPENDIX: BOX. DEFINITIONS OF TERMS

Keywords:

Medical education; Simulation; Instructional design; Instructional method; Educational technology


© 2012 Society for Simulation in Healthcare