Validity of Cognitive Load Measures in Simulation-Based Training: A Systematic Review

Naismith, Laura M. PhD; Cavalcanti, Rodrigo B. MD, MSc

doi: 10.1097/ACM.0000000000000893
Review Papers

Background Cognitive load theory (CLT) provides a rich framework to inform instructional design. Despite the applicability of CLT to simulation-based medical training, findings from multimedia learning have not been consistently replicated in this context. This lack of transferability may be related to issues in measuring cognitive load (CL) during simulation. The authors conducted a review of CLT studies across simulation training contexts to assess the validity evidence for different CL measures.

Method PRISMA standards were followed. For 48 studies selected from a search of MEDLINE, EMBASE, PsycInfo, CINAHL, and ERIC databases, information was extracted about study aims, methods, validity evidence of measures, and findings. Studies were categorized on the basis of findings and prevalence of validity evidence collected, and statistical comparisons between measurement types and research domains were pursued.

Results CL during simulation training has been measured in diverse populations including medical trainees, pilots, and university students. Most studies (71%; 34) used self-report measures; others included secondary task performance, physiological indices, and observer ratings. Correlations between CL and learning varied from positive to negative. Overall validity evidence for CL measures was low (mean score 1.55/5). Studies reporting greater validity evidence were more likely to report that high CL impaired learning.

Conclusions The authors found evidence that inconsistent correlations between CL and learning may be related to issues of validity in CL measures. Further research would benefit from rigorous documentation of validity and from triangulating measures of CL. This can better inform CLT instructional design for simulation-based medical training.

Funding/Support: None reported.

Other disclosures: None reported.

Ethical approval: Reported as not applicable.

Previous presentations: Preliminary findings were presented at the Canadian Conference on Medical Education, April 29, 2014, Ottawa, Ontario, Canada, and at the Association for Medical Education in Europe Conference, September 3, 2014, Milan, Italy.

Correspondence: Laura M. Naismith, PhD, Toronto Western Hospital 8E-427C, 399 Bathurst St., Toronto ON M5T 2S8, Canada; e-mail: laura.naismith@uhnresearch.ca.

A growing body of evidence in medical education demonstrates the effectiveness of simulation-based training for enhancing both learning and patient outcomes.1,2 These findings, however, provide little guidance to clinical educators in terms of how to use simulation to achieve specific learning goals. Theories of learning and instructional design can be used to inform decisions about how to structure simulation activities to promote efficient learning. One such theory, cognitive load theory (CLT), provides a framework for understanding simulation complexity from the perspective of the learner.3–5 CLT is underpinned by an information processing model of human cognitive architecture, which posits that information sensed from the environment must be processed by working memory before it can be consolidated and stored in long-term memory in the form of schemas (i.e., “learned”). Working memory is limited in duration and capacity and represents the key bottleneck to learning.6 Exposing learners to material and tasks that are too complex for their current level of training risks overloading their working memories and impairing learning.7 Novices are particularly at risk of cognitive overload because novel information they encounter cannot be easily “chunked” or integrated within their existing schemas.8

Simulation environments have many characteristics that demand learners’ attention and working memory resources. For example, in medical simulation, trainees may be required to perform both cognitive (e.g., patient monitoring, clinical reasoning and decision making, communication) and psychomotor tasks (e.g., physical exam maneuvers, suturing, invasive procedures) simultaneously.9,10 The physical environment of the simulation task, including issues of fidelity, may also contribute to learners’ cognitive load.11,12 Despite the apparent applicability of CLT, findings from multimedia learning13,14 have not been easily replicated within medical simulation training contexts. While increased cognitive load has in some cases been shown to impair learning and performance outcomes,15,16 many studies have failed to demonstrate a relationship between cognitive load and learning.17,18

We sought to investigate the issue of conflicting results from a measurement perspective. Three main cognitive load measurement techniques have been established in the literature: self-report, secondary task, and physiological indices.19 Self-report measures represent the most common means of measuring cognitive load, as there are several existing scales with established validity in other domains.20 Self-report measures typically reflect retrospective assessments of cognitive load, whereas secondary task and physiological indices are intended to measure cognitive load concurrently. With secondary task measures, participants are asked to perform the primary learning task together with an additional monitoring, memorization, or computational task. The assumption is that when working memory demands for the primary learning task increase, there will be less working memory capacity available to allocate to the secondary task.21 Such changes can be detected as a decrease in reaction time to a secondary stimulus or a decline in recall or accuracy of secondary task performance. Physiological indices such as heart rate and pupil dilation can be used to detect changes in autonomic nervous system activity as a function of changing cognitive demands.19
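For concreteness, the following minimal Python sketch illustrates the secondary task logic described above; the data, thresholds, and function names are our own hypothetical illustration rather than an instrument drawn from the reviewed studies. Cognitive load is inferred from how much responses to a secondary probe slow down relative to a single-task baseline.

# Minimal sketch of a secondary-task (dual-task) cognitive load estimate.
# Assumption: slower probe responses under dual-task conditions indicate
# that the primary task is consuming more working memory resources.

from statistics import mean

def relative_load(baseline_rts_ms, dual_task_rts_ms):
    """Return proportional slowing of secondary-task reaction times.

    baseline_rts_ms: probe reaction times collected with no primary task.
    dual_task_rts_ms: probe reaction times collected during the simulation.
    A value of 0.25 means responses were 25% slower under dual-task load.
    """
    baseline = mean(baseline_rts_ms)
    dual = mean(dual_task_rts_ms)
    return (dual - baseline) / baseline

# Hypothetical example: probe responses slow from ~300 ms to ~420 ms.
baseline = [290, 310, 305, 295]
during_simulation = [400, 430, 415, 435]
print(f"Relative load estimate: {relative_load(baseline, during_simulation):.2f}")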

As the validity evidence supporting the use of CLT in medical simulation training appears to be limited,12,22 the objective of this study was to review how cognitive load has been measured across a variety of simulation-based training contexts. Our specific aim was to assess the prevalence of validity evidence that has been collected to support the use of various instruments in measuring cognitive load during simulation training. Our broader objective was to determine which cognitive load measures are most appropriate for use in medical simulation contexts, so as to inform the methodological rigor of future studies in this field.


Method

We developed a systematic review protocol in accordance with PRISMA quality standards.23


Pilot study

We conducted a preliminary search of the MEDLINE database in August 2013 using the search terms cognitive load, simulation, learning, education, training, and teaching. The search yielded 396 abstracts. Two raters (L.M.N. and an external associate) independently reviewed titles and abstracts and selected 31 studies for further review. During this pilot stage we drafted the initial coding sheet and framework for analysis.


Search strategy

On the basis of the results of the pilot study24 we worked with a health sciences librarian at our institution to develop a more extensive search protocol. The search was conducted in February 2014 and included five databases that represented both medical/health and educational domains: MEDLINE, EMBASE, PsycInfo, CINAHL, and ERIC. The MEDLINE search strategy is included as Appendix 1. Search terms for cognitive load included cognitive/mental load, workload, effort, demand, burden, and overload. We also retrieved records related to working memory and short-term memory. We initially conceived of simulation broadly and included search terms related to patient simulation, space simulation, computer simulation, computer-assisted instruction, manikins, standardized patients, role-playing, and virtual reality. We searched each database from the earliest possible date and made no exclusions on the basis of language or geography.
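For illustration only, the sketch below shows one way the two concept blocks (cognitive load terms and simulation terms) might be combined with Boolean operators; it is a simplified approximation of our own, not the actual strategy, which is reproduced in Appendix 1.

# Illustrative combination of the search concepts described above.
# The real MEDLINE strategy (Appendix 1) uses database-specific syntax,
# subject headings, and truncation rules not shown here.

cognitive_load_terms = [
    "cognitive load", "mental load", "mental workload", "mental effort",
    "cognitive demand", "cognitive burden", "cognitive overload",
    "working memory", "short-term memory",
]
simulation_terms = [
    "patient simulation", "space simulation", "computer simulation",
    "computer-assisted instruction", "manikin", "standardized patient",
    "role-playing", "virtual reality",
]

def or_block(terms):
    """Join terms into a single parenthesized OR expression."""
    return "(" + " OR ".join('"' + t + '"' for t in terms) + ")"

query = or_block(cognitive_load_terms) + " AND " + or_block(simulation_terms)
print(query)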

In addition to the database review, we also reviewed our pilot study results against the revised search criteria and wrote to three authors to solicit full texts of published abstracts.


Study selection

We adopted a multistage process to select relevant studies for review. To be included in the first stage of the review, studies had to meet the following inclusion criteria:

  • Population: Healthy adult human participants with no identified cognitive impairments;
  • Intervention: The study involved a simulation wherein participants’ cognitive load was directly measured using one or more techniques;
  • Comparison: No specific comparison sought;
  • Outcomes: Any outcomes;
  • Study design: All study designs.

We excluded studies that focused on the effects of disease, age, drugs, nutrition, and/or environmental stimuli (e.g., heat, noise) on cognitive load as well as most studies that manipulated cognitive load as an independent variable, unless it was also measured by a manipulation check. Two raters (L.M.N. and an external associate) independently reviewed titles and abstracts. Disagreements were resolved by discussion.

In the second stage of the process, we refined our inclusion criteria to select studies that measured cognitive load with respect to learning processes and/or outcomes. We also included studies of performance outcomes that were conducted within a training context as well as studies that examined the feasibility of different cognitive load measures for evaluating learning processes. We independently screened titles and abstracts. Thirty-seven initial disagreements (10.1%) were resolved by discussion.

In the third stage of the process, we further refined our inclusion criteria to distinguish between multimedia and simulation. Adopting Gaba’s25 definition of simulation, we only retained studies involving multimedia environments if they (a) replaced or amplified a real experience, (b) evoked or replicated substantial aspects of the real world, and (c) afforded considerable opportunities for participant interaction. To make decisions at this stage, we independently reviewed the full text of the articles. Eleven initial disagreements (8.7%) were again resolved by discussion.


Data extraction

A key objective of this review was to understand whether the relationship between cognitive load and learning in simulated environments was related to the technique used to measure cognitive load. Accordingly, we developed a coding sheet to extract the following information:

  1. Study aim, including main research question and whether or not CLT was specifically cited;
  2. Methods, including description of simulation, study design, population, sample size, and cognitive load measures used;
  3. Validity evidence for cognitive load measures used, including content, response process, internal structure, relations with other variables, and consequences, based on examples reported by Cook and colleagues26;
  4. Findings related to learning/performance and total cognitive load, categorized in terms of whether the relationship was positive (i.e., high cognitive load was associated with beneficial learning outcomes), negative (i.e., high cognitive load was associated with poor learning outcomes), or neutral (no association);
  5. Implications, including discussion points raised by the study authors as well as our own comments.

Scoring and data analysis

To assess the validity of the measurement techniques, we developed a scoring framework based on examples reported by Cook and colleagues26 (see Table 1). For each validity element (i.e., content, response process, internal structure, relations with other variables, and consequences), we awarded 1 point if the collected evidence sufficiently addressed the validity element and 0.5 points if the evidence partially or inconsistently supported the validity element. For example, reporting significant correlations between all questionnaire items or a Cronbach alpha > 0.6 received 1 point for internal structure, whereas correlations between some items but not others received 0.5 points. As with Cook et al,26 our primary aim was to assess the prevalence of the different validity elements. Correspondingly, we calculated overall scores based on the total number of validity elements reported. Thus, the minimum possible score was 0 and the maximum possible score was 5. We dual-coded 12 studies (25% of the data set) and achieved initial agreement of 89%. The remainder of the studies were coded by a single rater.
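As a minimal sketch of this scoring logic (the variable and function names are ours, not taken from the coding sheet), each of the five validity elements contributes 1, 0.5, or 0 points, and a study's overall score is the sum:

# Sketch of the validity scoring framework: five elements, each scored
# 1 (sufficient evidence), 0.5 (partial or inconsistent evidence), or 0 (none).

VALIDITY_ELEMENTS = (
    "content",
    "response_process",
    "internal_structure",
    "relations_with_other_variables",
    "consequences",
)

def validity_score(element_scores):
    """Sum element scores; the result ranges from 0 to 5."""
    assert set(element_scores) <= set(VALIDITY_ELEMENTS)
    assert all(v in (0, 0.5, 1) for v in element_scores.values())
    return sum(element_scores.values())

# Hypothetical study: full content evidence, partial internal structure
# evidence, full relations-with-other-variables evidence -> score of 2.5.
example = {"content": 1, "internal_structure": 0.5,
           "relations_with_other_variables": 1}
print(validity_score(example))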


We then examined the relationship between validity scores and study findings. Studies were categorized according to whether increased cognitive load improved learning (positive relationship), impaired learning (negative relationship), or showed no effect on learning (neutral relationship). We used ANOVAs and t tests as appropriate to test whether there was a difference in the amount of validity evidence to support the observed relationship between cognitive load and learning for both medical education and other research domains. Studies which did not report learning outcomes were excluded from this analysis.

For each cognitive load measurement technique (i.e., self-report, secondary task, and physiological indices), we calculated the mean validity score by averaging the scores of all of the studies in which it was used. Again, we used ANOVAs and t tests to see whether validity scores differed between medical education and other research domains according to which measurement technique was used.
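For readers wishing to reproduce this style of analysis, the sketch below uses illustrative scores (not the review data) and assumes the pandas and SciPy libraries are available. It mirrors the comparisons described above: a one-way ANOVA of validity scores across relationship categories, a Welch t test between research domains, and mean validity scores by measurement technique.

# Illustrative analysis sketch with made-up scores, not the review data set.
import pandas as pd
from scipy import stats

studies = pd.DataFrame({
    "validity_score": [2.0, 2.5, 1.0, 1.5, 1.0, 2.0, 1.5, 0.5, 3.0, 1.0],
    "relationship":   ["negative", "negative", "neutral", "neutral", "neutral",
                       "positive", "positive", "neutral", "negative", "neutral"],
    "domain":         ["medical", "other", "medical", "other", "other",
                       "medical", "other", "medical", "other", "medical"],
    "measure":        ["self-report"] * 10,
})

# One-way ANOVA: do validity scores differ by the relationship observed?
groups = [g["validity_score"] for _, g in studies.groupby("relationship")]
print(stats.f_oneway(*groups))

# Welch t test: do validity scores differ between research domains?
med = studies.loc[studies["domain"] == "medical", "validity_score"]
oth = studies.loc[studies["domain"] == "other", "validity_score"]
print(stats.ttest_ind(med, oth, equal_var=False))

# Mean validity score per measurement technique.
print(studies.groupby("measure")["validity_score"].mean())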


Results

The initial search yielded 4,660 results (database: 4,625; additional search: 35; Figure 1). A total of 319 abstracts met the initial search criteria, 124 abstracts met the secondary search criteria, and 48 studies met all search criteria (Appendix 2).15–18,27–70


Description of studies

The final set of studies came from a diverse set of research domains. Thirteen studies (27%) sampled medical practitioners and trainees, whereas the remainder of studies sampled various populations including undergraduate students (20 studies, 42%), pilots (6 studies, 13%), police officers (1 study, 2%), and naval cadets (1 study, 2%).

Most studies (38; 79%) evaluated cognitive load in the context of an educational or training intervention, where learning or performance was the primary outcome measure. Four studies (8%) examined the effect of cognitive load on physiological variables,40,67,68,70 4 studies (8%) specifically aimed to establish the feasibility of a cognitive load measurement technique,17,27,45,66 and 2 studies (4%) aimed to model the cognitive load of a simulated task.50,51

To define the phenomenon of interest, 17 studies (35%) used the term “cognitive load,” whereas 18 studies (38%) referred to “mental workload” and 10 studies (21%) referred to “mental effort.” Additional terms used included cognitive workload,34 mental strain,35 and mental demand.58 Twenty-one studies (44%) cited CLT as either a conceptual framework that guided study design and/or as a way to explain observed results.

In terms of the type of simulation task used, 19 studies (40%) employed primarily cognitive tasks, including monitoring, problem solving, and decision making, whereas 12 studies (25%) used primarily psychomotor tasks such as suturing, laparoscopic procedures, or vehicle control, and 11 studies (23%) used integrated cognitive and psychomotor tasks such as performing a simulated combat flight. We also identified 6 studies (13%) of simulated clinical encounters, including examining and diagnosing a high-fidelity cardiac simulator15,29 and managing the rapidly deteriorating vital signs of a manikin in a simulated operating room.17

All studies we retrieved used quantitative study designs. Twenty-nine studies (60%) were randomized controlled trials, 12 studies (25%) used repeated-measures or crossover designs, 4 (8%) were observational designs, and 3 (6%) were nonrandomized group comparisons. The number of participants per study ranged from 4 to 191 (median: 34).


Measurement of cognitive load in simulation training

The majority of studies (41; 85%) used a single measure of cognitive load. Most studies (34; 71%) used self-report measures. The two most commonly used literature-based measures were the National Aeronautics and Space Administration–Task Load Index71 (NASA-TLX; 14 studies) and the Paas Scale72 (12 studies). The NASA-TLX tended to be used when the phenomenon of interest was “mental workload” (9/14 studies) and the simulated task was psychomotor (7/14 studies), whereas the Paas Scale was more often employed when CLT was specifically cited (11/12 studies) and the simulated task was primarily cognitive (9/12 studies). Other literature-based measures of cognitive load included the Subjective Mental Effort Questionnaire,39,62 the Multiple Resources Questionnaire,28 the Cooper–Harper Scale,42 and the Borg Scale of Mental Strain.35 Only 1 study attempted to measure the components of cognitive load (i.e., intrinsic load, extraneous load, germane load) separately.69

Seven studies employed physiological measures of cognitive load, including EEG,27 eye movements,27,37,51 pupil fluctuations,45 and heart rate or heart rate variability.40,48,51,55 Seven additional studies used secondary task measures, including response time to a secondary stimulus,17,43,66 memorization tasks,43,57 and accuracy of secondary task performance.30,38

In addition to the three literature-based measurement techniques, three studies used observer ratings to evaluate changes in linguistic features as a function of cognitive load.67,68,70


Sources of validity evidence

All studies collected some validity evidence to support their measure of cognitive load (Table 1, Appendix 2). Validity scores ranged from 0.5 to 3 (mean [SD] = 1.55/5 [0.71]). Seven studies (15%) collected validity evidence across three elements, 23 studies (48%) collected validity evidence across two elements, and 18 studies (38%) collected validity evidence for one element only. Five studies scored less than 1 and were identified as at risk of biased outcomes.30,32,56,59,63

Table 1 details the results of our validity scoring protocol. Most studies (44; 92%) reported content validity evidence, many studies (29; 60%) reported relations with other variables, and a minority of studies (12; 25%) reported internal structure validity evidence. We did not identify any studies with evidence to support the response process or consequences validity elements.


Relations between validity scores and study findings

Overall, we identified 12 studies (25%) which demonstrated a negative relationship between cognitive load and learning. High cognitive load was associated with poor execution of both cognitive15,42 and psychomotor16,33 tasks as well as decreased transfer performance.47,65 Six studies (13%) demonstrated a positive relationship between cognitive load and learning, in terms of both enhanced task performance46,69 and increased learner engagement.64 We categorized 21 studies (44%) as “neutral,” in that they did not observe a relationship between cognitive load and learning. Eight studies (17%) did not report learning or performance data.

Validity scores for studies that reported negative relationships between cognitive load and learning outcomes were higher than those categorized as neutral (Table 2). This difference did not reach statistical significance among studies in medical education, t(4.81) = 1.50, P = .20, d = 0.93. For other research domains, there was a main effect of relationship on validity scores, F(2,25) = 5.62, P = .01, η2 = 0.31. Post hoc tests using the Bonferroni correction showed that validity scores for studies reporting a negative relationship (mean [SD] = 2.13 [0.52]) were significantly higher than for studies categorized as “neutral” (mean [SD] = 1.18 [0.72]). We did not detect any association between cognitive load measurement technique used and relationship observed.


Comparisons between measurement types

Table 3 displays the mean validity scores for the various cognitive load measurement techniques and instruments. There were no significant differences in the validity scores between medical education and other domains for self-report, t(32) = −0.52, P = .60, d = −0.21, or secondary task measures, t(5) = 0.25, P = .81, d = 0.20. Studies which employed observer ratings and physiological indices tended to have higher validity scores than studies which used self-report and/or secondary task measures, though these differences were also not statistically significant. We did not find any examples of studies using physiological indices or observer ratings of cognitive load in medical education.


Discussion

In this systematic review we examined how cognitive load has been measured within simulation training environments and evaluated the prevalence of validity evidence collected in support of various cognitive load measurement types and instruments. Although we found some validity evidence to suggest that self-report, secondary task, and physiological indices can all be used to detect cognitive load, in many cases the amount of evidence was quite limited. In this section, we outline key findings and outstanding issues for each cognitive load measurement technique identified and relate these to the broader goal of improving methodological rigor in medical education research.


Self-report

The majority of studies measured cognitive load through self-report. Though most studies used literature-based scales, few studies used multiple measures and/or collected additional validity evidence to support the use of these scales in their specific context.

The relatively low validity scores associated with the use of self-report measures across domains may be partly attributed to significant variations in how these measures were implemented. For example, the original NASA-TLX consists of six dimensions (mental demand, physical demand, temporal demand, performance, effort, frustration) measured along visual analogue scales.71 Although introduced in the literature as a measure of task workload, in this set of studies, the TLX was most frequently referred to as a measure of mental workload (nine studies). In one study, only some dimensions of the TLX were measured,58 and in other studies the instrument instructions were altered41,60 or omitted entirely.36 Seven studies analyzed TLX dimensions separately, whereas the other seven studies reported total scores only. Similarly, the original Paas scale consists of a single item that measures overall cognitive load in terms of invested mental effort along a nine-point scale.72 In six studies, this scale was adapted by changing the number of points on the scale52,54,56,59,63 and/or modifying the item to ask about task difficulty instead of invested mental effort.32,54 Modifying the Paas scale requires psychometric properties to be reestablished.19,73 The results of our review suggest that this also applies to the modified use of the NASA-TLX instrument.
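To make these implementation differences concrete, the following sketch (our own illustration with hypothetical ratings; the reviewed studies did not necessarily compute scores this way) shows the conventional raw and weighted NASA-TLX composites. The raw composite simply averages the six subscale ratings; the weighted composite multiplies each rating by the number of pairwise comparisons that dimension "won" and divides by 15.

# Sketch of NASA-TLX composite scoring. Subscales are rated 0-100; weights
# come from 15 pairwise comparisons of which dimension contributed more.

DIMENSIONS = ("mental_demand", "physical_demand", "temporal_demand",
              "performance", "effort", "frustration")

def raw_tlx(ratings):
    """Unweighted composite: mean of the six subscale ratings."""
    return sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)

def weighted_tlx(ratings, weights):
    """Weighted composite: weight each rating by its pairwise-comparison tally."""
    assert sum(weights.values()) == 15  # 15 pairwise comparisons in total
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15

# Hypothetical post-simulation ratings and weights.
ratings = {"mental_demand": 80, "physical_demand": 30, "temporal_demand": 65,
           "performance": 40, "effort": 70, "frustration": 55}
weights = {"mental_demand": 5, "physical_demand": 0, "temporal_demand": 3,
           "performance": 2, "effort": 4, "frustration": 1}
print(raw_tlx(ratings), weighted_tlx(ratings, weights))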

Direct assessment of cognitive load components (i.e., intrinsic load, extraneous load, and germane load) represents a key area for future research.73,74 Despite the fact that these components are theorized to have differential impacts on learning,8 only one study in this review attempted to assess their levels separately.69


Secondary task

Like self-report, secondary task measures of cognitive load were also variably implemented across studies. Validity scores for studies that used simple reaction time tasks17,43,47,66 were typically higher than for studies that used more complex tasks,30,38,57 supporting the notion that the type of secondary task selected may have a significant impact on findings.21 Complex secondary tasks in particular may overload working memory resources, leading to the inability to detect changes in cognitive load as a result of training.38 Four of the six studies which measured both primary and secondary task performance failed to demonstrate a relationship between cognitive load and primary task outcomes.17,38,43,57 This may reflect a conscious deprioritization of the secondary task, either to maintain acceptable levels of cognitive load or due to other factors, such as lack of interest.57 Learners employing such a strategy would dedicate working memory resources to learning the primary task while delaying responses to the secondary task, resulting in spuriously high estimates of cognitive load.


Physiological indices

Although the use of physiological indices of cognitive load was associated with higher validity scores than the use of either self-report or secondary task measures, there was still considerable variation depending on the specific index employed. Studies that used EEG and/or eye movement metrics tended to have higher validity scores than those that relied on measures of heart rate or heart rate variability. Current findings suggest that physiological indices are the most sensitive means for detecting variations in cognitive load levels during simulation training.27,40,45,55 Interpretation of these measures, however, is considerably hampered by high levels of inter- and intraindividual variability.45


Observer ratings

The three studies in which cognitive load was rated by an external observer, all from outside the medical/health professions domain, had comparatively high validity scores.67,68,70 Although this may be partly attributed to the need to justify the use of a novel measurement technique, observer ratings appear to warrant further investigation as measures of cognitive load in medical simulation training. Observer ratings may be particularly useful for identifying person- and task-specific indicators of cognitive overload in clinical simulation environments.75


Using validity evidence to enhance instructional design

Ideally, for a theory to inform instructional design on a broad scale, it should first demonstrate consistent results across multiple research contexts. We suggest that the overall conflicting pattern of findings of CLT studies in simulation training may be a result of measurement limitations. In particular, we observed major limitations in the type and amount of validity evidence collected to support different measures of cognitive load. We found that studies which reported fewer validity elements were more likely to report a neutral relationship between cognitive load and learning, whereas studies which reported a greater number of validity elements were more likely to report that high cognitive load was associated with impaired learning.

Depending on the intended use of cognitive load scores,26 some validity elements may be more important than others. For instance, studies evaluating the use of CLT as a design framework may be particularly interested in relations between variables, whereas studies focused on learner assessment would need to ensure internal structure validity. Of significant concern in our findings was the complete lack of evidence in support of measurement response processes and consequences. Response process evidence is critical to understanding how trainees experience cognitive load in the context of simulation training and under which conditions the process of measurement may change the outcome (e.g., spuriously high cognitive load as a result of consciously deprioritizing a secondary task). Making design decisions such as establishing a cutoff point for when the level of cognitive load is too high during training,15 or using cognitive load levels to determine whether students require additional training,16,17 requires careful study of consequences. Moving the field forward will require systematic design and evaluation of theory-informed educational interventions, with a particular emphasis on collecting validity evidence from the broadest possible range of sources.26


Conclusions

Understanding the impact of working memory demands during simulation-based training requires sensitive and reliable techniques to measure cognitive load. Although a range of measurement techniques appear to be applicable within the context of medical simulation training, the findings of this review suggest that there is considerable opportunity to improve the rigor with which they are implemented and evaluated. Across research domains, the measurement of cognitive load is reliant on retrospective, self-reported data. Further research involving multiple measurement techniques such as physiological indices and observer ratings of cognitive load is warranted. In particular, the use of multiple concurrent measurements would seem advisable, so as to increase the validity of cognitive load measurements and help improve the rigor of studies of CLT in both simulation and in medical education more broadly.

Acknowledgments: The authors wish to thank Ani Orchanian-Cheff for helping to construct and execute the database search and Stephan Hambaz for his assistance in the study selection process.


References

1. Cook DA, Hatala R, Brydges R, et al. Technology-enhanced simulation for health professions education: A systematic review and meta-analysis. JAMA. 2011;306:978–988
2. McGaghie WC, Issenberg SB, Barsuk JH, Wayne DB. A critical review of simulation-based mastery learning with translational outcomes. Med Educ. 2014;48:375–385
3. Sweller J. Cognitive load during problem solving: Effects on learning. Cogn Sci. 1988;12:257–285
4. Sweller J, Van Merriënboer JJG, Paas F. Cognitive architecture and instructional design. Educ Psychol Rev. 1998;10:251–295
5. van Merriënboer JJG, Sweller J. Cognitive load theory and complex learning: Recent developments and future directions. Educ Psychol Rev. 2005;17:147–177
6. Cowan N. Working memory underpins cognitive development, learning, and education. Educ Psychol Rev. 2014;26:197–223
7. Ayres PL. Systematic mathematical errors and cognitive load. Contemp Educ Psychol. 2001;26:227–248
8. van Merriënboer JJ, Sweller J. Cognitive load theory in health professional education: Design principles and strategies. Med Educ. 2010;44:85–93
9. Stroud L, Cavalcanti RB. Hybrid simulation for knee arthrocentesis: Improving fidelity in procedures training. J Gen Intern Med. 2013;28:723–727
10. Posner GD, Hamstra SJ. Too much small talk? Medical students’ pelvic examination skills falter with pleasant patients. Med Educ. 2013;47:1209–1214
11. Choi H-H, van Merriënboer JJG, Paas F. Effects of the physical environment on cognitive load and learning: Towards a new model of cognitive load. Educ Psychol Rev. 2014;26:225–244
12. Naismith LM, Cheung JJH, Ringsted C, Cavalcanti RB. Limitations of subjective cognitive load measures in simulation-based procedural training. Med Educ. 2015;49:805–814
13. Paas F, Sweller J. Implications of cognitive load theory for multimedia learning. In: Mayer RE, ed. The Cambridge Handbook of Multimedia Learning. 2nd ed. New York, NY: Cambridge University Press; 2014:27–42
14. Kirschner F, Kester L, Corbalan G. Cognitive load theory and multimedia learning, task characteristics and learning engagement: The current state of the art. Comp Hum Behav. 2011;27:1–4
15. Fraser K, Ma I, Teteris E, Baxter H, Wright B, McLaughlin K. Emotion, cognitive load and learning outcomes during simulation training. Med Educ. 2012;46:1055–1062
16. Yurko YY, Scerbo MW, Prabhu AS, Acker CE, Stefanidis D. Higher mental workload is associated with poorer laparoscopic performance as measured by the NASA-TLX tool. Simul Healthc. 2010;5:267–271
17. Davis DH, Oliver M, Byrne AJ. A novel method of measuring the mental workload of anaesthetists during simulated practice. Br J Anaesth. 2009;103:665–669
18. Andrade AD, Cifuentes P, Mintzer MJ, Roos BA, Anam R, Ruiz JG. Simulating geriatric home safety assessments in a three-dimensional virtual world. Gerontol Geriatr Educ. 2012;33:233–252
19. Paas F, Tuovinen JE, Tabbers H, Van Gerven PWM. Cognitive load measurement as a means to advance cognitive load theory. Educ Psychol. 2003;38:63–71
20. Young JQ, Van Merrienboer J, Durning S, Ten Cate O. Cognitive load theory: Implications for medical education: AMEE guide no. 86. Med Teach. 2014;36:371–384
21. Brünken R, Steinbacher S, Plass JL, Leutner D. Assessment of cognitive load in multimedia learning using dual-task methodology. Exp Psychol. 2002;49:109–119
22. Littlewood K, Park C. Comments on “emotion, cognitive load and learning outcomes during simulation training.” Med Educ. 2013;47:851
23. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Ann Intern Med. 2009;151:264–269, W64
24. Naismith L, Hambaz S, Cavalcanti RB. How should we measure cognitive load in postgraduate simulation-based education? Med Educ. 2014;48(suppl 1):128
25. Gaba DM. The future vision of simulation in healthcare. Simul Healthc. 2007;2:126–135
26. Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Adv Health Sci Educ Theory Pract. 2014;19:233–250
27. Soussou W, Rooksby M, Forty C, Weatherhead J, Marshall S. EEG and eye-tracking based measures for enhanced training. Conf Proc IEEE Eng Med Biol Soc. 2012;2012:1623–1626
28. Klein MI, Warm JS, Riley MA, et al. Mental workload and stress perceived by novice operators in the laparoscopic and robotic minimally invasive surgical interfaces. J Endourol. 2012;26:1089–1094
29. Sibbald M, de Bruin AB, Cavalcanti RB, van Merrienboer JJ. Do you have to re-examine to reconsider your diagnosis? Checklists and cardiac exam. BMJ Qual Saf. 2013;22:333–338
30. Prabhakharan P, Molesworth BR, Hatfield J. Impairment of a speed management strategy in young drivers under high cognitive workload. Accid Anal Prev. 2012;47:24–29
31. Youssef Y, Lee G, Godinez C, et al. Laparoscopic cholecystectomy poses physical injury risk to surgeons: Analysis of hand technique and standing position. Surg Endosc. 2011;25:2168–2174
32. D’Mello SK, Dowell N, Graesser A. Does it really matter whether students’ contributions are spoken versus typed in an intelligent tutoring system with natural language? J Exp Psychol Appl. 2011;17:1–17
33. Muresan C 3rd, Lee TH, Seagull J, Park AE. Transfer of training in the development of intracorporeal suturing skill in medical student novices: A prospective randomized trial. Am J Surg. 2010;200:537–541
34. Stirling L, Newman D, Willcox K. Self-rotations in simulated microgravity: Performance effects of strategy training. Aviat Space Environ Med. 2009;80:5–14
35. Hedman L, Klingberg T, Enochsson L, Kjellin A, Felländer-Tsai L. Visual working memory influences the performance in virtual image-guided surgical intervention. Surg Endosc. 2007;21:2044–2050
36. Saleem JJ, Patterson ES, Militello L, et al. Impact of clinical reminder redesign on learnability, efficiency, usability, and workload for ambulatory clinic nurses. J Am Med Inform Assoc. 2007;14:632–640
37. Papadelis C, Kourtidou-Papadeli C, Bamidis P, Albani M. Effects of imagery training on cognitive performance and use of physiological measures as an assessment tool of mental effort. Brain Cogn. 2007;64:74–85
38. Byrne AJ, Sellen AJ, Jones JG, et al. Effect of videotape feedback on anaesthetists’ performance while managing simulated anaesthetic crises: A multicentre study. Anaesthesia. 2002;57:176–179
39. Bharathan R, Vali S, Setchell T, Miskry T, Darzi A, Aggarwal R. Psychomotor skills and cognitive load training on a virtual reality laparoscopic simulator for tubal surgery is effective. Eur J Obstet Gynecol Reprod Biol. 2013;169:347–352
40. Murai K, Hayashi Y, Higuchi K, Saiki T, Fujita T, Maenaka K. Evaluation of teamwork for simulator training based on heart rate variability: Case study of a cadet of ship navigator. Int J Intell Comput Med Sci Image Process. 2011;4:93–100
41. Coderre S, Anderson J, Rikers R, Dunckley P, Holbrook K, McLaughlin K. Early use of magnetic endoscopic imaging by novice colonoscopists: Improved performance without increase in workload. Can J Gastroenterol. 2010;24:727–732
42. Duffy VG, Ng PPW, Ramakrishnan A. Impact of a simulated accident in virtual training on decision-making performance. Int J Industr Ergon. 2004;34:335–348
43. Sauer J, Hockey GRJ, Wastell DG. Effects of training on short- and long-term skill retention in a complex multiple-task environment. Ergonomics. 2000;43:2043–2064
44. Svensson E, Angelborg-Thanderz M, Borgvall J, Castor M. Skill decay, reacquisition training, and transfer studies in the Swedish Air Force: A retrospective review. In: Individual and Team Skill Decay: The Science and Implications for Practice. New York, NY: Routledge; 2013:258–281
45. Reiner M, Gelfeld TM. Estimating mental workload through event-related fluctuations of pupil area during a task in a virtual world. Int J Psychophysiol. 2014;93:38–44
46. Kopainsky B, Sawicka A. Simulator-supported descriptions of complex dynamic problems: Experimental results on task performance and system understanding. Syst Dyn Rev. 2011;27:142–172
47. Batson CD, Brady RA, Peters BT, et al. Gait training improves performance in healthy adults exposed to novel sensory discordant conditions. Exp Brain Res. 2011;209:515–524
48. Yuviler-Gavish N, Yechiam E, Kallai A. Learning in multimodal training: Visual guidance can be both advantageous and disadvantageous in spatial tasks. Int J Hum Comput Stud. 2011;69:113–122
49. Singh AL, Tiwari T, Singh IL. Effects of automation reliability and training on automation-induced complacency and perceived mental workload. J Ind Acad Appl Psychol. 2009;35:9–22
50. Leung GTC, Yuce G, Duffy VG. The effects of virtual industrial training on mental workload during task performance. Hum Factors Ergon Manuf. 2010;20:567–578
51. Dahlstrom N, Nahlinder S. Mental workload in aircraft and simulator during basic civil aviation training. Int J Aviat Psychol. 2009;19:309–325
52. Erlandson BE, Nelson BC, Savenye WC. Collaboration modality, cognitive load, and science inquiry learning in virtual inquiry environments. Educ Technol Res Dev. 2010;58:693–710
53. Singh AL, Tiwari T, Singh IL. Performance feedback, mental workload and monitoring efficiency. J Ind Acad Appl Psychol. 2010;36:151–158
54. Horz H, Winter C, Fries S. Differential benefits of situated instructional prompts. Comput Hum Behav. 2009;25:818–828
55. Saus E-R, Johnsen BH, Eid J, Riisem PK, Andersen R, Thayer JF. The effect of brief situational awareness training in a police shooting simulator: An experimental study. Mil Psychol. 2006;18(suppl):S3–S21
56. Salden RJCM, Paas F, van der Pal J, van Merrienboer JJG. Dynamic task selection in flight management system training. Int J Aviat Psychol. 2006;16:157–174
57. Kearns SK. The effectiveness of guided mental practice in a computer-based single pilot resource management (SRM) training program [dissertation]. Minneapolis, Minn: Capella University; 2007
58. Bolton AE. Immediate versus delayed feedback in simulation-based training: Matching feedback delivery timing to the cognitive demands of the training exercise [dissertation]. Orlando, Fla: University of Central Florida; 2006
59. Camp G, Paas F, Rikers R, van Merrienboer J. Dynamic problem selection in air traffic control training: A comparison between performance, mental effort and mental efficiency. Comput Hum Behav. 2001;17:575–595
60. Teague RC. Training for performance in demanding tasks: The effectiveness of demand presentation and multiple-example simulator training [dissertation]. Fairfax, Va: George Mason University; 1997
61. de Croock MBM, van Merrienboer JJG, Paas FGWC. High versus low contextual interference in simulation-based training of troubleshooting skills: Effects on transfer performance and invested mental effort. Comput Hum Behav. 1998;14:249–267
62. Neerincx MA, de Greef HP. Cognitive support: Extending human knowledge and processing capacities. Hum Comput Interact. 1998;13:73–106
63. Salden RJCM, Paas F, Broers NJ, van Merrienboer JJ. Mental effort and performance as determinants for the dynamic selection of learning tasks in air traffic control training. Instr Sci. 2004;32:20
64. Darabi AA, Nelson DW, Paas F. Learner involvement in instruction on a complex cognitive task: Application of a composite measure of performance and mental effort. J Res Technol Educ. 2007;40:39–48
65. Fraser K, Huffman J, Ma I, et al. The emotional and cognitive impact of unexpected simulated patient death: A randomized controlled trial. Chest. 2014;145:958–963
66. Rojas D, Haji F, Shewaga R, Kapralos B, Dubrowski A. The impact of secondary-task type on the sensitivity of reaction-time based measurement of cognitive load for novices learning surgical skills using simulation. Stud Health Technol Inform. 2014;196:353–359
67. Huttunen KH, Keränen HI, Pääkkönen RJ, Eskelinen-Rönkä P, Leino TK. Effect of cognitive load on articulation rate and formant frequencies during simulator flights. J Acoust Soc Am. 2011;129:1580–1593
68. Huttunen K, Keränen H, Väyrynen E, Pääkkönen R, Leino T. Effect of cognitive load on speech prosody in aviation: Evidence from military simulator flights. Appl Ergon. 2011;42:348–357
69. Kluge A, Grauel B, Burkolter D. Combining principles of cognitive load theory and diagnostic error analysis for designing job aids: Effects on motivation and diagnostic performance in a process control task. Appl Ergon. 2013;44:285–296
70. Khawaja MA, Chen F, Marcus N. Analysis of collaborative communication for linguistic cues of cognitive load. Hum Factors. 2012;54:518–529
71. Hart SG, Staveland LE. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Hancock PA, Meshkati N, eds. Human Mental Workload. Amsterdam, The Netherlands: North Holland Press; 1988
72. Paas FGWC. Training strategies for attaining transfer of problem-solving skill in statistics: A cognitive-load approach. J Educ Psychol. 1992;84:429–434
73. de Jong T. Cognitive load theory, educational research, and instructional design: Some food for thought. Instr Sci. 2010;38:105–134
74. Leppink J, Paas F, Van der Vleuten CP, Van Gog T, Van Merriënboer JJ. Development of an instrument for measuring different types of cognitive load. Behav Res Methods. 2013;45:1058–1072
75. Cavalcanti RB, Naismith L, Sibbald M. Applications of cognitive load theory to learning clinical reasoning in the workplace. Unpublished manuscript.

Appendix 1: MEDLINE Search Strategy


Appendix 2: Summary of Studies Reviewed


            © 2015 by the Association of American Medical Colleges