Surgical education is undergoing a dramatic change in response to resident work hour restrictions, more demands for efficient use of the operating room (OR), and public concerns for patient safety when physicians learn new procedures on patients. Surgical simulators facilitate the acquisition of new skills through repetitive and deliberate practice on inanimate models in a safe and nonthreatening training environment and have been embraced by national surgical organizations for resident training outside of the OR.1 Proficiency-based simulator training that sets training goals based on expert-derived performance measures has been demonstrated to improve operative performance.2–4 Nevertheless, several studies have previously demonstrated that although learners can reach expert-level criteria on the simulator, their performance in the OR is not on par with experts.2,5 A number of factors could explain this incomplete transfer of simulator-acquired skill to the OR, including fidelity differences between the training and testing environment, performance anxiety of learners, inadequate performance assessment, or differences in task demands and mental workload.3,6
The term mental workload can be thought of as reflecting the amount of attention an operator can direct to a task at any given moment.3 In general, the mental workload associated with an easy task is low, whereas difficult tasks produce higher mental workload. Mental workload can also be described by the difference between task demands and available attentional resources.3 Thus, a high workload task that is mentally demanding leaves little or no spare attentional capacity to deal with new or unexpected events, and the likelihood of performance errors increases.7
Often, mental workload is inferred from traditional performance metrics (task time and errors), but there are other ways to measure mental workload. Operators can be asked to perform one or more tasks simultaneously with the primary task of interest. These secondary tasks provide an alternative measure of mental workload by indexing spare attentional capacity.3,7,8 However, it is also possible to ask individuals to rate their subjective impressions of mental workload.3 These subjective reports can be a valuable source of information concerning mental workload in two fundamental ways. First, they can be used to identify the specific sources of task demands contributing to mental workload. Second, they can reveal workload differences between two individuals who have comparable performance scores.
One of the most widely used instruments for measuring subjective mental workload is the National Aeronautics and Space Administration-Task Load Index (NASA-TLX).9 The NASA-TLX provides an overall index of mental workload as well as the relative contributions of six subscales: mental, physical, and temporal task demands; and effort, frustration, and perceived performance (Fig. 1). The psychometric characteristics of the NASA-TLX are well documented; it was developed and initially validated by the Human Performance Group at the NASA Ames Research Center as a tool for the subjective evaluation of individual workload in flight simulation,9,10 air traffic control studies,11 automated and manual control,12 and vigilance tasks.13 More recently, it has been used for the assessment of workload perception in a variety of tasks outside the aeronautical field, including the medical domain,14–16 but its use for the assessment of surgical skill remains limited.
The Institute of Medicine report, “To Err is Human,” stated that as many as 44,000 to 98,000 people die yearly in US hospitals as a consequence of medical errors.17 According to the Institute of Medicine, the rate of medical errors is higher in intensive care units, ORs, and emergency departments—all high workload environments. In their study of airway management tasks, Weinger et al14 reported lower workload ratings during less complex procedures (as determined by expert consensus) and higher ratings during more complex ones. Young et al,18 in their perianesthesia review, highlighted the importance of developing a method for measuring workload to prevent errors and provide safe and cost-effective care and recommended the NASA-TLX tool as a sensitive assessment instrument. Regarding laparoscopic surgery, Klein et al19 found that rotating the camera angle from 0° to 90° increased perceived mental workload in a group of trainees. Others have demonstrated that increased surgeon fatigue and mental demands during laparoscopy may increase the duration of the procedure and the number of errors made.20,21
The objective of this study was to assess the relationship of workload and performance during simulator training and in the OR on a complex laparoscopic task across three studies and to determine the value of the NASA-TLX tool as a supplemental and potentially predictive performance metric.
We analyzed the NASA-TLX workload and performance data from three separate Institutional Review Board- and Institutional Animal Care and Use Committee-approved trials previously completed by the authors.22,23 Participants of these trials were all novices (second-year medical students or senior premedical students) without any prior clinical, surgical, or simulator experience. All participants completed a baseline questionnaire, providing information about their demographics and prior laparoscopic, simulator, and video game experience. After this, they watched a video tutorial about laparoscopic suturing and knot tying, and their baseline performance on the fundamentals of laparoscopic surgery (FLS)6 suture model was assessed using a previously published formula2: Objective score = 600 − task completion time (seconds) − (10 × accuracy error) − (100 × security error), where 600 represents the maximum time in seconds participants were allowed to complete the task (ie, 10 minutes). The accuracy error was determined by measuring the distance (in mm) between the suture and the premarked targets and measuring the gap (in mm) between the two premarked targets after the knot was tied (a gap of <2 mm was considered no error). The security error was determined by cutting the suture tails to 1 cm, inserting small suture scissors within the loop, and spreading in an effort to disrupt the knot. If the knot slipped or was disrupted, a penalty of 1 or 2 was applied, respectively, whereas if the suture broke without the knot being disrupted, no penalty was assigned.
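The scoring formula above can be sketched as a short function. This is an illustration only: the function name, the capping of completion time at 600 seconds, and the absence of a floor for negative scores are assumptions not stated explicitly in the text.

```python
def objective_score(task_time_s, accuracy_error, security_error, max_time_s=600):
    """Objective suturing score per the published formula.

    task_time_s:    completion time in seconds (assumed capped at max_time_s)
    accuracy_error: accuracy-error measure from the premarked targets
    security_error: knot-security penalty (0 = secure, 1 = slipped, 2 = disrupted)
    """
    task_time_s = min(task_time_s, max_time_s)
    # score = 600 - time - 10*accuracy penalty - 100*security penalty
    return max_time_s - task_time_s - 10 * accuracy_error - 100 * security_error
```

For example, a hypothetical trainee finishing in 120 seconds with one accuracy error and a secure knot would score 600 − 120 − 10 − 0 = 470, while a trainee who runs out the full 600 seconds scores 0 before error penalties.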
Participants of the three studies followed the same proficiency-based simulator curriculum in laparoscopic suturing and knot tying on the FLS suture model until they achieved a previously published expert-level performance score (512, with higher scores reflecting better performance) based on time and errors on two consecutive attempts, followed by 10 additional attempts.2 To assess skill transfer after training completion, all participants were tested on a live porcine Nissen fundoplication model, where they placed three gastrogastric sutures laparoscopically, and their suturing performance was assessed using the same formula as for the FLS suture model. Participants also completed the NASA-TLX workload assessment questionnaire at baseline, during training, and after the porcine test. In addition, inadvertent injuries to surrounding structures in the porcine model were recorded to assess safety.
The FLS training/testing occurred in the simulation laboratory in the presence of an instructor who was available to provide performance feedback, without any external “stressors.” In the OR, more people were involved, including the instructor, a laparoscopy fellow driving the camera, a faculty member rating the subject's performance, and the vivarium anesthesia staff. In addition, in the OR, participants contended with the anesthesia machine, the intubated pig, diaphragmatic motion, the limited working space of the porcine abdominal cavity, and slightly different port positioning compared with practice on the simulator.
For the purposes of this study, participant performance and NASA-TLX workload scores from each of these studies at baseline, at training completion, and during the porcine test were analyzed and compared using repeated measures analysis of variance. Power calculation revealed that our sample size was adequate to detect a 15% difference in the NASA mean scores among groups with a power of 0.83. To assess the relationship between performance and NASA scores, Pearson correlation was used. Data were reported as mean ± SD, and P < 0.05 was considered significant.
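The Pearson correlation used above relates two paired samples by normalizing their covariance. A minimal sketch of the computation follows; the data are fabricated for illustration only (decreasing workload paired with increasing performance) and do not reproduce the study's results.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # covariance numerator and the two standard-deviation terms
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Fabricated example: as workload ratings fall, performance scores rise,
# so r should be strongly negative.
workload = [86, 75, 70, 62, 55]
performance = [20, 180, 260, 400, 510]
```

In practice a statistics package (e.g., SciPy's `scipy.stats.pearsonr`) would also return the P value; the hand-rolled version above shows only the coefficient itself.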
The NASA-TLX Workload Assessment Questionnaire
For workload assessment, the NASA-TLX was used. This questionnaire uses a 20-point visual analog scale to measure mental workload along six subscales (Fig. 1). The assessment is achieved by asking respondents to rate the mental, physical, and temporal demands imposed by completion of the task, as well as the level of frustration they experienced, their satisfaction with their performance, and the overall effort required to complete the task. Mental and physical demands reflect the level of intellectual/perceptual and physical work required for completion of a task, respectively. The temporal demand provides a measure of time pressure during completion of the task. The effort component assesses the mental and physical work required to perform at a certain proficiency level. The frustration component evaluates the level of stress associated with completion of the task. The start and end points for the scales used to quantify each of these five components are low and high, respectively. The sixth component, performance, was developed to assess the degree of the trainee's satisfaction on completion of the task. The endpoints for the performance component are good and poor.24
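Combining the six subscale ratings into an overall score can be sketched as below. The text does not state whether the weighted (pairwise-comparison) or raw scoring variant was used, so this sketch shows the simpler raw-TLX average; the subscale names and the rescaling of the 20-step ratings to a 0–100 range are assumptions for illustration.

```python
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings):
    """Unweighted (raw) TLX: mean of the six subscale ratings.

    ratings: dict mapping each subscale name to a rating on the
    20-step scale (0-20), rescaled here to 0-100 as commonly reported.
    """
    missing = set(SUBSCALES) - set(ratings)
    if missing:
        raise ValueError(f"missing subscales: {sorted(missing)}")
    # multiply each 0-20 rating by 5 to place it on a 0-100 scale
    return sum(ratings[s] * 5 for s in SUBSCALES) / len(SUBSCALES)
```

The full NASA-TLX procedure additionally weights each subscale by the number of times the respondent selects it as the more important member of each of the 15 subscale pairs; the raw average above is a widely used simplification.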
No participant had prior experience with laparoscopy or with the FLS simulator. Participant demographics and baseline characteristics are listed in Table 1. Participants' performance score at baseline was 17 ± 45; only 5 of 28 participants completed the suturing and knot tying task on the FLS model within the allocated 10 minutes. After 8 ± 5 training sessions and 53 ± 12 repetitions, all participants achieved proficiency with a mean score of 523 ± 12 (P < 0.001 compared with baseline). However, participant performance deteriorated during the live porcine test in the OR with a mean score of 268 ± 132 (P < 0.001 compared with simulator proficiency and P < 0.001 compared with baseline performance; Fig. 2).
In contrast, NASA workload scores were high at baseline (86 ± 15), declined at proficiency after training (59 ± 20; P < 0.01 vs. baseline), and increased again in the OR (73 ± 25; P < 0.01 vs. baseline and vs. proficiency), mirroring the changes in performance scores (Fig. 2). Also, analysis of the six subscale scores demonstrated that each of the subscales, except temporal demand, showed the same trend during the training period as the total NASA workload score (Fig. 3). These findings were true across all studies without significant differences.
Overall subjective NASA-TLX workload scores were significantly and negatively correlated with the objective performance scores across studies (r = −0.5, P < 0.001). Furthermore, participants with higher workload scores caused more inadvertent injuries to adjacent structures (r = 0.38, P < 0.05). Of the six subscales of the NASA-TLX questionnaire, the mental and physical demands at baseline demonstrated correlations with workload (r = 0.52–0.82; P < 0.05), inadvertent injuries (r = 0.51; P < 0.01), and the suturing performance on the live porcine model (r = −0.35; P = 0.05 only for mental subscale).
Simulator training facilitates the acquisition of new skills in a low stress environment and permits the objective evaluation of performance. Numerous studies have confirmed the value of simulator-based skills training by demonstrating improved trainee performance in the OR.2,5,25
Nevertheless, available evidence suggests that this skill transfer is incomplete. Novice trainees who complete proficiency-based training on the simulator and achieve expert levels of performance are not able to demonstrate the same level of performance in the OR.2,5 This incomplete skill transfer could be caused by several factors, including trainee workload. In this study, we aimed to assess the subjective workload associated with laparoscopic performance using the NASA-TLX workload assessment tool. We found that participant workload had an inverse relationship with surgical performance. Trainees exhibited their worst performance at baseline when they also experienced the highest workload while learning a new and difficult task. Conversely, they achieved their best performance at the end of their proficiency-based training, and this was accompanied by their lowest workload scores. In this regard, the subjective workload scores corroborate the performance measures and provide a complementary picture of mental workload. In addition, we found that the transition to the operating room, a complex environment generally accepted to be more demanding and challenging than the simulation laboratory, resulted in a performance decrement. Moreover, this decline was accompanied by a concomitant increase in workload scores. Further, participants with higher workload scores in the OR caused more inadvertent injuries, indicating a possible detrimental impact of increased workload not only on performance but also on safety. Our findings, as shown in Figure 3, are congruent with those reported in the literature; higher physical and mental demands can cause more errors because of increased fatigue and loss of concentration.15,20 Weinger et al14 compared the workload of different types of airway management techniques and concluded that overall workload ratings were lower during the performance of the less complex techniques. 
In addition, when training on a new task, a learner's workload is expected to decrease with practice and may be a good indicator of a learner's increased comfort with the task.
Importantly, we found an association between increased mental and physical workload on the simulator at baseline and increased workload, inferior performance, and more injuries in the OR. This finding may have important implications and suggests that initial workload ratings obtained on simulators may predict performance levels during transfer to the clinical environment. This result suggests that initial subjective workload ratings might help educators identify students who need additional training. Furthermore, by identifying learners with higher workload, learner-specific interventions such as supplemental training or training in stress coping strategies may boost learner confidence, skill acquisition, and transfer. Moreover, we found the transition to the OR to be more demanding and challenging, resulting in a performance decrement.23 In particular, trainee NASA workload scores increased from 56 ± 22 on the simulator to 78 ± 27 (P < 0.01) in the OR, and their heart rate increased from 98 ± 14 to 115 ± 18 beats/min (P < 0.001). These are intriguing possibilities, but additional research regarding the predictive validity of the NASA-TLX for laparoscopic performance is clearly needed.
Our findings support the value of the NASA-TLX tool for workload assessment during the acquisition of surgical skill on simulators. The changes in workload and performance scores as described were similar across the three studies included in this report and across all individual participants. Young et al,18 in their review, highlighted the importance of developing a method for measuring workload to prevent errors and provide safe and cost-effective care and recommended the NASA-TLX tool as a sensitive assessment instrument.
Although we have no data in this study that allow us to compare the NASA-TLX scores between individuals with different experience levels (ie, experts vs. novices), we have shown that NASA-TLX scores decline after proficiency-based training and are sensitive to changes in experience level and performance of novices. We elected to include novices in our studies to control for the confounding effect of prior clinical experience on skill acquisition and workload. Nevertheless, we believe that the findings of our study would extend to more experienced individuals. In fact, in a previous study by our group that examined camera navigation abilities, we found that more experienced surgeons reported lower workload scores on the NASA-TLX compared with those with less experience.26
We cannot exclude the possibility that the student–faculty interaction may have impacted the perceived student workload during testing sessions. However, this interaction was minimal and is unlikely to have had a strong influence on students' workload. An instrument similar to the NASA-TLX tool that provides quantitative information about operator workload offers a complementary means for obtaining information about task demands during simulator training. It is relatively easy and quick to administer and provides an additional source of data that can corroborate more traditional performance metrics (eg, task completion time and errors). Although traditional performance metrics describe well the outcome of a task, they cannot give insight into the level of effort the performer invested to achieve that outcome. Two individuals may achieve the same performance outcome at significantly different levels of workload, and incorporating a subjective workload assessment tool such as the NASA questionnaire seems imperative for training.
In conclusion, the NASA-TLX workload assessment tool provides a valid measure of workload, task difficulty, and learner comfort with a task during simulator training in laparoscopic techniques and during transition to the OR. It should be incorporated as an additional metric during simulator training, because it provides performance information that is otherwise not available to the learner or the trainer.
1. Scott DJ, Dunnington GL. The new ACS/APDS Skills Curriculum: moving the learning curve out of the operating room. J Gastrointest Surg.
2. Korndorffer JR Jr, Dunne JB, Sierra R, Stefanidis D, Touchard CL, Scott DJ. Simulator training for laparoscopic suturing using performance goals translates to the operating room. J Am Coll Surg.
3. O'Donnell RD, Eggemeier FT. Workload assessment methodology. In: Boff KR, Kaufman L, Thomas JP, eds. Handbook of Perception and Human Performance. Vol. 2. Cognitive Processes and Performance. New York, NY: Wiley; 1986:42-1–42-49.
4. Szabo Z, Hunter J, Berci G, Sackier J, Cuschieri A. Analysis of surgical movements during suturing in laparoscopy. Endosc Surg Allied Technol.
5. Seymour NE, Gallagher AG, Roman SA, et al. Virtual reality training improves operating room performance: results of a randomized, double-blinded study. Ann Surg. 2002;236:458–463; discussion 463–464.
6. Society of American Gastrointestinal and Endoscopic Surgeons. Fundamentals of Laparoscopic Surgery (FLS). 2005. Available at: http://www.flsprogram.org. Accessed March 13, 2009.
7. Carswell CM, Clarke D, Seales WB. Assessing mental workload during laparoscopic surgery. Surg Innov.
8. Stefanidis D, Scerbo MW, Korndorffer JR Jr, Scott DJ. Redefining simulator proficiency using automaticity theory. Am J Surg.
9. Hart SG, Staveland LE. Development of NASA-TLX: results of empirical and theoretical research. In: Hancock PA, Meshkati N, eds. Human Mental Workload. Amsterdam: Elsevier; 1987.
10. Nygren TE. Psychometric properties of subjective workload measurement techniques: implications for their use in the assessment of perceived mental workload. Hum Factors.
11. Metzger U, Parasuraman R. Automation in future air traffic management: effects of decision aid reliability on controller performance and mental workload. Hum Factors.
12. Endsley MR, Kaber DB. Level of automation effects on performance, situation awareness and workload in a dynamic control task. Ergonomics.
13. Szalma JL, Warm JS, Matthews G, et al. Effects of sensory modality and task duration on performance, workload, and stress in sustained attention. Hum Factors.
14. Weinger MB, Vredenburgh AG, Schumann CM, et al. Quantitative description of the workload associated with airway management procedures. J Clin Anesth.
15. Weinger MB, Herndon OW, Zornow MH, Paulus MP, Gaba DM, Dallen LT. An objective methodology for task analysis and workload assessment in anesthesia providers. Anesthesiology.
16. Levin S, France DJ, Hemphill R, et al. Tracking workload in the emergency department. Hum Factors.
17. Institute of Medicine. To Err is Human: Building a Safer Health System. 2000. Available at: http://www.nap.edu/openbook.php?isbn=0309068371. Accessed May 18, 2009.
18. Young G, Zavelina L, Hooper V. Assessment of workload using NASA task load index in perianesthesia nursing. J Perianesth Nurs.
19. Klein MI, Riley MA, Warm JS, Matthews G. Perceived mental workload in an endoscopic surgery simulator. In: Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting. Santa Monica, CA: HFES; 2005:1014–1018.
20. Schuetz M, Gockel I, Beardi J, et al. Three different types of surgeon-specific stress reactions identified by laparoscopic simulation in a virtual scenario. Surg Endosc.
21. Munch-Petersen HR, Rosenberg J. [Physical and mental strain on the surgeon during minimally invasive surgery.] Ugeskr Laeger.
22. Stefanidis D, Korndorffer JR Jr, Markley S, Sierra R, Heniford BT, Scott DJ. Closing the gap in operative performance between novices and experts: does harder mean better for laparoscopic simulator training? J Am Coll Surg.
23. Prabhu A, Smith W, Yurko YY, Acker C, Stefanidis D. Increased stress levels may explain the incomplete transfer of simulator-acquired skill to the operating room. Surgery.
24. Derossis AM, Fried GM, Abrahamowicz M, Sigman HH, Barkun JS, Meakins JL. Development of a model for training and evaluation of laparoscopic skills. Am J Surg.
25. Peters JH, Fried GM, Swanstrom LL, et al. Development and validation of a comprehensive program of education and assessment of the basic fundamentals of laparoscopic surgery. Surgery.
26. Stefanidis D, Haluck R, Pham T, et al. Construct and face validity and task workload for laparoscopic camera navigation: virtual reality versus videotrainer systems at the SAGES Learning Center. Surg Endosc.