

Economics, Education, and Policy: Research Report

Intraoperative Noise Increases Perceived Task Load and Fatigue in Anesthesiology Residents

A Simulation-Based Study

McNeer, Richard R. MD, PhD*; Bennett, Christopher L. PhD; Dudaryk, Roman MD*

doi: 10.1213/ANE.0000000000001067


During the past 5 decades, the operating room (OR) has acquired the unenviable distinction of being one of the noisiest clinical environments, with sound pressure levels increasing by an average of 0.4 dB per year.1 Commonly performed surgeries, such as orthopedic procedures, can have sustained sound levels exceeding 100 dB for 40% of the time.2 These levels far exceed 2 benchmarks. The first is the limit of 45 dB (day-night average sound level) recommended by the Environmental Protection Agency to avoid annoyance and maintain 100% speech intelligibility.3 The second is the World Health Organization guideline of ≤55 dB (equivalent sound level over the 16 daytime hours).4 There has been increasing appreciation of the harmful effects of noise pollution on caregiver health, cognition, and performance.5,6

Two of the noisiest periods during surgery coincide with anesthetic induction and emergence.7 Less frequent but potentially catastrophic intraoperative emergencies, such as anaphylaxis, pneumothorax, and hemorrhage, would plausibly also be associated with elevated noise levels, although this has not been studied. Anesthesiologists may therefore be particularly susceptible to noise exposure, and 84% report that noise has a negative impact on their work.8 Recently, 2 laboratory studies reported a negative effect of noise on accuracy and response times for detecting audible changes in oxygen saturation.9 They also reported a negative effect on anesthesiology resident performance on mental efficiency and short-term memory tests.10

OR noise pollution is therefore a significant clinical problem affecting caregiver well-being and, by extension, patient safety. It is difficult, however, to investigate this problem rigorously in clinical trials. The real-world complexity of operating suites is hard to control, and testing interventions in actual patient-care settings raises concerns. An important study recently reported a lower incidence of postoperative complications after intraoperative noise levels were decreased. However, that study was neither randomized nor blinded and was conducted in a pediatric cardiac OR, a specialized perioperative environment.11

Medical simulators offer a safer and more controlled venue for performing randomized controlled studies investigating the noise problem and the effect of interventions, but the degree to which findings are extrapolatable to the clinical arena depends on simulation realism. The clinical auditory environment (soundscape) is an important simulation component with respect to clinical noise. However, current OR simulators lack the native capability to simulate realistic clinical soundscapes beyond rendering the pulse oximeter auditory display and annunciating medical device alarm sounds.

Recently, we sought to address this gap by retrofitting our fully functional replica of an OR with a high-fidelity audio reproduction system, adding immersive auditory realism to the simulation experience. We refer to our simulator as NOISE (Noisy OR Immersive Simulation Environment). In a separate study, we investigated the acoustical environments in NOISE and several ORs at our institution. NOISE has a shorter reverberation time than the ORs (565 vs 700 milliseconds), which benefits intelligibility. This is likely because its total room volume is smaller and its construction materials differ. However, the equivalent continuous noise levels are similar in NOISE (76.5 dB) and in a typical OR at our institution (76.0 dB).11a

The primary objective of the current study was to perform randomized and controlled simulation experiments in our NOISE to test the hypothesis that OR noise increases perceived task load and fatigue, which are contributors to workplace stress. A secondary objective of this study was to propose and test the plausibility of a new psychometric model that combines psychometric indicators of task load and fatigue into an instrument for measuring perceived stress. Development of new techniques for measuring the psychological variables in an experimental setting will help increase our basic understanding of the underlying psychological constructs at the interface between environment and caregiver and will augment the effort to characterize and mitigate the harmful effects of clinical noise.


This study was approved by IRBs at the University of Miami-Miller School of Medicine and the Jackson Health System. Written informed consent was obtained from all subjects. This study was funded by the Anesthesia Patient Safety Foundation. The funder had no role in the study design, study conduct, or writing of the manuscript.

The NOISE Simulation Setup

Our OR simulator at the University of Miami-Jackson Memorial Hospital Center for Patient Safety contains a METI human patient simulator (Medical Education Technologies, Inc., Sarasota, FL), an anesthesia workstation (Datex-Ohmeda, GE Healthcare, Little Chalfont, UK), and associated medical alarm equipment. For this study, we installed 4 corner speakers powered by an audio interface (MOTU Traveler, MOTU, Cambridge, MA). Quadraphonic soundscapes (Supplemental Digital Content 1, Video 1) were then composed from recordings of typical sounds obtained in our clinical ORs (e.g., telephone ringing, suctioning, door closing, shoe skidding, and stepstool and instrument clanging). The desired effect was a sound field in which subjects would perceive discrete sound sources as coming from distinct areas of the simulator room, similar to where those sounds usually originate in our clinical ORs (Fig. 1). Two 30-minute soundscapes were composed in this fashion with open-source audio editing software.

Figure 1:
Room layout of the NOISE (Noisy OR Immersive Simulation Environment) simulator showing the locations of speakers, anesthesia workstation, and simulated patient. The subject sits at the head of the operating room bed facing the workstation. From this location, the subject hears representative operating room sounds emanating from various locations in the room.

A custom multimedia graphical user interface (GUI) was designed with MATLAB® R2010a (The MathWorks, Inc., Natick, MA). It ran on a PC laptop (Dell® Inspiron) placed inconspicuously on top of the anesthesia workstation (Supplemental Digital Content 1 and 2, Videos 1 and 2). The GUI served 3 functions.

First, it displayed simulated patient vital signs and ventilator variables on a 15-inch liquid crystal display (LCD) monitor connected to the laptop and readily visible to subjects. The GUI continuously updated these screen variables by reading XML files that encoded a custom 30-minute simulation script.

Second, the GUI rendered the pulse oximeter auditory display and any triggered audible alarms (based on typically used alarm thresholds) through the laptop speaker, which has specifications similar to those of the workstation speaker. The audible alarms were designed to comply with the International Electrotechnical Commission standard for medical audible alarm sounds (IEC 60601-1-8).

Third, the GUI had a text input/logging feature. Using a standard keyboard and mouse, subjects could enter responses relevant to simulated patient status and answers to the distractor task questions. The distractor task consisted of 100 questions, which were variations of 20 distinct questions related to the practice of anesthesiology. The questions were menial and tedious and usually required simple calculations to arrive at the answer (Appendix 1).
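The text does not describe the layout of the XML script files or the exact alarm limits. The minimal Python sketch below (not the authors' MATLAB code) shows one way a script-driven display could work. A hypothetical XML format with one frame per time step is parsed, and each frame's values are checked against hypothetical low/high alarm limits. All tag names, values, and limits are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical script format (assumption): one <frame> per time step,
# each holding the variables shown on the monitor at that moment.
SCRIPT = """
<script>
  <frame t="0"><hr>72</hr><spo2>99</spo2><etco2>36</etco2></frame>
  <frame t="1"><hr>118</hr><spo2>97</spo2><etco2>33</etco2></frame>
  <frame t="2"><hr>131</hr><spo2>88</spo2><etco2>28</etco2></frame>
</script>
"""

# Hypothetical (low, high) alarm limits; not the thresholds used in the study.
ALARM_LIMITS = {"hr": (50, 120), "spo2": (90, 100), "etco2": (25, 50)}


def load_frames(xml_text):
    """Parse the script into a list of (time, {variable: value}) tuples."""
    root = ET.fromstring(xml_text.strip())
    frames = []
    for frame in root.iter("frame"):
        values = {child.tag: float(child.text) for child in frame}
        frames.append((int(frame.get("t")), values))
    return frames


def triggered_alarms(values, limits=ALARM_LIMITS):
    """Return the names of variables that fall outside their alarm limits."""
    alarms = []
    for name, (low, high) in limits.items():
        value = values.get(name)
        if value is not None and not (low <= value <= high):
            alarms.append(name)
    return alarms


if __name__ == "__main__":
    # A real GUI would redraw the monitor and sound alarms here on a timer.
    for t, values in load_frames(SCRIPT):
        print(t, values, triggered_alarms(values))
```

Keeping the scenario in a data file rather than in code lets the same display logic replay different scripts, such as the uneventful and eventful lunch breaks.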

Although our custom GUI logged text entry content and response times relating to simulated patient care and the answering of distractor questions, these variables were not treated as dependent variables under the null hypothesis of the current experiment. We currently are using these preliminary data to guide the development of psychometric instruments for assessing performance in screen-based simulation (Richard R. McNeer, MD, PhD, Roman Dudaryk, MD, Nicholas B. Nedeff, MD, Christopher L. Bennett, MD, unpublished data, 2015).

The simulated clinical soundscapes combined the quadraphonic soundscapes with the GUI sonification of the script-responsive pulse oximeter display and alarm sounds. Specifically, the Noise condition combined GUI output with an accompanying quadraphonic soundscape played through the 4 corner speakers. In the Quiet condition, the pulse oximeter and alarm sounds were rendered without an accompanying soundscape. Sound levels were measured in front of the anesthesia workstation at approximately seated head level. The levels (equivalent [peak]) were 76.5 (93.0) dB for the Noise condition and 72 (84) dB for the Quiet condition. These levels are comparable with sound levels present in our ORs.11a Because the decibel scale is logarithmic, the differences of 4.5 dB (equivalent) and 9 dB (peak) correspond to roughly 3 times and 8 times the acoustic intensity in the Noise soundscape relative to the Quiet soundscape.
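The decibel arithmetic in the paragraph above can be checked with a short sketch. It converts a level difference ΔL (in dB) into an intensity ratio using 10^(ΔL/10). The helper name is ours. The result is a ratio of acoustic intensity, not of perceived loudness. By a common rule of thumb, perceived loudness only doubles for about every 10 dB increase.

```python
def intensity_ratio(level_a_db, level_b_db):
    """Ratio of acoustic intensity (power) implied by two levels in dB."""
    return 10 ** ((level_a_db - level_b_db) / 10)


# Levels reported for the Noise and Quiet conditions
leq_ratio = intensity_ratio(76.5, 72.0)   # 4.5 dB difference in equivalent level
peak_ratio = intensity_ratio(93.0, 84.0)  # 9 dB difference in peak level

if __name__ == "__main__":
    print(f"Leq ratio: {leq_ratio:.2f}, peak ratio: {peak_ratio:.2f}")
```

Rounded to the nearest integer, these ratios give the factors of 3 and 8 quoted in the text.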

Experimental Procedure

The experiment was designed to investigate the impact of noise on subject perception of task load and fatigue while subjects cared for a simulated patient and simultaneously answered a set of distractor questions in a simulated OR. There were 2 sessions, spaced approximately 1 week apart (Fig. 2). On the morning of the first scheduled session, each subject was given a sheet with an example of the 20 types of questions to be used as a distractor task (Appendix 1). The purpose of the distractor task was to decrease the likelihood that subject attention would be directed solely and continuously to the GUI. The subject was given enough time to determine that he or she knew how to answer each question type. Questions that could not be answered were explained to the subject by the investigator.

Figure 2:
Schematic representations showing experimental design. A, General layout of experimental sessions. Subjects first completed the Perceived Stress Scale (PSS) instrument, which has been validated to measure baseline stress levels. Subjects then relaxed in a quiet room for 15 minutes before giving 2 simulated lunch breaks. The first lunch break was uneventful, with vital signs remaining normal, whereas the second lunch break contained 3 intraoperative crises. After completing the second lunch break, the subjects completed the NASA Task Load Index (NASA-TLX) and Swedish Occupational Fatigue Inventory (SOFI) instruments. Sessions lasted between 2 and 2.5 hours. B, Schematic representation showing the experimental design. At the start of session 1, subjects were randomized into either “Quiet” or “Noise” groups. This “Soundscape” condition applied to both lunch breaks. About a week later, the subjects were crossed over to the opposite Soundscape condition for session 2. The Soundscape grouping represented a within-subjects factor. Because subjects were exposed either to Quiet or to Noise in the first session, a between-subjects factor was also inherent to the experimental design; it is referred to as “Order.”

Each session comprised 2 consecutive simulated lunch breaks (Fig. 2A). The first lunch break followed an uneventful script in which only minor fluctuations in vital signs and machine variables occurred. The second followed an eventful script with 3 intraoperative emergency scenarios. The sequence of sessions, and of the lunch breaks within them, was the same for all subjects. On the day of the first session, subjects were randomized into 1 of the 2 Soundscape groups (Fig. 2B). Group 1 experienced the quiet condition during the first session and was then crossed over to the noise condition during the second session 1 week later. Group 2 was exposed to the Soundscape conditions in the reverse order. Both sessions followed the same process flow.
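The randomization procedure itself is not specified beyond producing 2 groups of 10. The sketch below is a hypothetical helper, assuming a simple balanced allocation. It shuffles the subject list, splits it in half, and assigns each half one of the 2 crossover sequences.

```python
import random


def counterbalanced_assignment(subject_ids, seed=None):
    """Randomly split subjects into two equal Order groups and return each
    subject's (session 1, session 2) Soundscape sequence.

    Hypothetical helper: assumes a balanced 1:1 allocation."""
    ids = list(subject_ids)
    if len(ids) % 2:
        raise ValueError("an even number of subjects is needed for equal groups")
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    schedule = {}
    for sid in ids[:half]:
        schedule[sid] = ("Quiet", "Noise")  # Quiet First group
    for sid in ids[half:]:
        schedule[sid] = ("Noise", "Quiet")  # Noise First group
    return schedule


schedule = counterbalanced_assignment(range(1, 21), seed=2015)
```

Every subject receives both levels of the within-subjects factor (Soundscape). The first-session condition defines the between-subjects factor (Order).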

Sessions began between 12 PM and 1 PM. Subjects were required to have eaten lunch, to not have worked the previous night, and to be on a rotation involving active clinical duty. At the start of each session, subjects completed the 14-item Perceived Stress Scale (PSS)12 (0–56) so that their baseline stress level could be assessed. Each subject was also instructed to use the restroom if needed before the start of the session. To gather preliminary data for another study, a portable, wireless, 4-lead biosensor was attached to the subject to record the electrocardiogram. The electrocardiogram was recorded throughout all phases of the simulation experiments to obtain preliminary physiologic data not related to the null hypothesis being tested. We are using these data to develop novel methodology for measuring physiologic responses, including heart rate variability, and the results will be presented in a future manuscript.13,14

A 15-minute rest period followed, during which the subject was instructed to sit comfortably in a quiet room and relax by clearing the mind without falling asleep. Next, the subject was brought to the simulated OR and asked to sit at the head of the bed as shown in Figure 1. The subject then adjusted the seat height so that the GUI display on the anesthesia workstation could be viewed comfortably. He or she was asked to become familiar with the GUI layout and the location of the text entry box. The subject was told that he or she would act as the lunch relief, giving 2 simulated lunch breaks, and that 2 tasks were to be accomplished. First, the questions (supplied on a handout) needed to be answered and entered into the GUI. Second, the simulated patient was to be monitored for any changes in vital signs or ventilator variables. Whenever a patient-care problem was detected, the subject was to document it via the text entry box, generate a differential diagnosis, and formulate an action or plan for intervention or therapy. Subjects entered the information pertaining to detection (e.g., tachycardia, decreased end-tidal CO2), differential (e.g., hypovolemia), and intervention (e.g., give phenylephrine) into the GUI. They pressed the return key after each thought or item. For instance, if the differential consisted of 2 items, each was entered separately.

Instructions were reiterated that both completion of the anesthesia-related questions and monitoring/care of the patient needed to be accomplished. If a resident asked which task was more important, he or she was told to take care of the patient and answer the questions. At the start of a lunch break, the sign-out was brief, and the subject was told that “this is an ASA physical status I 20-year-old man who came from home for an elective left inguinal hernia repair under general anesthesia. He has no allergies and good IV access. We are currently in the maintenance period, and everything has been going fine.” The simulation was then started for the first lunch break, which lasted 30 minutes. The second lunch break immediately followed, with a similar sign-out except that the procedure was a right inguinal hernia repair. At the conclusion of the second lunch break, 2 instruments were administered: the NASA Task Load Index (NASA-TLX)15 for assessing perceived task load and the Swedish Occupational Fatigue Inventory (SOFI)16 for assessing fatigue. These validated psychometric instruments are detailed in the next section.

Study Design and Statistical Methods

The experiments followed a repeated-measures counterbalanced (mixed) design (Fig. 2). The Soundscape condition had 2 levels (Quiet and Noise) and was treated as the within-subjects factor because each subject was exposed to both levels. Each subject was randomly assigned to either the Quiet or the Noise condition for the first session. The subject was then crossed over to the other Soundscape condition for the second session approximately 1 week later. The order of exposure (Order) was therefore treated as the between-subjects factor. The dependent variables were the subject responses on the NASA-TLX and SOFI instruments. Subjects were instructed to complete the instruments separately, the NASA-TLX instrument first and then the SOFI instrument. In addition, subjects were instructed not to discuss their instrument responses or any other experimental details with other participants.

The NASA-TLX instrument (Appendix 2) is composed of 6 items (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration), which subjects respond to on a continuous scale from 0 (very low) to 100 (very high). These items are then individually weighted and combined into an index (total score) that also ranges from 0 to 100. The SOFI instrument (Appendix 3) is composed of 5 items (Lack of Energy, Lack of Motivation, Physical Exertion, Physical Discomfort, and Sleepiness). Subjects select responses from a 7-point Likert scale (0 = not at all and 6 = to a very high degree). Therefore, there were a total of 12 dependent variables: NASA-TLX total score; 6 NASA-TLX items; 5 SOFI items.
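The weighting procedure is not spelled out above. The sketch below assumes the conventional weighted NASA-TLX scoring. In that scheme, each of the 15 pairwise comparisons among the 6 dimensions awards 1 point to the dimension the respondent selects. Each dimension's weight (0–5) is its point count, and the total score is the weighted sum of the ratings divided by 15. The example choices and ratings are hypothetical.

```python
from itertools import combinations

DIMENSIONS = ["Mental Demand", "Physical Demand", "Temporal Demand",
              "Performance", "Effort", "Frustration"]


def tlx_weights(choices):
    """choices maps each (dim_a, dim_b) pair, in the order produced by
    combinations(DIMENSIONS, 2), to the dimension picked for that pair."""
    weights = {d: 0 for d in DIMENSIONS}
    for pair in combinations(DIMENSIONS, 2):
        chosen = choices[pair]
        if chosen not in pair:
            raise ValueError(f"{chosen!r} is not one of {pair}")
        weights[chosen] += 1
    return weights  # the weights always sum to 15


def tlx_total(ratings, weights):
    """Weighted NASA-TLX total on the same 0-100 scale as the item ratings."""
    if sum(weights.values()) != 15:
        raise ValueError("weights must come from all 15 pairwise comparisons")
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15


# Hypothetical respondent who always picks the first dimension of each pair
example_weights = tlx_weights({p: p[0] for p in combinations(DIMENSIONS, 2)})
example_ratings = dict(zip(DIMENSIONS, [80, 10, 70, 40, 60, 50]))
example_total = tlx_total(example_ratings, example_weights)
```

Because the weights sum to 15, the total stays on the 0–100 scale of the individual items. An unweighted ("raw") mean of the 6 ratings is a commonly used alternative.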

Power Analysis

The required sample size for this study was calculated with an online power and sample size calculator for general linear multivariate models (GLIMMPSE v2.1.0).17 The set of parameters used for this study is available as a JavaScript Object Notation (JSON) file, which can be uploaded to the GLIMMPSE Web site and easily reviewed. In summary, the model chosen for power analysis was a repeated-measures design. However, because we specified the main effect as Soundscape condition (within-subjects) and the predictor as Order (between-subjects), the model is by definition a mixed design. There were 12 response variables, and we anticipated adjusting the significance level for multiple pairwise comparisons. The type I error rate was therefore entered as 0.004 (Bonferroni correction for 12 comparisons [0.05/12]).
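The Bonferroni adjustment divides the familywise α by the number of comparisons. The short sketch below (the helper name is ours) shows that 0.05/12 is approximately 0.00417, which the study rounded to 0.004. Rounding down makes the threshold slightly more conservative.

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return family_alpha / n_comparisons


alpha = bonferroni_alpha(0.05, 12)

if __name__ == "__main__":
    print(f"{alpha:.5f}")  # prints 0.00417
```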

We estimated response means and effect sizes from literature reports for the NASA-TLX18 and SOFI.16,19 The effect of Soundscape was predicted to be 15% of the predicted mean. We also predicted an effect of session, independent of the Soundscape main effect. Because of anticipatory cognitive appraisal,20,21 subjects would perceive greater stress during the first session than during the second; we estimated this effect to be 5% of the predicted means. We estimated that the within-subject correlation would be 0.6 with a decay rate of 0.05. Within-instrument item correlations were estimated to be 0.7 for both the NASA-TLX and the SOFI instruments. The between-instrument item correlation (i.e., between the NASA-TLX items and the SOFI items) was estimated to be 0.3. With these input parameters and a desired power of 0.90, the calculation indicated that 18 subjects would be required (actual power of 0.935). We decided to enroll 20 subjects to allow for 1 or 2 subject dropouts or exclusions.
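As an illustration only, the sketch below assembles the assumed correlations among the 12 responses into a 12 × 12 matrix. It uses 0.7 within an instrument and 0.3 between instruments, then checks that the matrix is positive definite, as any valid correlation matrix must be. Grouping the NASA-TLX total score with the 6 NASA-TLX items is our assumption. The sketch does not reproduce GLIMMPSE's input format or its within-subject (0.6, decay 0.05) structure.

```python
import numpy as np

N_TLX = 7   # 6 NASA-TLX items + NASA-TLX total score (grouping is an assumption)
N_SOFI = 5  # 5 SOFI items


def response_correlation(n_tlx=N_TLX, n_sofi=N_SOFI, r_within=0.7, r_between=0.3):
    """Block correlation matrix: r_within inside each instrument,
    r_between across instruments, ones on the diagonal."""
    n = n_tlx + n_sofi
    R = np.full((n, n), r_between)
    R[:n_tlx, :n_tlx] = r_within
    R[n_tlx:, n_tlx:] = r_within
    np.fill_diagonal(R, 1.0)
    return R


R = response_correlation()
eigenvalues = np.linalg.eigvalsh(R)  # all positive => positive definite

if __name__ == "__main__":
    print(R.shape, eigenvalues.min())
```

If a proposed set of correlations is not positive definite, a power calculator cannot use it. This simple check is therefore worth running before entering values.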

Hypothesis Testing and Strategy to Reduce the Risk of Type I Error

All statistical analyses were performed using the IBM SPSS software suite (version 22, IBM Corp., Armonk, NY). Normality of subject responses to instrument items was assessed in several ways: the Shapiro-Wilk test, the degree of skewness and kurtosis, and inspection of histograms and of normal Q-Q and detrended normal Q-Q plots. The Levene test for homogeneity of variance was used to test the assumption that response variance was similar across experimental conditions. For hypothesis testing, a general linear model in the form of a mixed-design (split-plot) analysis was used. It determined the main effect of Soundscape condition and any interaction between Soundscape and Order (i.e., Soundscape × Order). The significance level was lowered (α = 0.004) because of multiple pairwise comparisons. In addition, a multivariate analysis of variance (MANOVA) was performed on the 12 response variables. When the MANOVA is significant (P < 0.05), the likelihood of a type I error from multiple comparisons is considered to be decreased.22
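With only 2 levels of the within-subjects factor, the split-plot analysis reduces to an analysis of the Noise minus Quiet difference scores. The Soundscape main effect tests whether the average of the 2 Order groups' mean differences is zero. The Soundscape × Order interaction tests whether the 2 groups' mean differences differ. The sketch below is a hypothetical helper, not the authors' SPSS procedure. It assumes exactly 2 Order groups and Type III (unweighted-means) tests, which we believe are SPSS's default. It returns F statistics and partial η², which may differ from the η² defined in Appendix 4.

```python
import numpy as np


def _partial_eta2(f_value, df_error):
    # Partial eta-squared for an effect with 1 numerator degree of freedom
    return f_value / (f_value + df_error)


def crossover_tests(quiet, noise, order):
    """Soundscape main effect and Soundscape x Order interaction for a
    2 x 2 crossover, computed from within-subject difference scores."""
    quiet = np.asarray(quiet, dtype=float)
    noise = np.asarray(noise, dtype=float)
    order = np.asarray(order)
    d = noise - quiet  # Noise minus Quiet, one value per subject
    labels = np.unique(order)
    if labels.size != 2:
        raise ValueError("expected exactly two Order groups")
    d1, d2 = d[order == labels[0]], d[order == labels[1]]
    n1, n2 = d1.size, d2.size
    df_error = n1 + n2 - 2
    # Pooled within-group variance of the difference scores
    ss_within = ((d1 - d1.mean()) ** 2).sum() + ((d2 - d2.mean()) ** 2).sum()
    mse = ss_within / df_error
    inv_n = 1.0 / n1 + 1.0 / n2
    # Main effect: unweighted mean of the two group mean differences
    grand = (d1.mean() + d2.mean()) / 2
    f_sound = grand ** 2 / (mse * inv_n / 4)
    # Interaction: difference between the two group mean differences
    f_inter = (d1.mean() - d2.mean()) ** 2 / (mse * inv_n)
    return {"F_soundscape": f_sound, "F_interaction": f_inter,
            "df": (1, df_error),
            "partial_eta2_soundscape": _partial_eta2(f_sound, df_error),
            "partial_eta2_interaction": _partial_eta2(f_inter, df_error)}


if __name__ == "__main__":
    # Toy numbers for illustration only (not study data)
    print(crossover_tests([10, 10, 10, 20, 20, 20], [11, 12, 13, 23, 24, 25],
                          ["QF"] * 3 + ["NF"] * 3))
```

P values can then be obtained from the F(1, N − 2) distribution, for example with `scipy.stats.f.sf`.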

In addition to calculating P values for the univariate comparisons, point estimation was performed, and 95% confidence intervals (CIs) are reported. Effect size is reported as the absolute mean difference along each respective psychometric instrument scale. Standardized effect size is also reported using eta-squared (η²),23 which ranges from 0 to 1. It can be categorized into small (0.02–0.13), medium (0.13–0.26), and large (>0.26) effects.24 Two additional effect size parameters (partial η² and generalized η²) are reported in the Supplemental Digital Content (refer to the Results section). The formulas used to calculate the standardized effect size parameters are listed in Appendix 4.
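The η² categories quoted above share their boundary values (0.13 and 0.26). The small helper below applies them. Placing each shared boundary in the higher category, except 0.26 (since "large" is stated as >0.26), is our assumption. So is the "negligible" label for values below 0.02.

```python
def classify_eta_squared(eta2):
    """Categorize eta-squared with the bins given in the text:
    small 0.02-0.13, medium 0.13-0.26, large >0.26.
    Boundary handling and the 'negligible' label are assumptions."""
    if not 0.0 <= eta2 <= 1.0:
        raise ValueError("eta-squared must lie in [0, 1]")
    if eta2 > 0.26:
        return "large"
    if eta2 >= 0.13:
        return "medium"
    if eta2 >= 0.02:
        return "small"
    return "negligible"
```

For example, the η² values reported later for the NASA-TLX total score (0.36) and Performance (0.07) fall into the large and small categories, respectively.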

Presentation of Psychometric Data

Figure 3:
Data presentation format used in this manuscript for showing effects of “Soundscape” and “Order” on subject responses to psychometric instruments. The figure shows the results for NASA Task Load Index (NASA-TLX) total score. A, A parallel plot is used to show within-subject responses based on Soundscape grouping. The mean and 95% confidence intervals (CIs) are displayed for “Quiet” and “Noise” conditions. To the right of the parallel plot is a floating axis anchored by the Quiet condition mean. Mean difference (Noise minus Quiet) and 95% CI are shown on the floating axis. The NASA-TLX total score was larger in Noise than in Quiet. B, An interaction plot shows the effect of Order grouping on the Soundscape main effect. An effect of the Soundscape × Order interaction is suggested when the lines connecting the Quiet and Noise means have slopes that are different for “Quiet First” and “Noise First” groups. In this case, the effect of Soundscape on NASA-TLX total score was greater in the group exposed to Noise in the first session relative to the group exposed to Quiet in the first session. C, The within-subjects mean differences based on Soundscape are plotted with 95% CIs for the Quiet First and Noise First groups. When considered with the interaction plot from (B), an interaction between Soundscape and Order is observed but does not reach significance (Table 1).

Figure 3 illustrates the format used in this manuscript for presenting all 11 psychometric instrument item responses and the NASA-TLX total score. The within-subjects Soundscape effect is depicted by the use of parallel plots of individual subject responses with point estimates for means and 95% CIs (Fig. 3A). To the right of each parallel plot is a floating axis (green in color), which shows the mean differences and 95% CIs. Interaction plots (Fig. 3B) were used to show Soundscape × Order interactions, and point estimates of the mean differences from the interaction plots are presented in a separate plot (Fig. 3C).

Development and Feasibility Testing of a Psychometric Model of Stress

All factor analyses were performed using the SPSS software suite (IBM). Internal consistency was evaluated with the Cronbach α to determine whether the NASA-TLX and SOFI instruments measured their respective latent constructs (i.e., task load and fatigue). To test the plausibility of the proposed psychometric model, a partial confirmatory factor analysis (pCFA) was performed.25

Traditionally, psychometric models are proposed based on empirical evidence from exploratory (unrestricted) factor analysis (EFA), in which the indicators (e.g., NASA-TLX and SOFI items) load freely onto the extracted factors. Subsequent model confirmation is accomplished by performing a confirmatory (restricted) factor analysis (CFA) on newly acquired data. In contrast to an EFA, a CFA constrains the indicators to have zero loadings (i.e., not to load) on some of the latent variables (formerly, the factors extracted in the EFA). A figure highlighting these points can be found in the Supplemental Digital Content (Supplemental Digital Content 4).

A pCFA has been suggested as an intermediate step between EFA and CFA on the pathway to model confirmation.25 Although not confirmatory by itself, a pCFA can supply useful information about model fit. It can also help the researcher gauge the likelihood that a future CFA will succeed. For this study, factors were extracted with conventional data reduction techniques that rely partly on factor eigenvalues and partly on parallel analysis.26,27
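The Cronbach α used above follows the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The sketch below computes it with NumPy. It also computes the "α if item deleted" statistic that underlies the item-total tables, using hypothetical helper names. It is an illustration, not the SPSS RELIABILITY procedure.

```python
import numpy as np


def cronbach_alpha(items):
    """items: (n_subjects, k_items) array of item responses."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    sum_item_var = x.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)       # variance of the summed score
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    x = np.asarray(items, dtype=float)
    return [cronbach_alpha(np.delete(x, j, axis=1)) for j in range(x.shape[1])]
```

An item whose deletion raises α is measuring something partly different from the other items. This is the pattern reported for the NASA-TLX Performance item and for the 2 physical SOFI items.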

Parallel analysis was performed using an SPSS syntax script available from an online source. Several indices were calculated to evaluate model fit as part of the pCFA25 and are reported. These were the Bentler-Bonett Normed Fit Index, the Tucker-Lewis Index, the Bentler Comparative Fit Index, the root mean square error of approximation, and the standardized root mean square residual. In addition, the nonsalient loadings of the pattern matrix were used to construct a nonsalient loading distribution. Normality of this distribution is another indicator of model fit. It was assessed by reviewing histograms, normal Q-Q plots, and detrended normal plots and by performing a Shapiro-Wilk test in SPSS.
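The SPSS syntax used for parallel analysis is not reproduced here. The NumPy sketch below implements Horn's parallel analysis under stated assumptions. It compares the eigenvalues of the observed item correlation matrix (a principal-components variant) with the 95th percentile of eigenvalues from normally distributed random data of the same size. A factor is retained while its observed eigenvalue exceeds the random benchmark. For comparison, the sketch also counts eigenvalues >1 (the Kaiser criterion). The demonstration data are simulated, not study data.

```python
import numpy as np


def _sorted_eigenvalues(x):
    # Eigenvalues of the item correlation matrix, largest first
    return np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]


def parallel_analysis(data, n_iter=500, percentile=95, seed=0):
    """Return (factors retained by parallel analysis, eigenvalues > 1,
    observed eigenvalues, random-data thresholds)."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    observed = _sorted_eigenvalues(x)
    rng = np.random.default_rng(seed)
    random_eigs = np.array([_sorted_eigenvalues(rng.standard_normal((n, p)))
                            for _ in range(n_iter)])
    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Count leading eigenvalues that beat the random benchmark
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    n_kaiser = int((observed > 1.0).sum())
    return n_factors, n_kaiser, observed, threshold


# Simulated one-factor data: 200 respondents, 6 items (illustration only)
_rng = np.random.default_rng(1)
_factor = _rng.standard_normal(200)
demo = 0.9 * _factor[:, None] + 0.45 * _rng.standard_normal((200, 6))
demo_factors, demo_kaiser, _, _ = parallel_analysis(demo)
```

With this one-factor example, both rules retain a single factor. With small samples, the 2 rules can disagree. Parallel analysis is generally used to guard against retaining factors that chance alone could produce.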


Subject Randomization and Demographics

Twenty Clinical Anesthesia year 1 (CA-1) residents (12 male/8 female) participated in the study. There was an equal number of subjects in the quiet first (n = 10; 5 male/5 female) and noise first (n = 10; 7 male/3 female) groups. NASA-TLX and SOFI data for 1 female subject were excluded from the data set because of irregularities observed while she completed the subjective instruments. All subjects were instructed to complete the NASA-TLX instrument before starting the SOFI instrument; however, this subject was observed cross-referencing her responses on the 2 instruments.

Psychometric Instrument Results

Baseline perceived stress was assessed with the 14-item PSS before each session. As measured by the PSS, there was no statistically significant difference in baseline stress levels between the start of session 1 (33.00 [0.69]) and the start of session 2 (32.45 [0.40]).

The NASA-TLX items, NASA-TLX total score, and the SOFI items were all approximately normally distributed and satisfied the Levene test for homogeneity of variance. The MANOVA of the 12-item set (6 NASA-TLX items, NASA-TLX total score, and 5 SOFI items) reached significance (P = 0.003). This suggests that the subsequent univariate analyses of variance of individual items are less susceptible to type I error.22

The NASA-TLX total score was greater in noise than in quiet (Fig. 3A; Table 1) on a scale from 0 to 100 by a mean difference of 13.3 (SE = 4.0, P = 0.004). The standardized effect size η² was 0.36, indicating that 36% of the variance in the NASA-TLX total score was attributable to Soundscape grouping. A Soundscape × Order interaction was observed, suggesting that the effect of Soundscape was larger in the subjects who were exposed to the Noise condition first (Fig. 3, B and C); however, this interaction did not reach significance (P = 0.131).

Table 1:
NASA Task Load Index Within-Subjects Soundscape Effect

Subject responses to the 5 NASA-TLX items were greater in Noise than in Quiet (Table 1). Temporal Demand reached the criterion for significance after Bonferroni adjustment of the significance level to 0.004. The largest effect size was observed for Temporal Demand and the smallest for Performance (η² = 0.53 and 0.07, respectively). Except for the Physical Demand item, mean differences based on Soundscape were larger for subjects in the Noise First group. The largest Soundscape × Order interaction was observed for the Mental Demand item (P = 0.045; Fig. 4, A and B; Table 1). Parallel plots and additional calculations of standardized effect size parameters for the NASA-TLX items can be found in the Supplemental Digital Content (Supplemental Digital Content 5 and 6).

Figure 4:
Interaction effects of “Soundscape” and “Order” factors on the NASA Task Load Index (NASA-TLX) responses. A and B, The most significant “Soundscape × Order” interaction was observed for the “Mental Demand” item (P = 0.045). An interaction effect was not observed for the “Frustration” item (P = 0.891). The small interaction observed for the “Physical Demand” was reversed (P = 0.391) relative to the other items and the NASA-TLX total score.
Table 2:
Swedish Occupational Fatigue Inventory Within-Subjects Soundscape Effect
Figure 5:
Interaction effects of “Soundscape” and “Order” factors on Swedish Occupational Fatigue Inventory instrument responses. A and B, There was no clear pattern associated with the Soundscape × Order interaction.

Of the 5 SOFI items, Lack of Energy, Lack of Motivation, and Sleepiness showed an effect of Soundscape, with subjects reporting greater levels in Noise than in Quiet (Table 2). Of these, Lack of Energy reached significance (P = 0.001, η² = 0.467). Modest Soundscape × Order interactions were observed but did not reach significance (Fig. 5, A and B; Table 2). Parallel plots and additional calculations of standardized effect size parameters for the SOFI items can be found in the Supplemental Digital Content (Supplemental Digital Content 7 and 8).

Proposed Psychometric Model of Stress

The Cronbach α computed for the 6-item NASA-TLX indicated good internal consistency (α = 0.766). This supports the assumption that the NASA-TLX items measured the same construct (i.e., Task Load). Internal consistency increases substantially when the Performance item is removed from the analysis (Table 3). The 5-item SOFI instrument also had good internal consistency (α = 0.768), which supports the assumption that the SOFI items measured the same construct (i.e., Fatigue). Internal consistency increases if either the Physical Exertion item or the Physical Discomfort item is removed from the analysis (Table 4). When both are removed, the Cronbach α increases to 0.848.

Table 3:
Cronbach α Item-Total Statistics for NASA Task Load Index Instrument
Table 4:
Cronbach α Item-Total Statistics for Swedish Occupational Fatigue Inventory Instrument

Partial CFA of the NASA-TLX and SOFI item responses yielded 4 factors with eigenvalues >1, a criterion used routinely to indicate factor significance26,27 (Table 5; Supplemental Digital Content 9). Significance of the extracted factors was further verified by a parallel analysis/Monte Carlo simulation. Factor 1 was loaded with the Lack of Energy, Lack of Motivation, and Sleepiness items from the SOFI instrument. Factor 2 was loaded with the Mental Demand, Temporal Demand, Effort, and Frustration items from the NASA-TLX instrument, although the Frustration item cross-loaded on other factors (Table 5). Factor 3 was loaded with the NASA-TLX Performance and Physical Demand items and the SOFI Physical Discomfort item. Factor 4 was loaded with the SOFI Physical Exertion item. Correlations between factors were negligible between factors 2 and 4 and between factors 3 and 4, moderate between factors 1 and 2, and strong between factors 2 and 3 (Table 6).

Table 5:
Maximum Likelihood Factor Extraction Showing Item Loadings
Table 6:
Extracted Factors Interfactor Correlation
Table 7:
Partial Confirmatory Factor Analysis Fit Results
Figure 6:
Path diagram based on partial confirmatory factor analysis. The observed variables are the NASA Task Load Index and Swedish Occupational Fatigue Inventory instrument items (rectangles). The latent variables or constructs correspond to the extracted factors (ovals). On the basis of item loadings to the extracted factors from Table 5, 1-way arrows are drawn between constructs and items. A future confirmatory factor analysis will verify this model and establish discriminant validity for characterizing the differential effects of noise on the proposed theoretical constructs.

On the basis of these results, a model was proposed to explain the relationship between the psychometric instrument items (observed measures or indicators) and the extracted factors (latent variables or constructs; Fig. 6). Global goodness-of-fit indices were calculated as part of the pCFA (Table 7; Supplemental Digital Content 9). All but one of the fit indices were consistent with good fit (the Normed Fit Index was <0.95). Taken together, the pCFA results indicate that the proposed model is plausible and that a future CFA on a new data set is likely to be successful.
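Table 7 is not reproduced in this text, so which indices it reports beyond the Normed Fit Index is not shown here. As a hedged illustration, the sketch below computes several commonly reported global fit indices (NFI, TLI, CFI, RMSEA) from the model and baseline (independence) χ² values using their textbook definitions. The input values are hypothetical, and the conventional cutoffs mentioned in the comments (≥0.95 for NFI, TLI, and CFI, the same threshold the text applies to NFI; ≤0.06 for RMSEA) are widely used conventions, not criteria stated by the authors.

```python
import math


def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Common global fit indices from model and baseline (independence) chi-square.

    n is the sample size; RMSEA here uses (n - 1), although some programs use n.
    """
    # NFI: proportional reduction in chi-square relative to the baseline model (often >= 0.95)
    nfi = (chi2_null - chi2_model) / chi2_null
    # TLI: compares chi-square/df ratios, penalizing model complexity (often >= 0.95)
    tli = ((chi2_null / df_null) - (chi2_model / df_model)) / ((chi2_null / df_null) - 1.0)
    # CFI: based on estimated noncentrality, chi-square minus df, floored at zero
    excess_model = max(chi2_model - df_model, 0.0)
    excess_null = max(chi2_null - df_null, 0.0)
    denom = max(excess_model, excess_null)
    cfi = 1.0 - excess_model / denom if denom > 0 else 1.0
    # RMSEA: misfit per degree of freedom per subject (often <= 0.06)
    rmsea = math.sqrt(excess_model / (df_model * (n - 1)))
    return {"NFI": nfi, "TLI": tli, "CFI": cfi, "RMSEA": rmsea}


# Hypothetical chi-square values, for illustration only (not the study's Table 7)
indices = fit_indices(chi2_model=30.0, df_model=24, chi2_null=300.0, df_null=36, n=100)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

With these made-up inputs, NFI falls below 0.95 while CFI and TLI exceed it, which shows how one index can disagree with the others, as the text reports for NFI.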


In this simulation-based study, we observed that intraoperative noise increased the perception of task load and fatigue, which likely contributed to an increase in the stress experienced by the CA-1 resident subjects. This finding is consistent with the previous report that anesthesiologists consider OR noise to have a negative impact on their work.8 Our counterbalanced experimental design controlled for other sources of stress that could manifest differentially between sessions 1 and 2. For example, we anticipated, and subsequently observed, a greater effect of Soundscape on observed stress in residents exposed to noise first and quiet second than in residents exposed to the conditions in reverse order. We attribute this observation to anticipatory cognitive stress appraisal,20 and it underscores the importance of counterbalancing to control for such order effects.

The authors of the first report investigating noise in the OR compared the problem with air and water pollution.28 Despite an initial lack of understanding of, and evidence for, the exact roles of these forms of pollution in surgery, efforts to provide aseptic surgical environments were implemented early and have been in practice for well over a century. It can be argued that it is time for noise, the third pollution, to be addressed in our ORs with similar urgency, especially considering that numerous minimally disruptive measures are available, such as behavioral modification11 and the use of plastic materials in lieu of clanging metal when possible. The scientific rigor required in modern clinical research is difficult to attain in hard-to-control clinical settings, and findings obtained in approximately realistic clinical simulations like the current study may not extrapolate completely to the real world. Nevertheless, the simulation-based findings reported here should be weighed by balancing the desire for definitive scientific results against the need to address the clinical noise problem expeditiously.

For our secondary objective, we performed factor analyses of the data. The NASA-TLX and SOFI are validated instruments, and reliability analysis of our results indicates that each performed with good internal consistency, supporting the assumption that both the Task Load and Fatigue constructs (latent variables) were reliably measured in our experiments. To assess the underlying relationships between the individual NASA-TLX and SOFI items (measured variables) and the Task Load and Fatigue constructs, we performed a pCFA, which indicates, on the basis of model fit, how likely it is that a future CFA (with a new data set) will be successful. We expected the pCFA to yield 2 extracted factors from the combined NASA-TLX and SOFI item data set, one corresponding to Task Load and the other to Fatigue. Instead, 4 factors were extracted (Fig. 6). The first 2 factors are logically interpretable: they correspond to the psychologically based SOFI and NASA-TLX items, respectively, and we refer to them as the Psychological Fatigue and Psychological Task Load constructs. The SOFI Physical Exertion item, which the instrument describes in acute, symptomatic terms (i.e., palpitations, sweaty, out of breath, and breathing heavily), loaded on the fourth factor, and we refer to this construct as Acute Physical Load. The SOFI Physical Discomfort item is defined in more chronic terms (tense muscles, numbness, stiff joints, and aching), whereas the NASA-TLX Physical Demand item is defined by the question “How physically demanding was the task?” In our experiment, the task referred to both lunch breaks (over an hour) in a session. That the NASA-TLX Performance and Physical Demand items and the SOFI Physical Discomfort item loaded on the same factor is therefore understandable, because self-appraisal of performance in an essentially mental task may correlate with the more chronic physical symptoms evoked over the course of an hour, as in this case. We refer to this construct as Self-Appraisal/Chronic Physical Load.

Importantly, the pCFA indicates a reasonably good fit of our data to the model presented in Figure 6 and suggests that a future CFA with a larger sample size is likely to succeed, yielding measures of construct (convergent and discriminant) validity. The model would then represent a new psychometric instrument specifically for use in investigations of the noise problem.

There are several limitations inherent in this study. First, it is simulation-based and uses a screen interface loosely based on existing monitor displays. It is, therefore, difficult to quantify the extent to which these results extrapolate to a real clinical OR. Second, the results were obtained in residents (CA-1s) at our institution because it was logistically easier to enroll and obtain clinical coverage for this group during the conduct of experiments. In addition, restricting enrollment to a single class of anesthesia residents helped ensure homogeneity with regard to subject clinical experience and past simulation exposure. We do not know whether the findings would be reproducible at other institutions or be observed in residents at different levels of training or in fully trained anesthesiologists, anesthesia assistants, nurse anesthetists, surgeons, and other OR staff members.

A major limitation of the experimental setup is that resident visual attention was directed solely to the GUI, which was part of the anesthesia workstation. A more realistic setup would have resident visual attention necessarily divided (by 90°) between the patient mannequin and GUI. Our NOISE simulator uses a highly realistic OR soundscape, and this immersive condition was crucial in providing the independent variable in these studies. However, because the NOISE soundscape was composed of a heterogeneous group of sound sources that included beepers, equipment noise, clangs, and music, it is not possible to determine which soundscape components were responsible for increasing task load and fatigue levels in our residents.

Another possible limitation of this study is that it may not have been adequately powered. Because it was designed primarily to test the hypothesis that noise increases perceived task load and fatigue, we consider a type 1 error unlikely for the following reasons. We used an online tool (GLIMMPSE17), which to our knowledge is the most comprehensive resource currently available for calculating power and sample size in mixed-design experiments. Previous reports on the NASA-TLX18 and SOFI16,19 instruments allowed us to input estimates of effect sizes and SDs, and the response means, SDs, and effect sizes observed in our data are comparable with those previously published. Socioemotional stress has been shown to increase perceived task load, as assessed by the NASA-TLX, in paramedics administering advanced life support in a simulated setting (values are mean [SD]): mental demand increased from 39 (18) to 57 (25), temporal demand from 25 (21) to 33 (22), effort from 40 (26) to 54 (22), and frustration from 19 (17) to 42 (25).18 Noise was one of the factors used to induce socioemotional stress in that study, and its similarity to ours helps to put our NASA-TLX data into context. For example, we observed an increase in the NASA-TLX total score from 47.5 (10.2) in quiet to 60.8 (13.5) in noise, and we contend that this statistically significant difference is likely to be clinically relevant. In anticipation of performing Bonferroni corrections, we entered an adjusted type 1 error rate of 0.004 into the calculation. In addition, the MANOVA of the data set reached significance, suggesting a decreased risk of type 1 error in the pairwise item comparisons.22 Furthermore, the repeated-measures, counterbalanced design greatly reduced the sample size required to achieve a power of 0.90. Nonetheless, we cannot completely rule out the possibility of type 1 error or that other unknown and uncontrolled factors influenced our results.
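To put the NASA-TLX total-score difference reported above on a standardized scale, the sketch below computes a standardized mean difference from the summary statistics in the text (47.5 [10.2] in quiet vs 60.8 [13.5] in noise), using the root-mean-square of the two condition SDs as the standardizer. This is an illustrative calculation, not a statistic reported by the authors; because it ignores the within-subject correlation, it is not the paired-difference effect size relevant to a repeated-measures power calculation. The Bonferroni helper simply divides the familywise α by the number of comparisons; the number of comparisons behind the 0.004 threshold is not stated in this section. Both function names are hypothetical.

```python
import math


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """(mean_b - mean_a) divided by the root-mean-square of the two SDs.

    Ignores the within-subject correlation, so it is not the paired-difference
    effect size (d_z) used for repeated-measures power calculations.
    """
    sd_pooled = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_b - mean_a) / sd_pooled


def bonferroni_alpha(familywise_alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return familywise_alpha / n_comparisons


# NASA-TLX total score, quiet vs noise, mean (SD) as reported in the text
d = standardized_mean_difference(47.5, 10.2, 60.8, 13.5)
print(f"standardized difference ~ {d:.2f}")  # prints about 1.11
```

A difference of roughly 1.1 SDs would conventionally be described as large, which is consistent with the authors' view that the change is likely to be clinically relevant, although that judgment ultimately rests on the clinical meaning of the NASA-TLX scale.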

In summary, we demonstrated that noise increases perceived levels of task load and fatigue in anesthesia CA-1 residents while being given lunch breaks during simulated surgeries. Our NOISE simulator, which faithfully reproduces the auditory environment characteristic of our clinical ORs, was a crucial component of our experimental design. In addition, we used validated psychometric instruments to assess perceived task load and fatigue. We believe the current findings add significantly to the growing body of evidence implicating the negative impact of noise on caregivers and patient safety. We also proposed a psychometric model for stress that combines items from the task load and fatigue instruments; a preliminary pCFA of this model supports further validation with a CFA.


  1. Estimate allowable blood loss to reach the transfusion trigger HCT = 17. Starting HCT = 39, 70 kg.
  2. Calculate BMI for an 18-year-old female who is 5 feet 6 inches and 125 pounds.
  3. What are 3 adverse reactions to sitagliptin? You can use a computer/phone (e.g., Epocrates, athenahealth, Watertown, MA).
  4. Calculate the size of ET tube to use in a 14-year-old female patient.
  5. Calculate the final concentration of drug X after 1 g is diluted in 550 mL of NS.
  6. Calculate Glasgow Coma Scale: opens eyes to pain, inappropriate verbal responses but words discernible, withdraws to pain.
  7. Calculate PaO2 using the alveolar gas equation, when FIO2 = 70%, PaCO2 = 37, RQ = 0.87.
  8. Calculate the PaO2/FIO2 ratio when PaO2= 107 mm Hg and FIO2 = 55%.
  9. Acid-base interpretation: pH = 7.50, PaCO2 = 31.
  10. Calculate MAP when given SBP = 125 and DBP = 78.
  11. Convert cm H2O to mm Hg when cm H2O = 17.
  12. Calculate tidal volume based on weight: weight = 93 kg.
  13. Calculate dead space ratio when MV = 6.2 L/min, RR = 10, and absolute dead space = 180 cc.
  14. Calculate MV based on TV = 555 and RR = 11.
  15. You want to give 4 μg/kg fentanyl on induction to a 79-kg female. How many μg do you give?
  16. Calculate lowest acceptable systolic blood pressure (20% of baseline) when baseline BP = 145/85.
  17. Calculate fluid deficit in an 83-kg male who last ate or drank at 11:20 PM. Surgery starts at 9:45 AM.
  18. Assuming a dead space ratio of 0.3 and tidal volume = 570 mL, what is the anatomical dead space? Assume alveolar dead space is negligible.
  19. For heart rate = 83 and cardiac output = 5.3 L/min, what is the calculated stroke volume?
  20. Assuming the toxic dose of a local anesthetic is 5 μg/kg, how much can be given to an 87-kg male?
  21. An IV is flowing at 33 mL/min. How long will it take for 900 mL to be administered?
  22. During general anesthesia, a mixture of 55% N2O and 45% O2 is being administered to a patient. Assuming the flow rate of O2 is 1 L/min, what is the flow rate of N2O?
  23. Exactly 2.25 L of irrigation is used during a case. Assuming that the suction canister contains only irrigation and blood, what was blood loss if the canister contains 3260 mL of fluid?
  24. How many pack-years has an 88-year-old patient smoked if he started when he was 25 and he has averaged about 1.5 packs per day?
  25. Calculate the final concentration of drug Y after 10 g is diluted in 250 mL of H2O.
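Several of the mental-task items above reduce to one-line arithmetic. The sketch below works a subset of them with common textbook approximations (for example, MAP ≈ DBP + [SBP − DBP]/3 and 1 cm H2O ≈ 0.7356 mm Hg). These are illustrative answers, not the study's answer key, which this text does not provide; the function names and dictionary keys are hypothetical, and item 24 assumes the patient smoked continuously up to his current age.

```python
def mean_arterial_pressure(sbp, dbp):
    # Common approximation: diastolic pressure plus one third of pulse pressure
    return dbp + (sbp - dbp) / 3.0


def bmi_from_imperial(weight_lb, height_in):
    # BMI = mass (kg) / height (m)^2, converting pounds and inches to SI units
    return (weight_lb * 0.45359237) / (height_in * 0.0254) ** 2


def cmh2o_to_mmhg(cmh2o):
    # 1 cm H2O = 98.0665 Pa and 1 mm Hg = 133.322 Pa, a ratio of about 0.7356
    return cmh2o * 98.0665 / 133.322


answers = {
    "item 2 BMI": bmi_from_imperial(125, 66),          # 5 ft 6 in = 66 in, 125 lb
    "item 8 P/F ratio": 107 / 0.55,                    # PaO2 107 mm Hg, FIO2 0.55
    "item 10 MAP": mean_arterial_pressure(125, 78),    # mm Hg
    "item 11 mm Hg": cmh2o_to_mmhg(17),                # 17 cm H2O
    "item 13 VD/VT": 180 / (6200 / 10),                # VT = MV / RR = 620 mL
    "item 14 MV (mL/min)": 555 * 11,                   # TV x RR
    "item 15 fentanyl (ug)": 4 * 79,                   # 4 ug/kg x 79 kg
    "item 19 SV (mL)": 5300 / 83,                      # CO (mL/min) / HR
    "item 21 minutes": 900 / 33,                       # volume / rate
    "item 22 N2O (L/min)": 1.0 * 55 / 45,              # O2 flow x (55 / 45)
    "item 23 blood loss (mL)": 3260 - 2250,            # canister minus irrigation
    "item 24 pack-years": 1.5 * (88 - 25),             # packs/day x years smoked
}

for label, value in answers.items():
    print(f"{label}: {value:.2f}")
```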








Interaction between within-subjects and between-subjects factors

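$\eta_G^2 = \dfrac{SS_{PA}}{SS_{PA} + SS_{s/A} + SS_{Ps/A}}$ = Generalized eta-squared,

$\eta_p^2 = \dfrac{SS_{PA}}{SS_{PA} + SS_{Ps/A}}$ = Partial eta-squared,

$\eta^2 = \dfrac{SS_{PA}}{SS_A + SS_{s/A} + SS_P + SS_{PA} + SS_{Ps/A}}$ = Eta-squared,

$SS_A$ = Between factor type III sum of squares,

$SS_P$ = Within factor type III sum of squares,

$SS_{PA}$ = Within × between factor type III sum of squares,

$SS_{s/A}$ = Between-subjects type III sum of squares error,

$SS_{Ps/A}$ = Within × between factor type III sum of squares error.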
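The effect-size measures listed above can be computed directly from the sums of squares in an ANOVA table. A minimal sketch, assuming the standard Olejnik and Algina23 and Bakeman24 formulas for the within × between (P × A) interaction in a mixed design with one between-subjects factor A and one within-subjects factor P, all factors manipulated, so that the total sum of squares is the sum of the five listed components. The function name and the sums of squares in the example are hypothetical.

```python
def effect_sizes_interaction(ss_a, ss_p, ss_pa, ss_s_a, ss_ps_a):
    """Eta-squared variants for the P x A interaction in a mixed design.

    ss_a: between factor, ss_p: within factor, ss_pa: within x between,
    ss_s_a: between-subjects error, ss_ps_a: within x between error.
    """
    # Classical eta-squared: effect SS over the total SS
    eta_sq = ss_pa / (ss_a + ss_s_a + ss_p + ss_pa + ss_ps_a)
    # Partial eta-squared: effect SS over effect plus its own error term
    partial_eta_sq = ss_pa / (ss_pa + ss_ps_a)
    # Generalized eta-squared: effect SS over effect plus all subject-related error
    generalized_eta_sq = ss_pa / (ss_pa + ss_s_a + ss_ps_a)
    return eta_sq, partial_eta_sq, generalized_eta_sq


# Hypothetical sums of squares, for illustration only
eta_sq, partial_eta_sq, generalized_eta_sq = effect_sizes_interaction(10.0, 20.0, 5.0, 40.0, 25.0)
print(round(eta_sq, 3), round(partial_eta_sq, 3), round(generalized_eta_sq, 3))
```

Generalized η² is the measure recommended for comparing effects across designs, because, unlike partial η², its denominator includes between-subjects error variance whether a factor is measured within or between subjects.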


The authors acknowledge the Anesthesia Patient Safety Foundation (Masimo Foundation Research Award) for funding this study.


Name: Richard R. McNeer, MD, PhD.

Contribution: This author helped design the study, conduct the study, analyze the data, and write the manuscript.

Attestation: Richard R. McNeer has seen the original study data, reviewed the analysis of the data, approved the final manuscript, and is the author responsible for archiving the study files.

Name: Christopher L. Bennett, PhD.

Contribution: This author helped design the study and conduct the study.

Attestation: Christopher L. Bennett has seen the original study data, reviewed the analysis of the data, and approved the final manuscript.

Name: Roman Dudaryk, MD.

Contribution: This author helped conduct the study and write the manuscript.

Attestation: Roman Dudaryk has seen the original study data, reviewed the analysis of the data, and approved the final manuscript.

This manuscript was handled by: Franklin Dexter, MD, PhD.


1. Busch-Vishniac IJ, West JE, Barnhill C, Hunter T, Orellana D, Chivukula R. Noise levels in Johns Hopkins Hospital. J Acoust Soc Am. 2005;118:3629–45
2. Kracht JM, Busch-Vishniac IJ, West JE. Noise in the operating rooms of Johns Hopkins Hospital. J Acoust Soc Am. 2007;121:2673–80
3. U.S. EPA (U.S. Environmental Protection Agency). Information on Levels of Environmental Noise Requisite to Protect Public Health and Welfare with an Adequate Margin of Safety. 1974 Available at: Accessed October 25, 2015
4. Berglund B, Lindvall T, Schwela DH. WHO Guidelines for Community Noise. 1999. Available at: Accessed October 25, 2015
5. Choiniere DB. The effects of hospital noise. Nurs Adm Q. 2010;34:327–33
6. Katz JD. Noise in the operating room. Anesthesiology. 2014;121:894–8
7. Ginsberg SH, Pantin E, Kraidin J, Solina A, Panjwani S, Yang G. Noise levels in modern operating rooms during surgery. J Cardiothorac Vasc Anesth. 2013;27:528–30
8. Tsiou C, Efthymiatos G, Katostaras T. Noise in the operating rooms of Greek hospitals. J Acoust Soc Am. 2008;123:757–65
9. Stevenson RA, Schlesinger JJ, Wallace MT. Effects of divided attention and operating room noise on perception of pulse oximeter pitch changes: a laboratory study. Anesthesiology. 2013;118:376–81
10. Murthy VS, Malhotra SK, Bala I, Raghunathan M. Auditory functions in anaesthesia residents during exposure to operating room noise. Indian J Med Res. 1995;101:213–6
11. Engelmann CR, Neis JP, Kirschbaum C, Grote G, Ure BM. A noise-reduction program in a pediatric operation theatre is associated with surgeon’s benefits and a reduced rate of complications: a prospective controlled clinical trial. Ann Surg. 2014;259:1025–33
11a. Bennett CL, Dudaryk R, Ayers AL, McNeer RR. Simulating environmental and psychological acoustic factors of the operating room. J Acoust Soc Am. 2015;138:3855–63
12. Cohen S, Kamarck T, Mermelstein R. A global measure of perceived stress. J Health Soc Behav. 1983;24:385–96
13. Dudaryk R, Bohorquez J, Bennett CL, McNeer RR. Anesthesia resident anticipation during simulation experiments may affect physiologic responses to stress. Anesth Analg. 2015;120(Suppl 3S):S359
14. Dudaryk R, Bohorquez J, Bennett CL, McNeer RR. Use of Hilbert transformation to monitor physiologic stress response to simulated intraoperative crises. Anesth Analg. 2015;120(Suppl 3S):S358
15. Carswell CM, Lio CH, Grant R, Klein MI, Clarke D, Seales WB, Strup S. Hands-free administration of subjective workload scales: acceptability in a surgical training environment. Appl Ergon. 2010;42:138–45
16. Ahsberg E, Gamberale F, Gustafsson K. Perceived fatigue after mental work: an experimental evaluation of a fatigue inventory. Ergonomics. 2000;43:252–68
17. Kreidler SM, Muller KE, Grunwald GK, Ringham BM, Coker-Dukowitz ZT, Sakhadeo UR, Barón AE, Glueck DH. GLIMMPSE: online power computation for linear models with and without a baseline covariate. J Stat Softw. 2013;54:i10
18. Bjørshol CA, Myklebust H, Nilsen KL, Hoff T, Bjørkli C, Illguth E, Søreide E, Sunde K. Effect of socioemotional stress on the quality of cardiopulmonary resuscitation during advanced life support in a randomized manikin study. Crit Care Med. 2011;39:300–4
19. Johansson S, Ytterberg C, Back B, Holmqvist LW, von Koch L. The Swedish Occupational Fatigue Inventory in people with multiple sclerosis. J Rehabil Med. 2008;40:737–43
20. Kuebler U, Wirtz PH, Sakai M, Stemmer A, Meister RE, Ehlert U. Anticipatory cognitive stress appraisal modulates suppression of wound-induced macrophage activation by acute psychosocial stress. Psychophysiology. 2015;52:499–508
21. Wirtz PH, Ehlert U, Emini L, Rüdisüli K, Groessbauer S, Gaab J, Elsenbruch S, von Känel R. Anticipatory cognitive stress appraisal and the acute procoagulant stress response in men. Psychosom Med. 2006;68:851–8
22. Bender R, Lange S. Adjusting for multiple testing—when and how? J Clin Epidemiol. 2001;54:343–9
23. Olejnik S, Algina J. Generalized eta and omega squared statistics: measures of effect size for some common research designs. Psychol Methods. 2003;8:434–47
24. Bakeman R. Recommended effect size statistics for repeated measures designs. Behav Res Methods. 2005;37:379–84
25. Gignac GE. Partial confirmatory factor analysis: described and illustrated on the NEO-PI-R. J Pers Assess. 2009;91:40–7
26. Steger MF. An illustration of issues in factor extraction and identification of dimensionality in psychological assessment data. J Pers Assess. 2006;86:263–72
27. O’Connor BP. SPSS and SAS programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behav Res Methods Instrum Comput. 2000;32:396–402
28. Shapiro RA, Berland T. Noise in the operating room. N Engl J Med. 1972;287:1236–8

Supplemental Digital Content

© 2016 International Anesthesia Research Society