An Evaluation of an Expert System for Detecting Critical Events During Anesthesia in a Human Patient Simulator: A Prospective Randomized Controlled Study

Görges, Matthias PhD*; Winton, Pamela MBChB, BSc Med Sci (hons), FRCA; Koval, Valentyna MD; Lim, Joanne MASc†‡; Stinson, Jonathan BSc; Choi, Peter T. MD, MSc (Epid), FRCPC; Schwarz, Stephan K. W. MD, PhD, FRCPC; Dumont, Guy A. PhD, PEng*; Ansermino, J. Mark MBBCh, MSc (Inf), FFA (SA), FRCPC

doi: 10.1213/ANE.0b013e3182975b63
Technology, Computing, and Simulation: Research Report

BACKGROUND: Perioperative monitoring systems produce a large amount of uninterpreted data, use threshold alarms prone to artifacts, and rely on the clinician to continuously visually track changes in physiological data. To address these deficiencies, we developed an expert system that provides real-time clinical decisions for the identification of critical events. We evaluated the efficacy of the expert system for enhancing critical event detection in a simulated environment. We hypothesized that anesthesiologists would identify critical ventilatory events more rapidly and accurately with the expert system.

METHODS: We used a high-fidelity human patient simulator to simulate an operating room environment. Participants managed 4 scenarios (Anesthetic Vapor Overdose, Tension Pneumothorax, Anaphylaxis, and Endotracheal Tube Cuff Leak) in random order. In 2 of their 4 scenarios, participants were randomly assigned to the expert system, which provided trend-based alerts and potential differential diagnoses. Time to detection and time to treatment were measured. Workload questionnaires and structured debriefings were completed after each scenario, and a usability questionnaire was completed at the conclusion of the session. Data were analyzed using a mixed-effects linear regression model; the Kruskal-Wallis test was used for workload scores.

RESULTS: Twenty anesthesiology trainees and 15 staff anesthesiologists with a combined median (range) of 36 (29–66) years of age and 6 (1–38) years of anesthesia experience participated. For the Endotracheal Tube Cuff Leak, the expert system caused mean reductions of 128 (99% confidence interval [CI], 54–202) seconds in time to detection and 140 (99% CI, 79–200) seconds in time to treatment. In the other 3 scenarios, a best-case decrease of 97 seconds (lower 99% CI) in time to diagnosis for Anaphylaxis and a worst-case increase of 63 seconds (upper 99% CI) in time to treatment for Anesthetic Vapor Overdose were found. Participants were highly satisfied with the expert system (median score, 2 on a scale of 1–7). Based on participant debriefings, we identified avoidance of task fixation, reassurance to initiate invasive treatment, and confirmation of a suspected diagnosis as 3 safety-critical areas.

CONCLUSION: When using the expert system, clinically important and statistically significant decreases in time to detection and time to treatment were observed for the Endotracheal Tube Cuff Leak scenario. The observed differences in the other 3 scenarios were much smaller and not statistically significant. Further evaluation is required to confirm the clinical utility of real-time expert systems for anesthesia.

Published ahead of print June 18, 2013.

From the Departments of *Electrical and Computer Engineering and Anesthesiology, Pharmacology & Therapeutics, The University of British Columbia; and Centre of Excellence for Simulation Education and Innovation, Vancouver Coastal Health and The University of British Columbia, Vancouver, British Columbia, Canada.

Pamela Winton, MBChB, BSc Med Sci (hons), is currently affiliated with Royal Hospital for Sick Children, Edinburgh, Scotland.

Accepted for publication April 3, 2013.

Funding: This work has been supported by a research grant from the Canadian Institutes of Health Research (principal investigator, Dr. Ansermino), and a Government of Canada Postdoctoral Research Fellowship (to Dr. Görges).

Conflict of Interest: See Disclosures at the end of the article.

This report was previously presented, in part, at the 2012 Society for Technology in Anesthesia Annual Meeting, Palm Beach, FL (Jan 18–21, 2012).

Reprints will not be available from the authors.

Address correspondence to Matthias Görges, PhD, Department of Electrical and Computer Engineering in Medicine Pediatric Anesthesia Research Team, BC Children’s Hospital, 1L7-4480 Oak St., Vancouver, BC V6H 3V4, Canada. Address e-mail to mgorges@cw.bc.ca.

Problems with Current Patient Monitor Alarms

Medical monitors use threshold-based alarms to alert clinicians to changes in their patients’ physiological status. Unfortunately, these alarms are prone to artifacts, exhibit a high false alarm rate, and are often remote from the initiation of the triggering event.1–3 This leads to alarm systems being ignored or disabled.4,5 In addition, alarm systems usually are based on single variable thresholds whereas clinical deterioration often manifests itself as deviations in multiple variables, in which case multivariable alarms offer a potential solution.6,7 Finally, the clinician is unable to continuously visually track changes in multiple sources of physiological data over time, a phenomenon called “change blindness,”8 which potentially compromises patient safety. Computerized aids offer an opportunity to overcome this problem to optimize clinical care and preempt clinical deterioration.9

Problems with Information Overload and Human Limitations

Advances in sensor technology in anesthesiology have resulted in an exponential growth in the amount of physiological data collected during a typical surgical procedure. However, only a fraction of this information is relayed to the clinician using a visual display or audio signals; the majority is unused and is discarded. Automated post hoc analysis of such data is highly sensitive and specific in the identification of incidents that the clinician may have missed, which in turn are associated with increased hospital mortality.10

Even the most advanced monitoring system cannot guarantee that its data will be directly brought to the clinician’s attention, particularly when the clinician is busy either with manual tasks or resolving a clinical problem. The human limitations affecting the simultaneous processing of multiple information sources must be addressed if patient safety is to be improved.11 Specific cognitive limitations that have been identified to be at the root of preventable accidents during anesthesia include imperfect vigilance, distraction, data overload, and cognitive resource limitations that may result in task fixation or limited task switching.12

Potential Solution: Clinical Expert Systems

The development of new sensors or the intelligent synthesis of existing signals (“smart sensors”) cannot reliably prevent adverse events unless the data produced from these devices are assimilated and provided to the clinician in a format that is easy to comprehend.2 The clinician’s focus can then be directed toward the patient instead of continual observation of multiple monitors. Here, clinical expert systems can assist the anesthesiologist in interpreting the overwhelming stream of physiological data, intelligently extracting key features from these data, and bringing them to the attention of the clinician.

On the basis of the above considerations, our research group has developed a prototype for such a clinical expert system, an improved version of iAssist,13–15 as well as a knowledge authoring tool, iKnow,16 which allows users to easily create rules for alerts based on multiple variables. This system uses a validated set of limits and rules for the identification of 3 critical ventilatory events.17 The rules were created using structured interviews with expert anesthesiologists, followed by a Delphi approach to find consensus for detection criteria.17 Additional rules were based on a Master’s thesis18 and feedback from 2 senior staff anesthesiologists (authors JMA and SKWS).

Purpose of the Study

The purpose of this study was to evaluate the efficacy of the expert system for enhancing the detection of critical respiratory-related events in a simulated operating room (OR) environment. We hypothesized that anesthesiologists would identify critical events more rapidly and accurately with the use of the expert system than without it.

METHODS

With approval from The University of British Columbia–Children’s & Women’s Health Centre of BC Research Ethics Board (ClinicalTrials.gov Identifier, NCT01240317) and after obtaining written informed consent from the participants, we conducted a prospective, randomized, single-center (Centre of Excellence for Simulation Education & Innovation, Vancouver General Hospital, Vancouver, British Columbia, Canada) trial with staff anesthesiologists and anesthesiology trainees (fellows and residents).

Study Setup

Study Environment

A high-fidelity human patient simulator (Adult HPS D with software version 6.4, Medical Education Technologies Inc., Sarasota, FL) was set up in a simulated OR with a Datex-Ohmeda S/5 anesthesia machine (GE Healthcare, Piscataway, NJ).

An improved version of the iAssist expert system ran on a Thinkpad X200 laptop (Lenovo, Morrisville, NC), connected to a Datex AS/3 patient monitor (GE Healthcare) using an RS232 interface. Information was displayed on a 17-inch touch screen (Model ASLCD72V-BK-TC, NEC Display Solutions of America Inc., Itasca, IL) placed on top of the anesthesia machine (Figs. 1 and 2). A more detailed description of the display and of the decision algorithms is given in the Study Interventions section.

Figure 1

Figure 2

Study team members acted as surgeon, scrub nurse, anesthesia assistant, and outgoing anesthesiologist; study participants took the role of the incoming anesthesiologist. The actors interacted with the participants in a standardized way to provide timed cues, distractions when required, or perform tasks/provide equipment when asked to do so.

Orientation to the Study Environment

After completion of a demographic questionnaire (see Data Collection section), participants took part in a 3-phase orientation including: (1) the simulated OR and HPS; (2) the anesthesia machine including basic safety check; and (3) training with the expert system, which included instruction on functions of display and alarms and a short case demonstration of malignant hyperthermia. Participants had the opportunity to ask questions before commencing the study scenarios and were encouraged to think aloud while administering their anesthetics.

Scenario Setup

The room was set up to fully simulate an OR environment. The HPS was surgically draped, and operating equipment was used. All participants and actors wore appropriate OR attire. The trachea of the HPS mannequin was intubated with a 7.5-mm internal diameter, cuffed endotracheal tube (ETT), and its lungs were mechanically ventilated. All baseline settings were programmed for the study scenario. On entering the study room, the participant was given a detailed handover by the outgoing anesthesiologist (enacted by PW). Each participant was allowed 2 minutes for orientation before the disease phase of the scenario was initiated.

Simulated Scenarios

We simulated 4 critical ventilatory events: (1) Anesthetic Vapor Overdose; (2) Tension Pneumothorax; (3) Anaphylaxis; and (4) Endotracheal Tube Cuff Leak. The scenario order was randomized using the Latin-square method. Before the commencement of the study, 2 senior staff anesthesiologists (authors JMA and SKWS) validated the scenarios for clinical accuracy and realism in simulation. Clinical descriptions of the scenarios follow below; technical details appear in Appendix A.
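The Latin-square ordering mentioned above can be illustrated with a short sketch. The report does not state which Latin square was used or how its rows were assigned to participants, so the cyclic construction and the `scenario_order` helper below are assumptions made purely for illustration.

```python
SCENARIOS = [
    "Anesthetic Vapor Overdose",
    "Tension Pneumothorax",
    "Anaphylaxis",
    "Endotracheal Tube Cuff Leak",
]


def cyclic_latin_square(items):
    """Build an n x n Latin square by cyclically shifting the item list.

    Each item appears exactly once in every row (one participant's
    scenario sequence) and once in every column (position in the session).
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def scenario_order(participant_index, square):
    """Hypothetical assignment: participants cycle through the rows."""
    return square[participant_index % len(square)]


SQUARE = cyclic_latin_square(SCENARIOS)
for i in range(4):
    print(i, scenario_order(i, SQUARE))
```

With 4 scenarios, every scenario occupies each session position equally often across any 4 consecutive participants, which is what allows order effects to be separated from scenario effects in the analysis.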

Anesthetic Vapor Overdose

A 70-year-old, 50-kg woman was scheduled to undergo a mastectomy. Study participants were requested to assume responsibility for the case after induction of anesthesia. Participants were informed that the patient became hypotensive after an induction dose of propofol. As a result of this, and while surgical stimulation was minimal, the isoflurane vaporizer was set at a low inspired concentration. The participant was asked to administer an antibiotic at the surgeon’s request. The patient became hypertensive soon after skin incision and the surgeon remarked that the patient was bleeding more than usual. If participants did not respond to this cue for an inadequate depth of anesthesia, the surgeon requested participants to check the blood pressure (BP) and asked if the patient was “light.” Participants were coerced to increase the isoflurane vapor concentration. If they set the inspired concentration to <2%, the anesthesia assistant (enacted by MG) surreptitiously increased the concentration to 2% while the participant was distracted during administration of the antibiotic. This was done to reach an inspired isoflurane concentration target of 1.1% within approximately 1 minute from the increase, given that the actors could not force high fresh gas flows in each participant. The anesthesia assistant then left the OR and the time measurement was initiated (t0). As the end-tidal isoflurane concentration increased, the patient became hypotensive. Time to diagnosis was defined as the first indication that the participant noticed the high isoflurane concentration, while time to treatment was defined as the time to reduction in isoflurane concentration. Treatment of hypotension alone (with fluids or vasopressors) was not considered a correct definitive treatment.

Tension Pneumothorax

A 22-year-old man was involved in a high-speed motor vehicle accident and scheduled for open reduction and internal fixation of an ankle fracture. Concomitant injuries included: fractured ribs (with no evidence of hemothorax or pneumothorax on initial assessment), a small liver laceration (to be managed conservatively), and a minor head injury (computed tomography scan of head and cervical spine normal). Participants were requested to take over the case from the outgoing anesthesiologist after a comprehensive handover. Two minutes after handover (t0), the patient developed a right-sided tension pneumothorax. Clinical signs included high peak airway pressures, decreased lung compliance, moderately increased airway resistance, tachycardia, hypotension, hypoxia, and absent breath sounds over the right hemithorax. Time to diagnosis was defined as the time to verbally express the detection of the tension pneumothorax, and time to treatment was defined as time to needle decompression (needle thoracocentesis). If the participant requested a chest tube to be placed by the surgeon, they were informed that this would take 5 to 10 minutes to set up and perform.

Anaphylaxis

A 37-year-old, 79-kg woman was scheduled for a laparoscopic cholecystectomy. The patient had received an antibiotic just before the participant entered the room. Two minutes after handover (t0), the patient developed anaphylaxis. Clinical signs included high peak airway pressures, decreased lung compliance, tachycardia, hypotension, and hypoxia. The surgeon confirmed the presence of a rash if participants asked whether a rash was present or absent. Time to diagnosis was defined by the verbal report of the detection of anaphylaxis, and time to treatment was defined as the time taken until administration of epinephrine.

Endotracheal Tube Cuff Leak

A 22-year-old, 70-kg man was scheduled for removal of 4 wisdom teeth under general endotracheal anesthesia. He was healthy but anxious. Drapes were placed at the level of the patient’s neck, obscuring the direct view of the head, but removed when requested by participants. After handover, the surgeon (enacted by MG) aspirated 8 mL of air from the ETT cuff (initial volume, 20 mL). No change was seen on the capnogram (t0). Two minutes later, a further 1 mL was aspirated which resulted in a slightly narrowed end-tidal CO2 (ETCO2) trace. Two minutes later, a final 1.5 mL aspiration caused a noticeable leak, producing a drop of the ventilator bellows and changes in the capnogram. Time to diagnosis was defined as the verbal report of an ETT leak, and time to treatment was defined as the addition of air to the ETT cuff or replacement of the ETT. Compensating for the leak with increased fresh gas flows or increased inspired oxygen fractions was not considered an appropriate treatment, and detection of a loss of volume from the breathing circuit was not considered an appropriate diagnosis.

Study Interventions

The iAssist expert system showed 6 vital sign trends in all cases (Fig. 2): heart rate (HR), blood oxygen saturation (SpO2), and mean arterial blood pressure (MAP) in the left column, as well as ETCO2, respiratory rate, and expired minute volume in the right column. Participants were randomized to have the decision rules enabled in 2 of their 4 scenarios. This randomization used full permutations of 2 of 4 scenarios, repeated in blocks of 6. When the decision support was enabled, yellow pop-up boxes highlighted clinical changes, and red pop-up boxes announced potential differential diagnoses. Audible alerts were: “ding, ding, ding” with decreasing and increasing pitch for clinical change announcements and a submarine diving horn for differential diagnosis alarms.
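The allocation of decision support to 2 of 4 scenarios can be sketched as follows. There are 6 ways to choose 2 scenarios out of 4, which matches the block size of 6 stated above. Shuffling the order within each block, the seed, and the function names are assumptions for illustration; the report does not describe these details.

```python
import random
from itertools import combinations

SCENARIOS = (
    "Anesthetic Vapor Overdose",
    "Tension Pneumothorax",
    "Anaphylaxis",
    "Endotracheal Tube Cuff Leak",
)


def allocation_block(rng):
    """One block: all 6 ways of enabling decision support in 2 of 4 scenarios."""
    block = list(combinations(SCENARIOS, 2))
    rng.shuffle(block)  # assumed: order within a block is randomized
    return block


def allocate(n_participants, seed=0):
    """Return, per participant, the pair of scenarios with decision support on."""
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_participants:
        allocations.extend(allocation_block(rng))
    return allocations[:n_participants]


ALLOCATIONS = allocate(36)
```

Because each scenario appears in 3 of the 6 pairs, every complete block gives each scenario decision support for exactly half of the participants in that block.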

Decision rules were based on the results of a previous study,17 modified to fit the scenario time course and the patient simulator. Simulator responses and rules were validated by running the scenarios multiple times with the anesthesiologist authors (PW, JMA, SKWS) as pilot subjects before initiation of the study; JMA and SKWS were unbiased in their evaluations as they did not participate in the creation of the scenarios. The main decision rules (for the red pop-up boxes) were:

  • Anesthetic Vapor Overdose: Age >60 years, AND mean noninvasive blood pressure (NIBPmean) decrease by 10% below 5 minutes average while NIBPmean <70 mm Hg, AND inspired isoflurane concentration >1.0% (used instead of end-tidal isoflurane concentration due to the slow isoflurane equilibration of the system).
  • Tension Pneumothorax: HR increase by 15% above 10 minutes average while HR >80 min−1, AND NIBPmean decrease (as for overdose), AND lung compliance (Compl) decrease by 20% below 5 minutes average while Compl <30 mL·cm H2O−1.
  • Anaphylaxis: HR increase (as for tension pneumothorax), AND NIBPmean decrease (as for tension pneumothorax), AND peak inspiratory pressure (Ppeak) increase by 20% over 5 minutes average while Ppeak >30 cm H2O, AND compliance decrease (as for tension pneumothorax), AND Compl <20 mL·cm H2O−1.
  • Endotracheal Tube Cuff Leak: “Possible Leak” if the difference between inspired and expired tidal volume (ΔTV) >125 mL, OR “Definite Leak” if ΔTV >250 mL.
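The four main decision rules listed above can be expressed as a small sketch. This is not the iAssist implementation: the `Snapshot` structure, the field names, the assumption that the running averages are computed elsewhere, and the choice of inclusive comparisons for the percentage changes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Current values plus running averages (assumed to be computed upstream)."""
    age_years: float
    hr: float               # heart rate, min^-1
    hr_avg_10min: float
    nibp_mean: float        # mean noninvasive blood pressure, mm Hg
    nibp_avg_5min: float
    compl: float            # lung compliance, mL per cm H2O
    compl_avg_5min: float
    ppeak: float            # peak inspiratory pressure, cm H2O
    ppeak_avg_5min: float
    insp_iso_pct: float     # inspired isoflurane concentration, %
    tv_insp_ml: float       # inspired tidal volume, mL
    tv_exp_ml: float        # expired tidal volume, mL


def hr_increase(s):
    return s.hr > 80 and s.hr >= 1.15 * s.hr_avg_10min


def nibp_decrease(s):
    return s.nibp_mean < 70 and s.nibp_mean <= 0.90 * s.nibp_avg_5min


def compl_decrease(s):
    return s.compl < 30 and s.compl <= 0.80 * s.compl_avg_5min


def ppeak_increase(s):
    return s.ppeak > 30 and s.ppeak >= 1.20 * s.ppeak_avg_5min


def diagnoses(s):
    """Return the differential diagnoses (red pop-ups) that would fire."""
    out = []
    if s.age_years > 60 and nibp_decrease(s) and s.insp_iso_pct > 1.0:
        out.append("Anesthetic Vapor Overdose")
    if hr_increase(s) and nibp_decrease(s) and compl_decrease(s):
        out.append("Tension Pneumothorax")
    if (hr_increase(s) and nibp_decrease(s) and ppeak_increase(s)
            and compl_decrease(s) and s.compl < 20):
        out.append("Anaphylaxis")
    delta_tv = s.tv_insp_ml - s.tv_exp_ml
    if delta_tv > 250:
        out.append("Definite Leak")
    elif delta_tv > 125:
        out.append("Possible Leak")
    return out
```

Written this way, the Anaphylaxis rule is a strict superset of the Tension Pneumothorax conditions, so any snapshot that triggers Anaphylaxis also triggers Tension Pneumothorax.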

With the exception of Anaphylaxis and Tension Pneumothorax, which frequently appeared simultaneously or within a few seconds of each other, no alternative differential diagnoses were presented for each scenario, to simplify the experiment. In addition to the 4 differential diagnosis suggestions, we used trend-based alerts for the following clinical changes (yellow pop-up boxes): BP Decrease, CO2 Decrease, CO2 Increase, Compl Decrease, HR Increase, HR Decrease, PIP Increase, SpO2 Decrease, and SpO2 Low. These rules were used as the building blocks of the “main decision rules” described in detail above.

Midway through the study (before n = 16), the adult simulator rack (HPS 245), which contained the lung bellows and electronics, malfunctioned and was replaced with a pediatric simulator rack (HPS/Ped 333). The replacement rack caused both an unexpected increase in the lung compliance of the patient simulator and a net gain of 90 mL TV with each breath. As a result, some scenario variables and decision rules were adjusted.

Data Collection

Participant characteristics (age, sex, level of experience, previous simulation experience, color blindness, sleep in the last 12 hours, and use of sedatives in the last 24 hours) were collected using a questionnaire.

Time to Diagnosis and Time to Treatment

One investigator (JS or JL) recorded time-stamped tasks and thoughts verbalized by the participant in real time. This allowed for measurement of the time from the start of a critical event to its detection, and its treatment initiation using the predefined end points for each scenario (see Simulated Scenarios section). Additionally, all participants were videotaped using 2 cameras with the first providing an overview of the OR from the patient’s feet and the second providing a closer view of the anesthesia care area at the head of the simulated patient. Data from the 2 video cameras in addition to data captured from the patient monitor and ventilator were used if timing of events was unclear from the paper records.

Workload Questionnaires

The NASA Task Load Index (TLX) questionnaire,19 a workload assessment tool, was administered after the conclusion of each scenario. The NASA-TLX is a multidimensional rating procedure that derives a subjective workload score based on a weighted average of ratings on 6 subscales: Mental Demands, Physical Demands, Temporal Demands, Own Performance, Effort, and Frustration.
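As an illustration of the weighted average described above, the standard NASA-TLX procedure derives each subscale weight from 15 pairwise comparisons (the weights therefore sum to 15) and applies them to ratings on a 0 to 100 scale. The ratings and weights below are invented example values, not study data, and the function name is hypothetical.

```python
SUBSCALES = (
    "Mental Demands",
    "Physical Demands",
    "Temporal Demands",
    "Own Performance",
    "Effort",
    "Frustration",
)


def weighted_tlx(ratings, weights):
    """Overall workload: weighted mean of the 6 subscale ratings."""
    total_weight = sum(weights[name] for name in SUBSCALES)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    return sum(ratings[name] * weights[name] for name in SUBSCALES) / total_weight


# Invented example: weights are the number of times each subscale was
# chosen in the 15 pairwise comparisons.
example_weights = dict(zip(SUBSCALES, (5, 0, 4, 1, 3, 2)))
example_ratings = dict(zip(SUBSCALES, (80, 10, 60, 40, 70, 30)))
overall = weighted_tlx(example_ratings, example_weights)
```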

Debriefing and Usability Questionnaire

Immediately after each scenario, an investigator with clinical anesthesia experience facilitated a debriefing session with the participants. This allowed participants to ask questions about the scenario and explore treatment options. A modified version of the standardized Post-Study System Usability Questionnaire (PSSUQ)20 was given to each participant at the end of the test to measure acceptability and user perception of the expert system. General comments, in free-text form, were also elicited at this time.

Data Analysis

Data were analyzed using MATLAB (The Mathworks Inc, Natick, MA) and Stata release 10 (Stata Corporation, College Station, TX). For continuous data, we used means (standard deviations) and medians (ranges) to describe normally distributed and skewed data, respectively. We used proportions (percents) to describe discrete data. For comparisons (see below), a 2-tailed P value <0.05 was considered statistically significant.

The sample size calculation was based on a type I error rate of 0.05, power of 0.90, and 4 independent variables (participant, scenario, order of scenarios, and decision support) in a multivariable regression model.a A very conservative (small) estimate of effect (f² = 0.125), equivalent to a squared multiple correlation of 0.11, was used.21 We chose a small effect size because we could not find any data in the published literature to guide our sample size and we did not want to miss a potential effect by basing our sample size on a more optimistic estimate (leading to a subsequently smaller sample size). A total of 128 trials (32 participants × 4 scenarios/participant) were required. To allow for technical failures and dropouts, we adjusted the number of participants by 10% (32/0.9) for a sample size of 36 participants (144 trials).
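The arithmetic in this paragraph can be checked with a short sketch. It uses the standard relation between Cohen's effect size and the squared multiple correlation, R² = f² / (1 + f²), and inflates the participant count for an assumed 10% loss. The 128 required trials are taken from the text as given; the power calculation itself is not reproduced here, and the function names are hypothetical.

```python
import math


def f2_to_r2(f2):
    """Convert Cohen's f-squared to the squared multiple correlation."""
    return f2 / (1.0 + f2)


def participants_needed(required_trials, trials_per_participant, loss_fraction):
    """Participants to recruit after allowing for failures and dropouts."""
    base = math.ceil(required_trials / trials_per_participant)
    return math.ceil(base / (1.0 - loss_fraction))


r2 = f2_to_r2(0.125)                         # about 0.11
n_recruit = participants_needed(128, 4, 0.10)
n_trials = n_recruit * 4
print(round(r2, 2), n_recruit, n_trials)
```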

Time to Diagnosis and Time to Treatment

The 2 primary analyses were the comparison of the time to diagnosis and time to treatment with and without the expert decision system (iAssist). Both outcome measures were analyzed using a mixed-effects linear regression model, with models fitted using the restricted maximum likelihood method. Models included time (either time to diagnosis or time to treatment) as the dependent variable; the fixed effects (decision support, scenario, order of scenarios, and scenario × decision support interaction); random intercepts for participants; and an independent, normally distributed residual error. All of the fixed effects were included as categorical variables by coding multiple dummy variables for each factor. If the interaction term was statistically significant (P ≤ 0.10), we estimated a model-adjusted intervention effect for each of the 4 scenarios, as no single summary of the difference between groups would be appropriate. These model-adjusted effects account for any effect of order. To ensure a family-wise error rate of ≤0.05 over the 4 scenarios, we considered P ≤ 0.01 to be statistically significant, which corresponds to the 99% confidence intervals reported for the scenario-specific effects.
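In symbols, the model described above can be written roughly as follows. The notation (indices and coefficient names) is illustrative and not taken from the original report, and the normal distribution of the random intercepts is the usual modeling assumption rather than something stated in the text.

```latex
y_{ij} = \beta_0
       + \beta^{\mathrm{DS}}_{d(i,j)}
       + \beta^{\mathrm{scen}}_{s(i,j)}
       + \beta^{\mathrm{ord}}_{o(i,j)}
       + \beta^{\mathrm{int}}_{s(i,j),\,d(i,j)}
       + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here y_ij is the time (to diagnosis or to treatment) for participant i in trial j; d, s, and o give the decision support status, scenario, and order position of that trial; u_i is the participant-specific random intercept; and ε_ij is the residual error.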

Workload Questionnaires

The NASA-TLX scores with each of its 6 domains were analyzed with the Kruskal-Wallis test, with availability of expert system (iAssist) and scenario as independent variables.

Debriefing and Usability Questionnaire

PSSUQ scores were tabulated and medians and ranges calculated. Themes were identified from a qualitative review of written and verbal responses.

RESULTS

Thirty-six individuals gave consent to participate in this study, and 35 (15 anesthesiology residents, 5 anesthesiology fellows, and 15 staff anesthesiologists) took part. Of the 35 participants, 24 (68.6%) were male. The combined median age (range) was 36 (29–66) years, with a median (range) of 6 (1–38) years of anesthesia experience. No participant was color blind; 1 participant had taken a sedative the night before the study; and participants had a median (range) of 6 (3–8) hours of sleep before the study. Most participants (31/35; 88.6%) had previous simulator experience, with a median (range) of 6 (0–200) hours.

The results of the mixed-effects multivariable regression model on time to diagnosis and time to treatment are summarized in Table 1. Scenario and the scenario × decision support interaction had statistically significant effects on both time to diagnosis and time to treatment, whereas decision support and the order in which the scenarios were presented did not. Estimates of intervention effect showed that for the Endotracheal Tube Cuff Leak scenario, the time to correct diagnosis was 128 seconds shorter with the availability of the expert system (99% confidence interval [CI], 54–202 seconds). Similarly, the time to correct or definitive treatment was estimated to be shorter by 140 seconds with the availability of the expert system (99% CI, 79–200 seconds). For the other scenarios, times to correct diagnosis and treatment were comparatively short without decision support; no significant differences were found (Table 2). In the best case, decision support decreased the time to diagnosis by 97 seconds (lower 99% CI) for the Anaphylaxis scenario; in the worst case, it increased time to treatment by 63 seconds (upper 99% CI) for the Anesthetic Vapor Overdose scenario. Figures 3 and 4 show dot plots overlaid with quartile boxes of the time to diagnosis and time to treatment by availability of decision support for all 4 scenarios.

Figure 3

Figure 4

Table 1

Table 2

Scenario Exclusions

Due to Technical Problems

Data from 5 scenarios were excluded from the analysis due to technical problems during the simulation session. For 2 subjects, during the Anesthetic Vapor Overdose scenario, the BP did not increase as scripted. For a third subject, during the Tension Pneumothorax scenario, the HPS progressed into the severe disease stage without the preprogrammed gradual changes due to a system fault. For 2 other subjects, the Endotracheal Tube Cuff Leak scenario was excluded: For one subject, the HPS was defective and could not be repaired before initiating this scenario; for the other subject, a simulated clinical leak could not be achieved even when the ETT cuff was deflated completely. This was likely due to the ETT having become adherent to the artificial trachea.

Due to Participant

In 2 scenarios (Anesthetic Vapor Overdose, treated with multiple ephedrine and fluid boluses; and Anaphylaxis, treated with phenylephrine, salbutamol, and aminophylline), no recognized definitive treatments were administered and hence treatment times were censored at 5 minutes. In a third scenario (Anaphylaxis), the participant failed to make the correct diagnosis but unintentionally administered the correct definitive treatment (epinephrine; albeit for another diagnosis); hence, this scenario was also excluded from the analysis of diagnosis and treatment times.

Workload Questionnaires

When analyzed for the availability of the decision support, none of the 6 subscales of the NASA-TLX (Mental Demands, Physical Demands, Temporal Demands, Own Performance, Effort, or Frustration) showed statistically significant differences. When analyzed for scenario, only the Physical Demands subscale was significantly different (χ² = 11.55, P = 0.009).

Debriefing and Usability Questionnaire

Median scores for the PSSUQ ranged from 2 to 3 on a scale of 1 to 7, indicating that participants were positive about the system’s usability (see Fig. 5).

Figure 5

The usability feedback comments showed 3 definite themes: (1) avoidance of task fixation (e.g., “I like the ‘suggestions’ that the monitor gives because it can potentially expand your differential diagnoses in a time of crisis and reduce your ‘this and this only’ error in thinking”); (2) reassurance/support (e.g., “Really liked that it supported the differential diagnosis I had. It gave me more confidence to initiate action, e.g., needle decompression of tension pneumothorax”); and (3) supportive “objective” decision making (e.g., “It could be thought of as the ‘extra eyes’ of a colleague looking at the data”).

Of note, when debriefing the Anesthetic Vapor Overdose scenario, several participants volunteered that they had in fact accidentally overdosed an elderly patient in the past by having increased the anesthetic vapor concentration and subsequently being distracted. This could be due to conventional alarm limits for inspired anesthetic agents being set relatively high for elderly patients. Additionally, 3 participants stated in their free-text comments that they were expecting something to happen very quickly as they were in a simulator and that they believed that iAssist would help them in long standard cases.

DISCUSSION

This randomized controlled study of an anesthesia expert system in a simulated OR environment (high-fidelity HPS) showed a significant improvement in both time to detection and time to treatment in only 1 of 4 simulated critical ventilatory events, Endotracheal Tube Cuff Leak. Whereas the other 3 scenarios (Anesthetic Vapor Overdose, Tension Pneumothorax, Anaphylaxis) were all associated with comparatively short times to correct diagnosis and definitive treatment (i.e., with median elapsed times in the 1.7–3.6 minute range), the Endotracheal Tube Cuff Leak scenario had much longer times to correct diagnosis and definitive treatment (median elapsed times in the 6.8–7.0 minute range) without the expert system (i.e., decision rules disabled). These observations underscore the potential of iAssist to facilitate rapid recognition and management of critical ventilatory events that otherwise might only be recognized with considerable delay, threatening patient safety. Notwithstanding the scenario-dependent efficacy in the present trial, participants appreciated the support of the expert system (verbal and written responses) and found the system highly usable. The advantage of iAssist was also demonstrated in 3 scenarios in which the expert system was disabled: twice, participants did not administer the recognized definitive treatment, and once the correct treatment was given unintentionally. In general, our results also highlight the challenges in simulating the clinical environment in a situation when participants are aware that a critical event inevitably is going to occur within a short time. Clearly, further research in a real-life clinical setting will be required.

Study Sample

Our study included a representative sample of anesthesiology staff, fellows, and residents, and randomization was well balanced for participant number (expertise) and scenario order, as neither was identified to predict time to diagnosis and time to treatment in the multiple linear regression analysis. Hence, we believe that our findings can be extrapolated to a wide population of anesthesia providers.

Time to Diagnosis and Time to Treatment

As shown in Figures 3 and 4, the time differences due to scenario were much larger than the differences due to availability of the expert system. This was an expected result, because the Tension Pneumothorax and Anaphylaxis scenarios progressed much more rapidly than the Anesthetic Vapor Overdose and Endotracheal Tube Cuff Leak scenarios. For example, the scripted vital sign deterioration for Anaphylaxis was complete after 60 to 90 seconds, whereas in the Endotracheal Tube Cuff Leak scenario it took 120 seconds for the first subtle changes in the ETCO2 waveform to be visible, and at least a full 240 seconds until the bellows were no longer filling completely (Appendix A, description of the scenarios).

The expert system markedly accelerated detection and treatment of the Endotracheal Tube Cuff Leak, with an estimated mean decrease in duration of 128 (99% CI, 54–202) seconds and 140 (99% CI, 79–200) seconds, respectively. This improvement would be clinically significant, as the expert system would allow the leak to be corrected before a change in the ventilator bellows mechanism is encountered and long before the patient’s ventilation is compromised. It is interesting that the only statistically significant improvements in time to detection and time to treatment were in the scenario with the simplest rule. Potential reasons include the infrequent use and display of spirometry-derived variables, from which the leak was calculated, and the possibility that the Cuff Leak was “harder” to detect manually because the surgical drapes covered the shared airway. In the other scenarios, the perceived advantage of the expert system was that it provided confirmatory support for a diagnosis already made by the clinician; in this case, by contrast, the expert system detected a potential problem before the clinician did. This finding warrants further investigation.
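As a reading aid for the intervals above, the following sketch back-calculates the midpoint and the standard error implied by each reported 99% CI. It assumes a symmetric, normal-based interval (z ≈ 2.576), which is an assumption made here for illustration; the mixed-effects model may have used a different quantile, so the implied standard errors are approximate. The function names are hypothetical.

```python
# Back-of-envelope check of the reported 99% confidence intervals.
# ASSUMPTION: symmetric normal-based interval with z = 2.576; the study's
# mixed-effects model may have used a t quantile instead.

Z_99 = 2.576  # two-sided 99% standard normal quantile


def ci_midpoint(lower: float, upper: float) -> float:
    """Midpoint of the interval; for a symmetric CI this is the point estimate."""
    return (lower + upper) / 2


def implied_se(lower: float, upper: float, z: float = Z_99) -> float:
    """Standard error implied by the CI half-width under the normal assumption."""
    half_width = (upper - lower) / 2
    return half_width / z


# Time to detection: reported mean decrease 128 s (99% CI, 54-202)
detection_mid = ci_midpoint(54, 202)
detection_se = implied_se(54, 202)

# Time to treatment: reported mean decrease 140 s (99% CI, 79-200)
treatment_mid = ci_midpoint(79, 200)
treatment_se = implied_se(79, 200)

print(f"detection: midpoint {detection_mid:.1f} s, implied SE {detection_se:.1f} s")
print(f"treatment: midpoint {treatment_mid:.1f} s, implied SE {treatment_se:.1f} s")
```

Under this assumption, both interval midpoints agree with the reported mean decreases to within rounding, and neither interval includes zero.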

Notwithstanding the above considerations, we were surprised that we did not find a more rapid detection and treatment of anesthetic vapor overdose when the expert system was enabled. Two factors may explain this. First, participants were hypervigilant, frequently scanning monitors for changes and expecting something “bad” to happen, which likely led to an artificially rapid diagnosis even without decision support. Second, the isoflurane concentration was displayed as a large number on the ventilator monitor (the default for the anesthesia machine), rather than as a small number on the vaporizer dial or patient data monitor, as is more typical in real-life clinical monitors. These observations were supported by subjects’ comments during debriefings and in the usability questionnaire (see Debriefing and Usability Questionnaire section). However, they warrant further confirmation and investigation.

For both Anaphylaxis and Tension Pneumothorax, which were associated with rapid changes in the patient, the expert system did not accelerate the detection of changes by participants. This was likely in part due to the design and threshold values used for the system. The suggested differential diagnoses were displayed only when the patient had deteriorated to a degree that was already appreciated by the participant. The scenarios were designed with fixed end points and with 1 single diagnosis. This assisted the participants with the diagnosis based on the comprehensive handover. In particular, the possible occurrence of tension pneumothorax is a very specific event associated with rib fractures after trauma. In contrast, the other scenarios were more generic critical events that might occur in a wide range of anesthetic cases. Participants were already experts in detecting a pneumothorax, which is clearly shown in Figure 3, where the time ranges for pneumothorax were the smallest in this experiment. This is still true for the treatment times, even though these show a slightly larger range of times.

Both the Tension Pneumothorax and Anaphylaxis scenarios dictated that the participant perform a clinical examination of the patient to make the definitive diagnosis. This additional information was not available to the expert system. Although this context made it relatively easy for participants to take a swift clinical action without additional decision support, we would expect this to be much more difficult to achieve in a real-life clinical environment.

Notwithstanding, it is important in this context to emphasize that the design and previous tuning of the expert system was based on the desire to reduce the false-positive event rate to ensure acceptance by users and to avoid the introduction of additional hazards (false alarms). In this setting, the benefit, as suggested by our participants, of the expert system might be to provide confirmatory support for diagnosis already made by the clinician.

Workload Questionnaires

The expert system did not affect self-reported workload during the scenarios, which is important for the possible future introduction of a clinical support system into the anesthetic workplace, as even clinicians with minimal training were able to make use of the system without impacting their perceived workload or clinical performance.

Debriefing and Usability Questionnaire

The good usability scores (low values) indicate that the participants felt that the system was usable. Many participants indicated that they would like to see it implemented in their clinical environment.

During the structured debriefings, several themes were identified by the authors regarding the management of anesthetic critical events, all of which related to nontechnical skills and decision making during a crisis. Recurrent themes included: avoidance of task fixation, problems of group collusion in making a diagnosis (or benefits of the second observer), and difficulties of initiating a procedure that is not routinely performed or seen as high risk. In all 3 areas, participants noted that the expert system might improve decision making during the critical event and hence improve patient safety.

  1. Avoidance of task fixation: Task fixation whereby clinicians become highly focused on 1 activity and lose sight of the whole situation12 has been highlighted as a contributory factor in many critical clinical incidents, even resulting in death.b Hence, improvements in this area are very important.
  2. Prevention of group collusion in making a diagnosis: Stress during a critical incident causes limitations in perception and cognition and may produce group collusion (whereby clinicians all collude with a suggested diagnosis and do not consider other options or diagnosis). Group collusion has also contributed to a patient death under anesthesia.c Participants noted that the expert system can act as an objective second observer and is not subject to collusion.
  3. Encouragement in initiation of a high-risk procedure: Despite clinical simulation becoming more popular in anesthesia training, anesthesiologists rarely have the opportunity to practice unusual critical events, and, more importantly, unusual procedures such as needle thoracocentesis. This may result in such procedures being seen as high risk, despite their life-saving benefits. This perception may delay or prevent the clinician initiating the procedure. Participants noted that the expert system gave them confidence in initiating such life-saving actions.

All these themes indicate that even if an expert system does not accelerate time to diagnosis and treatment in all critical events, there may be strong clinician support to implement these systems in critical practice. Additional positive features of the system noted by participants were trend-based alerts (a novel alarm feature) and use of the system to improve vigilance during long cases (in detecting subtle clinical changes).

Finally, we acknowledge the possibility that broadening the differential diagnoses may potentially result in a clinician performing an inappropriate high-risk procedure when the diagnosis is incorrect. However, we believe that the benefits of the expert system still outweigh its potential risks.

Limitations

The main limitation of our study was related to the use of an HPS. We had multiple failed scenarios due to technical issues with the HPS. We were required to adapt rule thresholds to fit the clinical performance of the HPS, while showing realistic vital sign values. Hypervigilance of the study participants undoubtedly affected response time and behavior in the simulation environment.

As discussed above, the scenario selection may have limited the ability to detect a difference between the 2 arms of the study. Our results indicate that the strength of the expert system is in detecting slow changes that could not be detected in the brief clinical scenarios. A larger selection of longer scenarios would have provided a more comprehensive test of the expert system. Based on subject feedback, the Anesthetic Vapor Overdose scenario in particular could have benefitted from 10 minutes of boredom before changes started to happen. However, this was not possible, as simulator time and physician availability were a limited resource.

We showed an improvement in detection time only for the Endotracheal Tube Cuff Leak scenario, rather than for all scenarios as we had initially hypothesized. The improvement observed in that scenario therefore cannot be generalized to our other scenarios, or to multivariable alarms in general. Further investigation is needed here.

Another limitation was that scenarios and rules as implemented in this study were validated using authors of this manuscript instead of external reviewers. We sought to limit this by having 2 of the 3 anesthesiologists not participate in the creation of the scenarios and using rules based on a previous publication as well as a Master’s thesis.

The selection of the correct differential diagnosis for each scenario may have limited the validity of the study as alternative suggestions that may have been clinically reasonable did not stop the detection timer. For example, in the Anesthetic Vapor Overdose scenario, an overdose of other drugs or hypovolemia could be potential sources of the hypotension, whereas for the Tension Pneumothorax scenario, an obstructed ETT or severe bronchospasm could have caused similar increases in airway resistance and peak pressures. However, we carefully crafted the scenarios to allow the participants to quickly eliminate these alternative diagnoses, for instance by indicating a rash for Anaphylaxis, a complete absence of breath sounds for Tension Pneumothorax, an absence of current bleeding, and the fact that the patient was kept purposefully “light” in the handover for the Anesthetic Vapor Overdose scenario; and by using spirometry in the Endotracheal Tube Cuff Leak scenario to exclude any ventilation problems upstream.

Forcing participants to administer an overdose of isoflurane was extremely challenging. Participants administered fast-acting drugs like opioids and propofol in response to indications of a lighter plane of anesthesia before increasing the dose of isoflurane. The information given at the time of case handover and the lack of recent experience with the older drug, isoflurane, may also have confounded responses. In many of the scenarios, the anesthesia assistant was required to surreptitiously increase the concentration of isoflurane. Finally, it would have been preferable to define t0 as the time at which the end-tidal isoflurane concentration increased above 1.1% and MAP started to decrease, instead of when the vaporizer concentration was increased. However, this was unfeasible due to the large absorption of isoflurane by the plastic lung of the mannequin and might have prevented some participants from ever reaching the MAP decrease-triggering concentration as they might have turned the vaporizer dial down during the time it took for the isoflurane vapor concentration to increase to the triggering threshold.
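The alternative definition of t0 described above can be sketched as a simple search over sampled trend data. This is an illustration only: the function name, the sampling scheme, and the use of "MAP lower than at the previous sample" as the criterion for "MAP started to decrease" are assumptions made here and were not part of the study protocol.

```python
from typing import Optional, Sequence


def find_t0(
    times_s: Sequence[float],
    et_iso_pct: Sequence[float],
    map_mmhg: Sequence[float],
    iso_threshold_pct: float = 1.1,
) -> Optional[float]:
    """Return the first sample time at which end-tidal isoflurane exceeds the
    threshold AND MAP is lower than at the previous sample; None if never.

    ASSUMPTION: "MAP started to decrease" is approximated by a drop relative
    to the immediately preceding sample.
    """
    for i in range(1, len(times_s)):
        above_threshold = et_iso_pct[i] > iso_threshold_pct
        map_falling = map_mmhg[i] < map_mmhg[i - 1]
        if above_threshold and map_falling:
            return times_s[i]
    return None


# Hypothetical trend data sampled every 30 s (values invented for illustration)
times = [0, 30, 60, 90, 120]
et_iso = [0.6, 0.9, 1.0, 1.2, 1.3]
mean_ap = [87, 87, 87, 80, 70]

t0 = find_t0(times, et_iso, mean_ap)
print(t0)
```

With such a definition, a participant who turned the vaporizer dial down before the threshold was reached would yield no t0 at all, which is the practical problem noted above.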

The performance of the decision support in this study and in the clinical environment is highly dependent on the rule design and threshold selection. Optimization of the system within a clinical setting will be required before widespread adoption. This task will require considerable effort.

Conclusion

When using the iAssist expert system, clinically important and statistically significant decreases in time to detection and time to treatment were observed for the Endotracheal Tube Cuff Leak scenario. The observed differences in the other 3 scenarios (Anesthetic Vapor Overdose, Tension Pneumothorax, and Anaphylaxis) were much smaller and not statistically significant. Participants found the expert system usable and appreciated the support for their suspected diagnosis; the system encouraged them to perform invasive treatments, and potentially prevented them from falling victim to task fixation. Many participants expressed their desire to have such a system available during their routine clinical activities.

Future work includes refinement of rules using real-time feedback obtained by evaluating the expert system in the OR. Additionally, the intensive care unit may be even more suited to deployment of an expert system as vital sign changes tend to occur over longer periods and trend detection of ventilatory variables can readily highlight the power of combining the single variable rules of the system.

APPENDIX A: SCENARIOS

Anesthetic Vapor Overdose

Baseline (as the participant enters the room): The simulation mannequin is intubated with a 7.5-mm internal diameter endotracheal tube (ETT), mechanically ventilated with a tidal volume of 400 mL at a rate of 12 min−1, and has the following vital signs: heart rate (HR) 59 min−1, blood pressure (BP) 120/39 (63) mm Hg, blood oxygen saturation (SpO2) 99%. The isoflurane vaporizer is set to 0.6% at 3 L·min−1 of fresh gas flow with a fraction of inspired oxygen (FIO2) of 0.4.

Once the handover has been completed, the surgeon starts the mastectomy, which, given the low depth of anesthesia, changes the vital signs to: HR 66 min−1, BP 138/60 (87) mm Hg. When the participant increases the isoflurane vapor concentration so that the inspired isoflurane concentration reaches 1.1%, the patient’s vital signs deteriorate to: HR 60 min−1, BP 82/38 (53) mm Hg.

Tension Pneumothorax

Baseline (as the participant enters the room): The simulation mannequin is intubated and ventilated as above and has the following vital signs: HR 75 min−1, BP 139/74 (96) mm Hg, SpO2 99%. The peak airway pressure is 16 cm H2O, the lung compliance is 58 mL·cm H2O−1, and the airway resistance is 18 cm H2O·L−1·s−1. The isoflurane vaporizer is set to 1.2%. After the handover is complete and 2 minutes have elapsed, the simulator transitions into the disease state. Without intervention, the vital sign values deteriorate over the course of 60 to 90 seconds to: HR 130 min−1, BP 76/39 (52) mm Hg, SpO2 83%. The peak airway pressure reaches 38 cm H2O, the compliance is 14 mL·cm H2O−1, and the airway resistance is 38 cm H2O·L−1·s−1. Breath sounds on the right side are absent.

Anaphylaxis

Baseline (as the participant enters the room): The simulation mannequin is intubated with a 7.5-mm ETT, mechanically ventilated with a tidal volume of 500 mL at a rate of 12 min−1 and has the following vital sign values: HR 76 min−1, BP 140/73 (96) mm Hg, SpO2 98%. The peak airway pressure is 17 cm H2O, the lung compliance is 60 mL·cm H2O−1, and the airway resistance is 20 cm H2O·L−1·s−1. The isoflurane vaporizer is set to 0.9%. After the handover is complete and 2 minutes have elapsed, the simulator transitions into the disease state. Without intervention, the vital signs deteriorate over the course of 60 to 90 seconds to: HR 127 min−1, BP 74/41 (53) mm Hg, SpO2 86%. The peak airway pressure reaches 53 cm H2O, the compliance is 8 mL·cm H2O−1, and the airway resistance is 77 cm H2O·L−1·s−1. This causes the expired tidal volume to drop to approximately 220 mL. After treatment with epinephrine, the simulated vital signs quickly return to: HR 120 min−1, BP 120/70 (87) mm Hg, SpO2 99%. The peak airway pressure improves to 18 cm H2O, the compliance to approximately 50 mL·cm H2O−1, and the airway resistance to 18 cm H2O·L−1·s−1.

Endotracheal Tube Cuff Leak

Baseline (as the participant enters the room): The simulation mannequin is intubated with a 7.5-mm ETT, mechanically ventilated with a tidal volume of 600 mL at a rate of 12 min−1 and has the following vital signs: HR 60 min−1, BP 121/46 (69) mm Hg, SpO2 99%. The peak airway pressure is 14 cm H2O, the lung compliance is 75 mL·cm H2O−1, and the airway resistance is 8 cm H2O·L−1·s−1. The isoflurane vaporizer is set to 1.1% at 3 L·min−1 of flow. The baseline leak (TVinsp − TVexp), given an ETT cuff volume of 20 mL, is 90 to 110 mL. After the handover is complete, the actor playing the dentist uses a syringe to withdraw 8 mL of air from the ETT cuff. This produces a hardly noticeable change in the CO2 waveform without changes to the baseline leak amount. Two minutes after the handover, the dentist removes another 1 mL of air from the ETT cuff. This causes the CO2 waveform to become slightly narrower and increases the leak to approximately 200 mL. Four minutes after the handover, the dentist removes another 1.5 mL of air from the ETT cuff. This causes the capnogram amplitude to become very narrow, the leak to increase to approximately 300 mL, and, at unchanged fresh gas flow rates, a failure of the bellows to fill normally.
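To illustrate how a trend-based rule could pick up the leak progression described in this scenario, the sketch below computes the per-breath leak as TVinsp − TVexp and flags the first breath at which the leak has stayed well above baseline for several consecutive breaths. It is not the iAssist rule: the function names, the 75 mL rise, and the 3-breath persistence requirement are hypothetical values chosen only for illustration.

```python
from typing import Optional, Sequence


def leak_ml(tv_insp_ml: float, tv_exp_ml: float) -> float:
    """Per-breath leak: inspired minus expired tidal volume (mL)."""
    return tv_insp_ml - tv_exp_ml


def first_leak_alert(
    leaks_ml: Sequence[float],
    baseline_ml: float,
    rise_ml: float = 75.0,   # ASSUMPTION: illustrative threshold, not from the study
    consecutive: int = 3,    # ASSUMPTION: illustrative persistence requirement
) -> Optional[int]:
    """Return the index of the breath at which the leak has exceeded
    baseline by at least `rise_ml` for `consecutive` breaths in a row,
    or None if that never happens."""
    run = 0
    for i, leak in enumerate(leaks_ml):
        if leak - baseline_ml >= rise_ml:
            run += 1
            if run >= consecutive:
                return i
        else:
            run = 0  # a single normal breath resets the count
    return None


# Hypothetical breath series loosely following the scenario: a baseline of
# about 100 mL, then roughly 200 mL, then roughly 300 mL of leak.
leaks = [100, 105, 95, 200, 205, 198, 300]
alert_index = first_leak_alert(leaks, baseline_ml=100)
print(alert_index)
```

Requiring several consecutive breaths above the threshold trades a short detection delay for fewer alerts caused by single-breath artifacts.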

DISCLOSURES

Name: Matthias Görges, PhD.

Contribution: This author helped in study design, conduct of study, data analysis, and manuscript preparation.

Attestation: Matthias Görges has seen the original study data, reviewed the analysis of the data, and approved the final manuscript.

Conflict of Interest: Matthias Görges was holding a Government of Canada Postdoctoral Research Fellowship during the study period and is now holding a Canadian Institutes of Health Research postdoctoral fellowship.

Name: Pamela Winton, MBChB, BSc Med Sci (hons), FRCA.

Contribution: This author helped in study design, conduct of study, data analysis, and manuscript preparation.

Attestation: Pamela Winton has seen the original study data and approved the final manuscript.

Conflict of Interest: The author has no conflicts of interest to declare.

Name: Valentyna Koval, MD.

Contribution: This author helped in study design and conduct of study.

Attestation: Valentyna Koval approved the final manuscript.

Conflict of Interest: The author has no conflicts of interest to declare.

Name: Joanne Lim, MASc.

Contribution: This author helped in study design, conduct of study, and manuscript review.

Attestation: Joanne Lim approved the final manuscript.

Conflict of Interest: The author has no conflicts of interest to declare.

Name: Jonathan Stinson, BSc.

Contribution: This author helped in conduct of study and manuscript review.

Attestation: Jonathan Stinson approved the final manuscript.

Conflict of Interest: The author has no conflicts of interest to declare.

Name: Peter T. Choi, MD, MSc (Epid), FRCPC.

Contribution: This author helped in study design, data analysis, and manuscript review.

Attestation: Peter T. Choi reviewed the analysis of the data and approved the final manuscript.

Conflict of Interest: The author has no conflicts of interest to declare.

Name: Stephan K. W. Schwarz, MD, PhD, FRCPC.

Contribution: This author helped in study design and manuscript review.

Attestation: Stephan K. W. Schwarz approved the final manuscript.

Conflict of Interest: The author has no conflicts of interest to declare.

Name: Guy A. Dumont, PhD, PEng.

Contribution: This author helped in study design and manuscript review.

Attestation: Guy A. Dumont approved the final manuscript.

Conflict of Interest: The author has no conflicts of interest to declare.

Name: J. Mark Ansermino, MBBCh, MSc (Inf), FFA (SA), FRCPC.

Contribution: This author helped in study design, conduct of study, data analysis, and manuscript preparation.

Attestation: J. Mark Ansermino has seen the original study data, reviewed the analysis of the data, approved the final manuscript, and is the author responsible for archiving the study files.

Conflict of Interest: J. Mark Ansermino was principal investigator on a research grant from the Canadian Institutes of Health Research, which funded this project.

This manuscript was handled by: Dwayne R. Westenskow, PhD.

ACKNOWLEDGMENTS

The authors would like to thank all participating anesthesiologists for their time as well as Dr. Penny Brasher for statistical advice.

FOOTNOTES

a As pointed out by one of the reviewers this calculation is incorrect as it assumed only 4 independent variables when in fact, due to the categorical nature of some of the variables, there were more than 4. In addition, the calculation was based on the effect size of all 4 variables considered together when in fact we were interested in only 1—decision support.

b http://www.chfg.org/wp-content/uploads/2010/11/ElaineBromileyAnonymousReport.pdf. Accessed May 17, 2013.

c Department of Health. Protecting the breathing circuit in anaesthesia: report to the chief medical officer of an expert group on blocked anaesthetic tubing. Available at: http://www.frca.co.uk/documents/Protecting%20the%20PBC.pdf (p. 42). Accessed May 17, 2013.

REFERENCES

1. Lawless ST. Crying wolf: false alarms in a pediatric intensive care unit. Crit Care Med. 1994;22:981–5
2. Borowski M, Görges M, Fried R, Such O, Wrede C, Imhoff M. Medical device alarms. Biomed Tech (Berl). 2011;56:73–83
3. Block FE Jr, Nuutinen L, Ballast B. Optimization of alarms: a study on alarm limits, alarm sounds, and false alarms, intended to reduce annoyance. J Clin Monit Comput. 1999;15:75–83
4. Webb RK, Currie M, Morgan CA, Williamson JA, Mackay P, Russell WJ, Runciman WB. The Australian Incident Monitoring Study: an analysis of 2000 incident reports. Anaesth Intensive Care. 1993;21:520–8
5. Korniewicz DM, Clark T, David Y. A national online survey on the effectiveness of clinical alarms. Am J Crit Care. 2008;17:36–41
6. Imhoff M, Kuhls S. Alarm algorithms in critical care monitoring. Anesth Analg. 2006;102:1525–37
7. Saeed M, Mark RG. Multiparameter trend monitoring and intelligent displays using wavelet analysis. Comput Cardiol. 2000;27:797–800
8. Simons DJ, Ambinder MS. Change blindness. Theory and consequences. Curr Dir Psychol Sci. 2005;14:44–8
9. Tappan JM, Daniels J, Slavin B, Lim J, Brant R, Ansermino JM. Visual cueing with context relevant information for reducing change blindness. J Clin Monit Comput. 2009;23:223–32
10. Sanborn KV, Castro J, Kuroda M, Thys DM. Detection of intraoperative incidents by electronic scanning of computerized anesthesia records. Comparison with voluntary reporting. Anesthesiology. 1996;85:977–87
11. Westenskow DR, Orr JA, Simon FH, Bender HJ, Frankenberger H. Intelligent alarms reduce anesthesiologist’s response time to critical faults. Anesthesiology. 1992;77:1074–9
12. Gaba DM, Howard SK, Small SD. Situation awareness in anesthesiology. Hum Factors. 1995;37:20–31
13. Ansermino JM, Daniels JP, Hewgill RT, Lim J, Yang P, Brouse CJ, Dumont GA, Bowering JB. An evaluation of a novel software tool for detecting changes in physiological monitoring. Anesth Analg. 2009;108:873–80
14. Brouse C, Dumont G, Yang P, Lim J, Ansermino JM. iAssist: a software framework for intelligent patient monitoring. Conf Proc IEEE Eng Med Biol Soc. 2007;2007:3790–3
15. Dosani M, Lim J, Yang P, Brouse C, Daniels J, Dumont G, Ansermino JM. Clinical evaluation of algorithms for context-sensitive physiological monitoring in children. Br J Anaesth. 2009;102:686–91
16. Dunsmuir D, Daniels J, Brouse C, Ford S, Ansermino JM. A knowledge authoring tool for clinical decision support. J Clin Monit Comput. 2008;22:189–98
17. Ansermino JM, Dosani M, Amari E, Choi PT, Schwarz SK. Defining rules for the identification of critical ventilatory events. Can J Anaesth. 2008;55:702–14
18. Görges M. Development and Evaluation of an Event Recognition Alarm System Using a High Fidelity Patient Simulator [MSc thesis]. Hamburg, Germany: HAW-Hamburg; 2005
19. Hart SG, Staveland LE. Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In: Hancock P, Meshkati N, eds. Human Mental Workload. Amsterdam, the Netherlands: North Holland Press; 1988:139–83
20. Lewis JR. IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use. Int J Hum-Comput Int. 1995;7:57–78
21. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988:590
© 2013 International Anesthesia Research Society