As in aviation, a large majority of human errors in medicine are associated with nontechnical skills (NTS), the behavioral issues often referred to as human factors.1 Behavioral markers of NTS are observable aspects of individual and team performance, described by terms such as “teamwork,” “communication,” and “situational awareness.” There is evidence in both aviation2 and medicine3 that NTS directly relate to positive or negative outcomes. A number of tools have been developed to assess NTS, including the Anesthetists' Non-Technical Skills (ANTS) system,4 the Non-Technical Skills for Surgeons,5 the Ottawa Crisis Resource Management Global Rating Scale,6 and the Assessment of Obstetric Team Performance.7 These tools can be used to provide formative feedback to trainees and to assess the performance of practicing physicians.
The use of high-fidelity simulation for research and training in medicine led to formal courses in Crisis Resource Management, in which realistically simulated crises followed by debriefing are used to teach NTS.8,9 Although a number of studies suggest that simulation may improve NTS performance,10–13 most of this work has involved residents, and no published evidence demonstrates that such interventions improve the performance of practicing anesthesiologists.
The purpose of this study was to determine whether a high-fidelity simulation educational debriefing session improved the NTS of practicing anesthesiologists in the management of simulated anesthetic scenarios. We chose to use the ANTS tool for this study as it was specifically developed to assess the NTS of anesthesiologists.
In a previous study at the Canadian Simulation Centre for Human Performance and Crisis Management Training, 67 practicing anesthesiologists each managed a 45-minute standardized anesthetic case using high-fidelity simulation.14 Each participant returned 5 to 9 months later and managed a second scenario. Before the study began, subjects were randomly assigned to either group A (control; no educational intervention) or group B (simulation debriefing). Subjects were also randomly assigned to receive either scenario 1 or scenario 2 as the pretest scenario. Anesthesiologists in the control group received no feedback after their first session and were dismissed, but they were offered faculty-facilitated debriefing after their second session. Subjects in the debriefing group received a 45- to 60-minute Anesthesia Crisis Resource Management (ACRM)-guided debriefing that used the videotape of their performance as a template for discussion. All performances were videotaped. The purpose of that study was to determine whether simulation debriefing improved the technical performance of practicing anesthesiologists. Fifty-nine subjects completed both sessions, and videotaped footage was available for all of them.
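The 2 × 2 allocation described above (control vs debriefing, crossed with which scenario serves as the pretest) can be sketched as follows. This is a minimal illustration that assumes simple, independent randomization of each subject; the source does not say whether blocking or stratification was used, and the subject IDs, the seed, and the `allocate` function are hypothetical.

```python
import random
from collections import Counter

GROUPS = ("A: control", "B: debriefing")
SCENARIO_ORDERS = ("Sc1-Sc2", "Sc2-Sc1")  # which scenario serves as the pretest


def allocate(subject_ids, seed=None):
    """Randomize each subject independently to a group and a scenario order.

    Simple (unblocked) randomization is an assumption made for illustration.
    """
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    return {
        sid: {"group": rng.choice(GROUPS), "order": rng.choice(SCENARIO_ORDERS)}
        for sid in subject_ids
    }


if __name__ == "__main__":
    allocation = allocate([f"S{i:02d}" for i in range(1, 68)], seed=42)  # 67 subjects
    # Count how many subjects landed in each group x order cell.
    print(Counter((a["group"], a["order"]) for a in allocation.values()))
```

With simple randomization the four cells can end up unequal in a sample of this size; a permuted-block scheme is a common alternative when balance matters.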
After Research Ethics Board approval, two independent raters reviewed the archived videotapes of both scenarios for these 59 subjects and scored NTS using the ANTS system.
Two scenarios, developed using the Delphi technique and tested for equivalence of difficulty in a previous study, were used to assess performance.15 Scenario 1 involved an 80-year-old woman with significant coronary and peripheral vascular disease undergoing laparotomy for bowel resection. Two independent events occurred during the case, with time for recovery between them: first, peritoneal traction led to bradycardia and hypotension; later, significant blood loss led to hypotension, myocardial ischemia, and ultimately pulseless electrical activity. Scenario 2 involved a 52-year-old obese male smoker undergoing laparoscopic cholecystectomy. In this case, excessive intra-abdominal insufflation pressure led to high airway pressure and rising CO2. Later in the case, the patient began to desaturate secondary to a tension pneumothorax, followed by ventricular fibrillation.
Study subjects were expected to review the patient's chart, interview the patient, and prepare the operating room and equipment according to their usual practice, after which they began induction of anesthesia. Actors portrayed the scripted roles of surgeon, scrub nurse, circulating nurse, and respiratory therapist. A second anesthesiologist was available as backup if requested.
Debriefing was conducted by one of three faculty members who were experts in simulation debriefing, and the debriefing session was standardized for this study. The debriefing began with a PowerPoint presentation reviewing the types of human error that can occur during the medical management of a clinical situation: knowledge-based, skill-based, rule-based, technical, and latent errors. Subjects were then shown the videotape of their performance, which was stopped at three to four points predetermined by the facilitator for a nonjudgmental discussion of the subject's management of the situation, focusing on these error categories. The facilitator chose the debriefing points based on the participant's live performance; these may have included technical skills, such as advanced cardiac life support (ACLS) protocols, and NTS, such as communication and task delegation. The facilitator also reviewed with the participant strategies for mitigating errors related to the specific NTS issues observed.
ANTS Video Review
The ANTS system is a previously developed behavioral marker system that assesses four categories of behavior: (1) task management, (2) team working, (3) situation awareness, and (4) decision making. Each category contains three to five elements, for a total of 15 skill elements. Each element is rated on a 4-point scale with anchored descriptors, from 1 = poor (endangered patient safety) to 4 = good (outstanding, an example for others).
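The category/element hierarchy and the rating scale can be represented as a small data structure. The element names below are taken from the published ANTS handbook (Fletcher et al16), not from this paper, as are the labels for ratings 2 and 3 ("marginal," "acceptable"); treat them as a reference sketch to check against that source. Note that team working carries five elements, which is how the four categories reach 15. `check_rating` is a hypothetical helper.

```python
# Category -> element names of the ANTS behavioural marker system
# (names reproduced from the ANTS handbook; verify against Fletcher et al).
ANTS_ELEMENTS = {
    "task management": [
        "planning and preparing",
        "prioritising",
        "providing and maintaining standards",
        "identifying and utilising resources",
    ],
    "team working": [
        "co-ordinating activities with team members",
        "exchanging information",
        "using authority and assertiveness",
        "assessing capabilities",
        "supporting others",
    ],
    "situation awareness": [
        "gathering information",
        "recognising and understanding",
        "anticipating",
    ],
    "decision making": [
        "identifying options",
        "balancing risks and selecting options",
        "re-evaluating",
    ],
}

# 4-point scale; labels for 2 and 3 follow the ANTS handbook, not this paper.
RATING_LABELS = {1: "poor", 2: "marginal", 3: "acceptable", 4: "good"}


def check_rating(value):
    """Return the label for a valid ANTS rating, or raise ValueError."""
    if value not in RATING_LABELS:
        raise ValueError(f"ANTS ratings must be 1-4, got {value!r}")
    return RATING_LABELS[value]


if __name__ == "__main__":
    total = sum(len(elements) for elements in ANTS_ELEMENTS.values())
    print(f"{len(ANTS_ELEMENTS)} categories, {total} elements")
    print(check_rating(4))
```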
Evaluation of the behavioral marker system has demonstrated that its skills are well defined and observable and can be rated with acceptable levels of agreement. Internal consistency was also acceptable, and validity was judged satisfactory16 (Appendix).
Two video reviewers (one anesthesiologist and one anesthesia assistant) were recruited to evaluate the videotapes of all 59 subjects using the ANTS system. The reviewers attended a workshop in which the tool was explained and they worked with the investigators (P.J.M. and S.B.) to trial the ANTS on a sample of videotaped performances. After the workshop, the reviewers independently viewed sample videotapes and underwent further calibration training until the intraclass correlation coefficient (ICC) between their scores and the “expert” scores exceeded 0.9, after which they began assessing the study subjects' videotapes.
For the review, each scenario was divided into two major sections. The first section (precritical event) began when the anesthesiologist entered the operating room and ended when the first critical event occurred. The second section (postcritical event) began with the first critical event and ended approximately 5 minutes after cardiac arrest. Reviewers provided separate ANTS scores for the two sections.
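The sectioning rule above can be expressed as a small classifier over video time. This is a sketch under stated assumptions: times are in seconds from the moment the anesthesiologist enters the operating room, the "approximately 5 minutes" is taken as exactly 300 seconds, and `section_of` and the example timings are hypothetical.

```python
def section_of(t, first_event, arrest, tail=300):
    """Assign a time point (seconds from OR entry) to an ANTS review section.

    first_event: onset of the first critical event; arrest: onset of cardiac
    arrest; tail: review window after arrest (about 5 minutes = 300 s).
    Returns "precritical", "postcritical", or None if outside the reviewed span.
    """
    if not 0 <= first_event <= arrest:
        raise ValueError("expected 0 <= first_event <= arrest")
    if t < 0 or t > arrest + tail:
        return None
    return "precritical" if t < first_event else "postcritical"


if __name__ == "__main__":
    # Hypothetical timings: first critical event at 20 min, arrest at 38 min.
    for minute in (5, 30, 44):
        print(minute, section_of(minute * 60, 20 * 60, 38 * 60))
```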
Videotaped scenarios were presented to the reviewers in random order, so that reviewers did not know whether they were viewing a subject's first or second session or whether the subject had received debriefing; reviewers were also blinded to the subjects' names and institutional affiliations.
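One way to implement this blinding is to shuffle the tapes and relabel them with opaque codes, keeping the decoding key away from the reviewers. The paper does not describe how the random order was generated; the code format (`V001`, ...) and the `blinded_playlist` function are hypothetical.

```python
import random


def blinded_playlist(tapes, seed=None):
    """Shuffle tapes and relabel them with opaque review codes.

    `tapes` is a list of (subject_id, session) tuples. Reviewers receive only
    the codes, in order; the key mapping each code back to subject and session
    stays with the investigators.
    """
    rng = random.Random(seed)
    shuffled = list(tapes)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    key = {f"V{i:03d}": tape for i, tape in enumerate(shuffled, start=1)}
    return list(key), key


if __name__ == "__main__":
    tapes = [(f"S{i:02d}", session) for i in range(1, 59) for session in (1, 2)]
    playlist, key = blinded_playlist(tapes, seed=7)
    print(playlist[:3], "...", len(playlist), "tapes")
```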
Data were analyzed using SPSS (version 17.0). A total ANTS score was obtained by summing the four category scores (minimum 4, maximum 16), and the two raters' scores were averaged. These scores were analyzed parametrically using a 2 × 2 × 2 × 2 × 4 mixed-design analysis of variance, with group (debrief or no debrief) and scenario order (Sc1-Sc2 or Sc2-Sc1) as between-subject variables and session (1 and 2), section (precritical event and postcritical event), and item (1, 2, 3, and 4) as repeated measures. A P value of <0.05 was considered significant. Analyzing global ratings parametrically as continuous data (when possible) has become conventional in the educational literature because it is more powerful than nonparametric analysis.17 Session refers to whether it was the subject's first (session 1) or second (session 2) time in the simulator. Scenario order refers to the order in which subjects participated in the scenarios (Sc1-Sc2 or Sc2-Sc1). Debrief refers to whether the participant received a debriefing after the first session. Item refers to the four global categories of the ANTS: 1 = task management, 2 = team working, 3 = situation awareness, and 4 = decision making. Section refers to either the precritical event or the postcritical event section of the scenarios.
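The scoring rule (sum the four category scores, then average across the two raters) is simple enough to state as code. This sketch assumes one set of four category scores per rater for each section of each tape; `total_ants` and `averaged_total` are hypothetical helpers, and the ratings in the usage line are invented.

```python
from statistics import mean


def total_ants(category_scores):
    """Sum four ANTS category scores (each 1-4) into a total between 4 and 16."""
    if len(category_scores) != 4:
        raise ValueError("expected exactly four category scores")
    if any(not 1 <= s <= 4 for s in category_scores):
        raise ValueError("each category score must lie between 1 and 4")
    return sum(category_scores)


def averaged_total(rater_a, rater_b):
    """Average the two raters' total scores for one section of one tape."""
    return mean([total_ants(rater_a), total_ants(rater_b)])


if __name__ == "__main__":
    # Hypothetical ratings in the order task management, team working,
    # situation awareness, decision making.
    print(averaged_total([3, 3, 3, 3], [2, 3, 3, 3]))
```

Because summation and averaging are both linear, averaging the raters first and then summing categories gives the same result.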
Interrater reliability of the video reviewers for both category and element data was evaluated using ICC.
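The paper does not state which ICC form was used. A common choice when the same raters score every tape and absolute agreement matters is ICC(2,1): two-way random effects, absolute agreement, single rater, as defined by Shrout and Fleiss. The pure-Python sketch below computes it from the two-way ANOVA mean squares; the function name and the row-per-tape layout are assumptions, and SPSS offers this and other ICC forms directly.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per videotape, each holding one score
    per rater (all rows the same length).
    """
    n = len(ratings)     # targets (videotapes)
    k = len(ratings[0])  # raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Partition the total sum of squares into targets, raters, and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Six targets rated by four judges (the worked example of Shrout & Fleiss).
    example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
               [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_2_1(example), 2))
```

On that example the function returns about 0.29, the published ICC(2,1) value; with perfect agreement between raters it returns 1.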
Data from one videotape could not be used because of technical difficulties, leaving 58 complete data sets. Demographic data are listed in Table 1.
The interrater reliability (ICC) for the total ANTS score was 0.44 (P < 0.001).
Overall, there was a main effect of session: ANTS scores improved by approximately 5% from session 1 to session 2 (2.81 to 2.94, P < 0.01). That is, regardless of group assignment, subjects' scores improved between sessions 1 and 2. There was no significant difference between scores in the precritical event and postcritical event periods (2.87 vs. 2.88, P = 0.71), and there was no main effect of scenario order (2.81 vs. 2.91, P = 0.44), indicating that the scenario used in the first session did not influence the results.
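The "approximately 5%" figure is the relative change in the mean ANTS score between sessions, worked here from the reported means:

```latex
\frac{\bar{x}_{\text{session 2}} - \bar{x}_{\text{session 1}}}{\bar{x}_{\text{session 1}}}
  = \frac{2.94 - 2.81}{2.81} = \frac{0.13}{2.81} \approx 0.046
```

That is about 4.6%, or an absolute gain of 0.13 points on the 1 to 4 rating scale.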
There was no main effect of debriefing and no significant debrief × session interaction. However, there was a significant debrief × session × item interaction (F(3,153) = 2.84, mean squared error (MSE) = 0.087, P < 0.05). Post hoc analyses using paired and independent sample t tests revealed that neither group improved from session 1 to session 2 in category 1 (task management) or category 2 (team working). In category 3 (situation awareness), the debrief group improved but the no debrief group did not. In category 4 (decision making), both groups improved between sessions (Fig. 1).
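The two kinds of post hoc test serve different comparisons: a paired t test compares session 1 with session 2 within a group, and an independent t test compares the two groups with each other. The sketch below computes each t statistic and its degrees of freedom in pure Python. It assumes the classic pooled-variance form for the independent test; the paper does not say whether pooled or Welch variances, or any multiple-comparison correction, were used. P values would additionally require the t distribution (for example via SciPy's `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind`).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and df: same subjects, session 1 vs session 2."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def independent_t(x, y):
    """Student's t with pooled variance (debrief vs no debrief) and df."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


if __name__ == "__main__":
    # Hypothetical situation-awareness scores for four subjects in one group.
    session1 = [2.75, 3.0, 2.5, 2.75]
    session2 = [3.0, 3.25, 2.75, 3.25]
    print(paired_t(session1, session2))
```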
Training in error management and NTS is required for healthcare professionals to optimize patient safety.18 There is evidence that good NTS can lead to positive outcomes for both the patient and the team3,19–21 and that adverse events in surgery are primarily caused by failures in perception, judgment, communication, and teamwork.22–24 de Leval et al22 demonstrated that, even when major breakdowns in human factors occur during cardiac procedures, recognition and compensation can prevent catastrophic events. ACRM debriefing addresses participants' NTS after they manage a simulated scenario, with the aim of facilitating change and heightening awareness of communication and teamwork issues.
This study was designed to represent the type of continuing medical education program that we believed a practicing anesthesiologist might attend if it were offered for maintenance of certification credits. Because the anesthesiologist would most likely have to attend the session after hours, commute to a simulation center, take part in the simulation, and receive debriefing, we judged that a 45-minute simulation followed by a 45- to 60-minute ACRM-guided debriefing would be a reasonable intervention; it was similar in length to, or longer than, many simulation educational interventions published in the literature. Because we considered it unlikely that a practicing anesthesiologist would attend a simulation session more than once a year, we chose to assess the efficacy of the intervention 5 to 9 months after the first session.
In our original study,14 we demonstrated a statistically significant improvement in the technical performance of anesthesiologists 5 to 9 months after the first session in the group that received the debriefing intervention. When we began that study, there were no valid, reliable tools to assess NTS. Since then, the ANTS scoring system has been developed and validated, and we therefore chose it to assess our subjects' NTS. There was also evidence suggesting that the ANTS tool was a sensitive measure, because studies with residents had demonstrated improvement in ANTS scores after simulation-based education.11–13
Unfortunately, our findings did not mirror those of the studies with anesthesia residents. Our results demonstrate a marginal improvement in ANTS scores over time and no effect of debriefing except in the situation awareness category of the ANTS tool. In the study by Yee et al,13 the participants were residents who worked in teams of three, received a total of nine debriefings over the course of the study, and were assessed at intervals of only 1 month. In the study by Savoldelli et al,11 residents managed an 8-minute simulation, and their posttest NTS assessments were done on the same day as the pretest. Their baseline ANTS scores were also lower than ours (2.3 among their residents vs. 2.81 among our experienced practitioners), so the lower baseline NTS of relatively inexperienced residents may simply have left more room for improvement. The improvement in the residents' ANTS scores may also reflect increased familiarity with the simulation environment. Similar to our findings, Müller et al10 demonstrated improvement in ANTS scores in two groups of residents but found no difference between the group that received Crisis Resource Management NTS training and the group that received clinical training in seminars; their posttest assessment was done on the same day as the pretest and intervention.
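The "more room for improvement" argument can be made explicit. Because ANTS ratings top out at 4, the largest possible gain from baseline is 4 minus the baseline mean:

```latex
4 - 2.30 = 1.70 \quad \text{(residents)}, \qquad
4 - 2.81 = 1.19 \quad \text{(practicing anesthesiologists)}, \qquad
\frac{1.70}{1.19} \approx 1.43
```

On this rough view the residents had about 43% more headroom. It is only a back-of-the-envelope comparison, since it treats baseline means from different scenarios and raters as directly comparable.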
The interrater reliability of our two raters was quite low despite an initially high correlation after training. Before the study began, we demonstrated Pearson correlation coefficients (r) > 0.9 between our raters and two experts, but the interrater reliability (IRR) for the study sessions was only 0.44. The initial training was intensive: in two workshops, the raters and the experts watched videotaped performances together and discussed them in depth until consensus was reached. Shortly afterward, the raters independently viewed other videos of simulation sessions; these rating sheets were collected and compared with the experts' ratings, and a high correlation was found. The raters then viewed the videotapes of the 59 subjects over the course of a year without any further feedback or discussion. Rater calibration with the ANTS tool may deteriorate over time, and raters may also have become less attentive given the many hours of viewing required to complete the rating scales.
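When comparing a calibration figure reported as a Pearson correlation with a study figure reported as an ICC, it helps to remember that the two statistics answer different questions. Pearson's r measures whether two raters' scores rise and fall together; it is unaffected by a constant offset, so a rater who is systematically more lenient than the expert can still reach r = 1. The sketch below uses invented ratings and a hypothetical `pearson` helper to show this; it is a general caveat, not an explanation of the drop observed in this study.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    expert = [1, 2, 2, 3, 1, 3, 2, 3]   # hypothetical expert ratings
    rater = [x + 1 for x in expert]     # rater who is always one point higher
    bias = mean(b - a for a, b in zip(expert, rater))
    print(f"Pearson r = {pearson(expert, rater):.2f}, mean bias = {bias:+.2f}")
```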
Interrater reliabilities of ANTS in other studies have been quite variable. In the study by Yee et al,13 the single-rater ICC was 0.53, with two-rater IRRs of 0.53 at the category level and 0.50 at the element level. Similarly, Savoldelli et al,11 who used two raters, reported a single-rater ICC of 0.58 at the category level. Fletcher et al16 reported IRRs ranging from 0.56 to 0.65 at the category level and from 0.55 to 0.65 at the element level. Welke et al12 reported a two-rater IRR of 0.62 at the category level. In a recent study in which 26 specialist anesthesiologists were trained to use the ANTS tool in an 8-hour program,25 the ICC for individual elements ranged from 0.11 to 0.62, and the authors' goal of an acceptable correlation (>0.7) was not achieved. A study of simulation-based training in the management of weaning from cardiopulmonary bypass by anesthesia residents reported an IRR of 0.95, the only study in the literature with an IRR > 0.7 using ANTS.26
This study did not demonstrate any appreciable improvement in the NTS of practicing anesthesiologists, as measured by the ANTS tool. A 45-minute simulation scenario followed by 45 to 60 minutes of debriefing may simply be insufficient to change the behavior of experienced anesthesiologists, who may already have acquired many NTS over years of practice, lessening the impact of simulation training. In addition, retesting at 5 to 9 months may be too long an interval for information to be retained. Despite our attempts to minimize the number of debriefers and to standardize the debriefing, the debriefing intervention may have been variable and may have contributed to the negative findings in this study. Finally, each participant's personality and receptivity to feedback and to changing practice may also have played a role in the lack of improvement. This study also raises the question of whether the ANTS tool is the best measure of NTS in practicing anesthesiologists.
The results of this study highlight the need for researchers to carefully consider the design of a study attempting to improve NTS of experienced practitioners and tools that will be used to measure these skills.
1. Fletcher GC, McGeorge P, Flin RH, Glavin RJ, Maran NJ. The role of non-technical skills in anaesthesia: a review of current literature. Br J Anaesth
2. Connelly P. A Resource Package for CRM Developers: Behavioral Markers of CRM Skill from Real World Case Studies and Accidents. Austin, TX: Aerospace Crew Research Project; 1997:1–62.
3. Carthey J, de Leval MR, Wright DJ, Farewell VT, Reason JT. Behavioural markers of surgical excellence. Saf Sci
4. Fletcher G, Flin R, McGeorge P, Glavin R, Maran N, Patey R. Rating non-technical skills: developing a behavioural marker system for use in anaesthesia. Cogn Tech Work
5. Yule S, Flin R, Maran N, Rowley D, Youngson G, Paterson-Brown S. Surgeon's non-technical skills in the operating room: reliability testing of the NOTSS behaviour rating system. World J Surg
6. Kim J, Neilipovitz D, Cardinal P, Chiu M, Clinch J. A pilot study using high-fidelity simulation to formally evaluate performance in the resuscitation of critically ill patients: The University of Ottawa Critical Care Medicine, High-Fidelity Simulation, and Crisis Resource Management I Study. Crit Care Med
7. Tregunno D, Pittini R, Haley M, Morgan P. The development and usability of a behavioral marking system for performance assessment of obstetrical teams. Qual Saf Health Care
8. Howard SK, Gaba DM, Fish KJ, Yang G, Sarnquist FH. Anesthesia crisis resource management training: teaching anesthesiologists to handle critical incidents. Aviat Space Environ Med
9. Gaba DM, DeAnda A. A comprehensive anesthesia simulation environment: re-creating the operating room for research and training. Anesthesiology
10. Müller MP, Hänsel M, Fichtner A, et al. Excellence in performance and stress reduction during two different full scale simulator training courses: a pilot study. Resuscitation
11. Savoldelli G, Naik VN, Park J, Joo HS, Chow R, Hamstra SJ. Value of debriefing during simulation crisis management: oral versus video-assisted oral feedback. Anesthesiology
12. Welke TM, LeBlanc VR, Savoldelli GL, et al. Personalized oral debriefing versus standardized multimedia instruction after patient crisis simulation. Anesth Analg
13. Yee B, Naik VN, Joo HS, et al. Nontechnical skills in anesthesia crisis management with repeated exposure to simulation-based education. Anesthesiology
14. Morgan PJ, Tarshis J, LeBlanc V, et al. Efficacy of high-fidelity simulation debriefing on the performance of practicing anaesthetists in simulated scenarios. Br J Anaesth
15. Morgan PJ, Lam-McCulloch J, Herold-McIlroy J, Tarshis J. Simulation performance checklist generation using the Delphi technique. Can J Anesth
16. Fletcher G, Flin R, McGeorge P, Glavin R, Maran N, Patey R. Anaesthetists' Non-Technical Skills (ANTS): evaluation of a behavioural marker system. Br J Anaesth
17. Cohen J. Statistical Power Analysis for the Behavioral Sciences. New York, NY: Academic Press; 1977.
18. Patey R. Identifying and assessing non-technical skills. Clin Teacher
19. Edmondson A. Speaking up in the operating room: how team leaders promote learning in interdisciplinary action teams. J Manag Stud
20. Healey AN, Undre S, Vincent CA. Developing observational measures of performance in surgical teams. Qual Saf Health Care 2004;13(suppl 1):i33–i40.
21. Moorthy K, Munz Y, Adams S, Pandey V, Darzi A. A human factors analysis of technical and team skills among surgical trainees during procedural simulations in a simulated operating theatre. Ann Surg
22. de Leval MR, Carthey J, Wright DJ, Farewell VT, Reason JT. Human factors and cardiac surgery: a multicenter study. J Thorac Cardiovasc Surg
23. Way LW, Stewart L, Gantert W, et al. Causes and prevention of laparoscopic bile duct injuries: analysis of 252 cases from a human factors and cognitive psychology perspective. Ann Surg
24. Gawande AA, Zinner MJ, Studdert DM, Brennan TA. Analysis of errors reported by surgeons at three teaching hospitals. Surgery
25. Graham J, Hocking G, Giles E. Anaesthesia non-technical skills: can anaesthetists be trained to reliably use this behavioural marker system in 1 day? Br J Anaesth
26. Bruppacher HR, Alam SK, LeBlanc VR, et al. Simulation-based training improves physicians' performance in patient care in high-stakes clinical setting of cardiac surgery. Anesthesiology