A variety of performance patterns were recognized for each of the events. In the first scenario, only half of the participants (21 of 42) recognized that the condition was anaphylaxis (heart rate [HR], 140 bpm; BP, 75/40 mm Hg; bronchospasm; O2 saturation, 85%) during the initial 3 min. After participants received a verbal prompt from the recovery room nurse indicating that the patient had a rash (3 min), most of them (38 of 42) were able to diagnose anaphylaxis. However, only 27 of 42 participants treated the condition with epinephrine.
The myocardial ischemia exercise required trainees to recognize that tachycardia (HR max, 130 bpm) and hypertension (BP max, 180/120) were associated with ST-segment elevation on the ECG (lead II ST-T wave elevation, 2.7 mm). The diagnosis of myocardial ischemia was established by 29 of the 42 participants during the exercise. Almost all of the senior residents (15 of 16) were able to identify the presence of myocardial ischemia, but fewer than half of the student nurse anesthetists (7 of 15) and CA-1 residents (5 of 12) were able to recognize it. Despite failing to recognize the diagnosis, the majority of trainees initiated therapy to treat the tachycardia and hypertension.
In the atelectasis scenario, participants were expected to recognize that a definitive step to improve oxygenation was either to suction or to provide larger tidal volumes. The intubated mannequin had an O2 saturation of 88%, decreased lung compliance, and reduced tidal volumes. Whereas more than half of the senior residents performed these actions (11 of 16), only 7 of 27 student nurse anesthetists and junior residents accomplished either of these definitive steps.
The stroke scenario required participants to recognize an intracerebral event in a postoperative patient. Eighteen of the 42 participants did not recognize that a cerebrovascular event had occurred in this bradycardic, hypertensive, and unresponsive simulated patient who had a dilated pupil. Trainees who recognized the diagnosis were also more likely to indicate the need for consultation and for securing the airway.
In the final scenario, most of the trainees (34 of 42) recognized the need for reintubation after examining a tachypneic postoperative patient in respiratory failure. The mannequin was receiving 100% O2 by a nonrebreathing mask and had a respiratory rate of 28 breaths/min and an O2 saturation of 75%. All 16 senior residents, 10 of 15 student nurse anesthetists, and 9 of 12 CA-1 residents reintubated the mannequin during the 5-min exercise.
In terms of overall performance, fewer than 20% of all participants were able to complete all three key actions for the stroke scenario. In contrast, the ventricular tachycardia scenario was managed effectively; more than 75% of all participants were able to complete the key actions, often in much less than the prescribed 5-min time period. Across all six scenarios, a larger percentage of the senior residents were able to complete all three key actions than either of the other two study groups.
ANOVA was used to test for differences in performance among groups and across scenarios. For the analysis based on the weighted checklist, the case-by-group interaction was not significant. This indicates that the relative performance of the individuals in each group did not vary as a function of the case and suggests that, although there were group differences, the relative difficulty of the individual cases was similar across groups. However, there was a significant main effect attributable to group (F = 11.2; P < 0.01). This result reveals that, averaged over the six cases, there was a significant difference in mean scores among the senior residents, junior residents, and nurse anesthetists. A post hoc analysis (Scheffé test for multiple comparisons) revealed that the senior residents outperformed the nurses (mean difference = 11.2; P < 0.05). Although the junior residents also outperformed the nurses (mean difference = 5.0), this difference was not statistically significant. Likewise, the senior residents outperformed the junior residents (mean difference = 6.2), but this effect was not statistically significant. There was also a significant main effect attributable to case (F = 17.5; P < 0.01), indicating that, averaged over all study participants, the scenarios were not of equal difficulty.
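The group main-effect test described above can be sketched in a few lines of Python. The scores below are invented for illustration only (they are not the study's data), and the sketch tests only the group factor; the study's full model also included the case factor and the case-by-group interaction.

```python
# Simplified sketch of the group main-effect test with hypothetical scores;
# these numbers are invented and are NOT the study's data.
from scipy import stats

# Hypothetical mean weighted-checklist scores for a few participants per group
senior_residents = [82, 78, 85, 80, 77, 84]
junior_residents = [74, 70, 76, 72, 75, 71]
nurse_anesthetists = [69, 66, 72, 65, 70, 68]

# One-way ANOVA on group membership
f_stat, p_value = stats.f_oneway(senior_residents, junior_residents,
                                 nurse_anesthetists)
print(f"F = {f_stat:.1f}, P = {p_value:.4f}")
```

A significant F here would, as in the study, warrant a post hoc multiple-comparison procedure (e.g., Scheffé) to locate which pairs of groups differ.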
The results for the other two scoring systems (global rating and key action) were similar to those for the weighted checklists. For these two analyses, there was no significant group-by-case interaction, indicating that the group differences in performance were consistent across cases. The group main effects were both significant (F(global) = 16.8, P < 0.01; F(key action) = 16.6, P < 0.01), revealing differential mean performance by group. Based on post hoc analysis of the global scores, the senior residents significantly (P < 0.05) outperformed the junior residents (mean difference = 1.0) and the nurse anesthetists (mean difference = 1.7). The difference observed between junior residents and nurse anesthetists (mean difference = 0.7) was not statistically significant. Similar results were found for the key action scores: senior residents significantly outperformed junior residents and nurse anesthetists, but there were no statistically significant differences between junior residents and nurse anesthetists. As in the weighted checklist analysis, there were significant main effects attributable to case (F(global) = 8.8, P < 0.01; F(key action) = 13.6, P < 0.01); averaged over study participants, the cases were not of equal difficulty.
Generalizability analysis was used to evaluate the sources of variance in scores and, in particular, to determine how reliable raters were in assigning scores and how consistent participants were in managing the scenarios. In this study, the variances attributable to the raters, and to the associated interactions, were relatively small. This indicates that the raters identified comparable scoring end-points for each event and were reasonably consistent in their assignment of scores for each exercise (Table 3). Although participants' abilities varied depending on the content of the exercise, raters rank-ordered trainee performances in a nearly identical manner for each scenario. These rater variances were similar, and relatively small, whether analyzed across the entire participant group or within groups of participants (student nurse anesthetists, junior residents, and senior residents) (Table 3). This indicates that a trainee's score is unlikely to vary as a function of the number of raters or the scoring method used to quantify the performance (Table 3). The largest variance component for the checklist, key action, and global scoring methods was related to the content of the exercises (trainee × scenario) (Table 3). Therefore, the reliability of the participants' overall scores depends more on the number of scenarios in the assessment than on the number of raters for a given scenario. Overall, whereas the use of six encounters resulted in moderately reliable scores, additional performance samples would be needed if more precise ability estimates were required.
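The variance-component logic behind this generalizability analysis can be illustrated with a small hypothetical person × rater score matrix (invented numbers, not the study's data). A crossed two-way random-effects decomposition yields estimates of the person, rater, and residual components, and a single-rater G coefficient shows why a small rater variance means one rater is nearly as dependable as several:

```python
# Sketch of a person x rater variance-components (generalizability) analysis.
# The score matrix below is hypothetical and is NOT the study's data.
import numpy as np

# rows = trainees, columns = raters (e.g., global ratings on a 1-9 scale)
scores = np.array([
    [7, 8, 7],
    [5, 6, 5],
    [9, 9, 8],
    [4, 5, 4],
], dtype=float)

n_persons, n_raters = scores.shape
grand = scores.mean()

# Mean squares for the two-way (person x rater) crossed layout
ms_person = n_raters * np.sum((scores.mean(axis=1) - grand) ** 2) / (n_persons - 1)
ms_rater = n_persons * np.sum((scores.mean(axis=0) - grand) ** 2) / (n_raters - 1)
ss_total = np.sum((scores - grand) ** 2)
ss_resid = ss_total - (n_persons - 1) * ms_person - (n_raters - 1) * ms_rater
ms_resid = ss_resid / ((n_persons - 1) * (n_raters - 1))

# Variance-component estimates
var_person = (ms_person - ms_resid) / n_raters
var_rater = (ms_rater - ms_resid) / n_persons
var_resid = ms_resid

# G coefficient for a single rater (relative decisions)
g_single = var_person / (var_person + var_resid)
print(f"person: {var_person:.2f}, rater: {var_rater:.2f}, "
      f"residual: {var_resid:.2f}")
print(f"G (single rater) = {g_single:.3f}")
```

When the rater and residual components are small relative to the person component, as in this toy matrix, the single-rater G coefficient is already high, mirroring the study's finding that a trainee's score would change little with additional raters.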
Based on our brief six-scenario assessment, the senior residents received higher scores than both the student nurse anesthetists and junior residents on the simulation exercises. For most scenarios, the junior residents and student nurse anesthetists had, on average, comparable performances. These findings provide some evidence to support the discriminant validity of the multi-scenario simulation exercise. The senior residents, having both additional training and increased patient management experience, would be expected to be able to handle the acute care scenarios more effectively and efficiently. Likewise, based on the similar duration of training and anesthesia experiences of the junior residents and student nurse anesthetists, one might not expect meaningful performance differences between these groups of trainees.
The scores obtained from individual scenarios provide a way to evaluate how well trainees perform in various types of encounters and to make some inferences about trainee skill in specific domains of practice. For example, most trainees, regardless of group, successfully managed the ventricular tachycardia scenario. Trainees in all three participant groups were able to recognize the arrhythmia and effectively administer the prescribed treatment in less than 5 min from its onset, and frequently in less than 2 min. Although most of our participants had never encountered a patient in ventricular tachycardia in the operating room environment, it would seem that their previous training in advanced cardiac life support prepared them to manage this condition in an intraoperative environment. The algorithms and arrhythmia recognition skills acquired in advanced cardiac life support training likely translated to enhanced performance in a simulation laboratory.
The comparable performance of the three groups on the ventricular tachycardia scenario also provides some evidence to support fairness in the scoring models and, at least for this scenario, the content of the exercise. All three groups effectively managed this exercise, and many participants received the highest possible score. The requisite skills to obtain the maximum possible score were demonstrated by both the nurse and the physician groups. If scenarios were designed to test in-depth knowledge as well as clinical skill, then group comparisons would be expected to favor the residents, who have more extensive knowledge. Our goal in developing the assessment was to evaluate requisite skills in acute care management rather than to measure in-depth knowledge of the pathophysiology of disease processes.
Unlike the ventricular tachycardia scenario, some of the exercises were not effectively managed. Two postoperative scenarios, stroke and anaphylaxis, were more difficult for all participants, regardless of previous training. The clinical findings in these two simulations (stroke = increased BP, bradycardia, and unresponsiveness in addition to a dilated left pupil; anaphylaxis = bronchospasm, tachycardia, and hypotension) were not subtle but seemed to be more difficult for participants to identify and subsequently manage. These results indicate that simulation-based assessment might be helpful in identifying deficits in skill acquisition during training. If training strategies using simulation were available, then participants could potentially manage these conditions as well as they did the ventricular tachycardia scenario. The complexity of many conditions and their non-uniform treatment algorithms may make training strategies more difficult to develop for these events than for more straightforward scenarios such as the ventricular tachycardia exercise. An alternative explanation for the performance deficits might be that these two conditions were simply modeled in a manner that made them more difficult for all providers to recognize and treat. More study of these exercises, and of additional scenarios with related content, is required to determine whether the results generalize to other acute care postoperative conditions or whether similar performance deficiencies are found in graduates of other training programs.
There were several limitations to our study that warrant discussion. Trainees managed the scenarios in the same order and received feedback about their performance during the study period. If a simulation-based assessment is to be used as a summative evaluation method, then steps to enhance the security of the exercises and to standardize feedback during the evaluation would need to be implemented in future studies. The majority of our raters (five of six) were recruited from the faculty at the training site used by the residents and student nurse anesthetists. Trainee scores may be subject to rater bias or "halo" effects, particularly when raters are aware of the training level of participants. This bias might be manifest as differences between scores recorded by blinded and unblinded raters. The more variation there is among raters in scoring actions, the greater the potential for this type of bias. However, if variances among raters are minimal, then a small number of raters can be used to establish a reliable score. Fortunately, in this study, as in our previous studies, the variance among raters' scores was small, indicating that regardless of whether we used a single rater (blinded or unblinded) or the mean ratings of multiple raters, the trainee's overall assessment score would be similar. Simple, unambiguous scoring systems with defined end-points for performance may be important for decreasing the potential for rater bias.
A simulation-based assessment may be a valuable tool for understanding the relationship between a specialist’s training and clinical experiences in developing and maintaining skill. The senior residents with more training and experience performed better than the junior residents and nurse anesthetists. If experience were a key requirement to developing requisite skills, then additional practice experience beyond clinical training would potentially narrow the differences in performance among groups. If training were an essential requirement to develop requisite skill, then differences between the senior residents and nurse anesthetist group would be expected to persist beyond training.
The skills required to manage these modeled situations are relevant for both nurse anesthetists and anesthesiologists, but the content domain of acute care is certainly more expansive than represented by the six scenarios modeled in this study. Therefore, replicating this study with additional scenarios would be valuable. The resident and nurse participants represent trainees from a small number of training programs. As a result, it is unclear whether the results of our investigation will generalize to trainees in other programs. By increasing the number of performance samples for each participant as well as the number of trainees, a more detailed analysis of the content and nature of a simulation-based assessment could be provided. This information could be used to assess skill acquisition during training and to develop training and assessment strategies using life-sized mannequins.
At present, there are few, if any, methods available to determine whether a professional has the skills required to manage complex, high-acuity events (18–24). A simulation-based assessment strategy could be developed for critical events, but additional studies that explore the content domain and the fidelity of the exercises are required. A key goal of future investigations will be to explore the relationship between a provider's skill in managing simulated patients and associated measures of clinical performance.
1. Boulet JR, Murray DJ, Kras J, et al. Reliability and validity of a simulation-based acute care skills assessment for medical students and residents. Anesthesiology 2003;99:1270–80.
2. Chopra V, Gesink BJ, DeJong J, et al. Does training on an anaesthesia simulator lead to improvement in performance? Br J Anaesth 1994;73:293–7.
3. Devitt JH, Kurrek MM, Cohen MM, et al. The validity of performance assessments using simulation. Anesthesiology 2001;95:36–42.
4. Gaba DM, Howard SK, Flanagan B, et al. Assessment of clinical performance during simulated crises using both technical and behavioral ratings. Anesthesiology 1998;89:8–18.
5. Holzman RS, Cooper JB, Gaba DM, et al. Anesthesia crisis resource management: real-life simulation training in operating room crises. J Clin Anesth 1995;7:675–87.
6. Jacobsen J, Lindekaer AL, Ostergaard HT, et al. Management of anaphylactic shock using a full scale anaesthesia simulator. Acta Anaesthesiol Scand 2001;45:315–9.
7. Lindekaer AL, Jacobsen J, Andersen G, et al. Treatment of ventricular fibrillation during anaesthesia in an anaesthesia simulator. Acta Anaesthesiol Scand 1997;41:1280–4.
8. Monti EJ, Wren K, Haas R, Lupien AE. The use of an anesthesia simulator in graduate and undergraduate education. CRNA 1998;9:59–66.
9. Murray DJ, Boulet J, Ziv A, et al. An acute care skills evaluation for graduating medical students: a pilot study using clinical simulation. Med Educ 2002;36:833–41.
10. Murray DJ, Boulet JR, Kras JD, et al. Anesthesia acute care skills: a simulation-based anesthesia skills assessment for residents. Anesthesiology 2004;101:1084–95.
11. O'Donnell J, Fletcher J, Dixon B, Palmer L. Planning and implementing an anesthesia crisis resource management course for student nurse anesthetists. CRNA 1998;9:50–8.
12. Schwid HA, Rooke GA, Carline J, et al. Evaluation of anesthesia residents using mannequin-based simulation: a multi-institutional study. Anesthesiology 2002;97:1434–44.
13. Boulet J, McKinley DW, Norcini J, Whelan GP. Assessing the comparability of standardized patient and physician evaluations of clinical skills. Adv Health Sci Educ Theory Pract 2002;7:85–97.
14. Dillon GF, Boulet JR, Hawkins RE, Swanson DB. Simulations in the United States Medical Licensing Examination (USMLE). Qual Saf Health Care 2004;13:i41–5.
15. Norcini J, Boulet J. Methodological issues in the use of standardized patients for assessment. Teach Learn Med 2003;15:293–7.
16. Rothman AI, Blackmore D, Dauphinee WD, Reznick R. The use of global ratings in OSCE station scores. Adv Health Sci Educ Theory Pract 1997;1:215–9.
17. Brennan RL. Generalizability theory. New York: Springer-Verlag, 2001:1–538.
18. Gaba DM. What makes a "good" anesthesiologist? Anesthesiology 2004;101:1061–3.
19. Issenberg SB, McGaghie WS, Hart IR, et al. Simulation technology for health care professional skills assessment. JAMA 1999;282:861–6.
20. Jha AK, Duncan BW, Bates DW. Simulator-based training and patient safety. In: Shojania KG, Duncan BW, McDonald KM, Wachter RM, eds. Making health care safer: a critical analysis of patient safety practices (evidence report/technology assessment no. 43; AHRQ publication 01-E058). Rockville, MD: Agency for Healthcare Research and Quality, 2001:510–7.
21. Accreditation Council for Graduate Medical Education. ACGME Outcome Project. Available at www.acgme.org
22. Epstein RM, Hundert EM. Defining and assessing professional competence. JAMA 2002;287:226–35.
23. Leach DC. Competence is a habit. JAMA 2002;287:243–4.
24. Arens JF. Do practitioner credentials help predict safety in anesthesia practice? APSF Newsletter 1997;12:6–8.

© 2005 International Anesthesia Research Society