Evidence of a positive relationship between effective team performance and patient outcomes1–5 has prompted the World Health Organization6 and other national health organizations and medical licensing bodies7–12 to strongly encourage academic institutions to integrate team training curricula at all levels of education. Despite this endorsement, educators remain challenged by the scarcity of research in undergraduate health professions education to guide the design, delivery, and assessment of such interprofessional curricula.13 Of additional concern is the lack of measurement instruments to evaluate team-based performance during simulated resuscitation scenarios. The construct validity of existing instruments is limited, and they have been tested only in a postgraduate context.14–17
The first step in developing any evaluation instrument is to understand the theoretical framework that supports the constructs to be measured.18 Concepts of team performance are drawn from research on the assessment of personnel in aviation19 and military20,21 contexts. In aviation, Crew Resource Management19 (specifically, the concepts of leadership, roles and responsibilities, communication, situation awareness, and resource utilization) was used as a framework for developing flight personnel training curricula to improve teamwork and, ultimately, passenger safety. Medicine adopted these same concepts and renamed the framework Crisis Resource Management (CRM)22 to align it with health care professionals’ team roles and processes. Today, CRM is the most frequently used framework for designing interprofessional education (IPE) curricula and evaluating team training in medical education.23
To date, only a few instruments exist for assessing interprofessional team training initiatives, and the generalizability of all of them is limited to a postgraduate context. The earliest, the Anaesthetists’ Non-Technical Skills (ANTS) system,14 was designed specifically for evaluating the performance of health care professionals in the operating room. Its developers report moderate to strong reliability (α = 0.76–0.86) and evidence of satisfactory content validity, based on fair to moderate interrater agreement.
The Mayo High Performance Teamwork Scale (MHPTS), tested with 19 residents and 88 nurses, was designed for team members to evaluate their own team as they managed an emergency scenario.15 Each of its 16 items is scored 0 (never or rarely), 1 (inconsistently), or 2 (consistently). Testing, however, exposed some scoring issues. When the event targeted by an item did not occur in the scenario, evaluators assigned a zero, which the researchers felt negatively biased the analysis of the teams’ performance. These scores were therefore changed to 1 (inconsistently) to reduce the bias. This methodology is questionable and limits the trustworthiness of the concepts evaluated. Furthermore, despite evidence of satisfactory internal consistency (α = 0.85), the use of Rasch analysis presumes a unidimensional construct, an assumption that should be supported by a factor analysis of the underlying factors.
The Clinical Teamwork Scale (CTS) was developed to evaluate the team effectiveness of obstetric care providers.16 Its developers focused on general behavioral descriptors to promote generalizability beyond the postgraduate obstetric context. To provide evidence supporting the validity of score interpretations, evaluators trained in CRM, and blinded to the performance level of each recording, were shown videos portraying three different levels of team performance. Pearson correlations between evaluators ranged from 0.94 to 0.96, and the intraclass correlation (ICC) was 0.98 (CI 0.97–0.99). The scenario demonstrating average performance showed the most variance (0.905), compared with the low-performance (0.729) and high-performance (0.214) scenarios, and the largest source of variance was attributable to the item-by-rater interaction.16 These results suggest two things. First, we are making progress in evaluating teamwork in education. Second, experts know what good teamwork looks like but are challenged to create unambiguous indicators that objectively differentiate poor from average performance.
The most recently developed teamwork performance measure, the Team Emergency Assessment Measure (TEAM),17 published after the research reported here began, was created specifically to measure an emergency medical team’s response to resuscitation events. Although the TEAM is supported by evidence of face, content, and some construct validity, factor analysis suggests that its 12 items measure only one construct, teamwork, which explains a majority of the variance (80.27%). Agreement between two raters scoring six resuscitation video recordings was only fair (Cohen kappa = 0.55), as was the ICC (0.60). Cronbach alpha was 0.89, reflecting high internal consistency among items, as would be expected of a scale that measures only one construct.
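Cohen kappa corrects raw percentage agreement between two raters for the agreement they would reach by chance alone. The Python sketch below implements the standard two-rater formula, κ = (p_o − p_e)/(1 − p_e); the function name and the toy ratings are hypothetical illustrations and are not data from the TEAM study.17

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's marginal
    category frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of four items on a two-category scale.
print(cohen_kappa([1, 1, 2, 2], [1, 2, 2, 2]))  # prints 0.5
```

Here the raters agree on 75% of items, but half of that agreement is expected by chance, so κ falls to 0.5; this is why a kappa can read as only fair even when raw agreement looks high.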
In short, although introducing IPE appears straightforward, psychometrically sound instruments that assess the level of team performance with clear behavioral indicators are lacking, which reflects the challenges inherent in developing such an instrument. The purpose of the present study was to examine the psychometric characteristics of a newly developed assessment instrument, the KidSIM Team Performance Scale, used to evaluate the team performance of undergraduate health professionals who had experienced a simulation-based interprofessional curriculum. The 12-item team-based performance scale was adapted from both the MHPTS15 and the CTS16 and uses a 5-point scale (with 5 representing optimal performance) anchored by behavioral descriptors that differentiate levels of achievement for each item.
We used a two-group, quasi-experimental research design to analyze the psychometric properties of the newly developed KidSIM Team Performance Scale.* Course and clerkship directors from three disciplines at three institutions coordinated recruitment for this interprofessional team training initiative. All of the students were actively engaged in their pediatric or emergency clinical rotations; however, participation was voluntary and required the students’ consent. The study participants were
- undergraduate medical students (n = 35; 17.9% of the total number of student participants) from the University of Calgary Faculty of Medicine in their pediatric emergency clinical rotation (the third year of their three-year program),
- nursing students (n = 127; 64.8% of the total number of student participants) from the University of Calgary School of Nursing in their pediatric rotation (third year of a four-year program), and
- respiratory therapy students (n = 34; 17.3% of the total number of student participants) from the Southern Alberta Institute of Technology in their final clinical practicum (third year of program).
Each team consisted of at least one medical student, three to four nursing students, and, in all but three teams, a respiratory therapy student; to accommodate the medical students’ emergency rotations, 25% of the teams had two medical students. After receiving ethical approval from the Conjoint Health Research Ethics Board (CHREB) at the University of Calgary, the study started in September 2010 and was completed in April 2011. Throughout the 2010–2011 academic year, teams of five to six students from the participating disciplines were allocated to either a comparison group (total n = 95) or an intervention group (total n = 101); sessions were held once a week, most often on Wednesday mornings.
The KidSIM Team Performance Scale compiles or modifies performance-indicator items drawn from previous research and from scales used to evaluate participants’ team-based behaviors.15,16 Although the items were identified from the MHPTS15 and CTS,16 three of us (E.S., T.D., V.G.) developed all of the behavioral indicators, both to align them with the team performance learning objectives and to reduce subjectivity in evaluators’ observations when scoring team performance (see Appendix 1).18 Items 1, 3, 5, and 12 were taken verbatim from the MHPTS,15 item 9 was taken verbatim from the CTS,16 and items 2, 4, 6, 7, 8, and 10 were modifications of MHPTS items. We developed item 11 to examine situational awareness. The 12 items were designed to assess team performance with respect to leadership, roles and responsibilities, communication, situational awareness, resource utilization, and patient-centered care, and a five-point behavioral indicator scale was used to differentiate the level of achievement for each item.
Students in both the comparison and intervention groups participated in the simulation-based team training curriculum on acute illness management, which consisted of two scripted 20-minute scenarios, each followed by 40 minutes of facilitated debriefing. Students from at least two of the three participating professional groups were assigned to teams of five to six members (e.g., one medical student, one to two respiratory therapy students, and three to four nursing students). Each team managed an infant scenario (sepsis or seizure) in the intensive care simulation setting and an older child scenario (asthma or anaphylaxis) in the emergency department simulation setting, both located at the Alberta Children’s Hospital. Curriculum objectives focused on the CRM concepts of leadership, roles and responsibilities, communication, situation awareness, and resource utilization. In addition, the medical learning objectives focused on basic medical management.
Led by experienced facilitators from nursing, medicine, and respiratory therapy, debriefing focused on student and facilitator observations of the task work and teamwork that led to effective or ineffective team performance and thereby affected the medical management of the scenarios. To facilitate scoring accuracy, students were instructed to perform all diagnostic and therapeutic actions in real time and to verbalize their diagnoses and treatments during the scenarios. We used the Laerdal infant mannequin (SimBaby; Laerdal, Toronto, Ontario, Canada) and the METI child mannequin (PediaSIM HPS, Sarasota, Florida) in the respective scenarios to create a realistic depiction of CRM issues.
Students in the intervention group received an additional 30-minute formalized team training module before participating in the simulation-based team training curriculum. The module consisted of a short PowerPoint presentation introducing teamwork concepts, a discussion, and a short video that role-modeled effective teamwork.
Two wide-angle cameras were used to digitally record team performances. This provided evaluators with four views: three views of the team caring for the patient (to secure views of task work by all team members) and one view of up-to-date vital signs of heart rate, oxygen saturation, respiratory rate, and blood pressure. Digital recordings were uploaded to a central server for retrospective scoring by six blinded evaluators from medicine, nursing, and respiratory therapy backgrounds, who all participated in a two-hour training session to familiarize themselves with the measurement instrument. On completion of the study, all digital recordings were erased as dictated by the requirements of the CHREB.
Descriptive statistics were computed using SPSS Version 19.0 (SPSS Inc., Chicago, Illinois) to derive the means and standard deviations for each questionnaire item, overall total scale scores, and subscale scores for both the comparison and intervention groups. Paired-sample t tests were used to compare team performance scores between Time 1 and Time 2, and independent-sample t tests were used to compare team performance scores between the intervention (team training module) and comparison groups. An exploratory factor analysis (principal components analysis) with varimax rotation was completed to assess the construct validity of the 12-item performance scale and the variance attributed to each of the underlying factors. The eigenvalue-greater-than-one criterion was used to determine the number of factors to retain. Items loading above 0.32 (i.e., sharing approximately 10% of their variance with the factor, because 0.32² ≈ 0.10) were retained and grouped under the factor on which they loaded highest. Internal consistency (Cronbach α) was calculated for each resulting subscale and for the total performance scale. Cohen d was used to examine mean effect size differences, where a “small” effect size is from 0.20 to 0.49, a “medium” effect size is from 0.50 to 0.79, and a “large” effect size is equal to or greater than 0.80. The intraclass correlation (ICC) was also calculated at Time 1 and at Time 2 to examine agreement among raters in each scenario.
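As a concrete reference for the reliability analysis, the usual formula for Cronbach α, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), can be sketched in Python. This is an illustration assuming complete data and sample (n − 1) variances, not a reproduction of the SPSS procedure used in the study; the function name and example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rated units, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances (n - 1 denominator) throughout; assumes no missing data.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two rated units and two items")
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores for three rated teams on two items.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))  # approximately 0.667
```

Alpha rises as items covary (the total-score variance grows relative to the summed item variances), which is why a scale measuring a single construct tends to show high alpha.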
Of the 196 participants in this study, 161 (82%) were female. A total of 182 students (93%) had reported previous team-based learning, with most identifying either a course-related (61; 33.5%) or work-related experience (75; 41.2%). A total of 131 students (73%) had no critical care experience; 19 students (9.7%) reported less than one month, and 37 students (18.9%) reported greater than one month. Only 3 students (1.6%) reported completion of an advanced life support course. Only 23 participants (11.7%) reported previous IPE.
The internal consistency of the KidSIM Team Performance Scale was α = 0.90. A total of 57 simulation-based team performances were evaluated by up to 6 examiners, with an average of 4.9 examiners per scenario. The examiners were faculty and staff members from medicine (2 emergency medicine physicians), nursing (3 registered nurses), and respiratory therapy (1 respiratory therapist). The intraclass correlation for the 6 examiners was ICC = 0.90 at Time 1 and 0.87 at Time 2. As shown in Table 1, mean scores for the 12 items at Time 1 ranged from a low of M = 2.35 (SD = 1.31) for item 11, “The team leader provides summaries of patient status and needed interventions to the team,” to a high of M = 3.74 (SD = 1.10) for item 1, “A leader is clearly recognized by all team members.” At Time 2 (scenario two), mean scores ranged from a low of M = 2.89 (SD = 1.21) for item 11 to a high of M = 4.02 for both item 1 (SD = 0.95) and item 3 (SD = 0.92). A separate analysis of variance showed no significant difference in performance on the KidSIM Team Performance Scale by team composition.
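The text does not state which form of the ICC was computed, so the following Python sketch shows only one common possibility: the one-way random-effects, single-rater ICC(1,1) of Shrout and Fleiss, computed from between-target and within-target mean squares. It assumes every recorded performance is scored by the same number of raters with no missing ratings, which differs from the study (an average of 4.9 examiners per scenario); the function name and ratings are hypothetical.

```python
from statistics import fmean


def icc_oneway(ratings):
    """One-way random-effects, single-rater ICC(1,1) (Shrout and Fleiss).

    `ratings` is a list of targets (e.g., recorded team performances), each a
    list of k scores from k raters; every target must have the same k.
    ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 targets, each rated by the same k >= 2 raters")
    grand = fmean(x for row in ratings for x in row)
    target_means = [fmean(row) for row in ratings]
    # Between-target mean square: spread of target means around the grand mean.
    msb = k * sum((m - grand) ** 2 for m in target_means) / (n - 1)
    # Within-target mean square: disagreement among raters scoring the same target.
    msw = sum((x - m) ** 2 for row, m in zip(ratings, target_means) for x in row) / (n * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (msb - msw) / denominator


# Hypothetical: three performances, two raters; rater 2 is always one point higher.
print(icc_oneway([[1, 2], [2, 3], [3, 4]]))  # approximately 0.6
```

Because the one-way model treats a consistent offset between raters as disagreement, the example yields 0.6 rather than 1.0; a two-way consistency model would ignore such an offset, which is one reason the choice of ICC form matters when reporting rater agreement.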
Receipt of the supplemental curriculum was significantly associated with performance scores (Table 2). Intervention teams achieved higher overall scores at both Time 1 and Time 2 (t(220) = 4.85, P < .001). Comparison teams, however, also improved significantly by Time 2 (P < .001), whereas the intervention group’s gain by Time 2 was not significant. Effect size calculations show that the intervention group (i.e., those that received the formalized team training module) achieved higher mean scores than the comparison group at Time 1 (large effect, d = 0.92) and Time 2 (medium effect, d = 0.61). Mean team performance scores were thus higher in the second scenario (Time 2) for both groups, although the improvement was statistically significant only for the comparison group.
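The effect sizes above are standardized mean differences. A minimal Python sketch of Cohen d with a pooled standard deviation, the matching Student independent-samples t statistic, and the effect size labels defined in the methods is shown below. The function names and sample scores are hypothetical, and the text does not specify which t-test variant produced the reported t(220), so this equal-variance version is an assumption.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples (sample variances)."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def cohens_d(a, b):
    """Standardized mean difference: (mean(a) - mean(b)) / pooled SD."""
    return (fmean(a) - fmean(b)) / pooled_sd(a, b)


def independent_t(a, b):
    """Student's independent-samples t (equal variances assumed); df = len(a) + len(b) - 2."""
    return (fmean(a) - fmean(b)) / (pooled_sd(a, b) * sqrt(1 / len(a) + 1 / len(b)))


def effect_label(d):
    """Label |d| using the cut-offs stated in the methods (0.20, 0.50, 0.80)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # the methods define no label under 0.20


# Hypothetical team scores for an intervention and a comparison group.
intervention, comparison = [2, 4, 6], [1, 3, 5]
print(cohens_d(intervention, comparison))  # 0.5
print(effect_label(0.92), effect_label(0.61))  # large medium
```

The two statistics share the pooled SD, so d = t × √(1/n₁ + 1/n₂): with large groups, a modest d can still produce a large, highly significant t.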
As shown in Table 3, an exploratory factor analysis of the KidSIM Team Performance Scale yielded a three-factor solution accounting for 65.5% of the variance. The factors correspond to subscales for communication (6 items: 27.6% of variance, α = 0.84), roles and responsibilities (5 items: 27.4% of variance, α = 0.86), and patient-centered care (1 item: 10.5% of variance). Further analysis of these factors in relation to time slot and amount of curriculum showed that the intervention group outperformed the comparison group on all three factors at Time 1 (P < .01), but only on communication at Time 2 (P < .05).
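The retention and grouping rules from the methods (keep components with eigenvalues greater than one; keep items whose highest absolute loading exceeds 0.32 and group them under that factor) can be sketched in Python as below. For a principal components analysis of a correlation matrix, total variance equals the number of items, so for this 12-item scale the reported 65.5% corresponds to a combined sum of squared loadings of about 7.86 (0.655 × 12). The helper names, eigenvalues, and loadings are hypothetical, and the sketch omits the eigendecomposition and varimax rotation themselves.

```python
def kaiser_retain(eigenvalues):
    """Number of components retained by the eigenvalue-greater-than-one rule."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def proportion_explained(eigenvalues, n_items):
    """Share of total variance per component.

    For a PCA of a correlation matrix the total variance equals the number
    of items, so each eigenvalue is divided by n_items.
    """
    return [ev / n_items for ev in eigenvalues]


def assign_items(loadings, cutoff=0.32):
    """Group each item under the factor on which it loads highest.

    `loadings` maps item -> list of (rotated) loadings, one per factor.
    An item is kept only if its highest absolute loading exceeds `cutoff`;
    0.32 ** 2 is about 0.10, i.e., roughly 10% of the item's variance.
    """
    groups = {}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        if abs(row[best]) > cutoff:
            groups.setdefault(best, []).append(item)
    return groups


# Hypothetical eigenvalues for a 12-item scale (they sum to 12).
eigs = [5.5, 1.5, 1.1, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.25, 0.2, 0.15]
print(kaiser_retain(eigs))  # 3
# Hypothetical rotated loadings; "item3" peaks at 0.30 and is dropped.
print(assign_items({"item1": [0.71, 0.20, 0.05],
                    "item2": [0.15, 0.64, 0.10],
                    "item3": [0.25, 0.28, 0.30]}))  # {0: ['item1'], 1: ['item2']}
```

A factor that captures only one item above the cut-off, like the patient-centered care factor here, passes the eigenvalue rule yet cannot support a reliable subscale on its own.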
Discussion and Conclusions
Educators today are challenged by the lack of standardized assessment measures with evidence to support reliable and valid score interpretation for evaluating both learners and curricula. Furthermore, research in this area is scarce, especially in undergraduate medical education. Our findings contribute to this important body of knowledge by supporting the validity and reliability of team performance scores obtained using the KidSIM Team Performance Scale to assess teams of undergraduate health professionals. The scale was shown to evaluate two key dimensions of team performance: (1) roles and responsibilities and (2) communication. A third dimension (patient-centered care) was represented by only a single item and therefore needs either to be deleted or to be supported with additional items and retested. The roles and responsibilities dimension reflected team organization, and the communication dimension reflected the active interdependence among team members working to complete tasks. These two factors (i.e., dimensions) are congruent with the literature from team performance experts such as Salas and colleagues,24 who distinguish the active team process from each team member’s role, and Urban and colleagues,25 who identify team structure and communication as the two concepts underlying team performance. This is important because previously tested scales have treated team performance as a unidimensional construct. Our evaluation instrument is the first scale to identify team performance as a two-dimensional construct consistent with previous interpretations by team performance experts.
Furthermore, the KidSIM Team Performance Scale showed higher internal consistency than the other team-based measures identified. The reliability of the communication and roles and responsibilities factors, or subscales, was also much higher than reported for other instruments (α = 0.85 and 0.87, respectively). Together the two factors account for 55.0% of the variance, distributed almost equally between them (27.6% and 27.4%). Only the TEAM performance instrument17 accounted for more variance (80.3%), but it again failed to measure more than one dimension of team performance. Our results also provide evidence to support a degree of construct validity for scores obtained using the KidSIM Team Performance Scale. The relationships between performance scores and time slot, and between performance scores and the presence of a team training curriculum, were significant even though evaluators were blinded to both conditions. This suggests that evaluators from medicine, nursing, and respiratory therapy were able to use the instrument to discriminate between teams of learners who received a supplemental curriculum and teams who had more experience.
Mean scores for the items on the instrument ranged from a low of 2.35 (SD = 1.31) to a high of 4.02 (SD = 0.92), suggesting that a one-to-five scale is adequate for evaluating team performance at this level of learner. Additionally, evaluators assigned lower scores for teamwork tasks that challenge even postgraduate learners, such as providing situation summaries (item 11; see Table 1).26,27 This again suggests that the instrument can discriminate among team process concepts, some of which may reflect an advanced expectation for this level of learner. Moreover, teams that received a formalized team training module performed better at both Time 1 and Time 2, suggesting that there may be a benefit to using a multiblended approach when delivering team training initiatives. This finding also supports the construct validity of the KidSIM Team Performance Scale, because teams that received the formalized team training module were expected to perform better on the scale.
There are several limitations to our study. Our results would have been strengthened by a randomized controlled design; however, this was impossible because of the logistics of scheduling participants from three health professions. In particular, teams allocated to the comparison group completed their clinical rotations before the end of the study and consequently did not receive the formalized team training module. The largest limitation lies in determining appropriate anchors to differentiate levels of performance on a five-point scale for each of the 12 items. Piloting the instrument might have helped in testing the items and the appropriateness of the behavioral anchors. Many of the behavioral indicators describe how well (on a five-point scale) team members performed a specific behavior, and we based the numbers in these anchors on performance in a 20-minute scenario. Because there are no standards for setting such criteria, the scale may need to be modified for experienced interprofessional teams and longer scenarios. Further research is required to evaluate the construct validity of the instrument’s items in different contexts, and the generalizability of our findings is limited to an undergraduate context. Nevertheless, the team performance scale has the potential to be used in similar interprofessional settings where simulation is used to depict and deliver training in acute illness management.
In conclusion, we propose the use of the KidSIM Team Performance Scale as an assessment measure appropriate for providing reliable and valid information about interprofessional undergraduate teams’ ability to communicate and engage in appropriate roles and task work. We recognize the need for further research with more rigorous testing of scale items in different contexts to improve the trustworthiness of our findings and the generalizability of the scale. In addition, to enhance our understanding of the validity of the scale, research assessing multiple approaches to the delivery of team-based curricula is required. As educators, we need instruments such as this to further our knowledge of how best to assess team training curricula.
Other disclosures: None.
Ethical approval: This study was reviewed and received ethics approval from the conjoint health research ethics board at the University of Calgary.
Previous presentations: A poster summarizing these data was presented at the International Meeting on Simulation in Healthcare, January 30, 2012, San Diego, California.
* KidSIM is the name of the pediatric patient simulation program that has been run at the Alberta Children’s Hospital since October 2005, and is simply the combination of the word “Kid” and the abbreviation “SIM,” for simulation.
1. Capella J, Smith S, Philp A, et al. Teamwork training improves the clinical care of trauma patients. J Surg Educ. 2010;67:439–443
2. Baker GR, Norton PG, Flintoft V, et al. The Canadian Adverse Events Study: The incidence of adverse events among hospital patients in Canada. CMAJ. 2004;170:1678–1686
3. Baker DP, Day R, Salas E. Teamwork as an essential component of high-reliability organizations. Health Serv Res. 2006;41(4 pt 2):1576–1598
4. Manser T. Teamwork and patient safety in dynamic domains of healthcare: A review of the literature. Acta Anaesthesiol Scand. 2009;53:143–151
5. Thomas AN, Panchagnula U, Taylor RJ. Review of patient safety incidents submitted from critical care units in England & Wales to the UK National Patient Safety Agency. Anaesthesia. 2009;64:1178–1185
6. Barr H. The WHO Framework for Action. J Interprof Care. 2010;24:475–478
8. Frank JR, Snell LS, Cate OT, et al. Competency-based medical education: Theory to practice. Med Teach. 2010;32:638–645
13. McGaghie WC, Issenberg SB, Petrusa ER, Scalese RJ. A critical review of simulation-based medical education research: 2003–2009. Med Educ. 2010;44:50–63
14. Fletcher G, Flin R, McGeorge P, Glavin R, Maran N, Patey R. Anaesthetists’ Non-Technical Skills (ANTS): Evaluation of a behavioural marker system. Br J Anaesth. 2003;90:580–588
15. Malec JF, Torsher LC, Dunn WF, et al. The Mayo high performance teamwork scale: Reliability and validity for evaluating key crew resource management skills. Simul Healthc. 2007;2:4–10
16. Guise JM, Deering SH, Kanki BG, et al. Validation of a tool to measure and promote clinical teamwork. Simul Healthc. 2008;3:217–223
17. Cooper S, Cant R, Porter J, et al. Rating medical emergency teamwork performance: Development of the Team Emergency Assessment Measure (TEAM). Resuscitation. 2010;81:446–452
18. Smith PC, Kendall LM. Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. J Appl Psychol. 1963;47:149–155
19. Kanki B, Helmreich R, Anca J. Crew Resource Management. London, UK: Elsevier; 2010
20. Smith-Jentsch K, Johnston J, Payne S. Measuring Team-Related Expertise in Complex Environments. Vol XXIII. Washington, DC: American Psychological Association; 1998
21. Morgan B, Glickman A, Woodard EA, Blaiwes AS, Salas E. Measurement of Team Behaviors in a Navy Training Environment. Technical Report No. NTSC TR-86-014. Orlando, Fla: Naval Training Systems Center; 1986
22. Howard SK, Gaba DM, Fish KJ, Yang G, Sarnquist FH. Anesthesia crisis resource management training: Teaching anesthesiologists to handle critical incidents. Aviat Space Environ Med. 1992;63:763–770
23. Chakraborti C, Boonyasai RT, Wright SM, Kern DE. A systematic review of teamwork training interventions in medical student and resident education. J Gen Intern Med. 2008;23:846–853
24. Salas E, Cooke NJ, Rosen MA. On teams, teamwork, and team performance: Discoveries and developments. Hum Factors. 2008;50:540–547
25. Urban J, Bowers CA, Monday SD, Morgan B. Workload, team structure, and communication in team performance. Mil Psychol. 1995;7:123–139
26. Marsch SC, Müller C, Marquardt K, Conrad G, Tschan F, Hunziker PR. Human factors affect the quality of cardiopulmonary resuscitation in simulated cardiac arrests. Resuscitation. 2004;60:51–56
27. Andersen PO, Jensen MK, Lippert A, Østergaard D. Identifying non-technical skills and barriers for improvement of teamwork in cardiac arrest teams. Resuscitation. 2010;81:695–702