Evaluation of a Measurement System to Assess ICU Team Performance*

Dietz, Aaron S., PhD1,2; Salas, Eduardo, PhD3; Pronovost, Peter J., MD, PhD, FCCM1,2; Jentsch, Florian, PhD4; Wyskiel, Rhonda, RN, MSN1; Mendez-Tellez, Pedro Alejandro, MD2; Dwyer, Cynthia, RN5; Rosen, Michael A., PhD1,2

doi: 10.1097/CCM.0000000000003431
Feature Articles
Editor's Choice

Objective: Measuring teamwork is essential in critical care, but limited observational measurement systems exist for this environment. The objective of this study was to evaluate the reliability and validity of a behavioral marker system for measuring teamwork in ICUs.

Design: Instances of teamwork were observed by two raters for three tasks: multidisciplinary rounds, nurse-to-nurse handoffs, and retrospective videos of medical students and instructors performing simulated codes. Intraclass correlation coefficients were calculated to assess interrater reliability. Generalizability theory was applied to estimate systematic sources of variance across the three observed team tasks: variance associated with instances of teamwork, rater effects, competency effects, and task effects.

Setting: A 15-bed surgical ICU at a large academic hospital.

Subjects: One hundred thirty-eight instances of teamwork were observed. Specifically, we observed 88 multidisciplinary rounds, 25 nurse-to-nurse handoffs, and 25 simulated code exercises.

Interventions: No intervention was conducted for this study.

Measurements and Main Results: Rater reliability for each overall task ranged from good to excellent correlation (intraclass correlation coefficient, 0.64–0.81), although there were seven cases where reliability was fair and one case where it was poor for specific competencies. Findings from generalizability studies provided evidence that the marker system dependably distinguished among teamwork competencies, providing evidence of construct validity.

Conclusions: Teamwork in critical care is complex, thereby complicating the judgment of behaviors. The marker system showed strong potential for differentiating competencies, but findings also revealed that more context-specific guidance may be needed to improve rater reliability.

1The Armstrong Institute for Patient Safety and Quality, The Johns Hopkins University School of Medicine, Baltimore, MD.

2Department of Anesthesiology and Critical Care Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD.

3Department of Psychology, Rice University, Houston, TX.

4Department of Psychology and Institute for Simulation & Training, University of Central Florida, Orlando, FL.

5Surgical Intensive Care Unit, Johns Hopkins Hospital, Baltimore, MD.

*See also p. 2045.

This work was performed at the Johns Hopkins University. Portions of the data collection and analyses that are reported were a part of Dr. Dietz’s dissertation work.

The views presented in this article are those of the authors and do not necessarily reflect those of the Johns Hopkins University, Johns Hopkins Hospital, Rice University, the University of Central Florida, or the Gordon and Betty Moore Foundation.

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s website (http://journals.lww.com/ccmjournal).

Supported, in part, by grants from the Gordon and Betty Moore Foundation (grant number: 3186.01).

Dr. Dietz and Ms. Dwyer’s institutions received funding from the Gordon and Betty Moore Foundation. Dr. Mendez-Tellez received support for article research from the National Institutes of Health. Ms. Dwyer received support for article research from the Gordon and Betty Moore Foundation. Dr. Rosen’s institution received funding from the Gordon and Betty Moore Foundation (grant number: 3186.01), Agency for Healthcare Research and Quality, Centers for Disease Control and Prevention, and Jhpiego - Global Health Services, Treatment & Prevention; and he disclosed that he is a co-investigator on a project funded through the National Aeronautics and Space Administration. The remaining authors have disclosed that they do not have any potential conflicts of interest.

For information regarding this article, E-mail: mrosen44@jhmi.edu

Teamwork is a salient topic in clinical research and practice because healthcare is increasingly specialized and many hands care for one patient. Breakdowns in teamwork have been linked to patient harm (1, 2), whereas interventions to promote teamwork have improved safety and efficiency (1–4). In the ICU, care teams have significantly impacted the outcomes and experiences of patients (5), and poor teamwork has contributed to preventable harm and staff burnout (6, 7). Given that teamwork competencies can be effectively trained (4, 8), resulting in reductions in mortality (9), healthcare should prioritize ensuring that clinicians have and practice teamwork skills. This requires training and competency assessment.

However, valid measures for evaluating team skills are needed to 1) advance teamwork science and our understanding of what impacts patient safety and quality improvement, and 2) evaluate performance to support life-long learning and career development. In aviation, where effective teamwork is critical for safety, behavioral marker systems are used to observe and measure a specific set of behaviors to assess performance (10, 11). These types of tools have been developed for healthcare (12–15), but primarily focus on certain specialties (e.g., surgery) or specific contexts (e.g., responding to cardiac or trauma resuscitation) (16). Existing marker systems for ICU teams, for example, have only been validated for airway and cardiac emergency events (17) and multidisciplinary rounds (18, 19). Yet in the ICU, team tasks and structures are complex (Appendix 1, Supplemental Digital Content 1, http://links.lww.com/CCM/D989) and raise unique challenges for assessment, such as understanding what competencies are most important for a given situation. Therefore, this article describes the evaluation of a behavioral marker system for assessing ICU team performance that can be applied across different tasks.


MATERIALS AND METHODS

In this prospective study, we observed three team-related tasks and rated team competencies using the behavioral marker system. The Institutional Review Board of the Johns Hopkins University School of Medicine approved this study and classified it as exempt quality improvement work for which consent was not required.


Behavioral Marker System

We previously established the content validity of the marker system through formal systematic literature reviews of teamwork in critical care (20) and of existing behavioral marker systems (16), and interviews with ICU nurses and physicians to develop a theoretical framework to guide measurement and elicit exemplars of effective and ineffective teamwork (21). The marker system comprised four teamwork dimensions and 10 subdimensions: “communication” (style, content, closed loop), “leadership” (task management/delegation, team norms), “backup and supportive behavior” (offering/seeking backup, error correction and feedback), and “team decision making” (planning and establishing goals, contingency planning, updating and revising goals). The subdimensions were core competencies that emerged from our previous interviews with ICU clinicians (21). A five-point scale (poor, marginal, neutral/acceptable, good, and very effective) was used to score performance on these competencies. We provided definitions for the low-, midpoint-, and high-scale scores to guide measurement. For example, poor was defined as “Performance was expected, but not observed. Performance consistently demonstrated negative teamwork behaviors.” Example content for the communication and team decision making dimensions of the marker system is summarized in Tables 1 and 2, respectively. The full content and scoring tool is provided in Appendix 2 (Supplemental Digital Content 2, http://links.lww.com/CCM/D990).
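
The structure of the instrument lends itself to a simple machine-readable encoding, which may be useful to readers who want to build an electronic scoring form around it. The following Python sketch is purely illustrative and not drawn from the study materials; it records only the dimensions, subdimensions, and scale anchors described above.

# Illustrative only: encoding the marker system's four dimensions,
# 10 subdimensions, and five-point scale as plain Python data.
MARKER_SYSTEM = {
    "communication": ["style", "content", "closed loop"],
    "leadership": ["task management/delegation", "team norms"],
    "backup and supportive behavior": [
        "offering/seeking backup",
        "error correction and feedback",
    ],
    "team decision making": [
        "planning and establishing goals",
        "contingency planning",
        "updating and revising goals",
    ],
}
SCALE = {
    1: "poor",
    2: "marginal",
    3: "neutral/acceptable",
    4: "good",
    5: "very effective",
}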

TABLE 1

TABLE 2


Observation Procedures

We observed nurse-to-nurse morning handoffs and multidisciplinary rounds in a 15-bed surgical ICU at a large academic hospital in the mid-Atlantic United States. Simulated code videos from the same institution were also viewed. These tasks were chosen because they represented both action-oriented (direct patient care, simulated codes) and transition-oriented (planning and establishing goals; handoffs and rounds) team tasks. The observability of certain team behaviors is contingent upon the task being observed (21), and selecting these specific tasks ensured that the reliability and validity of each teamwork competency could be assessed. Rounds and handoffs were also selected because they were routinely conducted around the same time every day, making it logistically easier for observers to capture performance. Finally, few action team tasks in the ICU are readily observable, and reviewing recorded simulated codes provided an accessible way to observe an otherwise unpredictable and sensitive task.

Two raters (A.S.D., M.A.R.) who helped develop the behavioral marker system and had expertise in behavioral measurement conducted all observations. Raters scored each team on subdimensions of teamwork germane to each task, which were informed by previous studies describing competencies expected for action-oriented and transition-oriented tasks (22, 23), other global competencies (24), and input from clinician team members. Raters initially practiced using the system: they observed six handoffs, eight multidisciplinary rounds, and 25 simulated codes, independently rated performance on each, and then shared their ratings and rationales. Data collected from these practice observations were not included in our final analyses. Practice observations also afforded an opportunity to confirm which teamwork competencies manifested for a given task. For example, we found that although a general plan of care was established during multidisciplinary rounds, the explicit delineation of roles and responsibilities to achieve those goals occurred immediately after rounds.

Raters observed the same 138 instances of teamwork (25 nurse-to-nurse morning handoffs, 88 multidisciplinary rounds, and 25 simulated code exercises) and independently rated performance on the subdimension competencies for each task (illustrated in Appendix 3, Supplemental Digital Content 3, http://links.lww.com/CCM/D991). An instance of teamwork was defined by the interaction among team members for a specific patient for a given task. For example, patient rounds may involve an attending physician, fellow, resident, and a nurse assigned to a patient, as well as ancillary staff such as pharmacists, respiratory therapists, and physical therapists. Because the composition of a team could vary across these tasks for each patient, a given team could be observed for only one task.


Behavioral Marker System Reliability and Validity

We assessed reliability of the behavioral marker rating system in two ways. Interrater reliability was assessed to compare the consistency of scoring between raters (25). Intraclass correlation coefficients (ICCs) were calculated for each task overall and for each subdimension; both the single measure, comparing agreement of raters on a single score, and the average measure of the two raters’ scores are reported. An ICC less than 0.40 indicated poor reliability between raters, 0.40–0.59 fair reliability, 0.60–0.74 good reliability, and greater than or equal to 0.75 excellent reliability (26). A two-way random-effects model with absolute agreement was used to provide a conservative ICC estimate (27). SPSS v.21 (IBM, Armonk, NY) was used for this analysis.
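
For readers who want to reproduce this style of analysis outside of SPSS, the sketch below computes the same ICC model, ICC(2,1) for the single measure and ICC(2,k) for the average measure under a two-way random-effects model with absolute agreement, using the open-source pingouin library in Python. The data values and column names are hypothetical and serve only to show the required long format.

# Minimal sketch (not the study's code): ICC(2,1) and ICC(2,k) under a
# two-way random-effects model with absolute agreement, via pingouin.
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one row per (instance, rater) score.
df = pd.DataFrame({
    "instance": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "rater": ["A", "B"] * 6,
    "score": [4, 4, 3, 2, 5, 4, 2, 2, 4, 3, 3, 3],
})

icc = pg.intraclass_corr(data=df, targets="instance",
                         raters="rater", ratings="score")
# ICC2 = single measure; ICC2k = average of the two raters' scores.
print(icc.set_index("Type").loc[["ICC2", "ICC2k"], ["ICC", "CI95%"]])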

Generalizability theory was applied to further examine reliability and provide evidence of construct validity. We chose generalizability theory because it leverages an analysis of variance for behavioral measurement data. It can partition multiple sources of systematic variance (both desirable and undesirable), allowing for a better evaluation of the dependability of a measurement instrument. Traditional reliability testing cannot make this differentiation and treats all measurement error as random variation. Because generalizability theory assesses both desired and undesired variance, it represents a powerful methodologic approach for providing evidence of reliability and validity (28–30).

Generalizability theory has been previously applied in healthcare to validate measurement instruments (17, 29, 31). It is leveraged in the present study to estimate the variance in scores attributable to instances of teamwork observed, subdimensions, raters, tasks, and associated interactions (illustrated in Appendix 2, Supplemental Digital Content 3, http://links.lww.com/CCM/D991). A valid measurement system should demonstrate systematic differences in how subdimensions are scored across teams. Therefore, the following pattern of results is expected. The percent of variance associated with the subdimension by instance of teamwork interaction should be the greatest source of variance, followed by the main effects of subdimensions and instances of teamwork. The percent of variance associated with rater effects and task effects should be minimal because large values would represent inconsistent scoring across raters and tasks.

In the present study, analyses were performed using EduG v.6.1 (32), which enumerated each source of variance and calculated a generalizability coefficient. Generalizability coefficients estimated the amount of variance in observed scores attributable to desired sources of variance (e.g., differentiating performance and competencies) compared with undesired/unexpected variance (e.g., rater effects); higher coefficients indicate better measurement systems (33). Relative generalizability coefficients measured the extent to which the marker system could make comparative distinctions (team A performed better than team B), whereas absolute coefficients measured exact differences (team A averaged a 4.2 across subdimensions and team B averaged a 3.6 across subdimensions). A coefficient above 0.80 was used as our cutoff score (32) for acceptable dependability of the marker system, and the percent of variance attributable to each source is reported for all analyses.
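
To make these definitions concrete, the sketch below computes variance components and both coefficients for the simplest fully crossed design (instances of teamwork × raters) directly from expected mean squares. It is a toy example with hypothetical data under that simplified design; the study's actual analyses used EduG with the nested, multifaceted designs described below.

# Illustrative one-facet G-study (instances x raters), hypothetical data.
import numpy as np

scores = np.array([
    [4, 4], [3, 2], [5, 4], [2, 2], [4, 3], [3, 3],
], dtype=float)  # rows = instances of teamwork, columns = raters
n_p, n_r = scores.shape

grand = scores.mean()
ms_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
resid = (scores - scores.mean(axis=1, keepdims=True)
         - scores.mean(axis=0, keepdims=True) + grand)
ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

# Variance components from expected mean squares.
var_pr = ms_pr                          # interaction/residual error
var_p = max((ms_p - ms_pr) / n_r, 0.0)  # instances (desired variance)
var_r = max((ms_r - ms_pr) / n_p, 0.0)  # raters (undesired variance)

# Relative coefficient supports rank-order comparisons (team A vs. team B);
# the absolute coefficient supports interpreting exact scores.
g_rel = var_p / (var_p + var_pr / n_r)
g_abs = var_p / (var_p + (var_r + var_pr) / n_r)
print(f"relative G = {g_rel:.2f}, absolute G = {g_abs:.2f}")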

We applied generalizability studies to examine the three tasks for four primary sources of systematic variance: instances of teamwork, rater effects, subdimension effects of the marker system, and task effects. A secondary analysis explored systematic effects associated with the attending physician leading rounds. A power analysis was not appropriate because generalizability studies are not based on hypothesis testing (34). However, the sample sizes reported for each analysis were consistent with previous research (17, 30, 35).

All measurement sources were treated as random for analysis to provide a more conservative account of study findings. We used a mixed design to estimate all possible variance in the dataset. Instances of teamwork were nested within the observed tasks (handoffs, rounds, simulated codes) because team composition varied each time a task was performed in the ICU. The nested design prevented estimation of some variance components because of confounding (e.g., the instance of teamwork and task interaction was indistinguishable from the main effect of the task) (36). Additionally, we sought to examine any change in variance when the data for each task were combined compared with when they were analyzed separately. Therefore, six separate generalizability studies were conducted to analyze the patterns of results.


Transition Team Tasks

Our analysis of transition tasks included four sources of variance: instances of teamwork nested in tasks, raters, and six subdimensions (from the communication and team decision-making dimensions) relevant to both rounds and handoffs (“analysis 1”). Instances of teamwork for rounds were randomly sampled from a single attending for this analysis to avoid introducing potential confounding associated with leadership effects. We then conducted separate analyses for handoffs (“analysis 2”) and rounds (“analysis 3”) to examine differences when the data for these two tasks were investigated separately. Next, we analyzed all rounding data to explore potential leader effects during rounds (“analysis 4”); 16 instances of teamwork were randomly sampled per leader (n = 4). The variance sources examined were instances of teamwork nested within the attending physicians leading rounds, raters, and subdimensions. In addition to the six subdimensions explored in analyses 1–3, we assessed team norms, and error correction and feedback.


Action Team Task

“Analysis 5” was a generalizability study for the action team task (codes) and included three sources of variance: instances of teamwork, raters, and subdimensions (communication style, content, and closed-loop communication; task management/delegation; and offering/seeking backup).


Global Teamwork Competency

Every subdimension of communication manifested during each task. Therefore, we examined variance associated with this dimension across tasks (“analysis 6”). For this analysis, sources of variance included instances of teamwork, raters, and subdimensions.


RESULTS

Interrater Reliability

Rater scores for each overall task had good correlation for single-score comparisons (ICC, 0.69 for rounds; ICC, 0.64 for handoffs; and ICC, 0.62 for simulated codes) and excellent correlation for averaged measures (ICC, 0.81 for rounds; ICC, 0.78 for handoffs; and ICC, 0.76 for simulated codes) (Table 3). Across all tasks, interrater reliability was fair for seven subdimensions and poor for one. Interrater reliability for communication style showed excellent correlation for rounds (ICC, 0.75 single measure; ICC, 0.86 average measure) and poor-to-fair correlation for simulated codes (ICC, 0.38 single measure; ICC, 0.55 average measure).

TABLE 3


Validity Testing of Behavioral Marker System

The analysis of variance for each generalizability study is summarized in Appendix 2 (Supplemental Digital Content 3, http://links.lww.com/CCM/D991). Outside of the residual error term (unexplained variance), the interaction of subdimensions and instances of teamwork accounted for the largest proportion of variance in each analysis, meaning subdimensions were differentially scored by raters for each instance of teamwork. The main effects of subdimensions and instances of teamwork generally accounted for the second largest sources of variance. A notable exception was simulated codes (analysis 5), wherein the subdimension main effect accounted for only 5.8% of the total variance.

Variance due to overall rater effects never surpassed 0.7%, demonstrating minimal systematic differences in the raters’ scores for instances of teamwork, subdimension competencies, and tasks (Appendix 2, Supplemental Digital Content 3, http://links.lww.com/CCM/D991). A relatively large variance (14%) was attributed to one rater systematically scoring some instances of teamwork (averaged over subdimensions) higher than the other rater for simulated codes (analysis 5).

Table 4 presents the generalizability coefficients. The marker system differentiated among subdimensions regardless of rater, task, or instances of teamwork that were observed across all analyses except simulated codes, which approached conventional criteria (0.76 relative, 0.66 absolute). When instances of teamwork were viewed as the only desired source of variance (i.e., were there overall differences in how teams performed regardless of subdimensions?), only analysis 4 approached conventional standards.

TABLE 4


DISCUSSION

This study evaluated the reliability and validity of a behavioral marker system for assessing ICU team performance during multidisciplinary rounds, nurse-to-nurse handoffs, and simulated code events. We found that the behavioral marker system 1) differentiated teamwork competencies and 2) reliably captured teamwork differences within a particular instance of teamwork. This means that raters judged performance differently based on the competency they were observing during each instance of teamwork and that each competency represented a unique aspect of teamwork. This finding justifies the use of this tool to capture how a specific team is performing across a variety of competencies relevant to both action and transition tasks. The tool could be used in learning and development to determine where some teams are performing better than others. Furthermore, variance attributable to rater effects across all analyses was marginal.

The marker system did not reliably differentiate between high- and low-performing teams for handoffs and codes unless competencies were treated as a desired source of variation in the generalizability study analysis. Thus, the marker system will have greater utility for formative, rather than strictly summative, evaluations or assessments.

Additionally, we found that about 30% of variance in each analysis was from residual (unexplained) error. The unexplained error could have stemmed from such factors as experience of team members, patient complexity, and task interruptions. For instance, complex patients generally require more resource and contingency planning, which could influence team behaviors. Future research would benefit from understanding the implications of these factors on the reliability of behavioral measurement.

Although interrater reliability ranged from good to excellent overall, there were seven instances in which reliability was fair and one in which it was poor. Low reliability values reduce confidence that raters are consistently scoring the same attribute. They may also underscore the challenges associated with behavioral measurement. Teamwork in the ICU is complex, thereby complicating the rating of teamwork behaviors. To illustrate, a single statement from a clinician could involve behaviors related to updating and revising goals (e.g., the patient did respond to a certain treatment), planning and establishing goals (e.g., consults with outside services and/or additional tests are suggested), and contingency planning (e.g., there are no signs of active bleeding, but the situation needs monitoring). Capturing all of this information is a difficult undertaking for raters, especially when the behaviors occur in rapid succession or when teams have more members for raters to attend to. Costa et al (18), for example, found rating team behaviors in an ICU challenging for similar reasons.

Another key finding was that the observability of specific teamwork competencies varied across team performance contexts. Communication was consistently observed across tasks, but team decision making was mostly observed in transition tasks and backup and supportive behavior in action tasks. We expected leadership to be globally relevant to both action- and transition-oriented team tasks, but leadership behaviors did not manifest during nurse-to-nurse handoffs.

There were limitations to our study. The marker system was used for both in-person and video-recorded observations, and these differences may have influenced our findings. However, code events are unplanned and infeasible to capture in real time, yet they represent a critical period when effective team performance is paramount for patient safety. Biases intrinsic to observational research may have influenced study findings. For instance, direct observation of clinician behavior may have altered that behavior. Raters may have been susceptible to the contrast effect (i.e., comparing a current instance with the previous instance rather than relying on the behavioral markers) (37). Logistical challenges inherent in care transitions and codes (e.g., opportunities/ability to observe instances of teamwork) limited our ability to have consistent sample sizes across tasks. Additionally, we only tested the marker system in one surgical ICU, and our results may not generalize to other types of ICUs or hospitals. Finally, definitions of teams, teamwork, and team competencies vary widely in the critical care literature (16) and healthcare more broadly (38). This diversity of terminology and conceptualization limits the development of assessment tools that can be broadly implemented.


CONCLUSIONS

Teamwork skills are essential to provide safe and efficient care in the ICU. Measuring teamwork in critical care environments poses unique challenges, including highly diverse and dynamic team compositions, variability in physical and temporal distributions, and extreme variety in types of team tasks, ranging from highly cognitive and analytical tasks requiring collaborative problem solving to action-oriented procedural and physical tasks requiring behavioral coordination. Our findings support the validity of this tool and its utility for evaluating team performance for multiple task types.


ACKNOWLEDGMENTS

We thank Chris Holzmueller (Armstrong Institute for Patient Safety and Quality, Johns Hopkins University School of Medicine) for her insightful feedback and contributions during the review process.


REFERENCES

1. Pham JC, Aswani MS, Rosen M, et al. Reducing medical errors and adverse events. Annu Rev Med 2012; 63:447–463
2. Schmutz J, Manser T. Do team processes really have an effect on clinical performance? A systematic literature review. Br J Anaesth 2013; 110:529–544
3. Weaver SJ, Dy SM, Rosen MA. Team-training in healthcare: A narrative synthesis of the literature. BMJ Qual Saf 2014; 23:359–372
4. Hughes AM, Gregory ME, Joseph DL, et al. Saving lives: A meta-analysis of team training in healthcare. J Appl Psychol 2016; 101:1266–1304
5. Reader TW, Flin R, Mearns K, et al. Developing a team performance framework for the intensive care unit. Crit Care Med 2009; 37:1787–1793
6. Profit J, Sharek PJ, Amspoker AB, et al. Burnout in the NICU setting and its relation to safety culture. BMJ Qual Saf 2014; 23:806–813
7. Pronovost PJ, Thompson DA, Holzmueller CG, et al. Toward learning from patient safety reporting systems. J Crit Care 2006; 21:305–315
8. Salas E, DiazGranados D, Klein C, et al. Does team training improve team performance? A meta-analysis. Hum Factors 2008; 50:903–933
9. Neily J, Mills PD, Young-Xu Y, et al. Association between implementation of a medical team training program and surgical mortality. JAMA 2010; 304:1693–1700
10. Federal Aviation Administration, U.S. Department of Transportation: Advisory Circular AC 120-51E: Crew Resource Management Training. 2004. Available at: https://www.faa.gov/regulations_policies/advisory_circulars/index.cfm/go/document.information/documentID/22879. Accessed January 10, 2018
11. Flin R, Martin L. Behavioral markers for crew resource management: A review of current practice. Int J Aviat Psychol 2001; 11:95–118
12. Russ S, Hull L, Rout S, et al. Observational teamwork assessment for surgery: Feasibility of clinical and nonclinical assessor calibration with short-term training. Ann Surg 2012; 255:804–809
13. Fletcher G, Flin R, McGeorge P, et al. Anaesthetists’ Non-Technical Skills (ANTS): Evaluation of a behavioural marker system. Br J Anaesth 2003; 90:580–588
14. Sevdalis N, Lyons M, Healey AN, et al. Observational teamwork assessment for surgery: Construct validation with expert versus novice raters. Ann Surg 2009; 249:1047–1051
15. Mitchell L, Flin R, Yule S, et al. Evaluation of the scrub practitioners’ list of intraoperative non-technical skills system. Int J Nurs Stud 2012; 49:201–211
16. Dietz AS, Pronovost PJ, Benson KN, et al. A systematic review of behavioural marker systems in healthcare: What do we know about their attributes, validity and application? BMJ Qual Saf 2014; 23:1031–1039
17. Weller J, Frengley R, Torrie J, et al. Evaluation of an instrument to measure teamwork in multidisciplinary critical care teams. BMJ Qual Saf 2011; 20:216–222
18. Costa DK, Dammeyer J, White M, et al. Interprofessional team interactions about complex care in the ICU: Pilot development of an observational rating tool. BMC Res Notes 2016; 9:408
19. O’Leary KJ, Boudreau YN, Creden AJ, et al. Assessment of teamwork during structured interdisciplinary rounds on medical units. J Hosp Med 2012; 7:679–683
20. Dietz AS, Pronovost PJ, Mendez-Tellez PA, et al. A systematic review of teamwork in the intensive care unit: What do we know about teamwork, team tasks, and improvement strategies? J Crit Care 2014; 29:908–914
21. Dietz AS, Rosen MA, Wyskiel R, et al. Development of a behavioral marker system to assess intensive care unit team performance. Proc Hum Factors Ergon Soc Annu Meet 2015; 59:991–995
22. Marks MA, Mathieu JE, Zaccaro SJ. A temporally based framework and taxonomy of team processes. Acad Manag Rev 2001; 26:356–376
23. LePine JA, Piccolo RF, Jackson CL, et al. A meta-analysis of teamwork processes: Tests of a multidimensional model and relationships with team effectiveness criteria. Pers Psychol 2008; 61:273–307
24. Cannon-Bowers JA, Tannenbaum SI, Salas E, et al. Defining competencies and establishing team training requirements. In: Salas E, Guzzo RA (Eds). Team Effectiveness and Decision Making in Organizations. San Francisco, CA: Jossey-Bass, 1995, pp 333–380
25. Tinsley HE, Weiss DJ. Interrater reliability and agreement of subjective judgments. J Couns Psychol 1975; 22:358–376
26. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 1994; 6:284–290
27. Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull 1979; 86:420–428
28. Brennan RL. Generalizability Theory. New York, NY: Springer, 2001
29. Crossley J, Marriott J, Purdie H, et al. Prospective observational study to evaluate NOTSS (Non-Technical Skills for Surgeons) for assessing trainees’ non-technical performance in the operating theatre. Br J Surg 2011; 98:1010–1020
30. Kraiger K, Teachout M. Generalizability theory as construct-related evidence of the validity of job performance ratings. Hum Perform 1990; 3:19–35
31. Moonen-van Loon JM, Overeem K, Govaerts MJ, et al. The reliability of multisource feedback in competency-based assessment programs: The effects of multiple occasions and assessor groups. Acad Med 2015; 90:1093–1099
32. Cardinet J, Johnson S, Pini G. Applying Generalizability Theory Using EduG. New York, NY: Routledge, 2011
33. Cardinet J, Tourneur W, Allal L. The symmetry of generalizability theory: Applications to educational measurement. J Educ Meas 1976; 13:119–135
34. Crossley J, Russell J, Jolly B, et al. ‘I’m pickin’ up good regressions’: The governance of generalisability analyses. Med Educ 2007; 41:926–934
35. Mathieu JE, Day DV. Assessing processes within and between organizational teams: A nuclear power plant example. In: Brannick MT, Salas E, Prince CW (Eds). Team Performance Assessment and Measurement: Theory, Methods, and Applications. Mahwah, NJ: Lawrence Erlbaum Associates, 1997, pp 173–195
36. Shavelson RJ, Webb NM. Generalizability Theory: A Primer. Measurement Methods for the Social Sciences Series, Vol 1. Thousand Oaks, CA: Sage Publications, 1991. Available at: http://search.proquest.com/docview/618053236?accountid=14521. Accessed January 10, 2018
37. Feldman M, Lazzara EH, Vanderbilt AA, et al. Rater training to support high-stakes simulation-based assessments. J Contin Educ Health Prof 2012; 32:279–286
38. Rosen MA, DiazGranados D, Dietz AS, et al. Teamwork in healthcare: Key discoveries enabling safer, high-quality care. Am Psychol 2018; 73:433–450
Keywords:

group processes; intensive care unit; interdisciplinary communication; patient safety; quality improvement; teamwork

Copyright © 2018 by the Society of Critical Care Medicine and Wolters Kluwer Health, Inc. All Rights Reserved.