Debriefing Assessment for Simulation in Healthcare: Development and Psychometric Properties : Simulation in Healthcare


Empirical Investigations

Debriefing Assessment for Simulation in Healthcare

Development and Psychometric Properties

Brett-Fleegler, Marisa MD; Rudolph, Jenny PhD; Eppich, Walter MD, MEd; Monuteaux, Michael ScD; Fleegler, Eric MD, MPH; Cheng, Adam MD; Simon, Robert EdD

Simulation in Healthcare: The Journal of the Society for Simulation in Healthcare 7(5):p 288-294, October 2012. | DOI: 10.1097/SIH.0b013e3182620228



This study examined the reliability of the scores of an assessment instrument, the Debriefing Assessment for Simulation in Healthcare (DASH), in evaluating the quality of health care simulation debriefings. The secondary objective was to evaluate whether the instrument’s scores demonstrate evidence of validity.


Two aspects of reliability were examined, interrater reliability and internal consistency. To assess interrater reliability, intraclass correlations were calculated for 114 simulation instructors enrolled in webinar training courses in the use of the DASH. The instructors reviewed a series of 3 standardized debriefing sessions. To assess internal consistency, Cronbach α was calculated for this cohort. Finally, 1 measure of validity was examined by comparing the scores across 3 debriefings of different quality.


Intraclass correlation coefficients for the individual elements were predominantly greater than 0.6. The overall intraclass correlation coefficient for the combined elements was 0.74. Cronbach α was 0.89 across the webinar raters. There were statistically significant differences among the ratings for the 3 standardized debriefings (P < 0.001).


The DASH scores showed evidence of good reliability and preliminary evidence of validity. Additional work will be needed to assess the generalizability of the DASH based on the psychometrics of DASH data from other settings.

Changes in graduate and postgraduate health care education over the past 2 decades bear witness to a paradigm shift toward competency-based medical education and the requisite accompanying expansion of formative and summative assessment processes and tools.1,2 Simultaneously, there has been exponential growth of simulation in health care education and research.3–7 Simulation offers tremendous advantages to health care educators, including the opportunity to practice managing critical but infrequent events and the chance to practice procedures in a safe environment. Training programs around the world increasingly rely on simulation to prepare and assess clinical learners.8–16 Whether for just-in-time practice for difficult cases at the point of care17,18 or for communication and teamwork-related training,19,20 simulation has tremendous support as evidenced by its widespread and expanding use. Beyond its uses in undergraduate and graduate training, simulation can be used to assess educational needs at the established practitioner level and to provide continuing health care education.21 The convergence of this educational shift and the expansion of health care simulation provide the opportunity to use simulation in support of competency-based education. A crucial ingredient when using simulation for technical or behavioral and teamwork skills is debriefing.22–24

Debriefing is a facilitated conversation after such things as critical events and simulations in which participants analyze their actions, thought processes, emotional states, and other information to improve performance in future situations.25 Debriefing embodies 3 important aspects of the experiential nature of adult learning: reflection, feedback, and future experimentation.26,27 Reflecting on one’s own clinical or professional practice is a crucial step in the experiential learning process23 because it helps learners develop and integrate insights from direct experience into later action.26,28 Effective debriefing requires clear, honest feedback in the context of a psychologically safe environment.25,29,30

Given the expansion of simulation-based assessment and the pivotal role of debriefing, a tool that yields reliable data that support valid judgments of an instructor’s debriefing competence has the potential to facilitate instructor training and evaluation. Although there are several tools to assess debriefings in specialized settings, to date, there is no standardized instrument to assess debriefings in a wide variety of health care simulation contexts. A recent literature review yielded 5 instruments that measure the quality of debriefings in specialized contexts, 3 from health care simulation along with 2 instruments from other fields. The 2 instruments from related domains are the Debriefing Assessment Battery, developed by Dismukes et al31 for use in aviation, and a communication skills trainer/facilitator assessment system developed by Bylund.32 Although these 2 tools contained some debriefing themes pertinent to health care education, the differences in context limit their use and generalizability to debriefing in health care. In health care, Gururaja et al33 developed a 25-item observation-based assessment instrument for in situ simulations for operating room teams; Reed34 developed a rating scale for learners in nursing simulations to assess their subjective experience of the debriefing. Arora et al35 developed the Objective Structured Assessment of Debriefing to assess surgery simulation debriefings. These tools, although potentially valuable additions to the health care educator’s debriefing assessment toolbox, are not designed to be applicable to debriefings across the health care education spectrum.

The development of the Debriefing Assessment for Simulation in Healthcare (DASH) tool is intended to address the need for a debriefing assessment tool based on a behaviorally anchored rating scale (BARS) that has the potential to provide valid and reliable data for use in a wide variety of settings in simulation-related health care education.

This article describes the development and then reviews the psychometric properties of the DASH. The primary objective was to address 2 aspects of reliability: interrater reliability and internal consistency. The secondary objective was to examine the DASH scores for evidence of validity. Interrater reliability refers to consistency in ratings among different raters. Another aspect of reliability, internal consistency, is a statistic that indicates the extent to which the items of a test measure the same trait, knowledge, or ability. Validity “refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.”36 The DASH’s validity will be supported by (1) detailing the development process and the origins of its content and (2) showing data that demonstrate the DASH’s ability to discriminate between varying levels of debriefing performance in an expected manner.


DASH Structure

The DASH37 is a 6-element, unweighted, criterion-referenced behaviorally anchored rating scale (BARS; Table 1). Six elements comprising a debriefing are defined, and raters are asked to compare observed performance to the defined elements. This approach is the basis of criterion-referenced testing, where scores are given in relation to a well-defined behavioral domain, versus norm-referenced testing, where scores are meant to compare 1 examinee to another. Element ratings are based on a 7-point effectiveness scale (Table 2). The rater handbook and score sheet are available as supplemental content (see Document, Supplemental Digital Content 1, DASH handbook 2010, and Document, Supplemental Digital Content 2, DASH short score sheet). Each element in the DASH is a concept that describes a whole area of debriefing behavior, for example, “provokes engaging discussions.” Each element is elaborated via dimensions that reflect a part of the concept, for example, “facilitates discussion through verbal and nonverbal techniques.” Observable examples of positive and negative behaviors are provided for each dimension, for example, “paraphrasing or verbally mirroring what trainees say.” These observable behaviors are the behavioral anchors of the DASH. Because of the complexity of debriefing and the rigor expected of DASH raters, training is needed before application of the DASH.

Table 1. DASH Elements and Dimensions
Table 2. Rating Scale

DASH Content

The DASH aims to assess those instructor behaviors that evidence and theory indicate facilitate learning and change in experiential contexts. Typically, BARS content is elicited from domain experts.38–40 In the absence of a theoretical or empirical consensus regarding the optimal behaviors for health care simulation debriefings, the DASH was constructed on the premise that research findings and theory from related domains logically transfer to debriefing and could be used to augment BARS content from the traditional approach. Specifically, the DASH synthesizes findings from aviation debriefing; clinical learning and teaching; formative assessment; adult, experiential, and organizational learning; deliberate practice; and the cognitive, emotional, and behavioral bases for mobilizing change in adults.26,31,41–47 For the conventional BARS development approach of eliciting task-related behaviors and categories from domain experts, the developers drew on their domain expertise. Collectively, they have conducted more than 5000 debriefings, and through their simulation instructor training activities, they have observed and provided feedback on more than 2500 debriefings by instructors with a broad range of debriefing styles and skill levels from Asia, Oceania, North America, Europe, and Central and South America.

Using the content outlined previously, the DASH elements were identified and refined in an iterative process known as theory elaboration48–50 in which the test developers worked back and forth between high-level constructs suggested by the literature, their own experience, and semistructured interviews with established debriefing instructor trainers from other simulation centers in North America, Europe, and Australia. The DASH developers thereby identified a set of activities generally accepted as best practices for effective and ineffective debriefing from a broad range of fields and debriefing styles, pertinent to the guiding DASH design principle that it should be applicable to the assessment of a wide variety of universal debriefing behaviors and not linked to any particular debriefing style. Thus, the DASH is designed to assess debriefing quality in a variety of simulation environments, across health care disciplines and educational objectives.

Certain points of the rater training are worth mentioning. The 6 elements were constructed to make them as distinct from one another as possible. Given the nature of describing behavior, it is understandable that raters perceive that the elements have some overlap. They are instructed to ignore this perceived overlap and to rate each element independently of the others. Raters are taught to give element scores, but the scores are not an average of the dimensions. No explicit weighting is given to the dimensions; raters are taught to make those judgments themselves. The dimensions and behavioral examples are intended to provide guidelines and examples but are not intended as checklist items; they are integrated into a global rating at the element level by the rater. This approach is consistent with research supporting the use of global rather than checklist ratings for the evaluation of complex behaviors.51–53

Piloting and User Review

After a working version of the DASH was constructed using psychometric and instructional design methods,54,55 it was reviewed for content and usability. Eight simulation experts from 5 different pediatric tertiary care academic medical centers in the United States and Canada participated in a 2-day in-person intensive review session in October 2008. All 8 experts were from centers participating in the Examining Pediatric Resuscitation Education using Simulation and Scripting (EXPRESS) project,5 a multicenter study examining the impact of scripted debriefing and level of simulation realism on Pediatric Advanced Life Support educational outcomes after simulation sessions. All are practicing physicians in pediatric emergency medicine, critical care, or anesthesia with a minimum of 5 years of experience in simulation and debriefing. For the first round of feedback and revision, each investigator studied the draft rater’s handbook and then discussed each element, posing clarifying questions and suggesting edits to make the language clearer. In the next round, 2 demonstration videos and 2 EXPRESS debriefing videos were reviewed and scored by all 8 experts. Modifications were again made to the DASH based on feedback obtained during this process. The feedback led to a refinement of element titles, reassignment of some dimensions to other elements and the addition of a new dimension regarding the demonstrated content expertise of the debriefer. In addition, this round resulted in more concrete and precise descriptions of behavioral anchors, refinement of the layout of the DASH rater’s handbook for ease of use, and refinement of the layout of the score sheet, condensing it to 2 pages. For the last round, a teleconference format was used, which led to minor final revisions of the language of the elements, dimensions, and behaviors to better reflect terminology familiar to clinician educators.

Psychometric Assessment

One hundred fifty-one international health care educators participated in 4.5-hour Web-based interactive DASH rater training sessions. Anonymous IP addresses were used to identify each participant. However, because of shared computer networks at some institutions, not all participants were uniquely identifiable, and some data were therefore excluded from analysis. Ratings from a total of 114 rater trainees, drawn from 2 separate training sessions, were analyzed to assess reliability and validity. This research was reviewed by the Partners Healthcare Human Research Ethics Committee and determined to be exempt.

The participants included nurses, physicians, other health professionals, and Masters and PhD educators; their work environments ranged from community-based hospitals to academic medical centers. A training session consisted of 4 steps. First, the rater trainees were asked to thoroughly familiarize themselves with the DASH rater’s handbook before the Web-based session; specifically, they were asked to study the 6 elements and develop a working knowledge of the dimensions in each element. At the beginning of the session, the trainers provided a brief didactic summary of each DASH element with highlights of each dimension. Next, the trainers described and illustrated best practices and common pitfalls for rating in general and for the DASH in particular. Finally, in 3 consecutive rounds, the rater trainees watched, rated, and then discussed 3 separate course introductions and subsequent debriefings.

The introductions and debriefings comprised 3 scripted videos that were produced for rater training to exemplify superior, average, and poor debriefings. The debriefings were conducted with 3 different groups of learners who had managed a clinical simulation involving pulseless electrical activity due to pneumothorax. The clinical and behavioral objectives of the case included (1) identification and management of pneumothorax, (2) establishing roles clearly, and (3) team leadership with a focus on stating an action plan. The 3 debriefings used archetypes for superior, average, and poor debriefings suggested by practical experience and the literature related to debriefing, particularly research on the role of psychologic safety, feedback, and reflection in learning. To develop the 3 debriefings, criteria for the quality of feedback conversations described in the organizational behavior, productive conversations, and feedback literature were used: whether the debriefer provides clear and actionable information about the performance of the learners, to what degree the debriefer created a psychologically safe learning environment that allows for specific feedback for key behaviors, and the degree to which the debriefer followed understandable phases of a debriefing.41,46,56–58 The rater trainees had no prior knowledge of any aspect of the debriefings they viewed, including these archetypes.

For each of the 3 rounds of ratings, scores were compiled and posted online in real time. The instructors then led a group discussion on participant ratings to provide reinforcement, corrections, and adjustments. Trainers elicited the participants’ rationales for their ratings and helped calibrate trainees’ assessments to the elements of the DASH. The guiding principle was to rate “through the eyes of the DASH” so as to help participants compare and adjust their ratings to the criteria set forth in the DASH, as the optimal means to obtain rater convergence and interrater reliability.59

Assessment of Reliability and Validity: Statistical Analysis

The following statistical analyses, which use parametric inference, relied on the assumption that the variables of interest were normally distributed. The assumption of normality was considered reasonable given the robustness of the employed tests to deviations from normality, visual inspection of the data, and scrutiny of descriptive statistics (ie, skewness, kurtosis).

Interrater reliability was assessed for the 114 webinar raters’ scores at the element level and for the overall mean of the 6 elements. Variance component analysis was used to calculate intraclass correlation coefficients (ICCs), which represent the ratio of rater variance to the sum of rater variance and the total variance.
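As an illustration of how an ICC can be obtained from variance components, the following is a minimal Python sketch. It assumes a one-way random-effects model, ICC(1,1), in which the rated targets are the debriefing videos; the article does not state which ICC form was used, and the analyses were run in STATA, so this is not the authors' procedure. The function name `icc_oneway` and the toy scores are hypothetical.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) (an assumed form, not the authors').

    ratings: one row per rated target (e.g. a debriefing video); each row
    holds the scores that k raters gave to that target.
    """
    n = len(ratings)        # number of targets
    k = len(ratings[0])     # number of raters per target
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Between-target mean square: how far target means sit from the grand mean.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-target mean square: disagreement among raters on the same target.
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical scores on the 7-point scale: 3 videos, 4 raters each.
toy_ratings = [
    [6, 7, 6, 7],   # "superior" video
    [4, 4, 5, 3],   # "average" video
    [2, 1, 2, 2],   # "poor" video
]
print(round(icc_oneway(toy_ratings), 3))
```

In this form, the ICC approaches 1 when raters agree closely on each video relative to the spread between videos, and it falls as within-video disagreement grows.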

To assess the internal consistency of the tool, Cronbach α was calculated using the same webinar data set for the “average” video. This video was selected because it was considered the most difficult to rate: rather than representing an extreme of performance, it blended effective and ineffective behaviors. In addition, this debriefing was rated when the raters had received the most training, at the end of the webinar.
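For readers unfamiliar with the statistic, the standard formula for Cronbach α can be sketched in a few lines of Python. The layout assumed here (one row per rater, one column per DASH element) and the function name `cronbach_alpha` are illustrative choices, and the scores are invented rather than study data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach alpha for a rater-by-item score matrix.

    scores: one row per rater, one column per item (here, a DASH element).
    alpha = k/(k-1) * (1 - sum of item variances / variance of row totals)
    """
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # per-item variance
    total_var = variance([sum(row) for row in scores])   # variance of totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 raters scoring 6 elements on the 7-point scale.
toy_scores = [
    [4, 4, 5, 4, 3, 4],
    [5, 5, 5, 4, 4, 5],
    [3, 3, 4, 3, 3, 3],
    [4, 5, 4, 4, 4, 4],
    [2, 3, 3, 2, 2, 3],
]
print(round(cronbach_alpha(toy_scores), 2))
```

Values near 1 indicate that the items move together across raters, which is the sense in which the elements are said to measure the same underlying construct.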

To assess 1 aspect of the validity of the DASH, the mean scores for each of the videos across the 114 webinar rater trainees were calculated and compared by means of a 1-way repeated-measures analysis of variance.
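The structure of a 1-way repeated-measures analysis of variance can be sketched as follows, with raters as subjects and videos as the repeated factor. This is a minimal Python illustration of the textbook sums-of-squares decomposition, not the STATA procedure used in the study; the function name `rm_anova` and the toy scores are hypothetical. With k conditions and n subjects, the degrees of freedom are k - 1 and (k - 1)(n - 1), which gives 2 and 226 for 3 videos and 114 raters.

```python
from statistics import mean


def rm_anova(data):
    """One-way repeated-measures ANOVA.

    data: one row per subject (rater), one column per condition (video).
    Returns (F statistic, df for conditions, df for error).
    """
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    cond_means = [mean(col) for col in zip(*data)]
    subj_means = [mean(row) for row in data]

    # Partition the total sum of squares into condition, subject, and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Hypothetical overall scores: 4 raters, 3 videos (superior, average, poor).
toy_data = [
    [6, 4, 2],
    [7, 4, 1],
    [6, 5, 2],
    [7, 3, 2],
]
f_stat, df1, df2 = rm_anova(toy_data)
print(round(f_stat, 2), df1, df2)
```

Removing the between-subject sum of squares from the error term is what distinguishes this design from an ordinary 1-way analysis of variance: consistent leniency or severity of an individual rater does not count against the comparison among videos.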

All statistical analyses were performed using STATA version 12.0.


The ICCs for each element and for the instrument overall are reported in Table 3. Notably, the ICCs are nearly all greater than 0.6 for the individual elements. The ICC for element 5 is just below this at 0.57. The ICC for the DASH overall was 0.74. Cronbach α for the average video was 0.89 across the entire webinar rater data set.

Table 3. Intraclass Correlation Coefficients

The ratings for the superior, average, and poor videos are shown in Figure 1. The differences among the ratings of the 3 standardized debriefings were statistically significant (F = 486.2; df = 2, 226; P < 0.001).

Figure 1. DASH scores by group. Difference between groups is statistically significant (P < 0.001).


Debriefing has a long and important history of use in the military and aviation industries and has a rapidly expanding role within health care education.23,25,29–31,60,61 Regardless of the specific setting, the goal of debriefing remains the same: to promote reflection and learning and, ultimately, to thereby improve performance. In clinical practice and structured educational encounters, health care providers across the spectrum of training and professional life have many learning opportunities. There is evidence to suggest that simulation accompanied by high-quality debriefings facilitates the transfer of new knowledge, skills, and attitudes to the clinical domain, primarily through the enactment of the reflection stage of experiential learning and by providing the opportunity for the experimentation aspect of adult learning.7,62–65 An assessment tool that helps determine debriefing quality and provides debriefers with valuable feedback can provide crucial support for the educational processes within debriefing.

Data regarding the psychometric properties of the DASH in the context of training raters reveal promising interrater reliability and internal consistency. Although further evidence is required, support for DASH validity is grounded in both its content and the scores arising from its use. For the content, the extensive theoretical background and practice-based experience integrated into the DASH provide support for the content relevance of the DASH. The performance of the DASH scores, specifically the statistically significant difference between the scores for debriefings of varying quality, provides some nascent evidence for the validity of DASH scores. That is, the DASH was designed to measure the quality of debriefing performance, and DASH scores in this study did vary with the described varying debriefing archetypes. However, this evidence is limited by the actor-as-debriefer nature of the videos. More definitive evidence for validity will ideally be sought from the analysis of more complex and larger samples of debriefings. One such study is specifically planned for the data from the EXPRESS study.65,66 Ultimately, the optimal test for a rating tool such as the DASH is whether it predicts learning, not just debriefing quality.

The DASH is distinct from existing health care debriefing tools in 2 ways. Although such other tools exist, they are specifically intended for particular contexts. In addition, none of these debriefing assessment instruments provide behavioral anchors. The DASH and its behavioral anchors rely on extensive and well-defined domains of behavior from activities closely related to debriefing, the debriefing literature, and expert experience. Beyond the psychometric arguments in support of DASH reliability and validity presented here, the 3 levels of granularity of the DASH—elements, dimensions, and behavioral anchors—have the potential both to guide detailed formative assessment and to support rigorous summative assessment. The DASH handbook is a detailed description of the qualities and behaviors that comprise a debriefing. This may help educators use the DASH to provide feedback linked to specific areas of strength and areas for improvement.

Limitations to the present work include those common to all rating instruments and those specific to the DASH. As with all assessment tools, the data presented here speak only to the psychometrics of the DASH data in this particular setting. It is hoped that further studies will examine the properties of the DASH when used by raters from different backgrounds and different simulation settings. Similar to other behavior rating instruments, the DASH is limited in its use to trained users, and thus, rater training is a necessary step to its implementation. Another potential limitation of the DASH concerns its generalizability. Although the foundation of the DASH, through synthesis of relevant theory, empirical data from related fields, and the involvement of multiple experts in its conception, is intended to bridge differences in debriefing styles, how well the DASH is able to assess different debriefing strategies will require further investigation. Because there is no single criterion standard for debriefing quality, the DASH of necessity has judgments embedded within it regarding optimal debriefing behaviors. The DASH development process was aimed at identifying behaviors common to all effective debriefing styles, but the ultimate success of this endeavor will require further empirical evidence.


In conclusion, this study is a first step to our collective understanding of how the DASH performs. The evidence presented here suggests that, in the present setting, the DASH yields reliable data for the assessment of health care simulation debriefings. It is hoped that other studies of the instrument will help it become a useful tool to guide educators in their use of debriefing as a critical educational modality.


The authors thank the EXPRESS investigators; Elizabeth Hunt, MD, MPH, PhD; Monica Kleinman, MD; Vinay Nadkarni, MD, MS; Kristen Nelson McMillan, MD; and Akira Nishisaki, MD, without whom this work could not have been completed, and the authors thank John Boulet, PhD, and Heather L. Corliss, PhD, MPH, for their invaluable statistical guidance. The authors also thank the Simulation in Healthcare reviewers for their feedback that substantially improved this article.


1. Harden RM. Trends and the future of postgraduate medical education. Emerg Med J 2006; 23: 798–802.
2. Issenberg SB, McGaghie WC, Hart IR, et al. Simulation technology for health care professional skills training and assessment. JAMA 1999; 282: 861–866.
3. POISE. 2010. Available at: Accessed December 9, 2010.
4. Cheng A, Nadkarni V, Hunt E, Qayumi K, EXPRESS Investigators. A multifunctional online research portal for facilitation of simulation-based research: a report from the EXPRESS pediatric simulation research collaborative. Simul Healthc 2011; 6: 239–243.
5. Cheng A, Hunt EA, Donoghue A, et al.. EXPRESS—Examining Pediatric Resuscitation Education Using Simulation and Scripting. The birth of an international pediatric simulation research collaborative—from concept to reality. Simul Healthc 2011; 6: 34–41.
6. International Pediatric Simulation Society. 2011. Available at: Accessed April 25, 2011.
7. Cook DA, Hatala R, Brydges R, et al. Technology-enhanced simulation for health professions education: a systematic review and meta-analysis. JAMA 2011; 306: 978–988.
8. Nishisaki A, Nguyen J, Colborn S, et al. Evaluation of multidisciplinary simulation training on clinical performance and team behavior during tracheal intubation procedures in a pediatric intensive care unit. Pediatr Crit Care Med 2011; 12: 406–414.
9. Eppich WJ, Adler MD, McGaghie WC. Emergency and critical care pediatrics: use of medical simulation for training in acute pediatric emergencies. Curr Opin Pediatr 2006; 18: 266–271.
10. Volk MS, Ward J, Irias N, Navedo A, Pollart J, Weinstock PH. Using medical simulation to teach crisis resource management and decision-making skills to otolaryngology housestaff. Otolaryngol Head Neck Surg 2011; 145: 35–42.
11. Halamek LP. Teaching versus learning and the role of simulation-based training in pediatrics. J Pediatr 2007; 151: 329–330.
12. Cheng A, Duff J, Grant E, Kissoon N, Grant VJ. Simulation in paediatrics: an educational revolution. Paediatr Child Health 2007; 12: 465–468.
13. Adler MD, Trainor JL, Siddall VJ, McGaghie WC. Development and evaluation of high-fidelity simulation case scenarios for pediatric resident education. Ambul Pediatr 2007; 7: 182–186.
14. Nishisaki A, Hales R, Biagas K, et al. A multi-institutional high-fidelity simulation “boot camp” orientation and training program for first year pediatric critical care fellows. Pediatr Crit Care Med 2009; 10: 157–162.
15. Shilkofski NA, Nelson KL, Hunt EA. Recognition and treatment of unstable supraventricular tachycardia by pediatric residents in a simulation scenario. Simul Healthc 2008; 3: 4–9.
16. Brett-Fleegler MB, Vinci RJ, Weiner DL, Harris SK, Shih MC, Kleinman ME. A simulator-based tool that assesses pediatric resident resuscitation competency. Pediatrics 2008; 121: e597–e603.
17. Weinstock PH, Kappus LJ, Garden A, Burns JP. Simulation at the point of care: reduced-cost, in situ training via a mobile cart. Pediatr Crit Care Med 2009; 10: 176–181.
18. Nishisaki A, Donoghue AJ, Colborn S, et al. Effect of just-in-time simulation training on tracheal intubation procedure safety in the pediatric intensive care unit. Anesthesiology 2010; 113: 214–223.
19. Cooper JB, Singer SJ, Hayes J, et al. Design and evaluation of simulation scenarios for a program introducing patient safety, teamwork, safety leadership, and simulation to healthcare leaders and managers. Simul Healthc 2011; 6: 231–238.
20. Morgan PJ, Kurrek MM, Bertram S, Leblanc V, Przybyszewski T. Nontechnical skills assessment after simulation-based continuing medical education. Simul Healthc 2011; 6: 255–259.
21. Hunt EA, Hohenhaus SM, Luo X, Frush KS. Simulation of pediatric trauma stabilization in 35 North Carolina emergency departments: identification of targets for performance improvement. Pediatrics 2006; 117: 641–648.
22. Dismukes RK, Gaba DM, Howard SK. So many roads: facilitated debriefing in healthcare. Simul Healthc 2006; 1: 23–25.
23. Fanning RM, Gaba DM. The role of debriefing in simulation-based learning. Simul Healthc 2007; 2: 115–125.
24. Van Heukelom JN, Begaz T, Treat R. Comparison of postsimulation debriefing versus in-simulation debriefing in medical simulation. Simul Healthc 2010; 5: 91–97.
25. Rudolph JW, Simon R, Raemer DB, Eppich W. Debriefing as formative assessment: closing performance gaps in medical education. Acad Emerg Med 2008; 15: 1110–1116.
26. Kolb DA. Experiential Learning: Experience as the Source of Learning and Development. Englewood Cliffs, NJ: Prentice Hall; 1984.
27. Issenberg SB, McGaghie WC, Petrusa ER, Gordon DL, Scalese RJ. What are the features and uses of high-fidelity medical simulations that lead to most effective learning: a BEME systematic review. Med Teach 2005; 27: 10–12.
28. Schön D. Educating the Reflective Practitioner: Toward a New Design for Teaching and Learning in the Professions. San Francisco, CA: Jossey-Bass; 1987.
29. Rudolph JW, Simon R, Dufresne RL, Raemer DB. There’s no such thing as a “nonjudgmental” debriefing: a theory and method for debriefing with good judgment. Simul Healthc 2006; 1: 49–55.
30. Rudolph JW, Simon R, Rivard P, Dufresne RL, Raemer DB. Debriefing with good judgment: combining rigorous feedback with genuine inquiry. Anesthesiol Clin 2007; 25: 361–376.
31. Dismukes RK, McDonnell LK, Jobe KK. Facilitating LOFT debriefings: instructor techniques and crew participation. Int J Aviat Psychol 2000; 10: 35–57.
32. Bylund C, Brown R, Lubrano di Ciccone B, Diamond C, Eddington J, Kissane DW. Assessing facilitator competence in a comprehensive communication skills training programme. Med Educ 2009; 43: 342–349.
33. Gururaja RP, Yang T, Paige JT, Chauvin SW. Examining the effectiveness of debriefing at the point of care in simulation-based operating room team training. In: Henriksen K, Battles JB, Keyes MA, Gary ML, eds. Advances in Patient Safety: New Directions and Alternative Approaches (Vol 3: Performance and Tools). Rockville, MD: Agency for Healthcare Research and Quality; 2008.
34. Reed SJ. Debriefing experience scale: development of a tool to evaluate the student learning experience in debriefing. Clin Simul Nurs 2012; 8 (6): e211–e217.
35. Arora S, Ahmed M, Paige J, et al. Objective Structured Assessment of Debriefing (OSAD): bringing science to the art of debriefing in surgery. Ann Surg August 14, 2012 [epub ahead of print].
36. American Educational Research Association, APA, and National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: AERA; 1999.
37. Simon R, Rudolph JW, Raemer DB. Debriefing Assessment for Simulation in Healthcare. Cambridge, MA; 2009. Available at:
38. Smith PC, Kendall LM. Retranslation of expectations: an approach to the construction of unambiguous anchors for rating scales. J Appl Psychol 1963; 47: 149–155.
39. Shapira Z, Shirom A. New issues in the use of behaviorally anchored rating scales: level of analysis, the effect of incidence frequency, and external validation. J Appl Psychol 1980; 65: 517–523.
40. Jacobs RAU, Kafry D, Zedeck S. Expectations of behaviorally anchored rating scales. Pers Psychol 1980; 33: 595–640.
41. Argyris C, Putnam R, Smith DM. Action Science: Concepts, Methods and Skills for Research and Intervention. San Francisco, CA: Jossey-Bass; 1985.
42. Edmondson A. Psychological safety and learning behavior in work teams. Adm Sci Q 1999; 44: 350–383.
43. Knowles MS, Holton EF, Swanson RA. The Adult Learner: The Definitive Classic in Adult Education and Human Resource Development. 6th ed. Burlington, MA: Elsevier; 2005.
44. Watzlawick P, Weakland JH, Fisch R. Change: Principles of Problem Formation and Problem Resolution. New York, NY: Norton; 1974.
45. Darling M, Parry C, Moore J. Learning in the thick of it. Harv Bus Rev 2005; 83: 84–92, 192.
46. Ende J. Feedback in clinical medical education. JAMA 1983; 250: 777–781.
47. Harlen W, James M. Assessment and learning: differences and relationship between formative and summative assessment. Assess Educ Princ Pol Pract 1997; 4: 365–377.
48. Vaughan D. Theory elaboration: the heuristics of case analysis. In: Becker H, Ragin C, eds. What Is a Case? New York, NY: Cambridge University Press; 1992: 173–202.
49. Vaughan D. The Challenger Launch Decision: Risky Technology, Culture and Deviance at NASA. Chicago, IL: University of Chicago Press; 1996.
50. Davis JP, Eisenhardt KM, Bingham CB. Developing theory through simulation methods. Acad Manage Rev 2007; 32: 480–499.
51. Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997; 84: 273–278.
52. Regehr G, MacRae H, Reznick RK, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med 1998; 73: 993–997.
53. Hodges B, Regehr G, McNaughton N, Tiberius R, Hanson M. OSCE checklists do not capture increasing levels of expertise. Acad Med 1999; 74: 1129–1134.
54. Berk RA. A Guide to Criterion-Referenced Test Construction. Baltimore, MD: The Johns Hopkins University Press; 1984.
55. Dumas JC, Redish JC. A Practical Guide to Usability Testing. 2nd ed. Bristol, UK: Intellect; 1999.
56. Weisinger H. The Critical Edge: How to Criticize Up and Down Your Organization and Make It Pay Off. New York, NY: Little Brown & Co; 1989.
57. Weisinger H. The Power of Positive Criticism. New York: AMACOM; 2000.
58. Kegan R, Lahey LL. How the Way We Talk Can Change the Way We Work. San Francisco, CA: Jossey-Bass; 2001.
59. Bernardin HJ, Buckley MR. Strategies in rater training. Acad Manage Rev 1982; 6: 206–212.
60. Lederman LC. Toward a systematic assessment of theory and practice. Simul Gaming 1992; 23: 145–160.
61. Steinwachs B. How to facilitate a debriefing. Simul Gaming 1992; 23: 186–195.
62. Savoldelli GL, Naik VN, Park J, Joo HS, Chow R, Hamstra SJ. Value of debriefing during simulated crisis management: oral versus video-assisted oral feedback. Anesthesiology 2006; 105: 279–285.
63. Draycott TJ, Crofts JF, Ash JP, et al. Improving neonatal outcome through practical shoulder dystocia training. Obstet Gynecol 2008; 112: 14–20.
64. Crofts JF, Fox R, Ellis D, Winter C, Hinshaw K, Draycott TJ. Observations from 450 shoulder dystocia simulations: lessons for skills training. Obstet Gynecol 2008; 112: 906–912.
65. Siassakos D, Draycott T, O’Brien K, Kenyon C, Bartlett C, Fox R. Exploratory randomized controlled trial of hybrid obstetric simulation training for undergraduate students. Simul Healthc 2010; 5: 193–198.
66. Donoghue A, Ventre K, Boulet J, Brett-Fleegler M, Nishisaki A, Overly F, Cheng A, Donoghue A. EXPRESS Pediatric Simulation Research Investigators. Simul Healthc 2011 Apr; 6 (2): 71–77.

Medical education; Health care education; Assessment; Debriefing; Simulation; Psychometrics; Behaviorally anchored rating scale


© 2012 Society for Simulation in Healthcare