Since the 2000 Institute of Medicine report “To Err Is Human,”1 increased public concern over medical errors has created a challenging environment for training novice resident physicians to competently perform invasive procedures. Resident training programs are evolving to design training modules that foster the expedient acquisition of critical knowledge and procedural skills within the contemporary mandate of minimal patient risk.2 Training programs recognize that better evidence-based training is both necessary and achievable. Medical educators have proposed partial task trainer simulators as a practical method to teach residents technical procedures that are potentially dangerous to patients in a setting that preserves patient safety.2 Partial task simulators offer physicians-in-training the opportunity to learn the psychomotor skills necessary to competently perform invasive procedures such as central venous catheter (CVC) insertion,3 interventional radiology procedures,4 and endotracheal intubations.5
Regardless of how sophisticated the partial task simulation trainers become, their relationship to clinical performance must be established. Better objective assessment is a key strategy in attaining the goal of reduced medical errors. An ideal evaluation system would avoid subjective assessments—such as global ratings of resident performance by more senior physicians and Likert-type scales that assess the psychological metrics of the resident's own performance—in favor of identification of the presence or absence of predefined error events while the procedure is actually being performed. Real-time data collection in studies of emergency medical care, however, is a major challenge because of the need to maintain patient privacy and to avoid creating an environment that might artificially influence resident physician behavior. The most important objective of any training method is to demonstrate skills transfer—that is, an improvement in technical skills in the hospital setting as a result of simulation training. However, few data suggest that skills transfer actually occurs.6,7 The evaluation of skills transfer of invasive procedures such as CVC insertion requires the development of an in-hospital assessment tool that protects patient privacy, minimizes observational bias, and avoids artificially influencing physician behavior.
We devised a direct observation system in which trained nonphysicians serve as independent raters (IRs) of ultrasound-guided CVC insertion in real time in the hospital setting. This IR system provides a feasible method to evaluate medical students' and residents' clinical performance. We developed an IR training system and direct observation clinical assessment tool before initiating a prospective study to assess the impact of simulation training on ultrasound-guided CVC insertion by residents in the hospital setting (results of the latter will be published separately). The assessment tool we developed allows for (1) observation of the procedure by immediately available on-call IRs once paged, (2) minimal disruption by the IR during the performance of the procedure, (3) elimination of IR bias through the selection of raters who do not know the resident performing the procedure, and (4) preservation of patient confidentiality by avoiding videotaping the hospital procedure.
We trained and hired IRs during the month before first enrolling residents into our study. We randomized all first- and second-year residents who rotated through the emergency department, the medical intensive care unit, and the surgical intensive care unit at a tertiary care teaching hospital into one of two CVC insertion training models: simulator training or traditional apprenticeship training. The simulation training included hands-on training of ultrasound-guided CVC insertion with a partial task trainer until the resident was competent.
To test our CVC assessment tool and to determine whether nonphysicians could work as IRs, we developed a pool of IRs who had minimal medical backgrounds and who completed a four-hour IR training course. The IR system used in this study is a paid, on-call service funded by an Agency for Healthcare Research and Quality grant. Once hired and trained, the IRs assessed resident-placed CVC insertions in the emergency department, the medical intensive care unit, and the surgical intensive care unit for a 20-month period. IRs worked one of two 8-hour shifts daily (8 am to 4 pm or 4 pm to midnight) and were paid $12 per hour. We did not create an overnight shift (midnight to 8 am) because the hospital hired nonresident health care providers to perform invasive procedures overnight in the intensive care units. IRs carried a beeper and were required to remain within five minutes of the hospital at all times during their assigned shift. Once paged to a patient's room, the IRs observed the procedure, completed a checklist with 50 data-collection points (Appendix 1, Categories 1–6), recorded the duration (in minutes) of the procedure from sterile preparation to flushing the distal port, and documented technical errors and procedural complications. We blinded the IRs to resident training group (simulation versus control).
We recruited IR applicants by posting fliers at the Yale School of Medicine (SOM), Yale School of Nursing, Yale School of Public Health, and the affiliated tertiary care teaching hospital. We sent e-mails to the career services offices at these locations and to deans of the schools of health sciences at three local universities for distribution via listservs at each institution. The job was posted online at the undergraduate Yale Student Employment Office. Additionally, we posted the job announcement online (www.publichealthjobs.net), and we spread the announcement via word of mouth. We defined “IR applicant” as any individual who submitted a résumé. We then categorized applicants who passed an interview and reference check as IR trainees. We did not require any particular educational or experience criteria for the applicant to become a trainee, although people with health or science backgrounds and research experience were preferred. We did not permit residents, physician associates, or nurses employed at the teaching hospital to apply to become IRs because of potential bias they may have had from previous exposure to residents included in the study.
All IR trainees completed a two-hour information and didactic session that included a PowerPoint instructional presentation on CVC insertion focusing on indications, anatomy, and the IR's role in the study. The PowerPoint presentation was followed by two video presentations; the first video reviewed ultrasound technology, and the second demonstrated an entire ultrasound-guided CVC insertion with running commentary on the correct method and with descriptions of potential technical errors and complications associated with CVC insertion. Instructors for the course included senior emergency medicine residents and emergency medicine faculty at the study institution. At the end of this session, we gave IR trainees an ultrasound-guided CVC insertion study guide to take home for review.
The IR trainees returned for a second two-hour testing session to observe 5 of 10 choreographed CVC insertion videotapes. We evaluated IR trainees on their ability to time the procedure, to accurately complete a 50-data-point checklist (Appendix 1, Categories 1–6), and to identify all technical errors and/or complications. We designed the checklist for this study based on a review of the literature as well as a consensus of the correct steps for CVC insertion by faculty in emergency medicine with ultrasound fellowship training, faculty in surgical critical care, and faculty in internal medicine critical care.8–12 These CVC insertion videos included key outcome variables and up to five choreographed technical errors (List 1). As previously mentioned, the total time of IR training was approximately four hours: two hours of didactics and two hours of testing. Compensation for the IRs at $12.00 per hour for four hours of training totaled $48.00 per IR trained.
From this cohort of IR trainees, those candidates who could accurately assess the time of the procedure to within one minute, validate the procedural checkpoints to within 95% accuracy, and detect technical errors and complications within a 3% margin of error were eligible to be hired as IRs. While the decision to hire IRs was based overall on accurately completing the checklist, we believed the key outcome variables were particularly relevant to this study because a variable such as an increased number of cannulation attempts during CVC insertion may be associated with an increased risk of technical complications such as hematoma formation and catheter-associated infection.13 Likewise, some evidence shows that increased length of time for inserting a CVC leads to higher risk of catheter-related infections.8,9,12 Maintenance of guidewire control was initially included as a key outcome variable because of the potential catastrophic complication of guidewire embolization as well as the emphasis on the repetitive resident manual dexterity training of this specific skill on the partial task trainer.
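The three hiring thresholds described above can be sketched as a simple eligibility check. The function and the sample values below are illustrative only; the field names are hypothetical and are not drawn from the study's instruments or data.

```python
# Sketch of the IR hiring criteria: timing within one minute of the
# true duration, checklist accuracy of at least 95%, and error/
# complication detection within a 3% margin of error.
def meets_hiring_criteria(timing_error_min, checklist_accuracy, error_rate):
    """Return True if an IR trainee meets all three hiring thresholds.

    timing_error_min   -- difference (minutes) between the trainee's
                          recorded time and the true procedure time
    checklist_accuracy -- fraction of the 50 checklist points correct
    error_rate         -- margin of error in detecting technical
                          errors and complications
    """
    return (abs(timing_error_min) <= 1.0
            and checklist_accuracy >= 0.95
            and error_rate <= 0.03)

# Hypothetical trainees (values are illustrative, not study data):
print(meets_hiring_criteria(0.5, 0.97, 0.02))  # eligible
print(meets_hiring_criteria(1.5, 0.97, 0.02))  # timing off by >1 minute
```

Expressing the criteria this way makes explicit that all three thresholds must be met simultaneously; a trainee who excels on the checklist but cannot time the procedure accurately is still ineligible.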
To confirm IR accuracy in data collection in the hospital setting over the course of the study, a research associate (RA) responded to an intermittent hospital page and simultaneously, with the hired IRs, completed the procedural checklist. These spot checks were performed to confirm IR accuracy and to determine interrater reliability. The RA participated in the IR training course and completed the videotape testing sessions at the beginning of the study.
To assess retention of IR observational skills, a subset of hired IRs who had observed twenty or more CVC insertions in the hospital setting returned at the conclusion of the study for a poststudy videotape testing session. This subset of IRs observed the majority of in-hospital CVC insertions, received their training before the enrollment of resident participants, and observed CVCs over the entire 20-month data-collection period. During the poststudy session, this subset of IRs observed three choreographed videotapes: one previously viewed videotape (from the hiring phase) and two new videotapes. We assessed IR performance with the same criteria used during the initial hiring phase of the study: the abilities to time the procedure, to accurately complete a 50-data-point checklist, and to identify all technical errors and/or complications.
This study was approved by the Yale University SOM Human Investigation Committee (0605001388).
On the basis of interviews and reference checks, we selected 38 of the 49 applicants (78%) to be IR trainees. Most applicants were undergraduates from one of three local universities. Table 1 shows IR applicants by education and gender. All 38 IR trainees completed the IR training module and returned for an examination session. Twenty-seven (71%) of the 38 IR trainees met criteria to be hired as IRs. We did not hire 11 of the trainees because of their inability to accurately assess the time of the procedure to within one minute, validate the procedural checkpoints to within 95% accuracy, and detect technical errors and complications within a 3% margin of error.
The hired IRs averaged 1.30 (SD = 0.67) overall errors and 0.74 (SD = 0.45) errors in identifying key outcome variables (List 1) per choreographed CVC insertion videotape. The hired group had 97% agreement with the standard answer on the 50 procedural checkpoints observed. The 11 trainees who did not meet hiring criteria averaged 5.18 (SD = 2.23) overall errors and 2.09 (SD = 0.54) errors in identifying key outcome variables (List 1) per choreographed CVC insertion demonstration.
Figure 1 demonstrates the performance of the applicants (hired versus those not hired) categorized by accuracy of documenting number of CVC cannulation attempts, identification of choreographed technical errors and complications, accuracy of timing the procedure, and completion of the procedural checklist. The hired group performed better than the not-hired group in all categories. The greatest difference between the hired and not-hired groups was correctly determining the number of cannulation attempts.
We measured the reliability of the IRs' observational assessment of each data point on the checklist by crude agreement with the standard answer for each videotape (Table 2). The hired group had 100% crude agreement for 30 of the 50 checklist data points and >88% for all remaining checklist items except for keeping hands on the guidewire at all times during CVC insertion (77%) and determining the number of times hands were removed from the guidewire (73%; Table 2). We reevaluated the data for the hired group by omitting the guidewire data points from the checklist because both the hired and not-hired groups performed poorly on these data points during testing. By omitting the guidewire data points, 48% (13/27) of hired IRs had zero key outcome variable errors compared with only 26% (7/27) when guidewire questions were included.
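Crude agreement as used here is simple percent agreement with the standard answer, without chance correction. A minimal sketch of the computation follows; the checklist responses shown are hypothetical and are not taken from the study's checklist or data.

```python
# Crude agreement: fraction of checklist items on which a rater's
# answer matches the standard (gold) answer key.
def crude_agreement(ratings, standard):
    """Simple percent agreement with the standard answers; no
    correction for chance agreement (unlike, e.g., Cohen's kappa)."""
    matches = sum(r == s for r, s in zip(ratings, standard))
    return matches / len(standard)

# Hypothetical answer key and one rater's responses (5 items shown
# for illustration; the study's checklist had 50 data points):
standard = ["yes", "yes", "no", "yes", "no"]
rater = ["yes", "yes", "no", "no", "no"]  # one disagreement on item 4
print(f"crude agreement: {crude_agreement(rater, standard):.0%}")  # 80%
```

Computing agreement per checklist item, rather than per rater, is what allows problem items (such as the guidewire data points) to be identified and reanalyzed separately.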
The RA simultaneously completed the procedural checklist during an in-hospital CVC insertion along with a subset (n = 7/14, 50%) of IRs who observed more than 20 CVCs during the course of the study. There was 100% concordance between the IRs and RA on all key outcome variables except for the guidewire. The IRs and RA recorded identical results for timing the duration (within one minute) of the procedure, counting cannulation attempts, and identifying technical errors (seven total). Both the IRs and RA recorded breaks in sterile technique during two CVC insertions, and both the IRs and the RA identified the presence of malignant arrhythmias (three episodes of ventricular tachycardia). During three of the CVC insertions, both the IRs and RA recorded removal of hands from the guidewire, but there was a discrepancy in the number of times this technical error occurred. In one additional CVC insertion, the IR did not record removal of the hand from the guidewire, whereas the RA recorded two episodes of hand removal.
To assess retention of observational skills, the subset of IRs who observed more than 20 CVC insertions returned for poststudy test assessment in the month following the conclusion of data collection. We were able to test 93% (13/14) of this subset. Figure 2 demonstrates comparison of prestudy and poststudy choreographed videotape CVC insertion test results for this subset. All IRs consistently scored greater than 90% accuracy for the three poststudy tests for key outcome variables, the remaining checklist items, and timing of the procedure. During the poststudy testing, IRs again had difficulty identifying maintenance of guidewire control. We found no evidence of IR skills decay over the course of the study.
Among all trainees, we found no association between education level (undergraduates versus some level of postgraduate education) and hired status (hired or not hired; chi-square P = .184). We also found no significant association between education type (medical student trainees versus all other trainees) and hired status (chi-square P = .088). These comparisons suggest that education level is not a good predictor of success as an IR and that a medical education background is not necessary to be a competent IR.
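The education-versus-hiring comparisons above are Pearson chi-square tests on 2 × 2 contingency tables. A minimal sketch of that test follows; the cell counts used in the example are hypothetical, since the study's Table 1 breakdown is not reproduced here.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (df = 1) for the 2x2 contingency table
    [[a, b], [c, d]]; returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, row, col in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        expected = row * col / n  # expected count under independence
        stat += (obs - expected) ** 2 / expected
    # For df = 1, the chi-square survival function reduces to
    # erfc(sqrt(x / 2)), since a chi-square(1) variable is the square
    # of a standard normal variable.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical counts (NOT the study's Table 1 data): rows = education
# level (undergraduate vs. postgraduate), columns = hired vs. not hired.
stat, p = chi2_2x2(20, 6, 7, 5)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

With the small cell counts typical of a 38-trainee cohort, an exact test (e.g., Fisher's) might also be considered; the sketch simply mirrors the chi-square approach the text reports.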
Discussion and Conclusions
In the past decade, medical education has experienced a trend toward using simulation, including laparoscopic task trainers, partial task trainers for procedural training, and high-fidelity mannequin simulators that address team communication and clinical judgment. Most research studies to date have assessed the merits of simulation-based training either by measuring trainee satisfaction and/or confidence with Likert scales or by relying on resident- or student-rated performance improvements after simulation training.14–17 However, minimal evidence demonstrates improved performance in the in-hospital clinical setting.6,15,18 Because simulation is both time intensive and expensive, its value as an educational tool requires objective evidence of skills transfer—that is, documented improvement in the ability to perform on a real patient a task learned in a simulation-based environment. Therefore, medical education must establish techniques to evaluate performance in the clinical arena to confirm the value of simulation in terms of improved patient safety.
Three potential tools for data collection at the bedside are (1) self-reporting by the trainee, (2) audiovisual recording, and (3) direct observation by an IR. Britt and colleagues3 reviewed their initial resident experience with CVC insertion on actual patients after simulation training. At the end of partial task training, residents completed a Likert scale questionnaire and submitted a log of the initial 10 CVCs inserted on actual patients. Residents perceived that simulation improved their performance on patients, and they self-reported improvement in both average needle sticks and total time for CVC insertion.3 Obvious limitations of self-reporting include the desire of the trainee to please the investigator, a bias toward minimizing errors, and the inherent unreliability of the data collected.
Audiovisual recording allows for repeated examination of procedures to extract detailed qualitative and quantitative data. Video data may be particularly valuable for observing team coordination and procedures requiring communication among team members because recordings can capture subtle cues and fleeting errors, permitting repeated review of individual behaviors and tacit interchanges that might be too subtle to identify through either self-reporting or checklists. Video recording also permits interrater reliability studies in which multiple raters evaluate multiple subjects. In a prospective, randomized, blinded study, Seymour and colleagues6 reviewed archived videotapes of laparoscopic cholecystectomies to show that virtual reality training, a three-dimensional environment created by a computer that closely resembles reality, improved operating room performance of residents during laparoscopic cholecystectomy. The surgeon–investigator supervising the resident during the procedure was blinded to the participants' training status. During videotape review of the laparoscopic procedure, the investigators collated potential observable measures of surgical performance and rated these measures for errors. Videotape evaluation was feasible in that study because videotaping the abdominal cavity secured anonymity for both the patient and the surgical resident, thereby eliminating both the issue of patient confidentiality and that of attending surgeon bias in terms of preconceptions of the surgical resident. However, the investigators were unable to assess the impact of supervision and prompting by the operating room surgeon–investigator on surgical resident performance because only the intraabdominal procedure was recorded.
Disadvantages of audiovisual recording procedures for purposes of evaluating skills transfer include the inability to videotape in multiple hospital locations, the inability to capture any aspect of the procedure outside the frame of the videotape, and, in most cases, the inability to preserve the privacy and confidentiality of these records for patients and health care workers.19
Direct observation, a third potential tool for assessing skills transfer to the clinical setting, overcomes some of the disadvantages of video recording by allowing for both increased flexibility of assessing physicians-in-training at different hospital locations and an improved ability of the observer to move within the room to gain perspective. Some studies advocate medical experts as observers, but others support trained nonmedical observers.20–22 In our study, we intentionally recruited nonphysicians for training as IRs because of a number of potential advantages. The use of nonphysicians potentially reduced the residents' performance anxiety regarding their fears of being observed by physician mentors. Despite a consent form assuring each resident that participation in the study would in no way affect his or her standing in the residency training program nor be included in the resident evaluation process, residents might be apprehensive about having another resident or attending physician present in the room and evaluating them during the performance of an invasive procedure.
The use of nonphysician IRs also precluded the raters from having preconceived perceptions of the correct way to insert a CVC. In addition, nonphysician IRs might not have the preconceived bias regarding the medical specialty of the residents that a trained physician may have; for instance, an emergency medicine resident or attending physician might expect an emergency medicine or surgery resident to be more proficient than an internal medicine resident at performing invasive procedures. Finally, nonphysician observers would not feel compelled to intervene during a procedure in which an error in technique or a complication occurred. Observation by a more experienced physician might lead the observer to correct the resident inserting the CVC.
We found that teaching observational skills for invasive procedures to IRs who do not have extensive medical backgrounds is feasible. Our hired and trained IRs made few errors in identifying either key outcome variables or overall errors during observed, choreographed CVC insertions. We carefully defined completion of the training phase for the hired IRs on the basis of objective performance criteria. In the planning phase of this study, we were uncertain whether the criterion levels were set too high, but our results demonstrate that nonphysician IRs could attain these high levels. We strongly recommend IR training to assess and improve interrater reliability whenever necessary and possible before trials start, because studies lacking power owing to unreliable assessments carry the risk of false-negative findings and raise ethical questions.23 For example, in a randomized controlled trial of a new triage acuity scale, the Emergency Severity Index (ESI), emergency department nurses were randomized to training with the ESI or to the use of a previous triage scale, with results showing good interrater reliability.24
The IR testing phase also confirmed the reliability of our checklist assessment tool for CVC insertion. Other studies have demonstrated the reliability of checklist assessment tools to evaluate residency performance by direct observation10 and of scoring sheets for National Institutes of Health Stroke Scale certification using DVDs.25 In the current study, IR trainees used the same clinical checklist during real-time scoring that they used during the clinical phase of the project. Excellent agreement occurred among IRs on all components of the checklist except for maintenance of hand on the guidewire during CVC insertion, which, as mentioned above, we considered to be a key outcome variable because of the potential catastrophic complication of loss of the guidewire during CVC insertion and the risk of advancement/embolization of the wire beyond the skin insertion site. The IR training data show that maintenance of hand control on the guidewire at all times may not be reliably assessed by videotape observation. Difficulty in assessing the safe threading of the guidewire was a surprising result. Initially, we postulated that it may have been related to the angle of observation during the videotaping. However, in the clinical setting, while both the IR and RA accurately detected the technical error of removing the hand from the guidewire, they did not agree on the number of times. This difficulty in assessing the number of times the hand was removed from the guidewire may in fact be related to the position of the IR in the patient's room and the angle of observation. We did not dictate exact positioning for our IRs or have them stand, for example, on a step stool. In future studies of direct observation, this may be necessary.
An important limitation of this study is that whereas scoring on videotaped, simulated CVC insertions confirmed IR reliability, in the hospital setting only one IR was available during each CVC insertion. This precluded consistent interrater reliability measures to check the consistency of observations between IRs in the clinical setting. We did, however, have an RA simultaneously rate a select group of IRs in an effort to provide some limited data related to this issue. Additionally, our simulator model did not permit evaluation of certain choreographed errors; for instance, we could not test whether IRs recognized arterial cannulation because the partial task trainer did not distinguish between arterial (carotid artery) and venous (internal jugular vein) blood. But, in the training session, we did explain to the IRs potential complications, including arterial cannulation leading to the withdrawal of bright red pulsatile blood. Also, scripted video scenarios of CVC insertion may not approximate direct observation in the hospital setting. However, Slagle and colleagues22 demonstrated a high concordance between real-time and video analyses of 20 routine anesthetic procedures as well as intrarater and interrater reliability.
One potential disadvantage of employing nonphysician IRs is that they may feel less comfortable in the hospital setting when exposed to critically ill patients and when witnessing invasive procedures involving the sight of blood and bodily fluids. However, we informed our IR applicants of their hospital duties at the time of recruitment, and most demonstrated an interest in a career in the health care field. Further, nonphysician IRs may be less able to evaluate either the relevance of aspects of a procedure that occur outside of a defined checklist or the importance of a change in the sequence of steps on the checklist. Nonphysician IRs may be more hesitant to ask questions, and they may be less familiar with medical vocabulary and abbreviations. Finally, they may have uncertainty about the roles or levels of training of various medical personnel.
Given the results of this study, we advocate using IRs to objectively monitor resident performance of CVC insertion. The use of IRs in the hospital setting may be generalized to use with the evaluation of other systems needed to fulfill Accreditation Council for Graduate Medical Education core competencies and to increase patient safety at training institutions. The IR system as we have described it—including our processes of recruiting, training, and hiring IRs—is a feasible method of evaluating resident and medical student performance of procedural skills in the hospital setting. Using nonmedical IRs is a novel approach to skills transfer evaluation that may provide complete and objective quantitative data on the progress and competence of physicians-in-training without the need for an intensive time commitment provided by highly trained medical staff. This process can serve as a prototype for a large variety of studies that could provide validation for the efficacy of either a whole training program or individual modalities or modules within a training program.
Regardless of how sophisticated laboratory assessment becomes, its relationship to performance in the clinical setting must be established. Simulators will likely continue to be used, and their role in training novice physicians will also likely grow. Although little evidence exists to show that simulation training improves patient care, many health care institutions continue to adopt the technology. Definitive experiments to improve academic medicine's understanding of the effect of simulators on skills transfer will allow medical educators to use simulators more intelligently to improve provider performance, reduce errors, and, ultimately, increase patient safety. Although such experiments will be difficult and costly, they may be justified if they can help medical educators determine how to best apply new technology.
The authors wish to thank and acknowledge all of the following: Ingrid Tuckler for her administrative assistance, Dr. Christopher Moore for his consultative support related to ultrasound training, and all the independent raters participating in this study.
Financial support for this project was provided by a patient safety grant from the Agency for Healthcare Research and Quality (Grant # U18 HS16725).
This project was presented in a poster session at the Association of American Medical Colleges Annual Meeting in Washington, DC, November 2007.
1 Kohn LT, Corrigan JM, Donaldson MS, eds; Committee on Quality of Health Care in America; Institute of Medicine. To Err Is Human: Building a Safer Health System. Washington, DC: National Academy Press; 2000.
2 Binstadt ES, Walls RM, White BA, et al. A comprehensive medical simulation education curriculum for emergency medicine residents. Ann Emerg Med. 2007;49:495–504.
3 Britt RC, Reed SF, Britt LD. Central line simulation: A new training algorithm. Am Surg. 2007;73:680–682.
4 Cotin S, Duriez C, Lenoir J, Neumann P, Dawson S. New approaches to catheter navigation for interventional radiology simulation. Med Image Comput Comput Assist Interv Int Conf Med Image Comput Comput Assist Interv. 2005;8(pt 2):534–542.
5 Schaefer JJ 3rd. Simulators and difficult airway management skills. Paediatr Anaesth. 2004;14:28–37.
6 Seymour NE, Gallagher AG, Roman SA, et al. Virtual reality training improves operating room performance: Results of a randomized, double-blinded study. Ann Surg. 2002;236:458–463.
7 Torkington J, Smith SG, Rees BI, Darzi A. Skill transfer from virtual reality to a real laparoscopic task. Surg Endosc. 2001;15:1076–1079.
8 Atkinson P, Boyle A, Robinson S, Campbell-Hewson G. Should ultrasound guidance be used for central venous catheterisation in the emergency department? Emerg Med J. 2005;22:158–164.
9 Miller AH, Roth BA, Mills TJ, Woody JR, Longmoor CE, Foster B. Ultrasound guidance versus the landmark technique for the placement of central venous catheters in the emergency department. Acad Emerg Med. 2002;9:800–805.
10 Shayne P, Gallahue F, Rinnert S, et al. Reliability of a core competency checklist assessment in the emergency department: The Standardized Direct Observation Assessment Tool. Acad Emerg Med. 2006;13:727–732.
11 Mansfield PF, Hohn DC, Fornage BD, Gregurich MA, Ota DM. Complications and failures of subclavian-vein catheterization. N Engl J Med. 1994;331:1735–1738.
12 McGee DC, Gould MK. Preventing complications of central venous catheterization. N Engl J Med. 2003;348:1123–1133.
13 Randolph AG, Cook DJ, Gonzales CA, Pribble CG. Ultrasound guidance for placement of central venous catheters: A meta-analysis of the literature. Crit Care Med. 1996;24:2053–2058.
14 Derossis AM, Bothwell J, Sigman HH, Fried GM. The effect of practice on performance in a laparoscopic simulator. Surg Endosc. 1998;12:1117–1120.
15 Fried GM, Feldman LS, Vassiliou MC, et al. Proving the value of simulation in laparoscopic surgery. Ann Surg. 2004;240:518–525.
16 Kim J, Neilipovitz D, Cardinal P, Chiu M, Clinch J. A pilot study using high-fidelity simulation to formally evaluate performance in the resuscitation of critically ill patients: The University of Ottawa Critical Care Medicine, High-Fidelity Simulation, and Crisis Resource Management I Study. Crit Care Med. 2006;34:2167–2174.
17 Kneebone RL, Kidd J, Nestel D, et al. Blurring the boundaries: Scenario-based simulation in a clinical setting. Med Educ. 2005;39:580–587.
18 Hyltander A, Liljegren E, Rhodin PH, Lönroth H. The transfer of basic skills learned in a laparoscopic simulator to the operating room. Surg Endosc. 2002;16:1324–1328.
19 Mackenzie CF. Video data for patient safety. Indian J Crit Care Med. 2004;8:194–198.
20 Carthey J. The role of structured observational research in health care. Qual Saf Health Care. 2003;12(suppl 2):ii13–ii16.
21 Fraind DB, Slagle JM, Tubbesing VA, Hughes SA, Weinger MB. Reengineering intravenous drug and fluid administration processes in the operating room: Step one: Task analysis of existing processes. Anesthesiology. 2002;97:139–147.
22 Slagle J, Weinger MB, Dinh MT, Brumer VV, Williams K. Assessment of the intrarater and interrater reliability of an established clinical task analysis methodology. Anesthesiology. 2002;96:1129–1139.
23 Clauser BE, Clyman SG, Swanson DB. Components of rater error in a complex performance assessment. J Educ Meas. 1999;36:29–45.
24 Worster A, Gilboy N, Fernandes C, et al. Assessment of inter-observer reliability of two five-level triage and acuity scales: A randomized controlled trial. Can J Emerg Med. 2004;6:240–245.
25 Lyden P, Raman R, Liu L, et al. NIHSS training and certification using a new digital video disk is reliable. Stroke. 2005;36:2446–2449.