To reduce medical errors, it is first helpful to describe and understand them. Reason's1 general taxonomy of errors (knowledge-based, rule-based, and skill-based errors) has formed the foundation for several systems of classifying medical errors2 and for reporting errors in organizations more generally.3 Many of the articles regarding medical errors focus on the office-based practice of family medicine4–7 and describe mainly clerical, administrative, and prescription-related errors. Articles that do report on the hospital setting focus on errors in prescribing, dispensing, and administering medications8 or on “slips.” These are technical, or skill-based, errors in which the right thing is done, but incorrectly.9,10
Our study was based on a taxonomy of general surgical errors10 that has evolved from an incident-reporting process the University of South Florida uses as part of its routine review of surgical operations. Other than slips, the most frequently occurring errors reported by residents are “mistakes” (doing the wrong thing) involving (a) judgment, (b) inattention to detail, and (c) insufficient understanding of the problem. Our study focused on these “human-factor” mistakes.
We defined a judgment error as a poor decision based on good information, often taking the form of a refusal to ask for help or to admit to being tired or otherwise of questionable ability to perform a procedure. Inattention errors are lapses of attention or failures to follow standard operating procedures. In problem-understanding errors, a decision is made on the basis of available but insufficient information.
Improving the quality of medical care is a complex issue involving a host of factors,11 one of which is the education and training of clinicians.12–14 Training is widely believed to improve task performance.15 It seems plausible, therefore, that training aimed specifically at avoiding the most common types of surgical errors should reduce their frequency. This article describes the development and evaluation of a training program designed to do just that. The training consisted of conventional lecture, behavior modeling,16,17 and active learning using role-plays in a realistic scenario. Performance data were collected during the training sessions, and on-the-job surgical complication data were collected both before and after the training. Thus, the aims of the study were to determine whether error-reduction training would affect behavior in the training session and whether the training would reduce surgical complications on the job.
Participants were surgical residents enrolled in an accredited general surgery training program in the southeastern United States. All 40 surgical residents were scheduled to participate as part of their training. Of those, 33 arrived and took part in the training. Participants could exclude their data from the study, and one resident chose to do so. Of the 32 remaining in the analysis, eight were PGY 1 (postgraduate year 1, or first year of residency), seven were PGY 2, seven were PGY 3, six were PGY 4, and four were PGY 5. Institutional review board approval for the research was obtained before data collection.
Development of training materials
Situational judgment test.
A situational judgment test was developed in consultation with an experienced surgeon. In the test, the resident was asked to imagine that he or she had just completed medical training and was now the newest member of a small surgical practice. In the scenario, a patient needed a complex arterial bypass procedure on one leg, and the surgery was scheduled. Three unfolding events were described, and the resident was asked to write in the space provided what he or she would do in response to each situation. Each event and its corresponding question were linked to one of the three described types of errors. In the first question, the new surgeon's partner could not participate in the surgery, and the new surgeon had to decide what to do (judgment). The scenario moved to the surgery itself, where the new surgeon was directed to the wrong leg to begin the surgery (attention to detail). The final question moved to the end of surgery, where no pulse was palpable in the leg after the bypass (problem understanding).
The training video was developed to teach surgical residents about the three kinds of errors in a way that would be generally useful in surgery. In the video, a physician “host” first defines the three classes of errors. Then, a surgical scenario illustrates each type of error. Next, a panel of experts discusses the components of each scenario and the errors embedded within it. Finally, each surgical scenario is reenacted, this time showing improved responses to each situation. Thus, the training video offered not only spoken definitions of the types of error but also behavior modeling of both ineffective and effective responses to problem situations.
The role-play was designed to provide opportunities for the surgical residents to make or avoid the same three general kinds of errors. In the role-play, the resident encountered a patient and nurse (actors playing scripted roles) in the recovery room and reacted to the problems presented. For example, to test attention to detail, results for a different patient's EKG were placed in the target patient's chart; there were negative consequences to the patient if the resident failed to notice the discrepancy between the patient's name and the name on the erroneous EKG.
On arrival, the resident was informed of the study purpose and procedure and was asked to sign an informed consent form. The resident completed the situational judgment test and then moved to one of two rooms (A or B). In room A, the resident watched the video training program. In room B, the resident participated in the role-play, followed by a one-on-one debriefing with an expert surgeon observer. Participants in the video condition proceeded from room A to room B; participants in the control condition proceeded from room B to room A.
Participants were randomly assigned to condition (video versus control). In the end, 19 residents participated in the video condition, and 13 in the control condition. The difference in those numbers is primarily due to no-shows. Because the residents had no way of knowing their assigned condition, the no-shows and the resulting missing data do not seem to be the result of participants' reactions to their assignments. Each role-play was video-recorded for subsequent ratings by judges.
After the video and role-play were both complete, participants completed a short questionnaire concerning their reactions to the training. One month after the sessions, each resident was asked to complete the same situational judgment test that had been administered immediately before the training program.
In the months following the training sessions, psychologists blind to condition rated each recorded role-play for performance on each of the problems embedded in the scenario. On-the-job surgical complications (described in more detail below) were recorded both before and after the training sessions.
Only the video portion of the training was evaluated as a randomized experiment. Because all participants went through both the video and the role-play (in random order), the long-term effects of training could not be compared with those of a true (no-training) control group. However, the effects of the video training could be assessed experimentally, and changes over time could be assessed for the residents as a whole.
Situational judgment test.
A scoring rubric was developed in consultation with an academic surgeon. Two psychologists independently read each response to each question on the situational judgment test and awarded points to each answer on the basis of the scoring rubric. The first two items were scored simply on whether the surgeon elected to proceed or postpone (postponing was a plus for judgment) and whether the surgeon noticed the wrong leg (noticing was a plus for attention to detail). The third item was scored by giving credit for actions that would allow the surgeon to identify multiple reasons for the lack of pulse (points were awarded for actions, such as checking cardiac enzymes or flushing the anastomosis, that offered insight into whether the lack of pulse was due to the graft or to a heart problem).
Depending on their availability, one of two surgical faculty members observed and rated each resident during the role-play using a form developed for the scenario. Psychologists later viewed video recordings of the role-plays and completed a checklist, which was scored by applying a weighting scheme devised in consultation with a medical expert. In the role-play, the resident encountered a patient and nurse in the recovery room (the patient was portrayed by a high-fidelity simulator given a voice by a confederate in an adjacent room, and the nurse was another confederate). In the scenario, the patient had undergone a laparoscopic cholecystectomy and, after about 45 minutes in the recovery room, had experienced a drop in blood pressure, which caused the nurse to call the resident for help. During the scenario, the resident could call the surgeon who performed the operation for information (judgment; asking for help), the charted EKG at the bedside was for the wrong patient (attention to detail), and the patient had a history of heart problems, so the drop in blood pressure could be due either to internal bleeding from the operation or to a cardiac event (problem understanding).
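The weighting scheme itself is not reported, but the general idea of scoring a behavioral checklist with expert weights can be sketched as follows. The item names and weights below are hypothetical placeholders chosen only to mirror the three embedded problems; they are not the study's checklist.

```python
# Hypothetical checklist items and expert weights (illustrative only; the
# study's actual items and weights are not reported).
WEIGHTS: dict[str, float] = {
    "called_operating_surgeon": 2.0,   # judgment: asked for help
    "noticed_wrong_patient_ekg": 2.0,  # attention to detail
    "examined_abdomen": 1.0,           # problem understanding: possible bleeding
    "ordered_hemoglobin": 1.0,         # problem understanding: possible bleeding
    "ordered_new_ekg": 1.0,            # problem understanding: possible cardiac event
}


def score_checklist(observed: dict[str, bool],
                    weights: dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of the checklist items a rater marked as done."""
    unknown = set(observed) - set(weights)
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")
    return sum(weights[item] for item, done in observed.items() if done)


if __name__ == "__main__":
    ratings = {
        "called_operating_surgeon": True,
        "noticed_wrong_patient_ekg": False,
        "ordered_new_ekg": True,
    }
    print(score_checklist(ratings))  # → 3.0
```

Rejecting unknown item names guards against silently dropping a misspelled checklist entry, which would otherwise lower a resident's score without any warning.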
Residents completed a six-item reactions questionnaire that asked about their perceptions of both the video and the role-play in terms of realism, interest, and usefulness for training.
Complication report recording (on-the-job errors).
Frequencies of complications and errors by category were recorded as part of a mandatory weekly morbidity and mortality conference held at five affiliated hospitals and were aggregated by month. Residents recorded errors using a standard format.10 Recording began before the training described in this study (12-month baseline) and continued for 6 months after training to evaluate whether the training had any apparent impact on job performance. Complications were analyzed with respect to (a) total errors, (b) slips and mistakes, and (c) the kinds of errors targeted by the training (labeled “primary index” in the tables and figures). All records were de-identified during data collection, before the analyses.
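The monthly tabulation described above can be sketched in code. The record layout here (one entry per procedure, with a complication flag and an optional error category) and the category names are assumptions made for illustration; the actual standard reporting form is described in reference 10. The three mistake categories targeted by the training are counted as the “primary index,” and percentages are computed per procedure.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

# Hypothetical names for the mistake categories targeted by the training.
PRIMARY_INDEX = {"judgment", "inattention", "problem_understanding"}


class Procedure(NamedTuple):
    month: int                 # study month, 1 = first baseline month (assumed)
    complication: bool         # complication reported at the conference
    error_type: Optional[str]  # None, "slip", or a mistake category


def aggregate_by_month(procedures):
    """Tabulate counts and percentages per month from per-procedure records."""
    table = defaultdict(lambda: {"procedures": 0, "complications": 0,
                                 "errors": 0, "primary_index": 0})
    for p in procedures:
        row = table[p.month]
        row["procedures"] += 1
        row["complications"] += int(p.complication)
        if p.error_type is not None:
            row["errors"] += 1
            row["primary_index"] += int(p.error_type in PRIMARY_INDEX)
    for row in table.values():
        n = row["procedures"]
        row["pct_complications"] = 100 * row["complications"] / n
        row["pct_errors"] = 100 * row["errors"] / n
    return dict(sorted(table.items()))


if __name__ == "__main__":
    records = [
        Procedure(1, True, "judgment"),
        Procedure(1, False, None),
        Procedure(1, True, "slip"),
        Procedure(2, False, None),
    ]
    for month, row in aggregate_by_month(records).items():
        print(month, row)
```

Keeping per-procedure records and deriving the monthly table from them makes it easy to recompute any of the three analysis levels (total errors, slips versus mistakes, primary index) without re-entering data.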
Reactions to the training were favorable overall, averaging 4.06 (SD = 0.54) on a 6-point scale. Cronbach α was 0.78 for the total scores.
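For reference, Cronbach α for a k-item scale is k/(k − 1) × (1 − Σ item variances / variance of total scores). A minimal standard-library sketch follows; it uses sample variances throughout (the ratio is unaffected as long as the same definition is used for items and totals), and the rating matrix is hypothetical, not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_var_sum = sum(variance(item) for item in zip(*responses))
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical 6-point ratings from four residents on six reaction items.
    ratings = [
        [4, 4, 5, 4, 3, 4],
        [5, 5, 5, 4, 5, 5],
        [3, 4, 3, 3, 4, 3],
        [4, 3, 4, 5, 4, 4],
    ]
    print(round(cronbach_alpha(ratings), 2))
```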
Situational judgment test results
All participants took the situational judgment test before the exercise began and were asked to retake it one month after the exercise ended; only 23 residents completed both the pretest and the posttest. Agreement between judges was good for all three items, with correlations between the two judges' scores ranging from 0.79 to 1.00, so each resident's score was taken as the average of the two judges' scores. For the pretest, the means of the three items (postpone surgery [judgment], wrong leg [attention to detail], and response to no pulse [problem understanding]) were 0.89, 0.57, and 2.93, with standard deviations of 0.30, 0.51, and 2.27, respectively (N = 23 for all tests). For the posttest, the means of the three items were 1.00, 0.74, and 3.87, with standard deviations of 0, 0.45, and 3.76. Dependent t tests showed a significant increase for the second item (wrong leg) only.
Faculty evaluations and psychologists' evaluations were used as dependent variables to assess the effects of the video training on performance in the simulated recovery-room scenario. For the two dichotomous outcomes (noticing the wrong-patient EKG and consulting the operating surgeon), frequencies were computed; the psychologists' frequencies are shown in Table 1. Fisher exact tests were not significant (for the EKG item [inattention to detail], P = .38, phi = 0.01; for the surgeon consultation [judgment], P = .28, phi = −0.05). The surgical faculty data were nearly identical to the psychologists' data and are not reported separately to conserve space.
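A two-sided Fisher exact test and the phi coefficient for a 2 × 2 table (condition by outcome) can be sketched with exact integer arithmetic. This is a generic implementation of the standard test, not the software used in the study, and the counts in the example are hypothetical rather than the contents of Table 1.

```python
from math import comb, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p value sums the probabilities of every table that is
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties are not lost to floating-point error.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def phi_coefficient(a, b, c, d):
    """Phi (Pearson correlation) for a 2x2 table; 0.0 if a margin is empty."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts: rows = video (19), control (13);
    # columns = noticed the wrong EKG, did not notice.
    table = (8, 11, 5, 8)
    print(round(fisher_exact_two_sided(*table), 3), round(phi_coefficient(*table), 3))
```

Fisher's test suits this design because several expected cell counts in a 2 × 2 table with 32 residents are small, which makes the chi-square approximation unreliable.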
Scales based on the checklists were formed to measure investigation of heart problems and of blood loss (problem understanding), with larger numbers indicating more thorough investigation. On the blood loss scale, the video group had a mean of 3.97 (SD = 1.01) and the control group a mean of 3.67 (SD = 0.65); the difference was not significant by t test (d = 0.34). On the cardiac scale, the video group had a mean of 3.11 (SD = 1.35) and the control group a mean of 3.13 (SD = 1.26); again the difference was not significant by t test (d = −0.02).
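The reported effect sizes can be checked with the conventional pooled-SD form of Cohen's d. The sketch below assumes group sizes of 19 (video) and 13 (control), taken from the randomization, and assumes Student's equal-variance t test; under those assumptions it reproduces the reported d values and gives t statistics well below the two-tailed .05 critical value (about 2.04 for 30 df).

```python
from math import sqrt


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2), pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def student_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic assuming equal variances."""
    return (m1 - m2) / (pooled_sd(sd1, n1, sd2, n2) * sqrt(1 / n1 + 1 / n2))


# Summary statistics from the text; n = 19 (video) and 13 (control) assumed.
blood_loss = (3.97, 1.01, 19, 3.67, 0.65, 13)
cardiac = (3.11, 1.35, 19, 3.13, 1.26, 13)
for name, stats in [("blood loss", blood_loss), ("cardiac", cardiac)]:
    print(f"{name}: d = {cohens_d(*stats):.2f}, t(30) = {student_t(*stats):.2f}")
# blood loss: d = 0.34, t(30) = 0.94
# cardiac: d = -0.02, t(30) = -0.04
```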
A month-by-month analysis examined whether complications and errors changed over time. Data on procedures, complications, and errors are shown by month in Table 2 (the first month of reported data is November). Table 3 gives the mean and standard deviation of each variable in Table 2 for the pretraining period (first 12 months) and the posttraining period (last 6 months). The correlation between month and number of procedures completed was not significant (r = 0.23, N = 18, P > .05). Linear regression and correlational analyses indicated that the number of complications (r = −0.47, P < .05) and the number of errors (r = −0.55, P < .05) decreased significantly over time, as did the percentage of complications (number of complications divided by number of procedures, r = −0.51) and the percentage of errors (r = −0.60). The number of primary index errors (errors of the type targeted by the training, r = −0.24) and the percentage of primary index errors (r = −0.28) did not decrease significantly over time. The results are shown graphically in Figure 1.
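With one data point per month (N = 18, so 16 df), the significance of each correlation with time can be checked with the standard t test for a Pearson r. The critical value 2.120 below is the tabled two-tailed .05 value of t for 16 df. Because the reported r values are rounded to two decimals, the r = −0.47 result sits very close to the cutoff (critical |r| ≈ 0.468).

```python
from math import sqrt

T_CRIT_16 = 2.120  # two-tailed .05 critical t for 16 df, from standard tables


def r_to_t(r, n):
    """t statistic for testing rho = 0 from a Pearson r on n pairs (n - 2 df)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that reaches significance for the given critical t."""
    df = n - 2
    return t_crit / sqrt(t_crit**2 + df)


# Correlations with month reported in the text (N = 18 months).
reported = {
    "procedures": 0.23,
    "complications": -0.47,
    "errors": -0.55,
    "pct complications": -0.51,
    "pct errors": -0.60,
    "primary index": -0.24,
    "pct primary index": -0.28,
}
print(f"critical |r| = {critical_r(T_CRIT_16, 18):.3f}")
for name, r in reported.items():
    t = r_to_t(r, 18)
    print(f"{name:18s} r = {r:+.2f}  t(16) = {t:+.2f}  significant: {abs(t) > T_CRIT_16}")
```

Applied to the reported values, this check agrees with the text: the five correlations described as significant exceed the cutoff, and the procedure and primary index correlations do not.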
A training program for surgical residents was devised to reduce three kinds of error: judgment, inattention to detail, and problem understanding. The training consisted of a situational judgment test, lecture and behavior modeling (through video), and role-playing with feedback. Outcome data were collected before, during, and after training to examine both proximal and distal effects of the training on resident reactions and behaviors.
The video presentation of the lecture and behavior modeling of error types was evaluated by comparing the role-play performance of residents who had seen the video with that of residents who had not yet seen it. Although residents' reactions to the video were positive, the video seemed to have little or no effect on behavior in the role-play. There was some evidence that training improved attention to detail, in the form of a significant pretest-to-posttest gain on the wrong-leg item of the situational judgment test.
There were significant reductions in complications and in total numbers of errors over the 18-month course of the study, consistent with an effect of training. However, the reductions were not significant for the subset of errors deliberately targeted by the training. Thus, the training appears to have been only partially effective, and the study results do not clearly identify the mechanism by which on-the-job performance improved.
Interpretations of findings
Why should a training program that produces no measurable impact on targeted behaviors nonetheless seem to reduce complications and errors in surgical procedures? One explanation is that the weekly completion of the error analyses sensitized the residents to surgical errors, helping them to consider at length the causes of such errors and the strategies they might use to prevent them. If so, completing the forms, rather than the training itself, may have been responsible for the observed improvement.
A second possible explanation concerns the reliability of the performance measures. The role-play measure may not have been reliable enough, in the sense of generalizing from one situation to another. For example, in the medical student testing literature, performance in one simulation typically does not correlate highly with performance in another, even when the two stations appear quite similar.18 The distal criterion measure, by contrast, was based on thousands of procedures rather than a single simulated patient, so it may have been more sensitive to the effects of training than the immediate role-play test. In addition, the experimental evaluation covered only the video; there was no separate no-training control group against which to evaluate the entire training experience.
It could be argued that the results simply reflect a Hawthorne effect, that is, that the reductions in complications and errors were due to scrutiny by the researchers. If that were so, however, we would expect a drop in complications and errors after the introduction of scrutiny, followed by a return to baseline within a few weeks or months. No such drop and return to baseline are evident in the complication and error reports. Further, the reports are part of a reporting system that has been in place in the hospital for more than two decades, is widely accepted, and undergoes weekly oversight by the chairman and faculty of the department of surgery.
Contributions to the literature
Although the number of surgical procedures increased slightly after training, the study showed a meaningful decrease in the average monthly number of complications (from 26 to 18) and errors (from 18.5 to 13.2) after training. A reduction of this size matters for patient outcomes. This study therefore adds to the quantitative literature on the evaluation of medical training, particularly error training in surgery. It is important to note that the training targeted individual residents and the errors most directly under their control, rather than systems errors or team training.12 Without minimizing the importance of teams and systems, the results suggest that training directed primarily at individuals can be useful in reducing surgical complications and errors.
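For perspective, the reported monthly averages correspond to the following relative reductions (simple arithmetic on the figures above, not adjusted for the slight increase in procedure volume):

```latex
\frac{26 - 18}{26} = \frac{8}{26} \approx 0.31 \;(\text{complications}), \qquad
\frac{18.5 - 13.2}{18.5} = \frac{5.3}{18.5} \approx 0.29 \;(\text{errors})
```

That is, roughly 31% fewer complications and 29% fewer errors per month on average.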
Study limits and future research
Although all current surgical residents in the medical school were invited to participate in the study, the total number of residents was small for conventional statistical significance testing. An obvious next step is to include more residents. It would also be desirable to extend the training and increase the number of role-play scenarios so that a more reliable criterion could be used to evaluate the proximal training effects. With more participants, it would also be possible to have a control group for the entire training module rather than just the video portion.
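To illustrate why the sample was small for significance testing, a rough power calculation can be sketched. It uses the normal approximation for a two-sided, two-sample comparison of means, assumes the conventional choices α = .05 and power = .80, and takes the blood-loss effect size d = 0.34 reported above as an illustrative target; it is not a calculation from the original study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group to detect a standardized difference d.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2.
    Exact t-based calculations give slightly larger values."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2)


print(n_per_group(0.34))  # → 136 per group, versus 19 and 13 in this study
print(n_per_group(0.50))  # → 63 per group for a "medium" effect
```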
Because the design did not include a “no training” comparison group, it is possible that some external variable unrelated to the training was responsible for the decrease in errors, although we are unaware of any such variable. It remains for future research to replicate and extend our findings. One design that could help disentangle the reasons for the current results would be to rerun the study using an existing, well-designed complication-reporting system, unmodified, as a surrogate measure of error (that is, without adding the error-reporting component). Another useful approach would be to test whether adding an error analysis component to existing complication reports would, by itself and without training, reduce complications and errors in surgery.
Future endeavors might train the attending physicians instead of, or in addition to, the residents. Many of the decisions made in the planning and execution of a surgical procedure are either made by or approved by the attending physician. Thus, training might have a greater impact on complications and errors if it were applied to the attending physicians. Whether such a program is culturally feasible in most institutions is an open question that must be addressed before such research can be conducted.
Errors have many different causes, and the training program used in the current study addresses only those that might be described as human-factor errors, which arise from individual failures in thinking (judgment, inattention to detail, and problem understanding). Additional analyses (e.g., preventability analysis) might help explain why the particular improvements were observed in the current study and facilitate the future application of patient safety training.