Surgical excellence involves integration of knowledge, technical skills, surgical intuition, and decision-making. Despite centuries of experience using the Halstedian method of surgical education, graduate medical education of the 21st century demands a change. Resident work hour restrictions and decreased case loads influence the way future gynecologists will master surgical procedures.1 Residents now have functionally fewer operating room opportunities to master a larger skill set.
Technologic advances in minimally invasive surgery challenge both novice and master surgeons to learn novel psychomotor tasks quickly to safely practice this new craft of surgery. Accepting that increased time constraints, cost, and ethical considerations of the modern operating room make acquisition of basic skills prohibitive, investigators have developed and validated nonprocedure-specific simulators for laparoscopic skills training over the past decade.2
Despite a consensus in the general surgery literature suggesting that simulators positively influence a novice surgeon's mastery of psychomotor skills and tasks, well-designed studies evaluating how this translates into operating room performance are limited.3–6 Our primary hypothesis is that the use of laparoscopic skills simulators improves the scores and competence rate in validated laparoscopic simulator tasks, translating into improved performance in the operating room. Secondarily, we hypothesize that the level of residency modifies the effect of the intervention with laparoscopic simulator training.
MATERIALS AND METHODS
We conducted a multicenter, randomized, controlled trial to estimate the effect of a laparoscopic simulator curriculum on skills laboratory simulators and operating room performance among gynecology residents. The study was conducted between July 2006 and March 2010, and the following eight Accreditation Council for Graduate Medical Education--accredited obstetrics and gynecology residency programs across the country participated in the project: University of Texas Southwestern Medical Center; University of Texas Medical School at Houston (LBJ and Hermann campuses); University of Alabama at Birmingham; Uniformed Services University of the Health Sciences; Orlando Health; Ochsner Clinic Foundation; and Virginia Commonwealth University. The study was approved by each individual Institutional Review Board and monitored by those of the following central institutions: the University of Texas Southwestern Medical Center and the Ochsner Clinic Foundation.
All residents were offered the opportunity and consented to participate in the study. The first part of the study comprised collecting baseline information for each participant using a multiple-choice test on laparoscopic surgical principles, baseline performance on five previously validated laparoscopic simulation tasks, and baseline assessment of the resident's performance of a laparoscopic Pomeroy bilateral tubal ligation at the beginning of the gynecology rotation. Next, the residents were randomized to either traditional teaching (no simulation) or faculty-directed sessions in a laparoscopic simulation laboratory. Finally, the last phase of the study was to perform a final laparoscopic bilateral tubal ligation, to retest on the five previously performed simulated tasks, and to repeat the multiple-choice test.
The assigned resident interventions were either traditional teaching or faculty-directed simulation teaching. Traditional teaching was defined as the institutional standard, which did not include simulation during the study period. Faculty-directed simulation sessions involved spending 30 minutes on each of the five previously validated laparoscopic simulations that serve as the standard for the fundamentals of laparoscopic surgery7: peg transfer, pattern cutting, placement and securing of a ligating loop, suturing with an intracorporeal knot, and suturing with an extracorporeal knot. Those randomized to simulation training had open access to the simulation laboratory and were encouraged to practice the five fundamentals of laparoscopic surgery simulation tasks while remaining compliant with the Accreditation Council for Graduate Medical Education duty hour regulations. The additional time spent in the laboratory was not recorded, but the goal was to practice until proficiency was achieved on two consecutive attempts of each fundamentals of laparoscopic surgery task.
Residents were randomized using a computer-generated list in blocks of 12 and stratified by level of residency (lower level and upper level). Allocation concealment was achieved by a central telephone system, and the randomization was implemented by the attending physician at each institution. Participating residents were instructed not to discuss their randomization, and faculty members were not given a master list of allocation. The primary outcomes were the final total normalized score for the simulator tasks and the score in operating room performance.
We used STATA 11.0 for our statistical analysis. Sample size was calculated using estimates of failure on simulated laparoscopic tasks. Although the available data were sparse, Fraser et al7 (2003) found that 68–86% of lower-level residents failed, whereas Goff et al8 (2002) reported a 62–73% initial failure rate among upper-level residents. For a two-sided α of 0.05 and β of 0.20, a study size of 44 lower-level residents (postgraduate year [PGY] 1 and 2) and 66 upper-level residents (PGY 3 and 4) was necessary to demonstrate a 50% improvement in performance on the simulated tasks.
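The sample size reasoning above can be illustrated with a standard two-proportion calculation. This is only a sketch under stated assumptions: the exact formula used for the trial is not reported, so the unpooled normal-approximation formula, the function name n_per_group, and the choice of 77% (the midpoint of the 68–86% lower-level failure range) as the control failure rate are all assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control: float, relative_improvement: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for comparing two proportions (two-sided test).

    Uses the unpooled normal-approximation formula; this is an assumed
    method, not necessarily the one used in the trial.
    """
    # A 50% improvement is read here as halving the failure rate.
    p_treated = p_control * (1.0 - relative_improvement)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = (p_control * (1.0 - p_control)
                    + p_treated * (1.0 - p_treated))
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_treated) ** 2
    return ceil(n)


# Assumed input: midpoint of the reported 68-86% lower-level failure range.
lower_level_n = n_per_group(p_control=0.77, relative_improvement=0.50)
print(lower_level_n, "per arm;", 2 * lower_level_n, "in total")
```

Under these assumptions the sketch yields 22 lower-level residents per arm, which is in line with the 44 reported; other plausible inputs within the published ranges, or a different formula, would give different numbers.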
The simulation score generated for each skill station took into account precision of performance and speed.7 A penalty score was calculated by objectively evaluating performance with a predefined list of errors. The timing score was calculated by subtracting the time to complete the exercise from a preset cut-off time. The final score was compiled by subtracting the penalty from the timing score. Subsequently, each task was normalized by dividing each individual's score by the maximum score achieved by a chief general surgery resident (PGY 5) for the same task (task 1=237, task 2=280, task 3=142, task 4=520, task 5=297), and then multiplying that number by 100. Thus, the scores were able to equally contribute to the total score, which was calculated by adding the normalized scores for all five tasks. Competence was defined as a total normalized score of 270 or more.
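The scoring rules described above can be written out as a short sketch. The maximum scores and the competence threshold of 270 come from the text; the function names, the example cut-off time, and the example resident's raw scores are hypothetical and serve only to show the arithmetic.

```python
# Maximum scores achieved by a chief general surgery resident (from the text).
MAX_SCORES = {1: 237, 2: 280, 3: 142, 4: 520, 5: 297}
COMPETENCE_THRESHOLD = 270  # total normalized score of 270 or more


def task_score(cutoff_seconds: float, completion_seconds: float,
               penalty: float) -> float:
    """Raw score: timing score (cut-off minus time taken) minus error penalty."""
    timing_score = cutoff_seconds - completion_seconds
    return timing_score - penalty


def normalized_score(task: int, raw_score: float) -> float:
    """Raw score as a percentage of the reference maximum for that task."""
    return raw_score / MAX_SCORES[task] * 100.0


def total_normalized_score(raw_scores: dict) -> float:
    """Sum of the normalized scores across all five tasks."""
    return sum(normalized_score(task, raw) for task, raw in raw_scores.items())


def is_competent(total: float) -> bool:
    return total >= COMPETENCE_THRESHOLD


# Hypothetical resident: raw scores for tasks 1-5 (illustrative values only).
example_raw = {1: 150.0, 2: 140.0, 3: 100.0, 4: 260.0, 5: 150.0}
example_total = total_normalized_score(example_raw)
print(round(example_total, 1), is_competent(example_total))
```

Because each task is rescaled to the same 0 to 100 range before summing, a task with a large raw maximum (such as task 4) does not dominate the total.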
To date, the best available method of technical skills assessment involves observation with criteria.9 For the evaluation in operating room performance, a score was generated using a previously described objective structured assessment of technical skills type of measurement10 (a global rating scale with seven components of the Likert scale: respect for tissue; time and motion; instrument handling; knowledge of instruments; use of assistants; flow of operation and forward planning; and knowledge of specific procedure). The assessments were completed by the attending surgeon immediately after the procedure to evaluate the resident's intraoperative surgical intuition and decision-making abilities. This objective structured assessment of technical skills generates a score that consistently demonstrates levels of reliability and validity comparable with objective structured clinical examinations.11,12
Continuous variables were examined for normality using the Shapiro-Wilk normality test. A paired t test was used to assess differences within cohorts. Because the variables were not normally distributed, a Wilcoxon rank-sum test was used to test the differences between cohorts. Binomial categorical data were analyzed with a χ2 or Fisher exact test when appropriate. Our final model involved a multiple linear regression analysis for the main effects, including baseline total normalized scores and randomization group as covariates, with the final total normalized score as the dependent variable.
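For readers unfamiliar with the between-cohort comparison, the rank-sum test can be sketched in a few lines. This is a simplified illustration, not the STATA routine used in the analysis: it uses the large-sample normal approximation without a tie or continuity correction, and the function names and example scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (W, z, two-sided p) using the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])  # rank sum of the first group
    mean_w = n_a * (n_a + n_b + 1) / 2.0
    sd_w = sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)  # no tie correction
    z = (w - mean_w) / sd_w
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical total normalized scores for two small cohorts.
control_scores = [210.0, 245.0, 260.0, 198.0, 275.0]
simulation_scores = [340.0, 365.0, 290.0, 402.0, 310.0]
w_stat, z_stat, p_value = rank_sum_test(control_scores, simulation_scores)
print(w_stat, round(z_stat, 2), round(p_value, 4))
```

With samples as small as those in this example an exact test would normally be preferred; the approximation is shown only because it makes the logic of the test visible.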
We also performed a logistic regression for the prediction of a final score of pass or fail (fail=less than 270). Similarly, we performed a multiple linear regression with baseline objective structured assessment of technical skills scores and randomization as covariates and the final objective structured assessment of technical skills score as the dependent variable.
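The structure of the linear models described above (final score regressed on baseline score and randomization group) can be sketched with an ordinary least squares fit. This is an illustration only: the data are hypothetical and noise-free, the group effect of 100 points is an arbitrary value chosen for the example, the function names are invented, and the sketch omits the cluster-robust variance estimation used in the actual analysis.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_baseline_adjusted_model(baseline, group, final):
    """OLS fit of final = b0 + b1 * baseline + b2 * group (group coded 0/1)."""
    rows = [[1.0, b, float(g)] for b, g in zip(baseline, group)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, final)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from known coefficients (no noise).
baseline = [200.0, 220.0, 250.0, 270.0, 210.0, 240.0, 260.0, 280.0]
group = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = traditional teaching, 1 = simulation
final = [50.0 + 0.8 * b + 100.0 * g for b, g in zip(baseline, group)]

intercept, beta_baseline, beta_group = fit_baseline_adjusted_model(
    baseline, group, final)
print(round(intercept, 3), round(beta_baseline, 3), round(beta_group, 3))
```

The coefficient on the group indicator is the quantity interpreted in such a model as the average difference in final score between arms after adjusting for the baseline score.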
We tested the effect of the following a priori--specified variables in the study that we believe may help predict the outcome: the number of laparoscopic cases at randomization; the level of residency (lower or upper); and baseline psychomotor testing with a peg board test. We accounted for the correlation of observations within centers using a robust estimator (Huber-White sandwich estimator) with residency site as the cluster variable.13
To test the hypothesis of a modification effect by level of residency, we introduced an interaction term in the model between the level of residency and randomization. We also tested for effect modification by the number of operative laparoscopic cases at baseline. We tested the final model for the assumptions of normality (Shapiro-Wilk), linearity (augmented partial residual plot), and equal variance (variance ratio test). All significance tests were conducted at P≤0.05 without correction for multiplicity.
RESULTS
We enrolled 116 residents from eight centers (range 3–35 residents). We obtained a complete follow-up rate of 87.9% (102/116). We present the study flow in Figure 1, including the reasons for failure to complete the study protocol. The most common reason was the inability to perform a second tubal ligation because of decreases in surgical volume and changes in rotation schedules. We did not identify systematic differences in failure to complete the study between the two groups.
Baseline characteristics of the simulator training group and the control group are shown in Table 1. We found no differences in baseline characteristics, including baseline exposures such as previous simulation, video game, or billiard experiences.
The first set of comparisons comprised the presimulation and postsimulation testing within and between the two cohorts (Table 2). For each of the five tasks, we found no significant performance differences at baseline. Posttesting at the conclusion of the protocol demonstrated significant differences between the groups. The final total normalized score was significantly higher in those residents who were randomized to proctored simulation training (378±54 compared with 264±86; P<.01). The time for completion of the study between the initial and the final total normalized score assessment was not significantly different between the two groups (control 107±124 days compared with simulator training 142±174 days; P=.11).
The multiple linear regression model using baseline total normalized scores explained 67% of the variability in the outcome. The randomization group alone was responsible for 37% of such variance. The coefficient for the randomization group was 112, indicating that, controlling for the baseline total normalized score, randomization to simulation training was associated with an average increase of 112 points (95% confidence interval 67.9–156.9) in the final normalized score. Regarding the proportion of residents achieving competence in the treatment compared with control conditions, small cell sizes precluded the use of logistic regression. As such, we provide descriptive statistics (ie, proportions), unadjusted for site or baseline variability, to characterize outcomes; however, we cannot provide inferential tests. Among those in the simulation and control groups, 30.2% (16/53) and 31.7% (19/60) demonstrated competence, respectively, at baseline. At the conclusion of the project, 96.2% (51/53) and 61.1% (36/60) were competent among the simulation and control groups, respectively (P<.01).
We measured the internal consistency of the objective structured assessment of technical skills score with the Cronbach α, obtaining excellent intercorrelation among the score items at 0.93. In Table 3, we show the operating room performance results measured by objective structured assessment of technical skills. We found a statistically significant difference between the two groups. The predictive model using baseline total normalized scores explained 33% of the variance in the outcome. Randomization group alone was responsible for 6% of such variance. The coefficient for the randomization group was 2.2, indicating that, controlling for the baseline objective structured assessment of technical skills score, randomization to simulation training was associated with an average increase of 2.2 points (95% confidence interval 1.29–3.2) in the final objective structured assessment of technical skills score.
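The Cronbach α reported above summarizes how consistently the items of a rating scale vary together. A minimal sketch of the standard formula follows; the function name and the example ratings are hypothetical (three items rather than the seven of the actual scale) and are not study data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach alpha for a scale.

    item_scores: one list per scale item, each holding one rating per
    assessment (all lists the same length).
    """
    k = len(item_scores)
    # Total score of each assessment across all items.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical ratings: three items, four assessments each.
example_items = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [1.0, 3.0, 2.0, 4.0],
]
example_alpha = cronbach_alpha(example_items)
print(round(example_alpha, 3))
```

Values close to 1 indicate that the items rise and fall together across assessments; perfectly correlated items with equal variance give exactly 1.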
Level of residency, number of laparoscopic operative cases, and psychomotor testing results did not improve the predictive ability of our model. The additions of interaction terms, level of residency, and baseline operative laparoscopic cases with the randomization group were not statistically significant for simulator task scores or objective structured assessment of technical skills scores.
DISCUSSION
As surgical educators struggle to maximize the number of opportunities for skills development and repetition in the operating room, many are turning to simulation to augment the experience of surgical residents. A focus on patient safety, in conjunction with decreased surgical volumes, poses a significant challenge to obstetrics and gynecology residency programs. Such factors dictate that residency programs maximize acquisition of surgical skills of the residents before they enter the operating room. The emergence of simulation in medical education provides an opportunity for residents to learn and practice basic surgical tasks before performing them on live patients.
We examined the data with a multiple linear regression model, including the prespecified clinically relevant variables for the prediction of the total normalized scores. We found that the variables that yielded the best model were the baseline total normalized score and the randomization group. Because the study included several centers, we had to account for the correlation of measurements within centers, so we used a robust estimation with clustering by center. Although center-to-center variability may exist, the current study was not designed to evaluate such a question (ie, the sample does not comprise a random selection of possible sites). This may diminish the generalizability of the conclusions. We did not improve our model by adding the level of residency or the number of laparoscopic cases at baseline.
A growing body of literature has attempted to answer whether simulation leads to real-time improvements in surgical performance, yet many of these studies are plagued with nonvalidated tests of surgical skills, underpowered data, or homogeneity of level of training among participants.3,4,14 Our study did demonstrate a significant improvement in surgical performance, with randomization accounting for improvement in objective structured assessment of technical skills scores by 2.2 points. Given that this is the most valid tool available, the difference is clinically relevant because in any given category this increase would move the learner's abilities from poor to above average.
The procedure we chose to observe was a laparoscopic Pomeroy bilateral tubal ligation. Our goal was to have residents perform a procedure that residents at all levels of training could safely accomplish but that was difficult enough to allow meaningful evaluations of laparoscopic skills. The laparoscopic Pomeroy has been well-described as a technique used for resident education15,16 while preserving appropriately low failure and surgical complication rates. The procedure allows for residents to demonstrate a variety of laparoscopic surgical skills, including tissue manipulation and cutting, use of the ligatures and laparoscopic retrieval bags, and ambidextrous operative skills. Our originally intended procedure, the laparoscopic Parkland bilateral midsegment salpingectomy, was considered too difficult to ensure faculty acceptance of it as the sterilization technique of choice.
We acknowledge a few weaknesses with our study. First, it was difficult to facilitate the completion of the study protocol by many residents during a single rotational block. Each center experienced significant month-to-month variation in the volume of patients seeking tubal ligation as well as challenges in consistent resident scheduling. Additionally, many patients encountered last-minute cancellations of the scheduled ligation procedure. To offset this potential bias, we collected the most objective data available, Accreditation Council for Graduate Medical Education case log statistics for gynecologic laparoscopy, to control for surgical experience. We did not find any statistically significant difference in the time to complete the study between the groups. Another weakness of the study was the unblinded design of the randomization assignments. We attempted to minimize the effect of this by keeping the simulation teachers separate from the surgical proctors. In addition, residents were instructed to not discuss their randomization with others.
Our study did possess a number of strengths. First, we used the most validated laparoscopic simulation tasks as the foundation of our curriculum. Multiple studies have demonstrated no advantage in acquisition of psychomotor skills when comparing high-fidelity (virtual reality) and lower-fidelity (box trainers) simulators.17,18 Our intent was to use a validated and easily reproducible laparoscopic simulation that could be introduced into any obstetrics and gynecology residency program.
Finally, this project does not discount the classic methods of surgical education. The fact that both trained and nontrained groups experienced significant improvement in operative performance helped to establish construct validity. As expected, those residents who were randomized to simulation training experienced the greatest levels of improvement. Ultimately, simulation education should be used to enhance, not replace, the learning experience in the operating room.
The findings suggest that proficiency-based simulated laparoscopic skills training offers significant benefit over the traditional gynecologic surgical education among lower-level residents. The use of easily accessible, low-fidelity tasks should be incorporated into formal laparoscopic training. This curriculum validates the importance of moving from quantity to quality of performance. This research could serve as the foundation to standardize methods of documenting surgical competency and for curriculum development in residency education. Future research efforts should examine how improved simulation technology can further facilitate the development, improvement, and maintenance of surgical skills among all levels of surgeons. Ultimately, these studies will provide validation in the use of simulation to verify maintenance of competency among practicing surgeons.
REFERENCES
1. Blanchard MH, Amini SB, Frank TM. Impact of work hour restrictions on resident case experience in an obstetrics and gynecology residency program. Am J Obstet Gynecol 2004;191:1746–51.
2. Peters JH, Fried GM, Swanstrom LL, Soper NJ, Sillin LF, Schirmer B, et al. Development and validation of a comprehensive program of education and assessment of the basic fundamentals of laparoscopic surgery. Surgery 2004;135:21–7.
3. Banks EH, Chudnoff S, Karmin I, Wang C, Pardanani S. Does a surgical simulator improve resident operative performance of laparoscopic tubal ligation? Am J Obstet Gynecol 2007;197:541.e1–5.
4. Coleman RL, Muller CY. Effects of a laboratory-based skills curriculum on laparoscopic proficiency: a randomized trial. Am J Obstet Gynecol 2002;186:836–42.
5. Derossis AM, Antoniuk M, Fried GM. Evaluation of laparoscopic skills: a 2-year follow-up during residency training. Can J Surg 1999;42:293–6.
6. Seymour NE, Gallagher AG, Roman SA, O'Brien MK, Bansal VK, Andersen DK, et al. Virtual reality training improves operating room performance: results of a randomized, double-blinded study. Ann Surg 2002;236:458–63.
7. Fraser SA, Klassen DR, Feldman LS, Ghitulescu GA, Stanbridge D, Fried GM. Evaluating laparoscopic skills: setting the pass/fail score for the MISTELS system. Surg Endosc 2003;17:964–7.
8. Goff BA, Nielsen PE, Lentz GM, Chow GE, Chalmers RW, Fenner D, et al. Surgical skills assessment: a blinded examination of obstetrics and gynecology residents. Am J Obstet Gynecol 2002;186:613–7.
9. Moorthy K, Munz Y, Sarker SK, Darzi A. Objective assessment of technical skills in surgery. BMJ 2003;327:1032–7.
10. Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84:273–8.
11. Harden RM, Stevenson M, Downie WW, Wilson GM. Assessment of clinical competence using objective structured examination. BMJ 1975;1:447–51.
12. Anastakis DJ, Regehr G, Reznick RK, Cusimano M, Murnaghan J, Brown M, et al. Assessment of technical skills transfer from the bench training model to the human model. Am J Surg 1999;177:167–70.
13. Huber P. The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley (CA): University of California Press; 1967. p. 221–33.
14. Larsen CR, Soerensen JL, Grantcharov TP, Dalsgaard T, Schouenborg L, Ottosen C, et al. Effect of virtual reality training on laparoscopic surgery: randomised controlled trial. BMJ 2009;338:b1802.
15. Murray JE, Hibbert ML, Heth SR, Letterie GS. A technique for laparoscopic Pomeroy tubal ligation with endoloop sutures. Obstet Gynecol 1992;80:1053–5.
16. Robinson DC, Stewart SK, Reitan RE, Gist RS, Jones GN. Laparoscopic Pomeroy tubal ligation: a comparison with tubal cauterization in a teaching hospital. J Reprod Med 2004;49:717–20.
17. Keyser EJ, Derossis AM, Antoniuk M, Sigman HH, Fried GM. A simplified simulator for the training and evaluation of laparoscopic skills. Surg Endosc 2000;14:149–53.
18. Munz Y, Kumar BD, Moorthy K, Bann S, Darzi A. Laparoscopic virtual reality and box trainers: is one superior to the other? Surg Endosc 2004;18:485–94.