Simulation has gained much attention in recent years as an important adjunct to skill acquisition in the clinical context, for a variety of reasons. From an ethical perspective, simulation training allows residents to learn in a laboratory environment before gaining experience on real patients. From an educational perspective, residents can interact with the simulator on their own time, increasing the opportunities for practice without the need for direct supervision.1,2 Further, drawing on adult learning3 and self-directed learning4 theories, it has been suggested that self-guided learning may lead to better skill retention and transfer than externally controlled practice5 and that unsupervised learning experiences can promote and enhance the skills thought critical for a practitioner to maintain competence in independent practice.
It is also important to note, however, that the expertise literature stresses the importance of guided practice6,7 to avoid the entrenchment of bad habits or the possibility of “laboring in vain” on tasks that are too easy or too difficult to maximize the learning opportunity.8 One of the most crucial aspects of deliberate practice is the assessment of performance by an expert accompanied by immediate feedback to allow for the correction of errors.6,7 This important evaluative component of teaching (“diagnosing the student”) is also emphasized by Irby et al9 in their work describing what constitutes excellent clinical teachers. A key challenge for the simulation education community, therefore, will be to determine how this evaluative component of learning can be effectively integrated into unsupervised learning contexts10 such as the simulation labs.
Current virtual reality (VR) simulators provide good opportunities for addressing this challenge. With the exponential rise in computing power, today's high-fidelity VR simulators not only provide lifelike contexts in which trainees can repeatedly practice but also collect several performance metrics for each task performed. Although several studies have shown that these data can be used to distinguish between novice and expert performance,11 no one has examined whether they can be used to "diagnose the learner," identifying the types of errors being performed by the trainee. If this were possible, the simulator could tailor the challenges to the individual learner's needs, guiding the trainee's "unsupervised" learning activities. The goal of this study was to provide a "proof of concept" demonstration: to show that the data collected by a VR simulator can be used to identify (i.e., statistically predict) the problem areas in trainees' performances that would otherwise be identified by an expert observing the trainees directly. For this demonstration, we chose a laparoscopic surgical simulator that has several preset training modules. As a first step in this development, it was necessary to determine the specific types of problem areas that trainees experience when learning laparoscopic techniques and to determine the reliability with which expert laparoscopic teachers can identify these difficulties in trainee performance. Thus, the purpose of this first study was to classify the problem areas that novices encounter while learning laparoscopic surgical techniques, and to ascertain whether these problem areas are identifiable in novice performances on a VR simulator.
Ethical approval was obtained from both the University of Toronto and University Health Network research ethics boards.
Creation of the list of problem areas
Fourteen expert (fellowship-trained) laparoscopic surgeons at the University of Toronto teaching hospitals were invited to participate in individual semistructured interviews between October and December 2008. The interviews were designed to have the experts identify and reflect on the types of difficulties and problem areas novices face in early laparoscopic training. Interview probes were developed to encourage each expert surgeon to recall specific trainee performances in an attempt to explore all the possible ideas for novice difficulties that the surgeon might have seen. After each interview, a list of problem areas was elaborated and/or modified in an iterative fashion in order to represent the growing set of problem areas being described and to address redundancies arising when the same problem was described with slightly different language. This modification was performed during meetings of the research team, which included a laparoscopic surgeon educator, a surgical resident, and an education researcher. After each surgeon had exhausted his or her own ideas and views, the evolving list was provided and the surgeon was given the opportunity to comment on the problem areas already identified. Interviews lasted between 20 and 50 minutes. Interviews continued until a stable list of common novice problem areas was created. The final list was presented to each of the experts for final approval before moving forward.
The resulting list of common problem areas was developed into a set of six-point rating scales so that a given laparoscopic performance could be evaluated on each of the potential problem areas on a scale from 0 (not a problem at all) to 5 (serious problem).
Expert identification of problem areas in novice laparoscopic performances
Twenty laparoscopically naïve participants (third- and fourth-year medical students, and first-year surgical residents from the University of Toronto) were recruited to complete a standard set of seven basic laparoscopic training tasks, as well as a partial laparoscopic cholecystectomy, and an intracorporeal suturing task on a VR laparoscopic simulator, the “LapMentor.” The LapMentor was used for this study because of its ability to measure, record, and display the metrics associated with each performance (e.g., total time, path length, and accuracy). Performances were videotaped using FRAPS real-time video capture software.
Two laparoscopic surgeons were trained to identify each problem area using examples, definitions, and explanations. Each rater then separately rated the seven tasks as well as the partial laparoscopic cholecystectomy and the intracorporeal suturing task for each of the 20 videotapes.
To establish the extent to which the manifestation of the problem areas was consistent across tasks, we calculated the seven-task alpha coefficient for each of the five problem areas for each rater. To establish the interrater reliability for the identification of each problem area we calculated the intraclass correlation coefficient (ICC) for each of the five problem areas. Finally, to determine the extent to which raters could isolate the particular problems that each novice was experiencing, we calculated the “interproblem” correlations for each rater.
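To illustrate the first of these analyses, the task-consistency coefficient can be sketched in code. The following is a minimal illustration in Python of Cronbach's alpha computed across tasks for one rater and one problem area; the ratings shown are invented for illustration and do not come from the study data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-participant task ratings.

    scores: list of rows, one per participant, each row holding that
    participant's rating (0-5) on a single problem area for each task.
    """
    k = len(scores[0])                             # number of tasks
    cols = list(zip(*scores))                      # one column per task
    item_var = sum(variance(col) for col in cols)  # sum of per-task variances
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical 0-5 ratings: 4 novices x 3 tasks on one problem area
ratings = [[1, 2, 1],
           [4, 5, 4],
           [2, 2, 3],
           [5, 4, 5]]
print(round(cronbach_alpha(ratings), 2))  # -> 0.95
```

In the study itself, this computation would be repeated over the seven tasks for each of the five problem areas and each rater, yielding the per-rater alpha coefficients reported in the Results.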
The 14 experts identified five problem areas that are most commonly experienced by novice learners. The list and description of these areas are presented in Table 1. The 20 right-handed novices (12 males, 8 females) included 12 medical students and 8 surgical residents. Six had previously used a simulator (all for three hours or less).
As can be seen in Table 2, the seven-task alpha coefficients were quite reasonable (mean = .85 and .82 for Rater 1 and Rater 2, respectively), suggesting that, across the seven tasks, each rater was consistent in identifying the extent to which a novice was expressing the problem areas.
Also seen in Table 2, the single-rater ICC was moderate (range = 0.26–0.62) for each of the five problem areas. These levels of interrater reliability raise some doubts about the extent to which any two raters see the same errors in a particular performance. However, projecting these values forward with the Spearman-Brown prophecy formula suggests that, with 10 to 15 raters, it would likely be possible to identify with good reliability the extent to which a given novice was experiencing a potential problem area. Such stability would be necessary if we were to try to build a predictive model to identify the presence or absence of these problem areas from the data collected by the simulator during the novice's performance.
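The projected gain from adding raters follows from the Spearman-Brown prophecy formula, which estimates the reliability of the mean of several raters from the single-rater ICC. A minimal sketch in Python, using the weakest single-rater ICC reported above (0.26):

```python
def spearman_brown(single_rater_icc, n_raters):
    """Projected reliability of the mean rating of n_raters raters
    (Spearman-Brown prophecy formula)."""
    r = single_rater_icc
    return n_raters * r / (1 + (n_raters - 1) * r)

# Projected reliability at the weakest observed single-rater ICC (0.26):
for m in (10, 15):
    print(m, round(spearman_brown(0.26, m), 2))  # 10 -> 0.78, 15 -> 0.84
```

Even at the low end of the observed ICC range, pooling 10 to 15 raters pushes the projected reliability toward the conventional 0.80 threshold, which is the basis for the claim above.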
However, it is also important to note that both raters demonstrated little ability to discriminate among the five problem areas for a given novice's performance. The high average interproblem correlation coefficient for each of the raters (0.807 and 0.855) suggests that if a novice was rated poorly on one construct, then he or she was usually rated poorly on all constructs.
The results were not substantially different for the more “authentic” surgical tasks (partial laparoscopic cholecystectomy and intracorporeal suturing). However, because there were only two such tasks evaluated in this study, the results were less stable and therefore are not presented here.
The initial intention of our research program was to develop a procedure by which we could use data collected during performances on a simulator to predict the errors that an expert would identify if watching those performances. If we could accomplish this task, it would allow the simulator to use these predictions to offer more informed educational experiences for students in the unsupervised learning context. We anticipated several complications in the evolution of this procedure but were somewhat surprised at how early in the process these complications arose.
Consistent with what we had anticipated, the 14 expert laparoscopic surgeons were able to generate a consensus list of the most common problem areas that novices struggle with when learning laparoscopic skills. However, when it came to the identification of these problem areas in videotaped simulated performances, the raters were, at best, only moderately able to agree on which novices had the most difficulties in these problem areas and appeared unable to discriminate amongst the problem areas in the context of these novice performances. The quantitative data collected suggest that, with more raters, the interrater reliability for identifying which participants have difficulties with these problem areas could be increased; however, this would not ensure discrimination of each participant's particular problem areas.
There are several possible explanations for our findings. It may be, for example, that the simulator we used is not sophisticated enough for our experts to delineate or discriminate which of the five most common problem areas a novice is experiencing. The five problem areas identified by the experts were reflective of their real operative experiences with novice trainees. It is possible, therefore, that these five identified problem areas do not translate to the simulator tasks we had our novices complete, and thus our raters were justifiably unable to identify and discriminate these problem areas on the participant videos. As one example of this, the haptic feedback of VR simulators may not yet be accurate enough to expose the problem of “loss of tactile sense” that experts see novices struggling with when operating on a real human patient.
There may also have been conceptual problems with the construction of the five problem areas we created. For example, it is possible that the five problem areas were described more abstractly or theoretically than the typical expert teacher uses when "diagnosing" a student and developing a responsive learning plan to address the learner's moment-by-moment needs. That is, at least in a technical domain such as laparoscopy, an expert teacher may be focusing on more explicit details of action (such as the placement of one's hands or the motion needed to effectively drive a needle through the tissue) rather than on more general concepts such as 2D-to-3D spatial translation. If this were the case, the generic problem areas we produced through our conversations with the experts might be legitimate conceptual areas of weakness, but at the wrong level of specificity to be of any use for our experts in their efforts to diagnose the trainees' immediate needs. Consistent with this possibility, it is worth noting that the five problem areas identified in our study are different from those often seen in the literature. The difficulties and problems reported in the literature have traditionally been more procedure-specific and in a stepwise or checklist format associated with a particular surgery.12 By contrast, this study was designed to elicit problem areas that were not procedure-specific but, rather, were at a more fundamental basic level. Although our approach is consistent with an increasing trend to use more generic score sheets, like the GOALS global rating scale, in laparoscopic surgery,13 for the purposes of the current study this may have been a mistake. If so, we might need to return to the first step of the process and determine exactly what kinds of assessments our teachers are making when they are "diagnosing" their students' learning needs.
An alternative conceptual error we may have made in our consideration of the five problem areas was the assumption that these problems were conceptually distinct when they are in fact highly interrelated. For example, the expert interviews suggested that a novice who neglects his or her nonoperative hand will likely also have difficulties with countertraction. Similarly, there exists a relationship between 2D/3D problems and magnification of movements. In both cases, the experts acknowledged that the problem areas were related but asserted that they were separate problems. Lurie et al14 found a similar trend in their review of tools used to evaluate the six Accreditation Council for Graduate Medical Education competencies. On the basis of their review, they concluded that it is "currently not possible to measure the competencies independently of one another in any psychometrically meaningful way."14 Notably, Lurie et al14 recognized that the inability to meaningfully measure the constructs does not mean that they are incorrect. Instead, it may simply be problematic to assume that "once defined [they] would reveal themselves in a straightforward fashion through measurement."14 We may, therefore, have been naïve in our starting assumption that the expert teacher "diagnoses" a single area of weakness to target for educational intervention. This process may, in fact, be much more complex than our simple model had recognized.
Our aim for this initial study was to determine the specific types of problem areas trainees have when learning laparoscopic techniques and to determine the reliability with which expert laparoscopic teachers can identify these difficulties in trainee performances. This study was to be part of a larger program of research that would determine if simulators were able to assist with the diagnosis and assessment of novice problem areas and, thus, engage the novice in directed self-guided learning. Although there are data to support the idea that directed self-guided learning in the setting of simulation can be effective,5 and that having direct expert instruction is not necessary to improve the development or retention of a surgical skill,15 this current study suggests that we are not yet ready to create the conditions by which the simulator itself can provide the diagnostic assessment of novice problem areas in order to provide directed self-guided learning. At this time, it appears that the processes by which an expert diagnoses novice difficulties are not easily reducible to a set of independent linear scales, much less predictable by a statistical model.
The authors would like to acknowledge the Royal College of Physicians and Surgeons of Canada for funding this project through the Medical Education Research Grant. The authors also would like to thank Dr. Ashley Vergis for his contribution to this project.
The authors obtained ethics approval for this study from the University of Toronto and University Health Network research ethics boards.
1 Cook DA, Dupras DM. Teaching on the Web: Automated online instruction and assessment of residents in an acute care clinic. Med Teach. 2004;26:599–603.
2 Ainoda N, Onishi H, Yasuda Y. Definitions and goals of “self-directed learning” in contemporary medical education literature. Ann Acad Med Singapore. 2005;34:515–519.
3 Knowles M. The Adult Learner: A Neglected Species. 3rd ed. Houston, Tex: Gulf Publishing; 1984.
4 Candy PC. Self-Direction for Lifelong Learning: A Comprehensive Guide to Theory and Practice. San Francisco, Calif: Jossey-Bass; 1991.
5 Brydges R, Carnahan H, Safir O, Dubrowski A. How effective is self-guided learning of clinical technical skills? It's all about process. Med Educ. 2009;43:507–515.
6 Ericsson KA. The Road to Excellence: The Acquisition of Expert Performance in the Arts and Sciences, Sports, and Games. Mahwah, NJ: Erlbaum; 1996.
7 Ericsson KA. Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Acad Med. 2004;79(10 suppl):S70–S81.
8 Nelson TO, Leonesio RJ. Allocation of self-paced study time and the labor-in-vain effect. J Exp Psychol Learn Mem Cogn. 1988;14:676–686.
9 Irby DM, Ramsey PG, Gillmore GM, Schaad D. Characteristics of effective clinical teachers of ambulatory care medicine. Acad Med. 1991;66:54–55.
10 Eva KW, MacDonald RD, Rodenburg D, Regehr G. Maintaining the characteristics of effective clinical teachers in computer assisted learning environments. Adv Health Sci Educ Theory Pract. 2000;5:233–246.
11 Carter FJ, Schijven MP, Aggarwal R, et al. Consensus guidelines for validation of virtual reality surgical simulators. Surg Endosc. 2005;19:1523–1532.
12 Ziv A, Wolpe PR, Small SD, Glick S. Simulation-based medical education: An ethical imperative. Acad Med. 2003;78:783–788.
13 Vassiliou MC, Feldman LS, Andrew CG, et al. A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg. 2005;190:107–113.
14 Lurie S, Mooney CJ, Lyness JM. Measurement of the general competencies of the Accreditation Council for Graduate Medical Education: A systematic review. Acad Med. 2009;84:301–309.
15 Nousiainen M, Brydges R, Backstein D, Dubrowski A. Comparison of expert instruction and computer-based video training in teaching fundamental surgical skills to medical students. Surgery. 2008;143:539–544.