Incorporating simulation-based surgical skills training in the educational curriculum addresses the current challenges trainees face, including acquiring complex skills efficiently despite work-hour restrictions, cost pressures, and policies intended to reduce patient waiting times.1,2 Simulation-based training also enables the implementation of the principles of so-called proficiency-based training, which focuses on assisting trainees to reach a specified level of performance and achieve a uniform set of skills required to perform certain procedures. Quantitative assessment of the level of proficiency that uses standard and objective metric measurements is critically important in improving the quality of surgical education and in training professional, competent surgeons.3 As in all metric systems, the measurement tools used in the assessment of surgical proficiency need to be practical, usable, objective, valid, and reliable to be accepted as standard. Tools used in assessing surgical skills are listed in Table 1.
Measurement Tools Used in Simulation-Based Surgical Skills Training
Questionnaires
Questionnaires are designed to elicit feedback from trainees regarding their perceived comfort or knowledge level in performing a surgical procedure. Although questionnaires can be practical, low-cost assessment tools, they have several inherent shortcomings, including subjectivity and the impracticality of standardization. Furthermore, validating questionnaires can be challenging because they evaluate subjective measures that can be biased by many variables related to the subjects’ self-assessments of qualitative parameters. Most published studies in which comfort or knowledge questionnaires were used as proficiency measures of surgical procedures report that such questionnaires are not validated instruments.4,5 Thus, a questionnaire is not a suitable measurement tool for validated, standard, and metric assessments of surgical competence.
Objective Structured Assessment of Technical Skills
The Objective Structured Assessment of Technical Skills (OSATS) was the first assessment tool that made possible the quantitative measurement of surgical skill or task performance in surgical simulation.6 OSATS assessment is performed by independent observers who evaluate the trainee’s performance using a checklist consisting of a set of specific surgical maneuvers deemed essential elements of the procedure, such as appropriate placement of a plate on bone using a C-arm and securing proximal and distal fixation1 (Figure 1). Accordingly, it is critical to train the observers who perform the scoring so that interrater (observer) reliability is >0.80 (ie, almost perfect agreement between observers) to achieve unbiased results.
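As a rough illustration (not drawn from the article), the interrater reliability of binary OSATS checklist scores can be quantified with a chance-corrected agreement statistic such as Cohen's kappa. The sketch below uses hypothetical scores from two observers; the raters, items, and the choice of kappa as the statistic are assumptions for the example.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' yes/no checklist scores."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical scores from two trained observers on the same 10 checklist items
# (1 = maneuver performed, 0 = not performed).
a = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
b = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
print(round(cohen_kappa(a, b), 2))  # 0.62 — below the 0.80 target, so more rater training is indicated
```

Reporting a chance-corrected statistic rather than raw percent agreement matters because checklist items that nearly all raters mark "performed" inflate raw agreement.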
The OSATS methodology was designed for performance evaluation after the completion of a training session; however, it can be used during training to standardize formative feedback. When the trainer (faculty member) uses a checklist during the novice practice sessions, the skills or tasks can be monitored in real time. When a checklist item is not performed by the trainee, standardized, formative feedback can be given. Training and assessment are two sides of the same coin; thus, training can be greatly enhanced by formative feedback, especially in training to proficiency.
It is important to note that an OSATS checklist reports whether each and every essential step of a surgical procedure was completed; the tool does not measure quality or surgical finesse.7
Global Rating Scale
The Global Rating Scale (GRS) is another commonly used surgical skills assessment tool, one that measures characteristic surgical behaviors (ie, surgical finesse) during the performance of any given procedure (eg, respect for soft tissues, fluidity of movements, familiarity with the instruments)1,8 (Figure 2). Although the GRS was developed to complement the OSATS, some researchers include it as a component of the OSATS. Because the surgical skill parameters measured using the GRS differ in character from the items included in the OSATS checklist, it is best to consider the GRS and the OSATS separate measurement tools. Typically, the GRS uses a rating scale, such as a Likert scale, and measures surgical behaviors in general. Therefore, the GRS arguably provides a comprehensive assessment, which includes objective and subjective criteria and measures nontechnical cognitive skills (eg, decision making, judgment). Adding subjective criteria to any measurement tool that uses rating scales rather than well-defined yes-or-no checklists, however, introduces the limitations associated with subjectivity, including ambiguity, poor interrater reliability, and bias.
In a recent study, Bernard et al7 reported that OSATS checklist scores showed strong interrater reliability (>0.8) between the evaluators in assessing technical skills pertaining to shoulder surgical approaches. The GRS scores were found to be moderately reliable (0.75) between evaluators, however. The results of this study also showed that the OSATS checklist and GRS scores correlated with the trainees’ levels of experience. This finding supports the validity of these measurement tools in differentiating between the skills of novice and experienced trainees in performing surgical approaches to the shoulder.
Motion Tracking and Analysis
Objective assessment of performance with simulators requires metrics that accurately measure surgical skill. The most commonly used metrics are task completion time and accuracy. Task completion time and error count can provide sufficient data to differentiate between a novice and an expert in the simulated performance of surgical skills. Detecting differences between the intermediate level of expertise and the novice or expert level can be challenging, however, because time and error alone supply no metric information about the fluidity of hand movements during a task. Motion tracking and analysis appears to be an objective and valid tool for assessing surgical skills in terms of precision and economy of movement during the performance of surgical procedures.9,10 Motion tracking sensors can be mounted on surgical tools or attached to or worn on the hands. The movements of these sensors are recorded as three-dimensional coordinates to measure a variety of parameters, including the total path length traced by each sensor and the number of translational or rotational movements.11 The main disadvantage of these systems is that attaching extra devices to surgical tools or wearing equipment on the arms or hands can be cumbersome and impractical for the assessment of surgical skills. Such implementations are not always required, however, especially in arthroscopic procedures, in which motion analysis systems can be built into a simulator to track and analyze instrument tip trajectory data.12 Howells et al9 showed the validity of a motion analysis system in its ability to differentiate between subjects with different expertise levels in arthroscopic skills. The authors recorded the time taken, total path length, and number of movements used when performing simulated arthroscopic tasks on a shoulder simulator equipped with an electromagnetic motion tracking system.
They found significant differences between the performances of the surgeon and nonsurgeon groups (P < 0.0001) and between senior and junior surgeons (P < 0.05).
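The two motion metrics described above are straightforward to derive from the recorded three-dimensional coordinates. The following sketch is a minimal illustration, not any particular tracking system's algorithm; the sample trajectory and the displacement threshold used to count discrete movements are hypothetical.

```python
import math

def path_length(points):
    """Total distance travelled by a tracked sensor along its 3D trajectory."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def count_movements(points, threshold=1.0):
    """Number of discrete movements: consecutive steps whose displacement
    exceeds a noise threshold (units follow the tracker, eg, mm)."""
    return sum(1 for p, q in zip(points, points[1:]) if math.dist(p, q) > threshold)

# Hypothetical (x, y, z) samples from an instrument-tip tracker.
trajectory = [(0, 0, 0), (3, 4, 0), (3, 4, 0.5), (6, 8, 0.5)]
print(path_length(trajectory))      # 10.5 — steps of 5.0, 0.5, and 5.0
print(count_movements(trajectory))  # 2 — only two steps exceed the 1.0 threshold
```

In practice, an expert's trajectory tends to show a shorter total path and fewer discrete movements than a novice's for the same task, which is what makes these metrics discriminative.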
Despite the increasing availability of simulators that can track and analyze motion and the positive effects of this method on the objective assessment of proficiency in surgical skills, the effect of these metrics on trainees’ skill transfer to the operating room has yet to be proved.13 It seems unlikely that motion analysis will be used widely in the actual operating room setting. Motion tracking and analysis will likely remain a research tool for selected laboratory-based simulation studies.
Video Recording
Video recording of the operation for later assessment of surgical skills has several advantages over currently used assessment methods such as real-time OSATS testing, during which an observer must be present in the operating room to rate a trainee’s performance on a checklist. Video-based feedback is a practical method that enables the assessment of surgical performance using the same measurement tools as the OSATS or the GRS at the time most convenient for the rater14 (Figure 3). Multiple raters can examine the same video recording and score the performance, which may help reduce bias. Because video recordings can be edited after the procedure to remove unnecessary footage, the evaluation of surgical performance using video recordings also may reduce the time needed to assess the complete procedure. Although the editing process takes extra time,15 the potential time saved by enabling multiple raters to assess the edited recordings arguably far outweighs the time spent on editing.
Video recordings from cameras positioned in the operating room or simulation centers can be valuable additions to the surgical skills assessment of almost any type of procedure, but arthroscopic operations in particular are well suited for this type of assessment, because the monitor output can be saved automatically as a video file.16 Jabbour and Sidman16 demonstrated the feasibility of using time-synced multicamera videos to show the instrument handling and the surgical field from the arthroscopic camera video recordings. The authors stated that the “mean duration of OSATS videos was 11 minutes and 20 seconds, which was significantly less than the time needed for an expert to be present at the administration of each 30-minute OSATS (P < 0.001).”
Video recording is valuable not only for the initial training of a novice or for training an experienced surgeon in a new procedure, but also for the maintenance of certification in periodic assessments. The public increasingly demands more oversight to ensure the quality of surgical performance. A standardized quantitative review of video-recorded procedures can serve many purposes, such as life-long learning with self-assessment for improvement and quality assurance for risk management, or privileging and credentialing. As technology improves, video capture likely will play a greater role in the formative and summative assessment of surgical skill.
Crowd-Sourced Assessment of Technical Skills
The term crowdsourcing has been defined as “the practice of obtaining information or input into a task or project by enlisting the services of a large number of people, either paid or unpaid, typically via the Internet.”17 Crowd-Sourced Assessment of Technical Skills (C-SATS) is an emerging adjunct for the video-recorded evaluation of surgical skills. In this method, video-recorded surgical performance can be assessed (using the OSATS or the GRS) by online crowds of raters who are decentralized, anonymous, and independent; some observers may not have received medical training.18
Holst et al19 studied the validity of C-SATS in assessing the surgical performance of 12 surgeons in porcine robot-assisted urinary bladder closures. The authors compared the assessments of 7 expert raters with those of 487 crowd workers recruited from Amazon Mechanical Turk. The expert surgeon graders took 14 days to complete the scoring of the video-recorded performances using the GRS, whereas the crowd workers completed the assessments of the 12 videos in 4 hours and 28 minutes. Each rater from Amazon Mechanical Turk was paid $0.75 per video. Concordance between the surgeon graders and the crowd workers was 0.93 (Cronbach alpha). Interrater (observer) reliability among the surgeon graders was 0.89.
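Cronbach alpha, the concordance statistic reported above, treats each rater as an "item" and asks how consistently the raters rank the same set of videos. The sketch below is a minimal illustration with invented scores (3 raters scoring 5 videos); it is not the analysis pipeline of the cited study.

```python
def cronbach_alpha(ratings):
    """Cronbach alpha for inter-rater consistency.

    ratings: one list of scores per rater, aligned across the same videos.
    """
    k = len(ratings)          # number of raters
    n = len(ratings[0])       # number of videos

    def var(xs):              # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Per-video totals across raters, and the sum of per-rater variances.
    totals = [sum(r[i] for r in ratings) for i in range(n)]
    item_vars = sum(var(r) for r in ratings)
    return k / (k - 1) * (1 - item_vars / var(totals))

# Hypothetical GRS-style scores: 3 raters x 5 videos.
ratings = [[3, 4, 5, 2, 4],
           [3, 5, 5, 2, 3],
           [4, 4, 5, 3, 4]]
print(round(cronbach_alpha(ratings), 2))  # 0.91 — high consistency across raters
```

High alpha here means the raters agree on which performances are better or worse, not that they assign identical absolute scores.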
The surgical skills assessment tools currently in use require too much time and too many resources and therefore are not practical for frequent use or scaling to larger studies.18,19 Using crowdsourcing to assess surgical skills may reduce the burden of the assessments and rater bias while increasing the accountability of outcomes by soliciting contributions from a large online community rather than from professionals who are related to a particular project in some way.18,19
Direct Objective Metric Measures
Directly and objectively measuring a concrete aspect of a skill using universal metric measurements holds promise for improving reliability, validity, clinical relevance, and applicability in large-scale studies or high-stakes board examinations while reducing time and expense. Examples of such parameters include the mechanical strength of a knot or a fracture fixation construct, the distance travelled when navigating a wire to a certain location such as the center of the femoral head, the accuracy of reduction, and the time to completion of a skill task. Research has demonstrated the value of using these parameters in surgical skills assessments when compared with conventional methods such as the OSATS.20,21
In a simulated intra-articular fracture reduction model, Anderson et al20 showed that OSATS did not correlate well with the actual fracture reduction measured using three-dimensional digital models of the final reductions of articular surfaces. Similarly, in a distal radius fracture fixation model, Putnam et al21 showed that the biomechanical strength of the fracture construct did not correlate with medical knowledge tests and OSATS scores. These data strongly suggest that the previously described direct and objective metric measurement parameters are critically important adjuncts to more commonly used techniques, such as the OSATS and the GRS, in the assessment of surgical skills.
Currently, the consensus in orthopaedic leadership—and in surgical subspecialties in general—is that a paradigm shift that integrates simulation-based surgical skills training into curricula is necessary in surgical education. Furthermore, simulation-based assessment in high-stakes surgery board examinations across all surgical subspecialties is also foreseeable. The measurement techniques that will determine the level of proficiency are not yet well defined, however. In a systematic review, Slade Shantz et al22 studied the internal validity of arthroscopic simulators and aimed to document whether any standard validation protocols exist. The authors reported excessive heterogeneity in the literature concerning performance metrics used in assessments and noted that the simulators can discriminate between novices and experts. Nevertheless, questions remain regarding the ability of simulators to discriminate between novice and intermediate proficiency levels. The main reason for the inability to discriminate between relatively close proficiency levels could be the use of nonstandardized measurement techniques that are not sensitive enough to quantify such differences.
As the use of simulators becomes more commonplace, it will be critical to define a full–life-cycle, simulation-based surgical training curriculum using proficiency-based progression methodology (ie, training to a benchmark that has been established by expert performance)23 along with objective, reliable, and valid measurement protocols that are standardized across all training programs nationwide. Perhaps such measurement protocols will need to combine more objective techniques such as OSATS, motion analysis, and direct metrics with video recording and C-SATS. Although the curriculum may be subject to change based on the country in which it is used, it will be necessary to achieve consistency in measurement protocols internationally to set a standard and ensure efficiency in communicating the level of surgical proficiency.
Simulation-based skills training and assessment are increasingly incorporated into surgical education and certification processes. Measurement techniques to quantify the level of proficiency in the performance of surgical procedures will be the key element in the success of this paradigm shift in surgical education. Some of the current measurement methods used to assess surgical skills include the OSATS, the GRS, motion tracking, video recording, C-SATS, and direct or objective metric measures.24 These methods are used alone or in combination based on the preferences of each research group or institution. Therefore, heterogeneity exists in the literature concerning the available evidence needed to draw conclusions. There is a need to define a standard, full–life-cycle, simulation-based surgical education curriculum along with measurement protocols using reliable, valid, and objective metrics and to adopt a proficiency-based progression methodology.24
Evidence-based Medicine: Levels of evidence are described in the table of contents. In this article, references 13 and 23 are level I studies. References 4-9, 11, 14, 15, and 19-21 are level II studies. References 10, 12, and 22 are level III studies. Reference 16 is a level IV study. References 1-3, 18, and 24 are level V expert opinion.
References printed in bold type are those published within the past 5 years.
1. Atesok K, Mabrey JD, Jazrawi LM, Egol KA: Surgical simulation in orthopaedic skills training. J Am Acad Orthop Surg 2012;20(7):410-422.
2. Atesok K, Satava RM, Van Heest A, et al.: Retention of skills after simulation-based training in orthopaedic surgery. J Am Acad Orthop Surg 2016;24(8):505-514.
3. Gallagher AG: Metric-based simulation training to proficiency in medical education: What it is and how to do it. Ulster Med J 2012;81(3):107-113.
4. Beth Grossman L, Komatsu DE, Badalamente MA, Braunstein AM, Hurst LC: Microsurgical simulation exercise for surgical training. J Surg Educ 2016;73(1):116-120.
5. Monod C, Voekt CA, Gisin M, Gisin S, Hoesli IM: Optimization of competency in obstetrical emergencies: A role for simulation training. Arch Gynecol Obstet 2014;289(4):733-738.
6. Reznick R, Regehr G, MacRae H, Martin J, McCulloch W: Testing technical skill via an innovative “bench station” examination. Am J Surg 1997;173(3):226-230.
7. Bernard JA, Dattilo JR, Srikumaran U, Zikria BA, Jain A, LaPorte DM: Reliability and validity of 3 methods of assessing orthopedic resident skill in shoulder surgery. J Surg Educ 2016;73(6):1020-1025.
8. Alvand A, Logishetty K, Middleton R, et al.: Validating a global rating scale to monitor individual resident learning curves during arthroscopic knee meniscal repair. Arthroscopy 2013;29(5):906-912.
9. Howells NR, Brinsden MD, Gill RS, Carr AJ, Rees JL: Motion analysis: A validated method for showing skill levels in arthroscopy. Arthroscopy 2008;24(3):335-342.
10. Mason JD, Ansell J, Warren N, Torkington J: Is motion analysis a valid tool for assessing laparoscopic skill? Surg Endosc 2013;27(5):1468-1477.
11. Clinkard D, Holden M, Ungi T, et al.: The development and validation of hand motion analysis to evaluate competency in central line catheterization. Acad Emerg Med 2015;22(2):212-218.
12. Tashiro Y, Miura H, Nakanishi Y, Okazaki K, Iwamoto Y: Evaluation of skills in arthroscopic training based on trajectory and force data. Clin Orthop Relat Res 2009;467(2):546-552.
13. Stefanidis D, Yonce TC, Korndorffer JR Jr, Phillips R, Coker A: Does the incorporation of motion metrics into the existing FLS metrics lead to improved skill acquisition on simulators? A single blinded, randomized controlled trial. Ann Surg 2013;258(1):46-52.
14. Dath D, Regehr G, Birch D, et al.: Toward reliable operative assessment: The reliability and feasibility of videotaped assessment of laparoscopic technical skills. Surg Endosc 2004;18(12):1800-1804.
15. Karam MD, Thomas GW, Koehler DM, et al.: Surgical coaching from head-mounted video in the training of fluoroscopically guided articular fracture surgery. J Bone Joint Surg Am 2015;97(12):1031-1039.
16. Jabbour N, Sidman J: Assessing instrument handling and operative consequences simultaneously: A simple method for creating synced multicamera videos for endosurgical or microsurgical skills assessments. Simul Healthc 2011;6(5):299-303.
17. Oxford Living Dictionaries, online edition, 2017.
18. Lendvay TS, White L, Kowalewski T: Crowdsourcing to assess surgical skill. JAMA Surg 2015;150(11):1086-1087.
19. Holst D, Kowalewski TM, White LW, et al.: Crowd-sourced assessment of technical skills: Differentiating animate surgical skill through the wisdom of crowds. J Endourol 2015;29(10):1183-1188.
20. Anderson DD, Long S, Thomas GW, Putnam MD, Bechtold JE, Karam MD: Objective structured assessments of technical skills (OSATS) does not assess the quality of the surgical result effectively. Clin Orthop Relat Res 2016;474(4):874-881.
21. Putnam MD, Kinnucan E, Adams JE, Van Heest AE, Nuckley DJ, Shanedling J: On orthopedic surgical skill prediction: The limited value of traditional testing. J Surg Educ 2015;72(3):458-470.
22. Slade Shantz JA, Leiter JR, Gottschalk T, MacDonald PB: The internal validity of arthroscopic simulators and their effectiveness in arthroscopic education. Knee Surg Sports Traumatol Arthrosc 2014;22(1):33-40.
23. Angelo RL, Ryu RK, Pedowitz RA, et al.: A proficiency-based progression training curriculum coupled with a model simulator results in the acquisition of a superior arthroscopic Bankart skill set. Arthroscopy 2015;31(10):1854-1871.
24. Atesok K, MacDonald P, Leiter J, et al.: Orthopaedic education in the era of surgical simulation: Still at the crawling stage. World J Orthop 2017;8(4):290-294.