
Predicting the Technical Competence of Surgical Residents

Hamstra, Stanley J.

Clinical Orthopaedics and Related Research: August 2006 - Volume 449 - Issue - p 62-66
doi: 10.1097/01.blo.0000224060.55237.c8
SECTION I: SYMPOSIUM I: C. T. Brighton/ABJS Workshop on Orthopaedic Education

I describe an approach to predicting competence in technical skills for the purposes of resident selection. To demonstrate a predictive relationship, it is necessary to use measures that exhibit variation, reliability, and validity. There is little evidence that such measures are routinely used in the process of selecting residents. I argue that the selection of assessment instruments in the predictor domain must be guided by relevant theoretical considerations, while assessment in the surgical domain must make use of more objective and reliable instruments than is currently the practice. I present a brief summary of research on predicting operative technical competence.

From the Department of Medical Education, University of Michigan, Ann Arbor, MI.

The Department of Medical Education, University of Michigan, Ann Arbor, MI has received funding from the PSI Foundation of Ontario and the Natural Sciences and Engineering Research Council of Canada.

Correspondence to: Stanley J. Hamstra, PhD, University of Michigan, G1208 Towsley Center, 1500 E. Medical Center Drive, Box 201, Ann Arbor, MI 48109-0201. Phone: 734-763-1424; Fax: 734-936-1641; E-mail:

Predicting technical competence is an important task in identifying promising candidates during resident selection. To predict competence in any domain, one must identify good indicators of competence, select reasonable predictors, and demonstrate a statistical relation between the two. Both indicators and predictors must demonstrate reliable variation in scores, but this is not always easily attained. Many measures of competence, such as end-of-rotation evaluations, show relatively little spread.8,21,33 Fortunately, many of the available predictors (eg, medical knowledge, psychomotor ability, visuospatial ability) show good variation and have reliable and valid tests. However, because there are so many potential predictors, the challenge is to find a relevant test.

Other questions arise. For example, what do we mean exactly by competence in an orthopaedic residency program? What exactly will be measured? Is it the general impression of a number of surgeons with whom the resident has worked? Is it measured performance on an operative task in a laboratory setting? These questions speak to the problem of identifying essential constructs, that is, what is essential to the practice of orthopaedic surgery?

I consider issues related to the selection and measurement of relevant constructs in the surgical domain (in this case, operative competence) and the predictor domain. It is instructive to discuss each of these in turn because separate issues of measurement relate to each.


Surgery Domain

Importance of Assessment

Assessment of competence in surgery has been designed traditionally for pass/fail decisions. To be sure, there is often general discussion about a particular trainee's strengths and weaknesses and especially some acknowledgement of individual differences, but the main goal of a training program is to determine whether the trainee has met the minimum standards to progress to independent practice. However, for the task of identifying relevant predictors, there are two problems with this approach. First, given the specific task of assessing individuals for pass/fail decisions, there is no inherent requirement we distinguish among individuals, only that they meet a certain criterion. Although it is common for attending surgeons to identify the stars in a group of trainees, it seems the general tendency of this system is to minimize variation. Second, notwithstanding formative feedback during training, the assessment is one of overall competence and the general judgment of fit to practice with little emphasis on specific dimensions of competence.

While this model is useful for pass/fail decisions, such a system is largely unhelpful for predicting operative competence. To do this, we must identify specific components of technical performance and assess them with measures sensitive to subtle differences among individuals.


Valid and Reliable Measures of Operative Competence

Before 1995, there were few measures of operative competence in surgery with documented evidence of reliability and validity. Winckel et al41 noted, “… the formal assessment of technical skills has, usually, been relegated to one or two items on summary achievement indicators, such as in-training reports …” and “… completion of assessment forms is often performed at the end of a rotation and is based on recollection of events…,” including “… reliance on nonspecific evaluations …” or simply “… documentation of procedures performed….” With the gradual introduction of evidence-based medicine into the curriculum came a parallel call for the development of more objective and sensitive measures of operative competence. To address this, Winckel et al used checklists and expert global ratings to examine surgical trainees on 41 live operations on human patients (three operative procedures: cholecystectomy, inguinal hernia repair, and bowel resection). They found a statistically significant difference between junior (PGY-2 or 3) and senior (PGY-3 or 4) residents and high interrater reliability. They concluded that incorporating structured guidelines into the assessment of operative competence leads to high construct validity. This was an evolution from traditional measures of operative competence, such as a single summary assessment made by an attending surgeon at the end of a rotation through his or her service.

After this success, there was a concerted effort to develop a specific instrument for the reliable and valid assessment of operative competence. One of the first validated measures was the Objective Structured Assessment of Technical Skills (OSATS).12,20 The OSATS comprises expert global ratings and a checklist and was developed on the premise that more structured assessments would lead to greater reliability and validity.19,26,40 Several studies confirmed the reliability and construct validity of the OSATS across a wide variety of operative procedures.27 There is also evidence the expert global rating component captures subtle aspects of competence to which the checklist alone is insensitive, such as the procedure's flow.25
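The interrater reliability reported for such instruments is commonly summarized with a chance-corrected agreement statistic. As a minimal illustration (with invented ratings, not data from the OSATS studies), Cohen's kappa for two raters scoring the same binary checklist can be computed as follows:

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' item scores."""
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    categories = set(rater_a) | set(rater_b)
    pe = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (po - pe) / (1 - pe)

# Hypothetical pass/fail checklist scores from two raters on eight items
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0]
rater_2 = [1, 1, 0, 1, 1, 1, 0, 0]
kappa = cohens_kappa(rater_1, rater_2)  # about 0.47: moderate agreement
```

Kappa of 1.0 indicates perfect agreement; 0 indicates agreement no better than chance. Published OSATS studies typically report interrater reliability with related coefficients such as intraclass correlation.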

In addition to OSATS, other effective means of assessing operative competence have included final product ratings35 and offline videotape assessment.2,6 Pass ratings and completion time also exhibit construct validity because they show changes in performance after educational intervention similar to those obtained with expert ratings. (Importantly, expert raters should be blinded to the participant's experimental group during these studies.) The pass rating typically asks, “Would you allow this candidate to carry out this procedure on your next patient?” which confers some sense of clinical relevance. Completion time is also considered an important clinical variable in the operating room, but its relevance has been challenged recently on the grounds that it may fail to distinguish efficiency from sloppiness.16

There has also been an attempt to use more objective measures of competence, including the Imperial College Surgical Assessment Device (ICSAD).7 This consists of affixing motion sensors under the surgical gloves of individuals performing an operative procedure. Derived measures include the number of individual hand movements and the total path length traveled by the hands during the procedure. This assessment method has been used successfully in a number of studies, including microsurgical anastomosis,15 craniofacial plate fixation,39 and basic tasks such as knot tying and suturing.3 Efforts to refine assessment instruments are ongoing, with recent developments aimed at collecting more comprehensive measures of hand motion, such as optoelectronic tracking technology.10 This type of assessment lends itself to analysis of the process of operative performance, theoretically going beyond simple final-product outcome measures.
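The two derived measures are conceptually simple. As a rough sketch (not the published device's actual algorithm; sampling rate, units, and the speed threshold here are illustrative assumptions), total path length and a movement count can be computed from sampled 3-D hand positions:

```python
import math

def path_length(samples):
    """Total distance traveled across consecutive 3-D position samples
    (same units as the input coordinates)."""
    return sum(math.dist(p, q) for p, q in zip(samples, samples[1:]))

def movement_count(samples, dt, speed_threshold):
    """Count discrete movements: contiguous runs of samples in which
    hand speed exceeds speed_threshold."""
    moving = [math.dist(p, q) / dt > speed_threshold
              for p, q in zip(samples, samples[1:])]
    count, prev = 0, False
    for m in moving:
        if m and not prev:  # a new movement begins
            count += 1
        prev = m
    return count

# Hypothetical samples (cm) at dt = 0.1 s: rest, one movement, rest
samples = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 2, 0)]
total_cm = path_length(samples)             # 2.0 cm traveled
movements = movement_count(samples, 0.1, 1.0)  # 1 discrete movement
```

Lower path length and fewer movements for the same completed task are taken as markers of economy of motion, which is the construct the studies above relate to expertise.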

A recent study by Grober et al16 examined the potential of using clinically relevant outcomes for competence assessment in 50 surgical residents. Expert ratings of operative competence were obtained, as were clinically relevant outcomes in laboratory mice via surgical reexploration 1 month after the educational intervention. The task was specific to urological microsurgery (ie, vas deferens anastomosis), and the clinically relevant outcomes included measures of patency of the anastomosis and presence of sperm on microscopy after completion of the procedure. After training, anastomotic patency rates and rates of sperm presence on microscopy were higher for groups that received hands-on training than for those that received didactic training, validating the use of clinically relevant outcomes as a new metric for assessing technical skills.

Operative training simulators have incorporated metrics specific to each trainer, and there are currently several studies underway examining whether these metrics of performance meet the test for reliability and validity and transfer to the clinical setting.13,34,36 In sum, validation evidence has accrued for the following metrics: time, final product ratings, pass/fail ratings, checklists, expert global ratings, some clinically relevant measures, hand motion, and some simulator-based metrics.


Predictor Domain: The Shotgun Approach versus a Theory-Based Approach

Given the measures for assessing operative competence identified above, what corresponding measures are available in the literature as predictors? This is a classic problem in experimental psychology, closely related to the identification of individual differences and personnel selection.


How Do We Choose a Relevant Predictor?

The challenge in identifying possible predictors among the thousands of tests available is that it requires a careful and detailed search for relevant constructs with a theoretical relation to operative competence. Without this specificity and a theory-guided approach, the exercise resembles searching for a needle in a haystack (see Anastakis et al1 for a review of this problem), an approach called the shotgun approach.11 A more sensible approach is to narrow the field of tests on the basis of theory. This fulfills two objectives: first, it forces us to think about the construct in question, increasing the chances of a positive outcome; second, it allows us to make specific predictions about which constructs are expected to correlate. A strong theoretical case should be made for why each variable is measured. This follows from the principles of validation in psychometric theory.22,23

The theory-based approach to construct selection also tends to foster interdisciplinary discussion of the relevant constructs, in this case between surgeons and testing experts: the surgeons have direct intuitive access to good candidate constructs based on their content experience, and the testing experts have access to the universe of potential tests and are generally equipped to identify those most amenable to reliable measurement. This could be accomplished in a focus group or in a series of one-on-one interviews of surgeons by the testing expert. The approach thus combines intuition, task analysis, and interdisciplinary collaboration.


The Role of Task Analysis

When a surgeon creates a checklist for an operative procedure, he or she is engaging in implicit task analysis, which effectively involves deconstructing the procedure into its component steps. When followed with input from colleagues (eg, in a focused discussion), the product is a first draft of a task analysis for that procedure.17 It is often useful to discuss this with trainees, as they will have a purer understanding of a procedure's steps; experts have often chunked the steps into a state of unconscious competence.

An example of this approach is a study on the prediction of operative competence in plastic surgery.37 The authors were interested in identifying visual-spatial abilities that might predict competence on a spatially complex task, the Z-plasty. In much of the previous research on the topic, there was little or no theoretical justification for the particular visual-spatial tests used. Recognizing this as a weakness, the authors chose their tests on the basis of theory rather than administering a battery of neuropsychological tests and seeing what correlated. In this case, they analyzed the surgical procedure in detail and then used principles of visual perception theory to select specific visual-spatial tests. The major reason for choosing the Z-plasty for this study was that it relates closely to visual-spatial constructs: the surgeon must visualize the end product before beginning the procedure and mentally manipulate an object in two or three dimensions. This study identified predictors of competence on the Z-plasty task, with a test of mental rotations correlating at a level of r = 0.47 (p < 0.01).
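The reported relation is a Pearson product-moment correlation between test scores and performance ratings. A minimal Python sketch, using invented scores rather than the study's data, shows the computation:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance (numerator) and the two standard-deviation terms
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical mental-rotations scores and Z-plasty performance ratings
mrt_scores = [12, 18, 9, 22, 15, 20]
zplasty_ratings = [3.1, 3.8, 2.9, 4.2, 3.0, 4.4]
r = pearson_r(mrt_scores, zplasty_ratings)
```

An r of 0.47 means the visuospatial test accounts for roughly 22% (r²) of the variance in Z-plasty performance, which is why such predictors are useful for screening but far from deterministic.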


Research on Predicting Operative Competence

Several other studies have examined characteristics of surgical trainees in an attempt to predict operative competence. By and large, demographic information (eg, age, gender),28,29 medical school grades,18,24 and manual dexterity14,29,31,32 fail to correlate with operative competence. It is not surprising that variables such as medical school grades yield no relation to technical skill as these measures likely show little variability. But it also appears that tests of manual dexterity do not necessarily tap into the same constructs used during an operative procedure. More recent work in this area has examined this issue directly.


Visuospatial and Psychomotor Predictors

The relative importance of visuospatial and psychomotor skills has been studied in oral, plastic, and reconstructive surgery.40,41 The most striking findings of these studies were the correlations between scores on higher-level visuospatial tests (ie, the ability to mentally manipulate and rotate complex three-dimensional objects) and efficiency of hand motion during surgical procedures, with correlations ranging from 0.40 to 0.58. Surprisingly, manual dexterity did not correlate with efficiency of hand motion in these studies, suggesting that efficient hand motion and manual dexterity are separate constructs. A postmortem analysis of these results led to a detailed examination of the test of psychomotor ability, the Crawford small parts dexterity test.5 Although this test was selected because it had the best validity data of those available in the literature, closer examination revealed that its individual tasks (inserting screws into a metal plate as quickly as possible and using small tweezers to place pins and collars into a series of holes in a metal plate) did not necessarily relate to the amount of hand motion. Both tasks require fine motor control and precise hand movements, on the order of 1 to 2 cm, with occasional gross movements to pick up screws or pins but very little planning or visualization. In contrast, the hand-motion measurement system counts only the fairly gross movements, on the order of 5 to 20 cm. Thus, efficient hand motion may be more closely related to planning and preoperative visualization than to precise motor control during any subtask, calling into question the importance of a steady hand in surgery. The authors proposed that overall operative competence is not necessarily determined by manual dexterity on a fine scale but rather by the ability to plan and organize a series of relatively gross movements. In other words, the predictor chosen for this study did not match the surgical construct; it would have been better to use a measure of manual dexterity requiring larger hand movements.


Predicting Initial versus Eventual Competence

Wanzel et al37 found residents' scores on selected visuospatial tests correlated strongly with competence on the Z-plasty. However, after 10 minutes of supervised practice and feedback, participants were retested on the Z-plasty, and those with low visuospatial test scores performed as well as the higher-scoring group. This suggests, at most, that the learning curves might differ. It appears individuals with low visuospatial test scores may have more difficulty performing a spatially complex surgical task initially but can learn with minimal practice and training. In a follow-up study, expert craniofacial surgeons who perform spatially complex surgical tasks on a regular basis were found to have visuospatial test scores and manual dexterity around the norm,38 suggesting their expertise is related less to complex spatial abilities or manual skills than to repeated practice under the carefully controlled conditions of training during residency and fellowship. It is conceivable that intimate knowledge of and experience with surgical tasks, combined with competent intraoperative judgment, may overshadow any advantage afforded by superior visuospatial ability. The ultimate skill may not be related to the same mechanisms that mediate initial performance. For novices, innate abilities may help in acquiring technical skills, whereas for experts, experience alone may determine the acquisition of technical skills independent of, or perhaps in spite of, innate abilities.


Predictors Related to Orthopaedic Surgery

An example more relevant to orthopaedic surgery is one in which motor control is related to a bone-drilling procedure. Dubrowski and Backstein9 found experts were able to anticipate bone density during a drilling task, whereas novices relied more on sensory cues, resulting in a greater plunging effect when exiting the far side of the bone. This study is a good illustration of the very clear and direct correspondence between a construct in the surgical domain (motor control during drilling) and the performance metric developed to assess it (hand motion). It also illustrates well the importance of knowledge and experience over sensory and motor abilities. The study provides a novel way of uncovering an essential construct in orthopaedic surgery, namely, a detailed understanding of bone density differences across the layers and cortices of bone; it is this understanding that allowed the experts' anticipatory response. In this case, the implication for predicting operative competence is that a simple test of knowledge of bone density layers might predict the hand-motion response and, thus, the amount of plunging seen in the clinical setting.

Discussion

Several areas of surgery and surgical training appear to be relevant to research in spatial vision and psychomotor behavior. For example, all operative tasks clearly involve visual processing, such as shape and texture discrimination. However, some tasks involve higher-level visuospatial processes, such as the ability to visualize or imagine an end product before a procedure is initiated (eg, craniofacial reconstructive surgery). Given the distinction between fine motor control and the smooth flow of operative procedures, there is also a need to explore the psychomotor aspects of surgical training in more detail.

Through a series of interdisciplinary discussions and explorations (ie, among surgeons, engineers, and psychologists), some progress has been made with respect to surgical planning (preoperative visualization),37,38 identifying tissue structures in noise, recovering three-dimensional structures from two dimensions,30 and psychomotor aspects of manual dexterity.15,38

This approach has limitations because there has been little research on predictors of technical skill in orthopaedic surgery. From the work that has been done in other areas of surgery, it appears that connections can be made between operative competence and predictors such as visuospatial ability. The trick has been to analyze the surgical procedure in detail to identify aspects of performance that may relate to visuospatial ability, by intuition or by appeal to theoretical analysis. Still, it is surprising how many studies of predictors are conducted without regard for making a priori connections between the surgical procedure of interest and the potential predictor variables.

There is also little research on the cognitive aspects of surgery in resident selection, for example, deciding when and whether to operate. Perhaps it would be more helpful to improve measures for some of the constructs that are addressed in the resident selection process but have little evidence for reliability and predictive validity, such as motivation for the field and the ability to learn and adapt. Other areas of competence that need to be monitored and assessed include nontechnical skills, such as cognitive decision making, communication, or professionalism. It is also becoming obvious that there is a need to develop more and better measures of outcome during residency training. Certainly, the field would benefit from more interdisciplinary collaboration between testing experts and surgeons. Interestingly, psychomotor ability, as assessed by the Crawford small parts dexterity test, does not seem to be a good predictor of operative competence. It would be interesting to see whether a validated test could be found that correlates with the psychomotor constructs unique to orthopaedic surgery.

Appropriate selection and high-quality training of aspiring surgeons is critical to the profession. In a domain that is often compared with surgery, the aviation industry has been successful in using aptitude tests for selecting pilots and predicting their success.4 However, the evidence for predicting operative competence from innate aptitudes of surgical applicants remains mixed, with some support for predicting operative competence among novices alone.37,38 Practically, the implication is that it may be possible to predict those novices who need extra training with the caveat that the predictors do not appear to contribute significantly to ultimate success in surgery.

Acknowledgments

The author thanks Patricia B. Mullan for comments on an earlier draft, participants of the 2005 Brighton/ABJS Workshop on Orthopaedic Education for a lively discussion, and most of all the surgery residents at the University of Toronto.

References

1. Anastakis DJ, Hamstra SJ, Matsumoto ED. Visual-spatial abilities in surgical training. Am J Surg. 2000;179:469-471.
2. Backstein D, Agnidis Z, Regehr G, Reznick RK. The effectiveness of video feedback in the acquisition of orthopedic technical skills. Am J Surg. 2004;187:427-432.
3. Bann S, Davis IM, Moorthy K, Munz Y, Hernandez J, Khan M, Datta V, Darzi A. The reliability of multiple objective measures of surgery and the role of human performance. Am J Surg. 2005;189:747-752.
4. Bell JA. Royal Air Force selection procedures. Ann R Coll Surg Engl. 1998;70:270-275.
5. Crawford JE, Crawford DM. Small Parts Dexterity Test. New York, NY: The Psychological Corporation; 1981.
6. Dath D, Regehr G, Birch D, Schlachta C, Poulin E, Mamazza J, Reznick R, MacRae HM. Toward reliable operative assessment: the reliability and feasibility of videotaped assessment of laparoscopic technical skills. Surg Endosc. 2004;18:1800-1804.
7. Datta V, Mackay S, Mandalia M, Darzi A. The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model. J Am Coll Surg. 2001;193:479-485.
8. Dauphinee WD. Assessing clinical performance: where do we stand and what might we expect? JAMA. 1995;274:741-743.
9. Dubrowski A, Backstein D. The contributions of kinesiology to surgical education. J Bone Joint Surg Am. 2004;86:2778-2781.
10. Dubrowski A, Sidhu RS, Park J, Brydges R, MacRae H. Challenging the optimal challenge point model. J Sport Exercise Psych. 2004;26:S65-S66.
11. Edenborough R. Using Psychometrics. London: Kogan Page; 2000.
12. Faulkner H, Regehr G, Martin J, Reznick RK. Validation of an objective structured assessment of technical skill for surgical residents. Acad Med. 1996;71:1363-1365.
13. Felsher JJ, Olesevich M, Farres H, Rosen M, Fanning A, Dunkin BJ, Marks JM. Validation of a flexible endoscopy simulator. Am J Surg. 2005;189:497-500.
14. Francis NK, Hanna GB, Cresswell AB, Carter FJ, Cuschieri A. The performance of master surgeons on standard aptitude testing. Am J Surg. 2001;182:30-33.
15. Grober ED, Hamstra SJ, Wanzel KR, Reznick RK, Matsumoto ED, Sidhu RS, Jarvi KA. Validation of novel and objective measures of microsurgical skill: hand-motion analysis and stereoscopic visual acuity. Microsurgery. 2003;23:317-322.
16. Grober ED, Hamstra SJ, Wanzel KR, Reznick RK, Matsumoto ED, Sidhu RS, Jarvi KA. The educational impact of bench model fidelity on the acquisition of technical skill: the use of clinically relevant outcome measures. Ann Surg. 2004;240:374-381.
17. Kaufman HH, Wiegand RL, Tunick RH. Teaching surgeons to operate-principles of psychomotor skills training. Acta Neurochir (Wien). 1987;87:1-7.
18. Keck JW, Arnold L, Willoughby L, Calkins V. Efficacy of cognitive/noncognitive measures in predicting resident-physician performance. J Med Educ. 1979;54:759-765.
19. Liu P, Miller H, Herr G, Hardy C, Sivarajan M, Willenkin R. Videotape reliability: a method of evaluation of a clinical performance examination. J Med Educ. 1980;55:713-715.
20. Martin J, Regehr G, Reznick RK, MacRae H, Murnaghan J, Hutchison C, Brown M. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84:273-278.
21. Maxim BR, Dielman TE. Dimensionality, internal consistency and interrater reliability of clinical performance ratings. Med Educ. 1987;21:130-137.
22. Messick S. Validity of psychological assessment. Am Psychol. 1995;50:741-749.
23. Nunnally JC, Bernstein I. Psychometric Theory, 3rd Ed. New York: McGraw-Hill; 1994.
24. Papp KK, Polk HCJ, Richardson JD. The relationship between criteria used to select residents and performance during residency. Am J Surg. 1997;173:326-329.
25. Regehr G, MacRae H, Reznick RK, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med. 1998;73:993-997.
26. Reznick RK. Teaching and testing technical skills. Am J Surg. 1993; 165:358-361.
27. Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skill via an innovative “bench station” examination. Am J Surg. 1997;173:226-230.
28. Risucci D, Geiss A, Gellman L, Pinard B, Rosser J. Surgeon specific factors in the acquisition of laparoscopic surgical skills. Am J Surg. 2001;181:289-293.
29. Schueneman AL, Pickleman J, Freeark RJ. Age, gender, lateral dominance, and prediction of operative skill among general surgery residents. Surgery. 1985;98:506-515.
30. Sidhu RS, Tompa D, Jang RW, Grober ED, Johnston KW, Reznick RK, Hamstra SJ. Interpretation of three-dimensional structure from two-dimensional endovascular images: implications for educators in vascular surgery. J Vasc Surg. 2004;39:1305-1311.
31. Squire D, Giachino AA, Profitt AW, Heaney C. Objective comparison of manual dexterity in physicians and surgeons. Can J Surg. 1989;32:467-470.
32. Steele RJ, Walder C, Herbert M. Psychomotor testing and the ability to perform an anastomosis in junior surgical trainees. Br J Surg. 1992;79:1065-1067.
33. Streiner DL. Global rating scales. In: Neufeld VR, Norman GR, eds. Assessing Clinical Competence. New York, NY: Springer; 1985:119-141.
34. Sweet R, Kowalewski T, Oppenheimer P, Weghorst S, Satava R. Face, content and construct validity of the University of Washington virtual reality transurethral prostate resection trainer. J Urol. 2004; 172:1953-1957.
35. Szalay D, MacRae H, Regehr G, Reznick R. Using operative outcome to assess technical skill. Am J Surg. 2000;180:234-237.
36. Van Sickle KR, McClusky DA 3rd, Gallagher AG, Smith CD. Construct validation of the ProMIS simulator using a novel laparoscopic suturing task. Surg Endosc. 2005;19:1227-1231.
37. Wanzel KR, Hamstra SJ, Anastakis DJ, Matsumoto ED, Cusimano MD. Effect of visual-spatial ability on learning of spatially-complex surgical skills. Lancet. 2002;359:230-231.
38. Wanzel KR, Hamstra SJ, Caminiti MF, Anastakis DJ, Grober ED, Reznick RK. Visual-spatial ability correlates with efficiency of hand motion and successful surgical performance. Surgery. 2003; 134:750-757.
39. Wanzel KR, Ward M, Reznick RK. Teaching the surgical craft: from selection to certification. Curr Probl Surg. 2002;39:583-659.
40. Watts J, Feldman W. Assessment of technical skills. In: Neufeld VR, Norman GR, eds. Assessing Clinical Competence. New York, NY: Springer; 1985:259-274.
41. Winckel CP, Reznick RK, Cohen R, Taylor B. Reliability and construct-validity of a structured technical skills assessment form. Am J Surg. 1994;167:423-427.
© 2006 Lippincott Williams & Wilkins, Inc.