The goal of a properly designed surgery clerkship is to support the individual aspirations of all students, and to provide the basic surgical training necessary for all physicians.1 All graduating physicians should possess a basic level of skill in performing technical procedures. It is therefore important that students leave their undergraduate surgical rotations with appropriate levels of technical skill regardless of their anticipated practice paths. Many clinical nonsurgical fields have become more invasive and procedure-oriented, making use of technical skill on a regular basis.2 Residents in nonsurgical fields are therefore in need of some of the same technical skill training as surgical residents. Ideally, basic technical skills are acquired during the clerkship. However, little information is available showing whether undergraduate surgical rotations prepare students sufficiently to meet technical skills objectives. To assess whether technical skills are developed sufficiently in clerkship, a reliable and valid evaluation of these skills is required. To date, technical skill evaluation has been studied in postgraduate trainees.
The Objective Structured Assessment of Technical Skill (OSATS) was developed for the appraisal of technical skill in surgical trainees.3,4 The OSATS platform consisted of surgical trainees rotating among skill stations while being evaluated by surgeons using checklists and global rating scales. This examination has been demonstrated to be statistically reliable while satisfying many important validity issues.5 A similar evaluation tool has been developed for the assessment of technical skill in family practice residents. This Structured Assessment of Minor Surgical Skill (SAMSS) has been shown to have moderate to high reliability and to show evidence of construct validity.6 The skills tested differed between the two examinations, reflecting the specific task and skill expectations of the surgical and family practice residents. Since many of the procedures that clinical clerks should be performing are considered minor surgical procedures, the SAMSS format may be directly applicable to this group of trainees. However, because medical students differ from residents in their heterogeneous career aspirations, aptitudes, and skill expectations, it is important to demonstrate that the SAMSS is a valid and reliable assessment of technical skill in clerkship before generalizing the SAMSS to the medical student arena.
Of the 172 clinical clerks at the University of Ottawa (U of O) in 2001, 31 (14 third year, 17 fourth year) were randomly selected for the study. The randomization procedure stratified for the three hospital-based sites in the U of O program. Eleven junior surgical residents participated in this study as a validation group. These numbers were based on the validation analysis that would compare the groups at the three levels of training. Assuming that level of training would account for at least 25% of the variance in scores,6 we anticipated a minimum effect size of 1.15 standard deviations. Utilizing an alpha error of .05, a beta error of .20, and an expected effect size of 1.15, 12 subjects per group were required to attain the desired power. We attempted to increase the sample size in the test group to increase our power for subsequent exploratory analysis. This was only partly successful: because of scheduling issues, recruiting fourth-year clinical clerks proved easier than recruiting third-year clinical clerks.
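The sample size reasoning above can be reproduced approximately with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' documented procedure: it assumes the 25% variance figure was converted to Cohen's f and then to a standardized difference via d = 2f (a conversion that holds for two equal groups), and that the per-group n came from the usual normal-approximation formula for a two-sided, two-group comparison. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cohen_d_from_variance_explained(r_squared: float) -> float:
    """Convert a proportion of variance explained into Cohen's d.

    Assumption: two equal-sized groups, for which d = 2f,
    where f = sqrt(R^2 / (1 - R^2)) is Cohen's f.
    """
    f = sqrt(r_squared / (1.0 - r_squared))
    return 2.0 * f


def n_per_group(d: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    """Per-group sample size, normal approximation, two-sided test."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value for alpha
    z_beta = z.inv_cdf(1.0 - beta)          # quantile for power = 1 - beta
    return ceil(2.0 * ((z_alpha + z_beta) / d) ** 2)


d = cohen_d_from_variance_explained(0.25)
print(round(d, 2))        # effect size in standard deviation units
print(n_per_group(1.15))  # subjects required per group
```

Under these assumptions the calculation yields an effect size of about 1.15 and 12 subjects per group, consistent with the figures reported. An exact calculation based on the noncentral t distribution would typically give a slightly larger n.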
A total of 21 surgeons and three anesthetists served as examiners, and were blinded to the medical students' levels of training. An eight-station, two-hour OSCE-style examination was conducted in which participants rotated from station to station. Each examination unit or “station” consisted of a bench-model simulation representative of the surgical task being tested. The eight stations for this examination were: excision of skin lesion; closure of skin incision; excision of sebaceous cyst; incision and drainage of abscess; venipuncture; endotracheal intubation; insertion of a chest tube; and insertion of a Foley catheter. These procedures were chosen based on studies identifying which procedures graduating physicians should be competent to perform.7,8,9,10
Participant performance was assessed using two scoring instruments: a task-specific checklist and a global rating scale. The checklists consisted of between nine and 21 items, depending on the task. The global rating scales consisted of seven domains: respect for tissue, time and motion, instrument handling, knowledge of instruments, flow of operation, knowledge of specific procedures, and knowledge of universal precautions. Each domain was rated on a scale from 1 to 5.
To accommodate 42 participants, the examination was run twice. Each iteration consisted of three tracks of eight stations. Each station was allotted 15 minutes, for a total testing time of two hours. Third- and fourth-year medical students, as well as residents from the surgical validation group, were scattered randomly throughout all tracks and iterations, with approximately two to three surgery residents per group of eight candidates. With three concurrent tracks of eight stations, there were 24 concurrent stations being examined, with one expert at each station. The three anesthetists observed the endotracheal intubation station.
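The logistics described above can be checked with simple arithmetic. The sketch below only restates the numbers given in the text; the class and attribute names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamLayout:
    """Hypothetical container for the examination layout described in the text."""
    iterations: int
    tracks: int
    stations_per_track: int
    minutes_per_station: int

    @property
    def concurrent_stations(self) -> int:
        # One examiner is needed at every station running at the same time.
        return self.tracks * self.stations_per_track

    @property
    def candidate_slots(self) -> int:
        # Each track holds one candidate per station in each iteration.
        return self.iterations * self.tracks * self.stations_per_track

    @property
    def minutes_per_candidate(self) -> int:
        # Every candidate rotates through all stations of one track.
        return self.stations_per_track * self.minutes_per_station


layout = ExamLayout(iterations=2, tracks=3, stations_per_track=8,
                    minutes_per_station=15)
participants = 31 + 11  # clinical clerks plus surgical residents
examiners = 21 + 3      # surgeons plus anesthetists

print(layout.concurrent_stations, examiners)  # stations vs. examiners
print(layout.candidate_slots, participants)   # capacity vs. participants
print(layout.minutes_per_candidate)           # testing time in minutes
```

The 24 examiners match the 24 concurrent stations, the 48 available slots cover the 42 participants, and eight 15-minute stations give the stated two hours.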
Each participant's checklist scores and global ratings for a given station were converted into percentages for analysis. Interstation reliability was analyzed across the eight stations of the examination using Cronbach's alpha, in two phases. The first phase involved analysis of all participants. The second phase used only the clinical clerks' scores, since this was the target population for which the examination was designed.
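For readers unfamiliar with the reliability coefficient used here, the following is a minimal sketch of Cronbach's alpha computed across stations, with participants as rows and stations as columns. The percentage conversion shown (raw score divided by maximum possible score) is an assumption, since the text does not specify the formula, and the scores are invented purely for illustration; they are not study data.

```python
from statistics import variance
from typing import Sequence


def to_percent(raw_score: float, max_score: float) -> float:
    """Assumed conversion: raw score as a percentage of the maximum score."""
    return 100.0 * raw_score / max_score


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants-by-stations score matrix.

    alpha = k / (k - 1) * (1 - sum of station variances / variance of totals)
    """
    k = len(scores[0])  # number of stations
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two stations and two participants")
    station_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(station_variances) / total_variance)


# Invented percentage scores: 4 participants, 3 stations.
example_scores = [
    [60.0, 65.0, 70.0],
    [70.0, 72.0, 80.0],
    [80.0, 85.0, 85.0],
    [90.0, 88.0, 95.0],
]
print(to_percent(15, 21))              # e.g. 15 of 21 checklist items
print(cronbach_alpha(example_scores))  # reliability across the 3 stations
```

Running the same function once on all participants and once on the clinical clerks alone would correspond to the two phases of analysis described above.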
An ANOVA was computed separately for the checklist and global rating scale scores to analyze the differences in mean scores of the third-year medical students, the fourth-year medical students, and the junior surgery residents. Post-hoc analyses were performed using Tukey's HSD test to compare scores among the third-year medical students, the fourth-year medical students, and the surgery residents.
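The one-way ANOVA described above can be sketched as follows. This is a generic illustration of the F statistic for independent groups, not the authors' analysis code; the function names are hypothetical and the small data set is invented. Tukey's HSD is not implemented here because it requires the studentized range distribution, which the Python standard library does not provide.

```python
from statistics import fmean
from typing import NamedTuple, Sequence, Tuple


class AnovaResult(NamedTuple):
    f: float
    df_between: int
    df_within: int
    ms_within: float  # mean squared error (MSE)


def degrees_of_freedom(group_sizes: Sequence[int]) -> Tuple[int, int]:
    """Between- and within-group degrees of freedom for a one-way ANOVA."""
    k = len(group_sizes)
    return k - 1, sum(group_sizes) - k


def one_way_anova(groups: Sequence[Sequence[float]]) -> AnovaResult:
    """F statistic for a one-way ANOVA with independent groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = degrees_of_freedom([len(g) for g in groups])
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return AnovaResult(ms_between / ms_within, df_between, df_within, ms_within)


# Invented scores for three small groups (not study data).
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
print(result.f, result.df_between, result.df_within)

# Group sizes in this study: 14 third-year, 17 fourth-year, 11 residents.
print(degrees_of_freedom([14, 17, 11]))
```

With the study's group sizes, the degrees of freedom come out as 2 and 39.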
Interstation reliability scores are reported separately for all candidates and then for medical students. The reliabilities when all candidates were used were .78 and .82 for checklist and global rating scales, respectively. When medical students only were used, the reliabilities decreased to .71 and .65 for checklist and global rating scales, respectively.
As shown in Figure 1, which contains the mean checklist scores for the stations, the mean scores varied as a function of training level, with the mean score for third-year students being the lowest (64%), followed by scores for the fourth-year students (79%), with the residents (PGY) scoring the highest (89%). The ANOVA indicated a significant main effect of training level, F(2,39) = 29.56, MSE = 530, p < .001, with post-hoc analysis revealing significant differences between third- and fourth-year students (p < .001) and between fourth-year students and residents (p < .01).
Figure 2 displays the mean global rating for each station as a function of training level. Again, the mean ratings varied as a function of training level, with the mean rating for third-year students being the lowest (57%), followed by fourth-year students (69%), and residents scoring the highest (87%). ANOVA indicated a significant main effect of training level, F(2,39) = 68.68, MSE = 323, p < .001, with post-hoc comparisons revealing significant differences between third- and fourth-year students (p < .001) and between fourth-year students and the residents (p < .001).
The evaluation of undergraduate technical skill performance during the surgical clerkship is often inconsistent. This may be due to the lack of defined technical objectives or the inability to demonstrate performance as a result of limited exposure to technical procedures. The lack of an objective assessment tool inherently hinders the consistent and systematic evaluation of undergraduate performance and the monitoring of interventions designed to improve performance. The SAMSS examination, however, shows promise for providing objective assessment of technical skill in medical students.
The results of the SAMSS examination for medical students suggest evidence for construct validity. The statistically significant differences in the mean scores between the residents and the medical students endorse the hypothesis that increased surgical training leads to improved performance in the chosen surgical skill tasks on this examination. Further evidence in support of construct validity exists in the statistically significant difference between the performances of the third- and fourth-year medical students. This difference in performance, in light of one year of added training, is in contrast to reported observations of no performance distinction between PGY1 and PGY2 family practice residents when the SAMSS was applied to that group of trainees.6 The distinction between the undergraduate years may be a result of differing volumes of surgical experience. The fourth-year participants had completed all of their clinical rotations at the time of the examination, whereas the third-year students would still be gaining additional surgical skill experience prior to graduation. The performance similarity among the family practice groups may be explained by the fact that family practice residents do not have a formal surgical rotation; therefore, the experience and skill set they acquire during medical school may be close to the sum for their training.
The exam achieved a high level of reliability, indicating that the number and quality of the stations were sufficient for measuring the abilities of the candidates. The reliabilities of the checklist scores and the rating scales dropped when we removed the PGY candidates from the exam, especially for the global rating scale (.78 to .71 for the checklist, .81 to .68 for the global ratings). While the reliabilities for the clerks alone might be considered somewhat lower than would ideally be desired, there is some reason to believe that these lower reliabilities may be an underestimate of the actual reliabilities for clerks. The presence of PGY candidates likely reduced the range of potential scores that could be used to score or rate the medical students. That is, because the residents were so much better than the medical students, their scores and ratings were at the high end of the scales, and thus compressed the potential range of scores for the medical students. This effect would have influenced the rating scale the most because it is more subjective than the checklist. Previous observations of global rating scale scores on OSCE exams11 suggest that more skilled subjects are better evaluated with global rating scales. The converse may also be true: novices might be better graded with checklists. The inclusion of both measures did not appear to significantly increase the evaluative burden on the examiner. Further research should be performed to determine whether either mode could be employed exclusively without sacrificing reliability or validity. Since construct validity has been demonstrated with the use of the residents, they may not be required for further iterations of this examination. Omitting them would allow the medical students access to the entire range of the marking schemes, which may in turn increase the reliabilities for both checklist and global rating scales.
Finally, given the relatively small sample size, the psychometric results are potentially unstable and should be replicated to ensure that the reliability of the examination is really as high as it would appear from this initial study.
The SAMSS examination has several potential applications. With increasing attention being paid to technical skill training during the undergraduate surgical clerkship, the SAMSS platform could be used for objective analysis of curricular efficacy. Similarly, the SAMSS examination could be used to identify and quantify any effects of educational interventions. Alternatively, the stations themselves could be used for training purposes and provide minor surgical experience with no patient contact. Subsequent iterations should serve to enhance the observed statistical endorsement of construct validity and high reliability, with the potential for a role at the certification level for medical students.
Technical skill training is an important component of the undergraduate training period. The efficacy of undergraduate skill training programs is difficult to assess without an objective measure of technical skill performance. The SAMSS examination is a feasible method for evaluating minor surgical skill performances of medical students at its present levels of validity and reliability.
1. Sachdeva AK. Surgical educators and the contemporary training of generalists. Am J Surg. 1994;167:337–41.
2. Kern SJ, Filipi CJ, Gerhardt JD, Reeves MJ, Wright KM. A new concept for implementation of a required general surgery clerkship. Am J Surg. 1996;172:281–2.
3. Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84:273–8.
4. Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skill via an innovative “bench station” examination. Am J Surg. 1997;173:226–30.
5. Faulkner H, Regehr G, Martin J, Reznick R. Validation of an objective structured assessment of technical skill for surgical residents. Acad Med. 1996;71:1363–5.
6. Friedlich M, MacRae H, Oandasan I, et al. Structured assessment of minor surgical skills for family medicine residents. Acad Med. 2001;76:1241–6.
7. Folse JR (chair), et al. Prerequisite objectives for graduate surgical education: a study of the graduate medical education committee, American College of Surgeons. J Am Coll Surg. 1998;186(1):50–62.
8. The Joint Working Group on Family Medicine in Surgery. Report of the Post-graduate Family Medicine Education Joint Committee on Residency Training in Family Medicine. 1990:81–97.
9. Morrison JM, Murray TS. Survey of minor surgery in general practice in the west of Scotland. Br J Surg. 1993;80:202–4.
10. Norris TE, Felmar E, Tolleson G. Which procedures should be taught in family practice residency programs? Fam Med. 1997;29:99–104.
11. Cohen R, Rothman AI, Poldre P, Ross J. Validity and generalizability of global ratings in an objective structured clinical examination. Acad Med. 1991;66:545–8.