Objective: To develop an objective structured assessment for evaluating surgical skills of obstetrics and gynecology residents and to evaluate the reliability and validity of the assessment.
Methods: A seven-station, objective, structured assessment of technical skills was administered to 24 residents. The test included laparoscopic procedures (port placement, salpingostomy, suturing, vessel ligation) and open abdominal procedures (hypogastric ligation, repair of enterotomy, salpingo-oophorectomy). All surgical tasks were performed on pigs. Residents were timed and assessed at each station using three scoring methods: a task-specific checklist, a global rating scale, and a pass-fail grade.
Results: Assessment of construct validity (the ability of the test to discriminate among residency levels) found significant differences by residency level on both the checklist and the global rating scale. Reliability indices calculated with Cronbach's α were 0.89 for the global rating scale and 0.89–0.95 for the individual skills checklists. Interrater reliability was 0.87 for the global rating scale and 0.78–0.98 for the checklists.
Conclusion: Objective, structured assessment of technical skills can evaluate residents' surgical skills with high reliability and validity. These assessments could be applied to identify residents who need additional training and might provide a mechanism to ensure competence in surgical skills.
Teaching and evaluating surgical skills are two of the most important tasks of an academic surgeon; however, little research has addressed either task, especially in obstetrics and gynecology. Surgical skills usually are learned while operating on patients under the supervision of a preceptor. Evaluation of skills usually is subjective, is done at the end of a rotation, and is based on recollection; it rarely occurs immediately after a resident performs a procedure. Studies that have evaluated such assessments have shown poor reliability and unknown validity.1 Recent studies have evaluated alternative methods of teaching and assessing surgical skills.2–8
When developing an instrument to evaluate surgical skills, reliability and validity are important.7–10 Reliability is the consistency of the examination, ie, the extent to which results are replicated each time the test is given. Interrater reliability is the extent to which examiners agree on grades for examinees; the closer the correlation is to 1.00, the better the interrater reliability. Validity is the extent to which the test measures the skill it is intended to measure. Ideally, good performance on an objective assessment of technical skills would indicate that the resident is a competent surgeon; because surgical competence is very difficult to measure directly, proxy measures often are used. Construct validity is the extent to which the test can distinguish among residency levels: a valid test would show significant differences in scores between residency years.
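For readers unfamiliar with the reliability index used in this study, the following is a minimal sketch of Cronbach's α, which expresses internal consistency as α = k/(k−1) × (1 − Σσ²ᵢ/σ²ₜ), where k is the number of items, σ²ᵢ the variance of each item's scores, and σ²ₜ the variance of examinees' total scores. The function name and the example scores are illustrative, not data from the study.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scored instrument.

    items: list of k lists; items[i][j] is the score of
    examinee j on item i (e.g., one checklist entry).
    """
    k = len(items)
    n = len(items[0])

    def variance(xs):
        # Population variance of a list of scores.
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    # Sum of the per-item score variances.
    sum_item_vars = sum(variance(item) for item in items)
    # Variance of each examinee's total score across items.
    totals = [sum(items[i][j] for i in range(k)) for j in range(n)]
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))


# Illustrative only: two items scored for three examinees.
scores = [[1, 2, 3],
          [2, 4, 6]]
print(round(cronbach_alpha(scores), 2))  # perfectly correlated items give high alpha
```

Values near 1.00, such as the 0.89–0.95 reported here for the checklists, indicate that the individual items behave consistently as measures of the same underlying skill.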
The purpose of our study was to develop an objective, structured assessment of technical skills for obstetrics and gynecology residents and to evaluate its reliability, validity, and interrater reliability.