RIME: Assessment of Performance

The Ottawa Surgical Competency Operating Room Evaluation (O-SCORE)

A Tool to Assess Surgical Competence

Gofton, Wade T., MD, MEd, FRCSC; Dudek, Nancy L., MD, MEd, FRCPC; Wood, Timothy J., PhD; Balaa, Fady, MD, MEd, FRCSC; Hamstra, Stanley J., PhD

doi: 10.1097/ACM.0b013e3182677805


The surgical and technical competence of residents has “historically been assessed poorly and continues to receive little attention among the core competencies defined by CanMEDS and the Accreditation Council for Graduate Medical Education (ACGME).”1 At present, national certification boards mainly rely on written and oral examinations to assess competence; however, these examinations do not assess trainees’ ability to perform required procedures.2 Assessing actual performance is the responsibility of residency programs.

Clearly, it is vital that residency programs have tools to accurately evaluate the ability of trainees to perform the surgical procedures relevant to their specialty field. Ideally, a surgical training program would use multiple forms of assessment to determine the competency of a trainee. Clinically oriented evaluation tools such as the Patient Assessment and Management Examination,3 the Objective Structured Assessment of Technical Skills (OSATS),4 and other laboratory-based skill assessments5–7 allow for the evaluation of decision-making and basic technical skills but require additional resources and are logistically challenging to administer. Procedure or case logs, though not new, are increasingly being used by residency programs and the ACGME to assess trainee exposure.8 Such logs are often dependent on accurate completion by the trainee and do not allow for timely feedback and evaluation. The result is a log of procedural performance lacking content validity rather than a record reflecting the trainee’s operative ability.9,10 In essence, just being there does not mean that the trainee is competent to do it himself or herself. An evaluation linked to the logged procedure that assesses trainee performance would improve the value of the procedure log in defining a resident’s surgical competence.

Although many tools are of value in the assessment of technical skill, a means by which to evaluate overall surgical competence remains elusive. Procedure-specific evaluation tools are available, such as the Operative Performance Rating System11 and the Global Operative Assessment of Laparoscopic Skills (GOALS).12–14 Yet, developing and validating procedure-specific assessment instruments for all surgical specialties and required procedures would be prohibitive.

More recently, Doyle et al15 introduced the Global Rating Index for Technical Skills (GRITS), a global rating scale based on OSATS and GOALS. GRITS demonstrated good construct validity and reliability, with the potential to be of practical use across a wide range of operative procedures. However, even though GRITS directed raters to rate trainee performance with reference to how a “trained surgeon would perform,” the researchers observed an end-aversion bias (or central scoring tendency) that demonstrated raters’ reluctance to assign low marks on the tool.15

It is generally believed that experts understand what competent clinical performance entails and can judge both the quality and appropriateness of trainees’ practice.16,17 Therefore, a strong case can be made that only someone with knowledge of a similar scope of practice is qualified to judge competence.18 Surgical educators spend a significant amount of time directly observing their trainees’ cognitive and technical surgical skills in the operating room. Although such observation provides an excellent opportunity for assessment, these OR days usually do not include robust objective evaluation of trainees’ procedural and decision-making skills.

The mini-Clinical Evaluation Exercise (mini-CEX), developed by the American Board of Internal Medicine, was designed to assess internal medicine residents’ clinical assessment skills and to encourage direct observation with immediate evaluation during residency training.19,20 It has been found to have good construct validity and reliability, and it is used by many programs to evaluate several of the ACGME-defined competencies. Using the mini-CEX to assess residents’ evaluation of actual patients under clinical supervision has been found to have a level of reliability similar to structured examinations involving standardized patients.21 This is a significant advantage because such a tool allows measurement of actual daily performance.

The Royal College of Physicians of the United Kingdom has also appreciated the need for competency-based technical assessment tools. Assessment using the Direct Observation of Procedural Skills (DOPS) focuses on procedural rather than clinical skills during one specific patient encounter. The DOPS was designed to assess competency in eight mandatory procedures in the foundation program, a two-year planned program of training and assessment for all doctors graduating from medical school in the United Kingdom.22,23 When the DOPS is used to assess procedures beyond those considered foundation skills, it is often modified to be procedure-specific, which may suggest that the DOPS needs to be customized for different specialties.24

As surgeries vary in complexity, it is expected that trainees will achieve competence for various procedures at different phases of their training. Given that residency programs must determine whether their trainees are competent to perform procedures independently, a tool that measures surgical competence for diverse procedures would have significant benefits for surgical educators. We therefore decided to develop a succinct surgical assessment tool that could be used to evaluate competence on any surgical procedure. On the basis of our review of the literature, we determined that this tool should not ask raters to evaluate performance relative to year of training because this has led to a tendency to avoid assigning low ratings. In this article, we describe the tool we developed—the Ottawa Surgical Competency Operating Room Evaluation (O-SCORE)—and report on its utility in assessing first- through fifth-year surgical residents’ competence in selected procedures. We also explore raters’ use of the O-SCORE rating scale.

Method

Developing the O-SCORE

An expert group of medical educators—four surgeons (W.G. and three others), two evaluation experts (N.D., S.H.), and one psychometric researcher (T.W.)—considered the key features of any surgical procedure and reviewed previously validated evaluation instruments4,12,15,21,25 to identify the essential content required in a tool designed to evaluate competence in a surgical procedure.26 We defined surgical competence as readiness for independent performance of the procedure.

A key aspect of the tool’s design was the development of the rating scale’s descriptive anchors. We decided that the O-SCORE would assess trainee ability with respect to readiness for independent performance of the particular procedure rather than in comparison with a peer group (i.e., it would assess the resident’s ability to do the procedure independently instead of a second-year resident’s ability relative to the ability of other second-year residents). This established an external reference criterion on which to base ratings. In an effort to force raters away from a central scoring tendency, we based the colloquial wording of the anchors on language that surgical educators would typically use to describe their degree of active participation in each key aspect of a procedure. We asked local surgeons to review the wording for clarity and relevance and made revisions based on their feedback.

The pilot version of the O-SCORE was a unique 14-item tool for the assessment of surgical competence on one surgical procedure from start to finish (preoperative plan to postoperative plan). It included 10 items rated on a 5-point scale, 2 yes/no questions, and 2 open-ended questions asking about one specific aspect of the case performed well and one requiring improvement. Anchors in the 5-point scale ranged from 1 = “I had to do” (i.e., trainee required complete hands-on guidance or did not do the procedure) to 5 = “I did not need to be there” (i.e., trainee had complete independence and is practice-ready). We made additional modifications to the tool during the piloting phase, as described below.

Phase 1: Piloting and refining the O-SCORE

We received approval from the Ottawa Hospital Research Ethics Board (OHREB) to pilot the O-SCORE in the University of Ottawa Division of Orthopaedic Surgery. We selected four index surgical procedures commonly performed by on-call trainees of all levels. Staff surgeons and trainees in the orthopaedic surgery program were invited to participate in the study. Trainees were advised that these assessments would not be used for evaluation purposes and that they would be confidential and blinded for review by the primary research group. Participation was voluntary; residents received a small honorarium for the completed evaluation forms they returned. We held training sessions for raters and trainees in an effort to promote a cultural shift in rating and to encourage use of the entire scale. During the four-month data collection period (October 2008 to January 2009), participating residents asked their supervisors to evaluate them using the 14-item pilot tool. We conducted a psychometric analysis of the tool to evaluate its internal structure26; this included descriptive statistics, correlations across items, a generalizability analysis, and comparisons across postgraduate years (PGYs) of training.

At the completion of the data collection period, we held two focus groups (one for staff surgeons, one for residents) to assess the response process26 and feasibility from the rater and trainee perspectives. We reviewed focus group transcripts for common themes regarding the tool and its use.

On the basis of the results, we refined the O-SCORE to include 11 items (8 items rated on the 5-point competency scale described above, 1 yes/no question about competency to perform the procedure independently, and the 2 open-ended questions for feedback). See Appendix 1 for the 11-item O-SCORE.

Appendix 1: The Ottawa Surgical Competency Operating Room Evaluation (O-SCORE)
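
To make the structure of the refined form concrete, the sketch below represents one completed 11-item O-SCORE as a simple record and computes the average of the eight scaled items (the "total procedure score" used in the analyses that follow). It is a minimal illustration: the class, field names, and example ratings are hypothetical and are not part of the published instrument.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

@dataclass
class OSCOREObservation:
    """One completed 11-item O-SCORE for a single procedure (hypothetical field names)."""
    trainee_id: str
    rater_id: str
    procedure: str
    scaled_items: List[int]   # 8 items, each rated 1 ("I had to do") to 5 ("I did not need to be there")
    independent: bool         # yes/no: able to safely perform this procedure independently
    done_well: str = ""       # open-ended: one aspect of the case performed well
    to_improve: str = ""      # open-ended: one aspect requiring improvement

    def total_procedure_score(self) -> float:
        """Average rating across the eight scaled-response items."""
        assert len(self.scaled_items) == 8, "the refined O-SCORE has eight scaled items"
        return mean(self.scaled_items)

# Example use (invented values)
obs = OSCOREObservation("PGY3-01", "staff-07", "ORIF ankle",
                        scaled_items=[4, 4, 3, 4, 5, 4, 3, 4], independent=False)
print(round(obs.total_procedure_score(), 2))  # 3.88
```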

Phase 2: Testing the O-SCORE across specialties

We received approval from OHREB to study the O-SCORE in the University of Ottawa Orthopaedic and General Surgery residency programs. Participants were enrolled in a similar fashion as in the piloting phase, and participation remained voluntary in both specialties. All of the orthopaedic raters and eight of the orthopaedic trainees had been involved in the piloting phase. The general surgery raters and trainees had no previous experience with the O-SCORE. We held introductory sessions for raters and trainees within both specialties to familiarize them with the tool and use of the rating scale.

During a four-month period (January 2011–May 2011), the O-SCORE was used to assess participating orthopaedic residents’ performance on six common procedures in which trainees at all levels would routinely participate: open reduction and internal fixation (ORIF) of a wrist, ORIF of a hip, ORIF of an ankle, hip hemiarthroplasty, total hip arthroplasty, and knee arthroscopy. During this same period, the O-SCORE was also used to assess participating general surgery residents’ performance on five common procedures in which trainees at all levels would be involved: hernia repair, laparoscopic appendectomy, laparoscopic cholecystectomy, emergency laparotomy, and axillary node dissection. We asked staff surgeons to use the tool to assess enrolled trainees when these trainees participated in one of the selected procedures.

Our quantitative analysis focused on validating the scores on the O-SCORE. We determined the psychometric characteristics of the O-SCORE by considering the descriptive statistics and the correlation between ratings on the items. In addition, we determined the reliability of the scale by using a generalizability analysis. We conducted an analysis of variance on the ratings using G_String27 and urGENOVA28 to generate variance components. Surgery specialty (orthopaedic surgery, general surgery) was treated as a grouping factor and crossed with items. Observations (an assessment of a procedure by a rater) were nested within postgraduate trainees, and postgraduate trainees were nested within surgery specialty. It should be noted that for observations, the raters and the procedures were confounded with one another, and therefore the separate effect of each cannot be determined. In addition, for some trainees, the same rater provided a rating on multiple similar procedures as well as on different procedures, whereas for other trainees, completely independent raters provided each rating. For purposes of this analysis, we assumed that each rating for an observation within a trainee was independent of every other rating.

Additional validity evidence was provided by comparing the performance of different factors on the O-SCORE. For each procedure, we created a total procedure score for each trainee by determining the average rating across the eight scaled-response items. We then used these total procedure scores in a series of factorial ANOVAs to study the effect of surgery type and whether residents were deemed ready to perform the procedure independently, surgery type and the influence of PGY level, and surgery type and the complexity of the procedure.
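
As a rough illustration of how such a factorial ANOVA can be run, the sketch below builds total procedure scores from eight simulated item ratings and tests surgery type crossed with the readiness-for-independent-performance judgment. The data, column names, and effect structure are placeholders for demonstration; they are not the study data or the exact models fitted.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n_obs = 160  # simulated observations, roughly the scale of the study

df = pd.DataFrame({
    "specialty": rng.choice(["orthopaedic", "general"], size=n_obs),
    "ready": rng.choice(["yes", "no"], size=n_obs),   # yes/no item on independent performance
})
items = rng.integers(1, 6, size=(n_obs, 8))           # eight items rated 1-5 (random placeholder ratings)
df["total_score"] = items.mean(axis=1)                # total procedure score = mean of the eight items

# Factorial ANOVA: surgery type x readiness for independent performance
model = smf.ols("total_score ~ C(specialty) * C(ready)", data=df).fit()
print(anova_lm(model, typ=2))
```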

We gave all participants the opportunity to provide written feedback on the clarity and utility of the scale. We held focus groups for the trainees to explore their thoughts with regard to the use of the scale.

Results

Phase 1: Piloting and refining the O-SCORE

Eleven surgeons used the pilot version of the O-SCORE to evaluate 20 orthopaedic residents’ performance in 72 procedures. Nine observations were excluded because of incomplete O-SCOREs, leaving 63 complete observations for an average of 3.15 observations per resident. The generalizability analysis demonstrated high reliability (0.82) as well as high correlations between items (range = 0.68–0.86). We found a significant effect of PGY level: The total procedure score differed between all levels except for PGYs 4 and 5.27 Other analyses found no differences in terms of the four procedures that were evaluated.

Our qualitative analysis of focus group data indicated that surgeons found the rating scale easy to use. Trainees felt that the tool helped define important aspects of the case and also improved the amount and quality of feedback they received.29

Phase 2: Testing the O-SCORE across specialties (internal structure)

Overall, 34 surgeons used the refined, 11-item O-SCORE to assess 37 residents’ performance in 163 procedures, for an average of 4.41 observations per resident (range = 1–25 observations per resident). In the orthopaedic surgery program, 19 surgeons completed 116 assessments of 22 trainees (average = 5.27 observations per resident), whereas in the general surgery program, 15 surgeons completed 47 assessments of 15 trainees (average = 3.13 observations per resident).

Table 1 displays the descriptive statistics for each of the eight items rated on the five-point scale. The mean rating for each item was relatively high; however, there was a range for each item, suggesting that raters were willing to provide lower ratings. The table also displays the corrected item–total correlations, which were moderate to high, indicating that ratings on the individual items were strongly related to the total score. For the yes/no item that asked about the trainee’s readiness to safely perform the procedure independently, the distribution of scores was roughly equal: 85 (52%) of the 163 observations were marked as “yes,” and 78 (48%) were marked as “no.”

Table 1: Descriptive Statistics for the Ottawa Surgical Competency Operating Room Evaluation (O-SCORE)

Reliability

Table 2 displays the variance components associated with the administration of the O-SCORE. As shown in this model, differences between the trainees and the observations within each trainee accounted for the largest proportion of variability in the item ratings, indicating that there were differences across the trainees as a function of the observations even though the same rater sometimes rated a trainee more than once. Variability attributed to the O-SCORE’s eight rated items accounted for a smaller proportion of the variance, suggesting that there were some differences across the items despite the high correlations we found (Table 1). Surgery specialty accounted for very little of the variability in the item ratings, suggesting that the ratings across specialties were similar. To confirm this interpretation, we ran a second model without using surgery specialty as a factor, and the results in terms of the proportion of variance accounted for by each facet were virtually identical to those of the first model. With these eight items and an average of 4.41 O-SCORE observations per trainee, the resulting g-coefficient was 0.80; because a whole number of observations is needed in practice, at least five O-SCORE observations per trainee would be required to achieve a g-coefficient of 0.80.

Table 2: Ottawa Surgical Competency Operating Room Evaluation (O-SCORE) Variance Components
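
For readers unfamiliar with this decision-study arithmetic, the sketch below projects a relative g-coefficient for different numbers of observations per trainee under a design in which items are crossed with trainees and observations are nested within trainees. The variance components are invented placeholders chosen only to illustrate how the projection behaves; they are not the values reported in Table 2.

```python
def g_coefficient(var_t, var_ti, var_ot, var_resid, n_items, n_obs):
    """Relative g-coefficient for a (trainee x item, observation:trainee) design.
    var_t     = trainee variance (object of measurement)
    var_ti    = trainee-by-item interaction variance
    var_ot    = observation-within-trainee variance
    var_resid = item-by-observation-within-trainee variance plus residual"""
    relative_error = var_ti / n_items + var_ot / n_obs + var_resid / (n_items * n_obs)
    return var_t / (var_t + relative_error)

# Illustrative (made-up) variance components -- not the study's Table 2 values.
components = dict(var_t=0.25, var_ti=0.02, var_ot=0.24, var_resid=0.20)

for n in (1, 2, 3, 4, 4.41, 5, 6):
    print(f"{n} observations per trainee: g = {g_coefficient(**components, n_items=8, n_obs=n):.2f}")
# With these illustrative components, g reaches about 0.80 near 4.4 observations
# and exceeds it with five or more whole observations per trainee.
```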

Validity and other analyses

For all subsequent analyses, there were no significant main effects or interactions involving surgery type (orthopaedic surgery versus general surgery). Therefore, surgery type was collapsed to increase power. There was a significant difference (F[1,161] = 124.21, P < .001, ηp² = 0.44) in mean total procedure scores between observations in which trainees were deemed not ready to safely perform the procedure independently (3.58 ± 0.61; n = 84 observations) and those in which trainees were deemed ready to safely perform the procedure on their own (4.54 ± 0.48; n = 79 observations). The correlation between the response on this item and the total procedure score was 0.66 (P < .001), indicating a moderately high relationship between the total procedure score and the item about independent performance.

Mean total procedure scores on the observations also increased with PGY levels (F[2,160] = 19.85, P < .001, ηp² = 0.19). Post hoc t tests indicated that total scores on the 44 procedures done by PGY 1 and PGY 2 residents (3.57 ± 0.70; n = 15 PGY 1 and 29 PGY 2 procedures) were significantly lower (P < .001) than those on the 61 procedures done by PGY 3 residents (4.08 ± 0.61). Further, total procedure scores for the 61 procedures done by PGY 3 residents were significantly lower (P < .02) than those for the 58 procedures done by PGY 4 and PGY 5 residents (4.38 ± 0.68; n = 33 PGY 4 and 25 PGY 5 procedures). There were no differences in total procedure ratings as a function of procedure complexity (high = 4.09 ± 0.69, n = 29; medium = 4.06 ± 0.73, n = 95; low = 4.00 ± 0.38, n = 39; F < 1).

Analysis of the qualitative data indicated that the rating scale was practical and useful for staff and residents. Importantly, residents indicated that they were accepting of low scores on these competency-based assessments. They commented that the O-SCORE assessments made it clear to them which areas of their surgical performance they needed to improve to become competent to do a particular procedure independently. Staff surgeons indicated that the tool was easy to use and that the rating anchors made it easier for them to evaluate trainees because they did not have to decide whether trainee performance on that specific case represented below-average, average, or above-average performance for a particular PGY level. They also reported that the colloquial anchors simplified the rating process for them in that the wording closely reflected how they considered trainee performance and readiness for greater surgical independence.

Discussion

More formal documentation of surgical trainees’ competence in performing their specialties’ required procedural skills is needed. We therefore developed a succinct surgical evaluation tool to capitalize on the opportunity for assessment afforded by direct OR supervision. On the basis of our review of the literature, we decided that the tool needed to be generalizable to any surgical procedure to simplify use for raters and facilitate its incorporation into procedure logs. We also decided that the rating system should be based on the trainee’s readiness for safe independent performance to avoid central rating tendencies. We suggest that our developed tool—the O-SCORE—represents an advance in the evaluation of actual surgical performance. Evidence to support the validity of the scores was collected from a variety of sources, as described in this article’s results section.

Tools like the mini-CEX have demonstrated that assessing trainee performance with real patients using multiple evaluations by multiple raters provides a level of reliability similar to that of structured examinations21 and allows programs to identify competent residents and those in need of remediation. Surgical residency programs need similar tools. Unfortunately, studies of surgical assessment tools have shown raters to have a tendency to avoid low scores, resulting in an end-aversion bias with many rating scales.30 Our results, in contrast, demonstrate that raters were willing to use the O-SCORE’s entire rating scale for each item. We believe that setting “able to safely perform this procedure independently” as our standard of competence—rather than requesting an evaluation of how trainees performed relative to others in their year of training—enabled surgeons to assess trainees more accurately. In the focus groups, raters indicated that the “colloquial” anchors (e.g., “I needed to be in the room just in case”) reflected how they determined a trainee’s level of independence, supporting the response process and providing evidence for validity of the O-SCORE. We also believe that raters were willing to use the whole scale because the stakes of this procedural evaluation were low, similar to the mini-CEX.19,20 In the follow-up focus group, we found that trainees were not overly concerned with receiving low scores but appreciated the increase in immediate and honest feedback that the O-SCORE encouraged.

The O-SCORE demonstrated a high degree of correlation between the eight scale-rated items and the yes/no item about competence for independent practice, supporting internal construct validity. Therefore, why not just ask the simple question, “Is the resident safely able to perform this procedure independently?” Although this question provides an important end point, we believe the responses to the other questions provide value to both the trainee31,32 and the rater. The eight scale-rated items highlight for the trainee the aspects of the procedure that an expert considers to be important and aid the rater in identifying areas for trainee improvement. The two open-ended questions allow the rater to give the trainee specific positive feedback as well as point out a particular area for improvement.

The O-SCORE was able to accurately differentiate among senior (PGYs 4 and 5), midlevel (PGY 3), and junior trainees (PGYs 1 and 2). With many procedures, a trend for improvement in performance could be seen with an increase in PGY level. However, with a competency-based scoring system, one may anticipate that this trend could be less evident with a relatively straightforward procedure because competence to perform independently might be achieved by most PGY 3s, resulting in a ceiling effect. With more complex procedures, one may anticipate a rising trend across years.

A cumulative sum score (CUSUM) is a type of time-trend analysis that has been used for quality control in industry and, more recently, in several clinical procedures.33–37 It can be helpful in monitoring progress, even in the early stages of learning, but it depends on the ability to define a binary outcome (i.e., successful competent performance). Surgical time or blood loss have been used as potential end points, but they are less-than-ideal markers of a successful outcome. We suggest that the yes/no item on the O-SCORE, “Resident is able to safely perform this procedure independently,” may provide a better end point and thereby allow for a more accurate assessment of progression and competency.
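
As a simple illustration of how the binary O-SCORE judgment could drive such an analysis, the sketch below uses one common CUSUM formulation for binary outcomes; the acceptable failure rate and the example sequence of judgments are arbitrary assumptions, and formal decision limits are omitted.

```python
def cusum_curve(not_independent, p0=0.3):
    """Basic CUSUM over binary outcomes: S_n = S_(n-1) + (x_n - p0), where
    x_n = 1 if the trainee was judged not yet able to perform the procedure
    independently (a 'failure' against the competence standard), else 0.
    The curve trends downward once the observed failure rate falls below the
    acceptable rate p0; decision limits are omitted for brevity."""
    s, curve = 0.0, []
    for failed in not_independent:
        s += (1.0 if failed else 0.0) - p0
        curve.append(round(s, 2))
    return curve

# Hypothetical sequence of yes/no judgments over successive cases
# (True = "not yet independent", False = "able to perform independently").
judgments = [True, True, True, False, True, False, False, False, False, False]
print(cusum_curve(judgments, p0=0.3))
```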

Our study has limitations. The O-SCORE is a form of direct observation without specific criteria, which many would argue introduces a degree of unreliability or differences in opinion among surgeons. However, it is generally agreed that experts can identify expertise when they see it,16,17 and the rating scale retains some criteria to mitigate this effect. Multiple assessments by multiple observers will also minimize this effect, as supported by the results of research on the mini-CEX.21 However, additional work is needed to further evaluate the O-SCORE’s interobserver variability.

Another limitation of this study is that the raters were unblinded. Because of the O-SCORE’s reliance on an expert’s assessment of a trainee’s competence and the length of time of each observation (an entire procedure), blinding the raters was logistically impossible. Notably, there was a range of scoring within PGYs, suggesting that the trainees were not scored according to year of training, even by unblinded raters. Finally, at present we have only provided support for validity of the O-SCORE in two surgical specialties (orthopaedic surgery and general surgery) at a single institution.

The results of our study suggest that the O-SCORE may provide an objective and reliable measure of perioperative decision making and procedural competency. Completing several O-SCORE assessments during a trainee’s rotation could provide a clinical supervisor with more objective evidence to support the rating that he or she assigns the trainee’s technical skills on the end-of-rotation faculty evaluation report. The O-SCORE could also be of value in demonstrating trainee competence in specialty-defined core procedures by providing evidence for the residency training committee’s recommendations about which residents are ready to complete their training and start practicing independently.

Acknowledgments: The authors wish to thank Drs. Stephen Papp, Garth Johnson, and Allan Liew for their participation in the development of the O-SCORE. They would also like to thank Greg Hansen (pilot phase) and Shay Seth (phase 2) and Julia Foxall (pilot phase and phase 2) for their participation in data collection as well as the residents and faculty of the Divisions of Orthopaedic and General Surgery at the University of Ottawa.

Funding/Support: The pilot study was supported by the Royal College of Physicians and Surgeons of Canada. Assessment and comparison across specialties was supported by Physician Services Incorporated Ontario and the Department of Surgery at the University of Ottawa.

Ethical approval: Ethics approval for the pilot study and subsequent comparison across specialties was received from the Ottawa Hospital Research Ethics Board.

Previous presentations: An abstract of the pilot results was presented at the Canadian Conference on Medical Education, St. John’s, Newfoundland, Canada, May 2010.

References

1. Sidhu RS, Grober ED, Musselman LJ, Reznick RK. Assessing competency in surgery: Where to begin? Surgery. 2004;135:6–20
2. Savoldelli GL, Naik VN, Joo HS, et al. Evaluation of patient simulator performance as an adjunct to the oral examination for senior anesthesia residents. Anesthesiology. 2006;104:475–481
3. MacRae HM, Cohen R, Regehr G, Reznick R, Burnstein M. A new assessment tool: The patient assessment and management examination. Surgery. 1997;122:335–343
4. Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84:273–278
5. Mackay S, Datta V, Chang A, Shah J, Kneebone R, Darzi A. Multiple objective measures of skill (MOMS): A new approach to the assessment of technical ability in surgical trainees. Ann Surg. 2003;238:291–300
6. Datta V, Mackay S, Mandalia M, Darzi A. The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model. J Am Coll Surg. 2001;193:479–485
7. Smith SG, Torkington J, Brown TJ, Taffinder NJ, Darzi A. Motion analysis. Surg Endosc. 2002;16:640–645
8. Accreditation Council for Graduate Medical Education. Surgery policy information: ACGME case log system. http://www.acgme.org/acWebsite/RRC_440/440_policyArchive.asp. Accessed June 27, 2012
9. Reznick RK. Teaching and testing technical skills. Am J Surg. 1993;165:358–361
10. Cuschieri A, Francis N, Crosby J, Hanna GB. What do master surgeons think of surgical competence and revalidation? Am J Surg. 2001;182:110–116
11. Larson JL, Williams RG, Ketchum J, Boehler ML, Dunnington GL. Feasibility, reliability and validity of an operative performance rating system for evaluating surgery residents. Surgery. 2005;138:640–647
12. Vassiliou MC, Feldman LS, Andrew CG, et al. A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg. 2005;190:107–113
13. Kurashima Y, Feldman LS, Al-Sabah S, Kaneva PA, Fried GM, Vassiliou MC. A tool for training and evaluation of laparoscopic inguinal hernia repair: The Global Operative Assessment of Laparoscopic Skills–Groin Hernia (GOALS-GH). Am J Surg. 2011;201:54–61
14. Vaillancourt M, Ghaderi I, Kaneva P, et al. GOALS–incisional hernia: A valid assessment of simulated laparoscopic incisional hernia repair. Surg Innov. 2011;18:48–54
15. Doyle JD, Webber EM, Sidhu RS. A universal global rating scale for the evaluation of technical skills in the operating room. Am J Surg. 2007;193:551–555
16. Mahara MS. A perspective on clinical evaluation in nursing education. J Adv Nurs. 1998;28:1339–1346
17. Girot EA. Assessment of competence in clinical practice—A review of the literature. Nurse Educ Today. 1993;13:83–90
18. Klass D. Assessing doctors at work—Progress and challenges. N Engl J Med. 2007;356:414–415
19. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): A preliminary investigation. Ann Intern Med. 1995;123:795–799
20. Holmboe ES, Huot S, Chung J, Norcini J, Hawkins RE. Construct validity of the miniclinical evaluation exercise (miniCEX). Acad Med. 2003;78:826–830
21. Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: A method for assessing clinical skills. Ann Intern Med. 2003;138:476–481
22. Carr S. The Foundation Programme assessment tools: An opportunity to enhance feedback to trainees? Postgrad Med J. 2006;82:576–579
23. Royal College of Physicians. Foundation doctors: FY1. http://www.rcplondon.ac.uk/medical-careers/foundation-doctors/f1. Accessed June 27, 2012
24. Griffiths CE. Competency assessment of dermatology trainees in the UK. Clin Exp Dermatol. 2004;29:571–575
25. Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skill via an innovative “bench station” examination. Am J Surg. 1997;173:226–230
26. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: Theory and application. Am J Med. 2006;119:166.e7–166.e16
27. McMaster University, Faculty of Health Sciences, Program for Educational Research and Development. G_String. http://fhsperd.mcmaster.ca/g_string/index.html. Accessed June 18, 2012
28. University of Iowa, College of Education, Center for Advanced Studies in Measurement and Assessment. urGENOVA. http://www.education.uiowa.edu/centers/casma/computer-programs.aspx. Accessed June 18, 2012
29. Hansen G, Dudek NL, Wood T, Gofton W. Shows How—A Tool to Evaluate Surgical Competence. Abstract presented at: Canadian Conference on Medical Education; May 2010; St. John’s, Newfoundland, Canada
30. Streiner DL, Norman GR. Health Measurement Scales. New York, NY: Oxford University Press; 2008
31. Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. Lancet. 2001;357:945–949
32. Dauphinee D. Determining the content of certification examinations. In: Newble D, Jolly B, Wakeford R, eds. The Certification and Recertification of Doctors: Issues in the Assessment of Clinical Competence. New York, NY: Cambridge University Press; 1994:92–104
33. Van Rij AM, McDonald JR, Pettigrew RA, Putterill MJ, Reddy CK, Wright JJ. CUSUM as an aid to early assessment of the surgical trainee. Br J Surg. 1995;82:1500–1503
34. Young A, Miller JP, Azarow K. Establishing learning curves for surgical residents using cumulative summation (CUSUM) analysis. Curr Surg. 2005;62:330–334
35. Naik VN, Devito I, Halpern SH. CUSUM analysis is a useful tool to assess resident proficiency at insertion of labour epidurals. Can J Anaesth. 2003;50:694–698
36. Bartlett A, Parry B. CUSUM analysis of trends in operative selection and conversion rates for laparoscopic cholecystectomy. ANZ J Surg. 2001;71:453–456
37. Williams SM, Parry BR, Schlup MM. Quality control: An application of the CUSUM. BMJ. 1992;304:1359–1361
© 2012 Association of American Medical Colleges