Central venous catheter (CVC) insertion is one of the most commonly performed bedside procedures.1 However, CVCs are a particularly common source of morbidity and mortality in hospitalized patients, with more than 15% of patients developing complications associated with this procedure.2,3 Complications include not only arterial puncture and pneumothorax but also bloodstream infection related to contamination at the time of insertion. Research has shown that detailed attention to technique and increased procedural experience reduce the errors associated with CVC insertions.4 In teaching hospitals, residents are often the primary operators of this invasive technique; yet, they frequently lack sufficient training and confidence in inserting CVCs and in performing other similar procedures.5,6
Professional societies, accreditation bodies, and hospitals recognize the growing importance of proper training in performing CVC insertion, and they have instituted competence requirements for this technical skill. The requirements of the Accreditation Council for Graduate Medical Education’s internal medicine (IM) residency program stipulate that all residents must show basic proficiency in CVC insertion.7 The American Board of Internal Medicine (ABIM) requires that program directors attest to residents’ competence in procedures such as CVC insertion.8 The ABIM also expects practicing internists to be evaluated and credentialed before performing these procedures unsupervised. The increased stringency of hospital credentialing requires explicit delineation of procedural training and qualifications, and hospitals may be unlikely to grant privileges solely on the basis of specialty training or inadequate documentation from residency training many years prior.9
Despite the pressing need for explicit measures to determine competence in CVC insertion, we could not find any literature that offers any data-driven recommendations or standardized instruments. Most program directors and hospital administrators have relied on the ABIM’s previous requirement that residents perform at least five CVC insertions during their IM residency, a number not supported by data. In fact, the only quantitative evidence for competence suggests that operators who have performed 50 or more CVC insertions have half the complication rate of operators who have performed fewer than 50 CVC insertions.3 The ABIM has since removed this stipulation, recognizing that experience alone is an insufficient proxy for competence.8 Furthermore, studies show that residents are not comfortable with their procedural skills even when they meet the ABIM requirements.5,10 Still other studies exploring alternative means to measure procedural competence find that self-assessment is inaccurate.11,12 As a result of the growing recommendations, paucity of quantitative standards, and resident discomfort, faculty supervision and evaluation of residents’ procedural performance are becoming increasingly commonplace.13,14 Moreover, experts recommend that direct observation be combined with validated checklists15,16 and other standard assessment measures to determine a resident’s readiness to perform procedures.17
We sought to create a valid assessment tool for faculty to use to assess IM resident performance in CVC insertion. We used the Delphi and Angoff methods (described below) to develop the instrument, and we tested the instrument by examining performance data from a random cohort of residents. We hypothesized that (1) we would be able to reach expert consensus on a checklist of procedural steps, (2) we could identify minimum passing scores (MPSs) to delineate borderline and competent trainees, and (3) the borderline and competent MPSs would each correlate with a separate five-point global rating of performance.
In early 2007, we (C.C.S., D.F.K., P.C.) began with an initial listing of important procedural steps in subclavian (SC) CVC insertion, based on review of the literature.18,19 We categorized each item a priori as a “major” or “minor” procedural step, to designate relevance. This process resulted in a 24-item checklist (Table 1).
Consensus-building by the modified Delphi method
To determine the criteria for successful CVC insertion, we used the modified Delphi method, an iterative process designed to achieve consensus among experts on critical decisions.20 In this method the investigator administers repeated survey rounds to a preselected group of experts who are unaware of each other’s identity. This approach is designed to eliminate individual influence on group decision making.21 After analyzing and incorporating the experts’ responses from the first survey, the investigator conducts a second survey round with the experts for further comments and feedback. The investigator administers repeat rounds of surveys until the experts reach full consensus.
In initiating the Delphi method, we selected seven experts in pulmonary critical care or anesthesia critical care from three different institutions to serve as judges for the checklist, based on their extensive experience in performing and supervising CVC insertion. All seven agreed to participate. We sent an introductory cover letter and the checklist via e-mail to our panel of experts, explaining that we would use the checklists to assess residents’ insertion of CVCs on a physical simulator. We instructed the experts to use a nine-point Likert-like scale categorized into three anchors (1-3 = “not important,” 4-6 = “somewhat important,” 7-9 = “very important”) to rate the criteria. We asked the experts to make free-text revisions of the checklist items, and additionally we asked for input on whether the major and minor designations for each procedural step were appropriate.
We calculated mean and median ratings from the first round of the Delphi method, eliminating items with a mean rating of 1 to 3 (“not important”) from the checklist. We revised the checklist using the experts’ free-text suggestions. We used the ratings to calculate the internal consistency of the instrument. We resubmitted the checklist to the seven experts for additional comments or revisions (second round of the Delphi method).
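The first-round calculations described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the example ratings are hypothetical, and treating checklist items as the scale components (with experts as the observations) when computing Cronbach alpha is an assumption, because the text does not specify the layout used.

```python
from statistics import mean, median, variance


def summarize_items(ratings):
    """Return the mean and median rating for each checklist item.

    ratings[e][i] is expert e's rating (1-9) of checklist item i.
    """
    n_items = len(ratings[0])
    summary = []
    for i in range(n_items):
        column = [row[i] for row in ratings]
        summary.append({"item": i + 1, "mean": mean(column), "median": median(column)})
    return summary


def retained_items(summary, cutoff=3):
    """Keep items whose mean lies above the 1-3 ("not important") band."""
    return [s["item"] for s in summary if s["mean"] > cutoff]


def cronbach_alpha(ratings):
    """Cronbach alpha with items as components and experts as observations.

    This layout is an assumption made for illustration only.
    """
    k = len(ratings[0])
    item_variances = [variance([row[i] for row in ratings]) for i in range(k)]
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from 3 experts on 4 items (not study data).
example_ratings = [
    [7, 8, 2, 9],
    [8, 9, 3, 8],
    [6, 7, 1, 7],
]
example_summary = summarize_items(example_ratings)
print(retained_items(example_summary))             # [1, 2, 4]
print(round(cronbach_alpha(example_ratings), 3))   # 0.923
```

In this made-up example the third item has a mean rating of 2 and would be dropped; in the study itself, no item fell into that band.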
Standard-setting by the Angoff method
Once we agreed on criteria for the CVC checklist, we employed the Angoff method22–24 to establish MPSs. In the original method described by Angoff,22 content experts are provided with a checklist of criteria and asked to judge, item by item, whether or not a borderline trainee would perform each step correctly. For our study, we distributed the CVC checklist to a panel of eight pulmonary critical care or anesthesia critical care experts from five institutions. We chose our panel of experts based on their expertise in CVCs and their experiences with clinical education. (Three members of this expert panel had also served as panelists in the Delphi exercise.) We instructed the experts to consider each procedural step and indicate “yes” or “no” based on their sense of whether a borderline trainee would successfully complete the step. We defined a borderline trainee as one who has basic proficiency under supervision—specifically, one who knows or can do just enough to pass a test, separating him- or herself from those who fail. To help the experts conceptualize the examinee, we described the borderline trainee as a first-month intern who successfully performs just enough of the checklist items to make a faculty supervisor feel comfortable allowing him or her to perform the central line procedure on a patient with faculty supervision. We then asked the experts to consider the same question for a competent trainee. We defined the competent trainee as one who has mastery without needing supervision, using the example of a high-performing senior resident in the last month of training who completes each step well enough and scores high enough so that the experts would feel comfortable allowing him or her to perform a procedure independently.
Once we received all scored checklists from the panelists, we calculated for each major criterion the proportion of experts who responded positively as to whether a borderline trainee would be able to complete the step successfully. We calculated the mean of the proportion of positive responses across all major criteria. We multiplied the mean by the total number of major criteria to calculate the MPS. We used this formula to determine the number of major criteria and the number of minor criteria for the MPS, for both the borderline and competent trainees.
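The MPS formula above can be expressed as a short sketch. The function name and the panel votes below are hypothetical, and rounding the product to the nearest whole criterion (with ties rounded up) is an assumption; the formula itself follows the description in the text.

```python
import math
from fractions import Fraction


def angoff_mps(votes):
    """Compute a minimum passing score from yes/no Angoff judgments.

    votes[j][i] is True when judge j expects the target trainee
    (borderline or competent) to complete criterion i successfully.
    Returns (per-criterion proportions, unrounded score, rounded MPS).
    """
    n_judges = len(votes)
    n_criteria = len(votes[0])
    # Proportion of judges answering "yes" for each criterion.
    proportions = [
        Fraction(sum(row[i] for row in votes), n_judges) for i in range(n_criteria)
    ]
    # Mean proportion across criteria, multiplied by the number of criteria.
    mean_proportion = sum(proportions) / n_criteria
    raw_score = mean_proportion * n_criteria
    # Assumption: round to the nearest whole criterion, ties rounded up.
    mps = math.floor(raw_score + Fraction(1, 2))
    return proportions, raw_score, mps


# Hypothetical panel: 4 judges, 5 criteria (not study data).
example_votes = [
    [True, True, False, True, False],
    [True, False, False, True, False],
    [True, True, False, True, True],
    [True, True, False, False, False],
]
proportions, raw_score, mps = angoff_mps(example_votes)
print([float(p) for p in proportions])  # [1.0, 0.75, 0.0, 0.75, 0.25]
print(float(raw_score), mps)            # 2.75 3
```

Because the mean proportion is multiplied back by the number of criteria, the unrounded score equals the sum of the per-criterion proportions. The same function would be applied separately to the major and minor criteria, and separately for the borderline and competent trainee.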
We repeated both the Delphi and Angoff methods to create a checklist for internal jugular (IJ) CVC insertion; however, we did not have resident performance data for IJ insertions to further validate the tool (Appendix).
Comparison of standards to performance data
To test the standards against actual performance data, we used results from IM residents, ranging from postgraduate year (PGY)-1 to PGY-3, who were assessed at Beth Israel Deaconess Medical Center (BIDMC). The residents were already scheduled through random assignment to intensive care unit (ICU) rotations from January to June 2007. These residents (N = 42) were required as part of their residency to undergo CVC training on a part-task simulator (Central Venous Access Head/Neck/Upper Torso, Blue Phantom, Kirkland, Wash) prior to beginning their ICU rotation. The residents were unlikely to be familiar with the simulator, because it was not part of any other activity or program at BIDMC. As part of their training, they performed SC CVC insertions up to three times on a simulator while being videotaped. We divided the videotapes between two trained faculty supervisors (neither authors nor expert judges), who completed the SC assessment tool and assigned a global rating to the trainee’s performance using a five-point Likert-type scale anchored at three points (1 = “unable to complete procedure without assistance,” 3 = “demonstrates essential skills to complete procedure,” and 5 = “demonstrates mastery of procedure skills”). These two faculty evaluators had previously shown high interrater agreement (R = 0.92) on a subset of the resident data after having been trained to use the tool.
We created a dummy variable defined by whether the trainee had exceeded the borderline MPS. We created a univariable logistic regression model to examine global rating thresholds associated with a borderline MPS, creating dichotomous independent variables for the global rating thresholds as follows: global rating = 1, global rating ≥2, global rating ≥3, global rating ≥4, and global rating = 5. We repeated the analyses, clustering by trainee, to determine whether the effect of multiple observations of a single trainee would affect our results. We repeated the univariable logistic regression with the MPS for competence, with and without clustering.
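The threshold analysis can be illustrated with a simplified sketch. It is not the authors' Stata code: it relies on the fact that a univariable logistic regression with a single dichotomous predictor yields the same odds ratio as the cross-product of the corresponding 2 × 2 table, it uses a Wald interval on the log scale, and it does not reproduce the clustering by trainee. The function names and observations are hypothetical.

```python
import math


def threshold_table(observations, threshold):
    """Cross-tabulate a global-rating threshold against meeting the MPS.

    observations is a list of (global_rating, met_mps) pairs.
    Returns counts (a, b, c, d):
      a = rating >= threshold and met the MPS
      b = rating >= threshold and did not meet the MPS
      c = rating <  threshold and met the MPS
      d = rating <  threshold and did not meet the MPS
    """
    a = b = c = d = 0
    for rating, met in observations:
        if rating >= threshold:
            if met:
                a += 1
            else:
                b += 1
        elif met:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Cross-product odds ratio with a Wald 95% confidence interval."""
    if 0 in (a, b, c, d):
        raise ValueError("An empty cell makes the odds ratio undefined or infinite.")
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical observations (not study data).
example_observations = (
    [(1, False)] * 8 + [(1, True)] * 2
    + [(2, True)] * 9 + [(2, False)] * 1
    + [(3, True)] * 10
)
cells = threshold_table(example_observations, threshold=2)
print(cells)                         # (19, 1, 2, 8)
print(odds_ratio_with_ci(*cells)[0])  # 76.0
```

With small cell counts, as in this example, the interval is very wide, which mirrors the wide confidence intervals that arise when nearly all observations above a threshold meet the standard.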
We performed all analyses using STATA 8.0 (College Station, Texas). The institutional review board of BIDMC approved the study protocol in advance.
The Delphi process: Consensus-building and validation
In the first round of the Delphi process, the seven experts rated all criteria as “somewhat important” to “very important” (i.e., a rating of greater than 3). The range of averages was from 6.0 to 8.4. The internal consistency coefficient (using Cronbach alpha) was 0.94, and the interrater coefficient (using kappa) was 0.27. The experts suggested text rephrasing, clarifications, or modifications for 5 of the 24 criteria. Table 1 shows the results of Delphi round #1 including the medians, means, standard deviations, and types of modifications for each criterion.
In the second round of the Delphi method, all seven experts unanimously agreed to the content of the instrument and had no further recommendations.
Standard-setting, Angoff method
All eight experts returned checklists for the Angoff method. For the borderline trainee, the proportion of positive responses ranged for each major criterion from 0.1 to 1.0 with a mean proportion of 0.6, and for each minor criterion from 0 to 1.0 with a mean proportion of 0.2. For the competent trainee, the proportion of positive responses ranged for each major criterion from 0.9 to 1.0 with a mean proportion of 1.0, and for each minor criterion from 0.6 to 1.0 with a mean proportion of 0.9. Therefore, the MPS for a borderline trainee was 11 (0.6 × 18 = 10.8, rounded up) major criteria and 1 (0.2 × 6 = 1.2, rounded down) minor criterion. The MPS for a competent trainee was 18 (1.0 × 18) major criteria and 5 (0.9 × 6 = 5.4, rounded down) minor criteria. For the Angoff method, the Cronbach alpha coefficients were 0.72 for the borderline trainee and 0.80 for the competent trainee. The experts agreed unanimously on 2 of 24 criteria (8%) for the borderline trainee and on 21 of 24 criteria (88%) for the competent trainee.
Comparison of standards to performance data
Forty-two trainees performed 94 CVC insertions (an average of 2.2) on the simulator. Preliminary review of the performance data showed that faculty evaluators could not assess the trainees on checklist items #19 (“Orients bevel toward feet”) and #24 (“Responds appropriately to monitor showing supraventricular tachycardia/ventricular tachycardia during insertion”). The video was not of sufficient resolution to show the orientation of the bevel, and the simulator did not have the capacity to display arrhythmias during insertion. Deleting these two items from the SC checklist changed the borderline MPS to 10 major criteria (0.6 multiplied by 17) and 2 minor criteria (0.3 multiplied by 5). The revised MPS for competence became 17 (1.0 multiplied by 17) major criteria and 5 (0.9 multiplied by 5) minor criteria.
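The original and revised MPS values can be checked from the reported mean proportions with a few lines of arithmetic. This sketch assumes that products are rounded to the nearest whole criterion with ties rounded up, which matches the reported values; the helper name is hypothetical, and the study's own calculation may have used unrounded means.

```python
from decimal import Decimal, ROUND_HALF_UP


def mps_from_mean(mean_proportion, n_criteria):
    """Multiply a reported mean proportion by the number of criteria and round.

    mean_proportion is passed as a string so that decimal values such as
    "0.9" are represented exactly. Ties are rounded up (an assumption).
    """
    raw_score = Decimal(mean_proportion) * n_criteria
    return int(raw_score.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Original 24-item checklist: 18 major and 6 minor criteria.
print(mps_from_mean("0.6", 18), mps_from_mean("0.2", 6))  # 11 1 (borderline)
print(mps_from_mean("1.0", 18), mps_from_mean("0.9", 6))  # 18 5 (competent)

# Revised 22-item checklist: 17 major and 5 minor criteria.
print(mps_from_mean("0.6", 17), mps_from_mean("0.3", 5))  # 10 2 (borderline)
print(mps_from_mean("1.0", 17), mps_from_mean("0.9", 5))  # 17 5 (competent)
```

Decimal arithmetic is used here because Python's built-in `round` resolves ties to the nearest even integer, which would turn 4.5 into 4 rather than the reported 5.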
Residents’ performance by major and minor criteria categorized by global rating is summarized in Table 2.
Our faculty evaluators gave 71 resident-performed CVC insertions (76%) scores that exceeded the borderline MPS; 23 CVC insertions were associated with scores that failed to pass the borderline MPS. Of 31 observations exceeding the borderline MPS, 9 were associated with a global rating of 1, and every observation receiving a global rating ≥3 met the borderline MPS (Figure 1). In the univariable logistic regression, achieving the borderline MPS was most strongly associated with a global rating ≥2 (odds ratio 150, 95% confidence interval 18-1,300). The relative magnitude of the correlation was not affected when we repeated the analyses to adjust for multiple observations of the same trainee.
Our faculty found no resident-performed CVC insertions that met the MPS for competence of 17 major and 5 minor criteria. Residents who received the highest global rating of 5 achieved a range of 14 to 16 major and 4 to 5 minor criteria (Table 2).
Discussion and Conclusions
In this study, we created an assessment tool to evaluate IM residents as they performed CVC insertions on simulators. The instrument was created by expert consensus using a modified Delphi method. Through the Angoff standard-setting method, we determined both an MPS to identify borderline trainees in need of continued supervision and an MPS to identify competent trainees ready for independently inserting CVCs. Finally, we found in our cohort of trainees that the borderline MPS was associated with a low-normal global rating and that no trainee was able to meet the MPS for competence. Other institutions may use our assessment tool to evaluate resident trainees as they perform CVC insertions.
We achieved our objective of using the modified Delphi method to create a consensus-driven assessment tool. During the Delphi process, we made few modifications to the original tool, as the experts uniformly agreed with the inclusion and importance of the checklist items. We also met our objective to use the Angoff standard-setting method to determine MPSs for the borderline and competent trainees. We found that experts had higher agreement on the definition of the competent trainee than that of the borderline trainee. Experts uniformly may share the expectation that a competent trainee should perform all steps successfully. However, they may be in less agreement about the definition of the borderline trainee because each expert may have his or her own view on which and how many areas, including knowledge, psychomotor skill, or ability to make adjustments when adverse events arise, a borderline trainee may lack. Additionally, an expert deciding whether a borderline trainee should achieve a particular step may be influenced by how familiar the step should be to the trainee, rather than judging the skill of the trainee. For example, criterion #11 refers to sterile precautions, a step that would be familiar to a trainee performing any kind of invasive procedure. Eighty percent of experts felt that a borderline trainee would achieve this step. In contrast, criterion #21 refers to actions the operator should take if the vessel is not cannulated. This step requires multiple steps and is unique to the SC CVC, and accordingly, only 10% of experts thought that a borderline trainee would achieve this step.
Our finding that the borderline MPS was most closely associated with a global rating of 2 was consistent with our hypothesis that the borderline trainee has basic proficiency under supervision but is not yet proficient without supervision (global rating of 3). Thus, our instrument demonstrates some evidence of concurrent validity for the determination of the borderline trainee in that it correlates with a low-normal global rating. We had hypothesized that the results would produce a correlation between the MPS for competence and the global rating of 5, but instead we found that none of the trainees in our study achieved the MPS for competence. Previous studies support the observation that a low number of performed procedures, as would be expected from our cohort of residents, is inadequate to confer confidence,5,6,10 much less competence. Additionally, Sznajder and colleagues3 demonstrated that complication rates, a patient-centered marker for competence, do not decrease until operators have performed 50 or more CVC insertions.
Our work compares favorably with the literature on standard-setting and procedural assessment. Studies using the Hofstee and Angoff methods to derive MPSs for advanced cardiac life support training similarly resulted in higher standards for competence than those achieved by residents using other measures (e.g., an arbitrary passing score of 70%).25 It is noteworthy that our work supports principles that direct observation with criteria, simulation models, and videotapes tend to be highly reliable and valid approaches to technical skills assessment compared with gestalt observations.26 Our results contrast with previous studies in the surgical literature showing that global ratings may have higher construct validity compared with procedural checklists27,28; however, the context and content of our study differ significantly. Both of the surgical studies relied on the Objective Structured Assessment of Technical Skill,29 a tool validated for surgical procedures whose score is determined by the average of global ratings along seven dimensions. Our global rating, in contrast, more closely resembles the solitary global rating, the sign-off on procedure logs, or tacit appraisal (“has a good pair of hands”) traditionally used to assess residents’ procedural skills in IM. Additionally, our checklists provide objective criteria for evaluation and identify specific areas in need of improvement, features important in a teaching setting.
Our study has several limitations. Our finding that IM residents were unable to meet competence standards may not be generalizable to trainees in more technically oriented specialties, such as surgery or anesthesia. Although we created our assessment tool with actual patients in mind, the performance data are based on simulators, which promote patient safety but lack the clinical variability of real patients. In our analyses we adopted the compensatory approach, whereby success in one step negates failure in another, but in the real world failing to achieve some of the major items in a stepwise process would obviate a procedure altogether. Readers will note that we were forced to remove two items from the original checklist because of technical limitations. First, from a measurement standpoint, the Delphi method was not rendered invalid by the removal of these items, as experts were not constrained to choose a certain number of major and minor criteria to be included. Second, the retention of these items for assessment in live clinical patients would have required differing thresholds for passing, depending on whether an arrhythmia actually occurred. Ultimately, we felt that deleting the items would help the generalizability of the tool without threatening its psychometric properties. Lastly, the experts generated the thresholds for passing without being familiar with the residents’ actual performance and set higher expectations for this study than what is currently expected in residency training; this observation is consistent with experience previously noted in the literature that experts may overestimate the ability of novices in the absence of performance data.23 However, we would argue that performance standards should not be norm referenced; that is, they should not be established by “setting the curve” to the group being evaluated.
When residents do not meet rigorous, data-driven performance standards, there may be practical (and unintended legal) implications for all stakeholders. Residency programs are expected to determine whether graduating residents are competent at medical procedures such as CVC insertion. However, if we move to true competency-based training, how should a program handle a PGY-3 who does not exceed the MPS for competency? Should faculty be required to supervise residents until they demonstrate competence? Should faculty themselves be required to demonstrate competence? What happens at teaching hospitals that lack the resources and infrastructure to supervise CVCs? What are the ramifications for patients, who ultimately assume the burden of risk? BIDMC recognizes the limitations of current procedural assessment and has addressed some of these concerns by requiring that faculty supervise all invasive medical bedside procedures 24 hours a day.
Faculty may use our assessment instrument to assess residents’ skill in CVC insertion. The relative simplicity of the tool, using explicit criteria instead of subjective ratings of generic steps, facilitates further validation among populations such as faculty experts or residents of other specialties. Though intended for assessment, our instrument also serves an instructional purpose in that it operationalizes a procedure often taught in a verbal and unstandardized manner. We aim in the future to determine predictive validity by exploring the relationship between MPSs and complication rates in patients and to apply this instrument to our research on the impact of simulation training. We also wish to explore the relationship between analytic and holistic judgments of technical skill, recognizing that the latter may capture finer aspects of mastery, while the former may be more appropriate for novices. While we recognize that the ultimate measure of competence will require multiple measurements over time in various settings and inclusion of clinical outcomes, we believe our work is one of the first to define thresholds for basic proficiency and competence in a manner supported by evidence.
The authors gratefully acknowledge Shruti Mulki for providing critical assistance with data management. They also wish to thank the experts involved in development of the assessment tool (Dr. Benjamin Medoff, Dr. J. Woodrow Weiss, Dr. Daniel Talmor, Dr. Armin Ernst, Dr. Taylor Thompson, Dr. Scott Manaker, Dr. Patricia Kritek, and Dr. Carey Thomson) and the faculty raters (Dr. Trustin Ennacheril and Dr. Michael Cho).
1 Raad I. Intravascular-catheter-related infections. Lancet. 1998;351:893–898.
2 Merrer J, De Jonghe B, Golliot F, et al; French Catheter Study Group in Intensive Care. Complications of femoral and subclavian venous catheterization in critically ill patients: A randomized controlled trial. JAMA. 2001;286:700–707.
3 Sznajder JI, Zveibil FR, Bitterman H, Weiner P, Bursztein S. Central vein catheterization. Failure and complication rates by three percutaneous approaches. Arch Intern Med. 1986;146:259–261.
4 Pronovost P, Needham D, Berenholtz S, et al. An intervention to decrease catheter-related bloodstream infections in the ICU. N Engl J Med. 2006;355:2725–2732.
5 Hicks CM, Gonzalez R, Morton MT, Gibbons RV, Wigton RS, Anderson RJ. Procedural experience and comfort level in internal medicine trainees. J Gen Intern Med. 2000;15:716–722.
6 Wickstrom GC, Kolar MM, Keyserling TC, et al. Confidence of graduating internal medicine residents to perform ambulatory procedures. J Gen Intern Med. 2000;15:361–365.
9 Wigton RS, Alguire P; American College of Physicians. The declining number and variety of procedures done by general internists: A resurvey of members of the American College of Physicians. Ann Intern Med. 2007;146:355–360.
10 Huang GC, Smith CC, Gordon CE, et al. Beyond the comfort zone: Residents assess their comfort performing inpatient medical procedures. Am J Med. 2006;119:71:e17–e24.
11 Davis DA, Mazmanian PE, Fordis M, Van Harrison R, Thorpe KE, Perrier L. Accuracy of physician self-assessment compared with observed measures of competence: A systematic review. JAMA. 2006;296:1094–1102.
12 Duffy FD, Holmboe ES. Self-assessment in lifelong learning and improving performance in practice: Physician know thyself. JAMA. 2006;296:1137–1139.
13 Smith CC, Gordon CE, Feller-Kopman D, et al. Creation of an innovative inpatient medical procedure service and a method to evaluate house staff competency. J Gen Intern Med. 2004;19:510–513.
14 Ramakrishna G, Higano ST, McDonald FS, Schultz HJ. A curricular initiative for internal medicine residents to enhance proficiency in internal jugular central venous line placement. Mayo Clin Proc. 2005;80:212–218.
15 Reznick RK. Teaching and testing technical skills. Am J Surg. 1993;165:358–361.
16 Reznick RK, MacRae H. Teaching surgical skills—Changes in the wind. N Engl J Med. 2006;355:2664–2669.
17 Eva KW, Regehr G. Self-assessments in the health professions: A reformulation and research agenda. Acad Med. 2005;80:S46–S54.
18 McGee DC, Gould MK. Preventing complications of central venous catheterization. N Engl J Med. 2003;348:1123–1133.
19 Mermel LA. Prevention of intravascular catheter-related infections. Ann Intern Med. 2000;132:391–402.
20 Clayton MJ. Delphi: A technique to harness expert opinion for critical decision-making tasks in education. Educ Psychol. 1997;17:373–386.
21 RAND Science and Technology Policy Institute. E-Vision 2000: Key Issues That Will Shape Our Energy Future. Summary of Proceedings, Scenario Analysis, Expert Elicitation and Submitted Papers. Available at: (www.rand.org/scitech/stpi/Evision/summary.pdf). Accessed April 3, 2009.
22 Angoff WH. Scales, norms, and equivalent scores. In: Thorndike RL, ed. Educational Measurement. 2nd ed. Washington, DC: American Council on Education; 1971:508–600.
23 Downing SM, Tekian A, Yudkowsky R. Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med. 2006;18:50–57.
24 Norcini JJ. Setting standards on educational tests. Med Educ. 2003;37:464–469.
25 Wayne DB, Fudala MJ, Butter J, et al. Comparison of two standard-setting methods for advanced cardiac life support training. Acad Med. 2005;80:S63–S66.
26 Watts J, Feldman WB. Assessment of technical skills. In: Neufeld VR, Norman GR, eds. Assessing Clinical Competence. New York, NY: Springer; 1985:259–274.
27 Regehr G, MacRae H, Reznick RK, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med. 1998;73:993–997.
28 Aggarwal R, Grantcharov T, Moorthy K, Milland T, Darzi A. Toward feasible, valid, and reliable video-based assessments of technical surgical skills in the operating room. Ann Surg. 2008;247:372–379.
29 Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84:273–278.