Perspective: The ACGME Toolbox: Half Empty or Half Full?
Green, Michael L. MD, MSc; Holmboe, Eric MD
Dr. Green is associate residency program director and associate professor, Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut.
Dr. Holmboe is senior vice president for quality research and academic affairs, American Board of Internal Medicine, Philadelphia, Pennsylvania, and adjunct professor, Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut.
Correspondence should be addressed to Dr. Green, Yale Primary Care Residency Program, Waterbury Hospital, 64 Robbins Street, Waterbury, CT, 06721; telephone: (203) 573-6574; fax: (203) 573-6707; e-mail: firstname.lastname@example.org.
The Accreditation Council for Graduate Medical Education Outcome Project changed the currency of accreditation from process and structure to outcomes. Residency program directors must document their residents' competence in six general dimensions of practice. A recent systematic review, published in the March 2009 issue of Academic Medicine, concluded that the instruments currently available are psychometrically inadequate for evaluating residents in five of the six competencies.
In this perspective, the authors refute the findings of this earlier review. They demonstrate that the review's search strategy was limited, failing to capture many important evaluation studies. They also question the appropriateness of the analysis of the included articles, which focused, to the exclusion of other important properties, on an instrument's ability to discriminate among residents' performance in the six competencies.
Finally, the authors argue that the problem is not the lack of adequate evaluation instruments but, rather, the inconsistent use and interpretation of such instruments by unskilled faculty. They urge the graduate medical education community—if it is to realize the promise of competency-based education—to invest in training for faculty evaluators rather than waiting for new instruments.
In July 2001, the Accreditation Council for Graduate Medical Education (ACGME) Outcome Project changed the currency of accreditation from process and structure to outcomes.1 The Outcome Project requires residency program directors to provide more than a schedule of rotations, a written curriculum, and agreements with clinical training venues. Instead, directors must objectively document their residents' competence in six general dimensions of practice. In phase 1 of the Outcome Project, programs defined objectives to demonstrate learning in the competencies. In phase 2, they integrated the competencies into their curricula and expanded their evaluation systems to assess residents' performance in the competencies. Programs are currently in phase 3, which requires them to use aggregate performance data for curriculum reform.
The transition to competency-based training has not been easy. At the time of this writing (2009, year eight of the project), the medical education community had yet to realize “widespread operationalizing of outcomes in the evaluation of residents or in the accreditation of programs.”2 Teaching hospitals were distracted by financial pressures and compliance with resident duty hours restrictions. Program directors lamented the lack of faculty expertise, time, and resources.3 In addition, some educators challenged the premise of the Outcome Project, warning that assessment of individual domains (“anatomizing clinical competence”) fails to capture residents' ability to integrate their knowledge, skills, and attitudes to perform complex tasks in the care of patients.4,5 Finally, the most constant cautionary refrain, in our experience, has been the nihilistic din that the instruments currently available are inadequate for evaluating residents in the six competencies.
A recent systematic review would seem to confirm this last protestation. Lurie and colleagues6 reviewed the psychometric characteristics of 56 evaluation instruments and concluded that the “literature to date has not yielded any method that can assess the six ACGME general competencies as independent constructs.” However, we believe that the authors painted an overly pessimistic picture of the state of this science. In this essay, we refute the findings of the review and argue that the evaluation instruments currently available are equal to the task.
First, Lurie and colleagues' literature search was hardly exhaustive and certainly missed many important studies of evaluation instruments and strategies. The search terms “ACGME” and “competencies” capture only a small subset of studies given that investigators frequently develop and test evaluation instruments without the Outcome Project framework specifically in mind. It would be important to include more descriptive terms, such as those describing the main competencies (“professionalism,” “communication”) and those describing discrete domains within a competency (“quality improvement [QI]” and “evidence-based practice,” which are within practice-based learning and improvement [PBLI]). Furthermore, the authors did not search additional databases (e.g., Health and Psychosocial Instruments, EMBASE–Excerpta Medica, PsycINFO), the tables of contents of medical education journals, proceedings of meetings, non-peer-reviewed publications (e.g., ACGME Bulletin), or Internet sites (e.g., ACGME, the Association of American Medical Colleges' MedEdPortal, Best Evidence Medical Education). Nor did they consult experts in the field.
Important additional evaluation studies, for example, were included in a systematic review of instruments to evaluate evidence-based practice,7 which encompasses only two of eight subdomains of a single competency (i.e., PBLI). The more sophisticated search methodology of this review captured thousands of articles, of which 347 were selected for initial review and 115 (including 103 unique instruments) for final inclusion.7 Similarly, whereas Lurie et al reviewed only two studies of direct observation, a recent systematic review included 85 articles describing 55 instruments (32 studied in residents and fellows) for direct observation of clinical skills.8 Investigators have conducted similar evaluation reviews for professionalism,9,10 communication and interpersonal skills,11 and simulation.12 Also, a recent textbook reviewed the domains, formats, feasibility, and psychometric characteristics of numerous assessment strategies for all six ACGME competencies.13
Lurie and colleagues' analysis of the articles that their nonexhaustive search did capture casts further doubt on the veracity of their findings. They did not determine the level of agreement between multiple raters in either the study inclusion or data abstraction processes. Assurance of interrater reliability is especially important in this review, given the potentially subjective judgments about the psychometric characteristics of evaluation instruments. Furthermore, arguing that they could not anticipate the kinds of studies their search would identify, they offered no quality criteria by which to judge studies of evaluation instruments. We believe this rationale lacks credibility not only because the critical elements of establishing the psychometric strength of evaluation instruments are well known14,15 but also because similar reviews have specified, a priori, criteria for the type of study, analysis, and results required for establishing different psychometric characteristics.7,8,11,12
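To make the point about rater agreement concrete, the following is a minimal sketch of one common chance-corrected agreement statistic, Cohen's kappa, applied to two reviewers' include/exclude decisions. The choice of kappa, the function name, and the sample decisions are illustrative assumptions; neither this essay nor the review under discussion specifies a statistic or reports such data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical inclusion decisions for eight candidate articles (invented for illustration).
reviewer_1 = ["include", "include", "exclude", "exclude",
              "include", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude",
              "include", "exclude", "include", "include"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this invented example the reviewers agree on six of eight articles (75%), but half of that agreement would be expected by chance alone, so kappa is 0.5. Raw percent agreement can therefore overstate how consistently the inclusion judgments were made.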
No single hierarchy for preferred psychometric characteristics of evaluation instruments exists. Rather, investigators should conduct appropriate analyses based on the intended evaluation purposes and context of the instrument.16 Van der Vleuten17 identifies five important psychometric characteristics: validity, reliability, educational impact, cost, and acceptability. The relationship among the characteristics in his utility model, however, is multiplicative. If one of the elements is zero, then the utility will be zero. While Lurie and colleagues did not initially identify particularly desired psychometric characteristics for evaluating the ACGME competencies, their analysis betrays their singular endorsement of one type of validity: an instrument's ability to discriminate among residents' performance in the six competencies. None of the instruments passed this litmus test, either based on factor analyses of scores (such as on global rating scales or 360-degree evaluations) or because this property was not tested (as in direct observation or portfolios).
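The multiplicative character of the utility model described above can be illustrated with a short sketch. The 0-to-1 scoring scale, the treatment of cost as "cost efficiency" (higher is better), the function name, and the example values are all assumptions made for illustration; the model itself is conceptual and does not prescribe numeric scores.

```python
from math import prod


def utility(validity, reliability, educational_impact, cost_efficiency, acceptability):
    """Multiplicative utility: the product of the five characteristics.

    Each score is assumed to lie in [0, 1], with higher meaning better
    (an illustrative convention, not part of the cited model).
    """
    scores = [validity, reliability, educational_impact, cost_efficiency, acceptability]
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("each score must lie in [0, 1]")
    return prod(scores)


# An instrument that is moderately good on every characteristic.
balanced = utility(0.8, 0.8, 0.8, 0.8, 0.8)

# An instrument that is perfect on four characteristics but wholly unacceptable to users.
lopsided = utility(1.0, 1.0, 1.0, 1.0, 0.0)

print(f"balanced utility: {balanced:.3f}")
print(f"lopsided utility: {lopsided:.3f}")
```

Under a simple average, both hypothetical instruments would score 0.8. Under the multiplicative form, the second instrument's utility collapses to zero, which is the sense in which no single characteristic, however strong, can compensate for the complete absence of another.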
This finding, however, may reflect the way evaluators used the instruments (see below) rather than inherent properties of the instruments themselves. What an instrument measures depends more on the content of the method or the task posed to the examinee than on any characteristic of the method itself.17,18 To borrow Van der Vleuten's17 example, a multiple-choice question (MCQ) does not measure factual knowledge because it requires a selection from a list of options. Rather, a particular MCQ measures factual knowledge if it requires a choice among facts. More important, an instrument's ability to discriminate among performance in the competencies, in our opinion, remains a relatively unimportant property. Program directors are not confined to using a single instrument. Indeed, multimethod assessment tells a more comprehensive story of a trainee's developing competence and minimizes the potential limitations of any single instrument.19
We would endorse another type of discrimination—the ability of an instrument to discriminate among different levels of performance within a single competency or a subdomain of a single competency. Instruments with this type of discriminative validity would be useful to chart residents' development along “milestones” of competence.20,21 Indeed, Lurie and colleagues cite this property of multiple-choice and script-concordance tests in their argument that medical knowledge represents an “exception” to the challenge of measuring the ACGME competencies. Ironically, the authors endorse these instruments, even though, like the ones they condemn, they have not been shown to discriminate among the six different competencies as independent constructs.
Evaluators can discriminate among different levels of performance within several of the ACGME competencies with currently available instruments. In the evidence-based practice-assessment systematic review mentioned above, the authors identified seven instruments, mostly targeting skills, that demonstrate not only good validity in general (as evidenced by objective outcomes, established interrater reliability, and three or more types of validity evidence) but also good discriminative validity in particular.7 Also, portfolio systems can document evidence-based practice performance in clinical practice.22 The mini-clinical evaluation exercise may not be sensitive to differential performance in multiple competencies, but it has a strong psychometric track record in evaluating clinical skills, which fall in the patient-care competency.8,23 Standardized patients and ratings of videotaped clinical encounters are already used for high-stakes evaluation of communication.11 Assessing professionalism remains a challenge for medical educators. Given the nature and occurrence of professional behaviors, multisource feedback24 appears suitable as it allows for multiple observations over time by multiple observers in multiple, authentic settings. A few preliminary studies have shown promising evidence for the reliability and validity of multisource observations to measure professionalism in controlled settings.25,26 Although studies suggest that reasonable reliability can be achieved with multisource feedback, its validity remains to be established in authentic educational settings.
Finally, Lurie and colleagues separately considered instruments for evaluating PBLI and systems-based practice (SBP). The majority of the 14 included articles reported evaluations of resident QI initiatives. Thus, their analysis neglected other dimensions of PBLI and SBP, such as use of information technology, critical appraisal of scientific studies, self-assessment, care coordination, and work in various delivery systems. Lurie and colleagues concluded that, within the area of QI, audits of “clinical outcomes” are used exclusively in assessment. In fact, QI skills can also be assessed either by a validated “case-based” examination27,28 or through a standardized review of a QI proposal.29 Furthermore, regarding clinical outcomes, we would celebrate, not lament, the demonstration that a team of residents can assess and improve performance.
The biggest problem in evaluating competencies is, in our opinion, not the lack of adequate assessment instruments but, rather, the inconsistent use and interpretation of those available by unskilled faculty. We do not make this claim pejoratively. Nor do we fail to recognize the commitment and effort of faculty, who, in many residency programs, volunteer their time. Nonetheless, most “workplace-based” assessments,30 regardless of the instrument, require an experienced evaluator both to observe trainee behaviors in authentic settings and to synthesize the observations into a judgment. Numerous studies have demonstrated significant deficiencies in faculty's direct observation and evaluation skills.31 Simply putting a new assessment tool into the hands of untrained educators will not likely improve the quality of the evaluation.
Faculty can develop and maintain evaluation skills, but development and maintenance require substantial training and ongoing practice.24,31,32 Brief, one-time training interventions are not effective.33–35 Some, questioning the feasibility of the considerable faculty development investment, will see us tilting at windmills and prefer more easily implemented “objective” instruments. But medical educators should not promote feasibility at the expense of validity36,37—especially not “direct validity,”38 which represents the extent to which the tasks posed by the test represent the real-world tasks of interest. Further, objective assessments are not necessarily more reliable than subjective ones.16,37
Residency program directors must invest in developing a cadre of trained faculty evaluators if they are to realize the promise of competency-based education. The graduate medical education community has already made substantial progress. The American Board of Internal Medicine, for example, offers a five-day course on evaluating clinical competence that includes evidence-based “direct observation of competence”32 training. To date, over 150 program directors and key faculty have taken the course and subsequently implemented changes to the evaluation programs at their institutions.
In addition to insufficient training in evaluation, faculty also lack a common understanding of expected behaviors for residents progressing through training.31 They need these “performance dimensions” in order to know what to look for during their observations. The language of the ACGME competencies lacks this level of granularity and developmental progression. Thus, we support Lurie and colleagues6 in their call for “an explicitly stated set of expectations that would link the ideals of the general competencies to the realities of measurement.” The articulation of developmental milestones within each competency, which is currently under way,2,20,21 will accomplish this and should reduce variability among faculty evaluators.
In closing, we argue that program directors can find several instruments with sufficient psychometric support in the “toolbox,” which we view as “half-full” rather than “half-empty.” We also recognize that the toolbox is not overflowing with perfect instruments. Indeed, we expect education researchers to identify new assessment methods and improve existing ones. The question is, what should the graduate medical education community do right now? Neither waiting for the “perfect tool” nor succumbing to an unfounded nihilism is justifiable as the cry for public accountability in graduate medical education gets louder.39,40 Instead, the academic medicine community should strive for better application of the existing evaluation instruments by better preparing the evaluators for the job. We join Lurie and colleagues in their hope that their review discourages neither program directors from trying to implement competency-based training nor education researchers from trying to refine evaluation instruments and improve the performance of evaluators.
The opinions expressed in this essay are the authors' and do not necessarily represent the views of the American Board of Internal Medicine.
1Accreditation Council for Graduate Medical Education. Outcome Project. Available at: http://www.acgme.org/outcome. Accessed January 22, 2010.
2Nasca TJ. The next step in the outcomes-based accreditation project. ACGME Bull. May 2008:2–4.
3Heard JK, Allen RM, Clardy J. Assessing the needs of residency program directors to meet the ACGME general competencies. Acad Med. 2002;77:750.
4Huddle TS, Heudebert GR. Taking apart the art: The risk of anatomizing clinical competence. Acad Med. 2007;82:536–541.
5Whitcomb ME. Redirecting the assessment of clinical competence. Acad Med. 2007;82:527–528.
6Lurie SJ, Mooney CJ, Lyness JM. Measurement of the general competencies of the Accreditation Council for Graduate Medical Education: A systematic review. Acad Med. 2009;84:301–309.
7Shaneyfelt T, Baum KD, Bell D, et al. Instruments for evaluating education in evidence-based practice: A systematic review. JAMA. 2006;296:1116–1127.
8Kogan JR, Holmboe ES, Hauer KE. Tools for direct observation and assessment of clinical skills of medical trainees: A systematic review. JAMA. 2009;302:1316–1326.
9Lynch DC, Surdyk PM, Eiser AR. Assessing professionalism: A review of the literature. Med Teach. 2004;26:366–373.
10Veloski JJ, Fields SK, Boex JR, Blank LL. Measuring professionalism: A review of studies with instruments reported in the literature between 1982 and 2002. Acad Med. 2005;80:366–370.
11Duffy FD, Gordon GH, Whelan G, et al. Assessing competence in communication and interpersonal skills: The Kalamazoo II report. Acad Med. 2004;79:495–507.
12Issenberg SB, McGaghie WC, Petrusa ER, Lee Gordon D, Scalese RJ. Features and uses of high-fidelity medical simulations that lead to effective learning: A BEME systematic review. Med Teach. 2005;27:10–28.
13Holmboe ES, Hawkins RE, eds. A Practical Guide to the Evaluation of Clinical Competence. Philadelphia, Pa: Mosby–Elsevier; 2008.
14Joint Committee on Standards for Educational and Psychological Testing; American Psychological Association; National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 1999.
16van der Vleuten CP, Schuwirth LW. Assessing professional competence: From methods to programmes. Med Educ. 2005;39:309–317.
17van der Vleuten CPM. The assessment of professional competence: Developments, research and practical implications. Adv Health Sci Educ Theory Pract. 1996;1:41–67.
18McGuire C. Written methods for assessing clinical competence. In: Hart IR, Harden RM, eds. Further Developments in Assessing Clinical Competence. Montreal, Quebec, Canada: Can-Heal; 1987.
19Epstein RM, Hundert EM. Defining and assessing professional competence. JAMA. 2002;287:226–235.
20Green ML, Aagaard EM, Caverzagie KJ, et al. Charting the Road to Competence: Developmental Milestones for Internal Medicine Residency Training. J Grad Med Educ. 2009;1:5–20.
22Green ML. Evaluating evidence-based practice. In: Holmboe ES, Hawkins RE, eds. A Practical Guide to the Evaluation of Clinical Competence. Philadelphia, Pa: Mosby–Elsevier; 2008.
23Norcini JJ, Blank LL, Duffy FD, Fortna GS. The Mini-CEX: A method for assessing clinical skills. Ann Intern Med. 2003;138:476–481.
24Lockyer J, Clyman SG. Multisource feedback. In: Holmboe ES, Hawkins RE, eds. A Practical Guide to the Evaluation of Clinical Competence. Philadelphia, Pa: Mosby–Elsevier; 2008.
25Cruess R, McIlroy JH, Cruess S, Ginsburg S, Steinert Y. The professionalism mini-evaluation exercise: A preliminary investigation. Acad Med. 2006;81(10 suppl):S74–S78.
26Srinivasan M, Litzelman D, Seshadri R, et al. Developing an OSTE to address lapses in learners' professional behavior and an instrument to code educators' responses. Acad Med. 2004;79:888–896.
27Morrison LJ, Headrick LA, Ogrinc G, Foster T. The quality improvement knowledge application tool: An instrument to assess knowledge application in practice-based learning and improvement. J Gen Intern Med. 2003;18:250.
28Ogrinc G, West A, Eliassen MS, Liuw S, Schiffman J, Cochran N. Integrating practice-based learning and improvement into medical student learning: Evaluating complex curricular innovations. Teach Learn Med. 2007;19:221–229.
29Leenstra J, Beckman T, Reed D, et al. Validation of a method for assessing resident physicians' quality improvement proposals. J Gen Intern Med. 2007;22:1330–1334.
30Norcini J, Burch V. Workplace-based assessment as an educational tool: AMEE Guide No. 31. Med Teach. 2007;29:855–871.
31Holmboe ES. Faculty and the observation of trainees' clinical skills: Problems and opportunities. Acad Med. 2004;79:16–22.
32Holmboe ES, Hawkins RE, Huot SJ. Effects of training in direct observation of medical residents' clinical competence: A randomized trial. Ann Intern Med. 2004;140:874–881.
33Cook DA, Dupras DM, Beckman TJ, Thomas KG, Pankratz VS. Effect of rater training on reliability and accuracy of Mini-CEX scores: A randomized, controlled trial. J Gen Intern Med. 2009;24:74–79.
34Holmboe ES, Fiebach NH, Galaty LA, Huot S. Effectiveness of a focused educational intervention on resident evaluations from faculty: A randomized controlled trial. J Gen Intern Med. 2001;16:427–434.
35Noel GL, Herbers JE Jr, Callow MP, Cooper GS, Pangaro LN, Harvey J. How well do internal medicine faculty members evaluate the clinical skills of residents? Ann Intern Med. 1992;117:757–765.
36Hager P, Gonczi A, Athanasou J. General issues about assessment of competence. Assess Eval Higher Educ. 1994;19:3–16.
37van der Vleuten CP, Norman GR, Graaff E. Pitfalls in the pursuit of objectivity: Issues of reliability. Med Educ. 1991;25:110–118.
38Ebel R. The practical validation of tests of ability. Educ Meas Issues Pract. 1983;2:7–10.
39Medicare Payment Advisory Commission. Medical education in the United States: Supporting long-term delivery system reforms. In: Report to the Congress: Improving Incentives in the Medicare Program. Available at: http://www.medpac.gov/chapters/Jun09_Ch01.pdf. Accessed January 22, 2010.
© 2010 Association of American Medical Colleges