Huddle, Thomas S. MD, PhD; Heudebert, Gustavo R. MD
Residency program directors are pondering the assessment of competence in their trainees as a result of the Accreditation Council for Graduate Medical Education (ACGME) Outcome Project. Having defined six broad professional competencies as the ends to be accomplished by residency training, the ACGME wants residencies to show that their trainees possess these competencies when training has been completed and, preferably, to do so by objective measurement. Programs are to specify learning objectives related to each of the broad competencies and to measure their achievement by residents as evidence that residents possess the competencies and are, hence, well trained to practice medicine.
The Outcome Project, launched by the ACGME in 1999, was aimed at developing a new model for accrediting training programs, to overcome the inadequacies of traditional ways of evaluating residents. Rather than merely considering the structures and processes of training, the traditional bases for residency accreditation, the ACGME proposed to base its accreditation of training programs on training outcomes, as discerned in the competence of graduating residents. To assess such outcomes, the ACGME began by developing definitions of general medical competencies. From these competencies, residency programs are tasked with developing specific learning objectives, the achievement of which can be assessed through objective measures.
In contrast to objectively measuring training outcomes, traditional assessment of a resident's performance has involved written examinations and global ratings by faculty who have observed the resident in a clinical setting over the length of a clinical rotation. Written examinations are suitably objective and are approved by the ACGME as an appropriate part of resident evaluation. Global ratings do not fare so well, being less reproducible and less objective.1 Global ratings are usually produced with the aid of checklists that allow faculty to consider different aspects of the resident's performance separately while deciding on a global assessment. Such assessment is irreducibly perspectival, subject to biases affecting the faculty member's view of the resident. Although checklists and global ratings are among the ACGME's suggested methods of evaluation, they are viewed as less desirable than a range of other suggested evaluation methods that are distinguished from global ratings by their scrutiny of small samples of resident behavior in a structured setting. Chart-stimulated recall oral examinations, objective structured clinical examinations (OSCEs), record reviews, simulations and models, standardized oral examinations, and standardized patient scenarios are all preferred to more traditional global ratings. These ways of selectively assessing resident performance are favored for their greater objectivity and reproducibility. The Outcome Project has been planned to move forward in four distinct phases, two of them already completed by residency programs nationwide, at least in theory.
The first phase (2001–2002) required training programs to begin developing learning objectives based on the six general competencies (see List 1) and to include these objectives in the training curriculum. The second phase (2002–2006) asked training programs to develop tools to assess the six general competencies; these tools were to be developed locally or with the aid of a “toolbox”1 of suggested assessment techniques developed by the ACGME and the American Board of Medical Specialties. In the current phase (2006–2011), training programs are integrating the six competencies into clinical training and providing evidence of curricular change on the basis of data acquired from measuring competency-based outcomes. The anticipated fourth and final phase will involve identifying benchmark training programs and developing programs of excellence to bring the successful innovations of the project to internists training in the United States.
Medical educators have long sought to objectify and measure the outcomes of medical education beyond results obtained on standardized tests; the ACGME Outcome Project is but the latest manifestation of that impulse, which can be traced back at least to the early 1970s. Such efforts to render objective the outcomes of clinical training have not been notably successful so far. In what follows, we will suggest that such efforts have been unsuccessful at least in part because the attempt to capture higher-level clinical competence in objective measures is misconceived. Clinical competence can be fully assessed only in actual clinical contexts by those who already possess it. Such assessment will be irreducibly subjective and perspectival. In arguing for these conclusions, we will allude to earlier attempts to objectify outcomes in medical education that predate the competency movement; we will suggest that the difficulties that eventually doomed those earlier efforts have not been avoided by attempts to derive learning objectives from general competencies. We will argue that the measurable bits of performance that follow from anatomizing clinical competence according to discrete learning objectives do not and cannot add back together to constitute the skill and ability of the competent physician. Having suggested that clinical competence is better thought of as a kind of creative and flexible capacity to respond to clinical context than as possession of a given set of objectifiable responses to such context, we will go on to consider the implications of such a conception of competence for its assessment. We will suggest that the notion of competence implied by the ACGME Outcome Project is already being invoked in discussions of certification and proper training length, and, finally, that insofar as such “competence” becomes the benchmark for certification decisions, the traditional excellence of American clinical training will be endangered.
Possible Uses of Objective Competency Assessment
The kinds of use to which objective measures of competence might be put are suggested by discussions about the proper length of internal medicine training, recently a subject of some controversy. Leaders in the subspecialty of cardiology, for instance, have begun to favor a shorter period of internal medicine residency for the trainee who has determined to become a cardiologist.2,3 They argue that subspecialists are not going to practice general internal medicine; why, then, should they become fully trained generalists before training to be cardiologists? The current practice of requiring general internal medicine training before specialization is a relic of a bygone era when important portions of a specialist's work continued to be general internal medicine. In the present context of a much more developed specialty network, subspecialists such as cardiologists will stick to their specialty; hence, completing generalist training before subspecialization seems unnecessary. We need more specialists, and their training period is egregiously long; why not shorten it if that can be done?
It is an easy step from certifying competence as suggested by the ACGME to using such certification as a means of justifying alterations in training length and timing of subspecialization. The ACGME has not, so far as we know, suggested such uses of its conception of competence, but others are invoking it in arguments for shortening internal medicine training. The argument begins by observing that the traditional three-year period of internal medicine residency training came about for contingent historical reasons having little to do with assessments of how much training was actually required to practice internal medicine. Given the development of general competencies for internal medicine, elaborated in specific learning objectives and accompanied by attempts to provide for their objective assessment, one can specify a desired level of competency in medicine for trainees going into, say, cardiology, and stipulate achievement of that level as the measure of adequate internal medicine training. Would such achievement not be a better end point of training than completion of an arbitrary time period? And would not achieving a given competency level make more sense for internal medicine training in any event than the current system of requiring a given period of training and the passing of an examination?
Competency, it is presumed by this kind of argument, is an objective characteristic of trained physicians that, if assessed correctly, can be “read off” the trainee for purposes of making decisions about completion of training or suitability for further training. Such a view of competency is of a piece with the twin thrusts of the ACGME Outcome Project: specification and measurement. Competence remains a vague and subjective concept, the argument might run, unless we can say what it is and measure it. If we cannot specify it, do we really know what we are talking about? And once we specify it, have we specified anything important if we cannot assess it by measurement? Whereas if we can specify and measure it, then we have it in our grasp.
From Learning Objectives to General Competencies
The impulse to specify and measure learning outcomes in medical education is not a new one. Curricular learning objectives had a vogue in medical schools in the 1970s that led to the compilation of objectives covering the entire medical curriculum at some medical schools. Educators hoped that by carefully making explicit exactly what it was they hoped to teach, they could then devise ways to assay students at the end of their course, to confirm success or failure objectively. Breaking down clinical knowledge and performance into discrete, measurable chunks whose presence or absence in the trainee's repertoire could be objectively confirmed would allow such testing of clinical performance.
The promise of learning objectives for objectivity in evaluating clinical performance was not, however, fulfilled. It was eventually realized that the bite-sized elements of performance identified by discrete learning objectives did not capture the ability to choose whether and when to make use of such elements in real clinical situations.4 What was wanted was a measure of the more holistic clinical ability to bring abstract knowledge and discrete skills appropriately to bear in the clinical setting. The outcome of successful clinical education was the capacity or competence of the clinician to interpret and act skillfully; that was the true goal of clinical evaluation, to which the resources of objective evaluation needed to be directed.
Hence the movement toward viewing competencies and learning outcomes as the more appropriate ends of the instructional process to be assayed, a movement that has dominated the thinking of the bodies charged with the direction of medical education in the United States since the early 1990s. The ACGME presumes it will avoid the problems associated with learning objectives in the 1970s by focusing on objectives that are derivable from the general competencies. But the thrust of the Outcome Project is to be objective measurement of learning outcomes, the benchmark for which will be specific learning objectives. To be amenable to such measurement, the learning objectives will have to be specified as smaller units of the broader competencies, similar in that regard to the learning objectives of the 1970s. Hence, it is unclear that the ACGME will really get beyond the difficulties with learning objectives experienced by medical educators of that earlier time.
From General Competencies Back to Learning Objectives
What sorts of difficulties are involved in conceiving competence in terms of specifiable learning objectives? It is clear that some aspects of professional competence are indeed specifiable. One can isolate such specific elements of clinical performance as history taking, the physical exam, and various procedures, and assess whether trainees can, in fact, perform them. And one can specify abstract knowledge about disease states and test its possession by trainees in written examinations. But competence, of course, involves not merely a repertoire of knowledge and skills but the capacity to make use of these as demanded by the particular clinical setting in which one finds oneself. And attempts at specifying such use of knowledge and skills in the form of learning objectives, the achievement of which is amenable to measurement, run into difficulties. One might, for instance, specify that fully trained internists ought to be competent to evaluate and treat congestive heart failure. In what does this skill or ability consist? Certainly it involves a considerable amount of specifiable "knowledge that," the textbook abstractions about heart failure that doctors learn and that can be assessed objectively in examinations. But the recognition and proper handling of heart failure in the clinic or on the ward is a different matter. When should fatigue or shortness of breath in a given patient lead one to do diagnostic studies for heart failure? How does one interpret biochemical evidence of heart wall stress in a patient who may have lung disease that could explain both shortness of breath and right-sided heart dysfunction? Is another patient with vascular redistribution on chest x-ray, but with a normal brain natriuretic peptide level, volume overloaded or not?
One could elaborate answers to questions like these, with numerous qualifications according to possible patient idiosyncrasies, and one would eventually arrive not at a guide for the treatment of heart failure, but at a description of the possible universe of heart failure patients no less various than the hundreds of patients in any large heart failure clinic. And, even if one specified the proper responses to the variety of heart failure patients to that extent, the complete set of heart failure learning objectives so obtained would, if conceived merely as a set of abstractions possessed by the trainee, be completely inadequate. For merely possessing the abstractions would in no way guarantee their being properly brought to bear on the next new patient that walked into the clinic.
So, the capacity to make use of knowledge and skills in the clinical setting slips through our fingers if we attempt to specify it in terms of learning objectives whose achievement can be assessed by the kind of delimited objective assessment tool favored by the ACGME. Either the objectives describe what we want but are so general that they fail to make contact with particular clinical tasks, as in "be able to evaluate and treat congestive heart failure"; or they are so specific that their individual accomplishment approximates practical acquaintance with given individual heart failure patients. In the one case, one has remained at the level of a broad, generic competence without advancing to specification and measurement; in the other, one is back with the earlier difficulty of learning objectives: having to specify a comprehensive set of dispositions to think and act in particular ways corresponding to a large universe of possible clinical situations.
Why does competence slip through our fingers if we break it down into specific objectives and view it as their sum? This is what the ACGME Outcome Project appears to do; it anatomizes competence by characterizing it in general terms and then analyzing it into components of propositional and procedural knowledge that, if confirmed to be present in the practitioner, allow her to be designated "competent." Although its defined competencies are general, the ACGME's preference for specification and measurement demands that trainees' knowledge, as it is actually assessed, be in small pieces that, it is presumed, add back together to constitute general competence.
How to Better Think About Competency
Competence at treating heart failure implies a capacity to confront the new patient coming in the door and size him up; in doing so, one uses the tools of history taking and physical examination, but these, with the subsequent array of laboratory and imaging techniques one may employ, become modes of progress toward apprehending the patient in a satisfactory way. That apprehension may be reached quickly if the patient is straightforward, so that one seems to come to the correct diagnosis with little effort. Or, if one can't make sense of the patient's clinical picture, one seeks to achieve a sense of the patient that is nonetheless satisfying, if not in the form of an answer, at least in the form of a question. That is, one seeks to formulate a problem.
Competence is thus open ended; one cannot specify in advance the range of possible presentations of heart failure one is going to see; or rather, although one can specify a subset of the infinitely various possible set of presentations, the point of competence is not that one has mastered representations corresponding to that set. Rather, it is that faced with a particular case, one can come to a proper representation of it after confronting it and coming to grips with it. Such a “take” on a case may involve a diagnosis if a straightforward presentation is involved, or a problem formulation for a more difficult case.
This process of interacting with the case to come to a proper view of it is not exclusively propositional, or even conscious. As one talks with and examines a patient, one is responding to all sorts of cues that lead in one direction rather than another as the picture develops. Some of these lead us without conscious thought on our part, as the testimony of experts recreating their performances confirms; experts often have difficulty ascertaining how they came to the conclusions they did. Or, they are able to retrospectively rationalize responses that came to them without deliberation.
Educators infer competence in trainees who, as we watch them evaluate cases, know their way around the world to which we are introducing them. The competent are not led astray by deceptive appearances; they are impressed by the important aspects of a case and are not misled by issues that are merely distracting or superfluous. We evaluate such trainees not according to knowledge of this or that abstraction, or even in terms of getting it right with this or that patient; we seek to assure ourselves that they have a sense of context: what's relevant or important and what isn't; what the important question is, even if they don't know the answer to it. The answer is less important than getting the question right. The answer is usually obtainable if one asks the right question, whereas there is no clear fix for not being able to ask the right question.
We thus suggest that a close scrutiny of how clinicians work reveals that the knowledge of medicine and the techniques of obtaining information that trainees learn become not competence itself, but its building blocks. It is only in the use of these tools in apprehending the patient, an apprehension that is skillful insofar as it is responsive to context, that complete competence is exhibited. It is such skillful apprehension, in formulating a problem if not in immediately coming to an answer, that allows the solid but not flashy third-year resident to outperform the very smart intern.
The ACGME's approach to the evaluation of competence suggests that we might characterize the "knowledge of the territory" that we expect competent physicians to have by specifying the universe of possible patient presentations, pairing each with its unique diagnostic and therapeutic approach, and then requiring physicians to know most or all of a given set of specifications. Drawing the proper inferences from given patient presentations according to the rules of clinical medicine can then be objectively assessed and stamped as competence or its lack. This way of characterizing medical work is what Donald Schon calls "technical rationality": one learns the realm of facts and the principles that relate them to one another, and one can then draw the proper conclusions according to one's purposes.5
Medical work is actually much closer to Schon's "reflection-in-action," a process of responsiveness to an evolving situation in which perception may lead immediately to the right "take," or else to propositional reflection that feeds back on the perceptual process as a problem is constructed. Such construction depends on propositional knowledge but also on the physician's "feel" for the territory in which the patient's problem fits. To the extent that specifications of patient presentations divide up a territory of disease into genera and species, their abstract content corresponds to textbook knowledge. Their procedural content ("knows how to identify and treat heart failure," or "knows how to treat patient X with this variant of heart failure"), although more to the point, still encompasses an almost infinitely variable set of possible clinical circumstances insofar as one attempts to specify the broader competence in terms of its components. Dividing up the realm of clinical medicine into discrete specifications obscures the creative and constructive activity of the practitioner as he or she engages with a new case. It is this capacity, enabling a perceptive diagnosis or else formulation of the right problem, that constitutes higher-level competence: a kind of sensitivity to, even more than knowledge of, an area of interest.
An analogy between learning medicine and learning a language or a complex game might clarify the character of the competence we seek to elucidate. As we consider the ability to speak a language or play a game, we might focus on realms of possible well-formed sentences or game scenarios. But languages and games as spoken or played are primarily activities rather than realms. The language or game may delimit the space in which the activity takes place, but the activity itself can take an infinite variety of forms within the given space. In assessing speakers of a language or players of a game for competence, our concern is not to map out the space in question and specify some part of it in terms of declarative and procedural knowledge to be demanded of speakers or players in training. It is to have trainees play the game or speak the language and judge the adequacy of their performance. In doing so, we are interested less, perhaps, in breaking down their performance into elements than in its overall suitedness to the task demanded by the state of play or speech in which they find themselves at any given moment. We want from them a kind of responsiveness to context rather than any given armamentarium of knowledge and technique (we may indeed want the latter, but it alone is not a clear sign of competence).
Objectivity and Assessment
It might be objected at this point that the ACGME's competencies are not incompatible with the kind of flexible responsiveness to the clinical setting we are claiming competence to be. At the general level at which they are presently stated, that is true. Paradoxically, the general competencies are an attempt by the ACGME to avoid overspecification in medical education.6 But the ACGME's demand for objectivity and measurement in assessment of competence ineluctably leads to the assessment of pieces of performance in the fragmentary fashion that we have argued cannot be presumed to add up to the kind of competence we actually are interested in.
Consider the ACGME's suggested methods for evaluating the competent care of patients. Of the general competencies specified by the ACGME, those most closely related to getting patient care right seem to be, under the general competency of patient care,7 the following abilities. Physicians must be able to:
* Gather essential and accurate information about patients.
* Make informed decisions about diagnostic and therapeutic interventions based on patient information and preferences, up-to-date scientific evidence, and clinical judgment.
* Develop and carry out patient-management plans.
The ACGME's favored methods for evaluating these competencies, with the exception of checklists, involve controlled settings in which trainee work is either simulated or retrospectively assessed through memory and written records. Controlled settings cannot sufficiently recreate the pressure of ongoing clinical work in which trainees must respond to context, and checklists notoriously reward thoroughness rather than expertise or competence.8 The ACGME recommends these methods with the laudable goal of increasing objectivity, that is, of minimizing measurement error arising from subjective influences and of removing any doubt that what is being measured is something "out there," seen in the same way by all potential evaluators. It is the attempt to specify what is to be evaluated so that its detection is as straightforward and as judgment independent as possible that leads to the preference for checklists and yes/no criteria over ratings.9 The difficulty is in the presumption that what can be measured in the ways suggested by the ACGME fully captures clinical competence. Instead, the objectively measurable may be limited to knowledge and procedural skills, that is, to the building blocks of competence rather than the thing itself.
Testing for the building blocks is warranted at earlier stages of clinical training. OSCEs and simulated patients are put to good use in the third and fourth years of medical school and, possibly, during internship. And some of the assessment methods suggested by the ACGME might work well for purposes other than evaluating higher-level clinical competence: simulated patients and 360-degree evaluations might be very helpful for evaluating doctor–patient communication and professionalism. But for clinical competence during the later stages of residency, we clinical educators must use methods that are better suited to what we are assessing. Such evaluation must engage with trainees in the actual work setting, because it is their responsiveness to that context that must be assessed. Evaluation cannot be limited to judgment-independent aspects of performance, because such aspects do not sum to clinical competence. Clinical judgment in context is our quarry, and that can be assessed only in the first person by a fully competent clinician participating in clinical work with the trainee. Assessment must thus be subjective for validity, but this requirement need not sacrifice reliability.9
The present use of global ratings at the end of resident rotations suffers from a number of difficulties, such as generosity error and the halo effect. The inevitable biases introduced by subjective evaluation can, however, be overcome by suitably wide sampling. The scrutiny of global ratings in various performance tests suggests that reproducibility of scores is affected less by evaluators' subjectivity than by the variability of an examinee's performance across cases and settings.9 Reproducibility of subjective global rating scores can thus be achieved if raters are trained and if sampling is sufficiently extensive. Ways in which sampling might be extended beyond the summative monthly evaluation include introducing trained clinician evaluators into postcall rounds or ward walk rounds to rate interns' and residents' clinical performance. Even more valuable would be observation of residents as they assess and begin the treatment of new patients admitted on call. Over the course of a month on service, an attending whose only task was evaluation could spend some afternoons and early evenings on call with the resident team to watch given team members deal with new admissions or cross-cover. The seasoned clinician could rate the resident according to how well his or her performance met the actual demands of the clinical situation in real time. Several patient evaluations per resident could be accumulated in this way over the course of the clinical rotation. Such methods of evaluation would require a vast increase in faculty time and energy devoted to trainee assessment, just as would the methods suggested by the ACGME. Moreover, we argue that the advantage of having faculty present in the clinical setting in real time would not be limited to a valid assessment of clinical competence, which the ACGME's methods cannot provide. Having faculty spend more time with trainees in the clinical setting would also foster education. 
Although the costs would be substantial, the gain for both assessment and training would be similarly great. Although global ratings could be much improved by methods such as these, the summative monthly evaluation is unlikely to be completely expendable. There may be no substitute for the cumulative impression of the trainee gained by the ward attending during a month of close interaction on service. The challenge is to make such evaluations as reproducible as possible, perhaps through better training of faculty evaluators.
Conclusion: The ACGME Outcome Project and American Graduate Medical Education
We have suggested that the ACGME is promulgating assessment tools that are ill suited to the higher-level aspects of competence on which we must focus as graduate medical educators. Other medical educators also have misgivings about the presumption that objective assessment of performance at tasks susceptible to measurement and benchmarking can serve as an adequate index of clinical competence.10,11 Although it is asserted that evidence favors competency-based education over more traditional approaches,12,13 such evidence as is cited suggests that the superiority is exhibited in the better inculcation of specific tasks by competency-based approaches, often tasks that are amenable to objective measurement—which, of course, begs the question of whether the competence being measured is what we are really interested in.
If the impact of the Outcome Project is limited to the ACGME's favored assessment regimen becoming de rigueur in American residency programs, the damage is likely to be limited to the loss of faculty and trainee time and effort that could be more usefully spent in other activities. Global ratings by experienced faculty are unlikely to be completely displaced and would still serve as a useful complement to the objective measures favored by the ACGME. The use of the ACGME's conception of competence in discussions of training length and subspecialization timing portends, we suggest, the possibility of more serious damage to American medical training. It is suggested that objective measures of competence replace traditional training length and certification by program directors as the gauge of trainee readiness to complete training and move on. If we are correct about what elements of competence objective measures can scrutinize effectively, such measures will not (because they cannot) actually assess the kind of competence we wish to assess. And trainees who have successfully developed basic knowledge and skills but who do not necessarily perform well under the press of the clinical setting may be judged to be far more competent than they actually are. Training length is likely to shrink for many trainees if training lengths for given purposes are set by objective measures. Many trainees who may well be able to demonstrate their armamentarium of specific knowledge and skills may, nevertheless, lack the developed clinical perception and judgment that only length of training in an appropriate clinical setting under good supervision may supply. This is not to suggest that three years is necessarily the correct training length for internal medicine or that some degree of focus on a subspecialty might not be appropriate during the later stages of internal medicine training. 
It is simply to assert that objective measures of competence cannot be adequate to guarantee its presence. If competent practice is like playing a game or speaking a language well, then the breadth of experience gained through time spent in training has value far beyond what might be required to meet standards set by objective measures. Hence, the value of years of training in medicine, or of numbers of cases in surgery, is independent of performance on examinations or in other objective assessment exercises.
The road to certification according to objective measures has already been partially traversed in Great Britain since the imposition of the Calman reforms on graduate medical education there, which have shortened some specialist training under a model emphasizing "signing off" on trainees who meet competency-based standards akin to those that might follow from the ACGME's model.11 These reforms have engendered deep misgivings among academic clinicians in the United Kingdom.14
Were U.S. trainees to move through graduate medical education and out into practice without the high level of clinical competence for which American graduate education has become known throughout the world, that would be a tragedy and a betrayal of our traditional ideals of clinical training. Those ideals, traceable to Osler and his effect on American clinical education,15 demand that we pause in our quest to certify competence by objective measures, and that we be sure that we know just what it is we are seeking and how best to obtain and assess it before we jeopardize the model for graduate medical education that has served us well for the past 100 years.