Through its Outcome Project, the Accreditation Council for Graduate Medical Education (ACGME) has given its imprimatur to the multidimensional approach to measuring clinical competence.1 The Outcome Project, a long-term initiative begun in 1999, identified six general clinical competencies: patient care, medical knowledge, practice-based learning and improvement, interpersonal and communication skills, professionalism, and systems-based practice. All accredited graduate medical education (GME) programs in the United States will be required to include the general competencies in their curricula, and, in a major paradigm shift, will be required to demonstrate that trainees have mastered the competencies prior to graduation. This initiative has generated a great deal of discussion in the medical education community about how best to measure these six competencies.
The literature offers a wide array of tools that residency programs might use to evaluate their residents across the general competencies. The options available range from the classic written examinations that have been used routinely for over a century,2 to peer ratings,3 patient satisfaction surveys,4 and more recent innovations such as standardized patients,5 objective structured clinical examinations (OSCEs),6 and high-fidelity simulations.7 The ACGME has summarized some of the most important strengths and weaknesses of 13 assessment tools in its Toolbox of Assessment Methods.8
Among the options available in the ACGME's Toolbox are global performance ratings, the instrument used most frequently by GME programs to assess trainees’ competence.9,10 Although the Toolbox does not recommend global ratings as being especially suitable for assessing any one of the six general competencies, it is likely that many residency programs will update and fine-tune their existing global rating forms as a first step toward comprehensive assessment of the six competencies. One report of this approach has already been published.11 The reasons for this strategy are obvious. First, it will be far easier and more logical for program directors to begin by revising existing instruments rather than by abandoning them and proceeding directly to develop new tools. Correspondingly, it is much more likely that faculty will readily support and use instruments that are similar to the rating forms with which they are already familiar. Finally, residents are likely to accept minor changes in the wording and content of a known instrument, whereas they may be threatened if a new and unfamiliar instrument is substituted for standard rating forms.
The widespread use of global rating forms in GME is a practice that seems likely to continue in the near future. The purpose of our study was to ascertain whether an existing global rating form for residents could be modified to assess the six ACGME general competencies.
Our study involved 1,367 residents in programs in 92 specialties and subspecialties at Thomas Jefferson University Hospital and the Albert Einstein Medical Center in Philadelphia, Pennsylvania, at the end of the 2001–02 and 2002–03 academic years. Our research use of the data, which had been collected as part of the routine management of the GME programs, was reviewed and approved by the Institutional Review Board of Thomas Jefferson University.
We developed a 23-item, one-page global rating form by modifying an earlier, 20-item version that had been evaluated between 1998 and 2000. In that evaluation, program directors had been instructed to rate each resident using a five-point scale (1 = severe deficits, 2 = marginal competence, 3 = expected competence, 4 = above-expected competence, 5 = outstanding competence). A factor analysis of ratings, using the 1998 version of the form for 882 residents in multiple specialty and subspecialty programs at one academic medical center, revealed that the 20 items could be grouped as two independent and reliable factors corresponding to the constructs of knowledge/data gathering/processing skills and interpersonal skills/attitudes. Validity evidence included statistically significant increments in the rating scores on each of the two factors related to advanced levels of residency training and to higher scores on the three steps of the U.S. Medical Licensing Examination.12 The content, construct, concurrent and predictive validity of previous versions of the form have been documented.13–21
In order to develop the new form, we analyzed the content of the items to ensure that they represented the six ACGME general competencies (see Table 1). Eleven of the items in the 1998 version were retained without change in the new form. Two items related to clinical judgment and sensitivity to patients’ age, gender, culture and disabilities were edited for clarification. Seven items with less precise wording (e.g., “interpersonal skills” and “professional skills”) were deleted and replaced by ten new, more explicit items designed to probe similar areas (e.g., “effectiveness as a team member,” “effectiveness as a consultant to other specialties” and “compassion and empathy”). The relationships between the general competencies and the content of the items in the new form were assigned by referring to the full-text descriptions of the ACGME general competencies.8 For example, Item 1 related to patient history was linked to a statement in the ACGME's description of patient care: “gather essential and accurate information about their patients.” Although each item was primarily associated with one of the six competencies, we recognized that many items spanned multiple competencies. For example, Item 3 related to differential diagnosis was linked not only to a statement in patient care, “make informed decisions about diagnostic and therapeutic interventions based on patient information and preferences, up-to-date scientific evidence and clinical judgment,” but also to another statement in medical knowledge, “demonstrate an investigational and analytic thinking approach to clinical situations.”
As shown in Table 1, the 23 items in the final form were arranged in a random sequence unrelated to their specific content or to the six competencies. This minimized rating bias due to the “halo effect” in which raters tend to assign the same ratings to adjacent items on a form. The sequence also blinded the raters to the items’ assignment to the six general competencies so it would be meaningful to perform factor analysis to investigate the statistical interrelationships among ratings on the 23 items.
Subsequently, over a period of several months, we reviewed revisions to the form together with the content specifications. A copy of the machine-readable rating form is available on request.
The forms were mailed to residency program directors in April 2002, with the instructions that they rate each resident using the same five-point scale of severe deficits through outstanding competence used in the 1998 evaluation. They were instructed to rate each resident in relation to all residents ever supervised, not only in relation to those in the current group. Nonrespondents received follow-up reminders by mail and telephone in May and June, 2002. We repeated the procedure in 2003.
We analyzed the properties of the 23 items and their interrelationships to estimate the validity of the ratings. Means, standard deviations and product-moment correlation coefficients were calculated across all items. The interitem correlation matrix was subjected to principal components analysis followed by varimax rotation retaining factors with eigenvalues greater than one. Unweighted subtest scores were calculated by assigning each item to a subtest based on its highest factor loading.
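The analysis steps described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic continuous scores for six hypothetical items driven by two independent latent traits (not the study's 23-item ratings), the `varimax` helper is a common SVD-based formulation and not necessarily the routine used by the authors' statistical software, and subtest scores are taken here as unweighted item means.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of an (items x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD-based update that increases the varimax criterion
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion != 0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation

# Synthetic stand-in data (assumption): 500 residents, 6 items, 2 latent traits
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
true_loadings = np.array([[0.9, 0.0]] * 3 + [[0.0, 0.9]] * 3)
ratings = latent @ true_loadings.T + 0.4 * rng.normal(size=(500, 6))

# Product-moment inter-item correlation matrix
corr = np.corrcoef(ratings, rowvar=False)

# Principal components of the correlation matrix, largest eigenvalue first
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Retain components with eigenvalues greater than one
keep = eigvals > 1.0
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])

# Varimax rotation, then assign each item to the factor with its highest loading
rotated = varimax(loadings)
subtest = np.abs(rotated).argmax(axis=1)

# Unweighted subtest scores: plain mean of the items assigned to each factor
scores = np.column_stack(
    [ratings[:, subtest == j].mean(axis=1) for j in range(rotated.shape[1])]
)
```

Because the two synthetic traits are independent, the rotated loadings separate cleanly into two item groups; real rating data, in which all items tend to correlate positively, would generally show a much larger first component.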
Completed rating forms were available for 1,295 (95%) residents during the 2001–02 and 2002–03 academic years. Sixty-three percent of the residents were men and 37% were women; the sample was representative of the gender distribution at both institutions. Approximately two-thirds of the residents were in the first three years of graduate medical education. The sample was representative across training programs with the exception of lower response rates in cardiology, obstetrics–gynecology (ob–gyn), pediatrics, and neurosurgery. Although nearly half of the residents were in the five largest programs including emergency medicine, general surgery, internal medicine, ob–gyn, and pediatrics, the sample also included a broad spectrum of 61 specialty and subspecialty programs with fewer than ten trainees.
The vast majority of residents were rated at the level of expected competence or higher on all items. Table 2 shows the items ranked by the mean rating calculated using the five-point scale of 1 = severe deficits to 5 = outstanding competence. The highest mean rating of 4.04 was found for Item 8 (professional ethical standards) where 72% of residents were rated at the level of above expected or outstanding. Only 15 residents (1%) were rated as either being marginal or having severe deficits on this item. The mean for Item 20 (compassion and empathy) was identical to that for Item 8; on Item 20, 73% of residents were rated as above expected or outstanding.
On the other hand, mean ratings were lowest with regard to consideration of costs in care and management of resources. Only 49% of residents were rated at the level of above expected or outstanding on Item 10 (costs). Similarly, on Item 23 (accessing and managing resources) 53% were rated at the level of above expected or outstanding. As Table 2 shows, the program directors consistently assigned higher ratings on items related to the competencies of professionalism and interpersonal skills, whereas they assigned lower ratings on average on items tied to the competencies of knowledge, patient care, and systems-based care. The validity of the ratings provided by residency and fellowship program directors across diverse specialty and subspecialty programs is supported by the higher ratings in areas where they might be expected, such as ethics, and the lower ratings in the areas of cost containment and management. The weak preparation of young physicians in the latter domains has been well-documented in the literature.22,23
The product-moment correlation coefficients describing the relationships among items further support the internal validity of the instrument (data not shown). For example, the correlation coefficient was .80 between Item 8 and Item 20 on professionalism. Similarly, the correlation between Item 23 and Item 10, the lowest rated items assessing cost considerations in health care, was .80. In contrast, the correlation between Item 8 (ethics) and Item 10 (costs), which measure divergent competencies, was lower at .61. A review of the entire correlation matrix of 23 items revealed the highest correlation (.86) between Item 3 (clinical judgment in differential diagnosis) and Item 16 (clinical judgment in treatment plans). The lowest correlation (.45) was found between Item 2 (knowledge of basic sciences) and Item 6 (relationships with other health care personnel).
Table 3 summarizes the results of the factor analysis after rotation, which accounted for 77% of total variance. The first and largest factor, which yielded an eigenvalue of 16.5, included items tied to the general competencies of interpersonal communication skills and professionalism, whereas items related to knowledge, patient care, practice improvement, and systems-based care dominated the second factor, which had an eigenvalue of 1.32. The alpha reliabilities for the two factors were .80 and .88, respectively.
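For readers unfamiliar with alpha reliability, the following is a minimal sketch of Cronbach's alpha, the usual meaning of the term. It assumes complete data (no missing ratings), and the ratings shown are hypothetical values chosen only to exercise the function; they are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-resident lists of item ratings."""
    k = len(rows[0])  # number of items in the subtest
    # Sample variance of each item across residents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each resident's total score
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: 5 residents x 3 items on the five-point scale
example_rows = [
    [3, 3, 4],
    [4, 4, 4],
    [5, 4, 5],
    [2, 3, 3],
    [4, 5, 4],
]
alpha = cronbach_alpha(example_rows)
```

Alpha approaches 1 when residents who are rated high on one item of a subtest tend to be rated high on the others, which is the sense in which the two factors are described as reliable.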
The stability of the two-factor solution was first tested by performing separate factor analyses within three sets of subgroups: men versus women, two clusters of specialties, and four levels of training. The residents were separated into two clusters of specialty programs broadly defined as people-oriented specialties (e.g., primary care, psychiatry) as opposed to technology-oriented specialties (e.g., radiology, surgical specialties and subspecialties). Four levels of training were designated as first, second, third, and fourth year or higher. The results of the factor analysis for each of the gender, specialty and training level subgroups, with the exception of the results for second-year residents, yielded two factors that were similar to those obtained for the entire sample of 1,295 residents. The results for the second-year residents differed only in the sequence of the two factors.
The stability of the two-factor solution was also tested by forcing a six-factor solution, in which the additional four factors yielded eigenvalues of .59, .43, .42, and .35, respectively. The first and largest factor of interpersonal communications skills and professionalism remained unchanged. The second factor included seven items related to systems-based practice, and practice-based learning and improvement. Factor 3 comprised four items related to medical knowledge and patient care. Factors 4 and 5 each included one item related to teaching medical students and knowledge of the contraindications of particular procedures, respectively. No factor loading in the sixth factor exceeded .43.
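The arithmetic linking the reported eigenvalues to the proportion of variance explained can be checked with a short sketch. It relies on the fact that a principal components analysis of a 23-item correlation matrix has a total variance of 23, and it assumes that the first two eigenvalues in the forced six-factor solution are the 16.5 and 1.32 reported for the two-factor solution.

```python
n_items = 23  # total variance of a 23-item correlation matrix

# Eigenvalues as reported in the text (first two from the two-factor solution)
eigenvalues = [16.5, 1.32, 0.59, 0.43, 0.42, 0.35]

# Eigenvalue-greater-than-one rule: only these components are retained
retained = [ev for ev in eigenvalues if ev > 1.0]

share_two = sum(retained) / n_items     # variance explained by two factors
share_six = sum(eigenvalues) / n_items  # variance explained by all six

print(f"two factors: {100 * share_two:.1f}%")  # two factors: 77.5%
print(f"six factors: {100 * share_six:.1f}%")  # six factors: 85.3%
```

Under these assumptions the two retained factors account for about 77% of the variance, matching the reported figure, while the four forced factors together add only about eight percentage points.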
Analysis of variance followed by the Duncan multiple range test for the four levels of training indicated that residents at levels three and above scored higher than first- and second-year residents on Factor 1 (F(3, 1220) = 7.70). A similar analysis for Factor 2 indicated significant differences (F(3, 1220) = 20.57) among the four levels of training.
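As a reminder of how an F statistic of this kind is formed, here is a minimal sketch of a one-way analysis of variance in plain Python. The group scores are hypothetical, and the Duncan multiple range test used for the pairwise comparisons is not implemented. With four groups, the between-groups degrees of freedom are 4 - 1 = 3; the reported within-groups value of 1,220 would correspond to 1,224 residents with a recorded training level, which is an inference from the degrees of freedom and is not stated in the text.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical factor scores for four training levels (three residents each)
levels = [
    [3.0, 3.5, 4.0],  # first year
    [3.5, 4.0, 4.5],  # second year
    [4.0, 4.5, 5.0],  # third year
    [4.0, 4.5, 5.0],  # fourth year or higher
]
f_stat, df_between, df_within = one_way_anova(levels)
```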
Assessing residents’ clinical competence is essential. However, the process has always presented unique challenges including defining the concept of clinical competence; describing the requisite, integrated knowledge, skills and attitudes that constitute clinical competence; and finding the most valid and reliable measurement tools, subject to practical considerations of cost and acceptance by patients, evaluators, trainees, and the profession.
Two approaches have been employed to define competence. The unidimensional approach uses a global construct and posits that competence is a quality that can ultimately be reduced to a single score, even when this score is derived from subscores reflecting measurements of qualities across multiple competencies.24–26 Although the unidimensional approach has practical advantages, it has several shortcomings. The most serious is that a physician with deficiencies in one area (e.g., medical knowledge) can achieve a satisfactory overall rating by demonstrating outstanding competence in another (e.g., interpersonal and communication skills). Medical educators have long recognized that a trainee with outstanding competence in one area but with deficiencies in another cannot be judged to be a well-qualified physician. Another serious limitation of global indices is that they provide little or no information that can be used for constructive feedback, an important component of residents’ development.27
The alternative multidimensional approach considers that clinical competence has multiple qualities that cannot be reduced meaningfully to a single score.19,28 This multidimensional approach is intuitively appealing and reduces the chances that a marginal trainee will be able to compensate for serious deficiencies in one area with strengths in another. It also supports evaluation systems designed to provide constructive feedback to trainees, faculty evaluators, and program directors. A consensus has emerged, based on nearly four decades of empirical research, that clinical competence is very complex, requiring assessments of multiple skills.24,28–32
This consensus on a multidimensional approach is reflected in the ACGME Outcome Project, a major educational initiative that is having a powerful effect on the way GME is conducted in the United States. The ACGME's six general competencies are now required in all residency curricula. The curricula and the assessment of residents’ proficiency in the six general competencies are now monitored routinely during accreditation visits by the ACGME's Residency Review Committees. In addition, sponsoring institutions are required to support their GME programs as they respond to this initiative. Institutional accreditation visits now evaluate the sponsoring institution's role in, and support of, the implementation of the Outcome Project. It is generally accepted that one of the most challenging aspects of the Outcome Project is the development and implementation of appropriate assessment tools.
We undertook this study to find out how program directors from a broad spectrum of 92 residency programs would rate their trainees on a global rating form consisting of items representing the ACGME competencies. They rated residents significantly higher on items related to professionalism, interpersonal skills, and attitudes than on items related to knowledge, data gathering, and data processing. We found evidence of validity in the program directors’ ratings of residents. The lowest ratings were found on items related to cost considerations in medicine, which is consistent with published reports of young physicians’ deficiencies in this area.23 Although the items on the form were arranged randomly, the correlations between conceptually related items were higher than correlations between unrelated items. Overall, the mean ratings for residents at advanced levels of training were higher than were those of younger trainees.
We also investigated the degree to which the program directors’ ratings reflected the multidimensional nature of clinical competence. While we did not expect to identify six factors because of the overlap in definition across the six general competencies and the relative brevity of the form, we expected to find more than two independent factors. Although several studies have reported that two large factors related to clinical performance and personal attributes account for about three-fourths of the variance in ratings of physicians,3,12 other studies have yielded small third and fourth factors associated with systems-based care16 and patient satisfaction.29 A more recent study of ratings for 382 first-year residents on a 24-item form yielded two major factors entitled “interpersonal communication” and “clinical skills,” which accounted for 78% of variance.33
Our analysis identified two large factors (77% of variance), which were described as “interpersonal communication skills and professionalism” (Factor I) and “medical knowledge, patient care and systems-based care” (Factor II) using the taxonomy of the ACGME competencies. The items related to practice improvement were distributed across the two factors. For example, we found higher loadings for “willingness to accept feedback” and “effectiveness in teaching medical students” on Factor I, while “commitment to quality assurance and improvement” and “adaptation to new technology” had higher loadings on Factor II. Although the other studies cited earlier have concurred in the identification of only two factors, our analysis differed in that our “interpersonal skills” factor (Factor I) was much larger than our “clinical performance” factor (Factor II); Factor II was actually quite small. We account for this by noting that our global rating form was designed to weigh all six competencies equally. Because four of the six competencies are not specifically related to clinical care, this approach gives more weight to skills traditionally regarded as “nonclinical” (e.g., communication skills, ability to work on a team, effectiveness as a consultant, self-directed learning skills). This would account for the “interpersonal communication skills and professionalism” factor being larger than the “medical knowledge, patient care, and systems-based care” factor.
Although the concept that clinical competence comprises six distinct competencies is attractive, our results seem to show that program directors rate residents in only two dimensions. Our research indicates that it is possible to measure only two distinct “competencies” with a carefully designed global rating form of reasonable length; it may be difficult, if not impossible, to separate out six distinct areas of residents’ performance as defined by the ACGME. Indeed, any attempt to do so may well be hampered by inherent artificiality. Generally, program directors in our study did not seem to distinguish specific proficiencies in systems-based practice and practice-based learning and improvement from those in patient care or medical knowledge; nor did they seem to distinguish between specific proficiencies in professionalism and those in interpersonal skills and communications. Indeed, this is intuitively sensible; it is difficult to imagine a resident who is a consummate professional but not an excellent communicator.
When we first developed the specific items in the global rating form to describe the knowledge, skills, and attitudes, we noted overlap in the implicit meaning and explicit definitions of the six ACGME competencies, and found it challenging to write items that would correspond directly to single competencies. For example, “competence in taking history from patient and family” would require competency in medical knowledge, patient care, interpersonal and communication skills, and professionalism. This inherent overlap may well account for much of the evaluators’ inability to distinguish between the competencies.
Charles Bosk, in Forgive and Remember: Managing Medical Failure, describes an observational, qualitative study of evaluation in a surgical residency.34 In this description of the dynamics of attending–resident interaction, Bosk recognized that the faculty perceived residents’ performance in essentially two areas: cognitive–technical and moral–ethical–social. Other studies have quantitatively supported Bosk's observations, showing that raters identify the same two components (cognitive and attitudinal) of performance in medical students and residents.32,35,36 One may hypothesize that many studies of global assessment forms identify only two competencies (medical knowledge and interpersonal skills) because that is essentially how faculty, program directors, and attending physicians perceive residents’ performance. If, however, faculty were carefully trained to focus on the knowledge, behaviors, skills, and attitudes in the six general competencies, a global assessment tool might be more effective at distinguishing among them. Unfortunately, the literature on training for medical assessment is very limited, and does not indicate that training programs are particularly useful in increasing the accuracy of evaluators.37,38 However, there is an extensive business literature on training evaluators (a discussion of which is beyond the scope of this paper), which suggests that particular types of rater training programs may be effective in improving the performance of evaluators. We are now in the process of discussing how our global rating form may be used to educate faculty in recognizing the different areas of the six general competencies and how this can translate into effective methods for training our evaluators. It must be recognized that careful training of faculty in assessment techniques is a time-consuming and expensive, but vital, task. This underscores the importance of investment in faculty development in our academic institutions.
Our study represents a significant contribution to the literature on assessment in GME. Global rating forms are the most commonly used assessment instrument in GME, and are used across all specialties. Experience at our own institution, reports in the literature, and the activities of some of the specialty organizations assisting their training programs in meeting ACGME requirements indicate that one of the first approaches to assessing the six general competencies will be to revise preexisting global rating forms or develop new global rating forms that reflect the six general competencies. The simple and direct nature of this approach may lead educators to rely heavily on these forms when assessing residents’ proficiency in the six general competencies. Our research indicates that it would be wise to proceed with caution in this direction.
If there really are six distinct and independent competencies, then a carefully designed global rating form of reasonable length may be unable to distinguish among them. This is a particularly important finding. Littlefield et al.,9 in a study supported by the National Board of Medical Examiners’ Stemmler Foundation, documented the prevalence and importance of global rating forms in GME assessment, but noted the need for high response rates in order for them to provide valid data. A brief global rating form will help ensure the highest possible return rate. A much longer form might be able to distinguish among the general competencies, but would result in a decreased return rate. However, it is possible that other instruments may be better able to distinguish among the competencies.
In summary, our findings indicate that a global rating form may not be an appropriate instrument for distinguishing among the six ACGME general competencies. Furthermore, the findings of similar studies question the very premise that six distinct competencies constitute global physician competence. Although the ACGME Toolbox of Assessment Methods offers a number of alternatives that GME programs can use to assess residents’ proficiency across the six general competencies, serious challenges exist to the practical implementation of many of these tools, including limited knowledge of evaluators, their inexperience with newer alternatives such as standardized patients, OSCEs, and resident portfolios, constraints on trainee time, cost, and acceptance by the trainees. Further research is needed to describe and validate the concept of six general competencies. If these competencies are indeed validated as six distinct entities, then further research and faculty development will be needed to determine the most appropriate assessment methods.
Dr. Robert Sataloff and two anonymous reviewers provided thoughtful comments and suggestions on earlier versions of the paper. Dr. Mohammadreza Hojat provided expert consultation on the execution and interpretation of the factor analysis. This study could not have been completed without the cooperation of the 92 program directors who completed the rating forms.
1.Leach D. Competence is a habit. JAMA. 2002;287:243–4.
2.Osler W. Examinations, examiners, and examinees. Lancet. 1913;2:1047–50.
3.Ramsey PG, Wenrich MD, Carline JD, Inui TS, Larson EB, LoGerfo JP. Use of peer ratings to evaluate physician performance. JAMA. 1993;269:1655–60.
4.Ware JE Jr, Hays RD. Methods for measuring patient satisfaction with specific medical encounters. Med Care. 1988;26:393–402.
5.Boulet JR, McKinley DW, Norcini JJ, Whelan GP. Assessing the comparability of standardized patient and physician evaluations of clinical skills. Adv Health Sci Educ. 2002;7:85–97.
6.Harden RM. Assessment of clinical competence using objective structured examination. BMJ. 1975;1:447–51.
7.Issenberg SB, McGaghie WC, Hart IR, et al. Simulation technology for health care professional skills training and assessment. JAMA. 1999;282:861–6.
9.Littlefield J, Paukert J, Schoolfield J. Quality assurance data for residents’ global performance ratings. Acad Med. 2001;76:S102–S104.
10.Gray JD. Global rating scales in residency education [review]. Acad Med. 1996;71(1 suppl):S55–S63.
11.Reisdorff EJ, Hayes OW, Carlson DJ, Walker GL. Assessing the new general competencies for resident education: a model from an emergency medicine program. Acad Med. 2001;76:753–7.
12.Nasca TJ, Gonnella JS, Hojat M, et al. Conceptualization and measurement of clinical competence of residents: a brief rating form and its psychometric properties. Med Teach. 2002;24:299–303.
13.Veloski JJ, Herman MW, Gonnella JS, Zeleznik C, Kellow WF. Relationships between performance in medical school and first postgraduate year. J Med Educ. 1979;54:909–16.
14.Gonnella JS, Hojat M. Relationship between performance in medical school and postgraduate competence. J Med Educ. 1983;58:679–85.
15.Herman MW, Veloski JJ, Hojat M. Validity and importance of low ratings given to medical school graduates in noncognitive areas. J Med Educ. 1983;58:837–43.
16.Hojat M, Veloski JJ, Borenstein BD. Components of clinical competence ratings of physicians: an empirical approach. Educ Psychol Meas. 1986;46:761–9.
17.Turner BJ, Hojat M, Gonnella JS. Using ratings of resident competence to evaluate NBME examination passing standards. J Med Educ. 1987;62:572–81.
18.Hojat M, Borenstein BD, Veloski JJ. Cognitive and noncognitive factors in predicting the clinical performance of medical school graduates. J Med Educ. 1988;63:323–5.
19.Gonnella JS, Hojat M, Erdmann JB, Veloski JJ. Assessment measures in medical school, residency, and practice: the connections. Acad Med. 1993;68:S3–S106.
20.Hojat M, Glaser KM, Veloski JJ. Associations between selected psychosocial attributes and ratings of physician competence. Acad Med. 1996;71:S103–S105.
21.Xu G, Veloski JJ, Hojat M. Board certification: association with physicians’ demographics and performances during medical school and residency. Acad Med. 1998;73:1283–9.
22.Blumenthal D, Gokhale M, Campbell EG, Weissman JS. Preparedness for clinical practice: reports of graduating residents at academic health centers. JAMA. 2001;286:1027–34.
23.Halpern R, Lee MY, Boulter PR, Phillips RR. A synthesis of nine major reports on physicians’ competencies for the emerging practice environment. Acad Med. 2001;76:606–15.
24.Price PB, Taylor CW. Measurement of physician performance. J Med Educ. 1964;39:203–10.
25.Taylor CW, Lewis FG. Syntheses of multiple criteria of physician performance. J Med Educ. 1969;44:1063–9.
26.Brumback GB, Howell MA. Ratings of clinical effectiveness of employed physicians. J Appl Psychol. 1972;56:241–4.
27.Ende J. Feedback in clinical medical education. JAMA. 1983;250:777–81.
28.Epstein RM, Hundert EM. Defining and assessing professional competence. JAMA. 2002;287:226–35.
29.Arnold L, Willoughby TL, Calkins EV. Understanding the clinical performance of physicians: a factor analysis approach. J Med Educ. 1984;59:591–4.
30.Keck JW, Arnold L, Willoughby L, Calkins V. Efficacy of cognitive/non-cognitive measures in predicting resident physician performance. J Med Educ. 1979;54:759–65.
31.Dowaliby FJ, Andrews BJ. Relationship between clinical competence ratings and examination performance. J Med Educ. 1976;51:181–8.
32.Maxim BR, Dielman TE. Dimensionality, internal consistency and interrater reliability of clinical performance ratings. Med Educ. 1987;21:130–7.
33.Paolo A, Bonaminio G. Measuring outcomes of undergraduate medical education: residency directors’ ratings of first-year residents. Acad Med. 2003;78:90–5.
34.Bosk CL. Forgive and Remember: Managing Medical Failure. Chicago: University of Chicago Press, 1979.
35.Dielman TE, Hull AL, Davis WK. Psychometric properties of clinical performance ratings. Eval Health Prof. 1986;3:103–17.
36.Verhulst SJ, Colliver JA, Paiva REA, Williams RG. A factor analysis study of performance of first-year residents. J Med Educ. 1986;61:132–4.
37.Noel GL, Herbers JFJ, Caplow MP, Cooper GS, Pangaro L, Harvey J. How well do internal faculty members evaluate the clinical skills of residents? Ann Intern Med. 1992;117:757–65.
38.Newble DI, Hoare J, Sheldrake PF. The selection and training of examiners for clinical examinations. Med Educ. 1980;14:345–9.