A Multisite, Multistakeholder Validation of the Accreditation Council for Graduate Medical Education Competencies

Smith, C. Scott MD; Morris, Magdalena RN, MSN; Francovich, Chris EdD; Tivis, Rick MPH; Bush, Roger MD; Sanders, Shelley Schoepflin MD; Graham, Jeremy DO; Niven, Alex MD; Kai, Mari MD; Knight, Christopher MD; Hardman, Joseph MD; Caverzagie, Kelly MD; Iobst, William MD; for the Pacific Northwest Consortium for Outcomes in Residency Education

doi: 10.1097/ACM.0b013e3182951efc
Research Reports

Purpose The Accreditation Council for Graduate Medical Education’s (ACGME’s) six-competency framework has not been validated across multiple stakeholders and sites. The objective of this study was to perform a multisite validation with five stakeholder groups.

Method This was a cross-sectional, observational study carried out from October to December 2011 in the continuity clinics of eight internal medicine residency programs in the Pacific Northwest, including a VA hospital, two academic medical centers, a military medical center, and four private hospitals. The authors performed a cultural consensus analysis (CCA) and a convergent-discriminant analysis using previously developed statements based on internal medicine milestones related to the six competencies. At each training site, ten participants were included from each of five stakeholder groups: patients, nurses, residents, faculty members, and administrators (total: 400 participants).

Results Moderate to high agreement and coherence for all groups were observed (CCA eigenvalue ratios ranging from 2.16 to 3.20); however, large differences in ranking order were seen between groups in four of the CCA statements, which may suggest between-group tension in these areas. Analyses revealed excellent construct validity (Zcontrast score of 5.323, P < .0001) for the six-competency framework. Average Spearman correlation between same-node statements was 0.012, and between different-node statements it was –0.096.

Conclusions The ACGME’s six-competency framework has reasonable face and construct validity across multiple stakeholders and sites. Stakeholders appear to share a single mental model of competence in this learning environment. Data patterns suggest possible improvements to the competency-milestone framework.

Dr. Smith is director, Boise VA Center of Excellence in Primary Care Education, Boise, Idaho, and professor of medicine, University of Washington, Seattle, Washington.

Ms. Morris is nursing faculty, Carrington College, Boise, Idaho.

Dr. Francovich is associate professor of leadership studies, Gonzaga University, Spokane, Washington.

Mr. Tivis is statistician, Boise VA Center of Excellence in Primary Care Education, Boise, Idaho.

Dr. Bush is attending physician, Virginia Mason Medical Center, Seattle, Washington.

Dr. Schoepflin Sanders is clinical faculty instructor, Internal Medicine Residency, Providence St. Vincent Medical Center, Portland, Oregon.

Dr. Graham is faculty, Providence Internal Medicine Residency Spokane, Spokane, Washington.

Dr. Niven is clinical associate professor of medicine, University of Washington, Seattle, Washington, and internal medicine program director, Madigan Healthcare System, Tacoma, Washington.

Dr. Kai is affiliate associate professor of medicine, Oregon Health & Science University, and faculty, General Internal Medicine, Providence Portland Medical Center, Portland, Oregon.

Dr. Knight is associate professor of medicine, University of Washington, Seattle, Washington.

Dr. Hardman is assistant professor of medicine, Oregon Health & Science University, Portland, Oregon.

Dr. Caverzagie is associate vice chair for quality and physician competence, Internal Medicine, University of Nebraska Medical Center, Omaha, Nebraska.

Dr. Iobst is vice president, Academic Affairs, American Board of Internal Medicine, Philadelphia, Pennsylvania.

Correspondence should be addressed to Dr. Smith, Boise VA Center of Excellence in Primary Care Education, VA Medical Center (111), 500 W. Fort St., Boise, ID 83702; telephone: (208) 422-1325; fax: (208) 422-1319; e-mail:

The Accreditation Council for Graduate Medical Education (ACGME) unveiled the Outcome Project in September 1999.1 By July 2002, all graduate medical education programs in the United States began a transition from process measures to six general outcome measures (i.e., the six core competencies).2 The development of this framework began with expert consensus, comprehensive literature review, focus groups, and stakeholder commentary. The ACGME framework now also directly influences undergraduate medical education,3 maintenance of certification,4 and hospital privileging/Joint Commission accreditation.5 However, soliciting stakeholders’ reactions to an existing proposal is not the same as including stakeholders in the initial brainstorming phase of development. Stakeholders’ feedback could have been subject to “errors of omission,” that is, forgetting to include items of importance from their perspective that were not triggered by simply reviewing the list of proposed competencies. It is critical, then, to validate the influential ACGME model with a broad group of stakeholders across several settings.

Cultural consensus analysis

Cultural consensus analysis (CCA) is a standard anthropological technique that determines whether and to what degree groups hold shared knowledge, and whether there are conflicting preferences and values between groups.6 CCA can be performed as a series of true/false statements, multiple-choice statements, or a forced-choice ranking of statements (our method). Identifying the correct statements to use in CCA requires significant qualitative data collection and analysis, such as ethnographic observation and/or focus groups. Ideally, forced-choice CCA includes 12 to 20 statements. Using fewer statements does not require participants to make enough value judgments, while using more statements becomes too complex for participants to accurately sort them.

Milestones and the milestone version of CCA

The statements we used in our CCA study were drawn from previous work. In 2007, to guide internal medicine training programs in objectively documenting the six general ACGME competencies, a 33-member task force was convened to identify a series of developmental milestones based on the Dreyfus7 model of skill acquisition to be used in competency-based assessment of trainees. This group consisted of program directors, experts in evaluation and quality, and representatives of the internal medicine stakeholder organizations. They identified 142 milestones and 38 subcompetencies that could be used to standardize assessment of the ACGME competencies in internal medicine programs nationally.8

These milestones provided clarity for the competency-based assessment process, but individual assessment of all milestones proved unwieldy and frequently led to detailed checklists as the primary assessment method. Faculty called for a smaller set of measures focused on key elements of development. This smaller set could be used to validate the competency model, guide the development of focused assessment tools, and determine whether clinic stakeholders shared a single mental model of trainee competence.

A seven-person group consisting of members of the original milestones committee, national internal medicine organizations, and experts in CCA was convened in 2011 to create this focused set. First, they converted milestones into active statements (active statements are required for the CCA process). For instance, “Access medical information resources to answer critical questions and support decision making,” became “Find and analyze new studies to help patient care.” Second, they iteratively summarized, prioritized, and combined milestones. This resulted in some compound statements (e.g., “think of cost and risk when making decisions”). Using a test–retest technique, they assessed the sensitivity to framing (e.g., reversing “cost” and “risk”) in a small cohort and found no differences in ranking. Finally, statements were simplified to a Flesch–Kincaid reading level of eighth grade.9 This milestones version of CCA (M-CCA) consists of a set of 12 cards, each with a statement describing a milestone, such as “Keep timely, complete, and clear chart notes” (a statement related to the Communication competency). The set includes two statements per competency. Each competency generally has two different types of M-CCA statements, one aimed at immediate patient care tasks and one reflecting a broader aspirational value.
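The reading-level step can be illustrated with the standard Flesch–Kincaid grade formula. This is a minimal sketch and not the tool the group used: the function name is hypothetical, and the word, sentence, and syllable counts for the example card were tallied by hand for illustration.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level computed from raw text counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# "Keep timely, complete, and clear chart notes":
# hand-counted as 7 words, 1 sentence, 9 syllables (an assumption of this sketch).
example_grade = flesch_kincaid_grade(words=7, sentences=1, syllables=9)
print(round(example_grade, 2))
```

A statement scoring at or below 8 on this scale would meet the eighth-grade target described above. Very short statements tend to score low, so a single card is only a rough check.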

The purpose of the current study was to assess the construct validity of the ACGME six-competency model. In a previous study, CCA was used to statistically validate the underlying conceptual model on which the statements were based.10 In the analysis of participants’ rankings, statements derived from a single node in a model (in this case, the two milestone statements within a single competency) should correlate much more highly with each other than with any of the other statements across the entire data set, and this should be true for all nodal pairs if the model is valid. We used the M-CCA statements and this technique to validate the ACGME competencies with patients, clinic nurses, internal medicine residents, internal medicine faculty members, and administrators across eight locations. We chose these five groups because, in our previous studies, they were the groups most often felt to affect the resident continuity clinic.

This study was conducted at eight internal medicine continuity clinics in the Pacific Northwest including a VA hospital, two academic medical centers, a military medical center, and four private hospitals between October and December 2011. The study received IRB approval at all eight sites. One of three experienced CCA investigators (C.S.S., M.M., C.F.) approached a convenience sample of potential participants at each site in a standardized way and in a semiprivate area. No participants were preidentified. The investigator asked the participants if they would be willing to hear more about an anonymous study that required about five minutes. If a participant agreed, the investigator explained that this was a study to better understand resident learning in clinic and that it was important to get everyone’s viewpoint. The investigator said that participants would be asked to review a deck of 12 cards, each with a statement about something that might happen during a clinic visit, and to sort the cards by order of importance to them. We continued quota-based sampling until we had collected data from 10 patients, 10 clinic nurses (MA or BSN), 10 internal medicine residents, 10 internal medicine faculty, and 10 administrators (from clerks to the CEO) at each site, for a total of 400 participants. The sorting order, site, and group were the only data recorded on a standardized sheet.

We conducted the analysis in two ways. First, we performed a CCA on each of the five participant groups and on the entire data set using UCINET Software (Analytic Technologies, Lexington, Kentucky). Consensus analysis measures the coherence and strength of a shared mental model within and between proposed groups. To do this, an N × N matrix (where N is the number of participants) is created, and each element is filled with the proportion of statements that the corresponding pair of participants ranked identically. Participants are assumed to have an unknown level of “competence,” defined as the proportion of their answers that would align with their (also unknown) group norm. Various values of competence are explored for each individual until a “least squares” solution between predicted and observed competence is identified. The “correct” order of the statements (also unknown to us), which would align that participant with his or her cultural partners, is then calculated a posteriori using Bayes’11 theorem, which identifies the preferred order of statements for the group that must hold true given each individual’s statement order and competence. The standard for assuming a single-factor structure (i.e., a shared set of values and beliefs within the proposed group, such that the group has a single cultural viewpoint of the data) is that there are no negative values for individual competence and that the eigenvalue of the first factor is “several times” that of the next factor (a ratio usually set at ≥3).6 Eigenvalue ratios between two and three are considered moderate evidence for a single-factor solution.
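The agreement matrix and eigenvalue ratio described above can be sketched in a few lines. This is an illustrative simplification, not the UCINET procedure: it takes the two largest eigenvalues of the raw agreement matrix by power iteration rather than fitting competence scores, and the function names and toy rankings are hypothetical.

```python
import math


def agreement_matrix(rankings):
    """rankings[i][k] is the rank participant i gave statement k.
    Each cell holds the proportion of statements a pair ranked identically."""
    count = len(rankings[0])
    return [[sum(a == b for a, b in zip(row_i, row_j)) / count
             for row_j in rankings] for row_i in rankings]


def _mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def top_eigenvalues(matrix, count=2, iterations=500):
    """Largest eigenvalues of a symmetric positive semidefinite matrix,
    found by power iteration with deflation (a stand-in for factor analysis)."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    values = []
    for _ in range(count):
        vec = [1.0 + 0.01 * i for i in range(n)]  # fixed, reproducible start
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iterations):
            nxt = _mat_vec(work, vec)
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        lam = sum(v * w for v, w in zip(vec, _mat_vec(work, vec)))
        values.append(lam)
        # Deflate: remove the component just found before the next pass.
        for r in range(n):
            for c in range(n):
                work[r][c] -= lam * vec[r] * vec[c]
    return values


# Toy data (hypothetical): four participants ranking four statements.
toy_rankings = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 4, 3],
    [4, 3, 2, 1],
]
first, second = top_eigenvalues(agreement_matrix(toy_rankings))
eigenvalue_ratio = first / second
print(round(eigenvalue_ratio, 2))
```

Under the thresholds given above, a ratio of 3 or more would be read as strong evidence of a single shared viewpoint, and a ratio between 2 and 3 as moderate evidence.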

For the second analysis, we assumed that the two statements from the same competency would have a relatively high correlation (convergence) and that statements from different competencies would have a low correlation (discrimination) in rankings across all sites and all groups. We assessed convergent validity by calculating the Spearman correlation coefficient for the rankings of all M-CCA statement pairs from the same competency, and discriminant validity by calculating the Spearman correlation coefficient for the rankings of all M-CCA statement pairs from different competencies. We performed both analyses on the entire data set (from all groups and sites). We then used these convergent and discriminant correlations to calculate a Zcontrast score using the methods of Westen and Rosenthal12 in order to quantify the construct validity of the six-competency model.
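The same-node and different-node averages can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the toy rankings, and the two-competency labeling are hypothetical, and the Zcontrast step of Westen and Rosenthal is not reproduced here.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks from 1 to n, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def convergent_discriminant(rankings, node_of):
    """rankings[i][k]: rank participant i gave statement k.
    node_of[k]: competency label of statement k.
    Returns (mean same-node rho, mean different-node rho)."""
    columns = list(zip(*rankings))  # one column of ranks per statement
    same, different = [], []
    for a, b in combinations(range(len(columns)), 2):
        rho = spearman(columns[a], columns[b])
        (same if node_of[a] == node_of[b] else different).append(rho)
    return sum(same) / len(same), sum(different) / len(different)


# Toy data (hypothetical): four participants, two statements per competency.
toy_rankings = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [3, 4, 1, 2],
    [4, 3, 2, 1],
]
toy_nodes = ["A", "A", "B", "B"]
mean_same, mean_different = convergent_discriminant(toy_rankings, toy_nodes)
print(mean_same, mean_different)
```

A valid model should show a same-node average clearly above the different-node average; the size of that gap is what the contrast score then tests.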

Table 1 shows the CCA eigenvalue ratios and statement preference orders for each stakeholder group and for the data set as a whole. These values demonstrate that every group achieved a moderate to high likelihood of a single-factor solution, which suggests reasonable face validity for the competency framework. Individual rankings show that there may be some tension between groups (due to large ranking differences) across all sites with regard to the following four M-CCA statements:

Table 1

  1. Care for patients with different diseases in clinic, nursing home, and hospital.
  2. Know how to read basic labs and X-rays, and use medications.
  3. Keep timely, complete, and clear chart notes.
  4. Show respect for society and the medical profession.

The average Spearman correlation between same-node M-CCA statements was 0.012 (SD 0.155) and between different-node M-CCA statements was –0.096 (SD 0.136) (see Table 1 for all Spearman rankings). The convergent-discriminant Zcontrast score for the model was 5.323 (P < .0001), suggesting extremely good construct validity.

The moderate to high likelihood of a single-factor solution for all groups in CCA combined with the high Zcontrast score in convergent-discriminant analysis suggests that the competency framework has moderate to high face and construct validity across the five stakeholder groups tested in internal medicine teaching clinics. This means that these groups have a shared mental model of competence in this environment and the framework is a suitable measure from their perspective. However, it does not necessarily follow that this is the best framework for evaluation, because adding new competencies (as proposed by others)13 or changing a current competency’s focus and specificity might improve these data.

CCA preference orders revealed common tensions across resident continuity clinics. Statements creating tension included “Keep timely, complete, and clear chart notes” (more important to patients than others) and “Show respect for society and the medical profession” (more important to nurses and administrators than others). These value/preference differences may cause recurring problems for continuity clinics. For instance, in a prior study there was a large difference in the perceived value of the electronic medical record between administrators (enthusiastic) and faculty (unenthusiastic) at one site. This discrepancy was associated with the biggest problem identified by an independent interdisciplinary focus group at the institution: “Tension between faculty and administration over the computerized medical record.”14 In our study, differences between participant groups’ rankings of “Care for patients with different diseases in clinic, nursing home, and hospital” and “Know how to read basic labs and X-rays, and use medications” suggest that the training aspect of continuity clinic may be of lower value to patients, nurses, or administrators than it is to faculty and residents.

It is interesting to note the rank-order pattern between M-CCA statements from the same competency. Each competency generally had two different types of M-CCA statements, one which was aimed at immediate patient care tasks and one which reflected a broader aspirational value. At times this led to large ranking differences that were consistent across groups, such as the rankings for Professionalism and Systems-Based Practice statements (see Table 1). This may reflect a general bias toward concrete and personal wording over abstract and general wording. These findings suggest that refinement of the competency and milestone framework may improve evaluative validity even further.

There are some limitations to this study. It took place in specific resident continuity clinics, and because context is extremely important for cultural studies, the results may not be relevant to inpatient or subspecialty experiences or generalizable to other sites or disciplines beyond internal medicine. M-CCA should be tested in other contexts. Participants often found it difficult to sort the statements because so many were felt to be of equal importance, and thus preference differences may be exaggerated. Also, the results are very dependent on the extent to which the M-CCA statements truly reflect the competency and milestone framework. The process of grouping and simplifying the 142 milestones into the M-CCA statements may have been flawed. The M-CCA statements, particularly the compound statements, should be explored and refined in further studies.

The six-competency framework appears to have reasonable face and construct validity across multiple stakeholder groups at a wide variety of internal medicine training programs. These stakeholders have a shared mental model of continuity clinic. Beginning with a shared mental model significantly influences the learning environment, and the learning environment is a major determinant of safety, quality, and educational effectiveness. Thus, it appears that the broad influence of the ACGME competencies is warranted. It also appears that, whenever possible, milestones should be specified as concrete direct patient care interactions and not as aspirational values.

Acknowledgments: Lieutenant Colonel Patricia Short, MD, helped with study preparation and manuscript review.

Funding/Support: The Pacific Northwest Consortium for Outcomes in Residency Education is supported in part by the American Board of Internal Medicine.

Other disclosures: None.

Ethical approval: The following IRBs reviewed and approved this study: VA Puget Sound HCS (for the Boise VA as coordinating center and study site), University of Washington, Oregon Health and Science University, Benaroya Research Institute (Virginia Mason), Madigan Healthcare System, IRB Spokane, and Providence Health Services (includes St. Vincent and Portland).

1. Swing S. ACGME launches outcome assessment project. JAMA. 1998;279:1492
2. Batalden P, Leach D, Swing S, Dreyfus H, Dreyfus S. General competencies and accreditation in graduate medical education. Health Aff (Millwood). 2002;21:103–111
3. Mooney CJ, Lurie SJ, Lyness JM, Lambert DR, Guzick DS. Development of an audit method to assess the prevalence of the ACGME’s general competencies in an undergraduate medical education curriculum. Teach Learn Med. 2010;22:257–261
4. Heffron MG, Simpson D, Kochar MS. Competency-based physician education, recertification, and licensure. WMJ. 2007;106:215–218
5. Catalano EW Jr, Ruby SG, Talbert ML, Knapman DG; Members of Practice Management Committee, College of American Pathologists. College of American Pathologists considerations for the delineation of pathology clinical privileges. Arch Pathol Lab Med. 2009;133:613–618
6. Romney AK, Weller SC, Batchelder WH. Culture as consensus: A theory of culture as informant accuracy. Am Anthropol. 1986;88:313–338
7. Dreyfus SE, Dreyfus HL. A Five-Stage Model of the Mental Activities Involved in Directed Skill Acquisition. Air Force Office of Scientific Research Under Contract F49620-79-C-0063. Berkeley, Calif: University of California; 1980
8. Green ML, Aagaard EM, Caverzagie KJ, et al. Charting the road to competence: Developing milestones for internal medicine residency training. J Grad Med Educ. 2009;1:5–20
9. Smith CS, Hill W, Francovich C, et al. Developing a cultural consensus analysis based on the internal medicine milestones (M-CCA). J Grad Med Educ. 2011;3:246–248
10. Smith CS, Morris M, Hill W, Francovich C, Christiano J. Developing and validating a conceptual model of recurring problems in teaching clinic. Adv Health Sci Educ Theory Pract. 2006;11:279–288
11. Lee PM. Bayesian Statistics: An Introduction. 3rd ed. Hoboken, NJ: Wiley; 2004
12. Westen D, Rosenthal R. Quantifying construct validity: Two simple measures. J Pers Soc Psychol. 2003;84:608–618
13. Weinberger SE. Providing high-value, cost-conscious care: A critical seventh general competency for physicians. Ann Intern Med. 2011;155:386–388
14. Smith CS, Morris M, Hill W, et al. Testing the exportability of a tool for detecting operational problems in VA teaching clinics. J Gen Intern Med. 2006;21:152–157
© 2013 by the Association of American Medical Colleges