In 1998, the Accreditation Council for Graduate Medical Education (ACGME) required that all medical training institutions have a designated institutional official (DIO) responsible for monitoring and ensuring the compliance of all ACGME-accredited graduate training programs within that institution.1 This continuously evolving role is extremely challenging in large academic institutions, which may have well over 50 separate residency and fellowship programs with hundreds, if not thousands, of residents and fellows. Although the creation of the DIO position likely improved institutional accountability and streamlined ACGME communications with institutions by channeling them through a single individual, few resources or evaluation tools were provided to help DIOs successfully fulfill their job descriptions.2 Indeed, a recent survey by Riesenberg et al3 found that DIOs come from a wide variety of backgrounds, with differing levels of the training needed to succeed in the role. Most concerning, there seems to be a great deal of overlap between DIO and program director responsibilities, creating redundancy and inefficiency of work, as well as confusion and resentment between program directors and DIOs.4
To address these concerns, in 2005 our graduate medical education committee (GMEC) developed a residency program report card with quantitative and qualitative metrics that serves several goals at once: continuously tracking program performance, establishing clear standards of residency program performance, and improving communication between program directors and the DIO.
University Hospitals Case Medical Center (UHCMC) has over 60 different ACGME-accredited residency and fellowship programs. Including unaccredited programs, there are over 90 program directors, 520 residents, and 215 fellows. Our GMEC is composed of 25 individuals, including program directors, residents, senior management, and the DIO, who serves as chairman of the committee.
Development of the Report Card
With these goals in mind, the GMEC reviewed the medical and program evaluation literature using PubMed and the Education Resources Information Center social science publication database for the factors most important in evaluating residency program quality. Search terms included program evaluation, residency evaluation, resident selection, RRC, ACGME, and report card. In addition, all past residency review committee (RRC) reports from UHCMC programs were reviewed to see which subjective and objective factors the RRCs and the ACGME considered most important when evaluating residency programs. On the basis of these results, the GMEC designed an institution-wide program-evaluation instrument with four sections: (1) quality of candidates recruited, (2) the resident educational program, (3) graduates' success, and (4) overall house officer satisfaction. These sections were felt to best represent the theoretical construct of residency program performance. Each section was then further subdivided into quantitative and qualitative metrics, chosen on the basis of our review of the literature as well as a desire to select metrics that would be specific, relevant, and measurable. The GMEC developed a scoring system in which each section is worth a total of three points, for a cumulative score out of a possible 12 points. Each metric was also standardized so that three points indicated the highest quality, or most desirable performance, for that metric, and zero points the lowest. Data for each section were supplied by the DIO, residency program directors, or residency program coordinators. It is important to note that not all information collected in the report card currently has a standardized metric associated with it, because the report card is continuously being improved and developed (Appendix 1).
Quality of candidates
A residency program's overall quality of candidates is determined by the median score across the following metrics: whether the residency program filled during the last resident match, the average USMLE Step 1 score of applicants accepted to the program, and the percentage of accepted applicants who were inducted into the Alpha Omega Alpha (AOA) honor society.
The match is the residency application system by which resident applicants are paired with residency programs that have ranked those applicants as being acceptable additions to the residency program. Applicants who go unmatched during this process, or who are not initially accepted into a residency program, enter a process appropriately termed the scramble. This process allows applicants to declare their unmatched status and gives residency programs an opportunity to interview and select these remaining resident applicants. Prior studies have demonstrated that program quality is a major determinant in a resident's decision to apply to a certain program.5
The USMLE Step 1 examination is the initial examination in a three-step process towards medical licensing in the United States. Students usually complete this examination during the second to third year of medical school, and it is heavily weighted by many different types of residency programs in the resident selection process.6–9 We use residents' Step 1 scores as opposed to Step 2 scores in our program assessment, because verified Step 2 scores are not available for all interns in each residency program. In general, several studies have demonstrated that Step 1 and Step 2 scores correlate with each other.10–12 However, Step 2 scores seem to vary depending on the time interval between completing the third-year curriculum and taking the test, with significantly lower scores noted closer to the end of the final year of medical school.13 This effect has not been described with Step 1 scores, likely because of more uniform test-administration times at the end of the second year of medical school.
Finally, the AOA honor society is the only national medical honor society in the United States. Election to AOA typically occurs in the third and fourth years of medical school and is limited to those students with the highest levels of academic achievement. AOA status is quite prestigious in the United States and is, therefore, used by many resident programs as an important factor when ranking resident applicants.14–18
Therefore, the quality of candidates in each program is rated on the report card as follows:
- For those programs with a score of 3, the residency program filled during the match, the average accepted applicant Step 1 score was ≥230, and/or ≥50% of accepted applicants were in AOA.
- For those programs with a score of 2, the residency program filled outside of the match, the average Step 1 score was 200 to 229, and/or 25% to 49% of accepted applicants were in AOA.
- For those programs with a score of 1, the residency program filled during the scramble, the average applicant Step 1 score was 175 to 199, and/or <25% of accepted applicants were in AOA.
- For those programs with a score of 0, the residency program filled during neither the match nor the scramble, and/or had an average applicant Step 1 score of <175.
A median of these metric scores is then calculated to give the section a score out of a maximum of three points. It is important to note that we do not believe the quality of resident candidates is captured exclusively by the metrics discussed above. There are many other facets of a resident candidate, including professionalism, communication skills, intrinsic motivation, and emotional intelligence. Unfortunately, there is a lack of validated evaluation instruments to measure these important qualities, so we have not incorporated them into our report card.
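To make the rubric concrete, the candidate-quality scoring above can be sketched in code. This is a minimal illustration only; the function name, the encoding of the match outcome, and the example inputs are our own assumptions, not part of the actual report card:

```python
from statistics import median

def score_candidate_quality(match_outcome, mean_step1, pct_aoa):
    """Map the three candidate-quality metrics to 0-3 points each,
    then return the median as the section score (per the rubric above).

    match_outcome: "match", "outside_match", "scramble", or "unfilled"
    mean_step1:    mean USMLE Step 1 score of accepted applicants
    pct_aoa:       percentage of accepted applicants elected to AOA
    """
    # How the program filled maps directly to 0-3 points.
    match_points = {"match": 3, "outside_match": 2,
                    "scramble": 1, "unfilled": 0}[match_outcome]

    # Step 1 bands: >=230 -> 3, 200-229 -> 2, 175-199 -> 1, <175 -> 0.
    if mean_step1 >= 230:
        step1_points = 3
    elif mean_step1 >= 200:
        step1_points = 2
    elif mean_step1 >= 175:
        step1_points = 1
    else:
        step1_points = 0

    # AOA bands: >=50% -> 3, 25-49% -> 2, <25% -> 1 (the rubric's floor).
    if pct_aoa >= 50:
        aoa_points = 3
    elif pct_aoa >= 25:
        aoa_points = 2
    else:
        aoa_points = 1

    return median([match_points, step1_points, aoa_points])
```

For example, a program that filled in the match with a mean Step 1 score of 235 and 60% of accepted applicants in AOA would receive the full three points, whereas one that filled in the scramble with a mean Step 1 score of 205 and 10% in AOA would receive one point.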
Resident educational program
The quality of each resident educational program was measured by the median of three metric scores: (1) length of ACGME accreditation, (2) total number of major citations during the program's last formal RRC site visit, and (3) the median evaluation score from the DIO's interview with the residency program director.
Each residency program is accredited by the ACGME for a set number of years before the program must reapply for accreditation. The minimum length of time of accreditation is one year, and the maximum length of time is five years. A longer period of accreditation implies a greater overall quality of the program. If a program does not achieve minimal standards of accreditation, the program can be placed on probation until it meets the necessary accreditation requirements, or a program may be suspended completely.
The second metric for evaluating the quality of a resident educational program was the number of major citations from that program's RRCs. The ACGME determines accreditation largely on the basis of the results of the RRCs that conduct site visits of each program seeking accreditation. Each RRC is specific for a given discipline within medicine and is typically composed of physicians in the particular specialty. Serious areas of deficiency in which the RRC recommends a residency program improve will be listed as major citations in that program's RRC report. Examples of major citations include a <60% first-time pass rate on the board examinations, inadequate institutional support, or inadequate evaluation of the residents and fellows within a program. Logically, residency programs with a higher-quality educational program should have few, if any, major citations, and vice versa.
The last metric for evaluating the quality of an educational program that the GMEC used was the median evaluation score from the DIO's interview with each residency program director. These interviews are referred to as one-on-ones with the DIO and typically occur once a year. During these meetings, the DIO asks individual program directors a series of questions (Appendix 2) encompassing the following areas: career preparation for graduates, faculty teaching and supervision, faculty interest in resident education, written resident evaluations, resident work environment, resident scholarly activity, fulfillment of ACGME core competencies, and monitoring of and compliance with ACGME work hour mandates. Each of these areas is scored on a rating scale from 1 to 5, where 1 = poor, 3 = average, and 5 = excellent. Minutes of the meeting are sent to the program director, and a copy is kept in the DIO's office. Although this metric is subjective, it firmly reinforces the very important role of the DIO as an intermediary between the ACGME and the residency program directors. The interview also allows the DIO to go beyond an administrative role and serve as a true mentor for program directors.
Therefore, the resident educational programs are rated on the report card as follows:
- For those with a score of 3, the program had ACGME accreditation for at least four years, received no major citations, and/or received a median DIO evaluation score ≥4.
- For those with a score of 2, the program had ACGME accreditation for two to three years, received ≤3 major citations, and/or received a median DIO evaluation score of 3.
- For those with a score of 1, the program had ACGME accreditation for less than two years, received ≥4 major citations, and/or received a median DIO evaluation score of 2.
- For those with a score of 0, the program is on probation status and/or received a median DIO evaluation score of 1.
A median of these metric scores is then calculated to give the section a score out of a maximum of three points.
Success of graduates
Overall success of graduates from each residency program is measured by the median of three metric scores: (1) board examination pass rate, (2) average in-service examination score, and (3) program attrition rate.
All graduates of a residency program must pass their respective discipline's board examination to be considered board certified in that particular field of medicine or surgery. The board pass rate therefore reflects both the residency program's educational experience and the resident's ability to retain and demonstrate cognitive knowledge on a standardized examination.19 The GMEC required each program director to submit the total number of resident graduates who took the examination and the mean first-time pass rate over the previous three years.
In addition, each discipline has an in-service examination that functions both to prepare residents for the actual board examination and to allow residents to self-assess their individual knowledge deficits. The in-service examination is scored on a percentile basis.
Finally, the last measure of resident graduate success is the number of residents who chose or were asked to leave the residency program. The GMEC felt that a large resident attrition rate could be indicative of a poor educational program or a poor residency application system that accepted residents who could not succeed in the program. Programs that minimized resident attrition added to overall resident graduate success in the short term through retention of residents and in the long term through increased overall program reputation.
Therefore, the success of graduates from each program is assessed on the report card as follows:
- For those with a score of 3, resident graduates had a board pass rate of ≥85%, the mean in-service score among senior residents was at or above the 70th percentile, and/or the residency program had <10% resident attrition.
- For those with a score of 2, resident graduates had a board pass rate of 75% to 84%, the mean in-service score among senior residents was in the 50th to 69th percentile, and/or the residency program had 11% to 20% resident attrition.
- For those with a score of 1, resident graduates had a board pass rate of 60% to 74%, the mean in-service score among senior residents was in the 20th to 49th percentile, and/or the residency program had 21% to 50% resident attrition.
- For those with a score of 0, resident graduates had a board pass rate (averaged over the last three years) of <60%, the mean in-service score among senior residents was below the 20th percentile, and/or the residency program had >50% resident attrition.
A median of these metric scores is then calculated to give the section a score out of a maximum of three points.
House officer satisfaction
To incorporate data that stem directly from residents, the GMEC also measures overall resident satisfaction using scores from the ACGME house officer satisfaction survey. This anonymous survey is distributed electronically by the ACGME every year and asks residents to rate their program, using a five-point scale where 1 = poor, 3 = average, and 5 = excellent, in the same program areas covered in the one-on-one interviews with the DIO (described above). A median score is then calculated, and three points are given for a median score >4, two points for a median score of 3 to 4, one point for a median score of 2 to 3, and zero points for a median score of <2.
Putting It All Together
It would be inappropriate for us to publish our actual completed report cards at this point, because this project is still a work in progress. In addition, publication of the true scores at this time might unnecessarily embarrass some program directors whose programs received low scores. Table 1 lists a set of hypothetical scores to illustrate how the final report card is summarized. The median score out of a possible three points for each section is listed in a spreadsheet along with the program's total score out of 12 points. This format allows the DIO to compare residency programs quickly and to identify specific programmatic strengths and weaknesses. The total scores are shared only via written communication between the DIO and that program's residency program director. The hospital's chief medical officer also reviews the data, but the card is not distributed further. However, any residency program with a report card score lower than 9 is encouraged to schedule a meeting with the DIO to receive concrete feedback and to plan appropriate steps to raise the report card score the following year.
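As a sketch of how the pieces combine, assume each section's metric scores (0 to 3 each) are already in hand: the section score is the median of its metric scores, and the program total is the sum of the four section scores out of 12. The data below are hypothetical, in the spirit of Table 1, and the variable names are ours:

```python
from statistics import median

# Hypothetical metric scores (0-3 each) for one program, grouped by section.
sections = {
    "candidate_quality":   [3, 2, 3],
    "educational_program": [3, 3, 2],
    "graduate_success":    [2, 2, 3],
    "house_officer_satisfaction": [2],  # single survey-derived metric
}

# Section score = median of that section's metric scores.
section_scores = {name: median(scores) for name, scores in sections.items()}

# Program total out of a possible 12 points.
total = sum(section_scores.values())

# Programs scoring below 9 are encouraged to meet with the DIO.
needs_dio_meeting = total < 9
```

With these hypothetical numbers, the program would total 10 of 12 points and would not trigger a follow-up meeting with the DIO.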
So far, our actual results, which are not reported in this article, have demonstrated that programs that are typically seen as the most competitive (e.g., dermatology) also have the highest program report card scores. Although this result adds to the construct validity of our evaluation instrument, the report card will likely be most useful as a means to track trends and progress among individual programs. The report card also allows cross-comparisons of residency programs or fellowships within the same department (e.g., vascular surgery compared with cardiothoracic surgery). Given that residency program cultures vary greatly even within the same institution,20 we do not use the report card to compare residency programs across disciplines (e.g., family practice compared with internal medicine).
Is Our Report Card Valid?
Validity is the evidence that either supports or contradicts the interpretation of the report card results. In the recent medical education literature, construct validity is considered the primary type of validity21 and has replaced older validity frameworks such as face validity. Constructs are theoretical concepts based on observations of human behavior, with multiple facets that must be addressed, including content, representation of study participants, and accounting for threats to construct validity.22 Residency program quality is the construct of our report card system.
The content of the report card must be considered realistic by all stakeholders, including residents, attendings, and the ACGME. Each metric used in the report card was thoughtfully selected in an evidence-based manner with careful input from all stakeholders, as discussed above. In addition, these metrics ensure representation of the various components of each residency program. The process we used to select and evaluate each metric was consistent with the guidelines outlined by the Joint Committee on Standards for Educational Evaluation.23
Validity threats to our report card system include construct underrepresentation and construct-irrelevant variance.24 Construct underrepresentation weakens the internal structure of the report card, that is, the reliability and generalizability of its results. This component of validity is the primary weakness of our report card system, because there are no previously published institution-wide residency-program-evaluation tools against which to compare our programs. However, our goal is to collect data prospectively using our report card system and to determine long-term follow-up results. Also, we hope that our system stimulates other residency programs to modify this report card for their institutions so that we can further refine this assessment instrument. Finally, another general validity threat is construct-irrelevant variance, which refers to systematic errors that can occur during the analysis of residency program data. We have endeavored to minimize these errors by using simple statistical analyses and a three-point metric scale for each component of resident program quality. To date, there is no published evidence that a broader numerical scale increases the construct validity of a residency-program-evaluation instrument.
Taking It Further
Program evaluation across multiple medical disciplines is a relatively unresearched field in the medical education literature.25 Our dimensional approach to develop a program report card is similar to others reported in the literature.26 However, unlike other program report cards in the literature,27 our program report card is GMEC generated and DIO driven instead of resident generated only. This program report card is a first step towards developing a valid and reliable program-evaluation tool that can be used for residency programs in many different disciplines, and it has several advantages and disadvantages from a DIO's point of view.
An advantage of this instrument is that it does not add further duties to the DIO's increasing list of responsibilities, because it uses data that are already collected in preparation for internal review and RRC site visits. This dependence on data that are also examined by RRCs ensures that residency programs are meeting expectations before RRC site visits, and, therefore, it may prevent major citations and probation/suspensions. Also, the program report cards allow our GMEC to set internal institutional benchmarks for which residency programs should strive. Finally, and most important, the program report card emphasizes the importance of communication between residency program directors and the DIO to improve programmatic weaknesses and build on programmatic strengths.
The main disadvantage of this instrument is the difficulty of establishing concrete construct validity. Although the process we used to develop the report card supports the validation process as discussed above, we recognize that our program report card is a surrogate for the perception of residency program quality held by each program's RRC. Unfortunately, RRCs hold widely disparate views of program quality that are not always communicated or consistent. This opacity is especially ironic in this time of public outcry for process transparency in the business, political, and accounting arenas. Future directions for our program report card are to work closely with the ACGME and/or the group on resident affairs of the Association of American Medical Colleges to begin to set national benchmark criteria for acceptable residency program performance in each medical discipline. These standards would require multicenter trials of a standardized report card to allow DIOs and program directors to compare residency programs objectively and to identify areas for improvement at the local and national levels.
2 Nelson RL. Commentary: The designated institution official: one DIO's perspective. Acad Med. 2006;81:17–19.
3 Riesenberg LA, Rosenbaum PF, Stick SL. Competencies, essential training, and resources viewed by designated institutional officials as important to the position in graduate medical education. Acad Med. 2006;81:426–431.
4 Riesenberg LA, Rosenbaum P, Stick SL. Characteristics, roles, and responsibilities of the Designated Institutional Official (DIO) position in graduate medical education. Acad Med. 2006;81:8–16.
5 Villaneuva AM, Kaye D, Abdelhak SS, Morahan PS. Comparing selection criteria of residency program directors, physicians, and employers. Acad Med. 1995;70:261–271.
6 Altmaier EM, Johnson SR, Tarico VS, Laube D. An empirical specification of residency performance dimensions. Obstet Gynecol. 1988;72:126–130.
7 Boyse TD, Patterson SK, Cohan RH, et al. Does medical school performance predict radiology resident performance? Acad Radiol. 2002;9:437–445.
8 Edwards JC, Currie ML, Wade TP, Kaminski DL. Surgery resident selection and evaluation. A critical incident study. Eval Health Prof. 1993;16:73–86.
9 Black KP, Abzug JM, Chinchilli VM. Orthopaedic in-training examination scores: a correlation with USMLE results. J Bone Joint Surg Am. 2006;88:671–676.
10 Vosti KL, Bloch DA, Jacobs CD. The relationship of clinical knowledge to months of clinical training among medical students. Acad Med. 1997;72:305–307.
11 Myles TD, Henderson RC. Medical licensure examination scores: relationship to obstetrics and gynecology examination scores. Obstet Gynecol. 2002;100:955–958.
12 Ogunyemi D, Taylor-Harris D. Factors that correlate with the U.S. Medical Licensure Examination Step-2 scores in a diverse medical student population. J Natl Med Assoc. 2005;97:1258–1262.
13 Pohl CA, Robeson MR, Veloski J. USMLE Step 2 performance and test administration date in the fourth year of medical school. Acad Med. 2004;79(10 suppl):S49–S51.
14 Erlandson EE, Calhoun JG, Barrack FM, et al. Resident selection: applicant selection criteria compared with performance. Surgery. 1982;92:270–275.
15 Kron IL, Kaiser DL, Nolan SP, Rudolf LE, Muller WH Jr, Jones RS. Can success in the surgical residency be predicted from preresidency evaluation? Ann Surg. 1985;202:694–695.
16 Fine PL, Hayward RA. Do the criteria of resident selection committees predict residents' performances? Acad Med. 1995;70:834–838.
17 Boyse TD, Patterson SK, Cohan RH, et al. Does medical school performance predict radiology resident performance? Acad Radiol. 2002;9:437–445.
18 Daly KA, Levine SC, Adams GL. Predictors for resident success in otolaryngology. J Am Coll Surg. 2006;202:649–654.
19 Norcini J. The relationship between features of residency training and ABIM certification performance. J Gen Intern Med. 1987;2:330–336.
20 Elliott RL, Juthani NV, Rubin EH, Greenfeld D, Skelton WD, Yudkowsky R. Quality in residency training: toward a broader, multidimensional definition. Acad Med. 1996;71:243–247.
21 Messick S. Validity. In: Linn RL, ed. Educational Measurement. 3rd ed. New York, NY: American Council on Education and Macmillan; 1989:13–104.
22 Downing SM. Validity: on the meaningful interpretation of assessment data. Med Educ. 2003;37:830–837.
23 The Joint Committee on Standards for Educational Evaluation. The Program Evaluation Standards: How to Assess Evaluations of Educational Programs. 2nd ed. Thousand Oaks, CA: Sage Publications, Inc; 1994.
24 Downing SM, Haladyna TM. Validity threats: overcoming interference with proposed interpretations of assessment data. Med Educ. 2004;38:327–333.
25 Bierer SB, Fishleder AJ, Dannefer E, Farrow N, Hull AL. Psychometric properties of an instrument designed to measure the educational quality of graduate training programs. Eval Health Prof. 2004;27:410–424.
26 Iverson DJ. Meritocracy in graduate medical education? Some suggestions for creating a report card. Acad Med. 1998;73:1223–1225.
27 Bellini L, Shea JA, Asch DA. A new instrument for residency program evaluation. J Gen Intern Med. 1997;12:707–710.