Research Reports

National Internal Medicine Milestone Ratings: Validity Evidence From Longitudinal Three-Year Follow-up

Hauer, Karen E. MD, PhD; Vandergrift, Jonathan MS; Lipner, Rebecca S. PhD; Holmboe, Eric S. MD; Hood, Sarah; McDonald, Furman S. MD, MPH

doi: 10.1097/ACM.0000000000002234

Introduction

Competency-based medical education (CBME) defines expected outcomes of training; milestones define the steps in development of the competencies (i.e., abilities) needed for unsupervised practice.1,2 The Accreditation Council for Graduate Medical Education (ACGME) and American Board of Internal Medicine (ABIM) have defined 22 subcompetencies with associated milestones for the six ACGME/American Board of Medical Specialties core competencies for internal medicine (IM) residents.3 Program directors working with clinical competency committees first used the IM milestones to report to the ACGME and ABIM on residents’ performance in the 2013–2014 academic year. Milestone ratings for the first cross-sectional cohort of IM residents were higher for postgraduate year (PGY) 3 residents than for residents earlier in training.4 Similar increases in ratings for higher PGY levels using milestones have been shown in cross sections of other specialties.5,6 However, to our knowledge, the developmental trajectory of individual residents over time, as characterized by longitudinal milestone ratings, has not yet been described.7


The complete set of IM reporting milestones is intended to fulfill a core premise of CBME by charting the course toward expected outcomes of training.8,9 For residents to accomplish these milestones, they need opportunities to engage in, and be assessed performing, the activities described in each subcompetency milestone.10,11 Initial experience with reporting milestones showed that some subcompetencies were “not assessable” for some residents, meaning that the program director had insufficient information to rate a resident’s performance on that subcompetency. In the first year of IM milestones reporting, 2,889 residents (13.3% of the population of 21,774 residents in the United States) had at least one subcompetency rated as “not assessable.”4 PGY1 residents were most likely to have “not assessable” ratings, although these ratings occurred for all three PGYs. These “not assessable” ratings constitute a threat to the validity of the milestones because gaps in scores limit the inferences that can be made about performance.12

The validity of milestones to measure residents’ ability to provide safe, high-quality, patient-centered care is of high interest to medical educators, residents themselves, and the public.13,14 Longitudinal milestones reporting for individual residents can strengthen evidence of the validity of the milestones by demonstrating when during training and for which milestones residents show gains in their learning. Across resident cohorts, comparisons of milestone ratings for the same PGY could suggest changes in program directors’ understanding of the milestones framework and each subcompetency milestone,15 or differences among trainee cohorts. At the program level, program directors working with clinical competency committees can respond to “not assessable” subcompetency milestones by adding to or modifying curricula. These findings would add evidence that milestones influence curriculum development. Individual residents whose milestone ratings fall below expectations or below those of their peers can be offered extra support. For example, because medical knowledge milestone ratings are correlated with performance on the IM certification examination,16 determining which medical knowledge subcompetency milestone ratings assigned during training most strongly predict lower certification examination scores would enable targeted remediation.

The purpose of this study was to present validity evidence for the IM milestones through three approaches: determining whether the percentage of “not assessable” milestones decreased over time for residents throughout training, and for subsequent cohorts of residents; reporting mean longitudinal subcompetency milestone ratings for individual IM residents over their three years of residency training to determine the degree to which milestone ratings characterized residents’ development of competence in the six ACGME competencies3; and examining the correlation of medical knowledge milestone ratings in each PGY with certification examination scores to determine the predictive validity of milestone ratings for certification examination outcomes. Together, these analyses contribute to the validity argument for the milestones. Using Kane’s12 validity framework, this study strengthens understanding of the extent to which milestone ratings capture evaluators’ observations of their residents (using “not assessable” as a measure of inferences about scoring), and of how milestones data can be extrapolated from residents’ performance on these ratings to their real-world performance, by examining rating trajectories and comparing ratings with another performance measure.

Method

Study design

We structured this investigation as a retrospective cohort study.

Subjects and setting

Subjects were all 36,658 PGY1 to PGY3 IM residents in U.S. ACGME-accredited IM residency programs during the 2013–2014, 2014–2015, and 2015–2016 academic years not in preliminary positions or combined training programs (e.g., medicine–pediatrics). We subsequently excluded 659 (1.8%) residents missing training data from one or more years and another 782 (2.1%) residents from programs that did not have PGY1–PGY3 residents for all three training years (2013–2014 to 2015–2016). After exclusions, 35,217 residents were included in the analysis.

Data sources

Study data consisted of demographic information and milestones reporting data for these residents from the ABIM’s administrative database for the 2013–2014, 2014–2015, and 2015–2016 academic years, and 2016 certification examination scores (attempts, pass/fail, and score from 200 to 800). Program directors submitted required milestone ratings for residents in the 22 subcompetencies (see Supplemental Digital Appendix 1, available at http://links.lww.com/ACADMED/A544), collectively called milestones reporting data, to the ABIM and ACGME at the end of 2013–2014, and semiannually in 2014–2015 and 2015–2016. This study uses end-of-year data only.

The demographic data we collected were resident characteristics (age, sex, U.S. or international nativity, U.S. or international medical school, osteopathic vs. allopathic training) and program characteristics: program size (number of residents), program certification examination pass rates, program director tenure, program type (university-based, community-based university affiliated, community-based, or military-based residencies), Department of Health and Human Services (HHS) geographic region, and county type (urban or rural).17,18 Milestone ratings included ratings in each of the 22 subcompetencies. Subcompetency milestone ratings use a five-level rating scale with narrative descriptions (milestones) for each item (ratings are at or between levels, for nine possible ratings: 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5). Critical deficiencies are indicated by ratings of 1 to 1.5; readiness for unsupervised practice is indicated by ratings of 4 to 5.

The University of California, San Francisco Institutional Review Board approved the study as exempt.

Analyses

We took two general approaches to examining how milestones were applied over time. First, we used repeated cross sections to examine how milestones were being used within PGYs across training years (e.g., how ratings for PGY1 residents in 2013–2014 differed from those for PGY1 residents in 2014–2015 and 2015–2016). Second, we limited the analysis to residents who had milestone ratings for all three PGYs—that is, PGY1 residents in 2013–2014 who typically completed their PGY3 training and took the certification exam in 2016. This analysis allowed evaluation of how milestone ratings changed for one cohort of residents through training and how medical knowledge milestone ratings from each PGY related to performance on the ABIM’s IM certification examination.

Milestone ratings within cross sections of PGYs across training years.

We examined mean milestone ratings across training years within a PGY to identify any general trends toward a stricter or more lenient application of milestone ratings by clinical competency committees. This included examining each subcompetency separately as well as the average of subcompetency ratings within a competency. Residents with one or more “not assessable” ratings for a subcompetency were excluded from the summary for that competency. We tested for trends in mean competency ratings across training years within each PGY using linear regression, applying the false-discovery rate method to adjust P values for multiple hypothesis testing.18

We also examined whether there were any changes in the frequency with which residents were assigned “not assessable” milestone ratings across training years. A program director assigns a “not assessable” milestone rating when there is not enough information to rate the resident on this milestone; “not assessable” ratings are not missing data. We tested for trends in the use of “not assessable” ratings across training years within PGYs using logistic regression, applying the false-discovery rate method to adjust P values for multiple hypothesis testing.18
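
To make the trend testing in the two preceding paragraphs concrete, the sketch below shows one way these models could be fit in Python with statsmodels rather than the SAS and Stata software the study actually used. The data frame and its column names (mean_rating, not_assessable, training_year coded numerically, competency, program_id) are hypothetical placeholders, not the study's data dictionary.

```python
# Illustrative sketch only; column names (mean_rating, not_assessable,
# training_year, competency, program_id) are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

def trend_pvalues(df_pgy: pd.DataFrame) -> pd.Series:
    """P values for trends across training years within a single PGY.

    Assumes training_year is coded numerically (e.g., 0, 1, 2 for 2013-2014
    through 2015-2016) and one row per resident-competency observation.
    """
    pvals = {}
    for competency, grp in df_pgy.groupby("competency"):
        cluster = {"groups": grp["program_id"]}  # Huber-White, program level
        # Linear trend in mean competency ratings across training years.
        lin = smf.ols("mean_rating ~ training_year", data=grp).fit(
            cov_type="cluster", cov_kwds=cluster
        )
        # Logistic trend in the odds of any "not assessable" rating.
        log = smf.logit("not_assessable ~ training_year", data=grp).fit(
            disp=0, cov_type="cluster", cov_kwds=cluster
        )
        pvals[(competency, "mean_rating_trend")] = lin.pvalues["training_year"]
        pvals[(competency, "not_assessable_trend")] = log.pvalues["training_year"]
    return pd.Series(pvals)

def fdr_adjust(pvals: pd.Series, alpha: float = 0.05) -> pd.DataFrame:
    """Benjamini-Hochberg false-discovery-rate adjustment across the tests."""
    reject, p_adj, _, _ = multipletests(pvals.values, alpha=alpha, method="fdr_bh")
    return pd.DataFrame(
        {"p_raw": pvals.values, "p_fdr": p_adj, "reject": reject}, index=pvals.index
    )
```

The Benjamini–Hochberg step treats the set of trend tests within a PGY as one family, mirroring the multiple-comparison adjustment described above.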

Finally, we examined whether there were any differences in mean competency ratings or in the application of “not assessable” ratings by program characteristics. The program characteristics we examined were program size, program certification examination pass rates, program director tenure, program type, HHS region, and urban-rural county type. Significant differences across levels within a PGY were tested by comparing the level against the grand mean for all training years.

Longitudinal analysis of milestone ratings throughout training.

Limiting the data to only residents who started training in 2013–2014 and had three complete years of data by the end of 2015–2016, we used box and whisker plots to examine the central tendency and spread of mean ratings for all subcompetencies within each competency by training year.
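
A minimal sketch of this descriptive step, assuming a long-format data frame with hypothetical columns competency, pgy, and mean_rating (the mean of subcompetency ratings within a competency for each resident in the 2013–2014 entering cohort); the plotting choices are illustrative, not taken from the article:

```python
# Sketch only; the long-format data frame and its columns (competency, pgy,
# mean_rating) are assumptions for illustration.
import matplotlib.pyplot as plt
import pandas as pd

def plot_rating_trajectories(cohort: pd.DataFrame) -> None:
    """Box-and-whisker plots of mean competency ratings by postgraduate year."""
    competencies = sorted(cohort["competency"].unique())
    fig, axes = plt.subplots(
        1, len(competencies), figsize=(3 * len(competencies), 4), sharey=True
    )
    for ax, comp in zip(axes, competencies):
        # One box per PGY summarizes the central tendency and spread of the
        # mean of subcompetency ratings within this competency.
        cohort[cohort["competency"] == comp].boxplot(
            column="mean_rating", by="pgy", ax=ax
        )
        ax.set_title(comp)
        ax.set_xlabel("Postgraduate year")
        ax.set_ylabel("Mean milestone rating (1-5)")
    fig.suptitle("")  # suppress pandas' automatic grouped-boxplot title
    fig.tight_layout()
    plt.show()
```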

Correlation of medical knowledge milestone ratings with certification examination performance.

In three regression analyses, we evaluated the relationship between medical knowledge milestone ratings and the following: the probability of attempting the IM certification exam after completing training in 2016, using logistic regression; the probability of passing the IM certification exam in 2016 among those attempting the exam, using logistic regression; and residents’ equated scores on the IM certification exam among those taking the exam, using linear regression. We present results for the third analysis of equated scores as an absolute difference in score and as a percentile change. Separate regressions were run for the two medical knowledge subcompetencies, “MK1: Clinical knowledge” and “MK2: Knowledge of diagnostic testing and procedures.” In each regression, we controlled for milestone ratings from each PGY; resident characteristics (sex, birth nativity, U.S. or international medical school, and allopathic vs. osteopathic training); and program characteristics (program type, urbanicity, HHS region, program director tenure, three-year program certification examination pass rate, and program size). We used two thresholds for program-level certification examination pass rates: 80% (the accreditation standard) and 95% (identifying high-performing programs).

All analyses accounted for correlated errors from residents being nested within programs, as well as residents potentially receiving more than one rating over time, by applying a Huber–White cluster adjustment at the program level.19,20 We conducted all statistical analyses using SAS statistical software, version 9.3 (SAS Institute, Cary, North Carolina) and Stata statistical software, version 14 (StataCorp, College Station, Texas).
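
The examination-outcome regressions and the Huber–White cluster adjustment described above might look roughly like the following in Python with statsmodels (the study itself used SAS and Stata). Variable names such as passed_exam, equated_score, mk1_pgy1 through mk1_pgy3, and program_id are hypothetical, and the covariate list is abbreviated relative to the full set reported in the Method section.

```python
# Hedged sketch of the certification examination models; every variable name
# here is an illustrative placeholder, not the study's actual data dictionary.
import pandas as pd
import statsmodels.formula.api as smf

# Abbreviated covariate set; the study also included, e.g., HHS region and
# program director tenure as described above.
COVARIATES = "sex + us_medical_school + allopathic + program_type + program_size"

def fit_exam_models(df: pd.DataFrame):
    # Huber-White (robust) standard errors clustered at the program level,
    # accounting for residents nested within programs.
    cluster = {"groups": df["program_id"]}

    # Logistic regression: probability of passing the 2016 certification exam
    # among attempters, as a function of MK1 ratings from each PGY.
    pass_model = smf.logit(
        f"passed_exam ~ mk1_pgy1 + mk1_pgy2 + mk1_pgy3 + {COVARIATES}", data=df
    ).fit(disp=0, cov_type="cluster", cov_kwds=cluster)

    # Linear regression: equated certification exam score (200-800 scale).
    score_model = smf.ols(
        f"equated_score ~ mk1_pgy1 + mk1_pgy2 + mk1_pgy3 + {COVARIATES}", data=df
    ).fit(cov_type="cluster", cov_kwds=cluster)

    # Average marginal effects express the logistic coefficients as
    # percentage-point differences in pass rates; coefficients in both models
    # are per 1-unit rating change, so halve them for 0.5-unit increments.
    pass_margins = pass_model.get_margeff()
    return pass_model, pass_margins, score_model
```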

Results

Of the 35,217 residents in this sample, 6,963 residents who began training in 2013–2014 had three full years of milestone data (PGY1 in 2013–2014, PGY2 in 2014–2015, PGY3 in 2015–2016) and were included in the longitudinal analyses.

Milestone ratings within cross sections of PGYs across training years

There were no significant differences in milestone ratings for successive cohorts of PGY1, PGY2, or PGY3 residents over the three study years (Appendix 1). As shown in Appendixes 2 and 3, program-level analyses demonstrated some regional variations in milestone ratings. In programs with certification examination pass rates above 95%, PGY3 mean milestone ratings for all six competencies were significantly higher than in programs with lower mean examination pass rates. Program director tenure over 10 years was associated with higher PGY3 ratings for the patient care, medical knowledge, systems-based practice, and professionalism competencies (Appendixes 2 and 3).

Appendix 4 shows the percentage of residents with “not assessable” milestones over time. There was a significant trend toward a decreased percentage of residents with any “not assessable” ratings over the three years, including 1,566 (22.5%) of PGY1s in 2013–2014 versus 1,219 (16.6%) in 2015–2016 (P = .01), 865 (12.3%) of PGY2s in 2013–2014 versus 573 (8.1%) in 2015–2016, and 342 (5.1%) of PGY3s in 2013–2014 versus 177 (2.6%) in 2015–2016 (P = .04). Program-level and regional variation in “not assessable” milestone ratings is shown in Supplemental Digital Appendixes 2 and 3, available at http://links.lww.com/ACADMED/A544. There was no significant change in the number or percentage of residency programs that applied a “not assessable” rating to at least one resident across the three study years (Supplemental Digital Appendix 2, http://links.lww.com/ACADMED/A544). Similarly, no clear patterns emerged based on program region or urban-rural classification. However, as shown in Supplemental Digital Appendix 3 (http://links.lww.com/ACADMED/A544), residents from large programs (26 or more residents per year) had significantly more “not assessable” ratings than residents in smaller programs, particularly in PGY2. Programs with certification examination pass rates under 80% had fewer residents with “not assessable” ratings in 2014–2015 and 2015–2016 than did programs with higher pass rates. There was no association between program director tenure and the percentage of residents with “not assessable” ratings.

Longitudinal analysis of milestone ratings throughout training

Appendix 1 shows mean subcompetency milestone ratings for IM residents who were PGY1s in 2013–2014 and were rated again in PGY2 (in 2014–2015) and PGY3 (in 2015–2016). For these residents, mean milestone ratings across subcompetencies within a competency increased from around 3 (behaviors of an early learner or resident who is advancing) in the PGY1 year (ranging from a mean of 2.73 to 3.19 across subcompetencies) to around 3.5 (behaviors of a resident who is advancing) in the PGY2 year (ranging from a mean of 3.27 to 3.66 across subcompetencies) to around 4 (ready for unsupervised practice) in the PGY3 year (ranging from a mean of 4.00 to 4.22 across subcompetencies, P trend < .001 for all subcompetencies). Supplemental Digital Appendix 4, available at http://links.lww.com/ACADMED/A544, shows the change in milestone ratings across PGYs for the cohort entering training in 2013–2014.

Correlation of medical knowledge milestone ratings with certification examination performance

For the cohort entering PGY1 in 2013–2014, each of the two medical knowledge subcompetency milestones (MK1, MK2) rated in PGY2 and PGY3 was significantly correlated with 2016 certification examination attempt and pass rates, and MK2 was also significantly associated with PGY1 attempt rates. For each increase of 0.5 units in each medical knowledge subcompetency milestone rating, the difference in examination pass rates was 2.3% for MK1 (P < .001) and 2.1% for MK2 (P < .001) for PGY2s, and 2.5% for MK1 (P < .001) and 2.5% for MK2 (P < .001) for PGY3s (Table 1). Each of the two medical knowledge subcompetency milestones was significantly correlated with 2016 certification examination scores in all three PGYs. For each increase of 0.5 units in each medical knowledge subcompetency milestone rating, the difference in examination scores for PGY1s was 4.4 points for MK1 (P < .001) and 4.0 points for MK2 (P < .001); for PGY2s, 12.2 points for MK1 (P < .001) and 9.6 points for MK2 (P < .001); and for PGY3s, 19.5 points for MK1 (P < .001) and 19.0 points for MK2 (P < .001) (Table 1). For PGY3s, a 19-point difference in examination score (the difference associated with a 0.5-unit increase in milestone rating) corresponds with an increase from the 50th to the 59th percentile of exam performance.
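
As a rough illustration of how a score difference translates into the percentile shift quoted above, the sketch below assumes an approximately normal distribution of equated scores (an assumption made here for arithmetic convenience, not a claim from the article) and backs out the implied score standard deviation from the reported 50th-to-59th-percentile shift.

```python
# Illustrative arithmetic only; assumes an approximately normal score
# distribution, which the article does not state.
from scipy.stats import norm

score_difference = 19.0                      # points, per 0.5-unit PGY3 MK1 increase
z_shift = norm.ppf(0.59) - norm.ppf(0.50)    # about 0.23 standard deviations
implied_sd = score_difference / z_shift      # roughly 83.5 points under this assumption

# Forward check: starting from the median, a 19-point gain lands near the
# 59th percentile of the assumed distribution.
new_percentile = norm.cdf(norm.ppf(0.50) + score_difference / implied_sd)
print(round(implied_sd, 1), round(100 * new_percentile))  # prints 83.5 and 59
```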

Table 1: Adjusted Internal Medicine Certification Examination Performance, Controlling for Medical Knowledge Subcompetency Milestone Ratings From Each Postgraduate Year, From a Multifaceted National Study of Milestone Validity

Discussion

These longitudinal findings provide evidence to support the validity of milestones as a mechanism to capture the construct that is the goal of training: residents’ progressive competence to provide patient care. Our findings show that, as program directors gained experience with milestones, they less frequently applied the rating of “not assessable” to residents’ milestones. We found that milestone ratings increased for individual IM residents across their three years of training, which demonstrates how milestones can characterize a developmental trajectory. Residents’ medical knowledge milestone ratings were correlated with certification examination pass rates in PGY2 and PGY3 and with scores in all PGYs. Together, these findings support the argument that milestones are a valid method to rate resident performance and to draw conclusions about residents’ developing competence as part of a program of assessment.12,21

These findings demonstrate changes that occurred within programs as a consequence of applying milestone ratings. The decrease in “not assessable” ratings in three successive PGY cohorts suggests that programs may have benefited from gaining experience with milestones. One potential interpretation is that greater experience could translate into a better understanding of which assessment information provides evidence of performance in certain subcompetencies, or could prompt the addition of new assessment tools. Programs may also identify curricular gaps that can be remedied by adding or changing clinical assignments for residents so that they gain the opportunities described in the milestones.22 This outcome of curricular improvement is a central goal of milestones-based assessment.23 Program director tenure was not associated with differences in “not assessable” ratings, suggesting that it is the program as a system and/or curriculum that drives assessment of any given subcompetency milestone for residents, not necessarily the experience of the individual program director. This interpretation of the complexity of milestone ratings aligns with the view of CBME as a complex set of interventions involving multiple components.24

This first study of milestone ratings for individual residents across all three years of training confirms findings from prior cross-sectional data showing that milestones characterize progressive development of competence during graduate medical education.4 The practical significance of our finding of progression from a level 3 to a level 4 rating is operationalized through decisions about supervision; level 4 signifies readiness for unsupervised practice. This trajectory helps program directors working with clinical competency committees assess residents’ current progress toward the outcomes defining readiness for independent practice. Milestone ratings can also identify residents not progressing as expected.9 Additionally, because milestones make expectations explicit to both learners and their supervisors, progress on milestones can serve as a framework for feedback to residents about next steps in their development and can help them identify areas on which to focus.25

This study builds upon prior work showing that milestone ratings of medical knowledge are correlated with certification examination scores.16 Our study extends this prior finding beyond the final end-of-training evaluation to show that medical knowledge ratings in all three training years correlate with certification examination scores, suggesting the opportunity for early intervention for residents with low medical knowledge ratings. This detection method may add to the contribution of in-training examination scores in predicting residents at risk, given that faculty ratings of medical knowledge capture different and variable information compared with the in-training examination.26–28 For instance, an individualized intervention designed for emergency medicine residents at risk of certification examination failure, consisting of medical knowledge content review and examination question practice, improved certification pass rates.29 The finding that programs with higher milestone ratings for individual residents were those with higher certification examination pass rates and scores adds validity evidence for the milestones in relation to other variables (i.e., certification examination performance).13,14 Passing the certification examination is required for licensure, and IM certification and maintenance of certification examination scores have been shown to correlate with performance in practice.30,31

Our findings shed some light on how the process of assigning milestone ratings may differ for programs with different characteristics. Large programs assigned more “not assessable” ratings. It is possible that in large programs, individual supervisors or clinical competency committee members knew individual residents less well or had less experience working with them clinically to inform milestone ratings. Large programs may have been more challenged to aggregate their large amounts of performance data for clinical competency committee review.32,33 The reasons that programs with lower certification examination pass rates had fewer “not assessable” ratings are not clear. It is possible that these programs were making particular efforts to monitor resident performance closely to enhance the performance of individual residents and the program as a whole, or to avoid appearing out of compliance with milestones reporting requirements. Our findings showed some differences in milestone ratings based on geographic region, but without clear regional patterns. It is possible that observed regional differences were due to larger individual programs within regions influencing the results, particularly in smaller regions.

This study has limitations. The milestone ratings were assigned while program directors were still gaining experience with this system; the approach to assessing residents and rating milestones may evolve further over time. The correlation of milestone ratings with certification examination scores may reflect program directors’ knowledge of in-training examination scores, which could have influenced their milestone ratings. Although it is unclear from these data how well milestone ratings correlate with performance in practice, ratings of performance during training and examination scores have been correlated with eventual performance in practice.30,34–36 Increases in ratings through training may simply reflect programs’ knowledge of residents’ PGY rather than development of competence, because ratings are not typically blinded to PGY. Faculty development for teachers and clinical competency committee members can counteract this threat to validity. Data were derived from a single specialty, with only one cohort of residents who had milestone ratings throughout their residency. We were able to correlate milestone ratings with only one other available measure of performance: the certification examination. Additional evidence supporting the validity of milestones could derive from other information about residents’ performance in practice, and from understanding how program directors working with clinical competency committees make advancement decisions that represent the implications of milestone ratings for trainees.

This study demonstrates that IM program directors working with clinical competency committees assigned fewer “not assessable” ratings over time, thereby strengthening the usefulness of milestones for drawing conclusions about residents’ advancement toward unsupervised practice. Milestone ratings of resident performance throughout residents’ training demonstrated a developmental trajectory. The correlation between milestone ratings for medical knowledge and certification examination performance suggests opportunities for individualized interventions to strengthen trainees’ knowledge. Taken together, these findings provide some evidence of validity of the milestones by showing how they can inform both training programs and individual residents about areas for ongoing growth and improvement.

References

1. Frank JR, Mungroo R, Ahmad Y, Wang M, De Rossi S, Horsley T. Toward a definition of competency-based education in medicine: A systematic review of published definitions. Med Teach. 2010;32:631–637.
2. Holmboe ES, Sherbino J, Long DM, Swing SR, Frank JR. The role of assessment in competency-based medical education. Med Teach. 2010;32:676–682.
3. Iobst W, Aagaard E, Bazari H, et al. Internal medicine milestones. J Grad Med Educ. 2013;5(1 suppl 1):14–23.
4. Hauer KE, Clauser J, Lipner RS, et al. The internal medicine reporting milestones: Cross-sectional description of initial implementation in U.S. residency programs. Ann Intern Med. 2016;165:356–362.
5. Beeson MS, Holmboe ES, Korte RC, et al. Initial validity analysis of the emergency medicine milestones. Acad Emerg Med. 2015;22:838–844.
6. Li ST, Tancredi DJ, Schwartz A, et al.; Association of Pediatric Program Directors (APPD) Longitudinal Educational Assessment Research Network (LEARN) Validity of Resident Self-Assessment Group. Competent for unsupervised practice: Use of pediatric residency training milestones to assess readiness. Acad Med. 2017;92:385–393.
7. Sklar DP. Competencies, milestones, and entrustable professional activities: What they are, what they could be. Acad Med. 2015;90:395–397.
8. Caverzagie KJ, Iobst WF, Aagaard EM, et al. The internal medicine reporting milestones and the Next Accreditation System. Ann Intern Med. 2013;158:557–559.
9. Carraccio C, Englander R, Van Melle E, et al.; International Competency-Based Medical Education Collaborators. Advancing competency-based medical education: A charter for clinician–educators. Acad Med. 2016;91:645–649.
10. Green ML, Aagaard EM, Caverzagie KJ, et al. Charting the road to competence: Developmental milestones for internal medicine residency training. J Grad Med Educ. 2009;1:5–20.
11. Iobst WF, Sherbino J, Cate OT, et al. Competency-based medical education in postgraduate medical education. Med Teach. 2010;32:651–656.
12. Kane MT. Current concerns in validity theory. J Educ Meas. 2001;38:319–342.
13. Messick S. Standards of validity and the validity of standards in performance assessment. Educ Meas Issues Pract. 1995;14:5–8.
14. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: Theory and application. Am J Med. 2006;119:166.e7–166.e16.
15. Pangaro L, ten Cate O. Frameworks for learner assessment in medicine: AMEE guide no. 78. Med Teach. 2013;35:e1197–e1210.
16. Hauer KE, Vandergrift J, Hess B, et al. Correlations between ratings on the resident annual evaluation summary and the internal medicine milestones and association with ABIM certification examination scores among US internal medicine residents, 2013–2014. JAMA. 2016;316:2253–2262.
17. Office of the Assistant Secretary for Health (ASH), U.S. Department of Health & Human Services. ASH regional offices. http://www.hhs.gov/ash/about-ash/regional-offices/. Accessed February 24, 2018.
18. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57:289–300.
19. White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48:817–838.
20. Huber PJ. The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Vol 1. Berkeley, CA: University of California Press; 1967:221–233. https://books.google.com/books?hl=en&lr=&id=IC4Ku_7dBFUC&oi=fnd&pg=PA221&dq=The+behavior+of+maximum+likelihood+estimates+under+non-standard+conditions&ots=nOScDZLanQ&sig=VgbI0YjlEQXC4pKmp_2A-ADKAkA. Accessed February 24, 2018.
21. van der Vleuten CP, Schuwirth LW, Driessen EW, et al. A model for programmatic assessment fit for purpose. Med Teach. 2012;34:205–214.
22. Holmboe ES, Call S, Ficalora RD. Milestones and competency-based medical education in internal medicine. JAMA Intern Med. 2016;176:1601–1602.
23. Holmboe ES, Yamazaki K, Edgar L, et al. Reflections on the first 2 years of milestone implementation. J Grad Med Educ. 2015;7:506–511.
24. Van Melle E, Gruppen L, Holmboe ES, Flynn L, Oandasan I, Frank JR; International Competency-Based Medical Education Collaborators. Using contribution analysis to evaluate competency-based medical education programs: It’s all about rigor in thinking. Acad Med. 2017;92:752–758.
25. Tekian A, Hodges BD, Roberts TE, Schuwirth L, Norcini J. Assessing competencies using milestones along the way. Med Teach. 2015;37:399–402.
26. Hawkins RE, Sumption KF, Gaglione MM, Holmboe ES. The in-training examination in internal medicine: Resident perceptions and lack of correlation between resident scores and faculty predictions of resident performance. Am J Med. 1999;106:206–210.
27. Ryan JG, Barlas D, Pollack S. The relationship between faculty performance assessment and results on the in-training examination for residents in an emergency medicine training program. J Grad Med Educ. 2013;5:582–586.
28. Kolars JC, McDonald FS, Subhiyah RG, Edson RS. Knowledge base evaluation of medicine residents on the gastroenterology service: Implications for competency assessments by faculty. Clin Gastroenterol Hepatol. 2003;1:64–68.
29. Visconti A, Gaeta T, Cabezon M, Briggs W, Pyle M. Focused Board Intervention (FBI): A remediation program for written board preparation and the Medical Knowledge core competency. J Grad Med Educ. 2013;5:464–467.
30. Papadakis MA, Arnold GK, Blank LL, Holmboe ES, Lipner RS. Performance during internal medicine residency training and subsequent disciplinary action by state licensing boards. Ann Intern Med. 2008;148:869–876.
31. Holmboe ES, Wang Y, Meehan TP, et al. Association between maintenance of certification examination scores and quality of care for Medicare beneficiaries. Arch Intern Med. 2008;168:1396–1403.
32. Witteles RM, Verghese A. Accreditation Council for Graduate Medical Education (ACGME) milestones—Time for a revolt? JAMA Intern Med. 2016;176:1599–1600.
33. Hauer KE, Chesluk B, Iobst W, et al. Reviewing residents’ competence: A qualitative study of the role of clinical competency committees in performance assessment. Acad Med. 2015;90:1084–1092.
34. Lipner RS, Young A, Chaudhry HJ, Duhigg LM, Papadakis MA. Specialty certification status, performance ratings, and disciplinary actions of internal medicine residents. Acad Med. 2016;91:376–381.
35. Lipner RS, Hess BJ, Phillips RL Jr. Specialty board certification in the United States: Issues and evidence. J Contin Educ Health Prof. 2013;33(suppl 1):S20–S35.
36. Norcini JJ, Kimball HR, Lipner RS. Certification and specialization: Do they matter in the outcome of acute myocardial infarction? Acad Med. 2000;75:1193–1198.

Appendix 1 Mean Milestone Ratings by Postgraduate Year and Training Year for Internal Medicine Residents in Three Training Years (2013–2014, 2014–2015, 2015–2016), From a National Study of Milestone Validity

Appendix 2 Mean Milestone Ratings for Internal Medicine Residents by Program Characteristics and Postgraduate Year, 2013–2016, From a National Study of Milestone Validity

Appendix 3 Mean Milestone Ratings for Internal Medicine Residents by Program Characteristics and Postgraduate Year, 2013–2016, From a National Study of Milestone Validity

Appendix 4 Number and Percentage of Internal Medicine Residents With Subcompetency Milestone Ratings of “Not Assessable” in Three Training Years (2013–2014, 2014–2015, 2015–2016), From a National Study of Milestone Validity

Supplemental Digital Content

Copyright © 2018 by the Association of American Medical Colleges