National Internal Medicine Milestone Ratings: Validity Evidence From Longitudinal Three-Year Follow-up

Hauer, Karen E., MD, PhD; Vandergrift, Jonathan, MS; Lipner, Rebecca S., PhD; Holmboe, Eric S., MD; Hood, Sarah; McDonald, Furman S., MD, MPH

doi: 10.1097/ACM.0000000000002234
Research Reports

Purpose To evaluate validity evidence for internal medicine milestone ratings across programs for three resident cohorts by quantifying “not assessable” ratings; reporting mean longitudinal milestone ratings for individual residents; and correlating medical knowledge ratings across training years with certification examination scores to determine predictive validity of milestone ratings for certification outcomes.

Method This retrospective study examined milestone ratings for postgraduate year (PGY) 1–3 residents in U.S. internal medicine residency programs. Data sources included milestone ratings, program characteristics, and certification examination scores.

Results Among 35,217 participants, the percentage with “not assessable” ratings decreased across years: 1,566 (22.5%) PGY1s in 2013–2014 versus 1,219 (16.6%) in 2015–2016 (P = .01), and 342 (5.1%) PGY3s in 2013–2014 versus 177 (2.6%) in 2015–2016 (P = .04). For individual residents with three years of ratings, mean milestone ratings increased from around 3 (behaviors of an early learner or advancing resident) in PGY1 (mean of 2.73 to 3.19 across subcompetencies) to around 4 (ready for unsupervised practice) in PGY3 (mean of 4.00 to 4.22 across subcompetencies; P < .001 for all subcompetencies). For each 0.5-unit increase in the two medical knowledge subcompetency ratings (MK1, MK2), the difference in certification examination scores for PGY3s was 19.5 points for MK1 (P < .001) and 19.0 points for MK2 (P < .001).
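The reported association between medical knowledge ratings and examination scores can be read as a simple per-step slope. The sketch below is only an illustration of that arithmetic: it assumes the association stays linear beyond a single 0.5-unit step, which the abstract does not state, and the function name `score_difference` is hypothetical rather than part of the study's analysis.

```python
# Points of examination-score difference per 0.5-unit rating increase,
# as reported for PGY3 residents in the Results above.
MK1_POINTS_PER_HALF_UNIT = 19.5
MK2_POINTS_PER_HALF_UNIT = 19.0


def score_difference(rating_delta, points_per_half_unit):
    """Score difference implied by a rating difference.

    Assumption (not from the abstract): the association is linear,
    so the per-0.5-unit difference scales with the rating difference.
    """
    half_unit_steps = rating_delta / 0.5
    return half_unit_steps * points_per_half_unit


if __name__ == "__main__":
    # Two residents whose MK1 ratings differ by a full unit.
    print(score_difference(1.0, MK1_POINTS_PER_HALF_UNIT))
    # Two residents whose MK2 ratings differ by a full unit.
    print(score_difference(1.0, MK2_POINTS_PER_HALF_UNIT))
```

Under that linearity assumption, a full-unit difference in MK1 rating would correspond to twice the reported 19.5-point difference; the abstract itself reports only the 0.5-unit figures.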

Conclusions These findings provide evidence of validity of the milestones by showing how training programs have applied them over time and how milestones predict other training outcomes.

K.E. Hauer is associate dean for assessment and professor, Department of Medicine, University of California at San Francisco, San Francisco, California.

J. Vandergrift is a health services researcher, American Board of Internal Medicine (ABIM), Philadelphia, Pennsylvania.

R.S. Lipner is senior vice president of assessment and research, ABIM, Philadelphia, Pennsylvania.

E.S. Holmboe is senior vice president of milestones development and evaluation, Accreditation Council for Graduate Medical Education, Chicago, Illinois.

S. Hood is director of initial certification, ABIM, Philadelphia, Pennsylvania.

F.S. McDonald is senior vice president of academic and medical affairs, ABIM, Philadelphia, Pennsylvania.

Funding/Support: None reported.

Other disclosures: R.S. Lipner, F.S. McDonald, S. Hood, and J. Vandergrift are employed by the American Board of Internal Medicine. E.S. Holmboe is employed by the Accreditation Council for Graduate Medical Education and receives royalties from Elsevier for a textbook.

Ethical approval: The University of California, San Francisco Institutional Review Board approved the study as exempt.

Supplemental digital content for this article is available at

Correspondence should be addressed to Karen E. Hauer, University of California at San Francisco, 533 Parnassus Ave., U80, Box 0710, San Francisco, CA 94143; telephone: (415) 502-5475; e-mail:

© 2018 by the Association of American Medical Colleges