Table 2 and Figure 2 show that each level of training had different subcompetencies where M/U residents’ adjusted mean milestone level performance differed from their S peers’. Results were similar when mean differences were adjusted for program, gender, and medical school (data not shown). Results from our nonparametric (rank-transformed) analysis were similar to results from our parametric (adjusted mean difference) analysis (data not shown), with the exception of three items that were no longer statistically significant: PGY1–advocacy (SBP2), PGY2–evidence-based pediatrics (MK), and PGY3–diagnostic/therapeutic decisions (PC4). For PGY1s, M/U residents had globally lower milestone levels than their S peers for both the parametric and nonparametric analyses for all subcompetencies except advocacy (SBP2). Differences in mean milestone levels ranged from 0.60 to 0.97. Subcompetencies with almost one-full-milestone-level difference between M/U and S PGY1s included organize/prioritize (PC2: 0.93; 99.92% CI: 0.54–1.32), transfer of care (PC3: 0.97; 99.92% CI: 0.60–1.34), and help-seeking (Prof4: 0.96; 99.92% CI: 0.46–1.47). For PGY2s, the largest adjusted mean difference in milestone levels was in trustworthiness (Prof5: 0.78; 99.92% CI: 0.36–1.20). For PGY3s, the largest adjusted mean milestone level differences were in ethical behavior (Prof3: 1.17; 99.92% CI: 0.26–2.08), incorporating feedback (PBLI4: 1.03; 99.92% CI: 0.25–1.81), and professionalization (Prof2: 0.96; 99.92% CI: 0.14–1.79). We found no significant difference in milestone levels for S residents in programs reporting M/U residents and programs with no M/U residents (data not shown).
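The nonparametric analysis above uses the rank transformation of Conover and Iman (reference 18): each resident’s milestone level is replaced by its rank (ties share the mean rank) before the same parametric model is refit. The following sketch of the ranking step is illustrative only — it is not the study’s code, and the data are hypothetical:

```python
# Illustrative sketch (not the study's code) of the rank-transformation
# step: raw milestone levels are replaced by 1-based average ranks,
# with tied values sharing the mean of their positions.

def rank_transform(values):
    """Return average ranks (1-based); ties receive the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Hypothetical milestone levels for four residents
print(rank_transform([2.5, 3.0, 2.5, 4.0]))  # [1.5, 3.0, 1.5, 4.0]
```

The parametric model is then applied to these ranks exactly as it was to the raw milestone levels, which is what makes the rank-transformed results directly comparable to the adjusted-mean-difference results.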
Discrimination between M/U and S residents
Figure 3 and Appendix 1 show that for different levels of training, different subcompetencies discriminated between M/U and S learners. For PGY1s, although all subcompetencies acceptably discriminated between M/U and S learners (C-statistic ≥ 0.7), three subcompetencies had outstanding discrimination (C-statistic ≥ 0.9): organize/prioritize (PC2: C-statistic: 0.91; 99.92% CI: 0.78–1.00), transfer of care (PC3: 0.90; 99.92% CI: 0.84–0.96), and diagnostic/therapeutic decisions (PC4: 0.90; 99.92% CI: 0.83–0.96) (see also Supplemental Digital Appendix 1 at http://links.lww.com/ACADMED/A457). For PGY2s, no subcompetency significantly discriminated between M/U and S learners. Trustworthiness (Prof5), which had the largest mean milestone level difference between M/U and S PGY2s, was not significantly discriminatory (C-statistic: 0.73; 99.92% CI: 0.42–1.00). For PGY3s, two subcompetencies had excellent discrimination (C-statistic ≥ 0.8): quality improvement (PBLI3: 0.82; 99.92% CI: 0.50–1.00) and advocacy (SBP2: 0.87; 99.92% CI: 0.56–1.00). The three subcompetencies with the largest mean milestone-level difference between M/U and S PGY3s were not significantly discriminatory: incorporating feedback (PBLI4: 0.73; 99.92% CI: 0.28–1.00), professionalization (Prof2: 0.70; 99.92% CI: 0.28–1.00), and ethical behavior (Prof3: 0.79; 99.92% CI: 0.21–1.00).
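The C-statistic cutoffs used above (≥ 0.7 acceptable, ≥ 0.8 excellent, ≥ 0.9 outstanding; Hosmer and Lemeshow, reference 19) can be made concrete with a minimal sketch. This is not the study’s code, and the milestone levels below are hypothetical; the C-statistic is the probability that a randomly chosen S resident has a higher milestone level than a randomly chosen M/U resident (ties counted as half), computed here via the Mann–Whitney formulation:

```python
# Illustrative sketch (not the study's code): C-statistic (ROC AUC) for how
# well one subcompetency's milestone levels separate M/U from S residents.

def c_statistic(scores, labels):
    """P(a randomly chosen S resident outscores a randomly chosen M/U
    resident), with ties counted as 0.5 (Mann-Whitney formulation).
    labels: 0 = satisfactory (S), 1 = marginal/unsatisfactory (M/U)."""
    s = [x for x, y in zip(scores, labels) if y == 0]
    mu = [x for x, y in zip(scores, labels) if y == 1]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in s for b in mu)
    return wins / (len(s) * len(mu))

# Hypothetical PGY1 milestone levels for one subcompetency
scores = [3.0, 2.5, 3.5, 4.0, 2.0, 2.5, 3.0]
labels = [0, 0, 0, 0, 1, 1, 1]  # last three residents rated M/U

auc = c_statistic(scores, labels)
print(f"C-statistic = {auc:.2f}")  # 0.83: "excellent" by the >= 0.8 cutoff
```

A C-statistic of 0.5 would mean the subcompetency separates the two groups no better than chance, which is why intervals whose lower bound approaches 0.5 (as for several PGY2 and PGY3 subcompetencies above) are not significantly discriminatory.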
Our study is the first, to our knowledge, to use milestone ratings to describe the performance of a nationally representative sample of M/U and S pediatric residents. We found fewer M/U residents as training level progressed. Overall, mean milestone levels for M/U residents were lower than for S residents. M/U residents had different distributions of subcompetency gaps compared with S residents at different levels of training.
PGY1s who were identified as M/U received lower adjusted mean milestone scores than their S peers across all subcompetencies. Two subcompetencies, organize/prioritize (PC2) and transfer of care (PC3), flagged residents in need of remediation early in residency: each showed almost a one-point milestone-level difference between M/U and S PGY1s and had outstanding discrimination. These findings are consistent with earlier studies suggesting that typical patient care deficits are identifiable early and may be targeted for remediation earlier in training.20–25
For PGY2s, the differences in adjusted mean milestone levels were smaller than for PGY1s. Trustworthiness was the subcompetency with the largest adjusted mean milestone difference between M/U and S PGY2s, with M/U PGY2s performing similarly to S PGY1s. In pediatrics, PGY2s begin to take on a supervisory role, which may highlight the importance of “trustworthiness that makes colleagues feel secure when one is responsible for the care of patients”26 in entrustment decisions when allowing a resident to supervise.27,28 Trustworthiness includes colleagues’ perception of the physician’s knowledge, skills, and abilities as well as discernment of limitations, conscientiousness, and truthfulness, and may represent how the overall development of entrustment for S PGY2s differs from that for M/U PGY2s.27,29–31 However, in contrast to PGY1s, for whom M/U residents scored globally lower than their S peers and most subcompetencies had outstanding (C-statistic ≥ 0.90) or excellent (C-statistic ≥ 0.80) discrimination, no subcompetency significantly discriminated between M/U and S PGY2s. This finding points to the importance of PDs individualizing performance improvement plans for M/U PGY2s based on their individual performance lapses.
We found that mean milestone scores of M/U PGY3s were lower than those of S PGY1s for ethical behavior (Prof3). In addition, compared with S PGY3s, M/U PGY3s had the largest adjusted mean milestone differences in ethical behavior (Prof3), incorporating feedback (PBLI4), and professionalization (Prof2). However, these subcompetencies did not discriminate between M/U and S PGY3s. These findings highlight the heterogeneity of M/U PGY3 performance; some, but not all, M/U PGY3s had clear deficiencies in ethical behavior, incorporating feedback, or professionalization. Prior studies have also found that professionalism deficits tend to be identified later in training.22,32 Difficulties with professionalism or incorporating feedback may be better identified later in training because of increased observations of the trainee. Alternatively, these findings may suggest that PDs weigh successful performance in subcompetencies such as ethical behavior and incorporating feedback heavily when making global M/U or S end-of-year assessments of resident performance. PDs may be making decisions about graduation based in part on professionalism characteristics, such as deficits in professionalization, ethical behavior, and incorporating feedback, that have been associated in medical students and residents with future medical board disciplinary action.5,33–35
Our study has several limitations. First, PDs identified a small number of M/U residents. Fewer M/U residents were identified later in training, leading to wider CIs for PGY2 and PGY3 results; this may have underpowered our study to detect significant differences between M/U and S residents (Type II errors). It is possible that fewer M/U residents were identified because M/U residents identified earlier in training are remediated or dismissed. Alternatively, PDs may be more reluctant to label residents later in training as M/U because the stakes are higher, with residents’ approval to sit for the board certification exam dependent on PD attestation of satisfactory performance and ability to practice without supervision. Second, our study was performed when milestones were new to programs, faculty, and Clinical Competency Committees. Inexperience may have resulted in higher ratings of high-performing residents and lower ratings of low-performing residents, which may lead to overestimation of differences between milestone levels for M/U versus S residents. Conversely, because data were collected early after implementation of milestones, there were no national expectations that may have otherwise biased scoring by the evaluators. Adjusted mean milestone differences and C-statistics for PGY3s were variable, suggesting that residents were not given the same milestone level across all subcompetencies. Third, our study had multiple comparisons, which may inadvertently lead to detecting differences where no true differences exist (Type I errors); we tried to minimize Type I errors by using a Bonferroni-corrected significance threshold of .0007937 (.05/63). Fourth, our study population had more USMG-MDs compared with all U.S. pediatric residents. It is possible that USMG-DOs and IMGs may have different deficiencies.
Future studies in a more representative population of pediatric residents are needed to validate the findings of this derivation study. Fifth, programs may use different criteria to define M/U residents (milestone scores, narrative comments, etc.) and may have different thresholds for determining M/U (e.g., M/U residents may perform poorly in all milestones, in a majority of milestones, or in specific milestones). Sixth, our study was limited to pediatric residents; it is possible that residents in other specialties have different subcompetency gaps. Future studies of residents in other specialties are needed to verify whether our findings are generalizable beyond pediatrics. Finally, our study was limited to a single year. A longitudinal study that follows residents across training years may help us better understand the progression of M/U residents.
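The Bonferroni arithmetic behind the corrected threshold and the 99.92% CIs reported throughout is simple to verify directly. The sketch below is illustrative only; the decomposition of the 63 comparisons into 21 subcompetencies at each of three training levels is our assumption for the comment, not stated arithmetic from the study:

```python
# Bonferroni correction: divide the familywise alpha by the number of
# comparisons; the matching confidence level for each interval follows.
alpha = 0.05
n_tests = 63  # assumed: 21 subcompetencies x 3 training levels
threshold = alpha / n_tests       # per-test significance threshold
ci_level = 100 * (1 - threshold)  # confidence level for each interval
print(f"per-test alpha: {threshold:.7f}")  # 0.0007937
print(f"CI level: {ci_level:.2f}%")        # 99.92%
```

This is why every interval in the Results is a 99.92% CI: it is the confidence level that matches the per-test alpha of .05/63.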
We found that M/U pediatric residents had different subcompetency gaps at different levels of training. While M/U PGY1s had global deficits, senior M/U residents may have different performance deficiencies that require individualized counseling and targeted performance improvement plans. Similar research using the subcompetencies that identified M/U learners in this derivation study, in a larger validation cohort of pediatric residents and in other specialties, is needed to confirm or refute our findings and determine whether they are generalizable beyond pediatrics.
The authors wish to thank Robin Young, MS, from APPD LEARN (Association of Pediatric Program Directors, Longitudinal Educational Assessment Research Network), for assistance with this project. The authors wish to thank the participating residents and program directors.
The authors wish to thank the site investigators for the APPD LEARN Validity of Resident Self-Assessment Group: Dennis Basila, MD, and Katherine Dougherty, MD (Albany Medical Center); Richard S. Robus, MD (Blank Children’s Hospital); Christa Matrone, MD, and Daniel J. Schumacher, MD, MEd (Boston Combined Residency Program); Kimberly A. Gifford, MD (Children’s Hospital at Dartmouth-Hitchcock Pediatric Residency); Stephanie B. Dewar, MD, Dena Hofkosh, MD, MEd, and Rhett Lieberman, MD, MPH (Children’s Hospital of Pittsburgh of UPMC); Sue E. Poynter, MD, MEd (Cincinnati Children’s/University of Cincinnati); Dawn Tuell, MD (East Tennessee State University); Janara J. Huff, MD, and Marielisa Rincon, MD (Erlanger [UT-COM Chattanooga]); Kerry K. Sease, MD, MPH (Greenville Health System/University of South Carolina–Greenville); Sylvia H. Yeh, MD (Harbor-UCLA Medical Center); Michael P. McKenna, MD, and Laura Price, MD (Indiana University); Kathleen M. Donnelly, MD, and Meredith L. Carter, MD (Inova Children’s Hospital); David I. Rappaport, MD (Jefferson Medical College/AI DuPont Hospital for Children); Robert R. Wittler, MD (Kansas University School of Medicine–Wichita); Ariel Frey-Vogel, MD, MAT, and Shannon E. Scott-Vernaglia, MD (MassGeneral Hospital for Children); Kate Perkins, MD, PhD, and Savanna Carson, MS (Mattel Children’s Hospital UCLA); John D. Mahan, MD (Nationwide Hospital/Ohio State University); Matthew J. Kapklein, MD, MPH (New York Medical College at Westchester Medical Center); Sharon M. Unti, MD (Ann & Robert H. Lurie Children’s Hospital of Chicago/Northwestern University); Cynthia L. Ferrell, MD, MSEd (Oregon Health & Science University); Alston E. Dunbar, MD, MBA (Our Lady of the Lake Regional Medical Center); Vasudha L. Bhavaraju, MD, and Grace L. Caputo, MD, MPH (Phoenix Children’s Hospital/Maricopa Medical Center); Jane E. Kramer, MD (Rush University); Michelle Miner, MD (Southern Illinois University); Sharon Calaman, MD, Mario Cruz, MD, and Nancy D. Spector, MD (St Christopher’s Hospital for Children/Drexel University College of Medicine); Stephen R. Barone, MD (Steven and Alexandra Cohen Children’s Medical Center of New York); Tammy Camp, MD (Texas Tech University Health Science Center–Lubbock); Sean P. Elliott, MD, and Hillary A. Franke, MD, MS (University of Arizona); Su-Ting T. Li, MD, MPH (University of California, Davis); Heather Fagan, MD, and Nicola Orlov, MD (University of Chicago); Tai M. Lockspeiser, MD, MHPE (University of Colorado); Nicole Paradise Black, MD, MEd (University of Florida–Gainesville); Jose J. Zayas, DO (University of Florida, College of Medicine–Jacksonville); Jennifer Di Rocco, DO, and Shilpa J. Patel, MD (University of Hawai’i); Amanda D. Osta, MD (University of Illinois–Chicago); Lisa Gilmer, MD (University of Kansas–Kansas City); Sara Multerer, MD, and Kimberly A. Boland, MD (University of Louisville); Rosina A. Connelly, MD, MPH, and R. Franklin Trimm, MD (University of South Alabama); Diane Straub, MD, and Meaghan Reyes (University of South Florida); Sandra R. Arnold, MD (University of Tennessee–Memphis); Deirdre A. Caplin, PhD (University of Utah); Ann P. Guillot, MD, and Jerry G. Larrabee, MD (University of Vermont); John G. Frohna, MD, MPH (University of Wisconsin); Renuka Verma, MD (Unterberg Children’s Hospital at Monmouth Medical Center); Jill Leavens-Maurer, MD (Winthrop University Hospital); and Ann E. Burke, MD (Wright State).
1. Eden J, Berwick DM, Wilensky GR; Institute of Medicine, Committee on the Governance and Financing of Graduate Medical Education. Graduate Medical Education That Meets the Nation’s Health Needs. Washington, DC: National Academies Press; 2014.
2. Balogh E, Miller BT, Ball J; Institute of Medicine, Committee on Diagnostic Error in Health Care. Improving Diagnosis in Health Care. Washington, DC: National Academies Press; 2015.
3. Lipner RS, Hess BJ, Phillips RL Jr. Specialty board certification in the United States: Issues and evidence. J Contin Educ Health Prof. 2013;33(suppl 1):S20–S35.
5. Hauer KE, Ciccone A, Henzel TR, et al. Remediation of the deficiencies of physicians across the continuum from medical school to practice: A thematic review of the literature. Acad Med. 2009;84:1822–1832.
6. Nasca TJ, Philibert I, Brigham T, Flynn TC. The next GME accreditation system—Rationale and benefits. N Engl J Med. 2012;366:1051–1056.
8. Carraccio C, Gusic M, Hicks P. The Pediatric Milestones Project—Competencies. Acad Pediatr. 2014;14(2 suppl):S13–S97.
9. Hicks PJ, Schumacher DJ, Benson BJ, et al. The pediatrics milestones: Conceptual framework, guiding principles, and approach to development. J Grad Med Educ. 2010;2:410–418.
10. Englander R, Hicks P, Benson B; Pediatric Milestone Project Working Group. Pediatrics milestones: A developmental approach to the competencies. J Pediatr. 2010;157:521–522.e1.
11. Englander R, Burke AE, Guralnick S, et al. The pediatrics milestones: A continuous quality improvement project is launched—Now the hard work begins! Acad Pediatr. 2012;12:471–474.
12. Schumacher DJ, Englander R, Hicks PJ, Carraccio C, Guralnick S. Domain of competence: Patient care. Acad Pediatr. 2014;14(2 suppl):S13–S35.
13. Guralnick S, Ludwig S, Englander R. Domain of competence: Systems-based practice. Acad Pediatr. 2014;14(2 suppl):S70–S79.
14. Schwartz A, Young R, Hicks PJ; APPD LEARN. Medical education practice-based research networks: Facilitating collaborative research. Med Teach. 2014;38:1–11.
15. American Medical Association Fellowship and Residency Electronic Interactive Database. FREIDA Online. https://ama-assn.org/go/freida. Published 2016. Accessed April 3, 2017.
17. Agresti A. Categorical Data Analysis. 3rd ed. Hoboken, NJ: Wiley; 2013.
18. Conover WJ, Iman RL. Rank transformations as a bridge between parametric and nonparametric statistics. Am Stat. 1981;35:124–129.
19. Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2013.
20. Riebschleger MP, Haftel HM. Remediation in the context of the competencies: A survey of pediatrics residency program directors. J Grad Med Educ. 2013;5:60–63.
21. Guerrasio J, Garrity MJ, Aagaard EM. Learner deficits and academic outcomes of medical students, residents, fellows, and attending physicians referred to a remediation program, 2006–2012. Acad Med. 2014;89:352–358.
22. Yao DC, Wright SM. National survey of internal medicine residency program directors regarding problem residents. JAMA. 2000;284:1099–1104.
23. Silverberg M, Weizberg M, Murano T, Smith JL, Burkhardt JC, Santen SA. What is the prevalence and success of remediation of emergency medicine residents? West J Emerg Med. 2015;16:839–844.
24. Epstein RM, Siegel DJ, Silberman J. Self-monitoring in clinical practice: A challenge for medical educators. J Contin Educ Health Prof. 2008;28:5–13.
25. Blum RH, Boulet JR, Cooper JB, Muret-Wagstaff SL; Harvard Assessment of Anesthesia Resident Performance Research Group. Simulation-based assessment to identify critical gaps in safe anesthesia resident performance. Anesthesiology. 2014;120:129–141.
27. Choo KJ, Arora VM, Barach P, Johnson JK, Farnan JM. How do supervising physicians decide to entrust residents with unsupervised tasks? A qualitative analysis. J Hosp Med. 2014;9:169–175.
28. Sterkenburg A, Barach P, Kalkman C, Gielen M, ten Cate O. When do supervising physicians decide to entrust residents with unsupervised tasks? Acad Med. 2010;85:1408–1417.
29. Hauer KE, Oza SK, Kogan JR, et al. How clinical supervisors develop trust in their trainees: A qualitative study. Med Educ. 2015;49:783–795.
30. Sheu L, O’Sullivan PS, Aagaard EM, et al. How residents develop trust in interns: A multi-institutional mixed-methods study. Acad Med. 2016;91:1406–1415.
31. Choe JH, Knight CL, Stiling R, Corning K, Lock K, Steinberg KP. Shortening the miles to the milestones: Connecting EPA-based evaluations to ACGME milestone reports for internal medicine residency programs. Acad Med. 2016;91:943–950.
32. Bhatti NI, Ahmed A, Stewart MG, Miller RH, Choi SS. Remediation of problematic residents—A national survey. Laryngoscope. 2016;126:834–838.
33. Fargen KM, Drolet BC, Philibert I. Unprofessional behaviors among tomorrow’s physicians: Review of the literature with a focus on risk factors, temporal trends, and future directions. Acad Med. 2016;91:858–864.
34. Papadakis MA, Teherani A, Banach MA, et al. Disciplinary action by medical boards and prior behavior in medical school. N Engl J Med. 2005;353:2673–2682.
35. Papadakis MA, Arnold GK, Blank LL, Holmboe ES, Lipner RS. Performance during internal medicine residency training and subsequent disciplinary action by state licensing boards. Ann Intern Med. 2008;148:869–876.
Supplemental Digital Content
© 2018 by the Association of American Medical Colleges