Faculty supervision of medical trainees is necessary to ensure both quality patient care and effective clinical education. Supervision lets faculty judge when to entrust trainees with certain responsibilities1 and gives them the information they need to determine when trainees’ fatigue or overscheduling has compromised their patients’ safety.2 Supervision, in the form of direct observation, is a well-established method for assessing (and—with the addition of feedback—improving) trainees’ clinical skills.3 In fact, the Accreditation Council for Graduate Medical Education has for several years mandated that postgraduate programs use direct observation of clinical encounters to assess trainees’ patient care and interpersonal and communication skills.4 And yet, despite the variety of pertinent and, in many cases, validated assessment tools,5 direct observation remains infrequent.6–9 Fully a third of second- and third-year internal medicine residents never undergo formal observation of their clinical skills.10 No wonder, then, that many trainees lack competence in important data-collecting clinical skills, such as history taking and physical examination11–14—a deficiency that often persists well into physicians’ careers.15 Clearly, the quality of faculty supervision of trainees can stand improvement. With these intertwined goals of better education and safer patient care in mind, the American Board of Internal Medicine (ABIM) designed the Clinical Supervision Practice Improvement Module (CS-PIM).
The ABIM’s Maintenance of Certification (MOC) program “promotes lifelong learning and the enhancement of the clinical judgment and skills essential for high quality patient care.”16 In addition to sitting for recertification exams, diplomates must earn points by completing modules in which they improve their medical knowledge and practice performance through structured activities and self-evaluation. One of these modules, the CS-PIM, asks faculty to engage in and self-evaluate their supervision of trainees, which includes direct observation, providing feedback, identifying errors, and auditing medical records.
The CS-PIM includes two specific assessment activities: (1) observing and documenting a trainee–patient clinical encounter, and (2) auditing the associated medical record. Faculty must complete a minimum of 10 observation–audit assessment cycles to receive MOC credit. When starting the module, faculty must first read and acknowledge the instructions, which include, among other things, recommendations that they communicate expectations to the trainee prior to the observation, maintain correct positioning by staying out of the direct line of sight of the trainee and patient during the observation, and observe trainees at least twice to better identify changes in their skills.
The structured documentation of the direct observation captures demographic information about the trainee and characteristics of the patient and clinical setting. It includes an evaluation of the trainee’s clinical skills as measured by the mini-Clinical Evaluation Exercise, the most comprehensively studied direct observation tool used in graduate medical education.5 Faculty also note three specific actions that the trainee did well and three that require improvement.
For the medical record audit, faculty must review the trainee’s note generated from the observed clinical encounter and document its quality, internal consistency, and the frequency with which it meets relevant patient- and care-setting-specific safety measures (List 1).
When faculty physicians have completed at least 10 assessment cycles, they receive a summary report of all of their observation and audit data. A personal action plan section then prompts them to use these data to reflect on what they have learned, both about their trainees’ clinical skills, judgment, and quality of patient care, and about their own assessment and feedback skills. They are then asked to develop an improvement plan by listing three things they would work on when assessing trainees in the future. The module ends with a retrospective pre-/postmodule self-assessment survey that focuses on the four targeted skills: direct observation, providing feedback, identifying errors, and auditing medical records.
In addition to improving faculty’s supervisory skills through practice experience and guided reflection and change, the module also reduces redundancy for faculty in the MOC process by aligning their responsibility to perform observation and provide supervision with the practice performance requirements of MOC. The module can be seen as a quality improvement effort for faculty, wherein physicians engage in the Kolb learning cycle (experience–reflect–generalize–apply, analogous to the plan–do–study–act in education17). Treating the module as just such a training program, we evaluated the CS-PIM and its effects on faculty’s knowledge and behavior regarding direct observation and feedback.
To evaluate the module’s effects, we used the Kirkpatrick framework, which has been used in a variety of professional settings to evaluate the effectiveness of training programs.18 It consists of four sequential levels of measurement constructs: reaction, learning, behavior, and results. We assessed the first three levels using data collected from a variety of sources: information submitted by faculty during the module about the trainees (gender, training level), patients (gender, age), and clinical encounters (clinical focus, level of complexity, inpatient or outpatient setting); ratings collected in faculty self-assessments at the end of the module; feedback taken from a survey that faculty submitted on completing the module; and statistics on faculty’s use of the module (List 2). To determine characteristics of the faculty who completed the CS-PIM, we used professional and demographic data reported when they enrolled in the ABIM’s MOC program. The data cover the period from March 2009, when the CS-PIM first became available to ABIM diplomates, through October 2010, when the postmodule feedback survey was closed.
Kirkpatrick Level 1: Reaction
To evaluate reaction, we analyzed usage statistics and data from the feedback survey. Faculty were automatically requested to complete this survey on finishing the CS-PIM. The survey, hosted online using Grapevine Surveys (grapevinesurveys.com), was anonymous (not linked to specific participants) and voluntary (respondents were free to skip any or all questions). The 42-item survey used dichotomous, open-ended, and five-point Likert-type responses (anchors were either “poor” to “excellent” or “strongly agree” to “strongly disagree”) to gauge faculty’s experience with each module component. Specifically, the survey measured their perceptions regarding the module’s value in helping them observe trainees, identify errors, provide feedback, audit medical records, reflect on the summary report, and develop an action plan, and the module’s overall facilitation of their direct observation and feedback skills. We also inquired about the module’s usability, time requirements, and barriers to completion. A single author (S.G.R.) used summative content analysis to analyze narrative data captured in the open-ended responses.19 Because the survey was anonymous, we compared module completion time stamps to estimate how many faculty completed the survey.
Kirkpatrick Level 2: Learning
To evaluate learning, we used responses to the retrospective pre/post self-assessment questions that prompted faculty to consider how their knowledge and skills had changed. Each question asked faculty to use a five-point Likert-type scale (1 = “low” to 5 = “high”) to assess themselves in each of the four targeted skills, both before and after the module. Retrospective pre/post surveys, which have been used to evaluate skills for training programs,20 can overcome the design flaw in traditional pre-/postintervention surveys, where learners, before the intervention, may be unaware of what they do not know.
Kirkpatrick Level 3: Behavior
Finally, to assess changes in behavior, the feedback survey asked faculty whether, as a result of completing the module, they had made any changes to their own evaluation strategies or clinical care of patients. They were encouraged to provide written feedback to these two questions. One author (S.G.R.) used summative content analysis to identify themes,19 importing comments into two spreadsheets and, using formulas that counted recurring key words (e.g., “observation”), measuring the frequency with which comments were related to one or more of the module’s four skill domains. Next, S.G.R. read through a sample of comments where domain key words appeared and applied formulas to count the occurrence of key words that conveyed context or modifying characteristics (e.g., “more frequent”).
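The key-word tallies described above were produced with spreadsheet formulas. As a rough illustration only, the same two-step count (comments per domain, then modifiers within a domain) could be sketched in Python as follows. The sample comments are invented, the function names are hypothetical, and matching on word prefixes (so that "immediately" counts as "immediate") is an assumption about how the formulas treated word variants.

```python
import re
from collections import Counter

# Domain key words quoted in the article; other domains would be added the same way.
DOMAIN_KEYWORDS = {
    "feedback": ["feedback"],
    "observation": ["observation"],
}
# Timeliness modifiers quoted in the article.
TIMELINESS_MODIFIERS = ["immediate", "soon", "timely", "prompt"]


def contains_any(comment, keywords):
    """True if any key word starts a word in the comment (case-insensitive)."""
    text = comment.lower()
    return any(re.search(r"\b" + re.escape(k), text) for k in keywords)


def count_domains(comments):
    """Count how many comments mention each skill domain at least once."""
    counts = Counter()
    for comment in comments:
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if contains_any(comment, keywords):
                counts[domain] += 1
    return counts


def count_modifier(comments, domain, modifiers):
    """Within one domain's comments, count those that also contain a modifier."""
    in_domain = [c for c in comments if contains_any(c, DOMAIN_KEYWORDS[domain])]
    hits = sum(1 for c in in_domain if contains_any(c, modifiers))
    return hits, len(in_domain)


# Invented example comments, not study data.
comments = [
    "I now give feedback immediately after the encounter.",
    "More frequent observation of interns.",
    "My feedback is more specific.",
    "I plan to provide prompt feedback following each observation.",
]

domain_counts = count_domains(comments)
timely = count_modifier(comments, "feedback", TIMELINESS_MODIFIERS)
print(domain_counts, timely)
```

Counting comments rather than word occurrences keeps a comment that repeats a key word from being counted twice, which matches reporting frequencies as a share of comments.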
We did not measure Kirkpatrick Level 4 (results). The CS-PIM was not designed to collect patient-identifiable information, and we did not ask faculty to disclose patient care data in the survey. Therefore, we could not assess the direct impact on patient care outcomes. Nor did we ask the trainees whether the feedback they received was helpful or effective.
We used descriptive statistics and analyzed retrospective pre/post self-report scores using the Wilcoxon signed rank test, with statistical significance set at P < .05. We analyzed data using SPSS 12 (International Business Machines Corp., Armonk, New York) and used Microsoft Excel and Access 2003 (Microsoft Corp., Redmond, Washington) for data importing, preparation, and content analysis. The CS-PIM does not require information that identifies trainees; if faculty inadvertently entered such data, we removed them prior to analysis. The Essex institutional review board (Lebanon, New Jersey) approved this secondary data analysis research study.
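For readers unfamiliar with the Wilcoxon signed rank test, the following Python sketch shows how it applies to paired retrospective pre/post ratings. It is not the SPSS procedure used in the study: it is a minimal version that assumes the large-sample normal approximation with a tie correction and no continuity correction, and the ratings shown are invented for illustration.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Two-sided Wilcoxon signed rank test (normal approximation, tie-corrected)."""
    # Paired differences; zero differences are dropped, as is conventional.
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    # Sum the ranks of positive and negative differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Null mean and tie-corrected variance of W+.
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P value
    return w_plus, w_minus, z, p


# Invented five-point self-ratings for ten hypothetical respondents.
pre = [3, 3, 2, 4, 3, 2, 3, 4, 3, 2]
post = [4, 4, 3, 4, 4, 4, 4, 5, 3, 3]

w_plus, w_minus, z, p = wilcoxon_signed_rank(pre, post)
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.2f}, P = {p:.4f}")
```

With Likert-type ratings, most differences are tied at the same magnitude, so the tie correction matters; for very small samples an exact test would be preferable to the normal approximation used here.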
Using the time stamp information, we determined that, from March 2009 through October 2010, 644 faculty physicians completed 647 CS-PIMs and 647 feedback surveys. The discrepancy comes from three physicians who each completed two separate modules; we included data from these repeated modules because it was not possible to link specific modules and feedback surveys, and because the trainee data were likely unique between the two instances of the completed modules.
Table 1 provides demographic and other characteristics of the faculty who completed the CS-PIM. They reported spending an average of 63% of their patient care time supervising trainees (median: 70%; range: 1%–100%); those who did not exclusively supervise trainees spent the remaining time in direct or consultative patient care.
A large majority (79%) completed only the required minimum of 10 observation–audit cycles; the remaining 21% completed more (between 11 and 20) before requesting their summary report. The mean number of days over which faculty completed the module was 67 days (SD ± 77; median: 44). Table 2 provides additional observation characteristics.
Kirkpatrick framework measures
Because the questions within the feedback surveys were optional, the denominators for the items discussed below vary. They are indicated in the tables.
Kirkpatrick Level 1: Reaction. The majority of faculty (91%) rated the overall value of the CS-PIM in facilitating their observation and evaluation skills as excellent, very good, or good; the remaining faculty rated the module as fair or poor. As seen in Table 3, when rating the value of individual module components with respect to time and effort spent, between 85% and 95% of respondents gave favorable ratings (excellent, very good, or good).
We also measured the value of the module for faculty fulfilling MOC requirements. Most (93%) respondents would consider using the module again for MOC purposes; 93% agreed that 10 required observation–audit cycles was the appropriate amount for the MOC credit awarded. The vast majority of respondents (93%) would recommend the module to a colleague, and nearly two-thirds (64%) would even consider using the module beyond MOC to help them in their ongoing supervision and teaching.
Criticisms included the module’s inflexibility for describing the clinical setting where the observation takes place, its lack of applicability for subspecialty program settings, and the burden of the time commitment required. Some respondents were skeptical that the audit of the medical record—especially the focus on patient safety—was relevant to their supervisory role.
Kirkpatrick Level 2: Learning. Using data from the retrospective pre/post self-assessment, we ascertained that faculty’s self-reported skills in all four domains (direct observation, providing feedback, identifying errors, and auditing medical records) improved significantly (P < .001) (Table 4). Providing quality feedback to trainees was the skill mentioned most (by 81% of respondents) as having improved; it also had the greatest mean increase between pre and post scores (1.0, SD = 0.7).
Kirkpatrick Level 3: Behavior. Overall, 89% of faculty reported making a change in their direct observation of trainees, and 84% reported changing the way they provide feedback to their trainees as a result of completing the CS-PIM. Among the 533 faculty who reported making either change and provided a narrative response, 46% (247) of responses specifically involved the key word “feedback”; these were most frequently related to improving feedback timeliness (key words: “immediate,” “soon,” “timely,” or “prompt”; 105 [43% of 247]). A single key word (“specific”) accounted for 17% (41) of comments regarding feedback. Other modifiers (key words: “closely,” “thorough,” “detail,” “focus,” or “careful”; 28 [11% of 247]) also described feedback content. Comments involving “observation” made up 23% (123) of the 533 responses and most commonly described an intention to observe with greater frequency (34 [28% of 123]).
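Because the percentages above use nested denominators (all 533 narrative responses for the domain shares, and the 247 feedback or 123 observation comments for the modifier shares), a short arithmetic check may help. The counts come from the text; the helper name and dictionary labels are illustrative only.

```python
def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


NARRATIVE_RESPONSES = 533   # faculty who reported a change and commented
FEEDBACK_COMMENTS = 247     # comments containing the key word "feedback"
OBSERVATION_COMMENTS = 123  # comments containing the key word "observation"

shares = {
    "feedback, of all responses": pct(FEEDBACK_COMMENTS, NARRATIVE_RESPONSES),
    "timeliness, of feedback comments": pct(105, FEEDBACK_COMMENTS),
    "specific, of feedback comments": pct(41, FEEDBACK_COMMENTS),
    "other content modifiers, of feedback comments": pct(28, FEEDBACK_COMMENTS),
    "observation, of all responses": pct(OBSERVATION_COMMENTS, NARRATIVE_RESPONSES),
    "greater frequency, of observation comments": pct(34, OBSERVATION_COMMENTS),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```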
Forty percent of the faculty reported making changes in their own clinical practice as a result of completing the module, but of the 237 narrative examples offered to back up these claims, 31% actually described the evaluation of trainees. Of those comments that did clearly describe faculty members’ own practices, the changes most frequently mentioned included “documentation” (18%) and “medication” (12%).
Discussion and Conclusion
Although the apprenticeship model of medical education—supervision by a mentor—has been around since the Flexner Report,21 one could argue that future physicians still lack optimal guidance in their clinical training.22,23 With the launch of the Outcomes Project in 200124 and the transition from a graduate training model that relies on knowledge acquisition to one that emphasizes the competent application of knowledge, skills, and attitudes, guidance becomes even more vital if learners are to master the abilities needed for practice.3 The ABIM developed the CS-PIM to help faculty evaluate and improve their mentoring skills—specifically, direct observation, providing feedback, identifying errors, and auditing medical records.
Using the Kirkpatrick evaluation framework, we found that the module was positively received, that it taught faculty something about their own observation and feedback skills, and that it helped them make changes, not only to the way they supervise their trainees but also to the way they themselves practice medicine.
Despite the faculty’s overall satisfaction with the CS-PIM and their stated willingness to use it again, only 21% completed more than the minimum required 10 observation–audit cycles, and only three completed two modules. This may be explained by faculty being anxious to see their summary reports (which became available after the 10th cycle) or faculty only paying attention to minimum requirements and not considering the possible benefit of additional cycles. Perhaps faculty desired to complete the module as quickly as possible. Although the module mainly aligns with “usual” direct observation practices, it does introduce additional steps (e.g., finding a computer, logging in, documenting the observation and medical record audit) that might be considered burdens. This last rationale may also explain why so few faculty completed another module for additional MOC credit or for their own supervisory needs.
We acknowledge certain limitations in our study. First, data from the module and feedback survey were self-reported, and we did not audit the module data for accuracy. Second, selection bias is possible because physicians choose which PIM they wish to complete for MOC credit, and we did not follow up with faculty who chose not to complete the module. Third, we did not query trainees to determine whether the feedback they received was helpful or effective, nor did we measure health care results of the patients involved; both would have been appropriate indicators of “results” in the Kirkpatrick framework. Fourth, because the postmodule feedback survey was anonymous and optional, we cannot ensure that all faculty completed their survey immediately after module submission, which introduces potential respondent recall bias; we used the time stamp of module submission to approximate which surveys were completed by the physicians included in the analysis. Finally, lacking a “baseline” measure of the faculty physicians’ observation and feedback sessions, we cannot conclusively state whether their quantity or quality changed.
Future research on the CS-PIM using this framework should focus on results. These might include trainees’ perceptions on the value of the supervisory feedback they receive, an analysis of patient outcomes directly related to the measures that are audited in the module, or a comparison of patients’ satisfaction with care provided by trainees who have received feedback from faculty completing versus not completing the CS-PIM. Researchers might also explore the adaptation of this module beyond medical faculty and trainees, for use in nursing, pharmacy, and even complex cross-disciplinary training models that exist in various clinical settings.
The medical education system depends on supervising physicians who correct and guide trainees toward clinical competence, ensure that they deliver safe, effective care to patients, and build their confidence before sending them out into the world to practice medicine. Our study suggests that the ABIM’s CS-PIM helps faculty do just that by developing their skills in direct observation, feedback, and other supervisory responsibilities.
Acknowledgments: The authors thank Lisa Conforti, MPH, and Sarah Hood, MS, for their assistance in reviewing and providing valuable feedback during the development of this manuscript.
Funding/Support: This study was supported by funding from the American Board of Internal Medicine. The study sponsor had no role in the design and conduct of the study; the collection, management, analysis, and interpretation of the data; or the preparation, review, or approval of the manuscript.
Other disclosures: None.
Ethical approval: The Essex institutional review board (Lebanon, New Jersey) approved this secondary data analysis research study.
1. ten Cate O. Entrustability of professional activities and competency-based training. Med Educ. 2005;39:1176–1177
3. Duffy FD, Gordon GH, Whelan G, et al. Participants in the American Academy on Physician and Patient’s Conference on Education and Evaluation of Competence in Communication and Interpersonal Skills. Assessing competence in communication and interpersonal skills: The Kalamazoo II report. Acad Med. 2004;79:495–507
5. Kogan JR, Holmboe ES, Hauer KE. Tools for direct observation and assessment of clinical skills of medical trainees: A systematic review. JAMA. 2009;302:1316–1326
6. Kassebaum DG, Eaglen RH. Shortcomings in the evaluation of students’ clinical skills and behaviors in medical school. Acad Med. 1999;74:842–849
7. Holmboe ES. Faculty and the observation of trainees’ clinical skills: Problems and opportunities. Acad Med. 2004;79:16–22
8. Pulito AR, Donnelly MB, Plymale M, Mentzer RM Jr. What do faculty observe of medical students’ clinical performance? Teach Learn Med. 2006;18:99–104
9. Donato AA, Pangaro L, Smith C, et al. Evaluation of a novel assessment form for observing medical residents: A randomised, controlled trial. Med Educ. 2008;42:1234–1242
10. 2011 data from FasTrack (ABIM’s annual trainee evaluation system; data not public)
11. Meuleman JR, Caranasos GJ. Evaluating the interview performance of internal medicine interns. Acad Med. 1989;64:277–279
12. Pfeiffer C, Madray H, Ardolino A, Willms J. The rise and fall of students’ skill in obtaining a medical history. Med Educ. 1998;32:283–288
13. Mangione S, Burdick WP, Peitzman SJ. Physical diagnosis skills of physicians in training: A focused assessment. Acad Emerg Med. 1995;2:622–629
14. Mangione S, Nieman LZ. Cardiac auscultatory skills of internal medicine and family practice trainees. A comparison of diagnostic proficiency. JAMA. 1997;278:717–722
15. Ramsey PG, Curtis JR, Paauw DS, Carline JD, Wenrich MD. History-taking and preventive medicine skills among primary care physicians: An assessment using standardized patients. Am J Med. 1998;104:152–158
16. American Board of Internal Medicine. Maintenance and recertification guide. http://www.abim.org/moc/. Accessed July 6, 2012
17. Kolb DA. Experiential Learning: Experience as the Source of Learning and Development. Englewood Cliffs, NJ: Prentice Hall; 1984
18. Kirkpatrick DL. Evaluating Training Programs: The Four Levels. 2nd ed. San Francisco, Calif: Berrett-Koehler Publishers, Inc.; 1998
19. Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15:1277–1288
20. Levinson W, Gordon G, Skeff K. Retrospective versus actual pre-course self-assessments. Eval Health Prof. 1990;13:445–452
21. Cooke M, Irby DM, Sullivan W, Ludmerer KM. American medical education 100 years after the Flexner report. N Engl J Med. 2006;355:1339–1344
22. Kilminster SM, Jolly BC. Effective supervision in clinical practice settings: A literature review. Med Educ. 2000;34:827–840
23. Baldwin DWC Jr, Daugherty SR, Ryan PM. How residents view their clinical supervision: A reanalysis of classic national survey data. J Grad Med Educ. 2010;1:37–45
24. Carraccio C, Burke AE. Beyond competencies and milestones: Adding meaning through context. J Grad Med Educ. 2010;2:419–422