With increasing fiscal pressures on academic medical centers, many institutions are moving towards mission-based financing, the notion that the clinical, research, and teaching missions must no longer depend upon cross-subsidization but must financially support themselves.1 With this increased mission-specific accountability, there will be greater emphasis on measurable outcomes to justify the costs associated with the mission. In the realm of clinical teaching, the literature is replete with studies of qualities of excellent teachers,2 studies of how to measure teaching,3 and studies demonstrating that faculty development in teaching can influence clinical teachers' self-reported behaviors,4 actual behaviors,5 and teaching ratings.6 However, for the most part, the fundamental outcome of teaching has been left unstudied: that is, does the quality of teaching actually influence student learning? Although this may seem a truism too obvious for investigation, and despite the cherished belief of clinical teachers, there is very little quantitative evidence that better teaching is associated with enhanced student learning.
We recently reported the first documentation of the association of students' learning with the relative teaching abilities of attending physicians7,8 and residents.9 In these studies of students and their clinical teachers over the academic years 1993–1995, we found that medical students who worked on their internal medicine or surgical clinical clerkships with our best clinical teachers scored significantly higher on post-clerkship examinations and even on the U.S. Medical Licensing Examination (USMLE) Step 2. Our findings have been replicated at the University of Michigan.10 The only other study noting an association of teaching with learning, published in 1983, involved high school students in a remedial math class.11 To our knowledge, this is the extent of the quantitative evidence in all the educational literature that better teaching is associated with better learning.
Our previous reports, however, had several limitations. For one, our measure of teaching “quality” was based only on students' ratings. One can argue (as we did in those articles) that the learners are the best judges of the learning climate. Even though we controlled in our analysis for prior student academic achievement (USMLE Step 1 scores), it was possible that students especially excited about internal medicine scored better on internal medicine examinations and, in their enthusiasm, rated their instructors higher, producing a spurious association of examination performance and teaching rating. Second, though statistically significant, our effect sizes were modest, amounting to one-sixth to one-seventh of a standard deviation on a test, or, for example, three points on USMLE Step 2. Third, these studies encompassed only two academic years and a limited number of teachers and students. Because this sample was small, we included in the analysis all teachers regardless of the numbers of students they worked with, even those with few teaching ratings. Though we were gratified to demonstrate an association between teaching and learning, our results may have been attenuated by the small sample and by the inclusion in the analysis of all teachers regardless of their numbers of teaching evaluations (i.e., teachers with imprecise measures of their teaching ability).
Therefore, the purpose of this project was to refine the method of our previous studies by using a larger sample of students and attending physicians, more precise measures of teaching ability, and a way of disentangling the potential confounders of raters and teachers. Our formal hypothesis was that students who are exposed to our highest-rated attending physicians during their internal medicine clerkship will score better on end-of-clerkship examinations and on the USMLE Step 2.
This work extends the data set of our previous reports,8,9 expanding the sample from two academic years to six. The study design, a prospective cohort study, involves data on students and their attending physicians, and notes the association of the students' examination performances with the “quality” of the attending physicians to whom they were exposed. The participants were all third-year medical students at the University of Kentucky College of Medicine over the academic years 1993–1999, and their attending physicians on the inpatient general medicine services.
To give the reader a sense of the structure of our clerkship, students in the third year spend eight weeks on general medicine inpatient services, four at our university hospital and four at our affiliated Veterans Affairs hospital. A team consists of an attending physician, a supervising junior or senior medicine resident, two first-year residents, and two students. Importantly, students, housestaff, and attendings are randomly and independently assigned to the services (we do not take requests by students for specific attendings). Attending physicians may be either general internists or specialists. Note that students are exposed to new and different attending physicians and housestaff in each of the two four-week components of the clerkship. Ambulatory medicine is part of a separate primary care clerkship and is not included in the study. Attending physicians usually participate in or observe daily management rounds, and have formal separate teaching rounds three times per week, ideally focused on one or two patients on the service, usually at the bedside.
Our model for how teaching might influence students' learning was not that students would be influenced by the average teaching ability of all the instructors they worked with, but rather that students' learning would be enhanced by individual outstanding instructors who, in the learning climate they engender and the inspiration they provide each day, stimulate students to be excited about clinical medicine, resulting in students' learning not only throughout the clerkship but throughout all their clinical rotations. Therefore, we explored the associations of students' learning with exposures to particularly outstanding (or poor) attending physicians, rather than with the average ability of their two attending physicians.
In our prior studies,8,9 we simply defined “best” and “worst” attending physicians as those with the highest and lowest teaching evaluations, as rated by students. However, as mentioned in our introduction, this could lead to a confounding of teaching rating with examination performance by a student who may perform better (and rate the attending physician higher) because of interest in internal medicine. Therefore, for this study, we elected to pursue an alternative method of identifying teaching quality. We surveyed a consensus panel of third- and fourth-year residents at our institution who had also been medical students here. These individuals would have had five to six years of exposure to the clinical teachers at our university, working with a great majority of them. We also chose former students who were residents because they would be most familiar with the special needs and expectations of our internal medicine clerkship. We gave these residents a list of all faculty in internal medicine who had supervised more than five medical students during the six-year period. The threshold of five students was chosen because this was the number of evaluations we calculated were needed to achieve conventional standards of reliability for our clerkship's teaching evaluation form (greater than 0.80), and it would help identify those attending physicians for whom we had precise measures of their teaching ability. We asked the residents to confidentially rate faculty “high” if they would expect them to be rated among our best teachers, “low” if they were among our worst teachers, and “medium” if they would be in between. A priori, we defined “best” attending physicians as those who were rated high by at least 80% of the residents (at least 12 of the 15 residents) and were not rated low by any resident.
Conversely, realizing the tendency for learners to rate even the worst instructors as at least mediocre, we defined the “worst” attending physicians as those who were rated in the low category by at least five of the 15 residents and who were not rated high by any of the residents.
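The a priori consensus criteria above reduce to a simple decision rule. As an illustration only (the function name and vote encoding are ours, not part of the study), the rule could be sketched as:

```python
# Illustrative sketch of the consensus-panel classification rule described
# above; data structure and names are hypothetical, not the authors' code.

def classify_attending(votes, n_residents=15):
    """Classify one attending physician from the panel's confidential ratings.

    votes: a list of "high" / "medium" / "low" strings, one per resident.
    "best"  = rated high by at least 80% of residents and low by none.
    "worst" = rated low by at least five residents and high by none.
    """
    highs = votes.count("high")
    lows = votes.count("low")
    if highs >= 0.8 * n_residents and lows == 0:
        return "best"
    if lows >= 5 and highs == 0:
        return "worst"
    return "medium"
```

Note that an attending with 12 high votes but even a single low vote falls into the middle category, reflecting the requirement that “best” teachers be uncontested.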
For this study, students' evaluations of attending physicians' teaching quality were also collected over the six years, as further evidence of the validity of our consensus panel's opinion (if the consensus process is valid, one would expect the instructors rated highly by the residents' consensus to also have high teaching ratings). Our measure of attending physicians' teaching quality was from confidential, end-of-month student evaluations, which were completed before the students received their grades. The form consists of 16 items on a five-point Likert-type scale (1 = strongly disagree, 5 = strongly agree). Items included ratings of teaching skills and ability, rapport with learners and patients, overall rating, and ratings of role modeling. The coefficient alpha for the evaluation form is .96. This means that there is a high degree of internal consistency among items for rating teaching, and that the instrument is a reliable measure of teaching. However, this also means that inter-item correlations are very high (for our form, .75 to .95), which is not unusual for measures of clinical teaching.12 Because of the high inter-item correlations, we used the mean rating across all items as one measure of teacher “ability.” The overall rating an instructor was assigned in our data set was the mean of all the ratings from the students he or she worked with in the six academic years.
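For readers unfamiliar with coefficient (Cronbach's) alpha, the statistic reported above can be computed from a forms-by-items rating matrix. The following generic sketch uses invented data; it is not the authors' analysis code:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Coefficient alpha for a forms-by-items rating matrix.

    ratings: 2-D array-like, rows = completed evaluation forms,
    columns = Likert items (16 on the form described above).
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of items
    item_vars = ratings.var(axis=0, ddof=1)       # variance of each item
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of form totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```

When items move in lockstep across forms (very high inter-item correlations, as on this form), alpha approaches 1, which is why the mean across all items is a reasonable single summary of teacher “ability.”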
Our analysis used multiple regression approaches from the general linear model.13 Our dependent variables were scores on the National Board of Medical Examiners (NBME) subject examination in medicine, taken at the end of the clerkship, and USMLE Step 2 scores. Independent variables included dummy-coded variables for the categories of attending exposure (i.e., high-rated versus low-rated versus neither high- nor low-rated attending physician exposure). We also included USMLE Step 1 scores in the model as a control variable for prior student academic achievement.
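The model just described, an examination score regressed on dummy-coded exposure category with Step 1 as a covariate, can be sketched with ordinary least squares. The variable names and data layout below are our own illustration, not the study's code:

```python
import numpy as np

def fit_model(step1, exposure, score):
    """OLS fit of: score ~ intercept + step1 + high_dummy + low_dummy.

    exposure: list of "high", "low", or "neither" (the reference category).
    Returns the coefficient vector [intercept, b_step1, b_high, b_low].
    Data passed in are hypothetical, for illustration only.
    """
    high = np.array([e == "high" for e in exposure], dtype=float)
    low = np.array([e == "low" for e in exposure], dtype=float)
    X = np.column_stack([np.ones(len(step1)), step1, high, low])
    beta, *_ = np.linalg.lstsq(X, np.asarray(score, dtype=float), rcond=None)
    return beta
```

With this coding, b_high and b_low are the Step 1–adjusted score differences of the exposure groups relative to students who saw neither a high- nor a low-rated attending.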
Data were collected from 502 third-year medical students (100% of students) over the six academic years. We excluded 18 students who had worked with both a high-rated and a low-rated instructor (as our model was less clear about how this interaction might influence student learning), for a final sample of 484. A total of 46 attending physicians had more than five student evaluations over the six-year period and were included in the list that the consensus panel rated.
Overall, ten faculty met the criteria to be rated “high.” Eight of the ten were rated high by all residents, and the other two by 13 and 14 residents, respectively. Four of the ten were general internists. Eight were men and two were women, which reflects our faculty demographics. Five faculty met our consensus criteria for a “low” rating, including one general internist and one woman.
Teaching evaluations were received from 96% of the students. The overall mean teaching rating of the teachers rated high was 4.68 on the five-point scale (SD = 0.22, range 4.23–4.94). For the “medium” group, the mean teaching rating was 4.34 (SD = 0.32, range 3.4–4.92). For the “worst”-rated attending physicians, the mean rating was 3.56 (SD = 0.48, range 3.06–4.21). Mean differences between groups were highly statistically significant (p < 0.001). Forty-five students had had exposures to at least one low-rated and no high-rated attending physician; 219 had had exposures to at least one high-rated and no low-rated attending physician; and 220 had had exposures to neither a high- nor a low-rated attending physician. Our high-rated attending physicians were more often attending physicians on the general medicine inpatient services than were the low- or medium-rated faculty, hence the disparity in numbers of students per faculty.
Table 1 presents the least-square mean scores on the post-clerkship NBME subject examination in medicine and on USMLE Step 2, depending on exposure to high-, low-, or medium-rated instructors (least-square means are predicted means adjusted for USMLE Step 1 scores). As can be seen, students who worked with at least one of our consensus panel's highly rated instructors scored significantly higher on the post-clerkship NBME examination in medicine and USMLE Step 2.
Our findings once again confirm the association of better clinical teaching with better student examination performance, demonstrating in a quantitative fashion the outcomes of teaching. The effect sizes in this current project are much more substantial than those in our prior reports, amounting to one-fourth to one-third of a standard deviation, or, for example, up to seven or eight points on USMLE Step 2, versus the one-sixth to one-seventh standard deviation effect sizes of our prior reports. We attribute our stronger conclusions to the more refined method of this current project. First, our previous reports included all instructors, regardless of the numbers of students they had taught, and therefore all faculty were eligible to be included in the high- or low-rated category even if they had few student evaluations. For example, we may have included in our high category those faculty with only two or three ratings that were all high, when over time their ratings might have regressed towards a more stable mean that did not qualify them as such. In essence, we were categorizing some instructors as better or worse by using imprecise measures of their teaching ability. This imprecision would tend to add background “noise” to the analysis, attenuating our findings and effect sizes. Second, we disentangled learner outcomes from ratings by learners with our residents' consensus panel. As shown, attending physicians who were rated highly by the residents' consensus panel had significantly higher teaching ratings than did the medium- and low-rated instructors. Our previous method, relying on categorization solely by teaching rating, may have led to the exclusion of some otherwise excellent clinical teachers simply because they did not quite meet the “top 20% of student evaluations” we had required in our previous report to be considered a highly rated instructor.
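The effect-size arithmetic above, a between-group difference expressed as a fraction of a standard deviation, corresponds to Cohen's d with a pooled standard deviation. A generic sketch, with invented numbers for illustration only:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (Cohen's d) using the pooled SD.

    All inputs below are hypothetical; the study's actual group means
    and SDs are reported in its Table 1, not reproduced here.
    """
    pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                       / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled
```

For example, if a test's pooled SD were about 25 points, a difference of 8 points between exposure groups would correspond to d of about 0.32, i.e., roughly one-third of a standard deviation.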
Our findings seem to indicate that clinical teaching has an influence on outcomes, such as performance on USMLE Step 2. One might wonder how a short four-week exposure in a single discipline could influence USMLE Step 2 scores to such a degree, given that USMLE Step 2 comprises a wide variety of disciplines. Our answer is suggested by our model. From our experience as learners, the influence of a single outstanding instructor on one's approach to learning should not be underestimated. We suspect that the best teachers do not necessarily impart more factual information (facts which may be obsolete in a few years), but rather they engender a learning climate that makes learning fun, enjoyable, and exciting. They may do this by their example, by modeling the process of lifelong learning, by the joy they bring to their teaching, or by combinations of qualities such as these. Regardless, the learner's approach to learning is in some fundamental way changed, carrying over to the other clerkships and, we hope, to residency and beyond. Further studies should investigate the influence of outstanding teachers on life-long learning.
Several limitations to our study should be kept in mind as one interprets our results. This is a single-institution, single-discipline study, and national studies, as well as studies in other disciplines, are needed to establish the generalizability of our findings. In addition, our study focused on a single outcome measure, students' performances on NBME-type examinations, which capture but one aspect of clinical ability (knowledge). Future research should investigate the influence of teaching on other student outcomes, such as clinical skills, attitudes towards patients and the profession, and doctor–patient communication and relationships. Finally, this project's method did not lend itself as well to measuring the influence of residents' teaching on students' outcomes, so further studies are needed.
Nevertheless, despite these limitations, we conclude that attending physicians' teaching quality can have a measurable impact on students' examination performances. We therefore believe it is possible to begin considering learners' outcomes as an important measure of faculty's teaching ability, perhaps (with more study) an important addition to teaching portfolios and promotion dossiers. But even more, we believe our findings add to the growing literature on the critical importance of the educational mission that indicates students' learning would be jeopardized if the educational mission were to be compromised for fiscal reasons.
1. Watson RT, Romrell LJ. Mission-based budgeting: removing a graveyard. Acad Med. 1999;74:627–40.
2. Irby DM. What clinical teachers in medicine need to know. Acad Med. 1994;69:333–42.
3. Speer AJ, Elnicki DM. Assessing the quality of teaching. Am J Med. 1999;106:381–4.
4. Skeff KM, Stratos GA, Bergen MR, Sampson K, Deutsch SL. Regional teaching improvement programs for community-based teachers. Am J Med. 1999;106:76–80.
5. Wilkerson L, Sarkin RT. Arrows in the quiver: evaluation of a workshop on ambulatory teaching. Acad Med. 1998;73(10 suppl):S67–S69.
6. Skeff KM, Stratos GA, Berman J, Bergen MR. Improved clinical teaching: evaluation of a national dissemination program. Arch Intern Med. 1992;152:1156–61.
7. Blue AV, Griffith CH, Wilson JF, Sloan DA, Schwartz RW. Surgical teaching quality makes a difference. Am J Surg. 1999;177:86–9.
8. Griffith CH, Wilson JF, Haist SA, Ramsbottom-Lucier M. Relationships of how well attending physicians teach to their students' performance and residency choice. Acad Med. 1997;72(10 suppl):S118–S126.
9. Griffith CH, Wilson JF, Haist SA, Ramsbottom M. Do students who work with better housestaff in their medicine clerkship learn more? Acad Med. 1998;73(10 suppl):S57–S59.
10. Stern DT, Gill A, Gruppen LD, Grum CM, Woolliscroft JO. Do the students of good teachers learn more? Abstract presented at Research in Medical Education Annual Meeting, Washington, DC, November, 1997.
11. Friedman M, Stamper C. The effectiveness of a faculty development program: a process-product experimental study. Rev Higher Educ. 1983;7:49–65.
12. Marriott DJ, Litzelman DK. Students' global assessments of clinical teachers: a reliable and valid measure of teaching effectiveness. Acad Med. 1998;73:572–4.
13. Cohen J, Cohen P. Applied multiple regression correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, 1983.
Research in Medical Education: Proceedings of the Thirty-ninth Annual Conference. October 30 - November 1, 2000. Chair: Beth Dawson. Editor: M. Brownell Anderson. Foreword by Beth Dawson, PhD.