Medical schools have a responsibility to provide high-quality learning experiences for their students. Consequently, most schools ask students to rate the quality of their learning experiences, and schools then use this feedback in efforts to improve future ratings.1–3 Yet, surprisingly little attention has been given to studying which components of a curriculum influence students' overall ratings.
For the purpose of quality improvement, at the University of Calgary we ask our medical students at the completion of each undergraduate course to rate various individual parts of the curriculum. In this study, our goal was to use these data to study the association between components of the curriculum and students' overall ratings of a course. But, before we could address this issue, we first had to examine the structure and stability of our rating tool, because previous research has suggested that students frequently misinterpret survey items and that rating scales used for evaluation are prone to systematic biases.3 Given the significant differences in instructional methods between preclerkship courses and clerkship, we decided to limit our analyses to our preclerkship curricula. Thus, our first objective was to identify the principal components of our preclerkship curricula, and our second objective was to study the association between these components and students' overall ratings of each course.
The conjoint health research ethics board at the University of Calgary approved this cross-sectional observational study of first- and second-year medical students. At the University of Calgary, we have a three-year clinical presentation curriculum.4 The first two preclerkship years include seven combined systems courses (Blood & Gastrointestinal; Musculoskeletal & Skin; Cardiology & Respirology; Renal, Endocrine & Obesity; Neurosciences & Aging; Children & Women's Health; and Mind & Family). Instructional methods for all of our preclerkship courses include lectures, small-group learning, and bedside teaching, during which we use diagnostic schemes to guide problem solving and to provide structure for basic science and clinical knowledge.5
During the 2007–2008 academic year, we asked all students to provide feedback on seven courses taught during the first two years of our program. After the summative assessment for each course, we administered an online course evaluation, on which students rated the quality of various aspects of the course. Participation was voluntary and anonymous, and we closed the survey before students received their final grades for the course. The survey and administration process used during the study period had been in place since the 2004–2005 academic year.
The survey tool asked students to rate the content and delivery of the course material (20 items) and the evaluation process (5 items). Each item was presented in a Likert-type format with a response set ranging from 1 (strongly disagree) to 5 (strongly agree). For the overall course rating we used an additional item with a similar scale, but we changed the descriptors (1 = poor to 5 = excellent). The course surveys contained 26 common items.
In keeping with previous studies, we performed an exploratory principal component factor analysis with varimax rotation to examine the dimensionality of our instrument.6,7 We did not include the overall rating of the course in our factor analysis; instead, we reserved this item as an outcome variable for subsequent analyses. We labeled factors inductively rather than with a priori codes, basing the labels on both the grouping of individual items and the magnitude of their loadings.
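The retention rule applied in such analyses (keep components whose eigenvalues exceed 1, the Kaiser criterion) can be illustrated with a small sketch. The correlation matrix below is hypothetical, not our survey data, and we performed the actual analysis in a statistics package:

```python
# Illustrative sketch of the Kaiser retention rule (eigenvalue > 1) used when
# deciding how many components to keep in an exploratory factor analysis.
# The 3x3 correlation matrix is made up; it is not our survey data.

def power_iteration(matrix, iters=200):
    """Return the dominant eigenvalue of a small symmetric matrix."""
    n = len(matrix)
    vec = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iters):
        # Multiply the matrix by the current vector.
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
        # The Rayleigh quotient estimates the corresponding eigenvalue.
        eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(n))
                         for i in range(n))
    return eigenvalue

# Hypothetical correlation matrix for three survey items.
corr = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0]]

leading = power_iteration(corr)
# Under the Kaiser criterion, a component is retained if its eigenvalue > 1.
print(f"leading eigenvalue: {leading:.2f}", "-> retain" if leading > 1 else "-> drop")
```

In practice the eigenvalue rule is read alongside a scree plot, as we did, because the criterion alone can retain weak, uninterpretable factors.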
We estimated internal consistency of the factors using Cronbach's alpha, and we assessed the relationship between individual factors and overall course rating using multiple linear regression (Stata 8.0, StataCorp LP, College Station, Texas, 2003). In our regression model we also included the interaction between each factor and year of study (first versus second year).
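Cronbach's alpha follows the standard formula α = k/(k−1) · (1 − Σ item variances / variance of respondent totals). A minimal sketch (the Likert ratings below are invented for illustration; our actual computation was done in Stata):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns.

    items: list of k columns, each holding the scores that the
    respondents gave to one survey item.
    """
    k = len(items)
    item_var = sum(pvariance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical 1-5 Likert ratings from five respondents on three items.
item_scores = [
    [4, 5, 3, 4, 2],  # item 1
    [4, 4, 3, 5, 2],  # item 2
    [5, 4, 2, 4, 3],  # item 3
]
print(round(cronbach_alpha(item_scores), 2))  # → 0.86
```

Population variance is used throughout; because the same variance definition appears in numerator and denominator, the sample/population choice cancels and does not change alpha.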
Our students completed 625 of a possible 1,022 surveys, an overall return rate of 61.2%. The response rate by class ranged from 64.5% (391/606) for first-year students to 56.3% (234/416) for second-year students. To identify the factors contained within our instrument, we examined both the eigenvalues (retaining those greater than 1) and the scree plot. An initial review suggested a five-factor solution; however, the fifth factor lacked clarity and included no items that loaded at our required level of 0.65, so we generated a four-factor solution instead. Although the five-factor solution accounted for slightly more variance, the four-factor solution was more interpretable and meaningful, and we therefore selected it for use in this study. The four factors (assessment of students, small-group learning, basic science teaching, and teaching diagnostic approaches) accounted for 30.0%, 8.7%, 6.2%, and 5.3% of the total variance, respectively. Similar factors were identified when the analysis was performed on each class separately.
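The figures reported above can be verified with simple arithmetic (numbers copied from the text; this is only a check, not part of the analysis):

```python
# Quick arithmetic check of the reported return rate and variance figures.
returned, possible = 625, 1022
print(f"overall return rate: {returned / possible:.1%}")  # → 61.2%

# Percentage of total variance explained by each of the four retained factors.
variance = [30.0, 8.7, 6.2, 5.3]
print(f"four factors together: {sum(variance):.1f}%")  # → 50.2%
```

Together the four retained factors thus account for about half of the total variance in the 25 rated items.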
Of the 25 survey items, 5 loaded on assessment of students, 6 loaded on the other three factors (2 on each factor), and 14 items did not reach the loading threshold of 0.65, meaning that they did not substantially contribute to any of the factors (Table 1). The unweighted mean scores (out of 5) for each factor ranged from 3.41 for teaching diagnostic approaches to 4.11 for small-group learning, and the alpha coefficients ranged from 0.71 for basic science teaching (2 items) to 0.88 for assessment of students (5 items).
The mean overall rating of the seven courses was 3.41. In our regression model, we found significant interactions between year of study and each of the factors. For first-year students, each factor was significantly associated with the overall rating for the course, whereas for second-year students, only one factor—assessment of students—was associated with overall rating. The results of our regression analyses, stratified by year of study, are shown in Table 2.
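The structure of this model, in which overall rating is regressed on each factor score plus its interaction with a year-of-study dummy, can be sketched in miniature. The data below are fabricated so that ordinary least squares recovers known coefficients exactly; our actual model was fit in Stata on the survey data:

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (XᵀX)·β = Xᵀy."""
    x = [[1.0] + row for row in rows]  # prepend an intercept column
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)

# Fabricated data: factor score, year dummy (0 = first year, 1 = second year),
# and their interaction. The outcome follows
#   rating = 1 + 2·score + 0.5·year + 1.5·score·year
# exactly, so OLS should recover those four coefficients.
data = [(s, yr) for s in (1.0, 2.0, 3.0, 4.0) for yr in (0.0, 1.0)]
rows = [[s, yr, s * yr] for s, yr in data]
rating = [1 + 2 * s + 0.5 * yr + 1.5 * s * yr for s, yr in data]

beta = ols(rows, rating)
print([round(b, 3) for b in beta])  # → [1.0, 2.0, 0.5, 1.5]
```

A nonzero interaction coefficient, as in this toy example, is what licenses stratifying the analysis by year of study: the slope relating a factor to the overall rating differs between first- and second-year students.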
We identified four factors that contributed to students' overall rating of an undergraduate course—three related to instruction and one to assessment. But, despite this three-to-one ratio, students' rating of their assessment had the strongest association with their overall rating for a course, and, for second-year students, this was the only variable that was associated with the overall rating. So why are students' ratings of a course dominated by their perceptions of their assessment?
Perhaps the simplest explanation for the influence of assessment is the “recency effect,” which describes the cognitive bias of giving disproportionate salience to recent stimuli.8,9 Here, we asked students to rate the course immediately after their summative assessment, so assessment was the most recent stimulus at that point. But the recency effect refers to the recall of information and how recall performance is influenced by the time elapsed since working memory processed that information. A student's rating of a course is surely more than a simple recall task.
Preparing for assessment and being assessed both augment learning—a phenomenon referred to as the “testing effect.”10–12 Thus, the association between assessment and overall rating might simply be an acknowledgment by students of how the assessment process contributed to their learning. But this also seems an inadequate explanation for our findings, because this would imply that the contribution of assessment to learning is greater than all other aspects of the curriculum combined.
Learning, and being assessed, can be pleasant or unpleasant, activating or deactivating.13 In other words, these are emotional experiences. Consequently, rating of these experiences is likely to be part emotional and part rational. According to Kahneman et al,14 we judge our emotional experiences according to the “peak-end rule,” that is, how they were at their peak—pleasant or unpleasant—and how they ended. Impressions of both the peak and end of an experience override the weighted average of continuous assessments made throughout the duration of the experience.15 Each of our preclerkship courses ended with a formal assessment, which could explain why assessment attenuated the effect of preceding experiences. But assessment didn't simply attenuate the other experiences; it overshadowed them. Given that the summative assessment was likely the most threatening feature of the curriculum, this primacy of evaluation is consistent with the theory of negativity dominance in emotional experiences, which states that the emotional impact of positive and negative experiences is not equally weighted; instead, negative experiences are much more potent.16 The emotional impact of negative stimuli is also related to emotional health, and the increasing negativity dominance seen in second-year students may reflect the deterioration in emotional health during training.17
Our study has some important limitations. With a response rate of approximately 60%, our sample may not be representative of the class as a whole. This was a single-institution study of preclerkship students that used an end-of-course survey with closed questions. Consequently, our results may not generalize to other institutions. And, even within our own institution, we don't know how our results would change if we repeated this study during clerkship, changed the time of administration of the survey, or adjusted our questionnaire items. Although we tried to sample students' perceptions of quality—rather than difficulty—of the assessment by asking specific questions related to quality, and closing our survey before releasing students' grades, these measures don't guarantee objectivity in rating. Hence, students' attitudes toward assessment quality may still reflect their sense of how they performed—to some, the only fair test is an easy one. Additionally, factors that emerge from a factor analysis are dependent on the number of items included in the survey and the relationship among them. Assessment of students proved to be a major factor in that it made up 5 of the 25 items that were factor analyzed. Consequently, this finding may not necessarily reflect the true significance of assessment in the range of things students consider when evaluating a course. Items included on the survey reflected important elements of the course from our perspective. Had we anticipated that other aspects of a course were important to our students, we would have included items reflecting those additional features as well. Given these limitations, further work to confirm and explain our findings is needed.
As medical educators, it is somewhat disheartening to discover that three months of carefully planned instruction, not including preparation time, are dwarfed by the final three hours reserved for the assessment of students. This, however, does not imply poor-quality instruction, because the mean rating for our preclerkship courses was above 3.0, representing a rating of “good.” Instead, this reflects the inescapable fact that learning and assessment are emotional experiences, perceptions of which are governed by the peak-end rule, and that disproportionate weight is given to negative emotional experiences, such as summative assessments. Rather than rail against this, we would suggest two important messages. First, when the rating of student assessment is included in the end-of-course evaluation, we may lose valuable information on the other course components. Second, assessment of students should not be an afterthought, and effort directed to creating a transparent and valid assessment process is likely to be rewarded in end-of-course evaluations.18
In this study, we identified four principal components of our undergraduate courses: three related to instruction and one to assessment. Students' rating of their assessment had the largest influence on the overall rating of a course, likely because this was not only the final experience of the course but also the most “negative” experience. This finding should serve as a caution not to underestimate the significance of the assessment process. Clearly, our students don't.
McLaughlin K, Woloschuk W, Coderre S, Wright B. Students' perceptions of their evaluation dominate overall course ratings. Abstract presented at the annual meeting of the Association for Medical Education in Europe, Glasgow, Scotland, 2010.
1 Kogan JR, Shea JA. Course evaluation in medical education. Teach Teach Educ. 2007;23:251–264.
2 Abrahams MB, Friedman CP. Preclinical course-evaluation methods at U.S. and Canadian medical schools. Acad Med. 1996;71:371–374. http://journals.lww.com/academicmedicine/Abstract/1996/04000/Preclinical_course_evaluation_methods_at_U_S_and.15.aspx. Accessed March 8, 2011.
3 Billings-Gagliardi S, Barrett S, Mazor M. Interpreting course evaluation results: Insights from think-aloud interviews with medical students. Med Educ. 2004;38:1061–1070.
4 Mandin H, Harasym P, Eagle C, Watanabe M. Developing a “clinical presentation” curriculum at the University of Calgary. Acad Med. 1995;70:186–192. http://journals.lww.com/academicmedicine/Abstract/1995/03000/Developing_a__clinical_presentation__curriculum_at.8.aspx. Accessed March 8, 2011.
5 Mandin H, Jones A, Woloschuk W, Harasym P. Helping students learn to think like experts when solving clinical problems. Acad Med. 1997;72:173–179. http://journals.lww.com/academicmedicine/Abstract/1997/03000/Helping_students_learn_to_think_like_experts_when.9.aspx. Accessed March 8, 2011.
6 Boor K, Scheele F, van der Vleuten CP, Scherpbier AJ, Teunissen PW, Sijtsma K. Psychometric properties of an instrument to measure the clinical learning environment. Med Educ. 2007;41:92–99.
7 Schönrock-Adema J, Heijne-Penninga M, Van Hell EA, Cohen-Schotanus J. Necessary steps in factor analysis: Enhancing validation studies of educational instruments. The PHEEM applied to clerks as an example. Med Teach. 2009;31:e226–e232.
8 Highhouse S, Gallo A. Order effects in personnel decision making. Hum Perform. 1997;10:31–46.
9 Constable KA, Klein SB. Finishing strong: Recency effects in juror judgments. Basic Appl Soc Psych. 2005;27:47–58.
10 Carrier M, Pashler H. The influence of retrieval on retention. Mem Cogn. 1992;20:633–642.
11 Roediger HL, Karpicke JD. Test-enhanced learning: Taking memory tests improves long-term retention. Psychol Sci. 2006;17:249–255.
12 Gates AI. Recitation as a factor in memorizing. Arch Psychol. 1917;6:40.
13 Feldman Barrett L, Russell JA. Independence and bipolarity in the structure of current affect. J Pers Soc Psychol. 1998;74:967–984.
14 Kahneman D, Fredrickson BL, Schreiber CA, Redelmeier DA. When more pain is preferred to less: Adding a better end. Psychol Sci. 1993;4:401–405.
15 Miron-Shatz T. Evaluating multiepisode events: Boundary conditions for the peak-end rule. Emotion. 2009;9:206–213.
16 Rozin P, Royzman EB. Negativity bias, negativity dominance, and contagion. Pers Soc Psychol Rev. 2001;5:296–320.
17 Schwenk TL, Davis L, Wimsatt LA. Depression, stigma, and suicidal ideation in medical students. JAMA. 2010;304:1181–1190.
18 Coderre S, Woloschuk W, McLaughlin K. Twelve tips for blueprinting. Med Teach. 2009;31:322–324.