In this issue, three related articles appear that focus on the important topic of student assessment. Although these articles differ substantially in their approach to data collection, and although they address different areas of assessment, the reports by Bloodgood and colleagues,1 Hauer and colleagues,2 and White and colleagues3 remind us that assessment, whether formative or summative, has the potential to serve many important educational roles. Although some have asserted that testing itself can enhance learning,4 assessment should not be seen as an end in itself. When we assess students, we should do so considering carefully the specific purposes that our assessments serve, and we should be certain to rely on approaches and methods of assessment that fulfill their full potential to reinforce our educational goals. Moreover, the benefits of assessment should justify its costs.
In the first of these reports, Bloodgood and his colleagues1 present a comprehensive prospective evaluation of the impact of the conversion at the University of Virginia School of Medicine from a graded system to a pass–fail system in the preclinical years. The authors are to be congratulated for having had the foresight to design a prospective study to assess a range of potential outcomes of this switch, both good and bad. Recognizing that eliminating grades might cause students to work less hard, attend class less often, or perform less well, these authors found instead—much to the relief of those of us who have a nongraded preclinical curriculum—that the change had no demonstrable deleterious effect on several different measures of learning and effort. Anticipating that the change to pass–fail grading might also increase student well-being and reduce perceived stress, Bloodgood and colleagues measured indicators of these outcomes as well and found significant improvements in these domains.
In considering these findings, we would raise two additional issues, the first involving an outcome that was missing from this paper, the second involving a potential downside of pass–fail grading that is often overlooked. Concerning the missing outcome measurements, although the authors noted that one of the main purposes of the switch from grades to pass–fail was to eliminate competition and to encourage cooperative and collaborative learning among students, surprisingly, they did not report the extent to which competition decreased or collaboration increased, nor did they report any measures, whether direct in the form of observation or indirect in the form of students’ self-report, of these critical outcomes.
At our school, whenever we attempt to modify or even reexamine our long-standing pass–fail preclinical grading system, our students remind us in no uncertain terms about the need to preserve the value of pass–fail grading in minimizing student competitiveness, which they perceive as a nefarious influence to be avoided at all costs. We wish the authors had indeed measured whether students began acting more cooperatively, formed more study groups, and/or became more supportive of one another’s learning efforts. We are convinced that the Virginia students felt better and, based on objective test scores, learned as much, but did the change in grading systems begin to transform the academic culture of the medical school, as we would hope that it did? We are also curious about whether the conversion from grades to pass–fail assessment was accompanied by any change in teaching emphases, for instance, from the conventional lecture approach to more interactive and small-group active learning formats, and whether such changes had a potentially confounding influence on the outcomes measured.
A second issue relates to the ability of faculty to monitor students’ performance and progress within a pass–fail system. Although every student who exceeds the passing threshold for each course is considered to have passed, the adoption of a nongraded system has the potential to limit the faculty’s awareness of students’ progress and their level of performance over time. If a student is struggling academically but maintaining exam scores consistently in the marginal range just above passing, such a pattern would not necessarily come to the attention of the faculty at large and advisors who oversee the student’s progress. As a result, faculty would be deprived of the warning they need to intervene preemptively and initiate appropriate remediation.
Recently, at Harvard Medical School, we began monitoring to identify students who, although they are passing, are performing consistently in the bottom quartile of the class on their course final exams. When such consistent marginal performance is recognized, the student’s society master (our equivalent of an academic advisor) is notified and meets with the student to determine whether he or she is in need of academic assistance, a formal remediation plan, or personal support. We make every effort to establish the fact that such feedback is not meant to be punitive but, instead, to be helpful and to fulfill our responsibility to our students.
Monitoring of student progress during the preclinical years and providing early intervention for students experiencing difficulties should not fall victim to the switch from letter grades to pass–fail grading. In addition, such monitoring, appropriately designed, need not undermine cooperative learning as long as students understand that low-passing final exam grades are being monitored exclusively to identify consistently weak academic performance as a signal for early intervention—whether academic or personal. The objective is to help students perform at their highest level and progress toward achieving their academic goals long before such marginal preclinical performance translates into inadequate preparation for, and performance during, the clinical years.
The second paper, by Hauer and her colleagues2 at the University of California, San Francisco, which focuses on assessment in the later years of students’ undergraduate medical careers, deals with assessment of clinical skills through the use of standardized patient (SP) exams, typically referred to as objective structured clinical examinations (OSCEs). Reporting the results of a national survey of US medical schools, Hauer and colleagues note that most (88%) of the 82 schools that responded conduct such an exam in years three or four and that, without exception, each school anticipates the continuation of the exam for at least another three years—that is, for the immediate future. Although no data were provided on the financial, administrative, or faculty-resource costs of these exams, even schools with the least elaborate exercises (reported range of 4–14 stations of 11–20 minutes each) must be expending considerable resources to conduct these assessments.
To return to our original theme, we should ask what purpose these 82 schools believe is being served in return for expending so much time, attention, administrative resources, and money on these assessments in the clinical years. Given that SP exams have been demonstrated to be valid indicators of future clinical performance,5,6 why do approximately one quarter of these schools require no form of remediation when students fail these exams, and why do fewer than half require passing a retest before graduation? One possibility, entertained by the study’s authors, is that some schools consider these exams as purely formative exercises, designed as practice for the United States Medical Licensing Exam Step 2 Clinical Skills exam. We wonder, however, why any school would go to the considerable effort to mount an OSCE if it did not trust sufficiently in its results or if the faculty had no plan to pursue and act on exam findings.
If we tested emergency medical technicians to determine whether they knew how to act in a standardized defibrillator simulation exercise and discovered that some of them did not quite know how or where to place the paddles, we could not imagine that we would allow such trainees to arrive at the scene of an actual emergency before we could guarantee that they had received remediation and achieved a passing grade. Regardless of whether the SP exercise is seen locally as formative or summative, the absence of a plan for follow-up, remediation, and reassessment after going to the trouble of conducting and scoring the exercise would seem to make the exercise virtually meaningless.
The question of why schools would assess without consequences or follow-up action would remain largely hypothetical if not for the third paper, by White and her colleagues3 at the University of Michigan Medical School. At Michigan, where passing each of their 13 OSCE stations is a requirement for students to graduate, remediation is required and given on a station-by-station basis. Remediation there takes a multifaceted approach in which students are required to do additional reading relevant to the content of the case and then view a video of their own performance side-by-side with a video of an “ideal” performance. Students who failed an OSCE station are then expected to provide a written reflection contrasting the difference in performance between their video and the ideal video. Although the data presented indicate that their remediation program was highly successful in helping students succeed when retaking the failed stations, the addition of written faculty feedback did not improve performance further, nor can we determine from the reported findings whether students gained meaningful insight into their ability. We would hope and assume that these issues will be studied further.
Taken together, these papers underscore three points about the role of assessment in medical education:
1. Because assessment has many functions, assessment also has many stakeholders. Assessment provides students feedback on where they stand and motivates them to master the material, at a minimum for the base purpose of “passing the test.” For the faculty, assessment most often drives curriculum, even though the reverse would apply in a perfect world. In addition, assessment allows faculty to gauge where students stand across a range of learning domains and obligates faculty to provide students with opportunities to demonstrate growth and development in knowledge, skills, and attitudes. Finally, although not much studied and with relatively little evidence to demonstrate the relationship,7 we consider self-evident the notion that society benefits from assessment that motivates and measures competence, because patients suffer each time we graduate an ill-prepared physician.
2. Assessments ought to be designed and targeted to yield truly valuable information. At the same time, we need to recognize that assessment can bring with it competitiveness and stress, and, therefore, assessment systems need to be devised to maximize beneficial, useful outcome data and minimize harmful influences on learning comfort and collaboration.
3. Collection of assessment data is a hollow exercise unless the data are used and consequences follow for subpar performance. To be sure, however, the consequences of identifying poor performance should not be punitive, but, instead, assessment ought to be followed by assistance and guidance, feedback, instruction, and remediation. The goals of assessment should be to help students demonstrate improved performance, to stimulate them to be more self-reflective and insightful about their strengths and weaknesses, to arm them with the appropriate tools for self-improvement, and to make them aware of resources for seeking help when necessary.
Assessment is never “neutral” but should always be designed to enhance student learning and to facilitate a comfortable learning environment. When necessary, assessment also allows faculty the opportunity—or, more accurately, allows them to fulfill their duty to students—to identify marginal performance and to provide support and remediation.