Medical schools increasingly have turned to comprehensive clinical skills assessments as a means of assessing students’ clinical competence,1 often because faculty are reluctant to report substandard performance in clerkships using traditional evaluation methods such as written evaluations by supervisors.2,3 With information from these examinations, medical educators face the challenge of determining a follow-up plan for learners with poor performances who thus require remediation. Remediation can be a complex, multistep process involving diagnosis of learner problems, remediation activities, and retesting.4 Remediation expenses are incurred on top of the already significant costs for the comprehensive assessment, including testing facilities and standardized patients (SPs) as well as faculty time to develop cases, monitor student performance, and provide feedback.
Medical schools’ investments in remediation may vary because of the resources required to build the skills of a small percentage of students who have not achieved proficiency by their senior year of medical school and because of the absence of guidelines about how best to accomplish remediation. With these challenges, it is not surprising that remediation requirements differ across students and institutions, but the factors associated with more structured remediation are not known.4 On the basis of previous research about how infrastructure relates to resident performance,5 we hypothesized that there could be an association between institutional commitment to the comprehensive assessment and consequences of poor performance for students, as shown in Figure 1. Institutional commitment to the comprehensive assessment includes factors such as years of experience with the examination, a comprehensive examination committee to engage faculty in case writing, examination administration, and remediation, and clerkship director involvement, all of which are crucial to assessment and remediation.6
Institutional commitment to enforcing consequences of poor performance on the comprehensive assessment may also reflect confidence in the scores and educators’ ability to take action based on those scores. The challenge of identification and remediation of poor performance is daunting at all levels of practice. For practicing physicians, the current system is characterized by ad hoc problem identification and by the lack of readily available, objective, and valid evaluation measures or a system to respond to problems that are identified.7 Supervisors of practicing physicians and trainees often lack training or experience in addressing major performance problems, and thus the default is either to do nothing or to remediate without confidence in the process.2,7 These factors are all limitations of the current system. Figure 1 shows how satisfaction with and confidence in the remediation process could relate to the willingness to impose consequences and influence how a school operationalizes its remediation program. On the basis of our prior work, we believe that the specific consequences of failure that may result from poor performance on a comprehensive assessment are required remediation, required retesting, and external reporting of failing performance in the medical school performance evaluation (dean’s letter), although this prior work did not explore what characteristics of the school or the comprehensive assessment were associated with implementing these consequences.4
Our hypothesis was that institutional commitment, length of experience with the exam, clerkship director involvement, and satisfaction with and confidence in the remediation process are associated with a strong commitment to remediation and required consequences. The purposes of this study were
1. To characterize the consequences for students of poor performance on a comprehensive assessment using SPs, and
2. To determine whether medical school investment in the comprehensive assessment and satisfaction with and confidence in the remediation process relate to required consequences for poor performance.
In this cross-sectional descriptive study, we surveyed medical school curriculum deans of the 122 U.S. medical schools approved by the Liaison Committee on Medical Education at the end of 2006. Demographic information collected on institutional characteristics included public versus private (UnivSource), region, research funding, and number of enrolled students as reported by the Association of American Medical Colleges (AAMC). An online cover information sheet informed participants that we were “conducting a study to evaluate the current status of clinical skills assessment at medical schools nationally” and served as a waiver of informed consent. The University of California, San Francisco, institutional review board approved the study.
We obtained curriculum deans’ names and e-mail addresses from the online directory maintained by the AAMC Group on Educational Affairs and verified them by reviewing medical school Web sites and through investigators’ personal knowledge. All five study authors developed the survey questions based on our prior work.1,4 Two educators experienced with SP assessment pilot-tested the survey questions for clarity; neither was a study author. We administered the survey through an online survey program after an e-mail invitation to participate; nonrespondents received up to five follow-up e-mail requests.
In the survey, a “comprehensive assessment” was defined as a multistation, cross-disciplinary examination outside of a single clerkship involving SPs. The survey employed skip logic; there were between 13 and 39 items depending on skip patterns, so that respondents who answered “no” to certain questions did not see follow-up questions on the same topic. Questions addressed purposes of the examination, examination characteristics, examination leadership (committee members and functions), consequences of failing the examination, impact of the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS) exam, and plans to continue or discontinue the comprehensive clinical skills exam during the next three years. Participants viewed a list of eight potential consequences of failing based on prior interviews4 and were asked to rate on a 1–5 scale (1 = never, 5 = always) how commonly they employed those consequences. The types of consequences fell into three categories: required remediation, required retesting, and external reporting of failing performance, as shown in Table 1.
On the basis of a model describing how infrastructure contributes to resident performance,5 we conceptualized institutional commitment to the comprehensive assessment by creating variables related to three components of commitment:
1. Years of experience administering the exam (ranging from 1 to 21).
2. Functions of a comprehensive assessment committee in developing and administering the exam (eight possible functions [case writing, checklist development, selecting exam scoring method, standard setting, validating exam scores, determining who failed exam, determining who needs remediation, developing individual remediation plans] scored dichotomously as yes/no, yielding a score of zero to eight; reliability [Cronbach alpha] = 0.95).
3. Clerkship director involvement in case or checklist development, exam scoring, or remediation (four items scored on a 0–4 scale ranging from 0 = never to 4 = always, for a maximum total score of 16; reliability [Cronbach alpha] = 0.83).
Factor analysis indicated that these three components shared little variance in common; experience was unrelated to either committee functions (r = 0.1) or clerkship director involvement (r = 0.006), and the latter two were modestly correlated with each other (r = 0.48). We therefore maintained these three measures as separate components of institutional commitment.
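To illustrate how a reliability coefficient such as those reported for the committee function and clerkship director scales can be computed, the following Python sketch calculates Cronbach alpha for a set of summed items. The response values are hypothetical and invented only for illustration; they are not the survey data. The function name `cronbach_alpha` is likewise an illustrative choice, and the study itself used SPSS rather than this code.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach alpha for item scores.

    rows: one list of item scores per respondent, all of equal length.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    # Variance of each respondent's summed score.
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# HYPOTHETICAL responses: four clerkship director involvement items,
# each on the 0-4 scale (0 = never, 4 = always), one row per school.
hypothetical = [
    [4, 4, 3, 3],
    [1, 0, 1, 0],
    [3, 3, 2, 2],
    [2, 2, 2, 1],
    [0, 1, 0, 0],
]

# Summed involvement score per school (maximum 16).
involvement_scores = [sum(r) for r in hypothetical]
alpha = cronbach_alpha(hypothetical)
print(involvement_scores, round(alpha, 3))
```

The same function applies to the eight dichotomous committee function items if each "yes" is coded 1 and each "no" is coded 0. Because the formula depends only on a ratio of variances, using population or sample variance gives the same result as long as the choice is consistent.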
Two questions addressed satisfaction with (I am satisfied with our school’s remediation process) and confidence in (I am confident that our school’s remediation process is effective) the school’s remediation process, on a Likert scale (1 = strongly disagree, 5 = strongly agree). These two items were intercorrelated (r = 0.84), and principal components factor analysis indicated that they contributed identical weight to a single factor. Therefore, we averaged these two variables for an overall measure of satisfaction with remediation after the comprehensive examination.
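The combination of the two items described above can be sketched as follows: compute the Pearson correlation between the satisfaction and confidence ratings, then average them into one score per school. The ratings below are hypothetical and are not the survey responses; `pearson_r` is an illustrative helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL Likert ratings (1 = strongly disagree, 5 = strongly agree).
satisfaction = [4, 2, 5, 3, 3, 1]
confidence = [4, 3, 5, 3, 2, 1]

r = pearson_r(satisfaction, confidence)

# Overall measure: the mean of the two items for each school.
composite = [(s + c) / 2 for s, c in zip(satisfaction, confidence)]
print(round(r, 2), composite)
```

For any two positively correlated items, the first principal component of their 2 × 2 correlation matrix weights both items equally, which is consistent with using a simple average.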
After obtaining descriptive statistics for each variable, we verified that the scores we created were normally distributed. Multiple regression addressed the relationship between the dependent variables (consequences of performance on the examination) and independent variables related to commitment to and satisfaction with remediation. Three regressions were conducted for the three types of consequences: required remediation, required retesting, and external reporting of failing performance. Demographic variables were nonsignificant, explained a small amount of variance, and were not included in the regressions. We used SPSS© version 14.0 (SPSS Inc., Chicago, Illinois) for all statistical analyses.
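As a minimal sketch of what "variance explained" means in a multiple regression, the Python code below fits an ordinary least squares model through the normal equations and reports R². It is not the SPSS procedure used in the study, it performs no significance testing, and the predictor and outcome values are hypothetical, chosen only to make the example runnable. The names `solve` and `ols_r_squared` are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_r_squared(predictors, outcome):
    """Return (coefficients with intercept first, R squared)."""
    X = [[1.0] + list(row) for row in predictors]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, outcome)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return beta, 1 - ss_res / ss_tot


# HYPOTHETICAL data: (years of exam experience, satisfaction score) per school
# and a hypothetical consequences score as the dependent variable.
hypothetical_predictors = [
    (2, 2.5), (5, 3.0), (8, 4.0), (12, 3.5),
    (3, 4.5), (15, 4.0), (7, 2.0), (10, 5.0),
]
hypothetical_outcome = [4, 6, 9, 9, 8, 11, 5, 12]

beta, r_squared = ols_r_squared(hypothetical_predictors, hypothetical_outcome)
print([round(b, 3) for b in beta], round(r_squared, 3))
```

R² is the proportion of variance in the dependent variable accounted for by the fitted model, which is the quantity reported as a percentage of variance explained. Solving the normal equations directly is adequate for a small illustration; statistical packages use more numerically stable methods.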
Ninety-three of the 122 curriculum deans (76%) responded to the survey. There were no significant differences between responding and nonresponding schools in terms of geographic region, public/private status, enrollment size, or size of research funding.
Eighty-two of 93 respondents (88%) conducted a comprehensive assessment, on average for 8.1 years (standard deviation [SD] 5.1). Twenty-two (26%) had initiated their examination within the last three years. Examinations consisted of an average of 8.6 encounters (SD 2.5, range 4–14), with each encounter lasting 11 to 20 minutes in 71 schools (86%). All 82 respondents with a comprehensive assessment anticipated continuing the exam for at least the next three years. The 11 schools without comprehensive assessments most commonly cited inadequate funds (6, or 55%), insufficient faculty time (5, or 45%), satisfaction with current training and assessment methods using SPs (4, or 36%) or direct observations (1, or 9%) during clerkships, and use of other assessment methods (4, or 36%) as reasons for not having a comprehensive assessment.
Consequences of failing the comprehensive assessment
The comprehensive assessment was required at nearly all (80, or 98%) schools with an assessment. However, required remediation before graduation was the only consequence reported by more than half of schools (61, or 74%), as shown in Table 1. Six of the eight potential consequences were each used at fewer than 15% (between 1 and 10) of schools.
Institutional commitment to the comprehensive assessment
Forty-four of the 82 schools (54%) with comprehensive assessments had exam committees, most commonly including the exam director (37, or 84%), SP trainer or other clinical skills center leadership or staff (39, or 89%), clerkship directors (27, or 61%), or other educational leaders (27, or 89%). Exam committee members at most schools participated in exam development (case writing, checklist development), standard setting and grading, and developing remediation plans for failing students. Among the 82 schools with comprehensive assessments, clerkship directors “always or almost always” participated in the following comprehensive assessment activities: case writing (37, or 45%), checklist development (36, or 44%), exam scoring (24, or 29%), and remediation (23, or 28%).
Satisfaction with remediation
Participants were somewhat satisfied with their schools’ remediation process (mean 3.45, SD 1.08) and confident that the remediation process was effective (mean 3.37, SD 1.17).
Table 1. Consequences for Students of Failing the Comprehensive Assessment at 82 Medical Schools Requiring the Examination, Based on a 2006 Survey of Curriculum Deans
Relationship of satisfaction with remediation and institutional commitment to consequences
Table 2 provides the descriptive statistics for the variables we used in the multivariate analysis of relationships to the level of consequences that an institution imposed for poor performance on the comprehensive assessment.
We conducted three regressions to examine relationships to the consequences of failing the comprehensive assessment. The first regression, examining the consequence of required remediation, included the variables that were significant on univariate analysis: exam duration, exam committee function score, clerkship director involvement score, and satisfaction with remediation. Only satisfaction with remediation significantly predicted required remediation (P = .003), explaining 17.6% of the variance. In the next regression, examining the consequence of required retesting, only satisfaction with remediation was significantly associated with required retesting (P < .001), explaining 23.5% of the variance. In the third regression, none of the variables was significantly related to external reporting of failing performance on the comprehensive assessment.
We repeated the regression for the total consequences score (summing the three types of consequences). Overall, 25.7% of the variance in consequences of performance on the comprehensive clinical examination was explained by measures of institutional commitment (years of conducting an examination, exam committee function, clerkship director involvement) and satisfaction with remediation. However, only years of using the exam (P = .015) and satisfaction with remediation (P < .001) contributed significantly to the variance.
Our results characterize the role of the comprehensive assessment in ensuring medical students’ clinical skills competence. Most medical schools that participated in our study (88%, or 82/93) now conduct a comprehensive assessment using SPs after the core clerkships, up slightly from 84% (76/91) in our 2004 survey,1 and the deans who responded to our survey unanimously endorsed plans to continue their comprehensive assessment programs. In this study, we learned about the infrastructure that institutions have created for the examination, including committees and programs for remediation and retesting, and their investments in remediation, such as faculty and staff time and use of SPs. The findings from this study also shed light on the ways that medical schools respond to performance results by providing additional instruction or aid to students, particularly the consequences they impose for poor performance.
Despite the substantial resource investment in creating a robust SP examination, only a minority of respondents restricted academic progress on the basis of exam results. Of the eight potential consequences developed from previous research,4 only one, “required remediation,” was used by more than half of our respondents. The consequences that might help a student develop essential skills for the SP portion of the USMLE and for subsequent clinical training, such as adding relevant clinical rotations or delaying the licensure examination, were each required in fewer than one in seven schools. Studies documenting correlation of performance on an SP licensing exam with subsequent practice performance should compel medical schools to remediate deficiencies that may impact the quality of care graduates will provide in their future practice.8,9 Similarly, it would be helpful to track students who perform poorly on the comprehensive assessment, including those who do and do not receive remediation, through their USMLE Step 2 CS exams and into residency training.
Our findings identify factors associated with some institutions attaching more consequences to inadequate student performance than others, and suggest which factors are not associated with imposing consequences of failing. Both satisfaction with remediation and duration of experience with the examination are related to attaching consequences to poor performance. Continuous monitoring of a program is a crucial mechanism to achieve quality control in medical education and to ensure that educational activities achieve their intended outcomes.10 Our finding that medical school leaders plan to continue the examination offers the potential for further improvements associated with increased experience with the examination that would benefit remediation as well. In our study, the finding that comprehensive assessment programs with a longer history had developed more consequences of exam performance suggests progressively greater institutional investment in the process over time. Development of any complex new program commonly matures in stages, from basic core elements to sophisticated components.11,12 With more institutional experience using a comprehensive assessment, more faculty members understand the program and the need for intervention for poor performance.
Of the potential reasons for the paucity of consequences of a high-stakes examination in our model, satisfaction with remediation provided a potential explanation. Respondents who reported greater satisfaction with and confidence in their exam attached more consequences to the score, although overall satisfaction with and confidence in current remediation processes were slightly above the neutral level. This lukewarm endorsement of their own remediation efforts may explain the minimal consequences attached to failing the exam. Whereas satisfaction and confidence likely increase with greater experience with the examination format, satisfaction with remediation and experience with the exam each independently related to consequences in our study, and more satisfied schools were not necessarily those with longer experience. However, with increasing resources available to aid schools in developing high-quality SP programs,13 our findings may reassure schools with relatively new comprehensive assessments that extensive experience is helpful but not essential to generate confidence in a remediation process sufficient to attach consequences to student performance. The implementation of these consequences may itself further enhance trust in the exam and remediation processes through confirmation of at least some students’ skills improvement.
It is somewhat surprising that measures of clerkship director involvement did not relate significantly to consequences of the exam. Clerkship directors are key educators during the time that students acquire the substantial clinical experiences that prepare them for the comprehensive assessment. In addition, clerkship directors typically compile assessment information that determines a student’s final clerkship evaluation. Our findings suggest dissociation between clerkship directors’ involvement in the exam and consequences. It is possible that clerkship directors’ roles are confined to clerkship-level evaluation. Alternatively, at some institutions, clerkship directors’ participation in the comprehensive assessment may occur more centrally in a related role through the dean’s office, and indeed we found that at many schools, a clerkship director participates on a comprehensive assessment committee. Nonetheless, our findings point to the potential for greater clerkship director involvement to augment the impact of the exam. This would relate to the alignment of values between key educational leaders, which has been shown to enhance institutional performance.14
Schools required remediation more frequently than they retested students who did not pass the comprehensive assessment. We suspect that retesting was not related to satisfaction with or confidence in remediation because schools are relying on the USMLE Step 2 CS exam as a retest. The infrequent use of retests may also reflect institutions’ reluctance to commit additional resources and assume responsibility for proving the efficacy of remediation or restricting a student’s progress to graduate medical education.
Overall, our findings suggest that most schools do not hold themselves accountable for bringing students up to a performance standard after the comprehensive assessment because, although the majority require remediation, most do not consistently require retesting or report the results externally. At the national level, leaders have advocated through mechanisms such as the No Child Left Behind Act that schools should be held more accountable for the performance of all of their students as they progress through their education and into the workforce.15 It is possible that medical school leaders trust the students or residency program directors to address clinical skills deficiencies during subsequent training. However, this assumption is challenged by evidence that clinical skills continue to deteriorate during graduate medical training.16,17 Alternatively, institutions may choose to allocate resources to programs that will benefit larger numbers of students, or they may be uncertain as to effective remediation strategies. Some medical school leaders may even view the comprehensive assessment as a formative exercise, designed to prepare students to pass the USMLE Step 2 CS exam, and they may expect students to respond independently to the feedback provided by the experience. Nonetheless, medical schools invest significant time and resources into comprehensive assessments, and they have an obligation to students, faculty, and society to conduct program evaluation to determine whether their programs are “successful,” remediate students with identified deficiencies, and track outcomes during and beyond medical school.
This study has several limitations. Our definitions of institutional commitment and clerkship director involvement may not have captured all aspects of exam infrastructure and faculty activities and may not have weighted those contributions appropriately. We did not verify clerkship director involvement with the clerkship directors themselves. We grouped similar consequences, but we do not know which consequences may be perceived as more effective or important by individual institutions or students. It might be more discriminative to weight the consequences based on their severity or impact on the learner. It is possible that nonresponders have a different experience, although our overall response rate was high.
Comprehensive clinical skills assessment programs are increasingly prevalent in undergraduate medical education, and institutions are gaining experience with this form of competency assessment. As programs mature, institutions gain confidence in the score data and attach consequences to failing performances. However, our results can empower all schools with comprehensive assessments, regardless of the duration of their experience, to realize that attaching consequences to poor performance enhances the impact of the program for educators and, presumably, for future practitioners. Satisfaction and confidence also relate to willingness to act decisively to improve students’ performance. These remediation efforts have the potential to enhance the quality of all graduates.
The authors thank the Josiah Macy, Jr. Foundation for funding; Josephine Tan for assistance with literature searching; Varun Saxena for help with the figure; and the participating schools.