Frye, Ann W.; Carlo, Mandira Das; Litwins, Stephanie D.; Karnath, Bernard; Stroup-Benham, Christine; Lieberman, Steven A.
As medical schools revise preclinical curricula to emphasize active learning, clinical relevance of the basic sciences, and early clinical experiences, critical evaluation of the results of the changes is important. Such changes in preclinical curricula are expected to help students develop better skills in communication, interpersonal relationships, critical thinking, and other areas essential to the practice of medicine, resulting in better preparation to begin clinical clerkships. How does changing foundational aspects of preclinical curricula affect students' preparedness for clinical work? How can that be assessed?
Performance on USMLE Step 1 is certainly the most visible outcome of preclinical education. Although Step 1 is commonly taken just before clinical clerkships begin, its scores are unlikely to reflect the effects of all curricular changes. Changes such as adopting small-group, problem-based learning (PBL) or early clinical experiences might be expected to affect noncognitive aspects of students' performance beyond the cognitive outcomes measured by Step 1 scores. Scores on knowledge-based examinations are unlikely to be useful measures of students' preparedness for noncognitive elements of clinical clerkships, such as cross-disciplinary teamwork or patient communication, in which procedural knowledge must be applied in clinical tasks.
Might students' preclinical course performances predict their readiness for clinical clerkships? Studies of preclinical course performances as predictors of clerkship performance, such as those by Baciewicz et al.1 and Roop and Pangaro,2 tend to demonstrate a relationship between those measures and students' clinical course examination scores or grades. We felt, however, that preclinical course grades had not been shown to be a sensitive measure of readiness for the noncognitive demands of clinical training.
While students are frequently asked to evaluate course objectives, instructional delivery, and other curriculum features, they are not often asked how well their curriculum has prepared them to undertake the next training level. Fincher, Lewis, and Kuske3 used interns' self-assessments to examine their preparedness in competencies required to begin the intern year, including history and physical examination, patient diagnosis and management, and interpersonal skills. We adopted a similar approach to study important noncognitive outcomes of preclinical curriculum change.
Over the past seven years, the University of Texas Medical Branch (UTMB) implemented stepwise preclinical curricular reform. In 1995, a problem-based learning (PBL) track featuring self-directed learning in small groups and early clinical experiences opened to 24 students chosen by lottery from approximately twice that number of volunteers per class; it ran parallel to the traditional didactic curriculum (TC). The PBL track's student assessment procedures relied heavily on essay tests, standardized-patient (SP) examinations, and evaluation of small-group work; the TC assessments relied predominantly on multiple-choice questions (MCQs), with less use of SP examinations. In 1998, the TC was replaced with the Integrated Medical Curriculum (IMC), a hybrid curriculum combining the problem-based, small-group, self-directed aspects of the PBL track with some didactic teaching.4 The hybrid IMC retained the TC's heavy reliance on MCQs for cognitive assessment, with some SP-based examinations, but added the PBL track's small-group assessment. The PBL track, meanwhile, remained essentially unchanged. All three tracks featured early clinical experiences, although the PBL track's emphasis on them was heavier than the TC's or the IMC's. The labels we use in this study (“PBL,” “traditional,” “hybrid IMC”) may unintentionally emphasize each curriculum's instructional format over the features more relevant to this study; we intend these labels to refer to all features of each curriculum, including the amount of early clinical experience and the array of assessment methods.
UTMB's curriculum evolution provided an uncommon opportunity to examine, within a single institution, the effects of three distinct preclinical curricula on students' perceptions of their preparedness for clinical training. To that end, we developed a clinical-preparedness survey and administered it to students as they finished their preclinical curriculum. We hypothesized that any differences found among students' self-assessments of preparedness for clinical training would correspond to the differing emphases of the three preclinical curricula.
A group of UTMB faculty developed a 22-item clinical-preparedness survey to measure how prepared students felt to undertake tasks expected of beginning third-year students: communicating with patients and faculty, working collaboratively with other health care students and professionals, gathering patient data, integrating basic science knowledge with clinical work, and assessing and managing patient problems. The five-point response scale (converted to 0–4 for analysis) was anchored at 1 = “not at all [prepared]” and 5 = “extremely [prepared].” Seventeen additional items, not analyzed in this study, asked students about their readiness to use different information sources to learn about patient problems.
We analyzed responses from UTMB medical students expected to graduate in 2001 (class of 2001) and in 2003 (class of 2003). The class of 2001 was the last class that could choose between the PBL curriculum and the TC. The class of 2003 was the second class that could choose between the PBL track and the hybrid IMC. We elected to omit survey data from the class of 2002, the “pioneer” IMC class: the first IMC class was subject to all the uncertainties associated with a new curriculum, which could have been reflected in their self-assessments.
Completed clinical preparedness surveys were received from 193 of the 198 class of 2001 students (28 PBL, 165 TC) on the first day of the first third-year clinical clerkship in 1999. Similarly, 177 of 193 students in the class of 2003 (22 PBL, 155 hybrid IMC) turned in completed surveys during the last month of their second-year studies in 2001. Mean MCAT scores were computed as a tentative overall measure of curriculum-group characteristics prior to students' experiencing their respective curricula. The groups' cumulative second-year GPAs were not used to describe the groups, as they were not comparable measures. Each curriculum had its own assessment system and grading scale, and one curriculum (the IMC) did not assign numerical values to its grades.
We conducted exploratory factor analysis of class of 2003 survey responses, applying standards of simple structure, interpretable factors, and consistency with the survey's intended structure to select the best factor structure. The factors were assigned labels consistent with their items' content.
We constructed subscale scores by averaging responses across the items associated with each subscale and computed an internal consistency reliability estimate (Cronbach's alpha) for each subscale. Group means and standard deviations for subscale scores were computed for both classes, and the subscale scores' intercorrelation structure was investigated.
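To make the reliability estimate concrete, Cronbach's alpha for a subscale can be computed directly from item-level responses using the standard formula alpha = k/(k−1) × (1 − Σ item variances / variance of total scores). The following Python sketch is purely illustrative; the function name and sample data are hypothetical and are not drawn from the study's survey data.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for one subscale.

    items: a list of k lists, one per survey item, each holding the
    same respondents' scores in the same order. Population variances
    are used throughout.
    """
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Total (summed) score for each respondent across the k items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(item) for item in items) / var(totals))


# Hypothetical 0-4 responses from four students on a three-item subscale.
subscale_items = [
    [1, 2, 3, 4],
    [0, 2, 3, 4],
    [1, 3, 2, 4],
]
alpha = cronbach_alpha(subscale_items)
```

Alpha approaches 1.0 as the items covary strongly (as with the .76 to .93 values reported below) and falls toward zero, or even below it, when items are unrelated.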
Finally, differences between mean subscale scores by curriculum were investigated separately for each class with multivariate and univariate analyses of variance. The MANOVA procedures capitalized on the intercorrelated nature of the subscale scores. We computed an effect-size estimate (eta² or R²) for each difference.
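The eta² effect size used here is the proportion of total variance in a score that is accounted for by curriculum-group membership. For the univariate case it reduces to SS_between / SS_total, as in this minimal Python sketch (the function name and sample data are illustrative, not the study's data):

```python
def eta_squared(groups):
    """One-way ANOVA effect size: SS_between / SS_total.

    groups: list of lists, one list of subscale scores per
    curriculum group (e.g., PBL vs. TC).
    """
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total
```

On this reading, an eta² of .088, for example, means that about 8.8% of the variance in subscale scores is associated with curriculum group.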
Subscale Construction and Reliability
Factor analysis of responses from both classes, using principal-axis factoring and oblique rotation, revealed that survey questions loaded onto the four expected factors, with most items loading on only one factor each. Two survey items (“incorporate knowledge of basic sciences” and “find learning resources”) did not load cleanly on any factor and were subsequently omitted from subscale analyses. For subscale construction, the remaining 20 items were assigned to the factors on which they loaded most heavily and that their content matched best (Table 1). Four to six items each composed the teamwork (TW), history and physical (H&P), clinical reasoning (CR), and doctor–patient relationship (D/P) subscales, with internal consistency reliability (Cronbach's alpha) coefficients ranging from .76 to .93. The four subscales were substantially intercorrelated, with correlation coefficients from .22 to .41.
Curriculum-related Differences in Subscale Scores
The mean MCAT score for class of 2001 PBL students was 28.1 (SD = 3.5); it was 26.2 for TC students (SD = 4.6). The mean MCAT score for class of 2003 PBL students was 27.3 (SD = 5.4); it was 26.3 for IMC students (SD = 4.9).
As shown in Table 2, the class of 2001 PBL students' mean scores exceeded the TC students' mean scores on all four subscales, with differences from .36 to .58 scale points. That is, PBL students on average indicated that they felt more prepared in each subscale area than did TC students. Multivariate analysis of variance (MANOVA) of the class of 2001 subscale scores yielded an overall F statistic of 4.54 with 4, 188 df (p = .0016) and an effect size (eta²) of .088. Subsequent univariate analyses of mean differences on the four subscale scores yielded F statistics of 7.34 to 14.48 (1, 191 df), p < .01. The greatest difference between curricula for these students, .58, was observed on the CR subscale (R² = .070) and the smallest, .36, on the D/P subscale (R² = .037). Since students were not randomly assigned to curricula, interpretation of F statistics using statistical inference techniques is not particularly helpful. Examination of the absolute sizes of the mean score differences, however, reveals that differences between groups ranged from just over ⅓ to more than ½ of a scale point. Differences of that size on a 0–4 response scale are large enough to indicate meaningful differences between curriculum groups.
In the class of 2003, PBL students' subscale scores were higher than hybrid IMC students' scores on three of the four subscales, with observed mean differences ranging from .18 to .73 scale points and associated effect sizes (R²) from .008 to .089. Mean scores on the D/P subscale were the same for both groups. MANOVA yielded an overall F statistic of 5.53 with 4, 172 df (p = .0003) and an effect size (eta²) of .114, supporting investigation of the sizes of differences among groups on one or more subscales. Subsequent univariate analyses of the three subscales with nonzero differences produced F statistics of 1.47 (H&P), 5.18 (TW), and 17.11 (CR), each with 1, 175 df. Curriculum-group differences on the TW and CR subscales were .39 and .73 scale points, with R² values of .029 and .089, respectively, each large enough to indicate a meaningful difference in scores.
Factor analysis of the clinical-preparedness survey responses supported construction of four intercorrelated, reliable subscale scores for each student. Subscales' emphases, TW, H&P, CR, and D/P, were consistent with the survey's intention of addressing noncognitive areas associated with readiness for clinical clerkships.
The small differences observed in the curricular groups' mean MCAT scores in both classes are of limited significance. There is no agreement among researchers in this area on expected associations between MCAT scores (or other primarily knowledge-based measures) and the areas addressed in our survey.
The differences observed in students' perceptions of readiness for noncognitive aspects of clinical training indicated that changes in foundational aspects of preclinical curricula (dominant instructional modality, amount of early clinical experience, and assessment methods) were associated with differences in self-assessed clinical preparedness. Recall that the TC featured didactic instruction, a moderate level of clinical experience, and assessment dominated by MCQs with light use of SP examinations; the PBL track featured problem-based learning, heavy clinical experience, and assessment by essay examinations, heavy use of SP examinations, and small-group evaluations; and the hybrid IMC featured problem-based learning plus didactic instruction, a moderate level of clinical experience, and assessment dominated by MCQs with light use of SP examinations plus small-group evaluations. Correspondingly, we observed that PBL students' mean teamwork scores were greater than those of TC students, with a “medium” effect-size estimate (R² = .051), consistent with the greater opportunity for student–faculty and student–student interaction in the PBL small groups. We observed a smaller difference between PBL and hybrid IMC teamwork scores, with a “low” R² of .029, consistent with the incorporation of some problem-based learning in the IMC.
The pattern of differences in the H&P subscale scores, with PBL track scores greater than TC scores (“medium” effect size estimate of .059) but about the same as IMC scores, was not completely consistent with predictions based on students' early clinical experience. It may be that consistent exposure to PBL case information also affects students' sense of preparedness in this area, thus increasing the IMC mean score. That hypothesis clearly warrants investigation. As the IMC's clinical experiences evolve, we plan to track changes in this subscale carefully, expecting IMC students to report increased perceptions of preparedness for clinical clerkship H&P.
The PBL students' mean clinical reasoning scores were higher than those of either the TC or the IMC group, and both differences showed medium effect sizes. This may indicate that problem-based learning, common to the PBL and IMC curricula, is not by itself enough to affect students' sense of preparedness in clinical reasoning. Further exploration is warranted of how the PBL track's essay tests, with their greater emphasis on higher-order thinking, affect students' perceptions of readiness for CR-related tasks.
On the other hand, the pattern of differences on the D/P subscale, in which PBL scores were somewhat higher than TC scores (an effect size characterized as “low”) but equal to IMC scores, suggests that the use of problem-based learning alone may increase students' confidence in patient education, delivering bad news, and similar tasks. The exact mechanism of this effect also remains to be explored.
The effect of students' self-selection into the curriculum track on their perceptions of readiness for clerkships cannot be assessed with the design employed in this study. Investigation of the effect of self-selection by introduction of appropriate covariates, once identified, would add to the interpretation of these findings.
We believe that these findings are best interpreted with both the instructional and the assessment modalities of the three curricula in mind. It is reasonable to speculate that either or both together might explain this study's findings.
We acknowledge the inherent limitations of self-assessment as the sole mechanism for investigating effects of curricular change. Students are unlikely to employ the same definitions of constructs or the same standards as faculty would. Studies are presently under way to compare students' self-assessments with clinical preceptors' evaluations of student readiness. Nevertheless, this study demonstrated that students' self-assessments of readiness for clinical clerkships exhibited differences consistent with characteristics of their curriculum tracks. The differences we observed would not have been detected had we examined USMLE Step 1 scores or preclinical grade-point averages alone. Effective assessment of effects of curricular change must be directed toward all important outcomes, both cognitive and noncognitive. Student self-assessments provide useful information to that end.