Effectiveness of Problem-based Learning Curricula: Research and Theory

Colliver, Jerry A. PhD


Problem-based learning (PBL) has had a major impact on thinking and practice in medical education for the past 30 to 40 years. The PBL approach is based on active learning in small groups, with clinical problems used as the stimulus for learning. It is said that the PBL process incorporates fundamental educational principles such as those derived from adult learning theory. The implication is that these principles make the PBL approach more effective for the acquisition of basic knowledge and clinical skills.

Medical educators for the most part have been receptive to the PBL approach. It certainly seems like a more challenging, motivating, and enjoyable way to learn, and students appear to agree. However, the educational superiority of PBL relative to the standard approach has been less clear. This is a key concern given the somewhat extensive resources required for the operation of a PBL curriculum. Recently the debate over the superiority of PBL has intensified as the rationale, the research, and the claims surrounding PBL have become central issues in discussions of the Curriculum 2000 initiatives being considered at medical schools across the nation.

To address these issues, this article provides a critical overview of PBL, its effectiveness, and the underlying educational theory. The article gives a general picture of the status of PBL research and its findings, beginning with a quick look at three reviews published in 1993,1,2,3 followed by brief summaries of research published since those reviews, and concluding with a critical look at the educational theory that is said to underlie PBL. I then discuss this overview in the broader context of my recent review of 29 research papers on professions education. Finally, I briefly mention self-directed learning as an area in need of more research. Throughout this paper, I focus on (1) the credibility of claims about the ties between the PBL intervention and educational outcomes, in particular achievement (including basic knowledge, problem solving/diagnostic reasoning, and clinical skills), and (2) the size of the effects of the intervention on those outcomes.


After reading the three 1993 reviews, I searched the medical education literature published from 1992 through 1998 to identify those studies that compared students in a PBL curriculum track with students in a standard curriculum track. The starting point of the search predated the three reviews by one year to allow for publication lag. The search was limited to those articles that involved a comparison of curriculum tracks or schools because advocates of PBL argue—and I agree—that for PBL to be effective the entire curriculum must be problem centered and PBL should form the core of the curriculum.

Five of the journals searched were in medical education: Academic Medicine, Advances in Health Sciences Education, Evaluation and the Health Professions, Medical Education, and Teaching and Learning in Medicine. Four journals were in medicine: Annals of Internal Medicine, Archives of Internal Medicine, Canadian Medical Association Journal, and Journal of General Internal Medicine. The search was manual, meaning that articles were located from annual or periodic indices of the journals and the tables of contents of individual issues. Based on past experience in reading this literature, I thought that a careful search of these nine journals would locate most, if not all, of the major relevant articles; none of the readers of various drafts of this paper, all familiar with PBL, reported other relevant curricular comparisons. Electronic searches yielded many citations, so the manual approach seemed reasonable and efficient for locating what I knew in advance to be a limited number of curriculum-wide studies, and articles based on such important studies would most likely be published in these nine journals.

A summary of each study was written that included a description of the study design, outcome measures, effect sizes, and any other information relevant to the research conclusion. If effect sizes were not reported in the paper, I computed them from the results provided. I did not use a research synthesis approach, whereby effect sizes are averaged across studies, because most studies were not randomized. Meta-analyses work best with randomized trials; with nonrandomized studies, the research should be considered on an individual study basis to properly interpret the research results. Also, the goal was not to increase statistical power by aggregating studies, which could easily produce statistical significance for a trivial effect. Rather, the focus of the overview was on the effect sizes themselves, to see whether their magnitudes were consistent with what might be hoped for a major curriculum intervention such as PBL.

Effect size measures. Effect size in these studies is typically expressed as a d value, a standardized mean difference, which is computed by dividing the difference between the means of the PBL- and standard-curriculum groups by their pooled standard deviation. The convention for thinking about effect-size measures is that d = .20 indicates a small effect (see Figure 1). With an effect of that size, the PBL group and the standard group would be separated very little and would overlap a lot. An effect size of d = .50 indicates a moderate effect, with a little more separation but still considerable overlap. And a large effect is said to be indicated by d = .80, which has even more separation but still quite a bit of overlap.
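As a rough illustration of the computation just described, the following Python sketch divides the difference between two group means by the pooled standard deviation and attaches the conventional small/moderate/large label. The function names, the "negligible" label for values below .20, and the example numbers are assumptions made for illustration only; they are not taken from any of the studies reviewed.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups (weights are degrees of freedom)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean_pbl, sd_pbl, n_pbl, mean_std, sd_std, n_std):
    """Standardized mean difference; positive values favor the PBL group."""
    return (mean_pbl - mean_std) / pooled_sd(sd_pbl, n_pbl, sd_std, n_std)


def label(d):
    """Conventional verbal label for the magnitude of d (sign is ignored)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "negligible"  # hypothetical label for effects below the "small" benchmark


# Hypothetical example: two groups of 60 with equal spread and means 10 points apart.
d = cohens_d(mean_pbl=510, sd_pbl=50, n_pbl=60, mean_std=500, sd_std=50, n_std=60)
print(f"d = {d:.2f} ({label(d)})")  # prints "d = 0.20 (small)"
```

When a paper reports only group means, standard deviations, and sample sizes, a computation of this kind is one way to recover d from the published results.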

Figure 1: Graphs illustrate the separation and overlap of two groups for small, medium, and large effects.

So how big should the effect of PBL on achievement be? Benjamin Bloom in 1984 proposed an interesting rationale for thinking about the size of the effect to be expected with an educational/instructional method, such as PBL.4 Bloom argued that the optimal teaching method would seem to be one-to-one tutoring, that it should result in the maximum possible effect, and that all other teaching approaches—such as PBL—are simply attempts to approximate the ideal, one-to-one tutoring. Bloom's graduate students conducted several studies at the University of Chicago lab school that compared one-to-one tutoring and standard classroom teaching, using different subject matters and at different grade levels. They found effect sizes around d = 2.0; that is, the tutoring mean was two standard deviations above the standard mean (see Figure 2). Bloom concluded that other educational/instructional methods—such as PBL—should be judged relative to an optimum effect size of d = 2.0.

Figure 2: Graphs illustrate the separation and overlap of two groups for the optimal effect (of one-on-one tutorial) and a desired effect (for a major curricular intervention).

So back to PBL. It seems reasonable that a major educational intervention such as the introduction of a PBL curriculum would be expected to show a large effect that clearly separated the PBL from the standard groups with little or no overlap, a large effect of d = .80 or even d = 1.00. With d = 1.00, PBL would be half as effective as what Bloom sees as the optimal instructional method, which would seem to be a reasonable level of effectiveness to be expected for PBL. Be that as it may, my intention is to provide a rough framework for starting to think about the size of the effect we might hope for with PBL. Thus, I focused on the effect sizes themselves, to see whether their magnitudes were consistent with what might be hoped for a major curriculum intervention such as PBL. Even if the PBL effect is statistically significant, the critical issue is whether the effect size is consistent with the strength of the claims and the costs of such a major curriculum overhaul.


The three reviews. First, let me begin with a brief overview of the three reviews, which differed greatly with respect to approach. Albanese and Mitchell1 categorized and listed the quantitative results of the studies, Vernon and Blake2 synthesized the results with a meta-analysis, and Berkson3 used a traditional, narrative approach. Nevertheless, a common picture emerges, showing little or no effect on student achievement. Albanese and Mitchell1 recommended “that caution be exercised in making comprehensive, curriculum-wide conversions to PBL.” Berkson3 concluded that “the graduate of PBL is not distinguishable from his or her traditional counterpart. The experience of PBL can be stressful for student and faculty. And implementation of PBL may be unrealistically costly.” Surprisingly, Vernon and Blake2 concluded that “the results generally support the superiority of the PBL approach over more traditional methods.” I want to clarify Vernon and Blake's conclusion because it suggests a discrepancy in the results of the three reviews that doesn't seem to be the case. Rather, Vernon and Blake's conclusion seems to be driven primarily by the findings for student satisfaction. With respect to achievement, Vernon and Blake in their meta-analysis reported a weighted mean effect size of d = −.18 for NBME I performance, showing that PBL students were about two tenths of a standard deviation below standard students with respect to performance on NBME I, and a weighted mean effect size of d = −.09 for other measures of factual knowledge. 
With respect to clinical knowledge, the weighted mean effect size for NBME II performance was positive but quite small, d = +.08, indicating that PBL students were about one tenth of a standard deviation above standard students.2 The PBL students were reported to be significantly superior with respect to clinical performance, but the weighted mean effect size was small, d = +.28, about one fourth of a standard deviation.2 Roughly, this means that about 10% more of the PBL students than of the standard students scored above the standard students' mean (i.e., about 60% versus 50%). Even the effect for student satisfaction was only moderate (a weighted mean effect size of d = +.55), which is much smaller than I expected given the enthusiastic claims about students' preference for PBL.
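The "about 60% versus 50%" reading of d = +.28 can be checked with a short Python sketch. It assumes that scores in both groups are normally distributed with equal standard deviations (an assumption made here for illustration, not one stated in the reviews); under that assumption the share of PBL students scoring above the standard group's mean is the standard normal cumulative probability evaluated at d. The function name is hypothetical.

```python
import math


def share_above_comparison_mean(d):
    """Proportion of one group scoring above the other group's mean.

    Assumes normal scores with equal standard deviations in both groups,
    so the answer is the standard normal CDF evaluated at d.
    """
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


# Weighted mean effect sizes quoted from the Vernon and Blake meta-analysis.
for d in (-0.18, -0.09, 0.08, 0.28, 0.55):
    share = share_above_comparison_mean(d)
    print(f"d = {d:+.2f}: {share:.0%} of PBL students above the standard mean")
```

For d = +.28 the function returns about 0.61, consistent with the rough 60% figure given above; by construction, 50% of the standard students lie above their own mean.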

A major problem with most of these (and subsequent) studies is that they were not randomized: the PBL students were self selected, and evidence shows that students who select PBL are generally better students as indicated by MCAT scores, undergraduate GPAs, and other indicators.5 The problem is that this superiority of PBL students at entry to medical school would seem to be sufficient to account for the small differences in reported outcomes. So it seems fair to say that the three reviews show no convincing evidence for the effectiveness of PBL in fostering the acquisition of basic knowledge and clinical skills; the effects are small at best and easily accounted for by pre-existing differences.

Research since the reviews—randomized. I located a total of eight new studies—three randomized and five non-randomized—that involved comparisons of curriculum tracks. So let me start with the three randomized comparisons. The University of New Mexico School of Medicine6 offered a PBL curriculum track to 167 students and a conventional curriculum track to 508 students in the graduating classes of 1983 to 1992. Within each track, there was a subgroup of students who were randomly assigned to their respective track. The students in these subgroups had requested the PBL track, but had been randomly assigned: 85 students to the PBL track and 34 to the conventional track. The PBL students scored significantly and considerably lower than did the conventional students on NBME I (means: 455 versus 521; d = −.85; p < .01). They scored lower but not significantly so on NBME II (means: 472 versus 485 with n = 67 and 27, respectively; d = −.16) and on NBME III (means: 521 versus 551 with n = 38 and 19, respectively; d = −.33). However, the conclusion in that study's abstract was that “in the long run, the more student-centered problem-based curriculum better prepared the students for the NBME III” (means: 521 versus 491; d = +.33; p < .001), but that referred to the non-randomized comparison of the two entire tracks, the results of which would seem to be accounted for by the various selection methods employed, primarily self selection. The results reinforce my point about self selection, by showing a negative effect, −.33, for a randomized comparison and a positive effect of the same size, +.33, for a non-randomized comparison.

The New Pathway curriculum at Harvard Medical School7 was requested by 125 students in the classes of 1989 and 1990, of which 62 were randomly assigned to the PBL curriculum and 63 to the traditional curriculum. With respect to NBME I, there was clearly no difference between the two tracks' total scores (d = −.01; p = .96). For the seven subtests, the effects ranged from d = −.26 to d = +.46, but only the latter, moderate-sized effect for the behavioral science subtest was statistically significant (p = .01), which is accounted for by the inclusion of behavioral science in the New Pathway curriculum. Comparisons of the New Pathway and traditional tracks on a “group of diagnostic reasoning and clinical problem-solving tasks administered prior to each student's medicine clerkship,” including a measure developed at Southern Illinois University and one developed by the NBME, showed “no difference between the groups' performances regardless of the methodology used to measure clinical reasoning.” Interpersonal skills were assessed with standardized-patient interviews, which were rated by observers who viewed videotapes of the encounters and by the standardized patients who interacted with the students. Although the comparisons tended to favor the New Pathway students, the results of these analyses were no longer based on the original randomized groups, due to a sizable and differential lack of participation in the standardized-patient exercise—a loss of about a third of the PBL group and two thirds of the traditional group. Consequently, the analyses of interpersonal skills involved confounded comparisons of better PBL students with the lowest-performing standard students. Nevertheless, the often-cited conclusion in the abstract was that the New Pathway group had “better relational skills.” But this is very questionable given the serious confounding.

A quasi-randomized study conducted at three Dutch medical schools8 provides a framework for evaluating the magnitude of the effect of curriculum type on diagnostic accuracy, by reference to the growth of ability to perform that task during medical school. Diagnostic-accuracy scores were compared for students at three medical schools with different curricula (PBL, conventional, and an integration of basic and clinical sciences) across five years of training (preclinical years 2, 3, and 4 and clinical years 5 and 6). The study conducted in 1994 was cross-sectional, with about 40 student volunteers in each of 15 treatment-combination subgroups (five years by three schools/curricula) for a total of 612 students. Students were admitted to the three schools on the basis of a national lottery system, so the study was somewhat randomized despite the voluntary nature of participation within subgroups. The students were tested on 30 written case vignettes representative of diseases seen by Dutch family physicians. Not surprisingly, given the large sample, the results showed significant effects of years of training (p = .0001), school/curriculum type (p = .0001), and their interaction (p = .001). So I computed effect sizes (omega-squared values) for these data to see just how big these significant effects were. The omega-squared values showed that years of training accounted for 74% of the variance in the students' diagnostic-ability scores, whereas school/curriculum type accounted for only 1% of the variance, and their interaction, for only another 1%. In other words, school/curriculum type accounted for just 1% or 2% of the variance. I then translated these curriculum effects into the “years of training” metric and found that the effect of curriculum was roughly equivalent to an effect of only an additional three or four weeks of medical-school training.9
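Omega-squared values depend on the ANOVA sums of squares, which are not reproduced here, and the exact formula used for the computations above is not stated. The following Python sketch shows one common fixed-effects formula, omega-squared = (SS_effect − df_effect × MS_error) / (SS_total + MS_error), applied to a 5 × 3 design with 612 students. The sums of squares are invented for illustration (chosen only to be of the same order as the proportions reported above) and are not the study's data; the function and variable names are likewise hypothetical.

```python
def omega_squared(ss_effect, df_effect, ss_error, df_error, ss_total):
    """Omega-squared for one effect in a fixed-effects ANOVA."""
    ms_error = ss_error / df_error
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Design described in the text: 5 years of training x 3 schools, 612 students.
n_students, n_years, n_schools = 612, 5, 3
df_error = n_students - n_years * n_schools  # 612 - 15 cells = 597

# HYPOTHETICAL sums of squares, not taken from the Dutch study.
ss = {"years": 7400.0, "curriculum": 120.0, "interaction": 150.0, "error": 2330.0}
df = {
    "years": n_years - 1,
    "curriculum": n_schools - 1,
    "interaction": (n_years - 1) * (n_schools - 1),
}
# Treats the design as balanced, so the component sums of squares add up to the total.
ss_total = sum(ss.values())

effects = {
    name: omega_squared(ss[name], df[name], ss["error"], df_error, ss_total)
    for name in df
}
for name, value in effects.items():
    print(f"{name}: {value:.1%} of variance")
```

With numbers of this kind, an effect can be highly significant in a sample of several hundred students and still account for only about 1% of the variance, which is the distinction the paragraph above draws between significance and size.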

It should be mentioned that the effect size for the comparison of the two schools/curricula that differed the most at year 6 was moderate (d = +.50), which suggests that in general, effect-size measures comparing two groups at a given point in time may give an exaggerated picture of the importance of the effect of the intervention, relative to the broader context of growth throughout medical school.

Research since the reviews—non-randomized. Now let me turn to the non-randomized studies. Four of the five non-randomized studies, like the randomized studies, showed little or no effect. However, one study showed large effects. The Bowman Gray School of Medicine of Wake Forest University10 offered two curriculum tracks to students in the classes of 1991 to 1995. Students with an interest in PBL were selected into the PBL curriculum on a first-come, first-served basis. For these classes, 91 students opted for the PBL track and 401 were in the traditional lecture-based learning track. In this study of performance in the medicine clerkship, 88 and 364 students, respectively, had complete data for the analyses reported. There was clearly no difference between the two tracks with respect to the NBME medicine subject (“shelf”) test taken at the end of the medicine clerkship (d = +.07; p = .80). The PBL students received significantly higher ratings from housestaff and faculty on four clinical rating scales: amount of factual knowledge (means: 3.11 versus 2.90; d = +.50; p = .0001); take history and perform physical (3.22 versus 3.05; d = +.42; p = .002); derive differential diagnosis (3.04 versus 2.86; d = +.46; p = .005); and organize and express information (3.49 versus 3.33; d = +.39; p = .004). But the differences between the two tracks were quite small—about two tenths of a point on a five-point scale—with low moderate effect sizes; and the differences are easily accounted for by self selection.5 It should also be mentioned that the PBL track “emphasized frequent contact with real or simulated patients for the dual purpose of practicing interpersonal, physical diagnosis and clinical skills,” whereas the traditional track “limited patient contact to supervised encounters with a small number of hospitalized patients as part of bedside tutoring groups in the first and second years.” So one might expect even larger differences in the ratings of clinical skills. 
Be that as it may, are observed effects with such comparisons due to “what” the students were doing in the sense of their activities and experiences so that traditional students would quickly “catch up” as they got more clinical experience, or were they due to “how” the students were learning in the sense of educational principles and underlying mechanisms that would make their learning better and deeper and stick with them longer? Similar questions are raised about the following study.

At Rush Medical College,11 19 paid volunteers in the PBL track and 16 in the traditional track were given a series of pathophysiologic explanation tasks consisting of clinical problems/cases for which they were to describe underlying causal mechanisms that accounted for the patients' problems. Outcomes employed in studies of medical expertise were administered three times in the first seven months of medical school: at baseline before the start of school, after three months, and after seven months. The PBL students showed more improvement over the seven-month period than did the traditional students. At seven months, the PBL students gave more accurate diagnoses than did the traditional students (d = +.80), generated longer reasoning chains with more nodes and more connections (d = +2.36), accounted for more findings (d = +1.45), and were more likely to use science concepts in their explanations (d = +1.99). This is what the PBL students had been doing during those first few months of medical school, and the outcome measures tapped directly into those activities. And the exception seems to prove the rule. The PBL students also were found to use more hypothesis-driven reasoning than data-driven reasoning; the latter is more characteristic of experts, but hypothesis-driven reasoning is what is taught in PBL. The PBL students, then, were simply doing what they had been doing with similar problems, which for the most part the traditional students had yet to encounter.12 So does this give PBL students a lasting advantage? Are traditional students permanently handicapped, or do they make a rapid adjustment as they start to encounter clinical material in the clerkships? In other words, are these PBL/traditional differences just a matter of timing? 
For example, students who take epidemiology/biostatistics in the first year of medical school should do better on an epidemiology/biostatistics examination given in the first year than students who later take the course in the second year, although if tested in the third or fourth year, the two groups would probably perform similarly. The question is whether such differences would require explanation in terms of educational principles and underlying theoretical mechanisms. In addition, it should be noted that results from a related article13 suggest that the large effects may have been due to the complex scoring system employed, which involved summing or averaging scores for explanations given after five separate parts of each case problem: presenting information, history, physical examination, laboratory data, and hospital course. For example, although the effect size for diagnostic accuracy reported above11 was d = +.80, analyses including these same data showed that 89% of the PBL students and 89% of the standard students made the correct diagnosis by the end of the case problem.13

Southern Illinois University (SIU) School of Medicine14 has offered a closed-loop, or reiterative, PBL curriculum and a standard curriculum track starting with the class of 1994. Students who expressed interest in PBL had to be selected by the PBL committee for admission to the new track, resulting in a moderate advantage for PBL students upon entry, i.e., the PBL students were about a year older and scored moderately higher on the MCAT (d = +.46; p = .0344). A comparison of the 47 PBL students and the 154 standard curriculum students in the classes of 1994, 1995, and 1996 showed small to moderate advantages for PBL on USMLE Step 1 (d = +.18, p = .2707), USMLE Step 2 (d = +.39; p = .0197), clerkship ratings (d = +.50; p = .0028), and performance on the post-clerkship standardized-patient examination (d = +.30; p = .0703). Clearly, the PBL students were not disadvantaged by their participation in the new curriculum, but any apparent advantage is easily explained by selection differences. The importance of the SIU findings is that they counter the objection that PBL as employed at many schools fails to incorporate all of the features needed to make PBL effective. Presumably the SIU PBL curriculum meets the criteria, yet the effects are small at best.

Dalhousie University Faculty of Medicine15 introduced a PBL curriculum upon entry of the class of 1996, and subsequently compared the performance of the class of 1995 (conventional curriculum; n = 81) with those of the classes of 1996 (PBL; n = 84) and 1997 (PBL; n = 78) on the Medical Council of Canada (MCC) Qualifying Examination Part I at graduation. The 1996 and 1997 PBL classes were not significantly different from the 1995 conventional class in terms of total score (d = +.15; d = +.29; p = .36), multiple-choice-questions score (d = +.12; d = +.19; p = .76), or problem solving/clinical reasoning (d = +.22; d = +.38; p = .09). Although not significant, the means showed small increases from the conventional curriculum to PBL. The increase in overall MCQ performance was due to significant improvement in the psychiatry specialty subtest and the preventive medicine/community health subtest, presumably due to the incorporation of relevant social science issues into the teaching cases and scheduled activities in a community health and epidemiology unit related to the cases.16 Scores on the remaining MCQ subtests (i.e., medicine, obstetrics-gynecology, pediatrics, and surgery) showed no significant change, with equal numbers of declines and increases. The nonsignificant increase in problem solving/clinical reasoning score is consistent with the case-oriented problem-stimulated focus of the PBL curriculum. The passing rates were nearly identical for the three classes (93.8%, 94.0%, 94.9%; p = .993).

Finally, a very impressive, large-scale study based on data for 54,890 students from 118 schools conducted by the National Board of Medical Examiners17 looked at the effect of type of basic-science curriculum on USMLE Step 1 performance. The four curriculum types were (1) discipline-based, (2) organ system-based, (3) discipline in first year/organ system in second, and (4) other, which included PBL, multi-track, etc. (a catchall). The means were quite similar and not significantly different (209.7, 214.5, 210.7, and 208.7, respectively); and were even more similar and still not significantly different when adjusted for MCAT scores (210.1, 210.8, 209.1, and 210.0, respectively). The results are very convincing that the type of curriculum (at least, discipline-based or organ system-based or combination) just doesn't seem to matter on Step 1.

Summary. The randomized studies show no effect of PBL, maybe even a negative effect, on performances on the NBME licensure examinations. Some writers tend to dismiss these findings and argue that multiple-choice measures of knowledge such as the licensure examinations are not appropriate for testing the effectiveness of PBL. However, one of the theoretical claims of PBL is that it imparts better and deeper learning such that knowledge is better organized and structured and more readily accessible to recall. Be that as it may, the randomized studies also showed no effect on diagnostic reasoning and clinical problem solving; and even the highly confounded results for interpersonal skills assessed with standardized patients in the New Pathway study showed only weak to moderate effects even with the confounding. Presumably, these measures are appropriate for testing the effectiveness of PBL. The non-randomized studies reported some effects, but the differences would seem to be attributable to selection differences and to the use of outcomes that directly reflect the activities and experiences of the curriculum tracks. With respect to the latter, one non-randomized study reported large differences between PBL and traditional students in the first seven months of medical school,11 but the outcomes tapped directly into the activities the PBL track focused on during that period, activities that traditional students would encounter later in their training.12


The results are disappointing, providing no convincing evidence for the effectiveness of PBL, at least not the magnitude of effectiveness that would be hoped for with a major curriculum intervention. The results are also surprising, because the rationale for PBL is said to be based on educational theory and that theory is said to be supported by basic research. That is, PBL in contrast to the traditional lecture-based approach is thought to incorporate basic educational principles and to involve theoretical learning mechanisms, which presumably should have a positive and sizable effect on the acquisition of basic knowledge and clinical skills. Norman and colleagues,18,19,20,21 in an outstanding series of papers, have reviewed the evidence for these underlying principles and mechanisms. For the most part, the evidence from this basic research is positive. And yet the applied research on PBL curricula reviewed here shows little evidence for the practical effectiveness of PBL in fostering the acquisition of basic knowledge and clinical skills.

So what's the problem? If PBL is based on educational principles and learning mechanisms that are supported by basic research, why isn't the PBL curriculum more effective with respect to knowledge and clinical skills? The problem, as I see it, is that the theory is weak; its theoretical concepts are imprecise, lacking explicit descriptions of their interrelationships and of their relationships with observables, such as interventions and outcomes. In addition, the basic research is contrived and ad hoc, using manipulations that seem to ensure the expected results, regardless of the theory—which is too indefinite to place any real constraints on the observables anyway. In brief, the ties between theory and research (both basic and applied) are loose at best. For example, consider two key factors that are commonly said to give an advantage to PBL: context and activation. (Similar comments could be made about other theoretical factors, but these two should serve to illustrate the point.)

First, consider the role of context in PBL, which is typically “explained” by reference to the classic Godden and Baddeley study.22 In that study, members of a university diving club recalled more words underwater when they had learned the list of words underwater rather than on dry land, and vice versa. By rough analogy, the implication is drawn that PBL students should be advantaged because they learn in the context of clinical cases. But questions can be raised about whether the learning context of a PBL curriculum really differs all that much from that of a standard curriculum and whether the differences between these learning contexts in turn and the context of practice are really all that great. That is, in the usual experimental study of context, like the Godden and Baddeley study, students learn in contexts A or B and then are tested in contexts A or B, and typically A and B are pretty different. In the applied medical education situation, students learn in context A or context B but the interest is in performance in context C; and typically A and B are not as dramatically and unequivocally different as underwater and dry land, and C may differ more from A and B (practice context versus classroom-type paper-and-pencil contexts) than A and B do from each other. So to what extent can it be claimed that context A (PBL classroom) versus context B (standard classroom) gives an advantage in context C (clinical practice)?

The Godden and Baddeley study, then, for the most part tells us what we would seem to already know (that learning and memory are context dependent), but it fails to tell us what we don't know (whether two given contexts are different or not and to what degree and what the implication is for outcomes). The theory is too imprecise to guide educational practice because it fails to make predictions that can be tested with confidence by basic or applied research. The explanation provided by the theory is an analogy, a metaphor, nothing approximating something like the “laws of context” that might guide us in an unfamiliar situation and permit prediction and control. (References to this study typically fail to mention that despite the dramatic difference between the underwater and dry-land contexts, the divers recalled on average only three more words in a list of 36 words under the same versus different learning and recall conditions. So what can be expected with PBL?)

Similar questions can be raised about activation.21 The activation of knowledge networks consisting of nodes and connections is said to facilitate the learning of new information, and PBL is thought to stimulate the activation process with its tutorial-group discussions. The implication is that activation of these theoretical networks is greater in a PBL curriculum. But it could also be argued that although the sources of stimulation may differ in PBL and standard curricula, if students in the two curricula spend approximately the same amounts of time in educational activities, the total amounts of activation should be about the same. So what would activation theory say: does PBL have an advantage or not? The theory is not that clear. To begin with, it really isn't clear what knowledge networks are and it isn't clear what it means to say they are activated, and it certainly isn't clear what activates them and how much and whether different stimuli activate the networks in different amounts, etc. My point is simply that educational theory and its basic research, on close examination, seem to be nothing more than metaphor and demonstration, nothing that taps into the underlying substrate of learning that would allow prediction and control.

The findings of this review are consistent with those of my recent review of 29 papers on professions education (primarily, medical education) that were presented at the 1998 annual meeting of the American Educational Research Association (AERA).23 That review was in response to concerns expressed by the national leaders of the organization that educational research results appear to have had little impact on educational policy and practice; the AERA leaders have been exploring ways to improve the dissemination and application of the results. My concern was that we don't have much to disseminate and apply, which is what I hoped to illustrate with my review of the 29 papers. What I was looking for was evidence of practical interventions based on specialized knowledge about teaching and learning that have sizable effects on relevant outcomes. But the results were disappointing. For example, only five papers showed sizable effects, and the substance of those studies provided little support for strong claims about the importance of educational research in guiding policy and practice. The findings were important, showing that teaching is effective, students do learn, courses can be improved, and so on. But the research was not indicative of specialized knowledge that suggests practical interventions that have sizable effects—something that might be called education science. Similarly, this review of PBL research shows no sizable effect of PBL on any relevant measure of knowledge base or clinical performance, and the discussion of educational theory questions whether the underlying PBL rationale reflects specialized knowledge about teaching and learning.

One important, but neglected, area of research that avoids this abstract cognitive speculation and gets directly at a valuable skill that is central to the PBL approach is self-directed learning. PBL is said to teach the practice of clinical medicine by requiring students to teach themselves, in order to firmly establish life-long habits of self-directed learning. Rapid advances in medical science make it imperative for practitioners to keep up to date. Thus, the development of strong habits of life-long, self-directed learning has become a critical challenge for medical education, and the PBL approach seems to directly address this challenge (without the need for claims about hypothetical structures and mechanisms at an abstract underlying cognitive level). But only one study has examined the effect of PBL on self-directed learning in practice. Shin, Haynes, and Johnston24 looked at adherence to current clinical practice guidelines for management of hypertension as a function of time since graduation and found no change over time for graduates of a PBL school but a decreasing trend for graduates of a standard curriculum school. The results suggest that the PBL graduates were keeping more up to date. However, the decreasing slope for the standard school was not statistically significant, and the difference between the two schools' slopes was nonsignificant. Also, it has been noted that the PBL school has distinguished itself in the field of cardiovascular research and that graduates of the PBL school are more likely to be involved in teaching, both of which could account for the more up-to-date performance of the PBL graduates.25 More research on self-directed learning is sorely needed.


Despite the claims that the PBL process is based on fundamental educational principles and underlying hypothetical mechanisms in a way that should improve learning, this review of the research on the effectiveness of PBL curricula provides no convincing evidence that PBL improves knowledge base and clinical performance, at least not of the magnitude that would be expected given the extensive resources required for the operation of a PBL curriculum. Moreover, a close examination of this theory and its basic research raises questions about both and suggests that the ties between educational theory and research (both basic and applied) are loose at best. My recommendation is that we reconsider the value of thinking in terms of this imprecise theory about underlying hypothetical cognitive mechanisms and of pursuing basic research that attempts to test its indefinite predictions. Also, we should rethink the promise of PBL for the acquisition of basic knowledge and clinical skills. PBL may provide a more challenging, motivating, and enjoyable approach to medical education, but its educational effectiveness compared with conventional methods remains to be seen.


1. Albanese MA, Mitchell S. Problem-based learning: a review of literature on its outcomes and implementation issues. Acad Med. 1993;68:52–81.
2. Vernon DTA, Blake RL. Does problem-based learning work? A meta-analysis of evaluative research. Acad Med. 1993;68:550–63.
3. Berkson L. Problem-based learning: have the expectations been met? Acad Med. 1993;68(suppl 10):S79–S88.
4. Bloom BS. The 2 sigma problem: the search for methods of group instruction as effective as one-to-one tutoring. Educ Res. 1984;4:4–16.
5. Cariaga-Lo LD, Richards BF, Hollingsworth MA, Camp DL. Non-cognitive characteristics of medical students: entry to problem-based and lecture-based curricula. Med Educ. 1996;30:179–86.
6. Mennin SP, Friedman M, Skipper B, Kalishman S, Snyder J. Performances on the NBME I, II, and III by medical students in the problem-based learning and conventional tracks at the University of New Mexico. Acad Med. 1993;68:616–24.
7. Moore GT, Black SD, Style CB, Mitchell R. The influence of the New Pathway Curriculum on Harvard medical students. Acad Med. 1994;69:983–9.
8. Schmidt HG, Machiels-Bongaerts M, Hermans H, ten Cate TJ, Venekamp R, Boshuizen HPA. The development of diagnostic competence: comparison of a problem-based, an integrated, and a conventional medical curriculum. Acad Med. 1996;71:658–64.
9. Colliver JA, Robbs RA. Evaluating the effectiveness of major educational interventions. Acad Med. 1999;74:859–60.
10. Richards BF, Ober P, Cariaga-Lo L, et al. Ratings of students' performances in a third-year internal medicine clerkship: a comparison between problem-based and lecture-based curricula. Acad Med. 1996;71:187–9.
11. Hmelo CE. Cognitive consequences of problem-based learning for the early development of medical expertise. Teach Learn Med. 1998;10:92–100.
12. Colliver JA. Research strategy for problem-based learning: cognitive science or outcomes research. Teach Learn Med. 1999;11:64–5.
13. Hmelo CE. Problem-based learning: effects on the early acquisition of cognitive skill in medicine. J Learn Sci. 1998;7:173–208.
14. Distlehorst LH, Robbs RS. A comparison of problem-based learning and standard curriculum students: three years of retrospective data. Teach Learn Med. 1998;10:131–7.
15. Kaufman DM, Mann KV. Comparing achievement on the Medical Council of Canada Qualifying Examination Part I of students in conventional and problem-based learning curricula. Acad Med. 1998;73:1211–3.
16. Kaufman DM, Mann KV. Basic sciences in problem-based learning and conventional curricula: students' attitudes. Med Educ. 1997;31:177–80.
17. Ripkey DR, Swanson DB, Case SM. School-to-school differences in Step 1 performance as a function of curriculum type and use of Step 1 in promotion/graduation requirements. Acad Med. 1998;73(suppl 10):S16–S18.
18. Norman GR, Schmidt HG. The psychological basis of problem-based learning: a review of evidence. Acad Med. 1992;67:557–65.
19. Schmidt HG, Norman GR, Boshuizen HPA. A cognitive perspective on medical expertise: theory and implications. Acad Med. 1990;65:611–21.
20. Regehr G, Norman GR. Issues in cognitive psychology: implications for professional education. Acad Med. 1996;71:988–1001.
21. Custers EJFM, Regehr G, Norman GR. Mental representations of medical diagnostic knowledge: a review. Acad Med. 1996;71(suppl 10):S55–S61.
22. Godden DR, Baddeley AD. Context-dependent memory in two natural environments: on land and underwater. Br J Psychol. 1975;66:325–31.
23. Colliver JA. Pragmatic consequences of research in professions education. Professions Education Researcher Quarterly. January 2000.
24. Shin JH, Haynes RB, Johnston ME. Effect of problem-based, self-directed undergraduate education on life-long learning. Can Med Assoc J. 1993;148:969–76.
25. Woodward CA. Problem-based learning in medical education: developing a research agenda. Adv Health Sci Educ. 1996;1:83–94.
© 2000 Association of American Medical Colleges