Special Theme Article

The Need for Evidence in Medical Education: The Development of Best Evidence Medical Education as an Opportunity to Inform, Guide, and Sustain Medical Education Research

Dauphinee, W Dale MD; Wood-Dauphinee, Sharon PhD, PT



To be absolutely certain about something, one must know everything or nothing about it.

—Olin Miller, 20th century American humorist

The Origins of Best Evidence Medical Education

Integrating evidence into the pursuit of quality in social policy and in health care decision making has been underway for several years. Perhaps the best-known example for physicians and other health care professionals is the Cochrane Collaboration.1 It is built on the idea of analyzing information and data, and disseminating the results of these deliberate and deliberative processes to all who seek good evidence to guide them in clinical decision making. Its database of systematic reviews now houses many hundreds of completed reviews and many protocols for reviews-in-progress.2 Emerging from the Cochrane Collaboration was the application of evidence-based practice in social and behavioral sciences, which, as Wolf et al.3 have pointed out, ironically is the field wherein much of the methodology of systematic reviews was developed! As an example, they note that the concept of “meta-analysis” originated with Glass4 at an educational meeting in 1976. In fact, the naming of the social science collaboration after Donald Campbell will ring true with anyone who has taken a measurement course in the last 30 or more years. Campbell's work and books have served as the guide for much of our learning in this field.5 In consequence, the Campbell Collaboration was born to assist people in making well-informed decisions in social policy and education by promoting the use and distribution of systematic reviews on the effects of these policies and interventions in the real world.6

The application of the “best-evidence” concept to educational policy-making in medicine was the next logical step, and that occurred in the late 1990s.7 As described by Wolf et al.3 in 2001, it began with the idea of an educational review collaboration to organize a structure with analytical processes that would allow topic-focused groups of reviewers to systematically look at the effectiveness of medical school interventions. Under the leadership of Ian Hart and Ronald Harden, a plan was drawn up and matters began to move in late 1999.8 The implementation of the Best Evidence Medical Education (BEME) Collaboration has been underway for almost five years. Results are now appearing at the BEME Steering Committee, and some patterns are emerging.

The goals of this commentary are twofold: to describe the current status of BEME, and to present a viewpoint that will situate BEME in the broader context of medical education, specifically in its need to be accountable, to conduct research to understand educational processes and results, and also to identify a key role that medical educational research must play within a quality-improvement framework for medicine and health care. To address the latter goal, a few comments about context and other social pressures and norms are warranted, as they not only drive the need for evidence but also limit the scope of research or data gathering that can be utilized due to practical and social constraints. This context will permit the reader to understand that, while BEME will help sustain and offer direction to educational research in medicine, there are complementary approaches that can address some of the other needs for situation-specific data and, hence, evidence on behalf of medical education. We address this latter goal first.

The Demand for Increasing Accountability and the Need to Measure

Led by the Institute of Medicine's key reports, two major issues have emerged in the area of public policy in the health field: safety and quality, and accountability.9,10 Couple these with the recent focus on outcomes, and one has automatically widened the notion of the types of evidence needed in medical education, from an interest in the structures and processes of education to the results or products. This is not new. In 1966, Donabedian11 established the usefulness of considering the quality of care under three headings: structure, process, and outcomes. More and more, for reasons of cost and accountability, the emphasis has moved beyond assessment of structures and processes to defining and measuring outcomes. The reality is that health care is confronting a relative scarcity of resources in the face of increasing costs12 as well as the consequences of systems’ errors and the need to improve quality. Given the pressure for resources, the efficient use of those resources is key, both in clinical practice and in medical education as well as in the type of physician emerging at the end. The quality agenda has reminded us that documenting, defining, and benchmarking outcomes are key.10 Then, there is the issue of the impact of technology, a new enabler, that makes monitoring and trending studies feasible to guide educational decisions.13 Recent technical advances mean that real-time data are potentially available and their use is a realistic possibility if planned well. At the same time, one is confronted with the issue of the validity of real-time assessment data, the representativeness of the variables selected, and the soundness of methods used in the actual working situation. Given these pressures, it can be argued that evidence in the context of fiscal and social accountability has never been more important to educators. A brief overview of the current evidence-based methods is needed to further establish the context, value, and appropriate role of BEME.

The approach and validity of evidence-based medicine continue to be challenged, even after more than a decade. Miles et al.14 outlined these issues in an editorial introduction and commentary citing several authors’ perspectives regarding the limitations of the evidence-based approach, all contained within a single issue of an evaluation journal. As Davidoff15 pointed out some years ago, evidence is only one component of clinical practice in medicine and the other health professions. In addition to evidence, one must consider the knowledge and skills of the practitioner or team; the values, preferences, and expectations of the patient (and perhaps the family); as well as the constraints within the system that may relate to institutional policies, ethics, or safety. Naylor16 raised the issue of so-called grey zones of clinical practice where evidence about risk–benefit ratios is incomplete or contradictory and, therefore, such gaps must be acknowledged as part of any guideline or analysis to differentiate “fact from fervor.” Feinstein and Horowitz17 identified problems with the “content” of evidence and expressed concern over treatments designed to answer questions for “averaged” randomized patients. They also cautioned against answering issues regarding specific subgroups at different risk or with different prognoses, or failing to recognize the implications of postrandomization events that may lead to altered therapies. Tonelli18 has pointed out the limitations of evidence drawn from studies where reporting “means” does not translate to the individual physician, or in our context, to the individual student or resident. Miettinen19 expressed many concerns about the nature of specific versus general evidence as well, but most importantly, he raised the issue of “expertness” within collaborative study groups when they make those ultimate subjective judgments about evidence.

Similar situations clearly exist when designing and considering the effectiveness of educational practices within medical schools and those of other health professions. Thus, the medical education community will certainly consider BEME results, which are useful to educational decision-makers, as a key “driver” in planning the education of groups of trainees and in research on its effectiveness. But, in the context of wider accountability and integration of outcome results as an assessment of programs, other types of measures will be needed above and beyond those of BEME. We will return to this point after an overview of the current status of BEME.

BEME: Progress to Date

After a meeting in late 1999, the BEME Collaboration quickly identified the basic organization that has now been functioning for over three years. The Steering Group oversees the Editorial Review Board through which the various topic-review groups report.20 At present, there are five reviews at the operational level. Several others are at earlier stages. At least one group should be reporting publicly this year. There is a ten-step process by which a group can conduct a systematic review. The steps are now well established and include criteria by which BEME assesses a topic-review group's plan and review process, requirements for progress reports, procedures to complete a final report and, ultimately, procedures to update that report.20 Of the several projects that are well underway, two have presented preliminary reports as well as several progress reports at the earlier stages. These include completion of initial literature searches, preparing sheets for coding, and establishing procedures for extracting evidence and data. Obviously, matters are still in the early stages and key developmental issues such as long-term financing remain to be solved beyond the current sponsorship of the Association for Medical Education in Europe. That aside, what have the initial BEME reviews suggested to the Steering Committee?

Although still early in the game, it has already become clear that the levels of evidence in the existing literature are often not sufficiently high to meet BEME criteria, or the evidence cannot lead to clear or valid differences between approaches. These facts do not belittle educational research, but rather they point out that if these trends continue we will face two realities. First, medical educational research, while often plentiful, leaves a lot to be desired in terms of methodological rigor to meet levels of evidence. In large part this may be a reflection of lack of independent peer-review processes as commonly found in graduate education or for funding agencies in contrast to the more traditional areas like biomedical research or clinical issues and therapies. Second, a lot of research may have been carried out at one school, and as has been said so many times, if you have seen one school, you have seen one school! Nonetheless, the issues of instrument validation and sampling as well as the ability to randomize students within a single school to two or more curricula without crossover or contamination present interesting challenges. This points out the need to focus on basic principles of design and sampling if one is to overcome the paucity of data that could meet preexisting criteria as evidence for the other aspects of this model of collaboration.

These comments about BEME follow on previous insights. At its inception in 2000, Bligh and Anderson21 strongly supported the BEME initiative, but they pointed out that few such systematic reviews had been carried out in medical education, and even more importantly, that the clinical gold standard of the double-blind randomized clinical trial (RCT) was rare. They also emphasized the need to consider both qualitative and quantitative methods and designs. And, of course, they reiterated that critics point out the paucity of evidence that may underlie the “ambitions of the reformers.” They cited the necessity of three criteria for a systematic review: to be based on rigor and merit, to lead to appropriate dissemination of results, and, critically important, to nurture a culture within academic and clinical communities of placing medical education on an equal footing with other sectors of academe. Of course, the last criterion needs to be earned, not declared. Norman22 pointed out that the Campbell Collaboration, while not specifically promoting the RCT as a standard as with the Cochrane Collaboration, was committed to experimental curricular interventions. He also reminded us that the average effects found in trials can seldom be applied to the average patient, who is seldom average! Most clinical trials show a measurable advantage of one intervention over another, but in medical education, this is seldom the case. Norman suggested that “no significant difference” is the rule. But even more critical, he and Ten Cate came to the same view, namely that true blindness in randomization in education is not possible,22,23 thus restating Ten Cate's view that an educational intervention in a trial format will tell more about students’ inventiveness to construct their own learning program than about the educational intervention. This, in turn, will often lead to “no significant difference.” So, Norman really presented the notion that maybe we worry too much about the curriculum when in reality it is only one of many key variables. However, this does not mean that other educational issues do not lend themselves to study using experimental and RCT designs. They do and often with counterintuitive results.24,25

Subsequently, Murray26 focused on the need for educational research, particularly that related to evidence-based medical education. She places her money on getting beyond the descriptive study and the frequent use of self-perceptions, such as end points reflecting students’ satisfaction. Also, she identified the challenges unique to educational research. In brief, as Davidoff15 noted, these are complex interventions, randomization is difficult, and the challenge of defining the outcomes is not easy. She also cited the issue of funding. But it better serves our purpose to improve our lot by addressing her concerns rather than appearing to blame it all on lack of funding. If one takes the view that it is all about insufficient funds, this can lead to a reactive stance. We should address the other issues and propose proactive steps to improve our work and, thus, encourage and obtain new sources of funding.

Why BEME and Other Forms of Evidence Should Be a Priority in the Medical Educational Research Community

In Wolf et al.'s original article in 2001,3 the authors wrote about the need for setting a research agenda for the systematic review of evidence of the effects of medical education. In fact, they outlined the results of a session that the Society of Directors of Research in Medical Education held in 2000, where an agenda of critical topics for systematic reviews was created. Interestingly, and notwithstanding the difficulties cited, the most-cited topic was curricular design.3 Learning and instructional design, testing and assessment, and outcomes were the other three most-cited topics when issues were gathered into groups under broader headings. This suggests that the key topics are common issues and also implies that no one has a clear idea of the quality of evidence in the literature, again supporting the need for BEME.

So does the educational community continue to support and participate in BEME? Absolutely yes! But it is likely we will find many more gaps in the evidence or fewer clear answers than we want. While we fully support BEME at experimental, applied, and policy levels, elevating and promoting the intensity of educational research in three ways will ensure better evidence in the future (a long-term goal).

  1. First, we need to understand that educational research is difficult, and the relationships and potential variables are complex. It has often been said that any research dealing with people in real-life settings is difficult despite the fact that the questions are obvious. In fact, the creativity is in the design. In contrast, for example, in classical biomedical research, controlling the environment and genetic influences is easier, so it is the designs that are often straightforward while the hypotheses are very creative. In many ways, the reverse is true in medical education. For that reason, we must be creative and clever in the methods and designs used to answer our obvious questions. We do not need more students’ satisfaction surveys or other inadequately validated or improvised surveys of one or two programs or schools. Good educational research design is hard and requires that we understand the limitations of current work.24–26 To quote Berliner,27 “educational research is the most difficult of all.”
  2. Next, we need to understand that, in this era of accountability and management by measurement, we must be prepared to be accountable to our faculties, the students, the public, and the many other stakeholders. This cannot be done without deciding what are the valid outcomes against which our programs and graduates must be judged, and without planning which data are needed to answer those questions. The recent changes in the Accreditation Council for Graduate Medical Education's (ACGME's) standards and the new competences of the American Board of Medical Specialties address exactly these points.28,29 Our colleagues at the ACGME offer an excellent presentation of the rationale for accountability and results,30 and with today's technology, better monitoring of trainees is possible if the data sources are identified and valid.
  3. Third, although we need to encourage each other to use better experimental designs with appropriate and established quantitative and qualitative methods to answer key questions, we must remember that our mission is not a competition between students or schools of medicine. As noted, it is about competences and how best to optimize opportunities in educational terms. In the meantime, new designs and recent statistical approaches that model slopes rather than data points are needed to assess the complex relationships described here. Many are addressing this need and many more must do the same. For example, new frameworks to evaluate complex interactions have been developed by the United Kingdom's Medical Research Council (UK MRC).31 Further, we may need different strategies at different levels in the chain of proof for the effectiveness of complex social interactions, such as for individual performance, for teams, for provider organizations, or for larger systems like national strategies.32,33 In addition to these long-term goals, it is important not to forget that for existing programs benchmarking is doable and will guide quality improvement at present. In this era of increased accountability and continuous quality improvement, benchmarking as a mechanism for improving our activities is assuming an importance that was previously reserved for new knowledge. That need is equally appropriate for both our educational programs and our research, leading ultimately to improved performances of our programs and graduates in real life.
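To make the phrase "model slopes rather than data points" concrete, the following is a minimal sketch in Python. It is not drawn from any of the cited frameworks; the trainee labels, assessment times, and scores are hypothetical values invented purely for illustration, and the helper `ols_slope` is an assumed name. The sketch fits an ordinary least-squares slope to each trainee's repeated assessments and contrasts it with the single end-point score.

```python
from statistics import mean


def ols_slope(times, scores):
    """Ordinary least-squares slope of scores regressed on times."""
    t_bar = mean(times)
    s_bar = mean(scores)
    numerator = sum((t - t_bar) * (s - s_bar) for t, s in zip(times, scores))
    denominator = sum((t - t_bar) ** 2 for t in times)
    return numerator / denominator


# Hypothetical data (an assumption for illustration only):
# assessment scores at months 0, 6, and 12 for two trainees.
times = [0, 6, 12]
trainees = {
    "A": [50, 60, 70],  # steady improvement
    "B": [68, 69, 70],  # nearly flat
}

# The end-point view: one data point per trainee.
final_scores = {name: scores[-1] for name, scores in trainees.items()}

# The slope view: rate of change per month for each trainee.
slopes = {name: ols_slope(times, scores) for name, scores in trainees.items()}

for name in trainees:
    print(f"Trainee {name}: final score {final_scores[name]}, "
          f"slope {slopes[name]:.2f} points per month")
```

In this invented example both trainees finish with the same score, so an end-point comparison sees no difference, whereas the slopes separate a trainee who improved steadily from one who barely changed. Real analyses of this kind would typically use multilevel or growth-curve models with many trainees and an explicit treatment of measurement error.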

Medical Education Research: Future Needs and How to Integrate Them with BEME

Having pointed out the current status of an emerging collaboration, and its possible limitations as cited by others, it is incumbent on all of us not to expect the impossible. In the tradition of the Cochrane Collaboration, we would expect the results of BEME could guide many of our educational “practice” decisions. But even if that circumstance turns out to be less frequent than we expected at the outset, there are other benefits that must be considered. Results may focus our attention on areas of medical education research (i.e., experimental designs) or on quality-improvement opportunities for local programs where we should be more proactive.

In fact, two recent calls can serve as models for how we need to revise a lot of our thinking around evidence in medical care and education practices. These frameworks suggest the “big-study” approach in such complex environments as delivery of health care and educational practices may not be feasible. Often, the complexity is too great and the series of influencers too numerous to get it right with an RCT (in which we hope randomization takes care of those other variables). That means we must use multilevel approaches. Ferlie and Shortell34 have made this point clearly in an article focusing on health care and quality of care improvement on multiple levels (e.g., individuals, groups and teams, organizations, and larger systems), as did the UK MRC report.31 Similarly, they identify many core properties that underlie quality improvement: leadership, institutional culture, team and microsystem development, as well as information technology. They go on to suggest that these areas of change will complement the development of evidence-based movements like Cochrane.34 Davies,35 in a paper focusing on educational policy and practice, outlined that even within systematic reviews and research synthesis, there are many formats. He described their relevance and problems. In so doing, he reminded us that tools must be used carefully and thoughtfully. As in any measurement and decision sequence, the day comes when a decision must be made—and it can never be completely objective. It can only be as fair, as informed, and as transparent as possible, using experts of sufficient knowledge and experience to anticipate the issues and recognize the patterns, so that an optimally informed decision is made. After all, nothing is linear in our complex world of care and education in medicine.36 In the end, Grol33 pointed out that beliefs may win over evidence unless we have criteria and standards for evidence-based implementation.

In summary, we should not be disappointed if BEME does not give answers to perplexing questions in the short run. We should do a better job prospectively at asking and designing studies to offer possible answers. But BEME analysis can serve us in another manner. It provides information to improve what we do educationally by guiding researchers and evaluators in two ways: by pointing out gaps (and, thus, encouraging research in areas to better understand what we do), and by improving theory around which we construct our programs. In this sense, medical educational researchers, and their associations, funding bodies, and medical education journals, will become important definers in setting a better and more focused educational agenda. This focus is really part of the social contract and professional obligation of our institutions for the privilege given us by the public and by the students and trainees in those programs.

On Being Educational Professionals: The Need to Explicitly Justify Our Approaches

Ultimately, medical education must demonstrate that all it professes can make a difference or is a value added. This is not to argue that curricula must be all the same. Rather, there are clear competences that our postgraduate educational system requires for recent graduates to move on in their residency education. Demonstration of those competences is required if we are to be accountable to the students who pay, and to society who pays, both of whom expect optimal use of resources. Above and beyond basic competences and knowledge, additional opportunities can be promoted in any educational program, as long as the students who come to that institution, and other stakeholders, value and understand them. Beyond that, we all have a duty to pursue benchmarks for our educational programs and seek valid outcome data. This is not because there is a competition between students and faculties, but because this is how one learns whether one's approach and students’ competences manifest the value added and, eventually, are reflected in clinical encounters in practice.

The recent move to recertification is an example of such attempts to define competencies and promote quality. But even with recertification, is there evidence it makes a difference in improving the quality of care? That is also part of the opportunity the BEME approach should encourage: Does it work in the real world of implementation? In the end, does best evidence medical education or evidence-based practice work? That is a fair question. As researcher and philosopher–educationalist Geoffrey Norman would point out, results can often be counterintuitive.24 What appears to be objective may not be, and only by study and careful examination can we know. We hope, anticipating that all is not intuitive, to collect information and design better approaches to questions that will increase the likelihood of getting it right or, at least, making it better. That is really continuous quality improvement. And that should be our mantra, whatever else we do, while BEME and educational research continue to grow and mature.


References

1.Evidence-Based Medicine Working Group. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA. 1992;268:2420–5.
2.Cochrane Collaboration Web site. Accessed 13 July 2004. Oxford, UK: The Cochrane Collaboration Secretariat, 2004.
3.Wolf FM, Shea JA, Albanese MA. Toward setting a research agenda for systematic reviews of evidence of the effects of medical education. Teach Learn Med. 2001;13:54–60.
4.Glass GV. Primary, secondary, and meta-analysis of research. Educ Res. 1976;5:3–8.
5.Campbell DT, Stanley JC. Experimental and Quasi-experimental Designs for Research. Chicago: Rand McNally, 1963.
6.Davies P. Approaches to evidence-based teaching. Med Teach. 2000;22:14–21.
7.BEME Group. Best Evidence Medical Education (BEME): report of meeting—3–5 December 1999, London, UK. Med Teach. 2000;22:242–5.
8.Hart IR, Harden RM. Best Evidence Medical Education (BEME): a plan for action. Med Teach. 2000;22:131–5.
9.Institute of Medicine, Committee on Patient Safety. To Err Is Human: Building a Safer Health System. Washington, DC: National Academy Press, 1999.
10.Institute of Medicine, Committee on the Quality of Health Care in America. Crossing the Quality Chasm: A New Health Care System for the 21st Century. Washington, DC: National Academy Press, 2001.
11.Donabedian A. Evaluating the quality of care. Milbank Mem Fund Q. 1966;44(suppl):166–206.
12.Reinhardt UE, Hussey PS, Anderson GF. U.S. health care spending in an international context. Health Aff (Millwood). 2004;23(3):10–23.
13.Steinberg EP. Improving the quality of care – can we practice what we preach? N Engl J Med. 2003;348:2681–3.
14.Miles A, Grey JE, Polychronis A, Price N, Melchiorri C. Current thinking in the evidence-based health care debate. J Eval Clin Pract. 2003;9(2):95–109.
15.Davidoff F. In the teeth of evidence: the curious case of evidence-based medicine. Mt Sinai J Med. 1999;66:75–83.
16.Naylor CD. Grey zones of clinical practice: some limits to evidence-based medicine. Lancet. 1995;345:840–2.
17.Feinstein AR, Horowitz RI. Problems in the “evidence” of “evidence-based medicine.” Am J Med. 1997;103:529–35.
18.Tonelli MR. The philosophical limits of evidence-based medicine. Acad Med. 1998;73:1234–40.
19.Miettinen OS. Evidence in medicine: invited commentary. CMAJ. 1998;158:215–21.
20.BEME Collaboration Web site. Accessed 24 May 2004.
21.Bligh J, Anderson MB. Medical teachers and evidence. Med Educ. 2000;34:162–3.
22.Norman G. Best evidence medical education and the perversity of humans as subjects. Adv Health Sci Educ Theory Pract. 2001;6:1–3.
23.Ten Cate O. What happens to the student? The neglected variable in educational outcome research. Adv Health Sci Educ Theory Pract. 2001;6:81–8.
24.Norman G. Reflections on BEME. Med Teach. 2000;22:141–4.
25.van der Vleuten CPM, Dolmans DHJM, Scherpbier AJJA. The need for evidence in education. Med Teach. 2000;22:246–50.
26.Murray E. Challenges in educational research. Med Educ. 2002;36:110–2.
27.Berliner DC. Educational research: the hardest science of all. Educ Res. 2002;31:18–20.
28.Leach DC. Commentaries: a model for GME: shifting from process to outcomes. A program report from the Accreditation Council on Graduate Medical Education. Med Educ. 2004;38:12–4.
29.Horowitz SD, Miller SH, Miles PV. Commentaries: board certification and physician quality. Med Educ. 2004;38:10–1.
30.Batalden P, Leach D, Swing S, Dreyfus H, Dreyfus S. General competences in accreditation in graduate medical education. Health Aff (Millwood). 2002;21(5):103–11.
31.United Kingdom Medical Research Council. A Framework for development and evaluation of RCTs for complex interventions to improve health [unpublished]. Medical Research Council, 2000.
32.Burkhardt H, Schoenfeld AH. Improving educational research: towards a more useful, more influential, and better-funded enterprise. Educ Res. 2003;32(9):3–14.
33.Grol R. Beliefs and evidence in changing clinical practice. BMJ. 1997;315:418–21.
34.Ferlie EB, Shortell SM. Improving the quality of health care in the United Kingdom and the United States: a framework for change. Milbank Q. 2001;79:291–315.
35.Davies P. The relevance of systematic reviews to educational policy and practice. Oxford Rev Educ. 2000;26:365–78.
36.Petros P. Non-linearity in clinical practice. J Eval Clin Pract. 2003;9:171–8.
© 2004 Association of American Medical Colleges