Research Report

Evaluating Evidence-Based Medicine Skills during a Performance-Based Examination

Davidson, Richard A. MD, MPH; Duerson, Margaret RN, PhD; Romrell, Lynn PhD; Pauly, Rebecca MD; Watson, Robert T. MD

Courses that teach the critical appraisal of the medical literature are present in every medical school curriculum,1 often under the names evidence-based medicine (EBM), clinical epidemiology, preventive medicine, or biostatistics. They have in common the goals of providing students with lifelong learning skills and assisting them in becoming competent clinical decision-makers. A particular challenge for instructors is determining whether these educational goals are met in ways other than by using knowledge-based tests associated with the course (i.e., does EBM instruction affect physicians’ knowledge and behavior, and even patient outcomes, over a period of time after the coursework is presented?).

Several authors have addressed instruction in critical appraisal and EBM, some in comparative trials, in studies involving medical students, residents, and practicing physicians. Norman and Shannon2 performed a systematic research synthesis that described seven methodologically acceptable studies and concluded that positive effects of teaching critical appraisal skills were more obvious in medical students’ education than in residents’ education. In Taylor et al.'s3 systematic review, most of the studies evaluated demonstrated an improvement in knowledge-based outcomes; however, the authors also noted the lack of methodologic rigor in most of the studies. Bennett et al.4 performed a controlled trial of teaching medical students and demonstrated an improvement in knowledge-based outcomes. A Cochrane review5 found only one study meeting its methodologic criteria. These studies, and others,6,7 have relied almost exclusively on knowledge-based outcomes or self-assessed skills, most frequently preintervention and postintervention tests, as outcome measures. Three studies have used objective structured clinical examinations (OSCEs) to evaluate EBM skills. Bradley and Humphris8 integrated some evidence-based assessment into one station of an OSCE, using published abstracts, and assessed the communication skills and overall performance of students in transmitting information. This study, however, did not assess the development of a precise clinical question (CQ), database-searching skills, or appraisal of an entire article (as opposed to an abstract). Burrows and Tylman9 also used an OSCE, but their study evaluated only database searching and retrieval skills, without addressing critical appraisal or communication skills. Fliegel et al.10 developed a computer-based OSCE station that assessed question development, search terms, and abstract selection, but this study also did not evaluate critical appraisal skills beyond abstract review.

At the University of Florida College of Medicine, EBM instruction involves a 30-hour course in the second semester of the second year that emphasizes database-searching skills, methodology, and critical appraisal of the literature. The course consists of 15 hours of lecture, a three-hour Medline database-searching workshop, and six two-hour small-group sessions directed at appraising articles from the current literature. The last small group utilizes an “educational prescription” in which the student develops a CQ based on a patient interaction, searches for appropriate literature, reaches a conclusion, and presents the results to the small group. During the third year, students in the internal medicine and obstetrics and gynecology clerkships are asked to critically appraise literature relating to a topic or a patient.

We developed a method of assessing these students’ knowledge and performance in EBM and integrated it into a clinical skills assessment utilizing a performance-based examination (PBE). Our goal was to develop an exercise that would test four major aspects of EBM: (1) the development of a CQ, (2) expertise at obtaining the appropriate resources (search technique), (3) critical appraisal of an entire article, and (4) translation of the effort into a clinical decision that can be communicated appropriately to the patient. This report describes the structure of the assessment and initial results.


Performance-Based Examinations

At the University of Florida College of Medicine, students complete two PBEs in their third year, one at the halfway point of the year and one at the end. Clerkship directors or their appointees serve as the station authors (SAs) and develop cases appropriate for the rotations the students have completed. Cases are presented to a PBE Oversight Committee and are discussed, critiqued, and edited by the group.

In addition to the assessment of clinical and communication skills within the standardized patient (SP) interactions, students are asked a number of disease-specific questions during an interstation that follows each station. Each student completes eight eight-minute stations (drawn from a pool of 16). The stations address clinical cases in internal medicine, geriatrics, pediatrics, neurology, family medicine, obstetrics and gynecology, psychiatry, and general surgery. Over the course of the study, two cases were dropped and two new ones added.

Integration of Evidence-Based Medicine in the Performance-Based Examination

In 2002 and 2003, students’ EBM skills were assessed in the PBE. The SAs developed EBM clinical queries to be posed by the SPs in the last station. For example, at the conclusion of a station with a patient complaining of symptoms of acute bronchitis, the SP asks the student for a prescription for antibiotics. Regardless of the student's response, the SP asks the student for evidence supporting his or her decision. The student then must develop a concise, patient-appropriate CQ, perform a Medline search, and, based on the search, select one or two articles that seem most appropriate to answer the question. During the first year of the study, students then visited the Health Science Center Library, obtained and read the articles they had identified, critiqued them, and developed a response to the question. During the second year of the study, the students followed the same protocol but did not use the library. Instead, once they had selected their own articles, they were given a standard article to critique. The standard articles had been selected by the SA and reviewed and critiqued by the course director (CD) for the EBM course; the critique was shared with the SA. In both years of the study, the students returned to the Assessment Center after an hour, where they were videotaped answering the SP's original query. The SP used a checklist to evaluate the student's communication skills, including whether the student utilized medical jargon, was organized in the presentation, allowed the SP time to ask questions, and demonstrated an interest in the patient's concerns. The results were collated, summarized, and scored as the percentage of items successfully completed.

Immediately following that presentation, the student used a computer to address the following four items:

  1. Describe the precise CQ you developed, relating the question to your specific patient.
  2. Please list the specific search terms you used to obtain your evidence. What databases and searching methods did you utilize?
  3. What were the most persuasive articles you found? List between 1 and 3, giving the first author's name, the title of the article, the journal, and the volume number and year it was published.
  4. Why did you answer the question as you did? Please provide specific details relating to the resources that you used (e.g., the strength of study design, the susceptibility of the evidence to bias). You are limited to 150 words.


The students’ responses were given to the SA, who rated each item on a five-point Likert scale, with 1 being inadequate performance and 5 being superior performance. In the second year of the study, the CD evaluated each student's answer to Item 4 independently of the SA evaluation; the CD did not know the student's identity. Agreement between the SA and CD was measured by the weighted kappa statistic. During the second year of the study, Item 2 (search technique) was evaluated by a clinical librarian, who rated the students’ searching skills on a three-step ordinal scale.
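As an illustrative sketch only (not the authors' actual analysis code), agreement between two raters on an ordinal five-point scale can be summarized with a linearly weighted kappa, which penalizes disagreements in proportion to their distance on the scale; the SA and CD ratings below are hypothetical:

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted kappa for two raters on an ordinal scale.

    A 4-vs-5 disagreement costs less than a 1-vs-5 disagreement,
    which suits Likert-type ratings.
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint frequency table.
    obs = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[idx[a]][idx[b]] += 1
    row = [sum(obs[i]) for i in range(k)]                        # rater A marginals
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]   # rater B marginals
    # Weighted observed vs. chance-expected disagreement.
    d_obs = d_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)                 # linear disagreement weight
            d_obs += w * obs[i][j] / n
            d_exp += w * row[i] * col[j] / (n * n)
    return 1.0 - d_obs / d_exp

# Hypothetical SA and CD ratings of Item 4 on the 1-5 Likert scale.
sa = [4, 3, 5, 4, 2, 4, 3, 5, 4, 4]
cd = [4, 3, 4, 4, 2, 5, 3, 5, 3, 4]
print(weighted_kappa(sa, cd, [1, 2, 3, 4, 5]))
```

Perfect agreement yields kappa = 1, and agreement no better than chance yields kappa = 0; the p value reported later in the text would come from a separate significance test against the null of chance agreement.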

The scores for the EBM portion of the exam were integrated with those from the remainder of the exam; however, students who performed poorly on the EBM section received additional attention during subsequent clerkship experiences that include an EBM session. To provide a summary value for each student, the mean of the responses to Items 1, 3, and 4 was calculated for the group as a whole and for each student. This value was used to determine whether EBM performance correlated with other measures of academic performance, including class rank, cumulative grade point average, and U.S. Medical Licensing Examination (USMLE) Step 1 score.
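To make the summary computation concrete, the sketch below (using hypothetical scores, not the study's data) averages each student's Item 1, 3, and 4 ratings and checks their association with another academic measure via a plain Pearson correlation coefficient:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-student ratings on Items 1, 3, and 4 (1-5 scale)
# and hypothetical USMLE Step 1 scores; values are illustrative only.
item_scores = [(4, 5, 4), (3, 3, 4), (5, 4, 5), (2, 3, 3), (4, 4, 4)]
step1 = [221, 215, 240, 198, 230]

# Per-student EBM summary value: mean of the three item ratings.
ebm_summary = [sum(items) / len(items) for items in item_scores]
r = pearson_r(ebm_summary, step1)
print("EBM summary means:", ebm_summary)
print("Pearson r vs Step 1:", r)
```

An r near zero, as the study reports for grade point average, class rank, and Step 1 scores, would indicate that the EBM exercise measures something the other academic metrics do not capture.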


A total of 224 students participated in the EBM exercise (110 in 2002 and 114 in 2003), with each student having been assigned 1 of 16 EBM queries. Performance on the items was generally consistent over the two years. For Item 1, which asks the student to develop an appropriate searchable CQ, the mean (± standard deviation) scores were 3.96 (± 1.0) and 3.93 (± .8) for 2002 and 2003, respectively. On Item 3, the faculty evaluation of the “best” articles selected by the student, the scores were 3.92 (± 1.2) and 3.85 (± .8) for 2002 and 2003, respectively. There was more variability in Item 4, the rationale for the conclusion reached by the student after reading the selected article. In 2002, the mean was 3.73 (± 1.2); in 2003 it was 4.09 (± .8). This difference likely reflects the change in the program in year two, when a faculty member selected the article and all students who completed the station critiqued the same article.

Item 2, the quality of the search, was evaluated differently over the course of the study. The mean score as evaluated by faculty in 2002 was 4.10 ± .9. During 2003, when the searches were evaluated by a clinical librarian using an ordinal scale, 25% of students received a “✓−,” 35% a “✓,” and 40% a “✓+.” Because of changes in the stations, subsequent analyses address data from year two.

There was moderate variability in the mean scores among the SAs. Summing the numerical values for Items 1, 3, and 4 for all students revealed a mean score of 3.95 ± 0.4. The low station average was 3.37 ± .1 and the high 4.61 ± .1. Agreement between the CD and SA scores on Item 4 was good (kappa = 0.6; p < .0001). There was no correlation between scores on this exercise and cumulative grade point average, class rank, or USMLE Step 1 scores.

In addition to the content competencies, the students’ communication skills were assessed by the SPs using a checklist. Eighty-eight percent of students avoided the use of unexplained medical jargon; 93% were felt to be organized in their explanations; students allowed the SP time to ask questions 97% of the time; and 95% demonstrated interest in the SPs’ concerns.


We developed a measure of EBM skills that was integrated into a preexisting PBE. Although the logistics of the evaluation are challenging and require an active SP program, our effort was well received by students and faculty. One limitation of our study is the variation in grading practices among the faculty. Even though there was significant agreement between the CD and SA, having several faculty members from the appropriate department independently review students’ responses would protect against this variability and strengthen the exercise. Our design does not allow an extensive investigation of case specificity because students were tested on only one case. However, the integration of EBM grading into a global estimate of clinical skills makes this issue less critical.

At our institution, our students’ performance on the biostatistics and epidemiology content of the USMLE Step 1 exam has been consistently higher than the national average; however, the number of questions addressing this content is relatively small, which might explain the lack of correlation with USMLE Step 1 scores. Although the PBE took place midway through the third year, it was only around six months after the EBM course; we would like to reevaluate these skills before graduation to address the durability of the educational process. Based on the data we have collected so far, we believe the results demonstrate the effectiveness of our teaching program in EBM.

Our study was not designed as a trial, but rather as an attempt to develop outcome measures that mimic actual evidence-based practice skills and behaviors as opposed to relying on knowledge-based tests. This technique could easily be applied to residents’ training, or for the evaluation of practicing physicians. We have found this effort to be practical and acceptable to faculty; in our test, each faculty member had to grade eight or 16 students’ responses, each of which was relatively short. We believe the integration of an evidence-based exercise into a PBE holds great promise for the assessment and application of evidence-based skills.


1. CurrMIT: Curriculum Management and Information 〈〉. Washington, DC: Association of American Medical Colleges, 2002. Accessed 21 March 2002.
2. Norman GR, Shannon SI. Effectiveness of instruction in critical appraisal (evidence-based medicine) skills: a critical appraisal. Can Med Assoc J. 1998;158:177–81.
3. Taylor R, Reeves B, Ewings P, Binns S, Keast J, Mears R. A systematic review of the effectiveness of critical appraisal skills training for clinicians. Med Educ. 2000;34:120–5.
4. Bennett KJ, Sackett DL, Haynes RB, Neufeld VR, Tugwell P, Roberts R. A controlled trial of teaching critical appraisal of the clinical literature to medical students. JAMA. 1987;257:2451–4.
5. Parkes J, Hyde C, Milne R. Teaching critical appraisal skills in health care settings (Cochrane Review). Cochrane Database Syst Rev. 2001;3:CD001270.
6. Smith CA, Ganschow PS, Reilly BM, et al. Teaching residents evidence-based medicine skills: a controlled trial of effectiveness and assessment of durability. J Gen Intern Med. 2000;15:710–5.
7. Ghali WA, Saitz R, Eskew AH, Gupta M, Quan H, Hershman W. Successful teaching in evidence-based medicine. Med Educ. 2000;34:18–22.
8. Bradley P, Humphris G. Assessing the ability of medical students to apply evidence in practice: the potential of the OSCE. Med Educ. 1999;33:815–7.
9. Burrows SC, Tylman V. Evaluating medical student searches of Medline for evidence-based information: process and application of results. Bull Med Libr Assoc. 1999;87:471–6.
10. Fliegel JE, Frohna JG, Mangrulkar RS. A computer-based OSCE station to measure competence in evidence-based medicine skills in medical students. Acad Med. 2002;77:1157–8.
© 2004 Association of American Medical Colleges