Clinical Reasoning

Tracking Development of Clinical Reasoning Ability Across Five Medical Schools Using a Progress Test

Williams, Reed G. PhD; Klamen, Debra L. MD, MHPE; White, Christopher B. MD; Petrusa, Emil PhD; Fincher, Ruth-Marie E. MD; Whitfield, Carol F. PhD; Shatzer, John H. PhD; McCarty, Teresita MD; Miller, Bonnie M. MD

doi: 10.1097/ACM.0b013e31822631b3


The acquisition of clinical reasoning skills—the ability to organize and use medical knowledge and reasoning ability to diagnose medical problems—is essential for medical students, yet is poorly understood. Even less is known about the development of clinical reasoning over the medical curriculum as a whole. A study by Neufeld et al1 concluded that the “clinical reasoning process remains relatively constant from medical school entry to practice.” That article noted a paucity of studies examining the evolution of the clinical reasoning process in medical students. In a more recent study, using a progress testing method,2–4 Williams et al5 reported on medical student progress toward achieving the clinical reasoning skills expected of learners by graduation at one institution. As might be expected, years of training had a large effect on clinical reasoning skill performance. Surprisingly, clinical reasoning performance did not accelerate during the third year, when students were intensively immersed in clinical practice. Because this study was conducted at one school, the results may have been unique to that curriculum.

The current study was designed to test whether the pattern of clinical skills acquisition illustrated by Williams et al would be similar at five different medical schools, each with a unique curriculum. The primary purposes of this study were to determine (1) the growth of clinical reasoning across years of training at the five participating schools, (2) whether the improvement was different in the five schools, and (3) whether the rate of improvement aggregated across the five schools was different during the different years of training (years one, two, and three).

We hypothesized that clinical reasoning skills would improve across years of training at each of the five schools but that, depending on the type of curriculum (e.g., traditional versus problem-based learning), the rates of improvement would differ from school to school. Several articles6,7 have promoted the idea of teaching clinical skills and basic science knowledge concurrently, noting that this integrated model may be more effective than the traditional approach of teaching basic sciences before proceeding to clinical problems. Thus, we further hypothesized that medical students at schools with this kind of integrated basic science and clinical curriculum would show higher levels of clinical reasoning skill earlier in their training than students undergoing more traditional forms of instruction. We believed that students in more traditional schools might demonstrate weaker clinical reasoning skills in years one and two but might “catch up” to their counterparts in integrated curricula through a correspondingly higher rate of clinical reasoning skill growth in year three.

Method

We developed two tests to measure medical students' clinical reasoning skills. The first included 70 diagnostic pattern recognition (DPR) items, and the second consisted of 69 clinical data interpretation (CDI) items. Tests of these types have been shown to be reasonable proxies for clinical reasoning.8–10 Examples of DPR items and CDI items are included in Appendices 1 and 2, respectively.

We designed the DPR test, based on the work of Case and colleagues,11 to assess students' ability to recognize common patterns of patient signs and symptoms. The CDI test used the script concordance item format of Charlin et al12 and was designed to test students' ability to interpret the impact of individual items of clinical data on the probability that a given diagnostic hypothesis was correct. Unlike the Charlin approach, we established a correct answer for each item. These methods provide useful educational assessments for medical students as well.13 D.L.K. and other physicians at the Southern Illinois University (SIU) School of Medicine authored the CDI test items and provided correct answers, which were reviewed for accuracy by other SIU physicians not involved in authoring the items. Details of item generation, scoring, and review, as well as examples of each of these types of questions, have been published elsewhere.5 We created the version of each test used in this research project by selecting questions with demonstrated desirable characteristics (e.g., item difficulty and discrimination) that had been administered in previous years at SIU alone and had been reviewed and revised by a panel of physicians familiar with working with medical students. The DPR examination covered 20 chief complaints. The CDI examination covered 31 chief complaints. Collectively, the DPR and CDI examinations covered 33 diagnoses.

Five medical schools administered the DPR and CDI examinations to their students in 2008. The five schools participating in this study were, in alphabetical order, Medical College of Georgia, Penn State College of Medicine, SIU School of Medicine, Vanderbilt School of Medicine, and the University of New Mexico School of Medicine. The institutional review board at each of these institutions approved this study. Three of the medical schools are public, and two are private. Enrollments range from small to large (72–190 per year). Among them, the schools have a variety of curricula and instructional delivery systems, including one with a discipline-based first year and a system-based second year, two that have integrated basic science/clinical preclerkship years, and two that have hybrid problem-based learning curricula.

All students at each school were invited to participate, and participants were grouped according to level of training: medical students just entering medical school (Group 0, n = 604), those who had just completed the first year (Group 1, n = 600), those who had just completed the second year (Group 2, n = 576), and those who had just completed the third year (Group 3, n = 614). In all, there were 2,394 participants.

We administered the two paper-and-pencil tests together, and students had 120 minutes to complete both tests. Testing for each group occurred in 2008 at the beginning of the new academic year. For example, students who had just completed their first year of medical school (Group 1) were tested at the orientation to year two of medical school. Students who had completed the fourth year at each school (beginning interns) were not included because they were not available for testing. We told students that their performance would not affect academic progress and that their participation was voluntary. Students were assigned a numeric code to protect their identity.

All answer sheets were mailed to one of the schools (SIU) for scoring, which was overseen by R.G.W. For each test, we computed a percent correct score for each student. Data from all schools and students at all levels were aggregated for further analysis. We computed mean percent correct scores and 95% confidence intervals for student groups from each medical school at each of the four levels of training.
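The per-group summary statistic described above can be sketched as follows. This is an illustrative helper, not the authors' actual scoring code; the function name, the sample scores, and the normal-approximation interval (z = 1.96) are assumptions.

```python
import math

def mean_ci95(scores):
    """Mean percent-correct score with a normal-approximation 95% CI."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    half = 1.96 * math.sqrt(var / n)                      # z * standard error
    return mean, (mean - half, mean + half)

# Hypothetical percent-correct scores for one school/level cohort
mean, (lo, hi) = mean_ci95([48.0, 52.0, 50.0, 55.0, 45.0])
```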

Internal consistency reliability was computed using Cronbach alpha. We performed two-factor (years of training by school) analyses of variance separately for DPR and for CDI. We also used one-way analyses of variance to determine the effects of years of training separately for each school and the effects of school separately at each level of training. A P value of <.05 was used to determine statistical significance. We used the Tukey honestly significant difference (HSD) method of multiple comparisons to determine which differences in means were significant when overall analysis of variance results were significant. To determine the growth of student performance from one year to the next within each school, we computed effect sizes (d) by subtracting the previous year's mean score from the latest year's mean score and dividing by the standard deviation for the previous year. We performed all statistical analyses using SPSS statistical software (version 16.0, SPSS Inc., Chicago, Illinois).
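The year-to-year effect size calculation reduces to a one-line formula; a minimal sketch, with the function name and the example group statistics chosen for illustration:

```python
def growth_effect_size(prior_mean, prior_sd, later_mean):
    """Effect size (d) for year-to-year growth: the gain in mean
    percent-correct score divided by the prior year's standard deviation."""
    return (later_mean - prior_mean) / prior_sd

# Hypothetical group statistics (percent correct): a 10-point gain
# against a prior-year SD of 10 gives d = 1.0, a large effect.
d = growth_effect_size(prior_mean=45.0, prior_sd=10.0, later_mean=55.0)
```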

Results

Table 1 provides the internal consistency test reliabilities broken down by school and years of training. The internal consistency reliabilities for the DPR and CDI examinations across all schools and all levels of training were 0.94 and 0.82, respectively. Average reliabilities were highest after two years of training (Group 2), suggesting that the knowledge these examinations draw on was best integrated at that stage of training. DPR reliabilities tended to be higher than CDI reliabilities during the first two years. The correlation between DPR and CDI performance was 0.58 (P < .05).

Table 1: Internal Consistency Reliabilities for Each Test, Each School, and Each Level of Training Represented in a 2008 Study of 2,394 Medical Students' Performance on Two Clinical Reasoning Skills Tests at Five Schools

Table 2 provides 2008 DPR and CDI performance results for medical students at the five participating medical schools at four levels of training. This table includes the mean percentage of correct answers, the standard deviation, the percentage of students who took the test, and the effect sizes for years of training. Tukey HSD multiple comparison results are included in the last row of the table to designate statistically significant performance differences between schools where applicable. Figures 1 and 2 depict these results as line graphs.

Table 2: Diagnostic Pattern Recognition (DPR) and Clinical Data Interpretation (CDI) Scores for 2,394 Medical Students from Five Schools at Four Levels of Training, 2008
Figure 1: Mean performance (with 95% confidence intervals) on a 2008 test measuring diagnostic pattern recognition for 2,394 students with zero, one, two, or three years of training in five medical schools.
Figure 2: Mean performance (with 95% confidence intervals) on a 2008 test measuring clinical data interpretation for students with zero, one, two, or three years of training in five medical schools.

Year of training explained 66% of the variation in DPR scores (P < .05) and 25% of the variation in CDI scores (P < .05). The “medical school influence,” which includes all medical school elements (e.g., student selection plus curriculum plus instructional delivery system plus faculty), accounted for 4% of the variation in DPR scores (P < .05) and 2% of the variation in CDI scores (P < .05). The interaction between medical school and years of training, a statistical indicator that patterns of improvement differed somewhat among the medical schools, accounted for 2.7% of the variation in DPR scores (P < .05) and 1.4% of the variation in CDI scores (P < .05).

Table 2 and Figures 1 and 2 show that, as a rule, student performance increased substantially with each year of training. These results also illustrate that the annual performance gains attributable to training at each of the participating medical schools were more similar than different. CDI performance and performance gains were lower than DPR performance and gains. The figures also depict the smaller DPR and CDI performance gains attributable to year three of medical school training across the five schools. To address the possibility of a ceiling effect, we calculated the amount of gain possible given performance in the prior year and then determined the percentage of that possible gain actually achieved in the subsequent year. Viewed this way, students attained 33% of the possible DPR gain during year one, 43% during year two, and 35% during year three. Likewise, they attained 15% of the possible CDI gain during year one, 12% during year two, and 7% during year three.
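The percentage-of-possible-gain calculation above can be expressed as a short formula; the function name and example means are hypothetical, and a 100% ceiling is assumed:

```python
def pct_of_possible_gain(prior_mean, later_mean, ceiling=100.0):
    """Share of the remaining headroom (ceiling minus prior-year mean
    percent-correct score) actually gained in the subsequent year."""
    return 100.0 * (later_mean - prior_mean) / (ceiling - prior_mean)

# Hypothetical means: a cohort moving from 40% to 60% correct gained
# 20 of the 60 available points, i.e., one-third of the possible gain.
share = pct_of_possible_gain(prior_mean=40.0, later_mean=60.0)
```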

Tukey HSD multiple comparison results (Table 2) indicate that school-to-school group differences are most often not significant and, when they are significant, relative position of schools varies from training year to training year.

Discussion and Conclusions

On average, among all students, DPR and CDI performance improved with each additional year of medical training. The CDI was a more difficult test than the DPR. We believe the CDI was more difficult because the pattern recognition format of the DPR items mirrors how students encounter such data in their studies. That is, students more frequently read about a disease and its constellation of findings. They are less likely to learn how a single piece of data can help rule a disease in or out of a differential diagnosis.

The expected rapid acceleration in third-year performance gains due to the intensive clinical training during that year did not occur. This finding was consistent with results reported previously at a single institution.5 The performance of students in the participating schools is more similar than different, especially after three years of the curriculum.

We observed that students in Group 0, who were just starting medical school, answered nearly half the items correctly. This is not so surprising when one considers that by chance alone students should answer 20% of the CDI items (five options per question) and 12.5% to 20% of the DPR items (five to eight options per question) correctly. In addition, students are not blank slates on entry to medical school. Many have already had extensive mentoring, shadowing, and volunteering experiences with physicians and patients. At entry, school-to-school differences amounted to an average of four items on the DPR and CDI tests. The largest school-to-school differences after years one and two amounted to only five correct answers. After year three, school-to-school differences amounted to two correct answers on the DPR examination, with no appreciable differences in CDI performance. The DPR standard deviation decreased in year three, indicating that students' performance became more homogeneous; this was not true for CDI. In summary, DPR and CDI performance differences among school cohorts were relatively small at the time of admission, as would be expected, and remained small by the end of year three.
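The chance baselines cited above follow directly from the number of options per item; a trivial sketch (the function name is illustrative):

```python
def chance_percent(n_options):
    """Expected percent correct from random guessing on items
    that each offer n_options answer choices."""
    return 100.0 / n_options

cdi_baseline = chance_percent(5)       # CDI: five options per item
dpr_low = chance_percent(8)            # DPR lower bound: eight options
dpr_high = chance_percent(5)           # DPR upper bound: five options
```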

The lack of accelerated progress in the third year is puzzling. One explanation is a ceiling effect that makes further improvement difficult after a certain level of achievement. Our calculations regarding the percentage of possible gain indicate that a ceiling effect may be a partial explanation for the findings, especially the DPR findings. Although it is also possible that portions of the test were too advanced for the level of learner, especially in the third year, all questions were created specifically for medical students by the test creators and were reviewed by other physicians also familiar with teaching medical students. Another plausible explanation for the lack of more rapid acceleration in performance gains during the third year is that acquisition of diagnostic reasoning ability is not the primary goal of clinical training during that year. Students may be concentrating on other patient-care-related activities (e.g., learning to function in clinical environments, patient management, choosing a specialty). Another plausible though unexplored explanation is that student conceptions of disease presentations (based on the classic presentations taught in years one and two) may be disrupted by the complexity and ambiguity of real patient presentations. Students who have spent the third year confronting patients who do not present as they do in textbooks may have been more hesitant and unsure when answering clinical reasoning questions on the progress test.

We hypothesized that schools with early, integrated clinical and basic science teaching would produce sustained superior gains in clinical reasoning, but this did not occur. These findings are consistent with those reported by others14,15 regarding the impact of medical school selection processes, curricula, and curriculum delivery systems on United States Medical Licensing Examination test performance. The variations in North American medical school selection policies and procedures, curricula, and curriculum delivery systems seem to result in relatively small differences in measured student performance capabilities at the end of undergraduate medical education training. Perceived inadequacies in medical education result in frequent calls for curriculum change, and schools seem to go through regular cycles of reinvention. Although we do not wish to discourage consideration and implementation of curricular changes, we believe that our results and those of Hecker and Violato14 should be considered before investing time and effort in curriculum revision to improve knowledge or diagnostic reasoning skills. Although changes in curricula or delivery systems might influence other features of student performance not measured by these tests, changes in student knowledge and diagnostic ability are likely to be modest.

National data regarding how practicing primary care physicians and residents perform on these tests would add context for interpreting the results presented here. Williams et al5 provided some data about DPR and CDI performance of practicing physicians and residents, but the data do not adequately represent the population of North American primary care physicians and residents. Such data would allow comparisons of medical student performance to practicing physician performance.

This study has limitations worth noting. The cross-sectional design of the study (as opposed to a longitudinal follow-up of the same cohort of students over time) makes it difficult to conclude that the gains seen were the result of acquisition of clinical reasoning skills rather than cohort differences attributable to differences in student selection during the years involved. However, it is unlikely that all five participating schools would have experienced similar change in student selection, a circumstance which would have been necessary to attribute these observed differences to selection. We plan to report longitudinal results for the participating schools in the future.

In summary, our results demonstrate that medical students achieve substantial gains in clinical reasoning ability with each year of medical school training and experience. However, these students do not seem to experience larger gains in clinical reasoning ability after the clinically intensive third year of medical school. Further, the gains observed are similar across the spectrum of medical schools that participated in this research project.


The authors would like to acknowledge the ongoing work of the North American Progress Testing in Medical Education Group: Medical College of Georgia, Pennsylvania State University, Southern Illinois University, Texas Tech at El Paso, University of New Mexico, Vanderbilt University, and the University of Minnesota. They would also like to thank Randall Robbs at Southern Illinois University School of Medicine for scoring the tests, and Robert Gowin, also at Southern Illinois University, for preparing the test score reports.



Other disclosures:


Ethical approval:

Ethical approval has been granted by the IRB committees at each of the participating institutions: Medical College of Georgia, Penn State College of Medicine, Southern Illinois University School of Medicine, Vanderbilt School of Medicine, and the University of New Mexico School of Medicine.


1 Neufeld VR, Norman GR, Feightner JW, Barrows HS. Clinical problem solving by medical students: A cross-sectional and longitudinal analysis. Med Educ. 1981;15:315–322.
2 Van der Vleuten C, Verwijnen GM, Wijnen W. Fifteen years of experience with progress testing in a problem-based curriculum. Med Teach. 1996;18:103–109.
3 Blake JM, Norman GR, Keane DR, Mueller CB, Cunnington J, Didyk N. Introducing progress testing in McMaster University's problem-based medical curriculum: Psychometric properties and effect on learning. Acad Med. 1996;71:1002–1007.
4 Arnold L, Willoughby TL. The quarterly profile examination. Acad Med. 1990;65:515–516.
5 Williams RG, Klamen DL, Hoffman RM. Medical student acquisition of clinical working knowledge. Teach Learn Med. 2008;20:5–10.
6 Eva KW. What every teacher needs to know about clinical reasoning. Med Educ. 2004;39:98–106.
7 Charlin B, Tardif J, Boshuizen HPA. Scripts and medical diagnostic knowledge: Theory and applications for clinical reasoning instruction and research. Acad Med. 2000;75:182–190.
8 Case SM, Swanson DB. Extended-matching items: A practical alternative to free-response questions. Teach Learn Med. 1993;5:107–115.
9 Carriere B, Gagnon R, Charlin B, Downing S, Bordage G. Assessing clinical reasoning in pediatric emergency medicine: Validity evidence for a script concordance test. Ann Emerg Med. 2009;53:647–652.
10 Boutros M, Nouh T, Reid S, et al. The script concordance test as a measure of clinical reasoning: A cross-Canada validation study. Med Educ. 2010;44(suppl 2):1.
11 Case SM, Swanson DB, Stillman PL. Evaluating diagnostic pattern recognition: The psychometric characteristics of a new item format. Proc Annu Conf Res Med Educ. 1988;27:3–8.
12 Charlin B, Roy L, Brailovsky C, Goulet F, van der Vleuten C. The script concordance test: A tool to assess the reflective clinician. Teach Learn Med. 2000;12:189–195.
13 Bland AC, Kreiter CD, Gordon JA. The psychometric properties of five scoring methods applied to the script concordance test. Acad Med. 2005;80:395–399.
14 Hecker K, Violato C. How much do differences in medical schools influence student performance? A longitudinal study employing hierarchical linear modeling. Teach Learn Med. 2008;20:104–113.
15 Ripkey DR, Swanson DB, Case SM. School-to-school differences in Step 1 performance as a function of curriculum type and use of Step 1 in promotion/graduation requirements. Acad Med. 1998;73(10 suppl):S16–S18.
Appendix 1: Examples of Questions and Answers on a Diagnostic Pattern Recognition Test Administered to 2,394 Medical Students With Zero, One, Two, or Three Years of Training at Five Schools, 2008

Appendix 2: Examples of Questions and Answers on a Clinical Data Interpretation Test Administered to 2,394 Medical Students With Zero, One, Two, or Three Years of Training at Five Schools, 2008
© 2011 Association of American Medical Colleges