The great majority of medical diagnoses, up to 90% in the case of chest pain, for example, are made on the basis of the history alone. 1–3 Although this is well established, the history-taking behaviors of medical students and residents have received little attention as a measure of diagnostic reasoning. Existing studies focus mainly on data gathering and communication skills. 4–6 The development of standardized patients (SPs) as an assessment tool has compensated greatly for the lack of direct faculty observation and evaluation of student histories and physicals. However, a major shortcoming of SP-based assessment is its propensity to reward data gathering, a skill that is not necessarily associated with good data interpretation. 7–11 More recently, studies of diagnostic processes and accuracy 12,13 have shown that the better diagnosticians have more semantically driven discourses, that is, they transform the patient findings into more abstract representations (e.g., “knee” becomes “large joint”) and create early problem representations (e.g., “an acute, recurrent large-joint problem” as opposed to “a chronic small-joint problem”) that help frame the diagnostic process.
Paying attention to history-taking and physical examination skills is important. Unfortunately, many students do not improve their interviewing techniques through medical school 14 and senior students have the same deficiencies as less advanced students. 15 More recently, similar deficiencies have been found in physical examination techniques related to respiratory and cardiac events; housestaff improved little during their training. 16,17 However, it was also found that once physical examination errors were corrected, changes in differential diagnosis and therapy frequently occurred. 18 Over 12% of internal medicine residents had never been observed directly taking a history and performing a physical examination and 55% had been observed only once or twice. 19 An important question remains. What should be observed? What history-taking behaviors are associated with good and poor diagnoses? The purpose of this study was to begin to identify which specific history-taking behaviors (predictor variables) are associated with making high-quality medical diagnoses as measured by global ratings of the students' diagnostic competence and semantic competence (criterion variables). For the purpose of this study, only cognitive aspects were investigated.
Because of the labor involved in obtaining criterion variables, namely global ratings and semantic classifications, a secondary analysis study was conducted with data from two previous studies. 12,20 This one-case convenience sample was chosen because it showed variability among students. The case was intended to produce not generalizable findings but rather the initial identification of potentially fruitful areas for future inquiry. Seventeen end-of-third-year medical students (clerks) saw a standardized patient complaining of a painful swollen right knee of two days' duration; the final diagnosis was gout in a patient with chronic osteoarthritis. All the students had completed their core clerkships. The interviews were videotaped and transcribed. On average, the encounters lasted 15.35 minutes (SD = 5.77, range: 8–26). IRB approval was obtained for the secondary analysis as well as the two primary studies.
Two reliable global ratings of the students' diagnostic performance were obtained from attending physicians in the Connell study 20: clinical reasoning and knowledge. Higher ratings corresponded to better performance. The mean global rating of clinical reasoning was 5.50 (out of 10, SD = 2.23, range: 1.85–9.05); and the mean global rating of knowledge was 5.23 (out of 10, SD = 2.11, range: 2.35–8.65). The semantic competence of the same students was classified in the Bordage study 12 and showed that 13 thinking-aloud discourses were rated as symptom-driven (eight reduced and five dispersed) and four as semantically driven. The semantically driven discourses were associated with more comprehensive and accurate diagnoses than the symptom-driven ones.
Except for numerous checklists of specific clinical findings, there was no instrument readily available to assess history-taking behaviors per se. A list of behaviors was drawn from a review of the literature, 4,21–23 from personal experience, and from viewing eight of the Connell videotapes in search of distinct behaviors. Twelve behaviors were identified, some positive and some negative: (1) clarifying or verifying patient information; (2) asking questions in close proximity, within a line of inquiry; (3) failing to respond to a key piece of information; (4) repeating questions unnecessarily; (5) summarizing information at hand; (6) changing topic before completing a line of inquiry; (7) telling the patient (prematurely) that ancillary tests are needed; (8) inquiring about the chief complaint, “frequency” (number of times any of six specific aspects of the chief complaint were explored: onset, site, course, severity, context, and aggravating or relieving factors); (9) inquiring about the chief complaint, “thoroughness” (percentage of the six aspects above that were explored); (10) inquiring about present illness; (11) inquiring about systems; and (12) inquiring about past history (medical, surgical, medication, family, or social).
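The two chief-complaint measures (behaviors 8 and 9) can be illustrated with a minimal sketch. It assumes that each question about the chief complaint has already been coded with the aspect it explores; the aspect labels, the function name, and the example data are hypothetical and are not taken from the study's instrument.

```python
from collections import Counter

# The six aspects of the chief complaint named in the text (labels are illustrative).
ASPECTS = ("onset", "site", "course", "severity", "context", "aggravating_relieving")


def chief_complaint_scores(aspect_mentions):
    """Return (frequency, thoroughness) for one encounter.

    frequency    = number of times any of the six aspects was explored
    thoroughness = percentage of the six aspects explored at least once
    """
    # Ignore any label that is not one of the six aspects.
    counts = Counter(a for a in aspect_mentions if a in ASPECTS)
    frequency = sum(counts.values())
    thoroughness = 100.0 * len(counts) / len(ASPECTS)
    return frequency, thoroughness


# Hypothetical coding of one student's questions (not study data).
example = ["onset", "site", "onset", "severity", "site"]
frequency, thoroughness = chief_complaint_scores(example)
print(frequency, thoroughness)
```

Under this reading, a student who asks repeatedly about the same aspect raises the frequency score without raising the thoroughness score.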
Because of the importance of distinguishing between what happens during the initial minutes of an encounter, compared with the encounter overall, the occurrences of the 12 behaviors were analyzed during the first three minutes as well as during the entire encounter.
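The split between the first three minutes and the entire encounter can be sketched as follows. The sketch assumes that each observed behavior is logged with the zero-based minute in which it occurred; the function name, behavior labels, and example events are hypothetical.

```python
from collections import Counter


def tally_behaviors(events, early_cutoff_min=3):
    """Count each behavior in the early window and over the whole encounter.

    events: iterable of (minute_index, behavior) pairs, where minute_index is
    the zero-based minute of the encounter in which the behavior was observed.
    """
    early, total = Counter(), Counter()
    for minute, behavior in events:
        total[behavior] += 1
        # Minutes 0, 1 and 2 make up the first three minutes.
        if minute < early_cutoff_min:
            early[behavior] += 1
    return early, total


# Hypothetical event log for one encounter (not study data).
events = [
    (0, "clarifying"),
    (1, "summarizing"),
    (2, "clarifying"),
    (5, "clarifying"),
    (7, "repeating"),
]
early, total = tally_behaviors(events)
print(dict(early))
print(dict(total))
```

Every behavior is counted as often as it occurs, so the same event log yields both the early and the overall tallies.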
One observer tallied the history-taking behaviors using one-minute intervals on the videotapes. The one-minute interval was determined after pilot testing showed that one-minute segments of the video recordings were optimal for observer accuracy in tallying behaviors. All the behaviors observed in a given interval were counted as often as they occurred. The videotapes were viewed in a random order and the observer was blinded to the students' global ratings and semantic classes. Inter-rater reliability was verified on a sample of videos using the phi inter-rater agreement method. 24 Inferential statistics were used not to test hypotheses per se but as a means of identifying potentially interesting behaviors within the context of the exploratory nature of this study. Pearson correlation coefficients were computed to examine relationships between history-taking behaviors and global ratings. T-tests were used to identify differences in the history-taking behaviors of students whose discourses were previously classified as symptom- or semantically driven. An uncorrected .05 level for statistical significance was used because of the exploratory nature of the study.
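The two inferential statistics named above can be sketched with the Python standard library. The sketch assumes an independent-samples t-test with pooled variance, which the text does not specify; function names and example values are hypothetical, and no p-values are computed.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance (an assumption).

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical values: behavior counts versus global ratings (not study data).
counts = [1, 2, 3, 4]
ratings = [2, 4, 6, 8]
print(pearson_r(counts, ratings))

# Hypothetical behavior counts for two groups of students (not study data).
symptom_driven = [1, 2, 3]
semantically_driven = [2, 3, 4]
print(pooled_t(symptom_driven, semantically_driven))
```

With groups as unequal as 13 and four students, a version that does not assume equal variances (Welch's test) might be a reasonable alternative; the pooled form is shown only because it is the simplest.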
All but two of the 12 behaviors were observed in the 17 videotaped encounters. Failing to respond to a key piece of information and telling the patient that ancillary tests were needed were never observed in the present case. Six randomly chosen videotapes were used to establish inter-rater reliability between the main observer (MH) and one other observer (KC). There were very high levels of agreement for the 12 individual behaviors (phi coefficients between .92 and 1), exceeding the chance agreement coefficients, all p values < .05.
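As an illustration of agreement on a single behavior, the following sketch computes the standard phi coefficient for a 2 × 2 table. It assumes that each observer's tallies are reduced to presence or absence (1 or 0) of the behavior in each interval; the exact procedure of the cited agreement method 24 may differ, and the example codes are hypothetical.

```python
import math


def phi_coefficient(rater_a, rater_b):
    """Standard phi coefficient between two equally long lists of 0/1 codes."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same number of intervals")
    # Cells of the 2 x 2 agreement table.
    n11 = sum(1 for a, b in zip(rater_a, rater_b) if a and b)
    n10 = sum(1 for a, b in zip(rater_a, rater_b) if a and not b)
    n01 = sum(1 for a, b in zip(rater_a, rater_b) if not a and b)
    n00 = sum(1 for a, b in zip(rater_a, rater_b) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when a rater's codes are all the same")
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical presence/absence codes for one behavior over four intervals.
observer_1 = [1, 1, 0, 0]
observer_2 = [1, 0, 0, 0]
print(phi_coefficient(observer_1, observer_2))
```

Phi is undefined when either observer never (or always) records the behavior, which is one reason a behavior that was never observed cannot receive an agreement coefficient computed this way.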
Four history-taking behaviors best differentiated students with symptom-driven discourses from those with semantically driven discourses; a narrative description of the results follows and numbers are presented in Table 1. Three behaviors were more frequent among students with semantically driven discourses: thoroughness of inquiry about the chief complaint during the first three minutes of the encounter; asking questions in close proximity, within a line of reasoning, both during the first three minutes and over the total duration of the encounter; and clarifying information provided by the patient over the course of the encounter. Repeating questions unnecessarily over the total encounter was characteristic of students with symptom-driven discourses.
Three behaviors were positively associated with the global ratings: asking the patient to provide clarifying information was associated with ratings of clinical reasoning during the first three minutes and over the total duration of the encounter; asking questions in close proximity within a line of reasoning was associated with ratings of both clinical reasoning and knowledge for the total encounter; and summarizing information at hand was associated with ratings of knowledge during the first three minutes. Four behaviors were negatively associated with measures of clinical competence: inquiring about systems and past history were both associated with ratings of clinical reasoning during the first three minutes; repeating questions unnecessarily was associated with ratings of knowledge during the first three minutes; and changing the topic before completing a line of inquiry was associated with ratings of clinical reasoning both during the first three minutes and overall. See Table 1.
Undoubtedly, making a diagnosis is highly dependent on the diagnostician's medical knowledge of the content of the case, hence the phenomenon of case specificity. 7 However, the results from the present case study indicate that certain observable history-taking behaviors were more characteristic than others of good or poor diagnosticians. One of the striking results is that all the characteristic behaviors occurred very early in the encounter, in this case, during the first three minutes. Four of the eight behaviors were characteristic of highly rated diagnosticians and all were present during the first three minutes: thoroughness of inquiry about the chief complaint; asking questions in close proximity, within a line of reasoning; clarifying or verifying information provided by the patient; and summarizing information at hand.
Four other behaviors, also present during the first three minutes, were characteristic of poorly rated diagnosticians: repeating questions unnecessarily; changing topic before completing a line of inquiry; inquiring about systems; and inquiring about past history.
The latter two behaviors, although not negative in and of themselves, simply indicate that the diagnosticians lacked a clear focus on the chief complaint and the history of the present illness at the very beginning of the encounter. Instead, their inquiries were broad-based, likely indicating a lack of understanding of the case or a strategy of casting a wide net in the hope of finding a promising line of inquiry. A lack of focus on the chief complaint and present illness early on will likely lead to a poor outcome. This echoes both Hatala's 11 finding that gathering more data when in doubt does not improve diagnosis and Elstein et al.'s 7 finding that thoroughness of data gathering is not a good predictor of diagnostic accuracy. The fact that this can be observed from the very beginning of an encounter has interesting educational implications. Rather than gathering more data in a blind fashion that will likely lead to a poor outcome, the students would be better advised to interrupt their encounter momentarily, look up how to explore that particular chief complaint, and select a limited set of plausible and prototypical diagnoses to compare and contrast.
As a whole, the four positive behaviors (thoroughness of exploration of the chief complaint, asking questions in close proximity, clarifying patient information, and summarizing) are all indicative of purposeful or hypothesis-driven inquiry. Early hypothesis generation is a well-established finding. 7,25,26 Gale and Marsden 27 found that physicians developed working hypotheses during the first 50 seconds of an encounter. Early hypothesis generation plays a crucial role in reaching a correct diagnosis. Gruppen 28 showed that students who consider the correct diagnosis early are four to nine times more likely to reach the final diagnosis than those who do not. Not having a clear diagnosis in mind, not being in the ballpark early, is a serious handicap. The students with the correct diagnosis (in the present study), based on Chang's 13 analysis of their case presentations, typically began their presentations with a semantically rich problem representation whereby they transformed two or three basic findings of the chief complaint to frame the problem overall (e.g., an acute, recurrent, large-joint problem). The findings from the present case study show that the students who had semantically driven discourses exhibited more positive history-taking behaviors than those with symptom-driven discourses. It is not possible to draw conclusions about causation from the present study. Issues of trait (i.e., semantic competence), skill (i.e., history taking), and content specificity (i.e., specific knowledge) are yet to be sorted out.
A major function of early hypothesis generation is to guide subsequent information gathering and interpretation by providing a framework within which information about a patient can be evaluated. All three criterion variables were positively related to the behavior of “asking questions in close proximity, within a line of reasoning.” This history-taking behavior is reminiscent of Norman's “co-selection” process 29 whereby the identification of clinical findings is a highly interactive process in which clinical features are more evident when diagnoses are also available. Focused and purposeful inquiry is likely a reflection of this underlying psychological construct.
From the present results, purposeful inquiry—as manifested by thoroughness of exploration of the chief complaint, asking questions in close proximity, clarifying patient information, and summarizing—is to be distinguished from routine rote inquiry. Routine history, as in the review of systems, plays an important role in the diagnostic process as a fail-safe or verification mechanism. 30 However, using this strategy as an initial problem-solving strategy is unlikely to succeed. The highly rated students in the present study did not waste time following a systems-review checklist from the outset. Instead they explored the chief complaint thoroughly and asked questions in relation to a line of inquiry, suggesting a hypothesis-testing strategy. One of the reasons for students to use a checklist approach may be their eagerness to gain points on the SP-based assessment checklist whereby points are garnered based on acquiring specific discrete clinical findings, and not necessarily following a clear line of inquiry. Furthermore, rather than following a piecemeal approach to delivering information, SPs may be better advised to begin the encounter by telling their stories, the way patients normally do, from which the student can form early problem representations and can then ask questions in a purposeful manner and clarify missing or unclear symptoms.
The early occurrence of the eight characteristic behaviors echoes Shatzer's 31 finding that satisfactory generalizable estimates of examinee performance may be obtained from shorter SP stations (i.e., five or ten minutes compared with 20 minutes). The early occurrence could also be a reflection of biases that drive the attending physicians' global ratings early in an encounter. However, a number of the characteristic behaviors were also associated with the criterion variables for the encounter overall. A second factor that mitigates the bias interpretation comes from the fact that the semantically driven criterion variable is derived from a comprehensive assessment of the case presentation overall.
The results from the present study begin to answer the question “what should be observed” both for the attending physician observing a student and for students going into an interview or reviewing their own history-taking performance with a patient. Beyond the ubiquitous case specificity, the results indicate that certain history-taking behaviors can be linked to good and poor diagnostic reasoning. The set of 12 predictor behaviors and three criterion variables constitutes an initial framework for further exploration across a broader range of cases. The small sample certainly contributed to some low effect sizes and observed power. The results from this exploratory study are not meant to be generalized but rather are to be used as a framework and potential focus of future inquiry. Broader sampling of cases would provide greater external validity. A more comprehensive assessment of history-taking behaviors would expand the list of behaviors, identify meaningful positive and negative behaviors, and increase our ability to understand the relationship between history-taking behaviors and measures of clinical performance.
The early manifestation of characteristic behaviors, within the first three minutes of the encounter, could have important educational implications. For example, attending physicians could focus their attention on specific positive and negative history-taking behaviors early during a direct observation session of a student interviewing a patient. The specific behaviors could then serve as the basis for feedback and discussion with the students. Likewise, students could be made aware of the characteristic behaviors and monitor their own history-taking strategies. Instead of blindly collecting data, they could prepare for their interviews by brushing up on the chief complaint before seeing the patient, or temporarily interrupt an interview to look up a chief complaint and relevant diagnoses, in order to conduct more purposeful inquiries.
1. Sandler G. Costs of unnecessary tests. BMJ. 1979;7:21–4.
2. Schmitt BP, Kushner MS, Wiener SL. The diagnostic usefulness of the history of the patient with dyspnea. J Gen Intern Med. 1986;6:386–93.
3. Peterson MC, Holbrook JH, Hales DV, et al. Contributions of the history, physical examination, and laboratory investigation in making medical diagnoses. West J Med. 1992;156:163–5.
4. Duffy DL, Hamerman D, Cohen MA. Communication skills of house officers—a study in a medical clinic. Ann Intern Med. 1980;93:354–7.
5. Cantwell BM, Ramirez AJ. Doctor-patient communication: a study of junior house officers. Med Educ. 1997;31:17–21.
6. Pfeiffer C, Madray H, Ardolino A, Willms J. The rise and fall of students' skill in obtaining a medical history. Med Educ. 1998;32:283–8.
7. Elstein AS, Shulman LS, Sprafka SA. Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, MA: Harvard University Press, 1978.
8. Stillman PL, Regan MB, Haley HL, et al. A comparison of free-response and cued-response diagnosis scores in an evaluation of clinical competence utilizing standardized patients. Acad Med. 1990;65(10 suppl):S27–S28.
9. Stillman PL, Regan MB, Haley HL, et al. The use of a patient note to evaluate clinical skills of first-year residents who are graduates of foreign medical schools. Acad Med. 1992;67(10 suppl):S57–S59.
10. Cohen R, Rothman AI, Ross J. Analysis of the psychometric properties of eight administrations of an objective structured clinical examination used to assess international medical graduates. Acad Med. 1996;71(10 suppl):S22–S24.
11. Hatala RA, Norman GR, Cunningham GP, Brooks LR, Department of Medicine, McMaster University, Hamilton, Ontario, Canada. The effect of ECG instructional materials on medical students' reasoning strategy (personal communication), November 1998.
12. Bordage G, Connell KJ, Chang R, Gecht M, Sinacore JM. Assessing the semantic content of clinical case presentations: studies of reliability and concurrent validity. Acad Med. 1997;72(10 suppl):S37–S39.
13. Chang RW, Bordage G, Connell KJ. The importance of early problem representation during case presentations. Acad Med. 1998;73(10 suppl):S109–S111.
14. Barbee RA, Feldman SE. A three year longitudinal study of the medical interview and its relationship to clinical performance in clinical medicine. J Med Educ. 1970;45:770–6.
15. Maguire GP, Rutter DR. History taking for medical students. I—Deficiencies in performance. Lancet. 1976 Sept 11;2(7985):556–8.
16. Mangione S, Nieman LZ. Cardiac auscultatory skills of internal medicine and family practice trainees: a comparison of diagnostic proficiency. JAMA. 1997;278:717–22.
17. Mangione S, Nieman LZ. Respiratory auscultatory skills of internal medicine and family practice trainees. Presented at the Eighth Ottawa International Conference, Philadelphia, PA, July 1998.
18. Wray NP, Friedland JA. Detection and correction of house staff error in physical diagnosis. JAMA. 1983;249:1035–7.
19. Stillman PL, Swanson D, Regan MB, et al. Assessment of clinical skills of residents utilizing standardized patients. Ann Intern Med. 1991;114:393–401.
20. Connell KJ, Sinacore JM, Schmid F, Chang R, Perlman S. Assessment of clinical competence of medical students by using standardized patients with musculoskeletal problems. Arthritis Rheum. 1993;36:394–400.
21. Beckman HB, Frankel RM. The effect of physician behavior on collection of data. Ann Intern Med. 1984;101:692–6.
22. Bordage G, Grant J, Marsden P. Quantitative assessment of diagnostic ability. Med Educ. 1990;24:413–25.
23. Beaumier A, Bordage G, Connell KJ, Saucier D, Turgeon J. Nature of the clinical difficulties of first-year family medicine residents under direct supervision. Can Med Assoc J. 1992;146:489–97.
24. Sinacore JM, Connell KJ, Olthoff AJ, Friedman MH, Gecht MR. A method for measuring interrater agreement on checklists. Eval Health Prof. 1999;22:221–34.
25. Dudley HA. Clinical method. Lancet. Jan 2;1(7688):35–7.
26. Barrows HS, Bennett K. The diagnostic (problem solving) skill of the neurologist: experimental studies and their implications for neurological training. Arch Neurol. 1972;26:273–7.
27. Gale J, Marsden P. The structure of memorized knowledge in students and clinicians: an explanation for diagnostic expertise. Med Educ. 1987;21:92–8.
28. Gruppen LD, Palchik NS, Wolf FM, et al. Medical student use of history and physical information in diagnostic reasoning. Arthritis Care Res. 1993;6:64–70.
29. Norman GR, Brooks LR, Cunnington JPW, Shali V, Marriott M, Regehr G. Expert-novice differences in the use of history and visual information from patients. Acad Med. 1996;71(10 suppl):S62–S64.
30. Gale J, Marsden P. The role of the routine clinical history. Med Educ. 1984;18:96–100.
31. Shatzer JH, Darosa D, Colliver JA, Barkmeier L. Station-length requirements for reliable performance-based examination scores. Acad Med. 1993;68:224–9.
Research in Medical Education: Proceedings of the Fortieth Annual Conference. November 4–7, 2001.