Education Strategies

Virtual Humans Versus Standardized Patients: Which Lead Residents to More Correct Diagnoses?

Wendling, Adam L. MD; Halan, Shivashankar MS; Tighe, Patrick MD; Le, Linda MD; Euliano, Tammy MD; Lok, Benjamin PhD

doi: 10.1097/ACM.0b013e318208803f



In the report by Wendling and colleagues in the March issue, the Method section of the abstract (third sentence) and the Method section of the report (paragraph 6, second sentence) state an incorrect neck measurement of the virtual human discussed in the report. The correct measurement is a neck circumference of 40 cm.

Academic Medicine. 86(5):648, May 2011.

Medical educators often use standardized patient (SP) encounters to train and assess a novice's ability to gather history, to perform physical examinations, and to synthesize data to form diagnoses. These types of encounters have even been incorporated into various high-stakes exams, such as the Medical Council of Canada's Qualifying Examination Part II (since 1996), the Educational Commission for Foreign Medical Graduates' Clinical Skills Assessment (from 1998 to 2004), and the United States Medical Licensing Exam Step 2 Clinical Skills Assessment (since 2004).1–3 Yet, we believe possible limitations exist with this method of training and assessment. Although deficiencies in performance may be due to inexperience and lack of skill or knowledge on the part of the examinee, they may also be due to flaws inherent in human SPs.

Various studies examining the use of SPs in trainee assessments have found predominantly promising results, suggesting that performance during clinical skills assessments with SPs during high-stakes exams not only has high validity but also predicts performance on similar high-stakes exams. Further, these studies demonstrate that SP ethnicity, nationality, gender, and primary language have at most only a minor influence on the student's performance.4–14 However, we found no study that has reported what influence abnormal physical findings have on the decision-making abilities of examinees during clinical skills assessments.

SPs may not deliver a consistent history, especially over multiple repetitions, and they frequently cannot exhibit abnormal physical findings. They are typically healthy, ambulatory adults who have essentially no consistent physical findings, even though they are tasked with presenting many common diseases that are associated with significant abnormal physical findings. Whereas real patients who are able to present actual disease states can and should participate in medical training, their inconsistent availability limits the uniform educational experiences of students.

Some investigators have addressed these limitations with “augmented” SP encounters. Sun and colleagues15 used modified stethoscopes that could play abnormal auscultatory findings during physical exams of SPs. Sun and McKenzie16 further augmented SP encounters with virtual pathology echocardiograms that used tracking technology to display an abnormal exam when the echo probe was in an appropriate anatomic location over a patient's thorax. Although promising, these techniques can neither discern nor display disease conditions that affect the physical appearance of the patient.

The Virtual Experiences Research Group (VERG; Gainesville, Florida) is exploring human-to-virtual-human interactions with unique applications of virtual reality. VERG developed a highly immersive virtual human (VH) for this purpose. The VH enables the presentation of a consistent history and abnormal physical findings for multiple learners. Some medical educators have already used the VH to assess medical students and teach them communication skills.17 Moreover, VERG merged the VH with a mannequin to create a mixed-reality simulation for breast exams.18 Although the SP supply is limited, the VH offers the exciting possibility of creating an almost limitless repository of diverse and challenging virtual clinical scenarios that are difficult to consistently duplicate with authentic SPs.

The purpose of this study was to determine whether junior anesthesiology residents would more frequently suspect obstructive sleep apnea (OSA) during a preoperative exam of a VH as compared with an SP.


After approval from the University of Florida social and behavioral institutional review board, we undertook this study with two sequential classes of first-year clinical anesthesiology (CA-1) residents at the University of Florida College of Medicine in Gainesville, Florida, early in the academic years of 2008 and 2009.

We created a script for both the VH and the SPs that included all of the characteristic features of OSA (snoring, daytime fatigue, observed apnea, hypertension, and obesity).19–21 We had two experienced anesthesiologists further train the SPs on consistently delivering the scripted history. Three SPs were necessary to process the first group of CA-1 residents (in 2008) through the assessment in a timely fashion. We recruited the SPs from the pool of available SPs at the University Professional Development and Assessment Center. The three SPs were middle-aged adults; one was a Caucasian man, one an African American man, and one a Caucasian woman.

We developed the VH script using Virtual People Factory (VPF; Gainesville, Florida). VPF is a Web application for modeling conversations between real humans and VHs. VPF conversations typically follow the pattern of humans providing conversational stimuli to trigger VH responses. VPF provides separate interfaces for the scenario authors and for the other content experts, allowing the authors to leverage feedback from other content experts to efficiently create and embellish a scenario that responds to a wide variety of stimuli.

We created the scenario by providing a few of the possible human conversational stimuli and VH responses. These represented best guesses as to the most frequently asked questions and statements that real clinicians would provide in an actual conversation. We then provided access to this scenario to a number of other clinicians, that is, senior anesthesia residents and faculty (this link will take interested individuals to the VPF test page of the VH used in this experiment:*). These other clinicians interacted with the VH through a conversational, instant-message-like user interface, in their own language and without any prompts. The system attempted to respond by matching each trigger provided by the clinician to responses we had entered. VPF provided us with tools for analyzing the transcripts to examine the frequency of certain topics, for reviewing and correcting errors, and for embellishing the script. We repeatedly invited other clinicians to test the scenario until we considered the level of error in the VH's responses to be extremely low. For example, we initially had the VH respond to the trigger question, “Can you climb a flight of stairs?” with a description of his exercise tolerance. But when other clinicians phrased the question in a way we had not anticipated (e.g., “How active are you?”), the VH could not initially respond. On review, we recognized that the other clinicians were inquiring about exercise tolerance and added that question to the triggers for the exercise tolerance response. Figure 1 illustrates the process of developing the VH's dialogue.

Figure 1:
Illustration of the process of developing the script for the virtual human (VH), which first-year clinical anesthesiology residents interviewed in a study at University of Florida College of Medicine (2008–2009) to determine whether residents would more frequently suspect obstructive sleep apnea in the VH than they would in standardized patients. After the script authors first develop a script for the VH, content experts repeatedly interact with the script and the authors continually revise the script (using the content experts' feedback) until the final script is virtually error-free.

After three months, this testing resulted in a script of 849 user triggers and 259 VH responses for a question-and-answer dialogue with the VH. The end result is a VH that seems to converse about a given subject in a natural manner.
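The many-triggers-to-one-response structure described above can be illustrated with a minimal sketch. This is not VPF's actual matching algorithm, which the report does not describe; the word-overlap (Jaccard) score, the 0.5 threshold, the function names, and the response wording are all assumptions made for illustration. Only the two example questions come from the text.

```python
import re

# Hypothetical script: each response is reachable through several trigger phrasings.
# The response wording is illustrative, not taken from the study's script.
SCRIPT = {
    "exercise_tolerance": {
        "response": "I get short of breath after one flight of stairs.",
        "triggers": ["Can you climb a flight of stairs?"],
    },
    "snoring": {
        "response": "My wife says I snore loudly.",
        "triggers": ["Do you snore?"],
    },
}

def tokens(text):
    """Lowercase a phrase and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def similarity(a, b):
    """Jaccard word overlap between two phrases (assumed matching rule)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def respond(script, utterance, threshold=0.5):
    """Return the response whose best trigger matches the utterance, or None."""
    best_key, best_score = None, 0.0
    for key, entry in script.items():
        for trigger in entry["triggers"]:
            score = similarity(utterance, trigger)
            if score > best_score:
                best_key, best_score = key, score
    if best_key is None or best_score < threshold:
        return None  # unanticipated phrasing: flag for author review
    return script[best_key]["response"]

def add_trigger(script, key, phrasing):
    """Author revision step: map a newly observed phrasing to an existing response."""
    script[key]["triggers"].append(phrasing)
```

In this sketch, `respond(SCRIPT, "How active are you?")` returns `None` until an author calls `add_trigger(SCRIPT, "exercise_tolerance", "How active are you?")`, which mirrors the review-and-revise loop in which unmatched questions were added to the triggers for an existing response.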

We developed the appearance of the VH using the latest animation techniques, which would allow us to present a life-size scale VH on an LCD television (Figure 2). The VERG engineers designed the VH to appear morbidly obese, with a neck circumference of 40 cm. Residents could interact with the character through natural conversation. A speech recognition system analyzed the residents' questions and converted them to text before they entered the VPF system. If the resident asked to perform an airway exam, an image suggestive of a small retropharyngeal space with excessive redundant soft tissue and prominent tongue and tonsillar hypertrophy appeared. Further, if the resident indicated that he or she was going to perform a cardiopulmonary exam, the soundtrack of a normal cardiopulmonary exam played.

Figure 2:
The physical appearance of the virtual human (VH) with obstructive sleep apnea (OSA) that first-year clinical anesthesiology residents interviewed at University of Florida College of Medicine (2008–2009) in a study to determine whether residents would more frequently suspect OSA in the VH than they would in standardized patients (SPs). This VH, unlike typically healthy adult SPs, can demonstrate the physical findings of OSA; he is morbidly obese and has a neck circumference of 40 cm.

This VH interaction system consists of commercial, “off-the-shelf” hardware and open-source, in-house-developed software. The total cost is approximately $3,000. The virtual system includes a networked personal computer, a large-screen LCD monitor, and a microphone (Figure 3).

Figure 3:
Illustration of the components of the virtual human system that the University of Florida College of Medicine used in a study (2008–2009) to determine whether first-year clinical anesthesiology residents would more frequently suspect obstructive sleep apnea in the virtual human than they would in standardized patients. The components include a networked personal computer, a large-screen LCD monitor, and a microphone.

The study population consisted of two sequential classes of CA-1 residents who were undertaking their basic training and attending a lecture series early in the 2008 and 2009 academic years, respectively. The anesthesiology residency program is fully accredited by the Accreditation Council for Graduate Medical Education. All the participants had completed one year of postgraduate medical training, and they were beginning their first of three years of training in clinical anesthesiology. Each of the two classes received the same basic lecture and reading material on preoperative assessment before participating in this study. The first group of CA-1 residents interviewed one of three SPs. The interview was videotaped and reviewed by anesthesiology faculty. The second CA-1 group interviewed the VH. The VH system recorded the clinician triggers and its own responses for subsequent review (again by anesthesiology faculty).

We considered any resident question or statement pertaining to OSA an indication that the resident suspected OSA, and we counted any such query or comment as a positive response. All residents also completed a standard preoperative history and physical assessment form for subsequent review by faculty who used the form to give the residents formative feedback.

We collected demographic data, including age, gender, type of internship previously completed, and prior pulmonology rotations, on all residents.

We completed statistical analysis using JMP 8.0.2 (SAS, Cary, North Carolina) and SAS 9.2 (SAS, Cary, North Carolina). To address the primary question of whether residents detected OSA more frequently in an SP than they did in a VH, we first employed a chi-square analysis. We performed this analysis as a univariate analysis and did not control for other factors within this study. To control for the impact of resident age, gender, type of internship, and prior pulmonary rotation on the role of the VH versus an SP in the diagnosis of OSA in our sample, we employed binomial logistic regression using main effects modeling. We set statistical significance at .05.


Five of 21 residents (23.8%) suspected OSA during the SP interviews, and 11 of 13 residents (84.6%) suspected OSA during the VH interview. Table 1 describes the distribution of resident factors (i.e., age, gender, internship, and pulmonology rotation status). No residents who interviewed the VH had completed an internal medicine internship, but other than that one exception, both groups of residents represented all types of internships well. A relative paucity of residents completed a pulmonology rotation during their internship. Interviews with the VH were more frequently associated with suspected OSA compared with SP interviews (odds ratio: 17.6 [2.9–107, 95% CI], P < .0006 based on univariate analysis).
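The univariate comparison can be rechecked from the reported counts alone. The sketch below assumes a Wald (log odds ratio) confidence interval and an uncorrected Pearson chi-square test with 1 degree of freedom; the report does not state which interval or chi-square variant JMP or SAS produced, so these are illustrative choices and the function names are hypothetical.

```python
import math

# Counts reported in the Results: residents who suspected OSA / did not.
vh_yes, vh_no = 11, 13 - 11   # virtual human group (11 of 13)
sp_yes, sp_no = 5, 21 - 5     # standardized patient group (5 of 21)

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a Wald 95% CI (assumed method)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

def pearson_chi2(a, b, c, d):
    """Pearson chi-square without continuity correction, and its P value for 1 df."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 df
    return chi2, p

or_, lo, hi = odds_ratio_wald(vh_yes, vh_no, sp_yes, sp_no)
chi2, p = pearson_chi2(vh_yes, vh_no, sp_yes, sp_no)
print(f"OR = {or_:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
print(f"chi-square = {chi2:.2f}, P = {p:.5f}")
```

Under these assumptions the odds ratio is (11 × 16) / (2 × 5) = 17.6, and the interval and P value come out close to the reported values. The very wide interval reflects the small cell counts, in particular the two VH residents who did not suspect OSA.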

Table 1:
Demographics of Residents in 2008–2009 University of Florida College of Medicine Study to Determine Whether Residents Would More Frequently Diagnose Obstructive Sleep Apnea in Human Standardized Patients (SPs) or in a Virtual Human (VH) Patient

In the nominal logistic regression model, only the main effect of interviewing the VH compared with interviewing an SP was a statistically significant factor in the diagnosis of OSA (P < .005; Table 2); resident factors, such as age, gender, prior pulmonology rotation (only three residents had previously completed a pulmonology rotation), and type of internship were not statistically significant influences in this model. After analyzing the results of the residents who interviewed the three different SPs, we detected no differences in the frequency of suspected OSA among the three SPs (P = .08, P < .53).

Table 2:
Logistic Regression Model for Diagnosis of Obstructive Sleep Apnea Among Residents Interviewing Either a Human Standardized Patient or a Virtual Human Patient, 2008–2009

We did not complete a power analysis because we detected a significant difference in the population studied.


According to some estimates, OSA is more prevalent in the surgical population than in the general population.19–23 Further, perioperative complications are more likely to occur in patients with OSA.24–26 Whenever possible, anesthesiologists must first detect OSA preoperatively in order to craft a perioperative care plan, with the hope of reducing the likelihood of complications in these patients.27 To that end, anesthesiologists-in-training need to learn to recognize OSA in patients.

Ideally, medical trainees should always have the opportunity to perform clinical skills examinations in patients who demonstrate the disease states that clinical faculty wish them to observe. Yet, access to SPs with the diseases and abnormal findings to which students must be exposed is, at best, extremely difficult. Even though OSA is a common disorder, at our institution we do not have access to SPs with diagnosed OSA. Thus, the VH works to our advantage. We can assign the VH any disease—and design associated physical findings—and then the VH can reliably portray the disease across multiple repetitions. In this case, we designed the VH to appear morbidly obese, with an increased neck circumference and an abnormal airway anatomy, because these signs are very strongly associated with OSA28–30 and cannot be consistently displayed by SPs. Certainly, strengths of the VH are not only its consistent depiction of physical abnormalities but also its consistent delivery of medical history. The SPs did not present the exact same history to all the residents (e.g., one of the three SPs told the resident participants, “I do not sleep well,” whereas another SP said, “I snore,” and the third SP stated, “I have insomnia”). Although our analysis of the data revealed no differences in the frequency of suspected OSA among the three different SPs (which suggests that the differences in the way the SPs delivered the history and their different appearances did not affect the residents' performances), the SPs available at our institution are healthy adult patients who cannot demonstrate the physical indications of OSA. Unlike the residents who conducted the SP interviews, all 13 residents who interviewed the VH interviewed the exact same “patient,” heard the same responses, and encountered the same physical OSA attributes. Correspondingly, these residents suspected OSA much more often. 
Our data imply that, without evident physical signs of OSA, junior anesthesiology residents are much less likely even to assess for (much less diagnose) OSA in an SP despite the SP's scripted history of OSA symptoms. We believe the residents did not suspect OSA when interviewing the SP because the SP did not consistently present the history and could not depict the characteristic abnormal physical findings associated with OSA.

We acknowledge that our study had limitations. Although the study population consisted of two consecutive classes of junior anesthesiology residents at a similar point in their training in the same residency program, the medical knowledge and experience of the two groups may not have been the same. Still, the logistic regression analysis suggests that none of the available measures of resident experience or knowledge, as assessed by participant age, gender, internship type, and prior pulmonology rotations, influenced their performance. The pool of available junior anesthesia residents was relatively small, which prevented us from conducting any interviews with negative controls (i.e., SPs or a VH without a history or without physical findings suggestive of OSA). The VH is a two-dimensional representation of a human and, as such, can display only visual and auditory physical findings at present; however, as in other SP encounters, the VH can be paired with mannequin simulators for discerning tactile abnormalities.18


Our results demonstrate that the VH, performing at least as well as SPs, provides a unique opportunity for training and assessment within the realm of SP encounters. Significantly more residents suspected OSA when the VH presented the physical attributes of the syndrome and delivered a consistent history. This difference is clinically relevant. The availability of SPs who have particular diseases and can demonstrate or display the associated physical findings is extremely limited, yet the VH offers the ability to consistently portray, across countless repetitions, the physical abnormalities of an almost limitless repository of diseases.




This work was funded solely by departmental and institutional sources.

Other disclosures:


Ethical approval:

The University of Florida social and behavioral institutional review board approved this study.

Previous presentations:

Some of the findings in this report were presented at the Southern Group on Educational Affairs meeting in Oklahoma City, Oklahoma, April 15, 2010.


1Medical Council of Canada. Qualifying Examination Part II. Accessed November 4, 2010.
2Educational Commission for Foreign Medical Graduates. ECFMG Clinical Skills Assessment (CSA)—General Information.∼dmd42/csa/index.html. Accessed November 4, 2010.
3United States Medical Licensing Exam. Step 2 Clinical Skills. Accessed November 4, 2010.
4Sutnick AI, Stillman PL, Norcini JJ, et al. ECFMG assessment of clinical competence of graduates of foreign medical schools. Educational Commission for Foreign Medical Graduates. JAMA. 1993;270:1041–1045.
5Ben-David MF, Klass DJ, Boulet J, et al. The performance of foreign medical graduates on the National Board of Medical Examiners (NBME) standardized patient examination prototype: A collaborative study of the NBME and the Educational Commission for Foreign Medical Graduates (ECFMG). Med Educ. 1999;33:439–446.
6Van Zanten M, Boulet JR, McKinley D. Using standardized patients to assess the interpersonal skills of physicians: Six years' experience with a high-stakes certification examination. Health Commun. 2007;22:195–205.
7Van Zanten M, Boulet JR, McKinley DW. Correlates of performance of the ECFMG Clinical Skills Assessment: Influences of candidate characteristics on performance. Acad Med. 2003;78(10 suppl):S72–S74. Accessed November 4, 2010.
8Colliver JA, Swartz MH, Robbs RS. The effect of examinee and patient ethnicity in clinical-skills assessment with standardized patients. Adv Health Sci Educ Theory Pract. 2001;6:5–13.
9Van Zanten M, Boulet JR, McKinley DW. The influence of ethnicity on patient satisfaction in a standardized patient assessment. Acad Med. 2004;79(10 suppl):S15–S17. Accessed November 4, 2010.
10De Champlain AF, Schoeneberger J, Boulet JR. Assessing the impact of examinee and standardized patient ethnicity on test scores in a large-scale clinical skills examination: Gathering evidence for the consequential aspect of validity. Acad Med. 2004;79(10 suppl):S12–S14. Accessed November 4, 2010.
11Fernandez A, Wang F, Braveman M, Finkas LK, Hauer KE. Impact of student ethnicity and primary childhood language on communication skill assessment in a clinical performance examination. J Gen Intern Med. 2007;22:1155–1160.
12Colliver JA, Vu NV, Marcy ML, Travis TA, Robbs RS. Effects of examinee gender, standardized-patient gender, and their interaction on standardized patients' ratings of examinees' interpersonal and communication skills. Acad Med. 1993;68:153–157. Accessed November 4, 2010.
13Borkhoff CM, Hawker GA, Kreder HJ, Glazier RH, Mahomed NN, Wright JG. Patients' gender affected physicians' clinical decisions when presented with standardized patients but not for matching paper patients. J Clin Epidemiol. 2009;62:527–541.
14Lewis R, Lamdan RM, Wald D, Curtis M. Gender bias in the diagnosis of a geriatric standardized patient: A potential confounding variable. Acad Psychiatry. 2006;30:392–396.
15Sun B, McKenzie FD, Garcia HM, Hubbard TW, Ullian JA, Gliva GA. Medical student evaluation using augmented standardized patients: New developments and results. Stud Health Technol Inform. 2007;125:454–456.
16Sun B, McKenzie FD. Medical student evaluation using virtual patient echocardiogram (VPE) for augmented standardized patients. Stud Health Technol Inform. 2008;132:508–510.
17Stevens A, Hernandez J, Johnsen K, et al. The use of virtual patients to teach medical students history taking and communication skills. Am J Surg. 2006;191:806–811.
18Deladisma AM, Gupta M, Kotranza A, et al. A pilot study to integrate an immersive virtual patient with a breast complaint and breast examination simulator into a surgery clerkship. Am J Surg. 2009;197:102–106.
19Frey WC, Pilcher J. Obstructive sleep-related breathing disorders in patients evaluated for bariatric surgery. Obes Surg. 2003;13:676–683.
20O'Keeffe T, Patterson EJ. Evidence supporting routine polysomnography before bariatric surgery. Obes Surg. 2004;14:23–26.
21Fidan H, Fidan F, Unlu M, Ela Y, Ibis A, Tetik L. Prevalence of sleep apnoea in patients undergoing operation. Sleep Breath. 2006;10:161–165.
22Chung F, Ward B, Ho J, Yuan H, Kayumov L, Shapiro C. Preoperative identification of sleep apnea risk in elective surgical patients, using the Berlin questionnaire. J Clin Anesth. 2007;19:130–134.
23Chung F, Yegneswaran B, Liao P, et al. Validation of the Berlin questionnaire and American Society of Anesthesiologists checklist as screening tools for obstructive sleep apnea in surgical patients. Anesthesiology. 2008;108:822–830.
24Liao P, Yegneswaran B, Vairavanathan S, Zilberman P, Chung F. Postoperative complications in patients with obstructive sleep apnea: A retrospective matched cohort study. Can J Anaesth. 2009;56:819–828.
25Gupta RM, Parvizi J, Hanssen AD, Gay PC. Postoperative complications in patients with obstructive sleep apnea syndrome undergoing hip or knee replacement: A case-control study. Mayo Clin Proc. 2001;76:897–905.
26Siyam MA, Benhamou D. Difficult endotracheal intubation in patients with sleep apnea syndrome. Anesth Analg. 2002;95:1098–1102.
27Gross JB, Bachenberg KL, Benumof JL, et al. Practice guidelines for the perioperative management of patients with obstructive sleep apnea: A report by the American Society of Anesthesiologists Task Force on Perioperative Management of patients with obstructive sleep apnea. Anesthesiology. 2006;104:1081–1093.
28Sleep-related breathing disorders in adults: Recommendations for syndrome definition and measurement techniques in clinical research. The Report of an American Academy of Sleep Medicine Task Force. Sleep. 1999;22:667–689.
29Young T, Palta M, Dempsey J, Skatrud J, Weber S, Badr S. The occurrence of sleep-disordered breathing among middle-aged adults. N Engl J Med. 1993;328:1230–1235.
30Young T, Skatrud J, Peppard PE. Risk factors for obstructive sleep apnea in adults. JAMA. 2004;291:2013–2016.

* Click “agree” and “continue” on the first page and “continue” again on the second.

© 2011 Association of American Medical Colleges