Yudkowsky, Rachel MD, MHPE; Downing, Steven M. PhD; Sandlow, Leslie J. MD
The Accreditation Council for Graduate Medical Education (ACGME)1 mandates that residency programs begin to develop valid and reliable assessments of six core competencies: medical knowledge, patient care, interpersonal and communication skills, professionalism, systems-based practice, and practice-based learning and improvement. The ACGME Website provides a toolbox of assessment methods and examples of their application, but specialty-specific application of the methods is left to the creative ingenuity of individual programs.
In this article we describe the development and validation of an institution-wide, cross-specialty assessment of communication and interpersonal skills. Program directors often lack the time, educational expertise, and funds to develop new assessment programs.2 The challenge is particularly acute because the ACGME recommends nontraditional assessment methods, such as standardized patients (SPs), for several of the competencies. To assist programs in meeting the ACGME requirements, the University of Illinois at Chicago College of Medicine (UIC-COM) embarked on an initiative to develop institution-based competency assessments for residency programs. In this collaborative effort, the department of medical education provided the educational expertise, program faculty provided specialty-specific content expertise, the UIC-COM's Clinical Performance Center (CPC) provided simulation resources and a testing facility, and sponsoring hospitals provided funding.
As suggested by the ACGME, we measured communication and interpersonal skills (CIS) through clinical simulations in which trainees interacted with multiple SPs under realistic conditions. This type of assessment is referred to as an objective structured clinical examination (OSCE)3; hence, we called our assessment the CIS-OSCE. The use of SPs to assess residents' communication skills has been described previously4–6; our innovation is in leveraging the expertise of an institutional center to develop assessments for multiple residency programs, adapting cases across programs to conserve department and residency program resources.
The CIS-OSCE was first developed for internal medicine (IM) and family medicine (FM) residents, then modified for surgery, pediatrics, neurology, and obstetrics–gynecology. In this article we report on the first administration of the CIS-OSCE to residents in the IM and FM programs at UIC in 2003. We also report comparative data from the surgery, pediatrics, neurology, and obstetrics–gynecology assessments, and we discuss the lessons learned to date.
Developing the CIS-OSCE
There is an extensive evidence base for the effective assessment of communication and interpersonal skills using SPs—lay persons who are trained to simulate a patient in a consistent, reliable manner.7,8 Studies focus on a variety of communication behaviors; we selected patient-centered communication for our conceptual framework, because this approach is effective in promoting the doctor–patient alliance, improving compliance, and increasing patient satisfaction.9 Because communication skills are case specific,10 residents were assessed on their ability to maintain a patient-centered approach across several different communication tasks. We selected existing cases from our own SP case library and from public-domain casebooks.11,12 We chose tasks for their salience to clinical practice, and tasks were designed to allow residents to demonstrate their skills across a range of patient ages, genders, and problems. Residency program faculty reviewed and modified the cases to ensure appropriateness for their second- and third-year residents.
A variety of instruments are used to assess communication and interpersonal skills. The American Board of Internal Medicine (ABIM) Patient Satisfaction Questionnaire13 (PSQ) measures patient satisfaction with patient-centered physician behaviors. We used an expanded 17-item version of the PSQ and added one global item asking SPs whether they would choose this resident as their personal physician. The result was an 18-item instrument to assess patient-centered communication, with all items scored on a 5-point Likert scale ranging from “strongly disagree” to “strongly agree” (Appendix).
We also generated a few essential content-specific items for each scenario. For example, we added “resident elicited the history of abuse” to the domestic violence case assessment, and “resident reviewed risks of the procedure” to the informed consent case assessment. These dichotomously scored items were not included in residents' CIS scores, because they did not involve generalizable patient-centered behaviors. At the completion of the exam, residents completed a survey indicating how much previous experience they had had with each task (never, 1–3, 4–6, 7–10, or more than 10 times). Group-level scores were reported to the program director for curriculum evaluation purposes.
To generate cases for additional specialty programs, faculty from each specialty modified the IM/FM presenting scenario and task content to ensure that these were relevant to their own residents' clinical experience. CIS items and task-specific items on the SP instruments stayed the same across specialties.
Financing the exam
The cost of developing and administering the CIS-OSCE was about $250 per resident. This cost covered CPC staff time to recruit and train SPs and to set up and administer the exam, SP time for training and for the assessment itself, and the cost of generating the report. Faculty time to develop the cases was donated in kind. Funding was provided by the affiliated teaching hospitals via the GME office. There was no direct charge to the residency programs.
Once the initial cases had been developed, the assessment was adapted for new programs at a cost saving of about $1,000 per program. The decreased cost was a result of significant reductions in faculty time needed to modify cases (rather than developing cases from scratch), in staff time to set up the exam, and in SP-training time.
Conducting the exam
Each of the six SP stations consisted of a 10-minute encounter with the SP followed by a five-minute postencounter interval, during which the SP completed the CIS rating scale and case-specific item checklist. After the postencounter interval, the resident returned to the SP for five minutes of verbal feedback. This feedback focused only on the communication and interpersonal elements of the encounter, identifying effective and ineffective resident behaviors and the SPs' subjective reactions to them; no feedback was given regarding case-specific items or the clinical content of the case. At the conclusion of the six encounters, residents completed the survey, including items regarding demographic information and previous task experience.
The assessment was conducted in the UIC-COM CPC, an established SP facility. Six professional actors were trained to portray the patients, complete the rating scale, and provide feedback to the residents according to standard CPC protocols. All SPs were experienced veterans of several CPC-assessment programs. All encounters were videotaped, and each resident's six encounters were recorded on a single tape. At the 2003 administration of the CIS-OSCE, SP data entry was done on sheets that could later be scanned, and the resident chart note was done on paper. Since 2004, however, SP data entry has been done by computer, using WebSP,14 a data-management system for performance assessments.
The first two sessions of the exam (each with six residents) served to pilot the assessment and the cases. We conducted separate focus groups with all residents and SPs after each session, and we used their feedback to improve the cases and conduct of the exam.
Resident and program director reports were generated by the UIC-COM testing center in 2003 and by WebSP software since 2004. Residents received reports that included their case-level scores for the CIS scale and global item and their overall CIS score across cases. Program directors received reports that included the individual resident reports, the group-level CIS scale, global and case-specific checklist scores for each case and across cases, and a group-level item analysis showing the response distribution for each case-specific item and survey item. We also provided the program director with the residents' written chart notes for optional scoring. Program directors could also opt to obtain the residents' videotapes for a number of uses, including review by the resident with or without a faculty preceptor for additional feedback, especially for residents who needed some remediation. The videotapes could also be used as part of a resident's portfolio documenting his or her competency.
We conducted informal interviews with each of the program directors after the completion of the CIS-OSCE to obtain their feedback and to determine how they used the results of the assessment.
The 2003 administration of the CIS-OSCE to IM and FM residents served as the data source for all psychometric analyses unless otherwise specified. We did all analyses using SPSS version 11.5 (SPSS Inc., Chicago, IL).
Following recommendations by Downing15 based on the Standards for Educational and Psychological Testing,16 we looked for validity evidence in the content, internal structure, and response process of the exam, as well as from the relationship of CIS-OSCE scores to other variables.
We obtained validity evidence for the content of the CIS-OSCE by matching the communication tasks we assessed to the ACGME competency descriptions.1 We explored the internal structure of the CIS scale by factor analysis. Factors were extracted using principal component analysis with Varimax rotation and Kaiser normalization based on mean item rating across cases. Internal consistency reliability of the CIS scale was estimated with coefficient alpha. We estimated reliability of the composite exam score using generalizability analysis. We did not formally assess the response process in this study; however, high interrater reliability of SPs trained according to CPC protocols has been previously established.17 We explored the relationship of CIS scores to other variables by examining effects of gender and level of training on CIS scores.
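As an illustration of the internal consistency estimate mentioned above, the following minimal sketch computes coefficient alpha from a matrix of item ratings. It is not the SPSS procedure used in the study; the function name cronbach_alpha and the small ratings table are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Coefficient alpha for a list of rows (one row of item scores per examinee)."""
    n_items = len(ratings[0])
    if n_items < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two examinees")
    # Sample variance of each item (column) across examinees.
    item_vars = [variance([row[j] for row in ratings]) for j in range(n_items)]
    # Sample variance of each examinee's total score.
    total_var = variance([sum(row) for row in ratings])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point ratings: five examinees, four items (not study data).
demo_ratings = [
    [4, 4, 5, 4],
    [3, 3, 3, 2],
    [5, 4, 5, 5],
    [2, 3, 2, 2],
    [4, 5, 4, 4],
]
demo_alpha = cronbach_alpha(demo_ratings)
print(f"alpha = {demo_alpha:.2f}")
```

Alpha rises toward 1.0 as items rank examinees more consistently, which is why a scale whose items all tap patient-centered behavior would be expected to show high values.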
After the first two groups of residents completed the pilot exam, we explored the feasibility and acceptability of the assessment in focus groups with the SPs and residents. We obtained additional triangulation on acceptability and impact of the assessment from interviews with the program directors.
The University of Illinois IRB approved the study.
Seventy-nine IM and FM residents participated in the initial administration of the CIS-OSCE in 2003. Of these, 41 (52%) were second-year residents, and 38 (48%) were third-year residents. Of the 70 records with identified gender, 44 residents (56%) were male.
List 1 shows the ACGME competencies assessed by the CIS-OSCE. Although we had set out to assess only communication and interpersonal skills, components of patient care and professionalism were also embedded in the challenges presented to the residents.
Table 1 shows the results of the rotated factor analysis of the 18 CIS scale items, specifying two factors, with factor loadings below 0.5 suppressed. Factor one, accounting for 34% of the variance, is interpretable as a “communication” factor, and factor two (30% of the variance) seems to represent an “interpersonal skills” factor. In fact, all four factors with eigenvalues above 1.0 were easily interpretable, including warmth (accounting for 28% of the variance), shared decision making (21%), encouraging questions (16%), and reciprocal communication (not using jargon, letting patient tell their story; 10%), for a total of 76% of variance explained.
Table 2 presents global and scale scores for the 79 IM and FM residents, as well as case-level and exam-level reliability indices. The standard error of measurement (SEM) was 0.1 for the treatment-refusal case and 0.2 for all other cases. Internal consistency reliability was high for all cases, with median coefficient alpha of 0.91 (SD 0.02). Generalizability was 0.66 based on case scores and 0.72 based on global scores. Global scores were highly correlated with case scores, with correlations ranging from 0.82 to 0.91 (median correlation 0.90, SD 0.04).
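For readers unfamiliar with the SEM, the sketch below shows the conventional classical test theory relationship, SEM = SD × sqrt(1 − reliability). Whether the reported SEM values were derived in exactly this way is an assumption, and the score SD of 0.5 in the usage line is hypothetical; only the reliability of 0.91 echoes the median alpha reported above.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if score_sd < 0:
        raise ValueError("score_sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


# Hypothetical score SD of 0.5 scale points with reliability 0.91.
example_sem = standard_error_of_measurement(0.5, 0.91)
print(f"SEM = {example_sem:.2f}")
```

A smaller SEM means that an observed case score is a tighter estimate of the resident's true score on that case.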
Resident gender and level of training
Female residents had higher scores than male residents. Mean global scores were 3.2 (SD 0.68) for men and 3.5 (0.65) for women (two-tailed t = −2.21, df = 68, P = 0.03, effect size d = 0.43). Mean CIS-scale scores were 3.9 (0.31) for men and 4.1 (0.30) for women (two-tailed t = −2.67, df = 68, P = 0.01, effect size d = 0.63). However, 9 of 79 residents did not provide gender data on the survey. After a worst-case sensitivity analysis, the difference between male and female residents was no longer significant. The mean scores of second- and third-year residents were identical.
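The effect sizes and t statistics above can be approximated from summary statistics, as in the following sketch. It assumes a pooled-variance (Student) t test and Cohen's d with a pooled SD, which may not match the exact SPSS options used. The group sizes of 44 men and 26 women are assumptions inferred from the counts given earlier, and because the published means are rounded to one decimal place, the output will only roughly approximate the reported values.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (group 2 minus group 1)."""
    return (mean2 - mean1) / pooled_sd(sd1, n1, sd2, n2)


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t (group 1 minus group 2), df = n1 + n2 - 2."""
    standard_error = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error


# Global scores as published (rounded); group sizes are assumed, not reported directly.
d = cohens_d(3.2, 0.68, 44, 3.5, 0.65, 26)
t = t_statistic(3.2, 0.68, 44, 3.5, 0.65, 26)
print(f"d = {d:.2f}, t = {t:.2f}")
```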
Sixty-six of 68 IM residents (97%) discussed risk factors in the informed consent (HIV) case, but only 55 (81%) thoroughly explained the pros and cons of testing. In the elder-abuse case, only 50 of 63 residents (79%) elicited the history of elder abuse; when asked by the patient, 9 residents (14%) said they would confront the patient's son about the problem despite her request not to do so. One resident agreed to the patient's request not to make note of the abuse on her clinic chart.
Acceptability and feasibility
In focus groups after the first two pilot administrations of the CIS-OSCE, all residents (n = 6 for each of the two groups) felt that the SP portrayals were realistic, but some felt that there had not been enough time to play some of the scenarios through to conclusion. Residents particularly valued encounters in which they had little prior experience, such as elder abuse. All residents appreciated the verbal feedback from the SPs and felt that the assessment experience was helpful overall. In a similar debriefing of the six SPs, all the SPs felt that the encounters were realistic. Although several of the scenarios could not be played out to a conclusion, SPs felt adequately able to assess the residents' interpersonal and communication skills based on the 10-minute encounter. In the survey after the CIS-OSCE, 36 of 45 IM residents (78%) agreed or strongly agreed that the cases had allowed them to demonstrate their interpersonal and communication skills.
Experience across programs
Table 3 shows the generalizability of the CIS-OSCE for the five residency programs assessed to date: IM/FM, general surgery, pediatrics, obstetrics–gynecology, and neurology. Generalizability ranged from 0.57 to 0.82, with a median of 0.72 (SD 0.12). Results of the surgery CIS-OSCE have been previously published.18 Detailed results of the CIS-OSCE assessments of other specialties are pending.
Interviews with the program director “consumers” of the CIS-OSCE indicated that the reports generated by the CIS-OSCE were acceptable and useful for both resident assessment and program evaluation. Reports were used in diverse ways (List 2). Most program directors used the individual CIS-OSCE report as a focus of discussion in their semiannual resident feedback/evaluation meetings. In some programs, residents and preceptors reviewed videotapes of the encounters to facilitate individual remediation, whereas group debriefing of residents afforded remedial instruction of frequently missed content-specific and CIS items. Program directors reported that group-level reports helped them identify curricular gaps, such as lack of experience with elder abuse, sometimes resulting in educational interventions, the outcomes of which could be assessed in the following year's exam. All programs stated that they had used the CIS-OSCE as only one component of their formative assessment process.
This article describes the development and validation of an SP-based assessment of resident communication and interpersonal skills. The use of SPs to assess CIS at the resident level is not novel; our innovation is in leveraging the expertise of an institutional clinical performance center to develop assessments for multiple residency programs while conserving department and residency program resources. Accordingly, we have presented information about the CIS-OSCE at one particular program, as well as some comparative data across programs. On the basis of these data, we conclude that the institutional approach is both effective and efficient.
As for any assessment, validity questions are paramount. As recommended,16 we have described validity evidence based on the CIS-OSCE's content, internal structure, response process, relationship to other variables, and consequences of the exam. Thus, the content of the CIS-OSCE included a variety of common and clinically important communication challenges and a spread of patient ages, gender, and cultural backgrounds. Cases included in this assessment reflected components of three of the six ACGME competencies. The factor analysis of our ABIM-derived scale confirmed that the expanded item set represented both communication and interpersonal skills of examinees. We ensured the response process, including data entry, test security, and rater reliability, by following standard CPC protocols for patient training and quality assurance. These sources of evidence relate to all versions of the CIS-OSCE across residency programs.
The internal structure of the exam is reflected by moderate to high internal consistency at the case level (coefficient alpha). The variation in coefficient alpha across cases reflects the variable salience of different items within different case scenarios. Generalizability (a measure of the reproducibility of exam scores under similar conditions) was sufficient for a local formative exam. Interestingly, the generalizability of the exam varied across residency programs. This variation may be attributable to changes in the cases across programs, resulting in nonequivalent forms. Alternatively, a wider range of performance among the residents in a specific program would result in a higher generalizability, even though the exam is roughly comparable across programs. Thus, the low generalizability of the pediatrics exam compared with the IM exam may be attributable to the relatively extensive modifications made in the original IM cases to adapt them for pediatric residents. After these changes, perhaps the pediatrics cases were no longer comparable with the IM cases. Alternatively, the low generalizability may simply have reflected a more consistent (less variable) level of performance among the pediatric residents. The high generalizability of the surgery and neurology resident exams, despite the low number of residents, is probably attributable to the wide range of performance observed among those residents. The implication is that the generalizability (reliability) of the assessment in a given residency program will not necessarily be predicted by the results of the generalizability analysis for a different program. Whenever possible, reliability or generalizability should be established for each context or group of examinees individually, as we have done here.
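The dependence of generalizability on the spread of examinee performance can be illustrated with a minimal sketch. It assumes a simple one-facet (residents crossed with cases) design, in which the coefficient is the resident variance divided by the sum of the resident variance and the residual variance averaged over cases. The variance components below are hypothetical and are not estimates from any of the programs.

```python
def g_coefficient(person_var, residual_var, n_cases):
    """Generalizability coefficient for a one-facet persons-by-cases design."""
    if person_var < 0 or residual_var < 0 or n_cases < 1:
        raise ValueError("variances must be non-negative and n_cases >= 1")
    return person_var / (person_var + residual_var / n_cases)


# Same exam (six cases, same residual variance), two hypothetical cohorts.
homogeneous_cohort = g_coefficient(person_var=0.05, residual_var=0.30, n_cases=6)
heterogeneous_cohort = g_coefficient(person_var=0.20, residual_var=0.30, n_cases=6)
print(f"{homogeneous_cohort:.2f} vs {heterogeneous_cohort:.2f}")
```

With the measurement error held constant, the cohort with the wider range of true performance yields the higher coefficient, which is the pattern proposed above for the surgery and neurology residents.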
As for any performance assessment, the ideal way to increase the generalizability of the exam would be to increase the number of cases assessed. For this low-stakes formative assessment, we chose to dedicate time to feedback at the cost of limiting the number of encounters that could be accomplished in the time available. That decision could be reversed for a summative exam used for pass/fail decisions.
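The trade-off between the number of cases and generalizability can be sketched with the Spearman-Brown prophecy formula, which coincides with a decision-study projection under a simple one-facet design. This is an approximation under that assumption, not an analysis performed in the study. The function names are hypothetical, and the starting value of 0.66 for six cases is the case-score generalizability reported for the IM/FM exam.

```python
import math


def project_reliability(current, current_n_cases, new_n_cases):
    """Spearman-Brown projection of reliability to a different number of cases."""
    factor = new_n_cases / current_n_cases
    return factor * current / (1 + (factor - 1) * current)


def cases_needed(current, current_n_cases, target):
    """Smallest whole number of cases projected to reach the target reliability."""
    factor = target * (1 - current) / (current * (1 - target))
    return math.ceil(factor * current_n_cases)


# Projection from six cases at 0.66 to a twelve-case exam.
twelve_case_estimate = project_reliability(0.66, 6, 12)
# Number of comparable cases projected to reach 0.80.
n_for_080 = cases_needed(0.66, 6, 0.80)
print(f"{twelve_case_estimate:.2f}, {n_for_080} cases")
```

The projection assumes that any added cases behave like the existing ones, so it should be read as a planning estimate only.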
The relationship of the residents' performance to other variables was assessed by comparing scores across gender and level of training. Previous studies have consistently shown that women tend to achieve higher scores on SP-based assessments of communication and interpersonal skills,19 and that little or no change occurs across levels of training.20 Our findings of higher scores for female residents and no differences between second- and third-year residents are consistent with these studies, providing some reassurance that the CIS-OSCE is sensitive to real differences and is not spuriously sensitive to differences that do not exist. The relationship between prior experience and task comfort and performance on the CIS-OSCE has been reported elsewhere.21 Additional studies are in progress to relate CIS-OSCE scores to other measures of resident communication, such as global ratings of communication in the context of a mini-CEX.
Finally, we estimated the predictive validity of the exam score by correlating the case-level score with the score on the global item. For a resident physician soon to be in practice, the question of whether a patient would choose him or her as a physician is consequential indeed. The high correlation of the scale score with the global score, on both the case level and exam level, confirms findings that patient-centered communication is central to patient satisfaction with a physician. However, because the scale and global ratings were provided together, the global rating may have been contaminated by the scale ratings. Better predictive validity studies would correlate CIS-OSCE scores with the same global item asked of clinic patients the following year. Formal consequential validity studies await standard setting for the exam.
Van der Vleuten22 suggests that the acceptability, feasibility, and educational impact of an assessment are as important to evaluate as its reliability and validity. The weight of each of these characteristics depends on the specific context of the exam: “if any of these equals zero, the utility of the assessment is zero.” The CIS-OSCE was acceptable to residents, standardized patients, and program directors—three important stakeholder groups. Anecdotal evidence indicated that the assessment was also well regarded by the residency review committee as a component of a program's resident competency assessment program.
The CIS-OSCE has considerable potential for educational impact23 as an integral part of a residency program curriculum. Our residency programs are using the reports as a new focus for individual-, group-, and program-level discussion, diagnosis, and remediation. Residents' comments about the particular usefulness of scenarios with which they were not familiar (such as elder abuse) highlight the potential uses of the CIS-OSCE as an instructional intervention. We are planning additional follow-up studies to investigate the long-term impact on resident learning and curriculum.
The feasibility aspect of the CIS-OSCE is its particular strength. Many medical schools sponsoring multiple residency programs have an in-house SP facility. For these institutions, the CIS-OSCE demonstrates the feasibility and efficiency of developing and implementing an SP-based core competency assessment program on an institutional level. The original cases we used were almost all existing medical student SP cases. Adapting these cases to the residency level required minimal effort, as did adapting the cases across specialty programs. Because of the similarity of cases across specialties, once the SPs were trained to the original IM version of the case, little additional training was required when switching to surgery, obstetrics–gynecology, or neurology versions. There was even considerable crossover in many of the pediatric cases, which had been more substantively modified. The rating scales, report formats, and logistical arrangements were identical across versions, providing even more savings of time and effort.
Next steps and lessons learned
Program directors tell us that case-specific items are particularly useful for program evaluation purposes. Accordingly, over the years we have added case-specific items based on “best practices” for specific scenarios. For example, the informed-consent scenario checklist now includes items on discussing risks, benefits, alternatives, and other information that must be disclosed in any informed-consent situation. We have also added a “structure of communication” scale to assess communication behaviors that are not specifically patient centered, such as “moved from open-ended to closed questions” and “uses segment summaries to check accuracy of understanding.” Like the CIS scale, both the case-specific and structural communication items are constant across specialties.
Because UIC provides an institutional online cross-program core-competency curriculum in areas such as communication skills, confidentiality, and informed consent, the CIS-OSCE also provides a way to evaluate the effectiveness of components of this curriculum. Indeed, one of the things we have learned in the course of developing and administering this assessment is that one cannot disentangle CIS from other core competencies. Resident CIS must be assessed in the context of an encounter with another person in a health care–related situation. Inevitably, this situation will also contain aspects of professionalism, knowledge, and patient care. Rather than attempting to assess each competency in isolation, efforts should focus on developing a system that provides opportunities to assess competencies in various combinations and in multiple settings. The CIS-OSCE can be a valuable component of such a system.
We are currently developing a second set of six cases for the CIS-OSCE, including new challenges, such as obtaining advance directives, notification of patient death, and discussing a medical mistake. These “Form B” cases will be administered in alternate years, providing the opportunity to evaluate remediation at the individual resident level as well as group-level curricular interventions.
The CIS-OSCE has been piloted in only one institution, with a limited number of residency programs and residents. Interrater reliability was not established. Several residents did not provide gender data on the survey, and a worst-case sensitivity analysis did not result in a significant difference between genders. Therefore, the gender-difference finding will need to be replicated in future studies with full gender data. The acceptable level of generalizability found when assessing our residents may not be replicated in programs with more homogeneous levels of performance. Some specialties may require more substantive modification of the scenarios, with unpredictable effects on the psychometric properties of the assessment. The psychometric properties reported for the IM/FM exam are not necessarily transferable across residency programs. Reliability indices were acceptable for a local, formative exam, but would not be acceptable for a high-stakes summative exam. We have not set pass/fail standards for the exam, and we do not recommend that it be used summatively at this stage. However, as we accumulate more data about resident performance both within and across programs, standard-setting exercises with both faculty and SPs will provide guidelines to residency programs wishing to use the CIS-OSCE as a component of their resident-promotion decision process.
The institutional cost savings achieved by assessing CIS across specialties might not be realized in assessing skills more dependent on specialty-specific content knowledge (e.g., diagnostic reasoning or disease management). On the other hand, cross-specialty competencies such as professionalism, practice-based learning, and systems-based practice are good candidates for a centrally coordinated assessment program such as ours.
The ACGME lists SPs and OSCEs as the preferred methods for demonstrating resident competence in interpersonal and communication skills. Cases developed to elicit and assess communication and interpersonal skills in one program can be modified across specialties with minimal additional effort, conserving faculty time and development costs. Institutional-level competency assessments use resources efficiently to relieve individual programs of the need to “reinvent the wheel,” and provide program directors and residents with useful information for individual and programmatic review.