The Accreditation Council for Graduate Medical Education (ACGME) endorsed six general competencies for residents at its February 1999 meeting. These competencies were identified as part of the ACGME Outcomes Project,1 a step in the direction of evaluating resident education based upon outcomes of the residents’ training. New program requirements, which became effective July 1, 2001, were then issued by the ACGME’s Residency Review Committee in Obstetrics and Gynecology.2 All residency programs must demonstrate that residents are developing competencies in patient care, medical knowledge, practice-based learning and improvement, interpersonal and communication skills, professionalism, and systems-based practice. Residency programs must develop curricula and provide educational experiences to develop these competencies. Programs must have valid and reliable tools and evaluation instruments to assess the six competencies.
There are a number of methods and tools to assess each of these required competencies. A single evaluation tool or instrument may not be equally valid or reliable across all of the competencies. The ACGME has developed the Toolbox of Assessment Methods,3 which suggests methods for evaluating the six required competencies: record review, chart stimulated recall, checklists, global rating, standardized patient (SP), Objective Structured Clinical Examination (OSCE), simulations and models, 360-degree global rating, portfolios, multiple-choice question examinations, oral examinations, procedure and case logs, and patient surveys.
The 360-degree global rating evaluation consists of measurement tools completed by multiple categories of people who have had the opportunity to interact with a resident and observe the resident performing a skill. For evaluating interpersonal and communication skills, the Toolbox suggests using the 360-degree evaluation in addition to surveying patients, and conducting OSCEs with SPs. To demonstrate interpersonal and communication skills, residents must be able to effectively exchange information and team with patients, patients’ families, and professional associates. Residents are expected to create and sustain a therapeutic and ethically sound relationship with patients; use effective listening skills and elicit and provide information; use effective nonverbal, questioning, and writing skills; and work effectively with others as a member or leader of a health care team or other professional group.
Developing good interpersonal and communication skills is essential to the effective day-to-day functioning of all human beings, but is of critical value for a physician. Only through good communication and interpersonal skills can a physician effectively demonstrate the acquisition and appropriate use of the other competencies, such as patient care, medical knowledge, systems-based practice, and professionalism.
To our knowledge, there have been no published reports of the use of the 360-degree evaluation instrument in graduate medical education in obstetrics and gynecology. In our study, we sought to test the reliability of the 360-degree evaluation instrument in assessing residents’ interpersonal and communication skills.
The obstetrics and gynecology residency program at Monmouth Medical Center in Long Branch, New Jersey, has a complement of eight residents (two residents each in years 1–4). To obtain the 360-degree evaluation data, we distributed questionnaires to the residents and the evaluators in March and April of 2002. We selected this period because all eight residents were on active clinical duty. Additionally, the evaluators had observed the first-year residents for almost 1 year and the other six residents for the duration of their residency at our medical center. The nursing and faculty staff had been with us for several years; therefore each evaluator had the opportunity to interact with each resident over approximately 1 year (for first-year residents) and longer for the more senior residents. Each evaluator returned the completed questionnaire within two weeks.
The evaluators were 25 nurses (staff from labor floor, ambulatory care clinic, antenatal testing unit), 16 faculty members, 12 allied health professional staff (patient care assistants, registrars, receptionists), 12 medical students, ten patients, and seven fellow residents. Each resident also completed a self-evaluation.
Each questionnaire had ten statements designed to evaluate the residents’ interpersonal and communication skills, scored on a scale of 1–5. The highest possible score was 50 (see Appendix).
On this scale, any particular behavior or action was graded on an ascending scale of frequency: 1 = “never” and 5 = “always.” The best score was 5. Questionnaires included “negative” statements, such as “Resident interrupts me during conversations.” In order to keep the scores on the same ascending interpretation scale, these negative statements were scored in the reverse ranking order. A score of five meant “never” and a score of one meant “always.” Thus, despite the negative statements, the best score obtainable remained 50.
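The reverse-scoring rule described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the procedure used in the study: the function name `score_questionnaire` and the item positions passed as `negative_items` are hypothetical, because the text does not say which of the ten statements were negative.

```python
# Illustrative sketch only; the names and the choice of negative items are assumptions.
SCALE_MIN, SCALE_MAX = 1, 5   # 1 = "never", 5 = "always"
N_STATEMENTS = 10


def score_questionnaire(responses, negative_items):
    """Return the total score (10 to 50) for one completed questionnaire.

    responses      -- list of ten raw frequency ratings (1 = never ... 5 = always)
    negative_items -- set of zero-based positions of negatively worded statements
                      (hypothetical; the source does not list them here)
    """
    if len(responses) != N_STATEMENTS:
        raise ValueError("expected exactly ten responses")
    total = 0
    for position, raw in enumerate(responses):
        if not SCALE_MIN <= raw <= SCALE_MAX:
            raise ValueError("each response must be between 1 and 5")
        if position in negative_items:
            # Reverse scoring: "never" (1) earns 5 points, "always" (5) earns 1.
            total += SCALE_MIN + SCALE_MAX - raw
        else:
            total += raw
    return total
```

For example, a resident who "always" shows every positive behavior and "never" shows the negative ones receives the maximum of 50, whichever statements are negative.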
We intentionally did not include the option of “not applicable” or “unable to evaluate” to avoid diluting the data because this option could have been selected frequently by evaluators unwilling to commit themselves.
It is difficult to design instruments for objective evaluation of interpersonal and communication skills and, to our knowledge, no single standard instrument exists. Our questionnaires were designed to evaluate what we thought were the components of interpersonal and communication skills for residents’ interactions with faculty, nurses, patients, fellow residents, ancillary staff, and medical students. We thought that the important components of such interaction were listening skills, the ability to communicate through the spoken word, and the ability to demonstrate, through words and actions, basic human courtesy and respect for other members of the health team.
Validity of each questionnaire was confirmed by a clinical psychologist as well as the Patient Satisfaction Department of the medical center before we began collecting data (validity by “expert opinion”). We asked the psychologist and the Patient Satisfaction Department to evaluate whether each statement helped to measure different components of interpersonal and communication skills (content validity),4 such as listening skills (e.g., listen without interrupting the speaker), ability to communicate through the spoken word (e.g., relay information in a clear and comprehensive manner to faculty, nurses, fellow residents, ancillary staff and medical students), and ability to demonstrate humanistic qualities (e.g., treating coworkers such as nurses and other residents with human courtesy and respect). The scoring scale aimed to quantify these behaviors (face validity).4 To ensure all data were maintained strictly confidential, each resident was assigned a code that was used in all result tables and data analysis.
Completed questionnaires were collected and collated by category. Coded data were entered into an Excel spreadsheet. The total score given by each evaluator was calculated for each resident. The mean score given by each category of evaluator was calculated for each resident. Data were analyzed with SPSS statistical software (SPSS, Inc., Chicago, IL). Rank order of the residents was calculated from the mean of total scores given by each category of evaluator(s).
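The aggregation steps in this paragraph (total score per evaluator, mean score per evaluator category, rank order per category) can be sketched as follows. This is a minimal illustration in Python rather than the Excel and SPSS workflow actually used. The record layout, the function names, and the handling of ties (tied residents share the average of their rank positions) are assumptions; the source does not state how ties were treated.

```python
from collections import defaultdict
from statistics import mean


def mean_scores_by_category(records):
    """Average the evaluators' total scores per (category, resident).

    records -- iterable of (resident_code, category, total_score) tuples,
               one per completed questionnaire (hypothetical layout).
    Returns {category: {resident_code: mean_total_score}}.
    """
    grouped = defaultdict(list)
    for resident, category, total in records:
        grouped[(category, resident)].append(total)
    table = defaultdict(dict)
    for (category, resident), totals in grouped.items():
        table[category][resident] = mean(totals)
    return dict(table)


def rank_residents(mean_by_resident):
    """Rank residents within one category; rank 1 is the highest mean score.

    Tied residents share the average of their positions (an assumption).
    """
    ordered = sorted(mean_by_resident, key=mean_by_resident.get, reverse=True)
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        # Extend the run while the next resident has the same mean score.
        while (end + 1 < len(ordered)
               and mean_by_resident[ordered[end + 1]] == mean_by_resident[ordered[start]]):
            end += 1
        shared_rank = (start + end) / 2 + 1
        for resident in ordered[start:end + 1]:
            ranks[resident] = shared_rank
        start = end + 1
    return ranks
```

Calling `rank_residents` on each category's entry in the table returned by `mean_scores_by_category` gives one rank per resident per category, which can then be averaged across categories to obtain a mean rank.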
As in previous literature comparing residents’ evaluations by various professionals,5 Shrout-Fleiss (model 2)6 intraclass correlation coefficients were calculated as a measure of reliability of ratings within each group of evaluators. The evaluators (faculty, colleagues, etc.) were considered to be a random sample of a larger population of all possible raters (i.e., as “random effects”).7 This measure calculates the consistency of scores among different evaluators in an evaluator category. The higher the consistency, the higher would be the reliability of the scores. Reliability/reproducibility among scores given by different categories of evaluators was calculated by deriving the Pearson correlation coefficient. A p value of < .05 (two tailed) was considered significant for the correlation coefficient.
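For readers unfamiliar with these statistics, the two reliability measures can be sketched in plain Python. This is an illustration, not the SPSS analysis performed in the study. The text does not specify whether the single-rater or the average-rater form of the Shrout-Fleiss model 2 coefficient was reported, so both are shown; all function names are hypothetical. The input is assumed to be a complete residents-by-evaluators matrix of total scores for one evaluator category, with every evaluator rating every resident.

```python
from math import sqrt


def _mean_squares(ratings):
    """Two-way ANOVA mean squares for a targets-by-raters matrix."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 rows and 2 columns")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_targets - ss_raters
    bms = ss_targets / (n - 1)              # between-targets (residents)
    jms = ss_raters / (k - 1)               # between-raters (evaluators)
    ems = ss_error / ((n - 1) * (k - 1))    # residual
    return n, k, bms, jms, ems


def icc2_single(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, single rater."""
    n, k, bms, jms, ems = _mean_squares(ratings)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def icc2_average(ratings):
    """Shrout-Fleiss ICC(2,k): two-way random effects, mean of k raters."""
    n, k, bms, jms, ems = _mean_squares(ratings)
    return (bms - ems) / (bms + (jms - ems) / n)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 values each")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Here `pearson_r` would be applied to two lists of eight values, the mean scores given to each resident by two evaluator categories. The sketch does not compute the two-tailed p value; a statistics package (for example `scipy.stats.pearsonr`) would be needed for that.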
The eight residents completed a self-evaluation questionnaire and a questionnaire for their seven colleagues; 25 nurses, 16 faculty members, 12 ancillary staff, 12 medical students, and ten patients completed questionnaires for each resident. The analyses here exclude data from the residents’ self-evaluations.
Table 1 shows the mean scores for each resident for each category of evaluator, with the intraclass correlation coefficient, which indicates the reliability of the score within each class of evaluators. Intraclass correlation coefficients ranged from .54 to .85.
Table 2 shows the residents’ rank order derived from collating all the categories of evaluators but excluding the self-evaluation. The highest-ranked resident overall (resident 1) ranked high among five categories of evaluators, and the lowest-ranked resident (resident 8) ranked low with most evaluators. The mean rank for each resident reflects the rank awarded among all the categories of evaluators, suggesting that even across different categories of evaluators each resident was graded similarly, high or low. This finding also suggests good correlation among the various categories of evaluators. Of note, the rank order among colleagues as evaluators is markedly different from that of the other categories, suggesting that residents graded their colleagues differently from all other categories of evaluators.
Table 3 shows the significant correlation between the faculty and ancillary staff (p = .002). There was an interesting negative correlation between medical students and colleagues (p = .016). The scores from colleagues correlated negatively with all the other categories of evaluators, indicating “different” evaluation criteria by colleagues.
The ACGME requires residency programs to train residents in six competencies and to develop evaluation methods for assessing these competencies. Interpersonal and communication skills, one of the required competencies, are essential to demonstrating the development of the other competencies such as professionalism, patient care, and medical knowledge. Systems-based practice, another competency, also requires excellent communication and interpersonal skills to use the “system.”
The 360-degree evaluation has been recognized in the ACGME’s Toolbox of Assessment Methods as second only to surveying patients (SPs) during an OSCE. The “best” method as recommended by the Toolbox includes only one class of evaluator (i.e., patients). It is possible that residents interact differently with patients than they do with colleagues, nurses, faculty, ancillary staff, and medical students. In this respect, the 360-degree evaluation tool would be a better assessment method because it would include evaluators representative of the full spectrum of people with whom clinicians must interact.
Nurses from labor and delivery as well as clinic nurses at our medical center were enthusiastic when informed of the purpose of our study. They were even more enthusiastic when informed that the data were completely confidential.
Medical students, ancillary staff, faculty, and patients were also eager participants, especially with the knowledge that the data were to be maintained in strict confidence.
The residents themselves showed mixed reactions to the proposal for the study. Some were concerned about evaluating their fellow residents and some voiced concern about evaluating themselves. However, on being assured that the data and the analysis would be completely confidential and secure, residents promptly completed their questionnaires. All categories of evaluators returned completed questionnaires promptly, indicating their support for the study.
Analysis of the data showed excellent correlation among the different evaluators within each category (nurses, faculty, ancillary staff, patients, and medical students), with high intraclass correlation coefficients ranging from .54 to .85.
There was general agreement among different categories of evaluators for each resident; i.e., a resident who ranked high with nurses also ranked high with other categories and a resident who ranked low with the nurses ranked low with other categories.
The general agreement among categories did not hold true for residents evaluating themselves. The junior residents graded themselves higher, while the more senior residents rated themselves average or low. It is possible that as residents grow in experience they become more impartial or unbiased in their self-evaluation. It is also possible that experienced residents expect more of themselves and, therefore, grade themselves lower. We did not include the residents’ self-evaluation data in our final analysis because including them could have skewed the results. Additionally, Gordon,8 Kruger and Dunning,9 and Tousignant and DesMarchais10 have shown that self-evaluations are not valid.
Residents’ evaluation of colleagues did not show the same trend as did the other categories of evaluators. The rank order by colleagues was quite different from that of other evaluators. This may indicate different “dynamics” regarding interpersonal and communication skills among residents as a group. The skills required to obtain a high grade from nurses, faculty, medical students, ancillary staff, and even patients may differ from those required to interact with their fellow residents. Our findings in this regard are in contrast to reports by DiMatteo and DiNicola11 and Risucci et al.12 These investigators reported positive correlation among attending and peer ratings. These studies were not conducted with obstetrics and gynecology residency programs.
Our study had some limitations. First, we conducted it at a single institution with a small number of residents. However, all eight residents in our program participated; thus, potential “selection” bias did not exist.
Second, this was a “one-time” data collection set. We do not know if all the categories of evaluators would rate the residents the same way if the evaluators were asked to respond to statements in the questionnaire at another time, or if statements were repeated more than twice. However, since we collected the data during March and April, the evaluators had interacted with each resident for at least ten months (first-year residents), and longer with the more senior residents (two to four years), thus reducing bias. The faculty and staff at our program have been stable over a number of years, thus potentially reducing bias.
There was significant time and effort involved in distributing, collecting and ensuring confidentiality of the data. It was possible in our small institution to maintain complete confidentiality because data-sheets were handed out and collected by the principal investigator. In larger residency programs, maintaining confidentiality may become a significant concern. Larger residency programs may find the logistics of distribution, collection, analysis, and ensuring confidentiality of data become a real barrier to conducting such a study. It may be possible to use electronic data collection. However, not all categories of evaluators may have access to computers, especially patients, thus limiting the feasibility.
Our study had some unique features. Ours would be the first reported study using the 360-degree tool to assess interpersonal and communication skills in obstetrics and gynecology, using almost all categories of individuals who interact with residents.
Our study showed that in a stable institution with a relatively small number of residents, it is feasible to conduct the 360-degree evaluation of interpersonal and communication skills. Although it does require significant time and effort, a dedicated investigator can conduct the study. Our study shows some interesting results, the underlying causes of which are open to interpretation, and serves as a starting point for further studies in this direction.
It would be interesting to see the results if our study were continued through the transition of a group of residents from their entry into the program through graduation (four years), using the exact same statements on the questionnaire and categories of evaluators each year. We plan also to analyze the pattern of scores for each statement in an effort to determine which were most reliable across evaluator categories. This analysis would help to design a shorter questionnaire, which could be more feasible to administer. Each year, once the results of the study were collated, they would be used to provide formative feedback in a confidential manner to each resident and suggestions for improvement made. The effect of such feedback and suggestions may then reflect on the scores obtained the next year. In this way, a progressive improvement in interpersonal and communication skills could be encouraged and measured.