Clinical supervision is the mainstay of residency training. High-quality supervision assures high-quality clinical care, patient safety, and education. In addition, resident–faculty educational interactions provide role models for future clinicians and educators.1
Resident supervision has been demonstrated to be a 1-dimensional construct consisting of 9 facets or attributes, each rated by the frequency with which residents perceive it during clinical duties. Accordingly, faculty supervision is considered high quality if faculty members frequently or always:
- allow residents to participate in perianesthesia planning,
- are available for help/consultation,
- are present in the operating room during the critical phases of the anesthetic,
- provide timely and constructive feedback,
- stimulate practice and patient-based learning,
- demonstrate professional behaviors,
- teach and demand compliance with safety measures,
- demonstrate high interpersonal skills, and
- provide residents training opportunities and foster resident autonomy.2
Each of these 9 attributes can be assessed on a 4-point ordinal scale (never, rarely, frequently, and always).
The quality of supervision can be reliably measured as the average score on the above 9-item faculty supervision evaluation instrument. The high reliability of this instrument has been demonstrated by psychometric analyses under both classical test and generalizability theories.2 However, measures were contaminated by a significant halo effect, defined as a strong tendency of raters to “think of the person in general as rather good or rather inferior and to color the judgment of separate qualities by this general feeling.”3 The halo effect is a pervasive bias that affects virtually every human assessment of others. Its main indicators are high correlations across attribute scores and, in generalizability studies, a high percentage of score variance attributed to the rater–ratee dyadic interaction term. Despite this finding, the overall estimated reliability of measures produced by the instrument has been consistently high.2,4
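To illustrate the first of these indicators, the sketch below contrasts simulated ratings in which every attribute score echoes a rater’s single global impression (halo) with ratings in which attributes vary independently. The data, function name, and noise level are hypothetical illustrations, not the analyses of the cited studies:

```python
import numpy as np

def mean_interitem_correlation(scores):
    """Average off-diagonal Pearson correlation across attribute columns
    of an (n_raters x k_attributes) score matrix."""
    r = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    k = r.shape[0]
    return r[~np.eye(k, dtype=bool)].mean()

rng = np.random.default_rng(0)
g = rng.normal(size=200)                             # each rater's global impression
halo = g[:, None] + 0.3 * rng.normal(size=(200, 9))  # all 9 attributes echo it
independent = rng.normal(size=(200, 9))              # attributes rated independently

# halo-contaminated ratings show much higher inter-attribute correlations
print(mean_interitem_correlation(halo) > mean_interitem_correlation(independent))  # → True
```

With strong halo, the inter-attribute correlations approach 1 regardless of true differences among attributes, which is why high cross-attribute correlations flag halo error.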
To date, however, psychometric analyses have used individual faculty members as the object of measurement.2,4 In this issue of Anesthesia & Analgesia, de Oliveira Jr et al.5 describe the psychometric performance of the faculty supervision evaluation instrument when applied to measure supervision quality at the departmental level. The authors used a national sample of anesthesia residents training at several programs (departments) and taking a variety of rotations within each program. Applying classical test theory methods, the authors found a Cronbach alpha coefficient of approximately 0.90, indicating that measures obtained via the instrument are expected to be highly reproducible and dependable. In addition, the instrument showed high discriminant and convergent validity: the resulting measures diverge from constructs that should be unrelated to supervision and correlate with similar constructs related to the quality of clinical supervision. These findings significantly widen the scope of application of the faculty assessment instrument, allowing valid and reliable measurement at both the individual and departmental levels.
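For readers unfamiliar with the reliability statistic involved, Cronbach alpha can be computed directly from a residents-by-attributes score matrix. The ratings below are hypothetical 1-to-4 scores on the 9 attributes, invented for illustration only:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_raters x k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# hypothetical example: 6 residents rating the 9 attributes (1 = never ... 4 = always)
ratings = np.array([
    [4, 4, 3, 4, 4, 3, 4, 4, 4],
    [3, 3, 3, 3, 2, 3, 3, 3, 3],
    [4, 3, 4, 4, 4, 4, 3, 4, 4],
    [2, 2, 2, 3, 2, 2, 2, 2, 3],
    [4, 4, 4, 4, 3, 4, 4, 4, 4],
    [3, 3, 2, 3, 3, 3, 3, 3, 2],
])
print(round(cronbach_alpha(ratings), 2))  # → 0.96
```

Values near 0.90, as reported by de Oliveira Jr et al.,5 indicate that the summed or averaged score is highly internally consistent.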
This issue of Anesthesia & Analgesia also includes a study by Hindman et al.6 that investigated whether the average score of individual faculty evaluations could serve as a surrogate for the supervision score attributed to the department as a whole. To answer this question, the authors obtained the mean of all individual faculty evaluations from each participating resident during a 36-week period and compared it with the resident’s corresponding global evaluation of all faculty anesthesiologists with whom the resident worked during the same period, summarizing the results as global-to-individual supervision score ratios. The authors found that departmental (global) scores were significantly lower than the average individual faculty scores; the median global-to-individual score ratio was 86.2%. Global and average faculty scores were significantly, but not highly, correlated.
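The arithmetic behind this comparison is straightforward and can be sketched as follows. All scores below are hypothetical and chosen only to illustrate how the ratio is formed, not to reproduce the study’s data:

```python
import numpy as np

def global_to_individual_ratios(individual_scores, global_scores):
    """individual_scores: one array of individual faculty scores (1-4) per resident;
    global_scores: that resident's single global (departmental) score.
    Returns each resident's global score divided by his or her mean faculty score."""
    means = np.array([np.mean(s) for s in individual_scores])
    return np.asarray(global_scores) / means

# hypothetical data: 3 residents, each rating several faculty members individually
individual = [[3.8, 3.5, 3.9, 3.2], [3.6, 3.4, 3.7], [3.9, 3.8, 3.6, 3.5]]
dept = [3.1, 3.0, 3.4]  # each resident's global departmental score

ratios = global_to_individual_ratios(individual, dept)
print(round(float(np.median(ratios)) * 100, 1))  # median ratio as a percentage → 86.1
```

A median ratio below 100%, as in this toy example, reflects the study’s central finding: the global departmental evaluation is lower than the simple average of individual faculty evaluations.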
What do these findings mean? Perhaps, when providing a global evaluation of departmental supervisory capabilities, residents take into consideration both good and bad experiences with faculty supervision and intuitively average over their perceptions when scoring each attribute. An alternative interpretation is that the results stem from subtle changes in phrasing. In the original instrument,2 each item was directed to the “instructor” (singular) being evaluated, whereas in the present studies, each item was directed to “instructors” (plural and generalized) within the department. It has been demonstrated that even subtle item rephrasing may produce different attitudes toward item response patterns.7 Perhaps, by using the plural form, the investigators substantially shifted the focus of the items from the personal to the institutional or departmental level, eliciting responses that more accurately reflect differences among faculty members’ supervisory abilities and reducing the halo effect induced by resident–faculty personal interactions.
It is possible that the halo effect does not exist, or is undetectable, in departmental evaluations. Supporting this hypothesis is the study by Kihlberg et al.8 that examined the occurrence of halo error in departmental evaluations of clinical teaching provided by medical students. In this study, no evidence of a halo effect was found, suggesting that by depersonalizing evaluations (directing ratings to the whole department), this type of rater bias might be virtually eliminated.8
Hindman et al. may have introduced another metric for assessment of departmental supervision: the global-to-individual supervision score ratio. This ratio addresses the homogeneity of supervisory abilities within departments. A ratio close to 1 implies a consistent level of supervision ability among the faculty. The narrow confidence interval of the supervision score ratio suggests this may be a promising metric, but future research is required for prospective validation.
Faculty supervision of residents is a critical aspect of high-quality training. The 2 articles in this issue of Anesthesia & Analgesia offer useful guidance in quantitative assessment of faculty supervision. These metrics can help individual departments improve the quality of faculty supervision. If the metrics are published, they can help prospective applicants avoid programs with poor supervision, and presumably poor teaching, and direct their attention to training programs with the best supervision.
Name: Getúlio R. de Oliveira Filho, MD, PhD.
Contribution: The author wrote and revised the manuscript.
Attestation: Getúlio R. de Oliveira Filho approved the final manuscript.
This manuscript was handled by: Steven L. Shafer, MD.
1. AAMC policy guidance on graduate medical education: assuring quality patient care and quality education. Acad Med. 2003;78:112–6
2. de Oliveira Filho GR, Dal Mago AJ, Garcia JH, Goldschmidt R. An instrument designed for faculty supervision evaluation by anesthesia residents and its psychometric properties. Anesth Analg. 2008;107:1316–22
3. Thorndike EL. A constant error in psychological ratings. J Appl Psychol. 1920;4:25–9
4. Hindman BJ, Dexter F, Kreiter CD, Wachtel RE. Determinants, associations, and psychometric properties of resident assessments of anesthesiologist operating room supervision. Anesth Analg. 2013;116:1342–51
5. de Oliveira GS Jr, Dexter F, Bialek JM, McCarthy RJ. Reliability and validity of assessing subspecialty level of faculty anesthesiologists’ supervision of anesthesiology residents. Anesth Analg. 2015;120:209–13
6. Hindman BJ, Dexter F, Smith TC. Anesthesia resident’s global (departmental) evaluation of the faculty anesthesiologists’ supervision can be less than their average evaluations of individual anesthesiologists. Anesth Analg. 2015;120:204–8
7. Chan D, Schmitt N, DeShon RP, Clause CS, Delbridge K. Reactions to cognitive ability tests: the relationships between race, test performance, face validity perceptions, and test-taking motivation. J Appl Psychol. 1997;82:300–10
8. Kihlberg P, Perzon M, Gedeborg R, Blomqvist P, Johansson J. Uniform Evaluation of Clinical Teaching—an Instrument for Specific Feedback and Cross Comparison Between Departments. Högre utbildning. 2011;1:139–50