Economics, Education, and Policy: Research Report

Influence of Provider Type (Nurse Anesthetist or Resident Physician), Staff Assignments, and Other Covariates on Daily Evaluations of Anesthesiologists’ Quality of Supervision

Dexter, Franklin MD, PhD*; Ledolter, Johannes PhD; Smith, Thomas C. BS; Griffiths, David BS; Hindman, Bradley J. MD

doi: 10.1213/ANE.0000000000000345

Currently, at many U.S. healthcare facilities, supervision of anesthesiology residents and/or Certified Registered Nurse Anesthetists (CRNAs) is a major daily responsibility of anesthesiologists.1,2 The word “supervision” is not used here as a U.S. billing term. Rather, we use the term to include all clinical oversight functions directed toward assuring the quality of clinical care whenever the anesthesiologist is not the sole anesthesia care provider. Investigators have been learning how the quality of anesthesiologists’ supervision in operating rooms (ORs) can be evaluated, both individually3–5 and departmentally6–9 (Table 1). Supervision of residents is a required element of both postgraduate medical educationa and billing compliance.b

Table 1:
Previous Findings Regarding Supervision of Anesthesiology Residents and Certified Registered Nurse Anesthetists (CRNAs) by Anesthesiologists

Based on previous results (Table 1), our department implemented a process by which the supervision provided by each anesthesiologist working in OR(s) was evaluated each day by the anesthesiology resident(s) and CRNA(s) with whom they worked the previous day.4,5,8,9 The evaluations utilize the 9 questions developed by de Oliveira Filho et al.3 to assess anesthesiologist supervision (Table 2). Evaluations are used for faculty development, teaching evaluation, residency program review, and mandated Ongoing Professional Practice Evaluation. Using the same process, we also implemented daily evaluation of each anesthesiologist’s supervision by the CRNAs with whom they worked.5

Table 2:
de Oliveira Filho et al.’s Instrument3,5 for Measuring Faculty Anesthesiologists’ Supervision of Anesthesiology Residents During Clinical Operating Room Care

A concern raised by some anesthesiologists was that their supervision score(s) would be affected greatly by their daily clinical supervisory assignments. At our hospital, anesthesiologists’ daily OR assignments are made by a scheduling office, with practically no input by each individual anesthesiologist for his/her next workday. Whether an anesthesiologist is assigned frequently or infrequently to work with any specific resident and/or CRNA is a decision made by the scheduling office and the daily anesthesiologist in charge, not the individual anesthesiologist. On a given day, some anesthesiologists are assigned to supervise 1 resident physician (e.g., cardiac surgery cases), some are assigned to supervise 2 residents, and some to supervise 3 CRNAs (e.g., adjacent ORs), etc. These distributions of assignments differ markedly among anesthesiologists (e.g., χ² tests for randomness of assignments to 1 resident P < 0.00001 and to residents versus CRNAs P < 0.00001), although >85% have some days 1:1 and some 1:3. Some anesthesiologists were concerned that resident and/or CRNA evaluations of their supervision would be affected adversely: (1) on days they were assigned to staff multiple rooms or (2) when the acuity of patient care in other rooms was high. In addition, residents and CRNAs have different expectations for supervision (see Table 1.6 and 1.11).5 Consequently, some anesthesiologists were concerned that their supervision scores might be biased if they were assigned relatively often to supervise CRNAs rather than residents, or often with different CRNAs.
The aim of this study was to determine the relationships between daily anesthesiologist supervision scores and (1) the type of anesthesia provider providing the score (resident or CRNA); (2) various measures of daily anesthesiologist clinical responsibility (e.g., workload of the rater [resident or CRNA] and anesthesiologist); and (3) the number of occasions (individual days) the rater (resident or CRNA) and the anesthesiologist worked together. Knowing which of these covariates matter is important to guide valid monitoring of anesthesiologists.10

Methods


The University of Iowa IRB determined that the work was not human subjects research. Analyses were performed with de-identified data.

If resident evaluations of faculty supervision are used to make administrative decisions regarding individual anesthesiologists (e.g., to require supplemental supervision/educational training, for promotion, or for ongoing professional practice evaluation), the numeric value of the staff supervision score should be dependable and insensitive to covariates, the topic of this study. A psychometrically dependable score has a narrow confidence interval around the mean. Prior to this study, we knew that for an anesthesiologist’s mean supervision score to be dependable, scores were needed from at least 9 different resident raters (see Table 1.9).4 Accordingly, we wanted a data collection period sufficient for approximately 85% of the anesthesiologists to have received ratings from at least 9 different residents.4,5 At our hospital, the departmental median daily anesthesiologist supervision assignment was 2 (i.e., the median number of residents and/or CRNAs being supervised simultaneously was 2 ORs). Therefore, for each anesthesiologist, we expected approximately 1 evaluation of their supervision per type of rater (resident or CRNA) for each day they worked in the ORs. At our hospital, the lower 20th and 10th percentiles of anesthesiologists’ regularly scheduled OR clinical days were 1 per week and 1 per month, respectively. Some anesthesiologists have few (1 or 2) days per month in ORs, because their clinical responsibilities are primarily non-OR (e.g., critical care or pain medicine) and/or they have major administrative or research responsibilities. Thus, we had strong confidence that, with 6 months of data, we would obtain 9 different resident ratings for approximately 85% of the anesthesiologists.

During the 6-month period of this study, we obtained ratings from ≥9 different residents for 84% of the anesthesiologists (N = 56/67). The first date (starting the 6 months) for which daily evaluations of anesthesiologists were requested was July 1, 2013, which was a Monday. We picked July 1 to correspond to the start of a residency class (i.e., year of training). Evaluations were analyzed for all anesthesiologist-resident and all anesthesiologist-CRNA OR interactions occurring through Sunday, December 29, 2013.

Before starting the process of daily evaluations, all residents and CRNAs received an e-mail (June 2013) that included text in the body of the e-mail and PDF attachments with an explanation for why daily evaluations were being requested and instructions for completion of the evaluations. Anesthesiologists also were informed about the new daily evaluation process. Information emphasized that because of the immediacy of requests (the next calendar day), evaluations of supervision would be made while impressions were “fresh.”

Details of the departmental process for evaluation are in the Appendix.

Minute-by-minute overlaps of cases were combined daily among all cases to calculate and identify all residents and/or CRNAs who cared for patients with each anesthesiologist for at least 60 minutes. The day after working with an anesthesiologist in an OR, an e-mail with a secure hyperlink was sent automatically to each such rater (resident or CRNA). The 9-question evaluation instrument matched that of de Oliveira Filho et al.3 (Table 2).5 The evaluation could not be submitted unless all 9 questions were answered.
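The overlap step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (`cases`, with times as minutes since midnight), the function name `pairs_to_evaluate`, and the identifiers in the sample data are hypothetical and do not describe the department's actual software. Only the 60-minute threshold comes from the text.

```python
from collections import defaultdict

MIN_MINUTES = 60  # threshold stated in the text


def pairs_to_evaluate(cases, min_minutes=MIN_MINUTES):
    """Sum anesthesiologist x rater overlap minutes over one day's cases.

    Each case is a dict (hypothetical layout) holding two lists of
    (person, start_minute, end_minute) tuples for that case.
    """
    totals = defaultdict(int)
    for case in cases:
        for anes, a_start, a_end in case["anesthesiologists"]:
            for rater, r_start, r_end in case["raters"]:
                # Overlap of two intervals; non-positive means no shared time.
                overlap = min(a_end, r_end) - max(a_start, r_start)
                if overlap > 0:
                    totals[(anes, rater)] += overlap
    # Keep only pairs that reach the daily threshold.
    return {pair: m for pair, m in totals.items() if m >= min_minutes}


# Hypothetical day: neither case alone reaches 60 minutes for Res_A,
# but the daily total across cases does (45 + 40 = 85).
cases = [
    {"anesthesiologists": [("Dr_X", 420, 540)],
     "raters": [("Res_A", 420, 465)]},
    {"anesthesiologists": [("Dr_X", 540, 600)],
     "raters": [("Res_A", 560, 600), ("CRNA_B", 540, 560)]},
]
requests = pairs_to_evaluate(cases)
print(requests)
```

In this sketch, summing across cases before applying the threshold is what makes the pair with 85 total minutes eligible, while the pair with 20 minutes generates no request.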

Analyses were limited to data from the 32 OR main (tertiary) surgical suite, 8 OR ambulatory surgery center, 2 OR Urology surgical suite, operative obstetrics (see Appendix), pediatric cardiac catheterization laboratory, and electroconvulsive therapy. This resulted in analysis of data for both 97% of minutes and 97% of anesthetics. Of these 15,664 cases, there were 13,140 cases that contributed to an anesthesiologist’s time with a resident or CRNA.c

The supervision data were aligned with the department’s anesthesia billing data to obtain American Society of Anesthesiologists’ Relative Value Guide (ASA RVG) base units, modifier units (e.g., emergency condition), and non-time-based units for procedures (e.g., 3 units for arterial line placement in accordance with the ASA 2012 Relative Value Guide).11 When one anesthesiologist relieved another on a case, the base and non-time-based units were attributed proportionally based on the actual minutes supervised (Table 3). For 798 cases for which the anesthesia information management system’s case identifier was missing from the billing data, all base and non-time-based units were attributed to the anesthesiologist and resident or CRNA listed in the billing data.
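The proportional attribution of base and non-time-based units when one anesthesiologist relieves another might look like the sketch below. The function name `attribute_units` and the example numbers are hypothetical; the only thing taken from the text is the rule that units are split in proportion to the minutes each anesthesiologist actually supervised.

```python
def attribute_units(units, minutes_by_anesthesiologist):
    """Split a case's units in proportion to minutes supervised."""
    total = sum(minutes_by_anesthesiologist.values())
    if total <= 0:
        raise ValueError("case has no supervised minutes")
    return {
        anes: units * minutes / total
        for anes, minutes in minutes_by_anesthesiologist.items()
    }


# Hypothetical case: 10 base plus non-time-based units,
# Dr_X supervised 90 minutes and Dr_Y supervised 30 minutes.
shares = attribute_units(10, {"Dr_X": 90, "Dr_Y": 30})
print(shares)
```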

Table 3:
Pooled Correlation Analysis Between the Ratings of Anesthesiologists’ Supervision as Evaluated by Residents and Certified Registered Nurse Anesthetists (CRNAs) and Daily Workload Together

We used 2 ways of estimating the mean supervision score of an anesthesiologist. The 2 ways differ depending on the number of occasions (days) for each anesthesiologist and rater combination. Take, for example, a consecutive period in which an anesthesiologist was evaluated by 9 different residents (see Table 1.9). To obtain evaluations from 9 different residents, the anesthesiologist had 15 days with resident assignments and received 14 evaluations: 1 resident (A) provided 3 daily evaluations; 3 residents (B, C, D) each provided 2 daily evaluations; 5 residents (E, F, G, H, I) each provided 1 daily evaluation; and resident B worked with the anesthesiologist a 3rd day but did not complete an evaluation. For a pooled analysis (meanpooled), all evaluations for each anesthesiologist by the same type of rater (i.e., all residents or all CRNAs) are pooled. In the above example, the sample meanpooled is the average of all 14 evaluations. In contrast, for an equally weighted sample meanequal, each rater’s mean supervision score for the anesthesiologist is weighted equally (i.e., treated as a single observation). The sample meanequal for the anesthesiologist evaluated by 9 different residents would be the average of the 9 anesthesiologist × resident sample means (i.e., the mean of the 9 different raters’ means). We henceforth drop the adjective “sample” before each meanpooled and meanequal. Although meanpooled has a larger sample size, it is unclear12,13,d how to estimate the variance of meanpooled as would be needed to compare statistically meanpooled to a threshold such as 3.0. In contrast, estimating the variance of meanequal is simple,14–17,e and thus is what we studied previously psychometrically4 (see Discussion Limitations). Although we use meanpooled in the article, meanequal and its variance are the values to use for monitoring anesthesiologists’ supervision.
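The difference between the 2 estimates can be illustrated with a short sketch. The numbers of evaluations per resident (3, 2, 2, 2, 1, 1, 1, 1, 1) follow the example above; the scores themselves are hypothetical values chosen only to make the arithmetic easy to follow, and the variable names are illustrative.

```python
from statistics import mean

# Hypothetical scores (1.00 to 4.00 scale) for one anesthesiologist.
# Counts per resident match the worked example: A gave 3 evaluations,
# B, C, D gave 2 each, and E through I gave 1 each (14 in total).
evaluations = {
    "A": [3.0, 3.0, 3.0],
    "B": [4.0, 4.0],
    "C": [4.0, 4.0],
    "D": [4.0, 4.0],
    "E": [4.0],
    "F": [4.0],
    "G": [4.0],
    "H": [4.0],
    "I": [4.0],
}

# Pooled mean: every evaluation counts once.
all_scores = [s for scores in evaluations.values() for s in scores]
mean_pooled = mean(all_scores)

# Equally weighted mean: every rater counts once, via that rater's own mean.
mean_equal = mean(mean(scores) for scores in evaluations.values())

print(len(all_scores), len(evaluations))
print(round(mean_pooled, 3), round(mean_equal, 3))
```

With these made-up scores, the rater who contributed 3 evaluations pulls the pooled mean down more than the equally weighted mean, which is the only way the 2 estimates can diverge.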


Results

Evaluation Completion Rates Can Exceed 85%

We obtained daily evaluation completion ratesf of 92.9% among residents (N = 2968/3196 anesthesiologist × resident days) and 90.5% among CRNAs (N = 3490/3858) (see Table 1.13). The mean monthly evaluation completion rate exceeded 85% (Student’s 1-group 2-sided t tests: residents P = 0.0001, CRNAs P = 0.0005, N = 6 months). Weekly evaluation completion rates did not change over time (Cochran-Armitage Trend tests residents P = 0.84, CRNAs P = 0.38). Among monthly periods, the CRNAs accounted for 51.1% ± 0.9% (SE) of billed minutes and 48.9% ± 0.7% of ASA RVG units.
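As a quick arithmetic check, the completion rates can be recomputed from the counts reported above. The dictionary layout and variable names are illustrative only.

```python
# (completed, requested) anesthesiologist x rater days, from the text.
counts = {"residents": (2968, 3196), "CRNAs": (3490, 3858)}

rates = {
    group: 100 * completed / requested
    for group, (completed, requested) in counts.items()
}
total_requested = sum(requested for _, requested in counts.values())

for group, rate in rates.items():
    print(f"{group}: {rate:.1f}%")
print(total_requested)
```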

Mean Scores Need Not Differ Substantively Depending on How Calculated

For the example at the end of the Methods, there was little difference between meanpooled and meanequal because there were 1 or 2 rating occasions (days) for 89% of raters, where 89% = (3 + 5)/9. Overall, for our complete 6-month data, meanequal differed negligibly from meanpooled. The bias equaled −0.02 ± 0.01 (SE) for residents and CRNAs (both N = 67 anesthesiologists). The absolute percentage differences were 0.39% ± 0.05% and 0.94% ± 0.09%, respectively. In routine practice, the differences will be even less (Table 4).18,19 Consequently, although meanequal would be monitored, stakeholders can think of the “average” as being meanpooled.

Table 4:
Numbers of Occasions Expected During Routine Use at University of Iowa

Supervision Scores Differed Between Residents and CRNAs

If anesthesiologist supervision scores differ between those provided by anesthesiology residents and those provided by CRNAs, the type of rater (resident or CRNA) should be included as a covariate when interpreting anesthesiologist supervision scores. The meanpooled ± SD of all anesthesiologist supervision scores was 3.76 ± 0.39 among residents (N = 2821 evaluations) and was 3.13 ± 0.71 among CRNAs (N = 3189 evaluations) (see Table 1.6 and 1.11; see Fig. 1). In comparison, the maximum possible supervision score equals 4.00 and the minimum possible score equals 1.00. Anesthesiologist supervision scores provided by residents were significantly greater than anesthesiologist supervision scores provided by CRNAs (Wilcoxon-Mann-Whitney P < 0.0001, Student’s t test P < 0.0001). This unpaired analysis (all resident scores versus all CRNA scores) provides greater weight to the anesthesiologists and raters who had more OR days.20

Figure 1:
Relationship between meanequal scores and percentages of scores <3.00 (“frequent”). The meanequal scores were used to create batches by rounding to the nearest 0.1 units. This figure shows that despite the non-normal distribution of mean scores among anesthesiologists (e.g., from the upper bound of 4.0), at the (important) lower end there is a monotonic relationship between mean scores and percentages of scores <3.00 (“frequent”). The value of 3.00 was chosen because, in a previous nationwide survey, anesthesiology residents who reported mean scores for their entire department <3.00 also reported making more “mistakes that have negative consequences for the patient” and “medication errors (dose or incorrect drug) in the last year” (see Table 1.8).8 This figure shows also that when the meanequal (i.e., equally weighted among raters) is 3.5 or less among residents or 3.0 or less among Certified Registered Nurse Anesthetists (CRNAs), the anesthesiologist will (P < 0.05) not be meeting expectations of supervision of some residents and CRNAs. The “P < 0.05” is shown by including 1-sided lower 95% Clopper-Pearson confidence intervals.18,19 Among anesthesiology residents, only 6% (N = 3/47) “perceived that supervision that met their expectations was less than ‘frequent (3.00)’” (see Table 1.6).5 Therefore, a limit line is shown at 6%. Among CRNAs, the percentage that “perceived that anesthesiologist supervision that met their expectations was less than ‘frequent’” was 33% (N = 50/153) (see Table 1.11).5 Therefore, a limit line would be shown at 33%, but that overlaps with the data points and so, without affecting conclusions, the line is drawn at 37%. For the anesthesiologists who had meanequal scores (rounded to the nearest 0.1 units) of 3.5, these are presented for residents at 3.53 and CRNAs at 3.47 so as not to overlap. For the anesthesiologists who had CRNA rounded meanequal scores of 2.6 to 2.8, these were pooled because of limited total numbers of evaluations.
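For readers unfamiliar with the 1-sided lower Clopper-Pearson limit mentioned in the caption, the sketch below shows one way to compute it with the standard library only. It finds, by bisection, the proportion at which the probability of observing at least x events out of n equals alpha. The function names and the example counts (12 of 100) are hypothetical; this is not the software used for the figure.

```python
from math import comb


def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def clopper_pearson_lower(x, n, alpha=0.05, tol=1e-10):
    """1-sided lower (1 - alpha) exact confidence limit for a proportion."""
    if x == 0:
        return 0.0  # no events observed: the lower limit is zero
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # upper_tail increases with p, so move toward the root of
        # upper_tail(x, n, p) == alpha.
        if upper_tail(x, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical batch: 12 of 100 evaluations had scores below 3.00.
lower = clopper_pearson_lower(12, 100)
print(lower)
```

A batch would be read as exceeding a limit line when this lower limit, not just the observed percentage, lies above the line.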

Next, a paired analysis was performed. The meanequal supervision score was calculated for each combination of anesthesiologist and rater (resident or CRNA). For N = 65 anesthesiologists, there were meanequal scores from both residents and CRNAs. The mean ± SD among anesthesiologists of the meanequal were 3.76 ± 0.16 from residents and 3.09 ± 0.25 from CRNAs. The mean ± SD of the differences (residents minus CRNAs) was 0.67 ± 0.23, significantly greater than zero (Wilcoxon signed-rank test P < 0.0001, Student’s paired t test P < 0.0001).

Not only did the meanpooled (and also the meanequal) supervision scores differ between the resident and CRNA raters, there was heterogeneity among anesthesiologists in the differences between residents and CRNAs. The meanpooled score was calculated for each combination of anesthesiologist and type of rater (resident or CRNA). A 2-way analysis of variance was performed using 2 factors, anesthesiologist (N = 65) and type of rater (resident or CRNA). The meanpooled was treated as the dependent variable. The Tukey 1-degree of freedom test for additivity was calculated. There was significant interaction (P < 0.0001).21,g Thus, the meanpooled evaluation score for each anesthesiologist cannot be obtained from 1 type of rater (e.g., by residents) and then a constant added or subtracted to estimate accurately what the corresponding mean would be for the other type of rater (e.g., CRNAs).
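The Tukey 1-degree-of-freedom test partitions the residual sum of squares of a 2-way table with 1 value per cell into a nonadditivity component and a remainder. The sketch below uses the textbook formulas, not the authors' code; the function name and the small 3 × 2 table are hypothetical. For the layout described above (65 anesthesiologists × 2 rater types), the remainder would have (65 − 1)(2 − 1) − 1 = 63 degrees of freedom.

```python
def tukey_nonadditivity(table):
    """Return (SS for nonadditivity, remainder SS, remainder df).

    table[i][j] holds one value per cell, e.g. row = anesthesiologist,
    column = type of rater.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - grand for row in table]
    col_eff = [sum(row[j] for row in table) / n_rows - grand
               for j in range(n_cols)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    # Numerator term: sum of (row effect x column effect x observation).
    cross = sum(row_eff[i] * col_eff[j] * table[i][j] for i, j in cells)
    ss_rows = sum(a * a for a in row_eff)
    ss_cols = sum(b * b for b in col_eff)
    ss_nonadd = cross**2 / (ss_rows * ss_cols)
    # Residual SS after fitting the additive (main effects only) model.
    ss_resid = sum((table[i][j] - grand - row_eff[i] - col_eff[j]) ** 2
                   for i, j in cells)
    df_remainder = (n_rows - 1) * (n_cols - 1) - 1
    return ss_nonadd, ss_resid - ss_nonadd, df_remainder


# Hypothetical table built as mu + a_i + b_j + 0.5 * a_i * b_j, so all of the
# residual variation is of the multiplicative (nonadditive) form.
table = [[1.75, 2.25], [2.50, 3.50], [3.25, 4.75]]
ss_nonadd, ss_remainder, df_remainder = tukey_nonadditivity(table)
print(ss_nonadd, ss_remainder, df_remainder)
```

When the remainder is positive, the test statistic is F = ss_nonadd / (ss_remainder / df_remainder), referred to an F distribution with 1 and df_remainder degrees of freedom; the P-value lookup is omitted here because the standard library has no F distribution.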

To interpret the interaction result, the meanequal was calculated for each combination of anesthesiologist and rater. The SD was calculated among the anesthesiologists’ meanequal: 0.16 ± 0.07 (SE) for residents and 0.25 ± 0.11 for CRNAs. Next, the difference was taken between each anesthesiologist’s meanequal from residents and CRNAs. The SD, among anesthesiologists, of the pairwise differences of the meanequal equaled 0.23 ± 0.10 (SE). This shows that, in addition to the variability among anesthesiologists (0.16 for residents and 0.25 for CRNAs), there is considerable variability in the differences between the 2 rater types (0.23). Because of this interaction, we treat supervision scores provided by residents and CRNAs as separate values, and we do so when summarizing evaluations in our department.

If the heterogeneity of the differences between resident and CRNA supervision scores reflected absence of correlation between residents and CRNAs, that would suggest an absence of concurrent validity of evaluation of anesthesiologist supervision using the 9-question instrument.3 However, that was not so. For each anesthesiologist, 2 values were compared, the meanequal score from residents and the meanequal score from CRNAs. The means were correlated (Kendall τb = +0.357 ± 0.081, P < 0.0001, N = 65 anesthesiologists). The correlation was still present when values were limited to the anesthesiologists who were evaluated by both at least 9 different residents and at least 9 different CRNAs (τb = +0.433 ± 0.072, P < 0.0001, N = 52) and to the anesthesiologists who were evaluated by both at least 15 different residents and at least 15 different CRNAs (τb = +0.507 ± 0.071, P < 0.0001, N = 45) (see Table 1.5). These correlations suggest that although CRNAs and residents differ in their expectations regarding level of supervision, the behaviors and attributes that are used to assess the quality of an anesthesiologist’s supervision have significant commonality.
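Kendall’s τb, the rank correlation reported above, can be computed directly from counts of concordant, discordant, and tied pairs. The sketch below is a plain O(n²) implementation of the standard definition; the function name and the small example vectors are hypothetical, and the standard errors reported in the text are not reproduced.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for paired observations, with adjustment for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every count
        elif dx == 0:
            ties_x += 1  # tied only on x
        elif dy == 0:
            ties_y += 1  # tied only on y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical per-anesthesiologist means from residents and from CRNAs.
resident_means = [1, 2, 2, 3]
crna_means = [1, 2, 3, 3]
tau = kendall_tau_b(resident_means, crna_means)
print(tau)
```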

Supervision Scores Were Insensitive to Anesthesiologists’ Assignments

Daily clinical assignments determined ASA RVG units of work that the anesthesiologist and the rater (resident or CRNA) did together on each day (“units together”). Daily clinical assignments also determined daily units that the anesthesiologist did with other anesthesia providers on that same day (“units not together”) (Table 3). The ASA RVG units included time, base and modifier units for intensity of effort, and additional units for procedures such as arterial line placement. Anesthesiologist supervision scores provided by residents were: (1) greater when a resident had more units of work that day with the rated anesthesiologist (“units together,” P < 0.0001) and (2) less when the rated staff anesthesiologist had more units of work that same day with other providers (“units not together,” P < 0.0001) (Table 3). However, the relationships were unimportantly small, τb = +0.083 ± 0.014 and τb = −0.057 ± 0.014, respectively. Specifically, the overall meanpooled supervision scores were calculated for the lower and upper quartiles of the predictors. The differences of these means correspond only to 0.14 ± 0.02 and −0.07 ± 0.02 points, respectively, on the 1.00 to 4.00 scale (see Fig. 1 for comparison). The correlations were even less among the CRNAs, τb = −0.029 ± 0.013 and τb = −0.004 ± 0.012, respectively. The correlations were comparable when analyzed based on time (minutes) together and time (minutes) not together (Table 3). These absolute differences in τb ≤0.028 included any potential attribution of base and non–time-based units to different anesthesiologists during a case. Thus, although daily clinical assignments had a detectable effect on anesthesiologist supervision scores provided by residents, the magnitude of the effect was so small that it can be ignored. Therefore, we do not consider daily clinical assignments as covariates when evaluating an anesthesiologist’s quality of supervision.

Daily clinical assignments influenced the number of occasions (days) that a rater (resident or CRNA) worked with each anesthesiologist (Table 5). For each anesthesiologist × resident pair, the meanequal supervision score was calculated. There was an unimportantly small association between the meanequal and the number of days the anesthesiologists and residents worked together (τb = −0.069 ± 0.023) (see Table 1.10). Likewise, there was a tiny association between the meanequal anesthesiologist supervision score and the number of days anesthesiologists and CRNAs worked together (τb = +0.038 ± 0.020).

Table 5:
Association Between Occasions and Meanequal Supervision Scores of Faculty Anesthesiologists by Resident Physicians and Certified Registered Nurse Anesthetists (CRNAs)

Regression trees also were used to evaluate variables influencing anesthesiologists’ supervision scores. Whether using least squares, least absolute values, or trimmed mean criteria, the staff assignment variables (i.e., ASA RVG units, time, and occasions, Tables 3 and 5) were not associated with anesthesiologist supervision scores for either residents or CRNAs. When type of rater (resident or CRNA) was considered an independent variable in a single model, the type of rater was a significant covariate.

Discussion


In Table 1, we summarized prior knowledge about evaluating anesthesiologists’ supervision of resident physicians and CRNAs. In this study, we made the following additional observations in each of the 3 preceding sections:

  • Obtaining daily evaluations of anesthesiologists was feasible at the University of Iowa with automated e-mail delivery, secure Web server, etc. (see Appendix). Sustained evaluation rates exceeding 85% were achieved for both residents and CRNAs. If meanequal scores were calculated on an ongoing basis10 (e.g., when there are 9 [new] different residents’ evaluations and/or 15 different CRNAs’ evaluations), then at an average-sized academic department such as ours,13 rarely would the estimate differ substantively from the meanpooled.
  • Although there was significant commonality in the attributes that residents and CRNAs perceived as constituting “supervision,” not only did anesthesiologists’ supervision scores differ between residents and CRNAs, the differences were heterogeneous among anesthesiologists. Thus, supervision scores should be analyzed separately for residents and CRNAs.
  • Although mean supervision scores differed markedly among anesthesiologists, supervision scores were not affected meaningfully by staff assignments (e.g., how busy the anesthesiologist is with other ORs or how many times the anesthesiologist has been assigned to work with the rater).

These results are important because together they are sufficient for monitoring each anesthesiologist’s mean score (specifically, the meanequal with equal weights for each rater) independent of the other anesthesiologists (see footnotes d and e in Methods). When evaluating anesthesiologists’ supervision scores, the only covariate of importance that we have identified is the type of rater (resident or CRNA).

Previously, Davis et al.22 showed that “a 33% decrease of time spent in teaching occurred when the … anesthesiologist concurrently directed anesthetic care in 2 ORs … The … anesthesiologist was present in ORs 76% ± 19% (SD) of the time (from patient on the OR table until skin incision) when covering” 1 resident in 1 OR, “and 62% ± 24%” for each resident “when covering” 2 residents in 2 ORs (P < 0.001). The fact that both resident and CRNA evaluations of anesthesiologist supervision are affected minimally by staff assignments (i.e., time, number of ORs, or patient acuity) suggests that actual physical presence of the anesthesiologist is not the key determinant of either resident or CRNA evaluation of supervision. It appears that, with effective supervision, “presence” of the anesthesiologist can often be perceived even when they cannot be present physically. This suggests that an important component of anesthesiologist supervision must occur in the preoperative phase, developing an intraoperative plan and in preparing for contingencies.

We studied not only residents’ evaluations of anesthesiologists’ supervision, but also CRNAs’ evaluations. Most CRNAs consider supervision by anesthesiologists that meets their expectations to be at least “frequent” (a score of at least 3.0) (see Table 1.11 and Fig. 1).5 As summarized above, we showed that, for evaluation of individual anesthesiologists, residents’ scores and CRNAs’ scores should be considered separately.

Limitations


Daily evaluation of each anesthesiologist working with a resident and/or CRNA is both a feasible and effective way to gather the required number of independent evaluations of anesthesiologist supervision that are necessary for highly dependable scores (see Tables 1.6 and 1.11). Rather than observing “evaluation fatigue,” the sustained high rates of evaluation completion by both residents and CRNAs suggest striking engagement (>85%). When supervision scores are provided to anesthesiologists, scores are provided only in aggregate form, based on the results from multiple raters collected over many weeks of time.10 This is done to ensure an adequate number of different raters and to preserve the confidentiality of the raters. We have essentially no knowledge of which features of our processes contributed to our successful data collection (see Appendix).

We measured only billed activity. Although the process of supervision starts the day before (e.g., by review and consideration of the procedures and patients), we did not have data on time invested in preoperative planning. In addition, anesthesiologist supervision includes case-related activities outside of ORs. Half of the daily messages to anesthesiologists are for activities outside of ORs (e.g., from holding areas and postanesthesia care units),23 very few of which are billable. These activities, at least in part, also constitute supervision. The near complete absence of any substantive relationship between billed activity with other providers (“time not together” and “units not together”) and supervision scores (Table 3) suggests our conclusions are insensitive to the error introduced by not having any quantification of these preoperative and postoperative activities.

Our assessment of the relationship between numbers of occasions (days working together) and supervision scores found statistically significant but quantitatively irrelevant correlations (Table 5). Such correlations may apply to anesthesiologists routinely working with a few different raters. However, even these correlations are overestimates of those achievable for most anesthesiologists because 6 months of data were used whereas new meanequal evaluation scores could be obtained every several weeks. The maximum numbers of occasions will be far less (Table 4) than used for our analyses (Table 5).

Our study design and analysis were limited to the assessment of dependability, not reliability. In our previous study, we obtained variance components to calculate dependability (φ) and generalizability (G) indexes.4 The dependability (φ) index estimated the dependability of the scores for absolute decisions (e.g., to determine if an anesthesiologist’s meanequal from residents is <3.00) (see Tables 1.6 and 1.9). The G-coefficients estimated the reliability of the relative ranking of anesthesiologists, with the rankings based on the mean score averaged over equal numbers of independent and unique resident ratings (i.e., identical whether using meanpooled or meanequal).4 We currently are unaware of a motivation or value to moving (if possible) the supervision scores of anesthesiologists in the 15th to 25th percentiles (with meanequal ≈ 3.65) up to those of the 75th to 85th percentiles (with meanequal ≈ 3.85). Importantly, even if we had, alternatively, been interested in ranking the anesthesiologists, because the rank concordance between resident and CRNA scores (τb = +0.36 ± 0.08) is low, the scores would still be analyzed separately.

Finally, we evaluated only the quality of anesthesiologists’ clinical supervision for OR cases. Anesthesiologists also provide managerial supervision, especially on nights and weekends. We previously developed processes for monitoring the quality of the managerial decisions (e.g., reducing how long patients wait for surgery).24–28

Conclusions


In this study, we learned that obtaining daily evaluations of anesthesiologists is feasible. Although the attributes that residents and CRNAs perceive as constituting “supervision” share significant commonalities, supervision scores should be analyzed separately for residents and CRNAs. Although mean supervision scores differ markedly among anesthesiologists, supervision scores are influenced negligibly by staff assignments (e.g., how busy the anesthesiologist is with other ORs).

Appendix


This Appendix describes the details of the departmental process for evaluation. The computer software used is described in the Online Only Supplement (Supplemental Digital Content,

Labor epidurals and operative procedures were treated as separate cases, with labor epidurals excluded (e.g., labor epidurals alone were excluded and, if labor epidural followed by cesarean section, the cesarean section was included).

Each rater (resident or CRNA) had a personalized request-for-evaluations Web page. The day after working with an anesthesiologist in an OR for at least 60 minutes, an e-mail was sent automatically to each rater’s preferred e-mail address. Clicking the hyperlink took the rater directly to his or her request-for-evaluations page using an encrypted connection.

On the request-for-evaluations Web page, evaluations that were to be completed were highlighted in yellow, and evaluations that had been completed were flagged with a green checkmark. The rater’s individual rate of completing evaluation requests during the past 21 days, and the department-wide completion rate during that period were reported at the top of the page. Requested evaluations that were not completed by 11:59 PM of 21 days after the date of request were removed from the request list of the rater (resident or CRNA).

Most evaluations (76.8%) were completed during sessions when raters (resident or CRNA) performed more than 1 evaluation (2.08 ± 1.52 [SD] evaluations per session). Residents completed evaluations in 2.85 ± 2.73 (SD) days and CRNAs in 3.48 ± 3.19 days. There was no significant association between evaluation scores and days from pairing to evaluation (residents τb = +0.028 ± 0.016, P = 0.078; CRNAs τb = −0.024 ± 0.013, P = 0.069).

No editing was possible once an evaluation was completed.

Administration tools allowed the supervisor (BJH) to track each rater’s 21-day completion percentage and to e-mail the rater reminder(s) that the department expected at least a 90% completion rate and that evaluation of the anesthesiologists was their professional responsibility.h The administrative transaction logs include activities other than reminders and thus counts are approximate. Nevertheless, there were approximately 750 reminders for 7054 anesthesiologist × rater days, where 7054 = 3196 + 3858. These were sent out approximately weekly, when convenient for the supervisor. There were no other consequences of noncompletion.

Prior to data collection, we knew that there would be situations for which the rater (resident or CRNA) would be asked to evaluate anesthesiologist supervision (i.e., the electronic medical record indicated coincident patient care of at least 60 minutes), but the actual time being supervised would be interpreted as less. For example, during resuscitation of a patient, the rater may have been the person charting while multiple providers cared for the patient. Therefore, the evaluation page also contained an opt-out button: “No evaluation because Dr. <Last Name> and I were not assigned to care for patients together ≥ 60 min.” Although no evaluation was made (i.e., no score provided), selection of this option by the rater was considered to be a completed evaluation. However, raters were instructed: “If you and the faculty were assigned to care for patients together for ≥ 60 min, but faculty participation was low, please complete this evaluation to document this.” There were no differences in the distribution of days from working together until evaluation between use or nonuse of the opt-out button (Mann-Whitney residents P = 0.15, CRNAs P = 0.49).


Franklin Dexter is the Statistical Editor and Section Editor for Economics, Education, and Policy for Anesthesia & Analgesia. This manuscript was handled by Dr. Steven L. Shafer, Editor-in-Chief, and Dr. Dexter was not involved in any way with the editorial process or decision.


Name: Franklin Dexter, MD, PhD.

Contribution: This author helped design the study, conduct the study, analyze the data, and write the manuscript.

Attestation: Franklin Dexter has approved the final manuscript.

Name: Johannes Ledolter, PhD.

Contribution: This author helped analyze the data and prepare the manuscript.

Attestation: Johannes Ledolter approved the final manuscript.

Name: Thomas C. Smith, BS.

Contribution: This author helped conduct the study.

Attestation: Thomas C. Smith has approved the final manuscript.

Name: David Griffiths, BS.

Contribution: This author helped conduct the study.

Attestation: David Griffiths has approved the final manuscript.

Name: Bradley J. Hindman, MD.

Contribution: This author helped design the study, conduct the study, and write the manuscript. This author is the archival author.

Attestation: Bradley J. Hindman has approved the final manuscript.


a ACGME Program Requirements for Graduate Medical Education in Anesthesiology. See Section II.B.2.a. Available at: Accessed November 27, 2013.

b Department of Health and Human Services, Centers for Medicare and Medicaid Services. CMS Manual System. Pub 100–04 Medicare Claims Processing, Transmittal 1859, November 20, 2009. Subject: MIPPA Section 139 Teaching Anesthesiologists. Available at: Accessed November 28, 2013.

c The cases that account for the difference between the total number of cases (15,664) and the number contributing to anesthesiologist and resident/CRNA pairs (13,140) were cases for which the anesthesiologist either supervised a student nurse anesthetist or personally performed the case without another anesthesia provider.

d Suppose we were to treat the variance among occasions as the same among raters. Then, we would make interval estimation of meanpooled using a 2-stage nested model with 2 unknown variances (among raters and within raters).12 The raters would be treated as a random effect; the coefficients analyzed using analysis of variance; and the Satterthwaite approximation applied to adjust the effective sample size to be in between the number of raters and the total number of observations (occasions).12 From Ref. 12, Table V row 17 column 3, Monte-Carlo simulation shows that coverage of 90.0% confidence intervals is good, 89.6% to 89.9% depending on the proportion of total variance attributed to that among raters. However, homoscedasticity is not a reasonable assumption. With 6 months of data, among the 134 combinations of anesthesiologist and type of rater, there were 17 combinations with ≥5 different raters each having ≥5 occasions (i.e., for which homoscedasticity could be assessed). Applying the Levene test based on the median, 2 had P < 0.001, 3 had P < 0.01, and 5 had P < 0.05.

e The variance of meanequal can be estimated simply from the sample variance among the raters’ means. This analysis is commonplace not only in psychometrics but also in management applications (“batch means”),14,15 and has been applied to multiple operating room and anesthesia group management problems.15–17 Among the combinations of anesthesiologist and type of rater from footnote d, 14 of 17 had meanequal <3.75 (i.e., sufficiently less than the bound of 4.0 that there would be a reason to estimate the corresponding variance). The Lilliefors tests for normal distribution gave P > 0.14 for 13 of the 14 combinations and P = 0.06 for the 14th.
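A minimal sketch of the batch-means calculation for one anesthesiologist and one type of rater follows. It assumes that meanequal is the average of the raters' means (each rater weighted equally) and that meanpooled is the average over all occasions; the function names and the scores are hypothetical. The standard error is the usual batch-means estimate: the square root of the sample variance among rater means divided by the number of raters.

```python
from math import sqrt
from statistics import mean, variance


def mean_equal_with_se(scores_by_rater):
    """Return (meanequal, standard error), treating each rater's mean as one batch."""
    rater_means = [mean(scores) for scores in scores_by_rater.values() if scores]
    if len(rater_means) < 2:
        raise ValueError("at least 2 raters are needed to estimate the variance")
    estimate = mean(rater_means)
    standard_error = sqrt(variance(rater_means) / len(rater_means))
    return estimate, standard_error


def mean_pooled(scores_by_rater):
    """Average over all occasions, so raters with more occasions weigh more."""
    return mean(s for scores in scores_by_rater.values() for s in scores)


# Hypothetical scores (1.0 to 4.0 scale) from 3 raters of one anesthesiologist.
scores = {"rater_A": [3.0, 4.0], "rater_B": [4.0], "rater_C": [3.0, 3.0, 3.0]}
print(mean_equal_with_se(scores))
print(mean_pooled(scores))
```

In the hypothetical data the two means differ (3.5 versus about 3.33) because rater_C contributes 3 occasions to the pooled mean but only 1 batch to the equally weighted mean.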

f These counts include the 147 of 3196 (4.5%) and 301 of 3858 (7.8%) evaluations, respectively, for which the resident or CRNA rater reported that the time spent together had been less than 60 minutes.

g We repeated this analysis multiple ways and each showed interaction. We deleted the 2 outliers, 1 positive and 1 negative, leaving N = 63 anesthesiologists, and still P = 0.0003. We trimmed the 11 of 65 anesthesiologists with resident meanequal ≥3.90, because the scale is bounded at 4.0, and P < 0.0001. We trimmed the 11 of 65 anesthesiologists with the resident meanequal <3.60, and still P < 0.0001. We limited the analysis to the N = 52 anesthesiologists with both at least 9 resident raters and 9 CRNA raters, and calculated the Kendall τb correlation between the differences of the resident and CRNA meanequal and averages of the resident and CRNA meanequal. The τb = −0.288 ± 0.087 (SE) has Monte-Carlo P = 0.0022 (StatXact-9, Cytel, Inc., Cambridge, MA). The upper quartile of anesthesiologists by resident scores (meanequal >3.85) was trimmed, leaving N = 39. The τb = −0.336 ± 0.084, with P = 0.0028. Finally, the lower quartile of anesthesiologists by resident scores (meanequal <3.61) was trimmed. The τb = −0.412 ± 0.095, with P < 0.0001. Thus, the interaction is robust to trimming either of the tails.

h “Dear Dr. <Last Name>: I see that you are a little behind in your faculty evaluations. Over the last 21 days, you completed 8 evaluations and have another 2 left to go. This puts you at a completion rate of 80.0 percent. Our goal is a 90 percent completion rate. Thank you for your prompt attention to this matter. Click this link to go to the evaluations page on our Intranet. Dr. Brad Hindman”


1. Shumway SH, Del Risco J. A comparison of nurse anesthesia practice types. AANA J. 2000;68:452–62
2. Taylor CL. Attitudes toward physician-nurse collaboration in anesthesia. AANA J. 2009;77:343–8
3. de Oliveira Filho GR, Dal Mago AJ, Garcia JH, Goldschmidt R. An instrument designed for faculty supervision evaluation by anesthesia residents and its psychometric properties. Anesth Analg. 2008;107:1316–22
4. Hindman BJ, Dexter F, Kreiter CD, Wachtel RE. Determinants, associations, and psychometric properties of resident evaluations of faculty operating room supervision in a US anesthesia residency program. Anesth Analg. 2013;116:1342–51
5. Dexter F, Logvinov II, Brull SJ. Anesthesiology residents’ and nurse anesthetists’ perceptions of effective clinical faculty supervision by anesthesiologists. Anesth Analg. 2013;116:1352–5
6. Paoletti X, Marty J. Consequences of running more operating theatres than anaesthetists to staff them: a stochastic simulation study. Br J Anaesth. 2007;98:462–9
7. Epstein RH, Dexter F. Influence of supervision ratios by anesthesiologists on first-case starts and critical portions of anesthetics. Anesthesiology. 2012;116:683–91
8. De Oliveira GS Jr, Rahmani R, Fitzgerald PC, Chang R, McCarthy RJ. The association between frequency of self-reported medical errors and anesthesia trainee supervision: a survey of United States anesthesiology residents-in-training. Anesth Analg. 2013;116:892–7
9. de Oliveira Filho GR, Dexter F. Interpretation of the association between frequency of self-reported medical errors and faculty supervision of anesthesiology residents. Anesth Analg. 2013;116:752–3
10. Dexter F, Ledolter J, Hindman BJ. Bernoulli cumulative sum (CUSUM) control charts for monitoring of anesthesiologists’ performance in supervising anesthesia residents and nurse anesthetists. Anesth Analg. 2014;119:679–85
11. Dexter F, Lubarsky DA, Gilbert BC, Thompson C. A method to compare costs of drugs and supplies among anesthesia providers: a simple statistical method to reduce variations in cost due to variations in casemix. Anesthesiology. 1998;88:1350–6
12. El-Bassiouni MY, Abdelhafez MEM. Interval estimation of the mean in a two-stage nested model. J Statist Comput Simul. 2000;67:333–50
13. Kheterpal S, Tremper KK, Shanks A, Morris M. Workforce and finances of the United States anesthesiology training programs: 2009–2010. Anesth Analg. 2011;112:1480–6
14. Law AM, Kelton WD. Simulation Modeling and Analysis. 2nd ed. New York, NY: McGraw-Hill, Inc.; 1991:551–3
15. Ledolter J, Dexter F, Epstein RH. Analysis of variance of communication latencies in anesthesia: comparing means of multiple log-normal distributions. Anesth Analg. 2011;113:888–96
16. Dexter F, Marcon E, Epstein RH, Ledolter J. Validation of statistical methods to compare cancellation rates on the day of surgery. Anesth Analg. 2005;101:465–73
17. Dexter F, Epstein RH, Marcon E, Ledolter J. Estimating the incidence of prolonged turnover times and delays by time of day. Anesthesiology. 2005;102:1242–8
18. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934;26:404–13
19. Hahn GJ, Meeker WQ. Statistical Intervals: A Guide for Practitioners. New York, NY: Wiley; 1991:82–4, 100–5
20. Proschan MA. On the distribution of the unpaired t-statistic with paired data. Stat Med. 1996;15:1059–63
21. Alin A, Kurt S. Testing non-additivity (interaction) in two-way ANOVA tables with no replication. Stat Methods Med Res. 2006;15:63–85
22. Davis EA, Escobar A, Ehrenwerth J, Watrous GA, Fisch GS, Kain ZN, Barash PG. Resident teaching versus the operating room schedule: an independent observer-based study of 1558 cases. Anesth Analg. 2006;103:932–7
23. Smallman B, Dexter F, Masursky D, Li F, Gorji R, George D, Epstein RH. Role of communication systems in coordinating supervising anesthesiologists’ activities outside of operating rooms. Anesth Analg. 2013;116:898–903
24. Dexter F, Willemsen-Dunlap A, Lee JD. Operating room managerial decision-making on the day of surgery with and without computer recommendations and status displays. Anesth Analg. 2007;105:419–29
25. Stepaniak PS, Mannaerts GH, de Quelerij M, de Vries G. The effect of the Operating Room Coordinator’s risk appreciation on operating room efficiency. Anesth Analg. 2009;108:1249–56
26. Ledolter J, Dexter F, Wachtel RE. Control chart monitoring of the numbers of cases waiting when anesthesiologists do not bring in members of call team. Anesth Analg. 2010;111:196–203
27. Stepaniak PS, Dexter F. Monitoring anesthesiologists’ and anesthesiology departments’ managerial performance. Anesth Analg. 2013;116:1198–200
28. Wang J, Dexter F, Yang K. A behavioral study of daily mean turnover times and first case of the day start tardiness. Anesth Analg. 2013;116:1333–41


© 2014 International Anesthesia Research Society