# Quantifying the Diversity and Similarity of Surgical Procedures Among Hospitals and Anesthesia Providers

In this Statistical Grand Rounds, we review methods for the analysis of the diversity of procedures among hospitals, the activities among anesthesia providers, etc. We apply multiple methods and consider their relative reliability and usefulness for perioperative applications, including calculations of SEs. We also review methods for comparing the similarity of procedures among hospitals, activities among anesthesia providers, etc. We again apply multiple methods and consider their relative reliability and usefulness for perioperative applications. The applications include strategic analyses (e.g., hospital marketing) and human resource analytics (e.g., comparisons among providers). Measures of diversity of procedures and activities (e.g., Herfindahl and Gini-Simpson index) are used for quantification of each facility (hospital) or anesthesia provider, one at a time. Diversity can be thought of as a summary measure. Thus, if the diversity of procedures for 48 hospitals is studied, the diversity (and its SE) is being calculated for each hospital. Likewise, the effective numbers of common procedures at each hospital can be calculated (e.g., by using the exponential of the Shannon index). Measures of similarity are pairwise assessments. Thus, if quantifying the similarity of procedures among cases with a break or handoff versus cases without a break or handoff, a similarity index represents a correlation coefficient. There are several different measures of similarity, and we compare their features and applicability for perioperative data. We rely extensively on sensitivity analyses to interpret observed values of the similarity index.

From the ^{*}Division of Management Consulting, Department of Anesthesia; ^{†}Department of Management Sciences; and ^{‡}Department of Anesthesia, University of Iowa, Iowa City, Iowa.

Accepted for publication August 12, 2015.

Funding: Departmental.

The authors declare no conflicts of interest.

Reprints will not be available from the authors.

Address correspondence to Franklin Dexter, MD, PhD, Division of Management Consulting, Department of Anesthesia, University of Iowa, 200 Hawkins Dr., 6JCP, Iowa City, IA 52242. Address e-mail to Franklin-Dexter@UIowa.edu or www.FranklinDexter.net.

The first objective of this Statistical Grand Rounds is to review analytical methods for the analysis of the diversity of surgical procedures among hospitals, activities among anesthesia providers, etc. We apply multiple methods and consider their relative reliability and usefulness for perioperative applications, including calculations of SEs. The second objective is to review methods for comparing the similarity of procedures among hospitals, activities among anesthesia providers, etc. We again apply multiple methods and consider their relative reliability and usefulness for perioperative applications. The applications include strategic analyses (e.g., hospital marketing) and human resource analytics (e.g., comparisons among providers).

Measures of diversity (e.g., Herfindahl and Gini-Simpson index) are used for quantification of each hospital or anesthesia provider, one at a time. Diversity can be thought of as a summary measure. Thus, if the diversity of 48 hospitals is studied as described later and shown in Figure 1, the diversity and its SE are being calculated for each hospital.

Quantifying diversity of procedures is important, because it influences appropriate operations management. For example, standardization (modularity) of processes should not be expected at perioperative organizations with large diversity. Software should be expected to accommodate many surgeon preference cards, to predict durations and equipment for rare procedures, and to customize most patient care instructions. Marketing messages can focus on the majority of patients undergoing many different rare procedures, not the minority of patients undergoing common procedures.

In contrast, measures of similarity are pairwise assessments. Thus, if quantifying the similarity of procedures among cases with a break or handoff versus cases without a break or handoff, a similarity index represents a correlation coefficient. There are several different measures of similarity, and we compare their features and applicability for perioperative data.

Quantifying similarity is important for knowing whether 2 hospitals or anesthesia groups in the same region compete (i.e., the degree to which their procedures overlap). Similarity of procedures among patients leaving a region to have surgery versus having surgery locally indicates opportunities for local service expansion. In this article, we also show the novel use of similarity indices for determining when groups to be compared (e.g., cases with/without anesthesia provider breaks) have balanced (i.e., matching) distributions of procedures.

## MEASURES OF DIVERSITY

### Herfindahl, Gini-Simpson Index, and Their SEs

Let $p_{ij}$ represent the proportion of cases performed at the $j$th facility that are of the $i$th procedure, $i = 1, 2, \ldots, S$. The notation *S* is used, because the population of many unique procedures (or combination of procedures) is analogous to populations of different species in ecology.^{1},^{2} For example, the number $S$ can be all Current Procedural Terminology (CPT) codes for “invasive therapeutic surgical procedures” from the U.S. Agency for Healthcare Research and Quality.^{a} For example, previously we observed at a U.S. academic hospital (*j* = 1) that, when classifying each scheduled surgical case by its primary procedure code, there were $S$ scheduled procedures in their local dictionary.^{3} Among 8108 patients who were inpatients preoperatively and their cases were cancelled, 1.38% of the cases were scheduled for percutaneous nephroscopy (i.e., $p_{i1} = 0.0138$).^{4}

Consider an urn representing the $j$th facility. Each ball in the urn represents a surgical case. Each ball is labeled with the procedure that was performed. If a procedure was performed 5 times, then 5 balls in the urn are labeled with the same procedure. Shake the urn. Draw 1 ball from the urn and record the procedure labeled on the ball. Return the ball to the urn. Shake the urn. Draw out another ball from the urn. Compare the procedure labeled on that second ball with the procedure of the first ball. The Herfindahl index $H_j$ for the $j$th facility equals the probability that the 2 balls drawn are labeled with the same procedure:

$$H_j = \sum_{i=1}^{S} p_{ij}^2. \tag{1}$$

In ecology, this is called Simpson’s index. The smaller the value of the Herfindahl index (i.e., the Simpson index), the greater is the diversity of procedures. Let $S_j$ represent the number of procedures that are performed at the $j$th facility, $S_j \le S$. The maximum value of $H_j$, which is obtained when only 1 procedure is performed at the hospital, is $H_j = 1$ and $S_j = 1$. The minimum value $H_j = 1/S_j$ is obtained when all procedures performed are equally likely (i.e., $p_{ij} = 1/S_j$).

The Gini-Simpson index^{5} equals $1 - H_j$. The Gini-Simpson index is intuitive because greater values show greater diversity. An estimate for the Herfindahl of the $j$th facility^{6},^{7}:

$$\hat{H}_j = \sum_{i=1}^{S} \hat{p}_{ij}^2 = \sum_{i=1}^{S_j} \hat{p}_{ij}^2, \tag{2}$$

the second term differing in being the summation to $S_j$ rather than $S$. The corresponding estimate for the Gini-Simpson index equals $1 - \hat{H}_j$. These are maximum likelihood estimates.^{6},^{7}
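As a minimal sketch of the maximum likelihood estimates described above, the following computes the Herfindahl and Gini-Simpson indices from a list with one procedure label per case. The function names and the case list are hypothetical, chosen only for illustration; they do not come from the studies cited.

```python
from collections import Counter


def herfindahl(case_procedures):
    """Sum of squared observed proportions (the urn probability of a match)."""
    counts = Counter(case_procedures)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def gini_simpson(case_procedures):
    """One minus the Herfindahl; greater values mean greater diversity."""
    return 1.0 - herfindahl(case_procedures)


# Hypothetical data: 10 cases spread over 3 procedures.
cases = ["myringotomy"] * 6 + ["adenoidectomy"] * 3 + ["hernia repair"] * 1
h = herfindahl(cases)
gs = gini_simpson(cases)
print(round(h, 2), round(gs, 2))
```

Only procedures that were actually observed contribute to the sum, which is why summing over the observed procedures gives the same value as summing over the whole dictionary.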

Figure 1 is adapted from a figure we published in 2003, quantifying the diversity of operative procedures performed on infants and toddlers at each of 48 different hospitals in the State of Iowa.^{6},^{b} Instead of plotting the Herfindahl index as in the original figure, in Figure 1, we use the Gini-Simpson index. The (pediatric) hospital performing the greatest number of different procedures is shown using a red circle (*j* = 2): $\hat{H}_2 = 0.069 \pm 0.009$ and Gini-Simpson index = 0.931 ± 0.009. The SEs were calculated using Equations 2 and 11 in the Appendix. The 0.069 means that there was a 6.9% ± 0.9% chance that any 2 randomly selected cases at the *j* = 2nd hospital were of the same procedure. In comparison, at the hospital in Iowa performing the greatest total number of cases among young children, the chance for both procedures being the same was 65.5% ± 2.2% (i.e., $\hat{H}_3 = 0.655 \pm 0.022$ and Gini-Simpson index = 0.345 ± 0.022). This *j* = 3rd hospital is shown in Figure 1 using a blue circle.

Comparing among the hospitals in Figure 1, the large pediatric hospital (red circle) has significantly greater diversity of procedures (Gini-Simpson index, 0.931) than any other hospital. Because the SE can be calculated for the index, inferential analysis can be performed. The pairwise differences between the large pediatric hospital and each of the other 47 hospitals in Figure 1 are all significant with Bonferroni-corrected *P* < 0.0001. To interpret the numbers in Figure 1, consider that the large pediatric hospital (red circle) had $S_2$ observed procedures. In contrast, the *j* = 3rd hospital performing the greatest total number of cases (blue circle) had only $S_3$ observed procedures, principally myringotomy tube placement and adenoidectomy.^{b}

Because the SEs of Herfindahl indices are needed for use in comparing hospitals, we focus in this Grand Rounds on different methods for calculation of the SEs. In contrast to the simple Equation 11, an alternative Equation 12 derived by Taplin uses additional terms.^{8} The SEs calculated using these 2 methods differed, however, by only a very small amount (<0.0001). To illustrate the negligible differences, we used a data set, from a perioperative application, with a much smaller sample size. Over 56 weeks at an ambulatory facility (*j* = 4), 12 anesthesia providers started 1947 cases during which there was at least 1 break or handoff.^{9}–^{13},^{c} Rather than calculating the Herfindahl index based on the distribution of cases among procedures, we assessed the diversity of cases with breaks among the 12 anesthesia providers: $\hat{H}_4 = 0.1442$. Thus, there was a 14.42% chance that any 2 cases, in which a break was given or handoff occurred, were started by the same anesthesia provider. The SE calculated using Equation 12 was 0.215%. The SE calculated using Equation 11 differed by just 0.001%.

### Effective Number of Common Procedures

Esophagogastroduodenoscopy is an example of a common procedure.^{14} Anoplasty and anorectal myomectomy are examples of rare procedures.^{2},^{15},^{16} Because of such rare procedures, the observed (sample) number of different procedures $S_j$ is not a reliable estimate for the actual number of different procedures performed at a facility.

Figure 2 shows the probability distribution of different procedures performed during outpatient surgery in the United States with an anesthesia provider.^{1},^{17},^{c} Figure 2 is adapted from Ref. 1.^{2} The few very common procedures are performed >100-fold more often than the many rare procedures (Fig. 2), because there are thousands of different procedure codes and combinations.^{1},^{2} Thus, most sample estimates for the number of different procedures miss at least some rare procedures that are performed infrequently at each facility.^{1}

Primary surgical procedures classified by the CPT were reviewed for the 16,413 cases performed by anesthesia providers at a hospital in Iowa (*j* = 5) over 7 successive 8-week periods.^{d} The number of different procedures during each of the 7 eight-week periods was 769, 794, 820, 855, 878, 887, and 930, respectively. In contrast, using all 56 weeks together, there were $S_5 = 2086$ different procedures. The reason for this vast difference was that most of the 2086 procedures were performed just once or twice. For example, during the 8-week period with 769 procedures, 72.2% were performed just once or twice. Among the other 6 periods, the percentages were 72.2%, 72.3%, 72.3%, 73.0%, 73.1%, and 74.1%. In fact, because these are primary surgical CPT codes billed for anesthesia, we know from the dictionary^{e} that $S > S_5$ (i.e., even the $S_5 = 2086$ from 56 weeks was an underestimate). Figure 2 shows that this behavior is not an artifact and cannot be overcome by pooling case duration prediction data among facilities (see section Limitations to Quantifying Diversity: Example of Case Duration Prediction).

The inverse of the Herfindahl $1/\hat{H}_j$ has good interpretive value for the numbers of common procedures, numbers of providers performing cases, etc. For example, suppose that among $S = 8$ possible procedures, 6 procedures were performed 4 times at the *j* = 6th hospital, and 2 procedures were not performed. Then, among the 24 total cases, $\hat{p}_{i6} = 4/24 = 1/6$ for the 6 performed procedures, and $\hat{p}_{i6} = 0$ for the other 2. The estimate of the Herfindahl index $\hat{H}_6 = 6 \times (1/6)^2 = 1/6$. The inverse of the Herfindahl $1/\hat{H}_6$ equals 6 procedures. The inverse is also referred to as the Hill diversity measure of order 2, as explained in Equation 15 of the Appendix. The diversity measure is of order 2, because the Herfindahl uses the square of the proportions. For the *j* = 6th hospital, the measure of order 2 equals 6, matching the number of procedures. Next, suppose that at the *j* = 7th hospital, there are also $S = 8$ possible procedures and 24 cases, but 3 procedures each accounted for 7 cases, 3 procedures each accounted for 1 case, and again 2 procedures were not performed. The estimate of the Herfindahl index $\hat{H}_7 = 3 \times (7/24)^2 + 3 \times (1/24)^2 = 0.260$. Its inverse $1/\hat{H}_7 = 3.84$. The diversity measure (3.84) is >3 because there are >3 procedures, but not 4 or greater because 3 procedures account for most cases.
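The two hypothetical hospitals in this example can be checked numerically. The sketch below takes case counts per procedure (including the 2 procedures with zero cases) and returns the inverse of the Herfindahl; the function name is an assumption made for illustration.

```python
def inverse_herfindahl(counts):
    """Effective number of common procedures (Hill number of order 2)."""
    n = sum(counts)
    h = sum((c / n) ** 2 for c in counts)
    return 1.0 / h


# Counts per procedure for the 8 possible procedures in the text's example.
hospital_6 = [4, 4, 4, 4, 4, 4, 0, 0]   # 6 procedures, 4 cases each
hospital_7 = [7, 7, 7, 1, 1, 1, 0, 0]   # 3 common and 3 rare procedures

d6 = inverse_herfindahl(hospital_6)
d7 = inverse_herfindahl(hospital_7)
print(round(d6, 2), round(d7, 2))  # 6.0 3.84
```

Procedures with zero cases add nothing to the sum, so the result depends only on the procedures that were performed.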

Figure 3 shows the same pediatric data as Figure 1^{6},^{18} but now plotted as the inverse of the Herfindahl.^{b} The greater diversity of procedures at the large pediatric hospital (red circle, *j* = 2) versus the hospital performing the most cases (blue circle, *j* = 3) is even more apparent in Figure 3 than in Figure 1: $1/\hat{H}_2 = 14.47$ vs $1/\hat{H}_3 = 1.53$, respectively. These estimates can be compared with the corresponding number of different procedures performed: $S_2$ vs $S_3$. Thus, the inverse Herfindahl values of 14.47 and 1.53 are not estimates for the total numbers of all possible procedures at a hospital, but instead approximate the number of procedures performed (i.e., observed) commonly.^{5} Figure 3 shows that the estimates $1/\hat{H}_j$ are sufficient for making comparisons of the diversity of procedures among hospitals in units of numbers of cases.

The relative (logarithmic) relationship of procedure incidences (Fig. 2) has an important consequence for the measure of the number of common procedures obtained by using the inverse of the Herfindahl $1/H_j$. Each increase in the total number of procedures $S_j$ (i.e., species richness) results^{19} in an increase in the value of $1/H_j$. This relationship is reasonable intuitively,^{20} and we have used it when explaining the results of the analyses.^{6} We consider this topic more, below, in the Section “Diversity Assessed with Weighting for Differences Among Procedures and Providers.”

Other measures of diversity are used often, especially the diversity index of order 1:

$${}^1D_j = \exp\left(-\sum_{i=1}^{S} p_{ij} \ln p_{ij}\right).$$

This measure is the exponential of the Shannon entropy. See Equations 16 to 18 of the Appendix. The ${}^1D_j$ effectively counts^{5} more of the procedures than does the inverse of the Herfindahl, $1/H_j$. In the aforementioned example with *j* = 6, the ${}^1D_6 = 6$, matching the 6 performed procedures. Similarly, $1/\hat{H}_6 = 6$ procedures. In contrast, with *j* = 7 and $1/\hat{H}_7 = 3.84$, ${}^1D_7 = 4.37$, which is less than $S_7 = 6$ procedures.
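The diversity index of order 1 can be sketched in the same way as the inverse of the Herfindahl. The code below uses the usual convention that procedures with zero cases contribute nothing to the Shannon entropy; the function name and the count vectors (the two hypothetical hospitals with 24 cases each) are illustrative assumptions.

```python
import math


def hill_order_1(counts):
    """Exponential of the Shannon entropy of the observed proportions."""
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return math.exp(entropy)


equal_counts = [4, 4, 4, 4, 4, 4, 0, 0]     # 6 equally frequent procedures
unequal_counts = [7, 7, 7, 1, 1, 1, 0, 0]   # 3 common and 3 rare procedures

d1_equal = hill_order_1(equal_counts)
d1_unequal = hill_order_1(unequal_counts)
print(round(d1_equal, 2), round(d1_unequal, 2))  # 6.0 4.37
```

With equal frequencies the measure equals the number of performed procedures; with unequal frequencies it falls between the inverse of the Herfindahl (3.84) and the 6 performed procedures.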

Continuing with examples to show that the exponential of the Shannon entropy exceeds the inverse of the Herfindahl, we use data from the *j* = 2nd hospital, mentioned earlier, which performed the greatest number of different procedures. The inverse of the Herfindahl index (14.47) was less than the 53 procedures performed at least 3 times, which was less than the exponential of the Shannon entropy, which was itself less than the 86 procedures performed at least 2 times. For the other pediatric hospital in Iowa (*j* = 8), the inverse of the Herfindahl index was less than the 13 procedures observed at least 3 times, which was less than the exponential of the Shannon entropy, which was less than the 25 procedures observed at least 2 times.

We have not routinely used the exponential of the Shannon entropy because its estimates can be sensitive to the data (i.e., unreliable), whereas the Herfindahl (i.e., Simpson) index works well for the perioperative applications (Figs. 1 and 3). For any number of observed surgical cases (i.e., sample size), the mean square error of the (standardized) Shannon entropy can be 2 orders of magnitude greater than that of the Gini-Simpson index.^{21} The sample (maximum likelihood) estimate of the Shannon entropy has large bias when the total number of different procedures in the population, $S$, is comparable to the sample size.^{22–24} We showed previously that for surgical procedures, the estimated total population size $\hat{S}$, including (as necessary) meaningful combinations of procedures, is almost precisely equal to twice the sample size.^{1} The exponential of the Shannon entropy has its minimum SE when all procedures are equally frequent (i.e., uniform distribution),^{22} entirely different from the logarithmic distribution characteristic of counts of procedures (Fig. 2). For example, at the large pediatric hospital, the minimum variance unbiased estimator of the inverse of the Herfindahl (Equation 13) was very similar to the maximum likelihood estimate, $1/\hat{H}_2 = 14.47$, that we used earlier. In contrast, the low bias estimator for the exponential of the Shannon entropy (Equation 18) was very different from its maximum likelihood estimate. For another example, at the small pediatric hospital, the minimum variance unbiased estimator of the inverse of the Herfindahl was very similar to its maximum likelihood estimate. In contrast, the low bias estimator for the exponential of the Shannon entropy was very different from its maximum likelihood estimate. It is not that the Herfindahl (or its inverse) is inherently more valid than the Shannon entropy (or its exponential), because they are measuring different things.^{21} However, when analyzing data actuarially or automating reports by service, reliability is important.^{25},^{26}

### Diversity of Procedures and Providers at Single Hospitals

Over 60 consecutive weeks, 50 anesthesia providers performed 17,902 cases at the *j* = 5th hospital.^{f} Once again analyzing diversity among anesthesia providers rather than procedures, the inverse of the Herfindahl was 39.75 anesthesia providers. The value of 39.75 is less than the number (50) of anesthesia providers, because each provider did not perform an equal number of cases. Specifically, the 38 anesthesia providers performing the most cases each accounted for at least 1.05% of the 17,902 cases and the 39th accounted for 0.91%. Among the 41.9% of cases with at least 1 break or handoff, the inverse of the Herfindahl was 39.11 ± 0.38 anesthesia providers. For the remaining cases in which no break or handoff occurred, the inverse of the Herfindahl was 39.67 ± 0.32 anesthesia providers.^{g},^{27} Thus, because the inverses of the Herfindahl were essentially the same, there was no difference between the diversities of anesthesia providers among cases (1) with breaks and (2) without breaks. We consider more applications to human resources analytics later.

By using cases performed at the *j* = 5th hospital by anesthesia providers, we compared the procedures among cases with breaks versus cases without breaks. There were 2132 procedures observed. The Herfindahl indices were 0.0033 ± 0.0001 for cases with breaks and 0.0089 ± 0.0003 for those without breaks. Both Herfindahl indices were very small. However, the diversity of procedures was significantly greater among cases with breaks than without breaks, *P* < 10^{–6}. The inverses of the Herfindahl indices were 299.3 ± 10.1 procedures for cases with breaks versus 112.6 ± 4.0 procedures for cases without breaks. Thus, cases in which breaks took place were derived from a greater diversity of procedures than were cases in which no break occurred. When the 3 most common procedures were excluded, the Herfindahl indices were now essentially identical between cases with and without breaks: 0.0034 ± 0.0001 and 0.0033 ± 0.0001, respectively. The 3 most common procedures were performed mostly (95.4%) without breaks, likely because those cases were very brief: electroconvulsive therapy (CPT 90870), extracapsular cataract extraction (66984), and follicle puncture for oocyte retrieval (58970).^{h} We consider these data further, below, in the sections “Comparisons of Similarity Indices, Based on Data from Within a Single Hospital” and “Sensitivity Analyses to Interpret an Observed Value of the Similarity Index.” The absolute differences in the Herfindahl indices (0.0033 vs 0.0089) were small because these 3 most common procedures collectively accounted for only 8.28% ± 0.21% of cases.
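Because each Herfindahl index is reported with its SE, a comparison between 2 groups can be sketched with a normal approximation. The text does not state which test produced the reported *P* value, so the following is only an illustration that assumes independent groups and approximately normal estimates; the function name is hypothetical, and the inputs are the rounded estimates quoted above.

```python
import math


def compare_indices(est1, se1, est2, se2):
    """Two-sided z-test for the difference of 2 independent estimates."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Herfindahl indices without breaks vs with breaks, as quoted in the text.
z, p = compare_indices(0.0089, 0.0003, 0.0033, 0.0001)
print(round(z, 1), p < 1e-6)
```

With these inputs the difference is many standard errors from zero, consistent with the very small *P* value reported.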

### Diversity Assessed with Weighting for Differences Among Procedures and Providers

As explained in the section “Effective Number of Common Procedures,” when all procedures are equally prevalent, the inverse of the Herfindahl (i.e., Hill number of order 2) equals the number of procedures. Although this relationship seems intuitively reasonable,^{20} the count of individual procedures does not take into account the additional information from knowing the clinical characteristics of the procedures.^{28} For example, among the 7315 cases at the *j* = 5th hospital and with at least 1 break (or relief), the 3 most common procedures were total knee arthroplasty (2.4%), total hip arthroplasty (2.2%), and laparoscopic cholecystectomy (1.2%). For the preceding analyses, the 2 arthroplasty procedures and laparoscopic cholecystectomy would be counted as 3 distinct procedures even though the 2 arthroplasty procedures are similar.

Using that proportions sum to 1, the Gini-Simpson index (e.g., as shown in Fig. 3) can be rewritten:

$$1 - H_j = 1 - \sum_{i=1}^{S} p_{ij}^2 = \sum_{i=1}^{S} \sum_{k \ne i} p_{ij}\, p_{kj}. \tag{3}$$

Rao’s quadratic entropy equals^{5},^{29}:

$$Q_j = \sum_{i=1}^{S} \sum_{k \ne i} d_{ik}\, p_{ij}\, p_{kj}, \tag{4}$$

where $d_{ik}$ is the conceptual (unitless) difference between the 2 procedures, $0 \le d_{ik} \le 1$. Two procedures that are nearly identical have a difference $d_{ik} \approx 0$, whereas those of different specialties and anatomic region would have a difference $d_{ik} \approx 1$.^{29} There is a corresponding minimum variance unbiased estimator.^{5} Furthermore, there is a corresponding unbiased estimator for the effective number of common procedures that results from this measure (Equation 4).^{5} However, we have not used the weighted measure in our perioperative studies, for several reasons.

First, for strategic analyses, we have found presentation of results to nonscientific audiences to be important.^{6},^{25},^{26} In this context, sensitivity of calculated results to the method of measuring the differences between procedures has seemed a limitation, because it involves medical understanding often lacking among stakeholders. For example, we often use the Clinical Classifications Software to describe subspecialty,^{26},^{30–32} in part, because it is available for *International Classification of Diseases, Ninth Revision, Clinical Modification*, *International Classification of Diseases, Tenth Revision, Clinical Modification*, and CPT.^{i} Both donor pneumonectomy and sleeve pneumonectomy are procedures in the Clinical Classifications Software of “lobectomy or pneumonectomy.” However, there are major anesthetic and surgical differences between these procedures. Likewise, there are major anesthetic and surgical differences between partial nephrectomy versus radical nephrectomy with vena caval thrombectomy. To the extent that both donor pneumonectomy and partial nephrectomy have 7 American Society of Anesthesiologists Relative Value Guide Base Units^{j} (i.e., are not physiologically complex),^{6},^{25},^{30},^{33–35} these 2 procedures may actually be less different from one another than the other 2 examples of pairs of procedures from the same specialty. We have instead been addressing medical issues by performing sensitivity analyses, focusing on individual common procedures.^{6},^{25},^{26},^{30–32},^{35}

Second, conclusions, as exemplified by all our results mentioned earlier, seem unchanged. For example, consider the comparison of the large pediatric hospital (*j* = 2) and the hospital performing the greatest total number of cases among young children (*j* = 3) (Fig. 3). Because the latter hospital performed almost exclusively (99%) outpatient pediatric otolaryngology, whether the functional similarity among these procedures was based on specialty or American Society of Anesthesiologists base units, the diversity for this *j* = 3rd hospital would be less when adjusted for the functional similarity of the procedures. Consequently, the greater diversity of the large pediatric hospital (*j* = 2) would be emphasized further. This seems unnecessary from Figure 3. In other words, the analyses of the diversities of procedures are not seeking subtle differences among hospitals or groups but quantitative ways of summarizing substantial categorical data.
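To make the weighted measure discussed in this section concrete, the sketch below computes Rao's quadratic entropy as the sum, over ordered pairs of distinct procedures, of the pairwise difference times the product of the 2 proportions. The proportions and the difference matrix are hypothetical (3 procedures, with the first 2 treated as nearly identical); they are not values from the studies cited.

```python
def rao_quadratic_entropy(proportions, difference):
    """Sum of d[i][k] * p[i] * p[k] over all ordered pairs with i != k."""
    s = len(proportions)
    return sum(
        difference[i][k] * proportions[i] * proportions[k]
        for i in range(s)
        for k in range(s)
        if i != k
    )


# Hypothetical proportions for knee arthroplasty, hip arthroplasty,
# and laparoscopic cholecystectomy (assumed values, summing to 1).
p = [0.4, 0.4, 0.2]

# All procedures maximally different: reduces to the Gini-Simpson index.
all_different = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# The 2 arthroplasties assumed to be nearly identical (difference 0.1).
arthroplasties_similar = [[0, 0.1, 1], [0.1, 0, 1], [1, 1, 0]]

q_unweighted = rao_quadratic_entropy(p, all_different)
q_weighted = rao_quadratic_entropy(p, arthroplasties_similar)
print(round(q_unweighted, 3), round(q_weighted, 3))  # 0.64 0.352
```

Treating similar procedures as close lowers the measured diversity, which is the sensitivity to the choice of differences that the section describes as a limitation.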

### Limitations to Quantifying Diversity: Example of Case Duration Prediction

Although comparisons of diversity among hospitals have strategic value (Figs. 1 and 3),^{6},^{25},^{26},^{35},^{36} a limitation to quantifying diversity of procedures within single hospitals has been that, once appreciated, other statistical methods then are used for administrative decision making. For example, the initial application of diversity measures for perioperative management was case duration prediction.^{1},^{2},^{15},^{37–46} Twenty percent (SE 1%) of outpatient surgery cases performed in the United States between 1994 and 1996 were of a procedure that was performed annually 1000 times or less nationwide.^{2} Nearly half of cases at a comprehensive hospital were procedures scheduled by the surgeon <9 times in 3 years.^{47} Rare procedures account for most of the uncertainty in case duration-associated decisions,^{16} in part, because many decisions depend on the time to complete a list of cases in an operating room on a day. Even though only 4% ± 1% of endoscopy cases were rare, they were distributed among 13% ± 3% of lists.^{14} Nearly half (49% ± 0.4%) of lists of cases at a comprehensive hospital had at least 1 case performed only once by the surgeon in 3 years.^{48} For decisions related to the longest amount of time that cases take,^{46},^{47},^{49–52} rare procedures are even more consequential,^{39},^{46} because more data are needed to estimate SDs than means. Pooling data among surgeons helps little, because rare procedures tend to be rare for all surgeons,^{1} and surgeons differ in how quickly they operate^{37},^{53–56} (e.g., because of different surgical approaches and methods).^{57} Thus, the diversity of procedures matters a lot for case duration prediction. 
However, knowing this about diversity, the problem then can be bypassed statistically by relying on the surgeon’s scheduled duration as an “expert judgment”^{16},^{56} predicting the median (or mean) duration of the rare (or common) procedure.^{39},^{44},^{54},^{58} The process variability around that prediction (i.e., the coefficient of variation) is estimated using data from similar procedures.^{16},^{39},^{44},^{46} Such methods are remarkably accurate.^{39–42},^{45},^{46}

## SIMILARITY INDEX

### Descriptions of Similarity Indices

The similarity measure that we have used in previous studies is that from Yue and Clayton^{7} because of its natural probabilistic interpretation for surgical procedures.^{6},^{26},^{30} Consider 2 hospitals. Suppose that the CPT manual contains $S$ different procedures. Select 1 procedure at random from the CPT manual (e.g., 69421 myringotomy tube “requiring general anesthesia”). Let event A be that the procedure is 1 of the $S_1$ procedures observed at the *j* = 1st hospital. Then $P(A) = S_1/S$. Let event B be that the procedure is one of the $S_2$ procedures observed at the *j* = 2nd hospital. Then $P(B) = S_2/S$. Let $P(A \cap B \mid A \cup B)$ represent the probability that the randomly selected procedure is one of the $S_{12}$ procedures present at both facilities, given that it is present at least at 1 of the 2 hospitals.^{59} By definition, $S_{12}$ would differ among pairs of hospitals, but always $S_{12} \le S_1$ and $S_{12} \le S_2$. Using set (Venn diagram) notation, a reasonable similarity measure (that we do not use routinely, however) would be:

$$P(A \cap B \mid A \cup B) = \frac{P(A \cap B)}{P(A \cup B)}, \tag{5}$$

where the $\cap$ means intersection (i.e., present in both and thus shared), and the $\cup$ means union (i.e., present in either). The numerator equals $P(A)\,P(B)$ when A and B are independent.

To apply Equation 5 to the *j* = 1st and *j* = 2nd hospitals^{5},^{7},^{59}:

We estimate the similarity measure in Equation 6 by replacing the proportions $p_{i1}$ and $p_{i2}$ with their sample estimates. To interpret Equation 6, and by analogy Equation 5, suppose that all procedures performed at the first hospital were done equally often at the hospital and that the same is true for the second hospital too.^{7},^{59} In other words, $p_{i1} = 1/S_1$ among the $S_1$ procedures performed at the first hospital, and $p_{i2} = 1/S_2$ among the $S_2$ procedures performed at the second hospital. Substituting these expressions into Equation 6, and multiplying the numerator and denominator^{7} by

:

$$\theta_J = \frac{S_{12}}{S_1 + S_2 - S_{12}}. \tag{7}$$

The $\theta_J$ is used to represent Jaccard, as this is the Jaccard index. It equals the ratio of the number of shared procedures to the total number of different procedures performed at either hospital.^{5},^{59} Although the measures in Equations 5 and 6 are used often in ecology, and the Jaccard index makes great sense, below we show for our applications that these indices have limitations. (Note that nowhere in this Grand Rounds do we use the phrase “in common”; we instead use “shared,” so that “common” can be used in terms of frequency [i.e., “common procedures”].)
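As a small illustration of the Jaccard index, the sketch below counts shared procedures and divides by the number of different procedures performed at either hospital. The function name and the 2 procedure lists are hypothetical.

```python
def jaccard(procedures_1, procedures_2):
    """Shared procedures divided by procedures present at either hospital."""
    a, b = set(procedures_1), set(procedures_2)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for 2 empty lists
    return len(a & b) / len(union)


# Hypothetical lists of distinct procedures observed at 2 hospitals.
hospital_1 = ["myringotomy", "adenoidectomy", "hernia repair"]
hospital_2 = ["myringotomy", "adenoidectomy", "cataract", "circumcision"]

theta_j = jaccard(hospital_1, hospital_2)
print(theta_j)  # 0.4
```

The index ignores how often each procedure is performed, so a procedure done once counts as much as one done hundreds of times; that is the limitation the frequency-based index addresses.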

Dexter et al.^{6} and Yue and Clayton used^{7}:

$$\theta = \frac{\sum_{i=1}^{S} p_{i1}\, p_{i2}}{\sum_{i=1}^{S} p_{i1}^2 + \sum_{i=1}^{S} p_{i2}^2 - \sum_{i=1}^{S} p_{i1}\, p_{i2}}. \tag{8}$$

To interpret the numerator of Equation (8), we again use urns, just like for the interpretation of the Herfindahl. Envision 2 urns, 1 representing each hospital. In each urn are balls, representing all the various cases. Each ball is labeled with the case’s procedure. If a procedure was performed 5 times, then there are 5 balls in the urn that are labeled with that procedure. Shake the urns. Draw 1 ball from each of the 2 urns. The numerator is the probability that the procedure labeled on the first ball is the same as that labeled on the second ball. The denominator normalizes the range to be from 0.0 (when there is no overlap of the procedures) to 1.0 for $p_{i1} = p_{i2}$, for all $i$.

Suppose now that, as immediately above, all procedures that are performed at each facility are done equally often: $p_{i1} = 1/S_1$ for the $S_1$ procedures performed at the first facility, and, similarly, $p_{i2} = 1/S_2$ for the $S_2$ procedures performed at the second facility. Substituting into Equation 8, and multiplying the numerator and denominator by $S_1 S_2$ gives $S_{12}/(S_1 + S_2 - S_{12})$, as desired (i.e., for this special case, $\theta = \theta_J$).
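The substitution can be written out step by step. This derivation assumes that Equation 8 has the Yue and Clayton form, with the sum of products of the 2 hospitals' proportions in the numerator and the 2 Herfindahl indices minus that same sum in the denominator, and it writes $S_1$, $S_2$, and $S_{12}$ for the numbers of procedures at the first hospital, at the second hospital, and shared by both:

$$
\begin{aligned}
\sum_{i} p_{i1}^2 &= S_1 \left(\frac{1}{S_1}\right)^2 = \frac{1}{S_1}, \qquad
\sum_{i} p_{i2}^2 = \frac{1}{S_2}, \qquad
\sum_{i} p_{i1}\, p_{i2} = \frac{S_{12}}{S_1 S_2}, \\[4pt]
\theta &= \frac{S_{12}/(S_1 S_2)}{1/S_1 + 1/S_2 - S_{12}/(S_1 S_2)}
        = \frac{S_{12}}{S_2 + S_1 - S_{12}}.
\end{aligned}
$$

Only the $S_{12}$ shared procedures contribute to the sum of products, because for every other procedure at least 1 of the 2 proportions is zero.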

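The nonparametric maximum likelihood estimator for $\theta$ is obtained by replacing the true proportions $p_{ij}$ in Equation 8 with the observed proportions $\hat{p}_{ij}$ that are defined in Equation 10:

$$\hat{\theta} = \frac{\sum_{i=1}^{S} \hat{p}_{i1}\, \hat{p}_{i2}}{\sum_{i=1}^{S} \hat{p}_{i1}^2 + \sum_{i=1}^{S} \hat{p}_{i2}^2 - \sum_{i=1}^{S} \hat{p}_{i1}\, \hat{p}_{i2}}. \tag{9}$$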
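A minimal sketch of this plug-in estimator follows, assuming the Yue and Clayton form of the index and taking one procedure label per case for each hospital. The function name and the example data are hypothetical, and both case lists are assumed to be nonempty.

```python
from collections import Counter


def similarity_theta(cases_1, cases_2):
    """Plug-in estimate of the similarity index from 2 lists of case labels."""
    c1, c2 = Counter(cases_1), Counter(cases_2)
    n1, n2 = sum(c1.values()), sum(c2.values())
    p1 = {proc: count / n1 for proc, count in c1.items()}
    p2 = {proc: count / n2 for proc, count in c2.items()}
    # Probability that 1 case drawn from each hospital is the same procedure.
    cross = sum(p * p2.get(proc, 0.0) for proc, p in p1.items())
    h1 = sum(p * p for p in p1.values())
    h2 = sum(p * p for p in p2.values())
    return cross / (h1 + h2 - cross)


# Equal-frequency example: 3 and 4 procedures, 2 of them shared.
theta_equal = similarity_theta(["A", "B", "C"], ["B", "C", "D", "E"])
theta_same = similarity_theta(["A", "A", "B"], ["A", "A", "B"])
theta_none = similarity_theta(["A", "B"], ["C", "D"])
print(round(theta_equal, 3), round(theta_same, 3), theta_none)
```

In the equal-frequency example the estimate equals the Jaccard index, 2/(3 + 4 − 2) = 0.4, as the special case in the text requires.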

### Characteristics of the Similarity Index θ Based on Previous Work with Statewide Data

The SE of the similarity index $\hat{\theta}$ can be calculated asymptotically by using Equation 20 in the Appendix. The estimated SEs are similar (within 0.001) to those calculated using bootstrapping.^{6},^{7} For example, we calculated the similarity of procedures performed between 2 pediatric hospitals in a certain state (*j* = 2 and *j* = 8).^{6},^{b} The 2 SEs were 0.055 vs 0.052, differing by just 0.003. When we compared the similarity of the smaller pediatric hospital in the state (*j* = 8) to the hospital performing the most pediatric cases (*j* = 3),^{6} the 2 SEs were 0.040 vs 0.041, differing by just 0.001.
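The bootstrap SE mentioned above can be sketched as follows. This is not the asymptotic Equation 20; it is a plain nonparametric bootstrap that resamples cases with replacement within each hospital, under the assumption that the index has the Yue and Clayton form. The function names, the number of replicates, the seed, and the data are all hypothetical.

```python
import random
from collections import Counter


def theta(cases_1, cases_2):
    """Plug-in similarity index from 2 lists of case labels."""
    c1, c2 = Counter(cases_1), Counter(cases_2)
    n1, n2 = len(cases_1), len(cases_2)
    cross = sum((v / n1) * (c2.get(k, 0) / n2) for k, v in c1.items())
    h1 = sum((v / n1) ** 2 for v in c1.values())
    h2 = sum((v / n2) ** 2 for v in c2.values())
    return cross / (h1 + h2 - cross)


def bootstrap_se(cases_1, cases_2, replicates=500, seed=0):
    """SD of the index over bootstrap resamples of each hospital's cases."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        resample_1 = rng.choices(cases_1, k=len(cases_1))
        resample_2 = rng.choices(cases_2, k=len(cases_2))
        estimates.append(theta(resample_1, resample_2))
    mean = sum(estimates) / replicates
    variance = sum((e - mean) ** 2 for e in estimates) / (replicates - 1)
    return variance ** 0.5


# Hypothetical case lists for 2 hospitals, 50 cases each.
cases_1 = ["A"] * 30 + ["B"] * 15 + ["C"] * 5
cases_2 = ["A"] * 10 + ["B"] * 25 + ["D"] * 15
estimate = theta(cases_1, cases_2)
se = bootstrap_se(cases_1, cases_2)
print(round(estimate, 3), se > 0)
```

Fixing the seed makes the sketch reproducible; in practice the number of replicates would be chosen so that the SE is stable to the reported precision.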

For purposes of comparing procedures between groups, values of the similarity index $\theta$ that are relatively large (≥0.8) are known, in part, from analyses of state discharge abstract data.^{6},^{26} For example, a hospital’s (*j* = 9) anesthesia department was concerned about patients leaving its small community to undergo surgery in a nearby large metropolitan area. The hospital’s similarity was compared with 134 other hospitals in the state.^{26} Figure 4 shows the 134 pairwise comparisons with the *j* = 9th hospital.^{26} The hospital to which it had the greatest similarity (and hence competition) was the one other hospital in its community. Thus, the principal competition was not the collective effect of many distant hospitals, as presumed, but was, instead, the unrecognized result of the surgeons having privileges at both of the local (similar) hospitals.^{26}

Another example provided insight into what values of the similarity index

are small (<0.3).^{26} At a large hospital (

), leadership perceived that it was competing for patients undergoing the same procedures with the small local community hospital (

).^{26} However, the 2 hospitals were quite dissimilar:

. The 2 hospitals had evolved different practice mixes with different procedures^{26} (i.e., were not competing for patients undergoing the same procedures).^{60}

Data from the

hospital were used to evaluate the sensitivity of the similarity index to rare procedures (i.e., those for which observed frequencies may underestimate true frequencies in the population).^{26} The hospital was compared pairwise with each of the 48 comparably sized hospitals in its state.^{26} For each pair of hospitals, omission of the most common procedure that both hospitals shared produced the greatest decrease in the similarity index for all hospitals.^{26} Omission of each of the 3 most common procedures produced the greatest changes in the similarity index for three quarters of the 48 comparisons.^{26} In contrast, omission of each of the 3 least common procedures produced no changes in the similarity index to 4 decimal places. Thus, procedures that are both common and shared between hospitals influence the similarity index

, whereas rare procedures have practically no influence.^{26} These findings guide the interpretation of the results in the next section, which uses different types of data, all from within a single hospital.

### Comparisons of Similarity Indices, Based on Data from Within a Single Hospital

Comparison was made between anesthesia providers’ cases in which there was or was not at least 1 break or handoff (Fig. 5). With 60 weeks of data from hospital

,

procedures were classified based on the primary surgical CPT codes. We compared the procedures of the 7315 cases that had at least 1 break (or relief) versus the procedures of the 10,141 cases that did not have a break or relief (i.e., the entire case took place with 1 anesthesia provider). Logically, cases of brief durations were more likely to have 1 anesthesia provider (i.e., be among the 10,141 cases without a break). The most common procedure overall was electroconvulsive therapy,^{h} accounting for just 0.2% of cases with breaks versus 5.5% of those without breaks. The second most common procedure in Figure 5 was extracapsular cataract removal with insertion of intraocular lens prosthesis. Again, this is a brief procedure.^{h} Consequently, the procedure accounted for 0.6% of cases with breaks and 4.6% of those without breaks. The same relationship applied to the third most common procedure, follicle puncture for oocyte retrieval, as well as the fourth, fifth, and sixth procedures. The seventh most common procedure (laparoscopic cholecystectomy) and all others each accounted for <1.0% of all cases. Given that the 6 most common procedures had substantial heterogeneity between cases with and without at least 1 break (e.g., because of anesthetic durations), there was small^{26} (<0.3) similarity of procedures between cases with breaks versus without breaks:

(Fig. 5).

Our focus in this section of the article is the comparison of similarity indices.^{k},^{30},^{61} The small^{26} similarity value of

seems reasonable looking at Figure 5. In contrast,

, a much greater value that suggests large^{26},^{60} similarity. Yue and Clayton^{7} previously showed the cause for this disparity.

gives unusually large values when shared procedures range in frequencies from very common to rare, which is precisely what holds for surgical procedures.^{6},^{25},^{35} This is one reason why we have been using

rather than

to assess similarity.^{26},^{30}

Chao et al.^{62} developed a correction to the

for procedures performed 0 or 1 times in each or both of the groups compared (e.g., hospitals as for Fig. 4 or cases with versus without breaks as for Fig. 5). See Equation 22 in the Appendix. Chao et al.^{62} found that their “adjusted estimate [was] always higher than the corresponding unadjusted one because of the presence in sample pairs of observed, shared, rare species.” Because there were many rare surgical procedures at the

hospital,^{25},^{35} with the adjustment, the resulting estimate was greater: 0.99 vs

. As described earlier, the adjusted estimate is even less relevant for our specific problem, because the 2 groups of cases were not a good quantitative match (see preceding paragraph, Figure 5, and the section, above, Diversity of Procedures and Providers at Single Hospitals).

Just as for

, the Jaccard index (Equation 7) can overestimate the desired similarity, because both common and very rare procedures are treated equally, even though, for our perioperative applications, their abundances (frequencies) may differ >100-fold (Fig. 2). For the preceding data,

. That is not quite as large as

but still substantially greater than the more appropriate small^{26} value of

.

The other problem with the Jaccard index is what to do when all procedures are present in both groups, as then

and

. For example, among the cases performed by anesthesia providers, and with at least 1 break (or relief), the first (or only) break occurred before or within 12 minutes of the start of the surgical procedure (“early”) for 39.9% of the cases versus “late” for the remaining 59.1% of cases. The rationale for evaluating 12 minutes or less was from our previous study investigating when documentation of anesthetic events (e.g., of intubation and of drug administration) was complete.^{61},^{l} For this example, each individual anesthesia provider was used in lieu of the specific procedure (Figs. 4 and 5). There were 50 anesthesia providers working at the hospital during the studied period. As shown in Figure 6, the similarity among anesthesia providers was large^{26},^{30}:

. In other words, among cases with a break, there were only small differences among anesthesia providers in the timing of the break, early versus late. However, as shown by Figure 6, the relationship was not exact, unlike what is expressed by

. Thus, the Jaccard index is not useful for this application because whether every anesthesia provider has at least 1 early break is essentially irrelevant.
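The contrast between a presence/absence index and a frequency-based index can be illustrated with a small sketch. The counts below are hypothetical, not the study data, and the form assumed for the similarity index is the same as in Equation 8 (cross-products divided by the sum of squares minus the cross-products). Every category is present in both groups, so the presence/absence index equals 1 even though the two distributions differ.

```python
def jaccard_index(counts_a, counts_b):
    """Shared categories divided by categories present in either group."""
    present_a = {k for k, v in counts_a.items() if v > 0}
    present_b = {k for k, v in counts_b.items() if v > 0}
    return len(present_a & present_b) / len(present_a | present_b)


def similarity_index(counts_a, counts_b):
    """Plug-in similarity index based on observed proportions."""
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    cross = sum_a2 = sum_b2 = 0.0
    for category in set(counts_a) | set(counts_b):
        p = counts_a.get(category, 0) / n_a
        q = counts_b.get(category, 0) / n_b
        cross += p * q
        sum_a2 += p * p
        sum_b2 += q * q
    return cross / (sum_a2 + sum_b2 - cross)


# Hypothetical counts of cases with early and late breaks per provider.
early = {"provider_1": 40, "provider_2": 10, "provider_3": 5}
late = {"provider_1": 20, "provider_2": 30, "provider_3": 25}

print(jaccard_index(early, late))     # 1.0: every provider is in both groups
print(similarity_index(early, late))  # below 1: the proportions differ
```

The presence/absence index carries no information once every category appears in both groups, whereas the frequency-based index still reflects how closely the proportions agree.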

Yue and Clayton^{7} developed an unbiased estimator of

, which is an alternative to the nonparametric maximum likelihood estimator of Equation 9. See Equation 23 in the Appendix. We use the maximum likelihood estimator, because there is no corresponding analytical value for the SE of the unbiased estimator, and the differences are minor. Comparing the procedures of cases with and without a break, the

. The corresponding unbiased estimate was 0.26. Comparing anesthesia providers with breaks before or within 12 minutes of the start of the surgical procedure versus afterward, the

. The corresponding unbiased estimate was 0.95. See the Appendix for an explanation of why the unbiased estimates are greater.

All the measures of similarity that we considered as alternatives to

resulted in greater estimates than those of

for the above applications (Equation 9). Although we do not routinely use these alternative measures for the reasons described earlier, we remain cognizant that

may underestimate similarity. Furthermore, we emphasize that the 2 other indices have advantages for ecologic settings. Our groups shown were hospitals (Fig. 4), presence of breaks (handoffs) (Fig. 5), and timing of breaks (handoffs; Fig. 6). In contrast, when a group is typically a geographic location where species of animals are counted, what species will be present is often unknown ahead of time. Yet, for our problems, we know every anesthesia provider who works at the hospital. In addition, if 1 or 2 of the anesthesia providers were handed a CPT manual, they likely could identify with only minimal error what procedures are performed at the hospital, and the few procedures that are the most common. However, if asked for the specific percentages of the cases with breaks that are accounted for by each of the 3 most common procedures, the estimates probably would be inaccurate (see Diversity Assessed with Weighting for Differences Among Procedures and Providers). Our interests in similarity are the quantitative relationships because operational activities, management monitoring, and hospital competition depend on the numbers.

### Sensitivity Analyses to Interpret an Observed Value of the Similarity Index θ

What matters for managerial decision making is the magnitude of similarity and investigation of outliers.^{26} As explained earlier, for comparisons of hospitals, omission of each of the 3 most common procedures produced the largest changes in the similarity index for three quarters of the 48 comparisons, and omitting the least common procedures had no measurable effect.^{26} Applying this approach to the study, above, of similarity in the distributions among anesthesia providers of cases with breaks before or within 12 minutes of the start of the surgical procedure versus afterward, exclusion of the 3 anesthesia providers who received the fewest breaks resulted in the

, a negligible change of 0.0001. They were added back. Next, we considered the 3 anesthesia providers who received the largest number of breaks. They are the 3 (large) outliers to the right in Figure 6, shown within the dotted-line square. Excluding each of the 3 in sequence, from the largest to the third largest total number of cases with breaks, resulted in

changing to

,

, and

, respectively. Excluding all 3 outlier anesthesia providers increased the similarity to

. Repeating the analysis of all 50 anesthesia providers, but using sums of anesthesia minutes for the cases with breaks instead of counts of cases with breaks,

(i.e., no difference from the sensitivity analysis). For our other analysis using procedures (Fig. 5),

. Three procedures accounted for the most cases: electroconvulsive therapy (0.2%, 5.5%), extracapsular cataract removal with insertion of intraocular lens prosthesis (0.6%, 4.6%), and follicle puncture for oocyte retrieval (0.1%, 3.5%). When each was excluded, the

,

, and

, respectively. When all 3 were excluded,

. Repeating the analysis with all procedures, but using sums of anesthesia minutes rather than cases,

. This is shown in Figure 7, but again is no different from the sensitivity analysis (i.e., we do not consider the analysis by count of cases to be a major limitation because sensitivity analyses are performed).
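The omit-one sensitivity analysis described in this section can be sketched as follows. The function names and the example counts are hypothetical, and the similarity index is assumed to have the form of Equation 8 computed from observed proportions.

```python
def similarity_index(counts_a, counts_b):
    """Plug-in similarity index from two dicts mapping category -> count."""
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    cross = sum_a2 = sum_b2 = 0.0
    for category in set(counts_a) | set(counts_b):
        p = counts_a.get(category, 0) / n_a
        q = counts_b.get(category, 0) / n_b
        cross += p * q
        sum_a2 += p * p
        sum_b2 += q * q
    return cross / (sum_a2 + sum_b2 - cross)


def omit_one_changes(counts_a, counts_b, categories):
    """Change in the index when each listed category is excluded in turn.

    Each category is removed from both groups, the index is recomputed from
    the remaining cases, and the category is then added back.
    """
    baseline = similarity_index(counts_a, counts_b)
    changes = {}
    for category in categories:
        reduced_a = {k: v for k, v in counts_a.items() if k != category}
        reduced_b = {k: v for k, v in counts_b.items() if k != category}
        changes[category] = similarity_index(reduced_a, reduced_b) - baseline
    return changes


# Hypothetical counts: X and Y are common, Z and W are rare.
group_1 = {"X": 50, "Y": 30, "Z": 1}
group_2 = {"X": 10, "Y": 60, "W": 1}
changes = omit_one_changes(group_1, group_2, ["X", "Z"])
```

In this hypothetical example, omitting the common category X changes the index by much more than omitting the rare category Z, which is the pattern the sensitivity analyses rely on.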

## DISCUSSION

### Evaluating Care at Handoffs

Potentially, an anesthesia provider’s quality of setup, documentation, etc., could be evaluated by another anesthesia provider who provides a break or handoff. However, anesthesia setup, documentation, etc., depend on the procedure. Furthermore, the incidence of breaks differs among procedures, whether analyzed by case (

, Fig. 5) or by hour (

, Fig. 7). Thus, when assessing the provider’s quality of setup, etc., the procedure needs to be included in the statistical analysis. As shown by Figures 1 and 3, what makes a comprehensive hospital comprehensive is the substantial diversity of procedures performed, especially among physiologically complex procedures.^{6},^{18} Consequently, at such hospitals, a sample size sufficient to estimate the anesthesia provider effect (i.e., the provider’s quality of setup, etc.) cannot be achieved while controlling for the procedure. These observations may be useful, because 2 institutions found that each increase in the number of handoffs was associated with greater morbidity.^{63},^{64}

## APPENDIX

### Estimates for Herfindahl Index

Let

refer to the number of cases performed at the

facility during the observation period (i.e., the sample size). Let

specify the procedure of the

case at the

facility,

. Let

refer to the

procedure,

, where

is the total number of procedures or combinations of procedures (e.g., from the corresponding dictionary). Finally, let the indicator

equal 1 if the value in the expression is true, and 0 otherwise. Then, the observed proportion of cases at the

facility that are of the

procedure is as follows:

In practice, data come in the form of

and the array (vector) operations of Equation 10 need to be performed.^{26}
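The indicator-based definition of the observed proportions amounts to counting cases per procedure and dividing by the number of cases. A minimal sketch, assuming the input is a list with one procedure label per case (the article does not specify a data layout, so this input format and the function name are assumptions):

```python
from collections import Counter


def observed_proportions(case_procedures):
    """Observed proportion of cases for each procedure at one facility.

    case_procedures: iterable with one procedure label per case.
    Returns a dict mapping procedure -> proportion of cases.
    """
    cases = list(case_procedures)
    n = len(cases)  # sample size for the facility
    if n == 0:
        raise ValueError("at least one case is required")
    counts = Counter(cases)  # sums the indicator over cases, per procedure
    return {procedure: count / n for procedure, count in counts.items()}
```

Procedures in the dictionary that were not observed at the facility simply do not appear in the result, which is equivalent to a proportion of 0.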

Using the first-order Taylor series expansion (i.e., Delta method),^{m} the SE for the maximum likelihood estimate for

(Equation 2) equals^{6},^{7},^{65}:

From Appendix B of the study by Taplin,^{8} a more accurate estimate from using higher order terms equals:

where:

and

However, in the section Herfindahl, Gini-Simpson Index, and their SEs, we show that, for our applications, the differences between SEs calculated using Equations 11 and 12 are in the fourth or fifth decimal places (i.e., negligible). Using the first-order Taylor series expansion,^{m} the SE of the inverse of the Herfindahl is approximately equal to Equation 11 or 12 divided by

.
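As a sketch of the first-order (Delta method) calculation, the standard multinomial result for the variance of the sum of squared proportions is (4/n) × [Σp³ − (Σp²)²]. Whether this matches Equation 11 term by term is an assumption, and the function name is hypothetical. The SE of the inverse follows from the same expansion by dividing by the square of the Herfindahl.

```python
import math


def herfindahl_with_se(counts):
    """Herfindahl index, its first-order SE, and the SE of its inverse.

    counts: dict mapping category -> number of cases.
    Uses Var(H) ~ (4/n) * (sum(p^3) - (sum(p^2))^2) for a multinomial sample.
    """
    n = sum(counts.values())
    if n == 0:
        raise ValueError("at least one case is required")
    proportions = [c / n for c in counts.values()]
    h = sum(p ** 2 for p in proportions)
    sum_cubes = sum(p ** 3 for p in proportions)
    # Guard against a tiny negative value caused by floating-point rounding.
    variance = max(0.0, 4.0 / n * (sum_cubes - h ** 2))
    se = math.sqrt(variance)
    se_inverse = se / h ** 2  # Delta method for 1/H: |d(1/H)/dH| = 1/H^2
    return h, se, se_inverse
```

When all categories are equally frequent, the first-order SE is 0, which is one situation in which higher order terms can matter.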

The minimum variance unbiased estimator for the Herfindahl equals^{5},^{65}:

The corresponding estimate of the diversity of order 2

The measures of diversity of order

(i.e., “Hill numbers”) are given by^{5},^{66}:

To obtain the measure of diversity of order

, start with the Shannon entropy of the procedures at the

hospital^{5}:

The summation is from

to

(number of procedures observed at the

hospital) and not to

(total number of procedures such as from a dictionary) to avoid the

. The diversity of order 1 is

A low bias estimator for

is given by Gotelli and Chao’s Equation 25b as follows^{5}:
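For orientation, the plug-in (maximum likelihood) diversity of order q described earlier in this subsection can be sketched in Python. This is not the low-bias estimator of Gotelli and Chao; it is the direct calculation from observed proportions, with the Shannon-based value used for order 1. The function name is hypothetical.

```python
import math


def hill_number(proportions, q):
    """Plug-in diversity of order q (Hill number) from observed proportions.

    Only categories with positive proportion are included, which avoids
    evaluating 0 * log(0) in the order 1 (Shannon) case.
    """
    observed = [p for p in proportions if p > 0]
    if not observed:
        raise ValueError("at least one positive proportion is required")
    if q == 1:
        shannon = -sum(p * math.log(p) for p in observed)
        return math.exp(shannon)
    return sum(p ** q for p in observed) ** (1.0 / (1.0 - q))
```

Order 0 gives the number of observed categories, order 1 the exponential of the Shannon entropy, and order 2 the inverse of the Herfindahl. For equally frequent categories, all orders return the number of categories.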

### Estimate for the Similarity Index and its SE

Repeating Equation 8, from above,

The minimum possible value of

is obtained when the numerator equals 0, when no procedures are present at both facilities (i.e.,

). The maximum of

is obtained when

for all

. To understand why no greater value of

can be obtained, start with

. Expanding terms implies that the difference between the denominator and the numerator of Equation 8 is ≥0, implying that the numerator is no greater than the denominator.

From Comparisons of Similarity Indices, Based on Data from Within a Single Hospital, there were comparisons of

vs

and

vs

. However, one should not draw the conclusion that the relationships among

,

, and

are necessarily

and

. Such does indeed hold for

and

. However, consider

and

. Then,

and

.

(Equation 9) is asymptotically normally distributed.^{7} Let,^{6},^{7}

and

These are the nonparametric maximum likelihood estimators for the Herfindahl index of the 1st and 2nd hospitals. In addition, let

Then, the nonparametric maximum likelihood estimator for

is

. The estimate of its SE is the square root of its asymptotic variance^{6},^{7}:

where^{6},^{7}:

### Chao et al.’s Correction of _{n} for Rare Procedures (that is, Species)

We rewrite Equation 6 as follows:

By using the notation from Chao et al.,^{62} let

“be the observed number of shared species” (e.g., procedures) “that occur” precisely “once” in the first group. “These species must be present in the” second group, “but may have any frequency.”^{62} “Let

be the observed number of shared species that occur twice in” the first group.^{62} Similarly, “define

and

to be the observed number of shared species that occur, respectively, once” and twice in the second group.^{62} “When

or

,” replace “

and

by

and

, respectively.”^{62} Doing so was necessary for our example with 50 anesthesia providers, because the minimum number of cases among the 50 anesthesia providers was 8 for the first group and 13 for the second (i.e.,

). Then, substitute the following into Equation 22 as follows^{62}:

and

If the value of

or

is > 1, which can happen, set it equal to 1.^{62}

### Yue and Clayton’s Unbiased Estimator of Similarity (Equation 8)

Yue and Clayton’s^{7} nonparametric maximum likelihood estimator

in Equation 9 is consistent but biased. The (small) bias arises because the estimate in Equation 10,

is a biased estimate of

. The unbiased sample variance is obtained by multiplying by

.^{n} Yue and Clayton^{7} suggest the unbiased estimate:

The same substitution is made for

. The adjusted estimates are used in Equation 8. However, Yue and Clayton^{7} do not provide an analytical expression for the SE of the resulting adjusted estimate of

. For our data, the unbiased estimator of similarity is greater than the corresponding nonparametric maximum likelihood estimator of

. The

and

are much greater than 1. Thus, the denominators of

and

are effectively the same. However, the numerator is smaller because 1 is subtracted. Because

and

are smaller and appear in the denominator of Equation 9 for

, the consequence is that the unbiased estimator is greater. For further consideration, see the corresponding section in the article: Comparisons of Similarity Indices, Based on Data from Within a Single Hospital.
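The explanation above can be checked numerically with a small sketch. It assumes that the adjustment replaces each squared observed proportion, x²/n², with x(x − 1)/[n(n − 1)] in the two sums of squares, while the cross-product term is left unchanged. The function names and the counts are hypothetical.

```python
def similarity_mle(counts_a, counts_b):
    """Plug-in (maximum likelihood) similarity index."""
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    keys = set(counts_a) | set(counts_b)
    cross = sum(counts_a.get(k, 0) / n_a * counts_b.get(k, 0) / n_b for k in keys)
    sum_a2 = sum((x / n_a) ** 2 for x in counts_a.values())
    sum_b2 = sum((y / n_b) ** 2 for y in counts_b.values())
    return cross / (sum_a2 + sum_b2 - cross)


def similarity_adjusted(counts_a, counts_b):
    """Similarity index with bias-adjusted sums of squares (assumed form)."""
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least 2 cases")
    keys = set(counts_a) | set(counts_b)
    cross = sum(counts_a.get(k, 0) / n_a * counts_b.get(k, 0) / n_b for k in keys)
    # x * (x - 1) / (n * (n - 1)) is unbiased for the squared proportion.
    sum_a2 = sum(x * (x - 1) for x in counts_a.values()) / (n_a * (n_a - 1))
    sum_b2 = sum(y * (y - 1) for y in counts_b.values()) / (n_b * (n_b - 1))
    denominator = sum_a2 + sum_b2 - cross
    if denominator <= 0:
        # Can occur with very small samples; the estimate is then undefined.
        raise ValueError("adjusted denominator is not positive")
    return cross / denominator


# Hypothetical small samples, chosen so that the difference is visible.
group_1 = {"A": 5, "B": 3, "C": 2}
group_2 = {"A": 2, "B": 6, "C": 4}
```

With these small hypothetical samples the adjusted value is noticeably greater than the plug-in value. With sample sizes in the thousands, as in the article's applications, the two would be expected to be much closer.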

## DISCLOSURES

**Name:** Franklin Dexter, MD, PhD.

**Contribution:** This author helped design the study, conduct the study, analyze the data, and write the manuscript.

**Attestation:** Franklin Dexter has approved the final manuscript.

**Name:** Johannes Ledolter, PhD.

**Contribution:** This author helped analyze the data and prepare the manuscript.

**Attestation:** Johannes Ledolter has approved the final manuscript.

**Name:** Bradley J. Hindman, MD.

**Contribution:** This author helped conduct the study and write the manuscript.

**Attestation:** Bradley J. Hindman has approved the final manuscript.

## RECUSE NOTE

Dr. Franklin Dexter is the Statistical Editor and Section Editor for Economics, Education, and Policy for *Anesthesia & Analgesia*. This manuscript was handled by Dr. Steven L. Shafer, Editor-in-Chief, and Dr. Dexter was not involved in any way with the editorial process or decision.

## ACKNOWLEDGMENTS

The authors thank Jennifer Espy, BFA, who assisted in editing, and David Griffiths, BS, who assisted in computer programming.

## FOOTNOTES

a http://FDshort.com/AHRQSurgeryFlag. Accessed May 24, 2015.


b The data are more complicated, because counts of procedures at each hospital are what were available, each procedure being of a specified type of procedure.^{6} Some surgical cases include more than 1 procedure (e.g., myringotomy tube insertion bilaterally is 2 procedures both of the same type). For simplicity of wording in the current article, we refer to “procedure” as the type of procedure and to the count of procedures as “cases.” The sole implication for the current article is that the listed sample sizes for Figure 1 exceed the number of cases. This has no influence on the current article other than briefer wording and similarity with the new applications presented.


c Throughout this article, at facilities *j* = 4 and *j* = 5, the phrase “anesthesia providers” refers to Certified Registered Nurse Anesthetists. At the facilities, the anesthesia providers bill independently. However, for all cases, a faculty anesthesiologist provides clinical oversight directed toward assuring the quality of clinical care.^{9–13} In contrast, for Figure 2, an anesthesia provider could be any type including anesthesia resident or anesthesiologist.^{1},^{2}


d The facilities in this article overlap. For example, the *j* = 5th hospital’s data includes that of the ambulatory surgery center, *j* = 4. We use the numbering for convenience so that we do not need to introduce the sample sizes and population repeatedly. The numbering is essentially that of sequential examples.


e The data are from December 1, 2013, through January 24, 2015. The 7 eight-week periods are from December 29, 2013, through January 24, 2015. The American Society of Anesthesiologists’ 2014 Crosswalk has base units for 5426 different CPT codes.


f Throughout the article, when we refer to anesthesia providers, our counts refer to cases started by an anesthesia provider. Such cases are, of course, usually also finished by an anesthesia provider, but we count starts. The classification by anesthesia providers is by the anesthesia provider starting the case.


g The estimated evenness factor equals 79.3%, where 0.793 equals the 39.67 divided by the population size of *S* = 50 anesthesia providers.^{19} The estimation of evenness is reliable for that calculation because there are no very rare “species” (i.e., no anesthesia provider performing only a few cases). The estimate of evenness is unreliable (i.e., highly sensitive to)^{27} for counts of procedures, because many procedures and combinations of procedures are rare, as shown in Figure 2.


h The procedures’ anesthesia times (mean ± SE) were 19.9 ± 0.2 minutes for electroconvulsive therapy (*n* = 576), 37.6 ± 0.8 minutes for cataract extraction (*n* = 510), and 38.8 ± 0.5 minutes for oocyte retrieval (*n* = 360). However, it was not that the brevity accounted for the differences in the Herfindahl indices between groups, because the analysis used only the counts of cases. Rather, there were many such cases of the procedures because they were brief. Other procedures were brief but were performed less often: ventilating tube removal (CPT 69424) 24.6 ± 2.1 minutes (*n* = 13), removal of nonbiodegradable drug delivery implant (CPT 11983) 30.2 ± 2.1 minutes (*n* = 10), and manipulation of knee joint under general anesthesia (CPT 27570) 37.5 ± 2.4 minutes (*n* = 17).


i http://FDshort.com/CCS_ICD-9, http://FDshort.com/CCS_ICD-10, and http://FDshort.com/CCS_CPT. Accessed June 6, 2015.


j http://FDshort.com/ASA2014Crosswalk. Accessed June 6, 2015.


k For readers intrigued by the issue of breaks, among cases with at least 1 break, the similarity of procedures was moderately large^{26},^{30} (


) between cases with the first (or only) break occurring before or within 12 minutes of the start of the surgical procedure.^{a} The anesthetic durations were 2.76 ± 0.04 hours among cases with the break early in the case versus 2.81 ± 0.03 hours among cases with the break >12 minutes^{61} after surgery began. The area under the receiver operating characteristic curve was only 0.51 (*P* = 0.19 by Wilcoxon Mann-Whitney *U* test). Thus, it was not that breaks were more commonly given early in the case among briefer procedures.


l By using anesthesia information management systems data, we evaluated the times of entries of comments, drugs, fluids, and periodic assessments (e.g., electrocardiogram diagnosis and train-of-four).^{61} Essentially, we counted the timing of mouse clicks. More than half of ongoing cases had completed initial documentation by 13 minutes.


m http://FDshort.com/WikipediaTaylor. Accessed May 13, 2015.


n http://FDshort.com/SampleVariance. Accessed April 25, 2015.
