The operating room (OR) is a major production unit in every hospital. For hospitals, the 2 main operational risks of ORs are high idle times (i.e., underutilized OR time) and work outside regular hours (i.e., overutilized OR time). Frequent work beyond scheduled hours leads not only to overtime costs but also to intangible costs resulting from dissatisfaction and reduced motivation of staff. Overtime work is one of the primary reasons for nurses to terminate their employment,^{1} and scheduling conflicts are a major cause of nursing staff turnover.^{2} Therefore, efficient OR management should aim for maximal use of available OR time while preventing frequent overtime work.^{3} OR schedules depend crucially on estimated case durations, and statistical models may help to improve these estimates to support management in the cost-efficient use of expensive surgical resources.

Herein, we provide a brief review of relevant results in the literature on case-duration distributions and case scheduling. Early results show that OR waiting times follow a 2-parameter lognormal distribution,^{4} and that OR operation times follow a distribution that is normal^{5} or lognormal.^{6} Knowledge of the probability distributions of case durations has advanced markedly in the past decade.^{7–9} The single most important source of variability in surgical procedure times is the surgeon effect. Type of anesthesia, age, gender, and American Society of Anesthesiologists risk class are additional sources of variability.^{7} In another study, Strum et al.^{8} tested surgeries with 2-component procedures and concluded that dual Current Procedural Terminology (CPT) surgeries are better modeled by the lognormal distribution than by the normal distribution. Surgical procedure times are frequently distributed with nonzero start times that require a lognormal model with a shift parameter for best model estimates.^{9} Decision rules based on the skewness and coefficient of variation of the data can be used to identify the correct alternative 78% of the time, but do no better than a single rule based on the skewness.^{9} The way in which the lognormal location parameter is estimated affects the ability of goodness-of-fit tests to correctly recognize the model and the accuracy of percentile point values derived from the estimated model.^{10}

An empirical study^{11} has shown that surgical time and total procedure time are lognormally distributed. Surgical procedure time fits the lognormal distribution for 93% of all CPT codes, whereas surgical time fits the normal distribution for about 80% of all CPT codes studied.

For some of the scheduled cases, there are few or no data available, making statistical modeling difficult. These cases can disproportionately affect decision making under uncertainty because sufficient data-driven recommendations cannot be obtained. Several studies have tried to solve this problem of few or no cases.^{12,13} Dexter and Ledolter^{12} validated a practical way to calculate prediction bounds and compared the OR times of all cases, even those with few or no historic data for the surgeon and the scheduled procedure(s). The conclusion of this study is that when historic data are available, they should be used in combination with the scheduled OR time. Historic data provide value in estimating the proportional variation in OR time. Finally, the scheduled OR time alone is nearly as good a predictor of the expected mean OR time of a new case as the Bayesian method.

Another study^{14} examined elective case scheduling at hospitals and surgical centers at which surgeons and patients choose the day of surgery, cases are not turned away, and anesthesia and nursing staffing are adjusted to maximize the efficiency of use of OR time. Two patient-scheduling rules were investigated: Earliest Start Time and Latest Start Time. The achievable incremental reduction in overtime by having perfect information on case duration versus using historical case durations was only a few minutes per OR. The differences between Earliest Start Time and Latest Start Time were also only a few minutes per OR. There are cases that have a high probability of taking longer than scheduled. Increasing the scheduled duration of such a case could then reduce overutilized OR time.^{15} Dexter et al.^{15} studied surgeons’ and schedulers’ case scheduling behavior to evaluate whether such a strategy would be useful. The impact of inaccurate scheduled case durations on staffing costs and unpredictable work hours can be reduced by allocating appropriate total hours of OR time (i.e., staffing) for the cases that will get done, regardless of the inaccuracy of the scheduled durations of those cases.

There are many other studies related to optimally scheduling cases.^{16–23} All these studies contribute to optimizing the use of scarce and costly ORs.

Based on the above-mentioned studies, we can conclude that gains in OR scheduling efficiency may be obtained by using accurate statistical models to predict surgical and procedure times. Therefore, the 3 main contributions of this article are the following: (i) the validation of the results of Strum et al. on the statistical distribution of case durations, including surgeon effects, using OR databases of 2 European hospitals, (ii) the use of expert prior expectations to predict durations of rarely observed cases, and (iii) the application of the proposed methods to predict case durations, with an analysis of the resulting OR efficiency.

## METHODS

In this section, we first present our database, then describe our methods.

### Data

We retrospectively reviewed all recorded surgical cases from 2 large teaching hospitals from 2005 to 2008 (total 85,312 cases). Because there were differences in case duration based on type of anesthetic used, we classified the CPT codes by type of anesthesia: general, local, and regional.^{8,24} Monitored anesthesia care is not a type of anesthesia used in the hospitals under study. We use the following definitions: Surgical Time = the time from incision to closure of the wound; Procedure Time = the time from when the patient enters the operating suite until the patient leaves the OR. To detect the influence of sample size on the Shapiro-Wilk^{*} test, we divided the samples by size into very small (*n* < 10), small (10 ≤ *n* < 30), medium (30 ≤ *n* < 200), and large (*n* ≥ 200). In Table 1, we present the dataset for Hospital A. For every case-frequency interval, the number of CPT codes, the number of cases, and the total hours spent on these cases in the period 2005–2008 are shown.

There was a total of 44,223 cases, of which 289 (0.7%) cases were omitted because of incomplete data. In 15 cases (0.03%), the operation was canceled, although the patient received anesthesia, and in 3 cases, a donor procedure was performed. In our analyses, we have 43,916 cases (1172 CPT-anesthesia combinations), with hours totaling 48,204.

There were 37,848 cases (39,296 h) with 1 CPT-anesthesia combination, 5177 cases (7312 h) with 2 CPT codes, and 891 cases (1596 h) with more than 2 CPT codes. The average number of cases per year with 2 CPT codes was 1294 (median 1305, min 1165, and max 1401). For cases with more than 2 CPT codes, the average was 222 cases per year (median 221, min 201, and max 247).

To eliminate a potential confounding factor† in our study, we considered only surgical procedures with a single CPT code. Therefore, we confined our analysis to 37,307 cases with a case frequency of ≥10 (737 CPT-anesthesia combinations).

We broke down the CPTs according to the various surgeons (Table 2). There were 30 surgeons and 6349 CPT-anesthesia-surgeon combinations (43,916 cases, 48,204 h, Table 3). If we restrict these to combinations with at least 10 cases per surgeon and 1 CPT-anesthesia code, 1341 CPT-anesthesia-surgeon combinations remain (32,347 cases, 34,512 h). Regardless of the number of CPT codes per case, of the 1172 CPT-anesthesia combinations, 318 (1004 cases, 1717 h) were performed <10 times in a period of 4 yr. Of the 43,916 cases scheduled, in 46 cases (0.1%), the actual procedure code was different from the scheduled code. In 132 cases (0.3%), the actual surgeon was different from the scheduled surgeon.

In Table 1, the dataset for Hospital B is presented. There was a total of 41,916 cases, of which 520 (1.2%) cases were omitted because of incomplete data. The analysis was limited to 41,396 cases (942 CPT-anesthesia combinations, 43,895 h). There were 38,075 cases (38,308 h) with only 1 CPT code, 2707 cases (4531 h) with 2 CPT codes, and 614 cases (1056 h) with more than 2 CPT codes. The average number of cases per year with 2 CPT codes was 676 (median 687, min 634, and max 699). For cases with more than 2 CPT codes, the average was 153 cases per year (median 151, min 143, and max 166).

As in Hospital A, we considered only cases with 1 CPT code and each CPT-anesthesia combination with a case frequency of 10 or more. We confined our analysis to 37,313 cases (570 CPT-anesthesia codes).

There were 24 surgeons (Table 2) and 4473 CPT-anesthesia-surgeon combinations (41,396 cases, 43,895 h, Table 3). If we restrict these to combinations with at least 10 cases per surgeon, 1147 CPT-anesthesia-surgeon combinations remain (30,274 cases, 32,927 h). Of the 41,396 cases scheduled, in 28 cases (0.07%), the actual procedure code was different from the scheduled code. In 89 cases (0.2%), the actual surgeon was different from the scheduled surgeon.

Next, we describe in detail what we have studied and how the study was performed.

### Fitting the Normal and 2- and 3-Parameter Lognormal Models for 1-CPT-Anesthesia-Surgeon Combinations with Case Frequency ≥10

We repeated the work of Strum et al.^{9–11} for the normal and 2- and 3-parameter lognormal modeling of surgical procedure times. Repeating the work of Strum et al. is important scientifically because replication of research is a way to refine our understanding of modeling surgical cases. The 3-parameter lognormal model is of interest because surgical procedure times are frequently distributed with nonzero start times that require a lognormal model with a shift parameter for best model estimates.^{9,11} A nonzero start time means that minimum surgical procedure times, even for the simplest procedures, are strictly positive. As is assumed,^{11} the percentage of cases that fit the lognormal model can be even higher when segmented by the factor surgeon. Therefore, we validated whether performed procedure times and surgical times of CPT-anesthesia-surgeon combinations fit a normal, 2-parameter, or 3-parameter lognormal distribution.

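The general formula for the lognormal model can be described as follows:

$$f(x) = \frac{1}{(x-\theta)\,\varsigma\sqrt{2\pi}}\exp\left(-\frac{[\ln(x-\theta)-\mu]^{2}}{2\varsigma^{2}}\right)$$

for *x* > *θ*,

where *θ* ≥ 0 is the shift parameter for the duration data, and *μ* and *ς* are the mean and sd of ln(*x* − *θ*).

The case where *θ* = 0 is called the 2-parameter lognormal model.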
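As a hedged illustration, the shifted lognormal density and its mean can be evaluated with the Python standard library alone. The function names `lognormal3_pdf` and `lognormal3_mean` are hypothetical helpers introduced here, not part of the study's software; the formulas are the standard ones for a lognormal variable shifted by *θ*.

```python
import math

def lognormal3_pdf(x, mu, sigma, theta=0.0):
    """Density of the shifted (3-parameter) lognormal at x; zero for x <= theta."""
    if x <= theta:
        return 0.0
    y = x - theta  # duration in excess of the shift
    return math.exp(-(math.log(y) - mu) ** 2 / (2 * sigma ** 2)) / (
        y * sigma * math.sqrt(2 * math.pi)
    )

def lognormal3_mean(mu, sigma, theta=0.0):
    """Mean duration: the shift plus the mean of the 2-parameter lognormal."""
    return theta + math.exp(mu + 0.5 * sigma ** 2)

# theta = 0 reproduces the 2-parameter model.
print(lognormal3_pdf(1.0, mu=0.0, sigma=1.0))
print(lognormal3_mean(mu=4.0, sigma=0.3, theta=20.0))
```

Setting `theta=0.0` gives the 2-parameter model, so the same two helpers cover both lognormal variants.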

For the 3-parameter lognormal model, we estimated the shift parameter by using a modified version of the approach of Spangler et al.^{10} The shift parameter describing the location or origin of the random variable is important for decision making, because it provides a lower bound on values of the random variable.^{10} First, for every CPT-anesthesia (surgeon) combination, we calculated the natural logarithm of surgical time and procedure time. We then used the bisection method to estimate the shift parameter, so that we could estimate 3 parameters for each combination of surgeon(s) and procedure(s). The bisection method we used is as follows:

Set LOWER = 0.

Set UPPER = smallest observed value.

Initial GUESS = (LOWER + UPPER)/2.

Subtract GUESS from all observed values, take the logarithm, estimate the mean and sd, and recalculate the Shapiro-Wilk *P* value (= *P*_{new}). We repeated this iteratively using bisection to find the shift parameter that results in the largest *P* value. We stopped the iteration if (*P*_{new} − *P*_{old})/*P*_{new} × 100% < 1% or if *P*_{new} < *P*_{old}. If the final *P* value was larger than 0.05, we did not reject the hypothesis that the shifted log-times follow the normal model, that is, that the durations follow the 3-parameter lognormal model.
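The search can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: the text does not say how the interval is halved at each step, so the rule used here (probe the midpoint of each half and keep the better half) is an assumption, and the names `estimate_shift` and `shapiro_p` are hypothetical. The default scorer calls `scipy.stats.shapiro`; any other function returning a normality *P* value for a list of log-times can be passed instead. Durations are assumed to be strictly positive.

```python
import math

def shapiro_p(values):
    """Shapiro-Wilk P value; SciPy is imported lazily so other scorers can be swapped in."""
    from scipy.stats import shapiro
    return shapiro(values).pvalue

def estimate_shift(durations, p_value=shapiro_p, rel_tol=0.01, max_iter=50):
    """Search [0, min(durations)) for the shift giving the largest normality P value."""
    lower, upper = 0.0, float(min(durations))

    def score(theta):
        # P value of the log-times after subtracting the candidate shift
        return p_value([math.log(x - theta) for x in durations])

    guess = (lower + upper) / 2
    p_old = score(guess)
    for _ in range(max_iter):
        # ASSUMPTION: probe the midpoint of each half and keep the better half.
        left, right = (lower + guess) / 2, (guess + upper) / 2
        p_left, p_right = score(left), score(right)
        if p_left >= p_right:
            candidate, p_new, bounds = left, p_left, (lower, guess)
        else:
            candidate, p_new, bounds = right, p_right, (guess, upper)
        if p_new < p_old:      # stopping rule: the P value got worse
            break
        gain = (p_new - p_old) / p_new if p_new > 0 else 0.0
        guess, p_old = candidate, p_new
        lower, upper = bounds
        if gain < rel_tol:     # stopping rule: relative improvement below 1%
            break
    return guess, p_old
```

A call such as `theta, p = estimate_shift(times)` returns the selected shift and its *P* value; the 3-parameter lognormal model is retained when `p > 0.05`. Because the *P* value need not be unimodal in the shift, this kind of search can stop at a local maximum.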

### Estimation with Specialist Prior Guess

If very few data are available (*n* < 10), it may help to use prior information to obtain more reliable estimates of the time distribution. Therefore, we present a method to estimate the mean procedure time from prior and actual data for procedures with <10 cases. It is well known that Bayes' theorem provides a mechanism for combining a prior probability distribution for the states of nature with sample information to provide a revised (posterior) probability distribution about those states of nature. These posterior probabilities are then used to make better decisions. Our approach differs from that of Dexter et al.^{12,25} in that we used the surgeon’s prior statement on the distribution in terms of quantiles of the operation time.

To obtain the prior information required, we asked surgeons to make a prior statement on the distribution of the procedure time for cases with a frequency <10. For a given procedure, we asked surgeons in the period October–December 2008, before they started the scheduled case, to make an estimate in terms of quantiles (25%, 50%, 75%, and 95%) of the time distribution of the procedure. With this information, we were able to update our uncertainty in light of new evidence. In the analyses, we used the 2-parameter lognormal model in which the mean and variance were calculated from a weighted mean of the actual data and the prior data. Furthermore, we assumed that the specialists do not remember the previous operation times, so that all calculated times (past and current) can be treated as containing similar information. Next, we explain our model for using prior information in mathematical terms.

Let *T* denote the procedure time and let ln(*T*) be its natural logarithm. Assuming a 2-parameter lognormal model for the procedure time, it follows that ln(*T*) is normal with mean *μ* and variance *ς*^{2}. We then had to combine the prior and actual data information to estimate the mean *μ*. Let *m* be the prior mean and *s* the prior sd of ln(*T*). Let the 25%, 50%, 75%, and 95% quantiles of ln(*T*) be denoted by *T*25, *T*50, *T*75, and *T*95; the normal distribution then implies that:

T25 = m − 0.675 s

T50 = m

T75 = m + 0.675 s

T95 = m + 1.645 s

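For example, with 2 prior estimates made by Specialists 1 and 2, we estimated the following model:

$$\begin{pmatrix} T_{25}^{(1)} \\ T_{50}^{(1)} \\ T_{75}^{(1)} \\ T_{95}^{(1)} \\ T_{25}^{(2)} \\ T_{50}^{(2)} \\ T_{75}^{(2)} \\ T_{95}^{(2)} \end{pmatrix} = m + s \begin{pmatrix} -0.675 \\ 0 \\ 0.675 \\ 1.645 \\ -0.675 \\ 0 \\ 0.675 \\ 1.645 \end{pmatrix} + \varepsilon$$

where the left-hand side stacks the ln(quantiles) stated by Specialist 1 and Specialist 2 (indicated by the superscript), *ε* is the residual, and [−0.675 0 0.675 1.645] is the vector with corresponding *z*-values. This vector provides the regressor needed to estimate location and scale of the lognormal prior distribution corresponding to the quantiles. The vector is repeated for each specialist.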

When we have data for *j* specialists, hence 4*j* quantile statements, we get 4*j* equations with given values on the left-hand side and with unknown values of *m* and *s*. This can be seen as a regression model with 2 unknown parameters, *m* (the constant term) and *s* (the slope). By applying least-squares regression, the slope *s* can be computed quite easily, and the constant term *m* is the sample mean of the ln(quantiles) values minus *s* times the mean of the *z*-values. The prior mean of the operation time is exp(*m* + 0.5 *s*^{2}), and the prior variance is (exp[*s*^{2}] − 1) × exp(2*m* + *s*^{2}). The prior sd is √(prior variance). We take the value of *m* as the prior mean *xs** and the value of 1/*s*^{2} as *τ*. With *n* observed cases whose log-times have sample mean *x̄*, the posterior mean is then given by the formula:

$$\mu^{*} = \frac{\tau\,\mathit{xs}^{*} + n\,\bar{x}}{\tau + n} \tag{2}$$

The resulting weight is *w* = *τ*/(*τ* + *n*) and the posterior mean is equal to:

$$\mu^{*} = w\,\mathit{xs}^{*} + (1 - w)\,\bar{x} \tag{3}$$

Note that this is the posterior mean of the log-times. The mean of the actual times is given by exp(*μ** + 0.5*ς**^{2}), where *ς**^{2} is the posterior variance. The prior variance is *s*^{2} and the data variance is *ς*^{2}.

An intuitive method is to weight these 2 values in the same way as was done for the mean, so that

$$\varsigma^{*2} = w\,s^{2} + (1 - w)\,\varsigma^{2} \tag{4}$$

Combining these results, we get: the posterior for the operation times is lognormal with log-scale mean *μ** and variance *ς**^{2}. The mean of the operation times is then given by:

$$\exp\left(\mu^{*} + 0.5\,\varsigma^{*2}\right)$$
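A minimal Python sketch of the two steps described in this subsection, fitting the prior from stated quantiles and combining it with observed times, is given here. It rests on assumptions: the helper names `fit_prior` and `posterior` are hypothetical, the quantile statements and durations in the usage lines are invented for illustration only, and the data variance is taken as the sample variance (divisor *n* − 1) of the observed log-times, which the text does not specify.

```python
import math
from statistics import mean, variance

Z = [-0.675, 0.0, 0.675, 1.645]  # z-values for the 25%, 50%, 75%, and 95% quantiles

def fit_prior(quantiles_per_specialist):
    """Least-squares fit of ln(quantile) = m + s * z; returns (m, s)."""
    y = [math.log(q) for qs in quantiles_per_specialist for q in qs]
    z = Z * len(quantiles_per_specialist)  # z-vector repeated per specialist
    z_bar, y_bar = mean(z), mean(y)
    s = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / sum(
        (zi - z_bar) ** 2 for zi in z
    )
    m = y_bar - s * z_bar
    return m, s

def posterior(m, s, times):
    """Weighted combination of prior (m, s) and observed times (needs >= 2 cases)."""
    logs = [math.log(t) for t in times]
    n = len(logs)
    tau = 1 / s ** 2
    w = tau / (tau + n)                                # weight of the prior
    mu_post = w * m + (1 - w) * mean(logs)             # posterior mean of log-times
    var_post = w * s ** 2 + (1 - w) * variance(logs)   # ASSUMPTION: sample variance
    return mu_post, var_post, math.exp(mu_post + 0.5 * var_post)

# Hypothetical quantile statements (min) from 2 specialists and 3 observed cases.
m, s = fit_prior([[110, 140, 170, 230], [120, 150, 185, 250]])
print(posterior(m, s, [200, 260, 310]))
```

The third value returned by `posterior` is the mean on the original time scale, which is the quantity a scheduler would use.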

### Improving Coupling Between Estimates of Scheduled Time and the Actual Procedure Time

When reserving OR time for a procedure, the OR management needs to balance the costs of reserving too much time against the costs of reserving too little.^{25} If too much time is allocated to a case, expensive OR capacity is likely to be wasted, leading to a decrease in OR utilization.^{12–14,16,17,21–23,27} If too little capacity is allocated to a surgical case, the OR schedule must be modified, resulting in idle OR times and increased demand for anesthesiologists, nurses, and support staff. Improving coupling between estimates of scheduled time and the actual time reduces the prediction error of a scheduled surgical case. By using a simulation, we compared the effect on the prediction error of scheduling cases when applying 3 different case-modeling methods. The first method of estimating scheduled case durations is based on taking the trimmed mean time of the last 10 case durations. The second method uses the bias-corrected scheduled OR time. This method is based on the following linear regression fitted to data from 2005 to 2007: actual OR time = intercept + slope × (scheduled OR time). Including this regression as a comparator shows how much is gained, for the purpose of choosing how long to schedule a case (as opposed to calculating lower/upper prediction bounds or times remaining in cases), by using statistically based methods rather than a simple adjustment of the scheduled OR time. The last method uses the mean of the 3-parameter lognormal model.
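The 3 estimators can be sketched as small Python functions. This is an illustration under assumptions rather than the study's code: the function names are hypothetical, and because the text does not state how much is trimmed, the sketch drops only the single shortest and single longest of the last 10 durations. The regression coefficients are passed in as arguments.

```python
import math
from statistics import mean

def trimmed_mean_last10(history):
    """Method 1: trimmed mean of the 10 most recent durations.
    ASSUMPTION: 'trimmed' means dropping the single shortest and longest value."""
    recent = sorted(history[-10:])
    core = recent[1:-1] if len(recent) > 2 else recent
    return mean(core)

def bias_corrected(scheduled, intercept, slope):
    """Method 2: actual OR time = intercept + slope * (scheduled OR time)."""
    return intercept + slope * scheduled

def lognormal3_mean(mu, sigma, theta):
    """Method 3: mean of the 3-parameter lognormal model (shift theta)."""
    return theta + math.exp(mu + 0.5 * sigma ** 2)

# Hypothetical history (min) for one CPT-anesthesia-surgeon combination.
history = [60, 62, 58, 61, 59, 63, 60, 240, 57, 61, 64]
print(trimmed_mean_last10(history))
print(bias_corrected(100, intercept=18.16, slope=0.88))
print(lognormal3_mean(mu=3.7, sigma=0.4, theta=20.0))
```

In the usage lines, the single long case of 240 min is removed by the trimming, which is the reason for using a trimmed rather than a plain mean.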

To make it possible to compare the outcome of the 3 methods, only procedures with a case frequency of 10 or more were used, with 1 CPT-anesthesia code and fitting the 3-parameter lognormal model. We used the data available (from the sample). Historical data from 2005 to 2007 were used and then the window was expanded to include predictions made on each day in 2008 using data from 2005 to 2007 and from 2008 until the day before making the prediction. The originally scheduled sequence of cases was not changed. For instance, when scheduling an inguinal hernia repair (Lichtenstein) on January 2, 2008, only historical data up to and including January 1, 2008 were used. The actual time on January 2, 2008 was used for scheduling this procedure for January 4, 2008.

The actual OR time of a procedure is compared with the scheduled procedure time as calculated by each of the 3 methods. If the actual procedure time is larger than the scheduled time, that procedure is underreserved. Otherwise it is overreserved. For each method, the number of under- and overreserved procedures is counted, and the mean under- and overreserved time per case is calculated. Differences in the mean under- and overreserved time per case between the 3 methods were tested with a paired *t*-test.
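The tally described above can be sketched as follows. The function name `reservation_errors` and the example times are hypothetical; a case whose actual time equals its scheduled time is counted as overreserved with an error of 0, following the "otherwise" rule in the text. The paired *t*-test is not included in this sketch.

```python
from statistics import mean

def reservation_errors(actual, scheduled):
    """Split prediction errors (min) into under- and overreserved cases."""
    under = [a - s for a, s in zip(actual, scheduled) if a > s]
    over = [s - a for a, s in zip(actual, scheduled) if a <= s]
    return {
        "n_under": len(under),
        "mean_under": mean(under) if under else 0.0,
        "n_over": len(over),
        "mean_over": mean(over) if over else 0.0,
    }

# Hypothetical actual and scheduled times (min) for 3 cases.
print(reservation_errors(actual=[100, 80, 120], scheduled=[90, 95, 120]))
```

Running the function once per estimation method on the same cases yields the paired quantities that are then compared.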

### OR Inefficiency

OR inefficiency was defined as the sum of the underutilized OR time and the overutilized OR time, the latter multiplied by the relative cost of overtime.^{16,23,26} Underutilized time was hours of staffed operating time at straight-time wages that were not used for surgery, setup, or cleanup of the OR. Overutilized time was hours worked after the staffed OR time, paid at overtime rates. The relative cost of overtime in our study was 1.50. The cost per hour of overutilized OR time includes indirect costs, intangible costs, and retention and recruitment costs incurred on a long-term basis as a result of staff working late. Because of the fixed OR capacity in our hospital (8 am–4 pm), the short-term objective in maximizing OR efficiency is to reduce overutilized OR time.^{15} In Hospital A, for example, the mean end time of all ORs running after 4 pm is 4:19 pm (±17 min).^{3}
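For a single OR-day, this definition can be written as a short function. It is a sketch assuming that the relative cost of 1.50 applies to the overutilized minutes only and that the staffed time is the 8 am–4 pm block (480 min); the name `or_inefficiency` is hypothetical.

```python
def or_inefficiency(used_minutes, staffed_minutes=480, overtime_cost=1.5):
    """Inefficiency of one OR-day in straight-time minute equivalents."""
    underutilized = max(staffed_minutes - used_minutes, 0)
    overutilized = max(used_minutes - staffed_minutes, 0)
    return underutilized + overtime_cost * overutilized

print(or_inefficiency(450))  # finishes 30 min early
print(or_inefficiency(500))  # runs 20 min late
```

Both example days have the same inefficiency, which shows how the overtime weighting makes 20 min of late running as costly as 30 min of idle time.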

We analyzed the effect of the different methods of case-duration prediction on OR efficiency. In the first method, we used the trimmed mean of the last 10 case durations; in the second method, we used the bias-corrected scheduled OR time; and in the last method, we used the mean of the 3-parameter lognormal model. Case scheduling with the original cases in 2008 was used. For each method, add-on elective cases with their concomitant turnover times were scheduled daily. Best Fit Descending was used, which is an off-line algorithm in which add-on elective cases are sorted from longest to shortest, with fuzzy constraints. Cases were considered in the order specified by the algorithm. If no OR had sufficient open time available for the case, but sufficient open time would be available in the OR with the most remaining time provided that the scheduled duration of the case was shortened by ≤15 min, then the case was assigned to the OR with the most remaining time.^{28}
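The scheduling rule can be sketched as follows. This is a sketch under assumptions, not the algorithm of reference 28: "best fit" is taken in its usual bin-packing sense (the OR with the least remaining time that still fits the case), cases that do not fit even after shortening are left unscheduled, and the name `best_fit_descending`, the case labels, and the OR labels are hypothetical.

```python
def best_fit_descending(cases, open_minutes, max_shorten=15):
    """cases: {name: scheduled min incl. turnover}; open_minutes: {OR: open min}."""
    remaining = dict(open_minutes)
    assignment = {}
    # Consider cases from longest to shortest scheduled duration.
    for name, duration in sorted(cases.items(), key=lambda kv: kv[1], reverse=True):
        fitting = [r for r, free in remaining.items() if free >= duration]
        if fitting:
            room = min(fitting, key=lambda r: remaining[r])    # tightest OR that fits
            booked = duration
        else:
            room = max(remaining, key=lambda r: remaining[r])  # OR with most time left
            if remaining[room] < duration - max_shorten:
                assignment[name] = None                        # ASSUMPTION: unscheduled
                continue
            booked = remaining[room]                           # shortened by <= 15 min
        remaining[room] -= booked
        assignment[name] = (room, booked)
    return assignment, remaining

cases = {"A": 200, "B": 120, "C": 100, "D": 70, "E": 40}
print(best_fit_descending(cases, {"OR1": 300, "OR2": 180}))
```

In the usage line, case D (70 min) does not fit anywhere but is shortened by 10 min to fill the 60 min left in OR2, whereas case E cannot be placed.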

For all cases that did not meet these criteria (2 or more CPTs, or procedures with case frequency < 10), we used the actual case duration as the scheduled duration (i.e., perfect retrospective knowledge).

After scheduling the cases and knowing the actual OR times of these same cases, the mean overutilized OR time was calculated considering each OR-day to be independent of all others. Differences in the mean overutilized OR time between the 3 methods were tested with a paired *t*-test.

### Statistics and Software

The null hypothesis of the Shapiro-Wilk test statistic (*W*) is that a sample is from a normally distributed population. Thus, *P* < 0.05 for *W* rejects this supposition of normality. Most authors agree that this is the most reliable test for nonnormality for small to medium-sized samples.^{29–37} To perform the Shapiro-Wilk test, we used StatsDirect statistical software and also SPSS15, Excel 2007, and COBOL. Normal probability plots were examined visually for those CPT-anesthesia-(surgeon) combinations that were not well fitted by either the normal or lognormal models. We analyzed Q-Q, P-P, and box plots to confirm the results of the Shapiro-Wilk test. Examination of the calculated skewness and kurtosis, and of the histogram, box plot, and normal probability plot for the data may provide clues as to why the data failed the Shapiro-Wilk test. In our database, start and end of anesthesia time, surgical time, and procedure time are recorded exactly (to the minute). D’Agostino^{29} indicated that the Shapiro-Wilk test can be affected by rounding.

## RESULTS

### Fitting the Normal and 2- and 3-Parameter Lognormal Models

In some of the procedures, we found outliers. In the database, there is a so-called “remark field” in which unexpected events during an OR are entered. The outliers we encountered were attributable to logistical problems (16 times) in the OR, surgeon arriving late (12 times), and OR team not ready (4 times). These outliers can be seen as incidental, so we removed these data. Table 4 shows the results of fitting CPT-anesthesia groups to the normal and the 2- and 3-parameter lognormal models for both hospitals separately.

For the CPT-anesthesia combinations in Hospital A, procedure times fit the normal model in 37.9% of the combinations and surgical times in 52.5%. The fits for the 2-parameter lognormal model (*P* ≥ 0.05) are 57.7% and 69.6%, respectively. For the 3-parameter lognormal model (*P* ≥ 0.05), the fits are 80.6% for procedure time and 84.1% for surgical time. If we differentiate CPT-anesthesia-surgeon combinations, then procedure times fit the 2-parameter lognormal model (*P* ≥ 0.05) in 70.4% of the combinations and surgical times in 79.6% (Table 5). For the 3-parameter lognormal model, the fits are 87.6% for procedure time and 90.7% for surgical time. The results for Hospital B are approximately in line with those for Hospital A.

We tried to understand why surgical time fits the normal and 2- and 3-parameter lognormal models better than procedure time. Procedure time consists of 3 main activities: administering anesthesia, preparing the patient for surgery, and performing the actual surgery. For the cases under study, the proportion of surgery time is on average 75% of the total procedure time. Preparation time and anesthesia time are 18% and 7%, respectively. While preparing the patient for surgery, relatively more OR staff members are involved in various activities and protocols compared with administering anesthesia and surgery. To better understand this, for every CPT-anesthesia code, we tested both the anesthesia time and preparation time for the 2-parameter lognormal model. With *P* ≥ 0.05, 92.5% of anesthesia time is lognormally distributed, whereas 17.6% of the preparation time shows a fit to the lognormal model. Hence, preparation time is poorly modeled compared with anesthesia time. This could explain why procedure time is less well modeled for the lognormal model than surgical time.

Table 6 is a paired comparison of the 2- and 3-parameter lognormal models and the normal model using the Friedman test. We compared the normal model with the 2-parameter lognormal model and the 3-parameter model. The 2-parameter lognormal model was superior to the normal model for modeling procedure time and surgical time. The 3-parameter lognormal was superior to the 2-parameter lognormal model and normal model. Surgical time was estimated better than procedure time when modeling with both the 2- and 3-parameter lognormal models and the normal model.

### Estimation with Specialist Prior Guess

In the Results section, we focused (arbitrarily) on the total thyroidectomy procedure (Table 7). The results of other procedures are found in Table 8. The 2 procedure times (261 and 198 min) are calculated after combining the prior statements of the specialists with the previously calculated times. Because we have data for 2 specialists, and therefore 8 time estimates, we get 8 equations with given values on the left-hand side, the values in the column “ln(quantiles) (Table 7),” and with unknown values of *m* and *s*.

In Table 7, we show the output of SPSS. The *R*^{2} of this regression is 0.85, indicating a good fit. The outcomes are *m* = 4.947 and *s* = 0.287. In other words, the prior statements of the specialists can be translated as a lognormal model with a log-scale mean of 4.947 and sd of 0.287. The prior mean of the operation time is 147 min, and the prior variance is 1847 min^{2}. The prior sd is 43 min. We took the value of *m* = 4.947 as the prior mean *xs** and the value of 1/*s*^{2} = 1/0.287^{2} = 12.14 as *τ*. The resulting weight was *w* = 0.574, and the posterior mean (Eq. 3, Methods) was equal to 5.162.

The prior variance, *s*^{2}, is 0.0824, and the data variance is 0.1459. Weighting these 2 values as was done for the mean (Eq. 4, Methods) gives a value of 0.109 for *ς**^{2}. Combining these results, we get the posterior for the operation times, which is lognormally distributed with log-scale mean *μ** = 5.162 and variance *ς**^{2} = 0.109. The mean of the operation times is then 184 min. Note that the prior mean was 147 min, and the data average time was 249 min. The posterior mean of 184 min lies closer to the prior mean than to the data mean. This is because the prior distribution has a relatively small sd (43 min) as compared with that of the data (90 min) and because the number of data points (9) is small.

If we wish to determine, for instance, a 95% upper bound for the operation time, then this is done by estimating the 95% bound for the log-times. In our example, the log-time has normal posterior with *μ** = 5.162 and variance *ς**^{2} = 0.109, so that *ς** = 0.330. The 95% upper bound for the log-time is then *μ** + 1.645*ς** = 5.705. The bound for the time itself is then exp (5.705) = 300 min.
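The arithmetic of this example can be retraced with a few lines of Python. Only values reported in the text are used as inputs (*m*, *s*, *n* = 9, the data variance, and the posterior log-mean); the variable names are hypothetical. Because the text rounds intermediate values, the recomputed figures can differ from the reported ones in the last digit.

```python
import math

m, s = 4.947, 0.287   # prior location and scale of the log-times
n = 9                 # number of observed cases
data_var = 0.1459     # variance of the observed log-times
mu_post = 5.162       # posterior mean of the log-times, as reported

tau = 1 / s ** 2
w = tau / (tau + n)                                         # weight of the prior
prior_mean = math.exp(m + 0.5 * s ** 2)                     # min
prior_var = (math.exp(s ** 2) - 1) * math.exp(2 * m + s ** 2)
var_post = w * s ** 2 + (1 - w) * data_var                  # posterior variance
mean_time = math.exp(mu_post + 0.5 * var_post)              # min
upper_95 = math.exp(mu_post + 1.645 * math.sqrt(var_post))  # min

print(round(w, 3), round(prior_mean), round(math.sqrt(prior_var)))
print(round(var_post, 3), round(mean_time), round(upper_95))
```

The recomputed weight, prior mean, prior sd, posterior variance, and posterior mean time agree with the reported 0.574, 147 min, 43 min, 0.109, and 184 min; the upper bound comes out within about 1 min of the reported 300 min, the difference being due to rounding of *ς**.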

Table 8 presents the results for the data mean (sd), prior [mean time, (sd)] and posterior [mean time, sd] for 30 procedures. From this table, we see that the posterior mean is a weighted average of the data mean and the prior mean. The posterior variance always lies between the data variance and the prior variance.

### Improving Coupling Between Estimates of Scheduled Time and the Actual Procedure Time

In Hospital A (Table 9), under the standard method, the average overreserving per case is 22.9 min (21.4 min), whereas the average underreserving is 21.6 min (18.8 min).

The result of the regression is: actual OR time *=* 18.16 + 0.88 × (scheduled OR time) with standard error of the constant 0.30 and slope 0.04 (*P* < 0.0001), *R*^{2} 0.55.
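As a worked illustration of this regression (the example scheduled times of 60 and 240 min are hypothetical), the correction lengthens short cases and shortens long ones. The break-even scheduled time $x$, at which the corrected time equals the scheduled time, follows from

$$18.16 + 0.88\,x = x \quad\Longrightarrow\quad x = \frac{18.16}{1 - 0.88} \approx 151 \text{ min},$$

and, for the two example cases,

$$18.16 + 0.88 \times 60 = 70.96 \text{ min}, \qquad 18.16 + 0.88 \times 240 = 229.36 \text{ min}.$$

Cases scheduled for less than about 151 min are therefore predicted to take longer than scheduled, and cases scheduled for more than that are predicted to take less time.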

Applying the bias-corrected regression, the average overreserving per case is 16.3 min (9.4 min), whereas the average underreserving is 12.6 min (7.6 min). For the 3-parameter lognormal model, the results are 12.9 min (8.4 min) overreserving and 9.6 min (5.4 min) underreserving. The differences in average overreserving and underreserving among the 3 methods are significant (*P* < 0.001). The results for Hospital B are in line with those for Hospital A (Table 9).

### OR Inefficiency

In Hospital A, 12,138 cases were scheduled. The mean overutilized OR time (min) per OR per day is 23.4 (22.7–24.0) for the standard method, 16.6 (16.1–17.2) for the bias-corrected scheduled OR time, and 6.6 (6.2–6.9) for the 3-parameter lognormal model. In Hospital B, 8794 cases were scheduled. The mean overutilized OR time (min) per OR per day is 30.6 (29.6–31.5) for the standard method, 22.2 (21.4–22.9) for the bias-corrected scheduled OR time, and 10.6 (10.1–11.2) for the 3-parameter lognormal model.
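The reductions implied by these means can be tabulated with a short script. The dictionary keys are hypothetical labels; the numbers are the point estimates reported above, and the confidence intervals are not reproduced.

```python
# Mean overutilized OR time (min per OR per day), as reported per hospital and method.
overutilized = {
    "A": {"standard": 23.4, "bias_corrected": 16.6, "lognormal3": 6.6},
    "B": {"standard": 30.6, "bias_corrected": 22.2, "lognormal3": 10.6},
}

# Minutes saved per OR per day by the 3-parameter lognormal model.
savings = {
    hospital: {
        method: round(times[method] - times["lognormal3"], 1)
        for method in ("standard", "bias_corrected")
    }
    for hospital, times in overutilized.items()
}
print(savings)
```

The largest savings occur in Hospital B: 20.0 min per OR per day relative to the standard method and 11.6 min relative to the bias-corrected scheduled OR time.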

## DISCUSSION

Modeling the distribution of OR cases is one of the key steps in a planning process. In our study, the focus is on decision making before the day of surgery. In other studies,^{12,16,25} the focus is on decisions on the day of surgery. These do not involve average OR times, but rather lower prediction bounds, upper prediction bounds, and especially times remaining in cases. Both focuses are helpful in the effective scheduling and efficient use of expensive surgical resources. We find that the percentage of cases fitting the normal and 2- and 3-parameter lognormal models is higher for surgical time than for total procedure time (the opposite was true for Strum et al.^{11}). The evidence supports the idea that type of surgery is the most important single source of variability among surgeries.^{7} The 3-parameter lognormal model, with the shift parameter estimated by the bisection method, fits procedure time and surgical time better than the 2-parameter lognormal model without a shift parameter. This can be explained by the fact that the 2-parameter lognormal model is a special case (*θ* = 0) of the 3-parameter lognormal model. When segmenting by the factor surgeon, the fits are even higher for the 2- and 3-parameter lognormal models. One could ask why the fits are better with CPT-anesthesia-surgeon segmentations. Offering an *a priori* hypothesis, Strum et al.^{7} suggest that this may be attributable to surgeon work rates. If Strum et al. are correct, then segmentation into surgeon-specific groups should result in more homogeneous work rates and thus a better fit to the lognormal. Another reason is that, because of further segmentation, the number of available cases per group decreases and, because of this, the *P* values will increase. This could also explain why the lognormal model fits for the CPT-anesthesia-surgeon combinations are higher. We confirm, as in other studies, that small groups have a better fit than the medium and large groups. This lack of discrimination relates to the design of the statistical tests. D’Agostino, Shapiro and Wilk, and others^{29,31–37} discuss the fact that goodness-of-fit tests become more discriminating as the sample sizes increase. Conversely, samples with *n* < 10, for example, may indiscriminately fit almost any model.

If few data are available, the use of prior information given by the surgeon may lead to a better estimation of the case duration. Because the posterior distribution contains all the information we need to make statistical decisions, we can use it for predicting case durations and case scheduling. The uncertainty of the posterior distribution is less than that obtained when using only the data without prior information. On the other hand, as the amount of historical data for a specific procedure increases, the usefulness of the prior information decreases, because with an increasing number of observations the sample mean will determine the outcome. Our approach differs in some respects from the classical one as discussed, for instance, by Dexter et al.^{12} This is because we have prior data that are quite informative and that can be translated into a lognormal prior distribution. In the classical approach, the prior on the 2 parameters *μ* and *σ*^{2} consists of 3 parts:

- For given *σ*, the (conditional) prior for *μ* is normal. The prior for *σ* is inverted gamma.
- The (unconditional, marginal) prior for *μ* is a *t*-distribution.
- The (unconditional, marginal) posterior for *μ* is (another) *t*-distribution.

Our prior information is not directly related to mean and variance, but can be translated to mean and variance of the normal distribution (of the log-times). Therefore, we combined a normal prior with a normal distribution of the observed data. In applying the calculation rules to obtain the posterior, however, we used the classical framework, which is not fully consistent. Nevertheless, the central formulas (equations 2 and 4) have a direct, intuitively appealing interpretation that also applies in our framework: we take a weighted average of the prior and data information, with weights inversely proportional to the uncertainty in each type of information, that is, proportional to *τ* = 1/*s*^{2} = 1/(prior variance) and to *n* = 1/(1/*n*) = 1/(data variance).
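The weighted-average interpretation can be illustrated with a minimal Python sketch. It follows only the verbal description given here (weight *τ* = 1/*s*^{2} for the prior and weight *n* for the data) and does not reproduce equations 2 and 4. The function name, argument names, and example numbers are hypothetical, and the prior variance is assumed to be expressed relative to the variance of a single log-time, so that the data weight is simply *n*.

```python
import math

def posterior_log_mean(prior_mean, prior_var, log_times):
    """Precision-weighted average of a prior guess and observed log-times.

    Assumption (not from the paper's equations): prior_var is expressed
    relative to the variance of a single log-time, so the prior weight is
    tau = 1 / prior_var and the data weight is n.
    """
    n = len(log_times)
    tau = 1.0 / prior_var
    if n == 0:
        # No data: the posterior is just the prior.
        return prior_mean, prior_var
    sample_mean = sum(log_times) / n
    post_mean = (tau * prior_mean + n * sample_mean) / (tau + n)
    post_var = 1.0 / (tau + n)  # same relative units as prior_var
    return post_mean, post_var

# Hypothetical example: the surgeon guesses about 90 min; 3 cases were observed.
prior_mean = math.log(90.0)
observed = [math.log(t) for t in (110.0, 125.0, 118.0)]
post_mean, post_var = posterior_log_mean(prior_mean, prior_var=0.5, log_times=observed)
print(round(math.exp(post_mean), 1))  # posterior median duration in minutes
```

As *n* grows relative to *τ*, the posterior mean in this sketch approaches the sample mean of the log-times, which matches the observation that prior information matters less once more historical data are available.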

If we wish to model the classical set-up more closely, we need to estimate the parameters *α* and *β* of the prior (inverted gamma) distribution of the SD. These 2 parameters can be estimated by considering all other types of operations and modeling the resulting set of (inverted) sample variances for all these types of operations, as in the study by Dexter et al.^{12} The (marginal) posterior of the mean (of the log-times) then becomes *t* instead of normal.

Furthermore, our results for CPTs with few data may potentially be useful if the data from the 2 hospitals were compared with findings in another study.^{12} The latter article did not find the Bayes method to have important value for the mean. The overall effect for every case including those with multiple CPTs would be needed.

Finally, we find that, compared with the standard way of case scheduling, using the mean of the 3-parameter lognormal distribution reduces the mean overreserved OR time per case by up to 53.1% and the mean underreserved OR time by up to 55.6%. Using the 3-parameter lognormal model for case scheduling lowers the mean overutilized OR time by up to 20.0 (19.7–20.3) min per OR per day as compared with the standard method and 11.6 (11.3–12.0) min per OR per day as compared with the bias-corrected scheduled OR time.
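As an illustration of scheduling with the mean of the 3-parameter lognormal distribution, the sketch below computes that mean, *θ* + exp(*μ* + *σ*^{2}/2), for a given shift *θ*. It is a minimal Python sketch under stated assumptions: the shift is taken as already known (its estimation, done in this study with the bisection method, is not reproduced), *μ* and *σ*^{2} are estimated by the mean and variance of log(duration − *θ*), and the function name and example durations are hypothetical.

```python
import math
import statistics

def shifted_lognormal_mean(durations, shift):
    """Mean of a 3-parameter lognormal fitted to durations with a known shift.

    Assumption: `shift` (theta) is supplied by the caller; only mu and
    sigma^2 of log(duration - shift) are estimated here.
    """
    if any(d <= shift for d in durations):
        raise ValueError("every duration must exceed the shift")
    logs = [math.log(d - shift) for d in durations]
    mu = statistics.fmean(logs)
    sigma2 = statistics.pvariance(logs, mu)  # population (maximum-likelihood) variance
    return shift + math.exp(mu + sigma2 / 2.0)

# Hypothetical historical durations (min) for one CPT-anesthesia-surgeon group.
history = [62.0, 75.0, 81.0, 95.0, 140.0]
scheduled = shifted_lognormal_mean(history, shift=30.0)
print(round(scheduled, 1))  # time to reserve for the next case, in minutes
```

Setting `shift=0.0` in this sketch gives the mean of the 2-parameter lognormal model, so the two scheduling rules can be compared on the same data.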

### Limitations

The prior information could be misleading when the prior variance is too small, because specialists may underestimate the variance. Surgeon case durations for specific procedures may change progressively, for example, as a result of subtle changes in the demographics of a patient population.^{15} We asked specialists in every specialty if they were aware of these changes. None recognized that these changes had occurred in the past 4 yr. We assumed that the specialists do not remember the previous operation times, so we treated all realized times (past and current) as containing similar information. In practice, surgeons may or may not actually remember historical case durations. Because the studied procedures occur relatively rarely and are performed by different surgeons, we believe that any effect of an individual surgeon’s memory on the results will be very small.

In the simulation for case-duration prediction and efficiency gains, we omitted procedures not fitting the 3-parameter lognormal model and procedures with a case frequency <10. Because of this, the real efficiency gains may be overestimated. Although 86% of all cases in the hospitals under study consist of 1 CPT code, we cannot make general conclusions or statements regarding the impact of improving case-duration prediction on the efficiency of use of OR time, but only as related to the cases under study.

## CONCLUSION

OR case scheduling can be improved by using the 3-parameter lognormal model with surgeon effects and by using the surgeon’s prior guesses for rarely observed CPTs. Compared with standard case scheduling practices and with the bias-corrected method, using the 3-parameter lognormal model for case scheduling significantly reduces the average underestimated and overestimated OR time per case as well as the OR inefficiency.