# Comparing Policies for Case Scheduling Within 1 Day of Surgery by Markov Chain Models

BACKGROUND: In previous studies, hospitals’ operating room (OR) schedules were influenced markedly by decisions made within a few days of surgery. At an academic hospital, 46% of ORs had their last case scheduled or changed within 1 working day of surgery, and a private hospital had 64%. Many of these changes were for patients who were admitted before surgery (i.e., inpatient cases). In this study, we investigate the impact on OR productivity of how cases are scheduled within 1 working day before the day of surgery.

METHODS: We consider the case-scheduling choice between 2 ORs. We compare 3 scheduling policies: Best Fit Descending, Worst Fit Descending, and Worst Fit Ascending. “Descending” strategies consider new cases from longest to shortest, whereas “Ascending” considers new cases from shortest to longest. Best Fit schedules each new case into the OR with sufficient but the least remaining underutilized OR time for the case. Worst Fit does the same but with the most remaining time. For our application, Best Fit chooses a later start time, whereas Worst Fit chooses an earlier start time. In our computational model, cases are of 2 possible durations, brief or long. Case cancellation is incorporated explicitly, and the number of new cases to schedule depends on the current number of scheduled cases in each OR, both new from previous studies. The number of cases in each OR is modeled as a Markov chain, evolving between 2 periods, corresponding to 1 day and 0 days before the day of surgery. For each scheduling policy, we evaluate the mean overutilized OR time and productivity. Our sensitivity analyses cover many cancellation rates, arrival settings, case durations, and initial conditions (i.e., how cases are scheduled into the 2 ORs preceding 1 workday before the day of surgery).

RESULTS: Best Fit Descending and Worst Fit Descending achieved almost the same overutilized time and productivity. Worst Fit Ascending caused greater overutilized time (as much as 6.6 minutes more per OR) and thus lesser productivity (as much as 1.6% less) compared with Best Fit Descending or Worst Fit Descending. When the staff were scheduled for less time than the optimal allocated OR time, there were nearly the same differences between the staff productivity resulting from the use of Worst Fit Ascending rather than Worst Fit Descending or Best Fit Descending.

CONCLUSIONS: Scheduling office decision making within 1 day before surgery should be based on statistical forecasts of expected total OR workload (i.e., forecasts that include the addition of non-elective cases and the subtraction of cases that cancel). As long as a case is not scheduled into overutilized time when less overutilized time could be achieved in another OR, and cases are considered in descending sequence of scheduled durations, the differences in overutilized time and productivity among the scheduling policies are small. Cognitive bias in staff scheduling causes a significant reduction in productivity, but the differences among scheduling policies are nearly the same as when there is no bias.

From the ^{*}Krannert School of Management, Purdue University, West Lafayette, Indiana; ^{†}Division of Management Consulting, Department of Anesthesia, University of Iowa, Iowa City, Iowa; and ^{‡}Department of Anesthesiology, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, Pennsylvania.

Accepted for publication September 29, 2015.

Funding: Departmental.

The authors declare no conflicts of interest.

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s website.

Reprints will not be available from the authors.

Address correspondence to Franklin Dexter, MD, PhD, Division of Management Consulting, Department of Anesthesia, University of Iowa, 200 Hawkins Dr., 6JCP, Iowa City, IA 52242. Address e-mail to Franklin-Dexter@UIowa.edu or www.FranklinDexter.net.

During the past 3 years, we have performed several observational (informatics) studies of operating room (OR) scheduling data and found the final OR schedule to be influenced markedly by decisions made within a few days of surgery.^{1–4} For example, 2 hospitals had an average of 1 add-on case for every 4 ORs (24.7% ± 5.2% and 24.1% ± 0.3%, mean ± SE).^{1},^{5} At an academic hospital, nearly half the ORs had their last case scheduled or the date changed on the original day of surgery or 1 working day before surgery (46.4% ± 0.4%).^{1} The percentage was greater at a private hospital, 64.4%.^{1} Within the week before surgery, 43.1% ± 0.5% of cases changed (i.e., were newly scheduled, had the scheduled date of surgery modified, or were cancelled).^{1} This frequent shifting of the OR schedule means that, other than staff scheduling taking place months before the day of surgery,^{6},^{7} rarely at hospitals do operational decisions substantively affect the efficiency of use of OR time^{8–12} until the working day before surgery.^{1},^{2}

Many changes are made to the OR schedule within a few days of surgery because of the cancellation and rescheduling of patients not admitted before the day of surgery (i.e., “outpatients”) and the scheduling, cancellation, and rescheduling of patients admitted before surgery (i.e., inpatients).^{3} Motivated by these observations, in this article, we effectively study the scheduling of cases within 1 working day before the day of surgery and explicitly consider case cancellation.

More than a decade ago, 2 commonly used scheduling policies, Best Fit Descending and Worst Fit Descending (Fig. 1), were evaluated for the scheduling of cases on the day of surgery.^{5},^{13} Although Best Fit Descending always achieved greater adjusted utilization, the differences were small because most ORs had time only for 0 or 1 add-on case, and, when 0 or 1, the 2 policies always resulted in the same scheduling decision. In the current article, we add both consideration of the scheduling of cases arriving the day before surgery and the risk of case cancellation. We tested 3 hypotheses.

**Hypothesis 1:** When Best Fit Descending or Worst Fit Descending are applied to case scheduling within 0 to 1 workday before the day of surgery, they achieve nearly the same overutilized time and thus nearly the same productivity (i.e., an absolute difference between the 2 strategies <1%).

Here, OR productivity is defined as the total workload/labor cost, where the workload is the total duration of cases and turnover times among ORs, and the labor cost = scheduled staff time + relative cost × overtime (i.e., time exceeding the scheduled staff time). In the absence of cognitive bias (see Hypothesis 3 below), the scheduled staff time should be the same as the allocated OR time or slightly greater because of predictive error in case durations. The labor cost ≅ allocated OR time + relative cost × overutilized time.
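The productivity definition above can be written out as a short calculation. The following is a minimal sketch, not the authors' implementation: the function name `productivity` and the example day (9 hours of workload in an OR staffed for 8 hours) are hypothetical, and the default relative cost of 1.75 is the baseline value stated in the Methods.

```python
def productivity(workload_h, scheduled_staff_h, overtime_h, relative_cost=1.75):
    """Productivity = workload / labor cost.

    Labor cost = scheduled staff time + relative cost x overtime,
    following the definition in the text. All times are in hours.
    """
    labor_cost = scheduled_staff_h + relative_cost * overtime_h
    return workload_h / labor_cost


# Hypothetical day: 9 h of cases and turnovers, staff scheduled for 8 h,
# so 1 h of overtime is worked.
example = productivity(workload_h=9.0, scheduled_staff_h=8.0, overtime_h=1.0)
```

Because overtime is weighted by the relative cost, an hour worked late lowers productivity more than an hour of regularly scheduled time raises the denominator.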

During the past decade, several articles have identified cognitive biases that influence OR case scheduling.^{14–17} One of the cognitive biases, risk-averse behavior, is prevalent among personnel running the OR control desk on the day of surgery.^{14} This bias can lead to greater overutilized times and, sometimes, case cancellations.^{14},^{a} Based on that article, in our study, we also examined the scheduling policy Worst Fit Ascending, since this policy resembles risk-averse behavior (Fig. 1).^{a} For decision making on the day of surgery, but neglecting the possibility of case cancellation as included in our current study, Worst Fit Ascending increased overutilized time and thus decreased productivity compared with Best Fit Descending or Worst Fit Descending.^{5} For single add-on cases in single ORs, the absolute difference in adjusted utilization was 3.3%.^{5}

**Hypothesis 2:** When Worst Fit Ascending is applied to case scheduling within 0 to 1 workday before the day of surgery and with case cancellation, under many typical scenarios, it causes a substantive (i.e., >1%) increase in overutilized time and a substantive reduction in productivity compared with Best Fit Descending and/or Worst Fit Descending.

Another bias occurs ubiquitously for decisions that involve the matching of staff scheduling to the workload: people and organizations tend to have a “pull to center” (i.e., mean) cognitive bias.^{15} Given the uncertainty in predicting the end of the workday for a service that is allocated a single OR, often OR staff are scheduled for too few hours (i.e., less than optimal given the variation among days in the hours of cases and turnover times, defined as the “workload”).^{10},^{11},^{18} Let “allocated time” refer to the hours into which cases are scheduled, calculated based on minimizing the inefficiency of use of OR time.^{1},^{6},^{8–12},^{13–18} The pull-to-center bias often results in anesthesiologists, nurse anesthetists, OR nurses, and other perioperative staff being scheduled for shifts that are the same or even less than the allocated hours.^{15},^{18}

Worst Fit Descending and Best Fit Descending tend to “squeeze” more cases into the allocated OR time (Fig. 1). When the scheduled staff time is less than the allocated OR time (e.g., from biases), such compression would cause more overtime and lead to lower productivity (see the formula for productivity above). The benefits of increasing productivity and reducing overutilized time gained by Worst Fit Descending and Best Fit Descending (i.e., as in Hypothesis 2) may then be partly or even fully attenuated.

**Hypothesis 3:** When, from cognitive bias, the scheduled staff time is based on the mean workload instead of the OR allocation that maximizes the efficiency of use of OR time, the difference in productivity of Worst Fit Descending and of Worst Fit Ascending (or Best Fit Descending and Worst Fit Ascending) is nearly the same as when the staff scheduling time is optimal (i.e., ≅ the allocated OR time).

## METHODS

This project was performed for quality improvement purposes. The Thomas Jefferson University IRB determined that this project did not meet the regulatory definition of human subjects research.

### Markov Chain Model

We consider a 2-OR setting. All situations of >1 OR are combinations of 2 ORs. Furthermore, rarely are there choices of >2 ORs for a case within 1 day of surgery when constraints such as suitability of the OR and surgeon are considered.

Cases are of 2 possible durations, long or brief (e.g., in the baseline scenario, 4.5 or 1.5 hours; see “Baseline Parameter Setting” below). The reason we use the 2-duration setting is that it simplifies the Markov chain model sufficiently for model results to be interpretable but maintains the ability to differentiate among scheduling policies. Dexter et al. (2003)^{19} found that for purposes of understanding *staff scheduling* policies, all cases could be of the same scheduled duration, since the crucial factors were the daily numbers of cases per OR and the variability in the numbers of those cases. Dexter and Traub (2002)^{13} found that when there are 2 or more new cases to be scheduled, the performance of *case* scheduling policies depends on whether the second new case is longer than the first new case. In the sensitivity analyses, multiple different pairs of durations are used.

We study the change of case schedules in 2 ORs between 2 periods: 1 and 0 working days before the day of surgery, respectively. We use *k* = 1 to denote the working day before surgery and *k* = 0 to denote the day of surgery. We consider 1 decision period per day, with the decision period being just before the final schedule is published at the end of the workday (e.g., 7 PM). This assumption is reasonable because very few cases are scheduled between then and the start of the day of surgery (e.g., 7 PM to 7 AM).^{1}

We track a 4-dimensional state at the beginning of each period, $(X_k^{L,1}, X_k^{B,1}, X_k^{L,2}, X_k^{B,2})$, where $X_k^{L,1}$ and $X_k^{B,1}$ denote the number of long and brief cases scheduled in OR 1 at the beginning of period *k*, respectively. $X_k^{L,2}$ and $X_k^{B,2}$ denote the same for OR 2. The state $(X_1^{L,1}, X_1^{B,1}, X_1^{L,2}, X_1^{B,2})$ is the initial state (at the beginning of period *k* = 1). The state at period *k* = 0 is the state at the beginning of the day of surgery. We use $(X_{\text{final}}^{L,1}, X_{\text{final}}^{B,1}, X_{\text{final}}^{L,2}, X_{\text{final}}^{B,2})$ to denote the final state after all cases have been performed on the day of surgery.

The evolution of the 4-dimensional state $(X_k^{L,1}, X_k^{B,1}, X_k^{L,2}, X_k^{B,2})$, from period *k* = 1 to period *k* = 0 and from period *k* = 0 to period *k* = final, follows 3 steps.

#### Step 1: Cancellation of Current Cases

At the beginning of period *k*, we assume that each of the currently scheduled cases has an equal probability *q* of being cancelled during period *k*, and these cancellations are independent of each other. Thus, *C*_{k}, the total number of cancellations within period *k*, is the sum of $C_k^{L,1}$, $C_k^{B,1}$, $C_k^{L,2}$, and $C_k^{B,2}$, representing the cancellations of long and brief cases in OR 1 and OR 2, respectively. Since we model independent cancellations, $C_k^{L,1}$ is binomial with parameters $X_k^{L,1}$ (the number of long cases scheduled in OR 1 at the beginning of period *k*) and *q*. We obtain the corresponding binomial distributions of the other 3 cancellation quantities ($C_k^{B,1}$, $C_k^{L,2}$, and $C_k^{B,2}$) similarly.
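The independent-cancellation assumption of Step 1 amounts to a binomial probability mass function for each of the 4 cancellation counts. The sketch below illustrates this for a single count; the function name `cancellation_pmf` and the example of 2 scheduled cases are hypothetical, and *q* = 0.025 is the baseline value given later in the Methods.

```python
from math import comb


def cancellation_pmf(n_cases, q):
    """Probability of c cancellations, c = 0..n_cases, when each of the
    n_cases scheduled cases cancels independently with probability q."""
    return [comb(n_cases, c) * q**c * (1 - q) ** (n_cases - c)
            for c in range(n_cases + 1)]


# Hypothetical example: 2 long cases scheduled in one OR, baseline q.
pmf = cancellation_pmf(2, 0.025)
# pmf[0] is the probability that neither case cancels during the period.
```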

Cancellations on the day of surgery (period *k* = 0) are sometimes correlated events.^{3},^{20},^{21} For example, a health system’s cancellation rate on the day of surgery (4.6% ± 0.3% by cases) among surgeons with at least 1 other cancelled case was greater than the overall cancellation rate (1.8% ± 0.1% by cases). To model this correlation among case cancellations, we would need to track which case belongs to which surgeon in the model, resulting in a state space of much higher dimension. This would make the Markov chain analysis numerically challenging to perform. Instead, in sensitivity analyses below, we test a large range of cancellation rates.

#### Step 2: Arrival of New Cases

After the *C*_{k} cancellations have occurred, *X*_{k} − *C*_{k} cases remain, where $X_k = X_k^{L,1} + X_k^{B,1} + X_k^{L,2} + X_k^{B,2}$ is the total number of cases scheduled in the 2 ORs at the beginning of period *k*. A total number *A*_{k} of new cases subsequently arrive during the period *k*. We consider *A*_{k} to follow a binomial distribution^{b} with parameters 3 and *p*_{add}. The first parameter of 3 shows that we consider there to be at most 3 new cases during the period *k*. For the modeling, some maximum is needed, and the value 3 was chosen as the smallest integer large enough to have no influence on the results.^{c} The second parameter *p*_{add} is not a constant. Days with fewer hours of elective cases have more scheduling of add-on cases and vice versa,^{22} a phenomenon that is likely caused by schedulers’ behavior when the allocated time is full.^{1},^{12} We use the number of cases remaining (*X*_{k} − *C*_{k}), and consider *p*_{add} to be a decreasing function of that number defined through $F_{\mu,d}$, where $F_{\mu,d}$ is the cumulative distribution function for a normal distribution with mean μ and SD *d* (see Table 1 with μ = 3, *d* = 1). Although the form of *p*_{add} is parsimonious and leads to model output that matches well with empirical observations on the relation between case additions and scheduled case hours, the function is hypothetical.^{1} Therefore, we test various functions in the sensitivity analyses.

After we generate the total number of new cases *A*_{k} arriving within period *k*, we then determine how many of them are long cases and how many are brief cases. We assume the proportion of long cases = *p*_{long} (and the proportion of brief cases = 1 − *p*_{long}). Correspondingly, the number of long cases, $A_k^{L}$, follows a binomial distribution with parameters *A*_{k} and *p*_{long}. The number of brief cases is $A_k^{B} = A_k - A_k^{L}$.
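The 2-stage arrival model of Step 2 (a binomial number of new cases, then a binomial split into long and brief) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. In particular, the specific form `p_add = 1 - F(remaining)` is an assumption chosen only because it decreases with the number of remaining cases, as the text requires; the exact function used in the article is given in its Table 1. All function names are hypothetical.

```python
from math import comb, erf, sqrt


def normal_cdf(x, mu, sd):
    """Cumulative distribution function of a normal(mu, sd) variable."""
    return 0.5 * (1.0 + erf((x - mu) / (sd * sqrt(2.0))))


def p_add(remaining_cases, mu=3.0, sd=1.0):
    # ASSUMPTION: decreasing form 1 - F(remaining); the article's exact
    # functional form is not reproduced here.
    return 1.0 - normal_cdf(remaining_cases, mu, sd)


def arrival_pmf(remaining_cases, p_long=0.5, max_new=3):
    """Joint distribution {(n_long, n_brief): probability} of new cases.

    Total arrivals are Binomial(max_new, p_add); given the total, the
    number of long cases is Binomial(total, p_long).
    """
    p = p_add(remaining_cases)
    pmf = {}
    for total in range(max_new + 1):
        p_total = comb(max_new, total) * p**total * (1 - p) ** (max_new - total)
        for n_long in range(total + 1):
            p_split = (comb(total, n_long) * p_long**n_long
                       * (1 - p_long) ** (total - n_long))
            pmf[(n_long, total - n_long)] = p_total * p_split
    return pmf
```

Under this assumed form, fewer remaining cases give a larger `p_add`, so lightly booked days receive more add-on cases on average.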

#### Step 3: Schedule New Cases into the 2 ORs

The $A_k^{L}$ long cases and $A_k^{B}$ brief cases that arrived within period *k* are scheduled into the 2 ORs. The scheduling depends on which scheduling policy we use (see Introduction):

- When we use Best (or Worst) Fit Descending, the $A_k^{L}$ long cases are scheduled first. For Worst Fit Ascending, the $A_k^{B}$ brief cases are scheduled first.
- Given a new case to schedule, we choose between the 2 ORs according to the following rules. If we use Best Fit Descending, we schedule the new case to the room that has the *least remaining time* as long as doing so would not result in scheduling the case into overutilized time. For example, in Figure 1A, the long case is first scheduled to OR 1, whereas in Figure 1B it is scheduled to OR 2. If we use Worst Fit Descending (Ascending), we schedule the case to the room in which the starting time would be the earliest, which is OR 1 for both Figure 1A and 1B. When there is a tie between OR 1 and OR 2 using any of the scheduling policies, we assume the case is scheduled into OR 1. This has no effect on conclusions because OR 1 and OR 2 are arbitrary.
- When the allocated hours for the 2 ORs are the same, Best Fit Descending (i.e., Least Remaining Time) is equivalent to Latest Start Time,^{13} and Worst Fit Descending (Ascending) schedules each case to the OR with the earliest start time. Earliest Start Time lets the surgeon and patient finish as early as possible, whereas Least Remaining Time enables a better “packing” if there were a subsequent case scheduled.^{13}
- If, for either OR, the new case has to be scheduled completely into overutilized time, we choose the OR with the earliest starting time regardless of the scheduling policy. Thus, the 3 scheduling policies are the same when the scheduling of a new case (or a series of new cases) results in expected overutilized time in both ORs.
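The rules in the list above can be sketched as a choice function for a single new case. This is a simplified illustration, not the authors' implementation: the names `order_new_cases` and `choose_or` are hypothetical, `occupied_h` is assumed to be the hour at which a new case could start in each OR (scheduled cases plus turnovers), and falling back to the earliest start whenever the case fits in neither OR is a simplification of the rule for cases scheduled into overutilized time.

```python
def order_new_cases(durations_h, descending=True):
    """Descending policies consider the longest new case first;
    the Ascending policy considers the shortest first."""
    return sorted(durations_h, reverse=descending)


def choose_or(policy, occupied_h, allocated_h, case_h):
    """Return 0 for OR 1 or 1 for OR 2 for one new case.

    policy      -- "best_fit" or "worst_fit"
    occupied_h  -- per-OR hours already occupied (the new case's start time)
    allocated_h -- per-OR allocated hours
    case_h      -- duration of the new case in hours
    """
    remaining = [allocated_h[i] - occupied_h[i] for i in (0, 1)]
    fitting = [i for i in (0, 1) if remaining[i] >= case_h]
    if policy == "best_fit" and fitting:
        # Least remaining time among ORs where the case fits; ties -> OR 1.
        return min(fitting, key=lambda i: remaining[i])
    # Worst Fit, or no OR fits: earliest start time; ties -> OR 1.
    return min((0, 1), key=lambda i: occupied_h[i])
```

For example, with 8 hours allocated to each OR and 5 and 4 hours already occupied, `choose_or("best_fit", [5.0, 4.0], [8.0, 8.0], 1.5)` selects OR 1 and the same call with `"worst_fit"` selects OR 2.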

The scheduling policy determines $A_k^{L,1}$, $A_k^{B,1}$, $A_k^{L,2}$, and $A_k^{B,2}$, the number of new long and brief cases scheduled into OR 1 and OR 2, respectively. Then, the number of long cases scheduled in OR 1 at the beginning of period *k* − 1 equals $X_{k-1}^{L,1} = X_k^{L,1} - C_k^{L,1} + A_k^{L,1}$, that is, the number scheduled at the beginning of period *k*, minus the cancellations, plus the new cases. The same relationship applies to short duration cases and to OR 2. Consequently, the probability of transitioning from $X_k^{L,1} = i$ to $X_{k-1}^{L,1} = i - c + a$ equals $\Pr(C_k^{L,1} = c,\ A_k^{L,1} = a \mid X_k^{L,1} = i)$. Using the probability distributions specified in steps 1 to 3, we calculate these probabilities of transition from *i* to *i* − *c* + *a* for all possible values of *i*, *c*, *a*. Thus, we know the transition probability from each possible value of $X_k^{L,1}$ to each possible value of $X_{k-1}^{L,1}$, as well as the transition probabilities for the other 3 quantities ($X_k^{B,1}$, $X_k^{L,2}$, and $X_k^{B,2}$) in the 4-dimensional state. The overall result is the transition matrix for the Markov chain. With the transition matrix, we calculate the probability distribution of the final state $(X_{\text{final}}^{L,1}, X_{\text{final}}^{B,1}, X_{\text{final}}^{L,2}, X_{\text{final}}^{B,2})$ from the initial state. From this final distribution, we calculate the mean overutilized time and productivity per OR.
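The last step, turning a final-state distribution into a mean overutilized time, can be sketched for a single OR as follows. The function names and the example distribution are hypothetical. The workload counts one turnover between consecutive cases, matching the calculation 12 × 1.5 + 11 × 0.5 used in the Computational Setting section; default durations are the baseline values.

```python
def workload_hours(n_long, n_brief, long_h=4.5, brief_h=1.5, turnover_h=0.5):
    """Hours of cases plus turnovers for one OR (n - 1 turnovers for n cases)."""
    n_cases = n_long + n_brief
    if n_cases == 0:
        return 0.0
    return n_long * long_h + n_brief * brief_h + (n_cases - 1) * turnover_h


def mean_overutilized(final_dist, allocated_h):
    """Expected overutilized hours for one OR.

    final_dist maps (n_long, n_brief) to its probability in the final state.
    """
    return sum(prob * max(0.0, workload_hours(n_long, n_brief) - allocated_h)
               for (n_long, n_brief), prob in final_dist.items())


# Hypothetical final distribution for an OR allocated 8 hours.
example_dist = {(2, 0): 0.5, (1, 1): 0.5}
example_mean = mean_overutilized(example_dist, allocated_h=8.0)
```

In the full model the same expectation would be taken over the joint 4-dimensional final state of both ORs rather than over one OR in isolation.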

### Baseline Parameter Setting

The parameters for this Markov chain model were estimated based on the data from Thomas Jefferson University Hospital (Table 2; Supplemental Digital Content 1, Supplemental Table A, http://links.lww.com/AA/B280). We first created a baseline scenario with the parameters to be specified below. We tested the validity of the Markov chain model by comparing the output from this baseline scenario with empirical data. Then, we conducted sensitivity analysis by changing the parameter values in this baseline scenario.

In the baseline scenario, OR 1 was allocated 10 hours, OR 2 was allocated 8 hours, and we assumed that the staff scheduling matched the OR allocations (i.e., no cognitive bias; see Introduction). We considered the relative cost of overutilized to regularly scheduled OR time (see Introduction) to equal 1.75 for computing the OR allocations and productivity.^{9–11},^{23},^{24} When calculating productivity in the presence of decisions made based on cognitive bias, we used a relative cost of 4.00, as surveyed, because the medical staff are working late unexpectedly (i.e., after the end of their scheduled shifts).^{25} The relative cost is greater because the intangible/indirect costs of working late (e.g., staff dissatisfaction, increased staff turnover, recruiting costs) are greater than when working late is planned (e.g., the individual is scheduled to be on-call to work late if necessary). This applies also to salaried physicians, even without greater direct payments made for working late. If there were no related intangible/indirect costs, then, by such an argument, anesthesiologists performing the same total hours of cases would be equally satisfied whether: (1) doing cases anytime 24 hours per day 7 days per week with no predictability or (2) exclusively during regularly scheduled hours. This is not so.^{26–28} The value of 4.00 includes the cost (e.g., lost revenue) if an anesthesiologist shortage reduces the number of cases performed.

The 2 possible durations for each case were 4.5 and 1.5 hours. These 2 durations are within the typical ranges for case durations empirically observed at Thomas Jefferson University Hospital and another academic hospital studied (see Table 1 in He et al.^{29}), and they are consistent with the observation from a third large surgical suite where the mean total workload (hours of cases and turnovers) was 7.8 hours with a mean of 2.5 cases per workday per OR.^{30} The duration for the long case was chosen to make it possible to pack 2 long cases into an OR without expected overutilized time, given a 10-hour OR allocation. Meanwhile, the choice of these 2 durations made it sufficient for us, with a 0.5-hour turnover time,^{31} to differentiate among different scheduling policies in many scenarios. For example, with 8 and 8 hours of OR time allocated, suppose OR 1 initially had 1 long case (4.5 hours) and OR 2 initially had 2 short cases (1.5 hours each), with a 0.5-hour turnover time between the cases. Then, there would be 3 hours of underutilized time in OR 1, but 4 hours in OR 2, where 3 = 8 − (4.5 + 0.5) and 4 = 8 − (2 × 1.5 + 0.5 + 0.5). As a result, if a new short case were submitted, then Best Fit and Worst Fit would schedule the new case into different ORs. However, if we had set the short case duration to be 2 hours, then having 1 long case (4.5 hours) and 2 short cases (2 hours each) both would have resulted in 3 hours of underutilized time in each OR, making the choice between the 2 scheduling policies indifferent. Achieving an overall mean 3.0 hours duration of cases, we assume that *p*_{long} = 0.5 (i.e., each case had an equal chance [50% and 50%] to take either duration [4.5 or 1.5 hours]). In the sensitivity analyses, we test several other combinations of the possible case durations and different values of *p*_{long} (see details in Methods’ section “Sensitivity Analyses,” below).
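The arithmetic in the example above can be checked directly. In this minimal sketch, the function name `underutilized_hours` is hypothetical, and a turnover is counted after every scheduled case, as in the text's calculations 8 − (4.5 + 0.5) and 8 − (2 × 1.5 + 0.5 + 0.5).

```python
def underutilized_hours(allocated_h, case_hours, turnover_h=0.5):
    """Remaining allocated time when a turnover follows each scheduled case."""
    return allocated_h - sum(h + turnover_h for h in case_hours)


# Baseline durations: 1 long case in OR 1 versus 2 short cases in OR 2.
baseline = (underutilized_hours(8, [4.5]), underutilized_hours(8, [1.5, 1.5]))

# Alternative with 2-hour short cases: both ORs have the same remaining time.
alternative = (underutilized_hours(8, [4.5]), underutilized_hours(8, [2.0, 2.0]))
```

With the baseline durations the 2 ORs differ in remaining time (3 versus 4 hours), so Best Fit and Worst Fit make different choices; with 2-hour short cases the remaining times are equal and the policies cannot be distinguished.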

In the baseline scenario, we use the arrival model described above, with *p*_{add} depending on the number of cases remaining. The 2 parameters for computing *p*_{add} are μ = 3 and *d* = 1. The cancellation parameter *q* = 0.025 corresponded to approximately 5% of cases cancelled within 1 workday of surgery, typical for many academic hospitals.^{3},^{21},^{32} In the sensitivity analyses, we test alternative arrival models and different values for the cancellation parameter *q*.

### Computational Setting

The computation of the Markov chain model was implemented in MATLAB 2012b (The MathWorks, Inc., Natick, MA). We constructed a finite-dimensional transition matrix, with no more than 12 cases scheduled into each OR. This had no effect on our results,^{d} because even the minimum workload of an OR with 12 cases (all 12 brief) filled nearly the entire day: 23.5 hours = 12 × 1.5 + 11 × 0.5, the 0.5 representing the half-hour turnover time. In addition, the arrival model we used (Table 1) ensured that no more than 12 cases would ever be present in an OR. These restrictions are described under “Computational Setting” because they had no effect on our results^{d} but decreased the time to complete calculations. It took 20 to 25 minutes to run each scenario (baseline or other sensitivity analysis scenarios) on a Mac laptop with 1.8-GHz Intel Core i5 processor and 8-GB memory.

### Validity of the Markov Chain Model

Under the aforementioned parameter setting in the baseline scenario, using any of the 3 scheduling policies, the mean total workload from the performed cases (including turnover times) in the 2 ORs combined equaled 16.28 hours, and the SD was 3.92 hours. The optimal OR allocation based on minimizing the inefficiency of use of OR time under this total workload was 18 hours, matching our allocations for OR 1 and OR 2 (10 hours and 8 hours, respectively). Moreover, with the “baseline” initial distribution,^{e} 87.7% of the cases that were finally performed were scheduled initially into the 2 ORs before the workday before surgery. Among the remaining 12.3% of cases, 68.3% (8.4% of the total) were scheduled 1 workday before surgery and the remaining 31.7% (3.9% of the total) were scheduled on the day of surgery. These statistics matched the empirical observations on the movement of cases among ORs, cancellations, and additional cases scheduled within 1 workday before the day of surgery.^{1} See the detailed comparison in Table 2. The closeness between the model output and the empirical observations supported the validity of our Markov chain model.

### Sensitivity Analyses

We conducted multiple sensitivity analyses to evaluate whether our conclusions for the 3 hypotheses were sensitive to the parameter settings.

#### Initial Distribution

We summarized the initial distributions we tested for sensitivity analysis in Table 3. In the baseline scenario (“Sym 1”), by the start of the workday before surgery, each OR was scheduled with cases having expected total hours (including turnover time) equal to approximately 80% of the allocated OR time. Ambulatory surgery centers and hospitals with few patients who are inpatient preoperatively will have even greater percentages of ORs full 1 workday ahead of our baseline scenario. Thus, we test the initial distributions “Sym 2,” under which the 2 ORs were almost full initially.

In the baseline scenario, both ORs had cases scheduled (i.e., a symmetric situation^{f}). However, in some hospitals, an entire OR may be kept empty until the working day before surgery to facilitate the scheduling of patients who are inpatients. We test 2 initial distributions that were unsymmetrical, “UnSym 1” and “UnSym 2.”

#### Cancellation Rate

Recall that in the baseline scenario (see the aforementioned section “Baseline Parameter Setting”), the cancellation parameter *q* = 0.025 corresponded to approximately 5% of cases cancelled within 1 workday before the day of surgery. In certain hospitals where ORs were set aside for patients being inpatient preoperatively,^{3} >12% of such cases could be cancelled within 2 workdays before surgery.^{4} Thus, we tested a larger cancellation parameter, *q* = 0.075, which corresponded to approximately 14% of cases cancelled within 1 workday before surgery. We also tested *q* = 0.125 as an extreme scenario to examine our hypotheses when as many as 23% of cases can be cancelled within 1 working day before the day of surgery. The latter value matches that reported by other academic hospitals.^{33},^{34}
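The correspondence between *q* and the overall percentage of cases cancelled can be reproduced with a short calculation. This sketch assumes, as an interpretation rather than a statement from the article, that a case scheduled before period *k* = 1 is exposed to an independent cancellation probability *q* in each of the 2 periods; the function name `fraction_cancelled` is hypothetical.

```python
def fraction_cancelled(q, periods=2):
    """Probability that a case is cancelled in at least one of the periods,
    assuming independent per-period cancellation probability q."""
    return 1.0 - (1.0 - q) ** periods


# Percent of cases cancelled for the 3 tested values of q.
percent_cancelled = {q: 100.0 * fraction_cancelled(q)
                     for q in (0.025, 0.075, 0.125)}
```

Under this assumption the 3 values come to roughly 4.9%, 14.4%, and 23.4%, in line with the approximately 5%, 14%, and 23% quoted above.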

#### Arrival Process

Recall that in the baseline scenario (see “Markov Chain Model”), we let *p*_{add} be a function of the number of cases remaining (*X*_{k} − *C*_{k}), so that the number of new arrivals in each period depends on the current number of scheduled cases. In most existing research, the arrival process was assumed independent of the scheduled cases. Thus, as a sensitivity analysis, we tested the scenario in which *p*_{add} is a constant so that the number of new arrivals became independent of the scheduled cases. In addition, we tested 2 alternative forms for the arrival probabilities. Instead of specifying a functional form for the parameter *p*_{add}, we directly specified the case arrival probabilities under different scenarios. The details for these 2 alternative forms are specified in Supplemental Tables B and C in the Supplemental Digital Content (http://links.lww.com/AA/B280).

#### Durations of Long and Short Cases

Besides the durations of 4.5 hours and 1.5 hours that we used in the baseline scenario, we also tested scenarios in which the difference between the long and the short duration cases was less or greater. We kept the 50% to 50% ratio among long and short cases and tested scenarios with (1) 4.0-hour and 2.0-hour case durations (lesser difference) or (2) 5.0-hour and 1.0-hour case durations (greater difference). Each represents a change in durations equal to 0.5 hours.

Finally, we tested scenarios wherein the proportions of long and short cases were not symmetric. Keeping the overall mean case durations approximately 3.0 hours, we tested 2 more combinations: (3) 5.5 hours for long cases (37%) and 1.5 hours for short cases (63%) and (4) 6 hours for long cases (40%) and 1 hour for short cases (60%).
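The tested duration pairs all keep the overall mean case duration at approximately 3.0 hours, which can be verified as below. The function name `mean_duration` and the scenario labels are hypothetical; the durations and proportions are those listed in the text.

```python
def mean_duration(long_h, short_h, p_long):
    """Mean case duration for a 2-duration mix with proportion p_long long."""
    return p_long * long_h + (1.0 - p_long) * short_h


# (long hours, short hours, proportion of long cases)
scenarios = {
    "baseline": (4.5, 1.5, 0.50),
    "lesser difference": (4.0, 2.0, 0.50),
    "greater difference": (5.0, 1.0, 0.50),
    "asymmetric 3": (5.5, 1.5, 0.37),
    "asymmetric 4": (6.0, 1.0, 0.40),
}
means = {name: mean_duration(*params) for name, params in scenarios.items()}
```

Only the 5.5-hour and 1.5-hour combination deviates from exactly 3.0 hours, with a mean of 2.98 hours.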

## RESULTS

**Hypothesis 1:** When Best Fit Descending or Worst Fit Descending are applied to case scheduling within 0 to 1 workday before the day of surgery, they achieve nearly the same overutilized time and thus nearly the same productivity (i.e., an absolute difference between the 2 strategies <1%).

Among all the tested scenarios (i.e., different combinations of the initial distributions and cancellation rate) in Table 4, Best Fit Descending achieved the least overutilized OR time and greatest productivity, matching results for single cases (from Dexter et al.).^{5} In addition, the Best Fit Descending and Worst Fit Descending scheduling policies achieved similar (a) mean overutilized time per OR per workday and (b) (total) mean overutilized time per pair of ORs per workday. The maximum difference between the mean overutilized times was small (1.14 minutes per OR per workday), as was the maximum difference between the productivities (0.2%^{g}), supporting hypothesis 1. We do not report a *P* value for this and the other differences in the paper because all the results are calculated directly from the Markov chain model (i.e., there is no sampling error).

**Hypothesis 2:** When Worst Fit Ascending is applied to case scheduling within 0 to 1 workday before the day of surgery and with case cancellation, under many typical scenarios, it causes a substantive (i.e., >1%) increase in overutilized time and a substantive reduction in productivity compared with Best Fit Descending and/or Worst Fit Descending.

In comparison with Best Fit Descending and Worst Fit Descending, Worst Fit Ascending resulted in both greater (a) mean overutilized time per OR per workday and (b) mean overutilized time per pair of ORs per workday. This ordered relationship again matched that of the scheduling of single add-on cases.^{5} In Table 4, the maximum difference between the mean overutilized times per OR per workday was greater, and the maximum difference between the productivities was 1.10%, which confirms Hypothesis 2 (i.e., Worst Fit Ascending can cause substantively less productivity).^{h}

**Hypothesis 3:** When, from cognitive bias, the scheduled staff time is based on the mean workload instead of the OR allocation that maximizes the efficiency of use of OR time, the difference in productivity of Worst Fit Descending and of Worst Fit Ascending (or Best Fit Descending and Worst Fit Ascending) is nearly the same as when the staff scheduling time is optimal (i.e., ≅ the allocated OR time).

The mean overutilized time did not change when the scheduled staff time was briefer than the optimal value (Table 5). This follows essentially by definition, because scheduling was performed based on the OR allocations and overutilized time is calculated with respect to the allocated OR time. However, overtime is defined as the time exceeding the scheduled staff time. When the staff time is not optimal, the overtime differs from the overutilized time. Consequently, under each scheduling policy, the suboptimal staff scheduling significantly reduced the productivity.

Table 5 shows that the differences in overtime between Worst Fit Ascending and Worst Fit Descending (or between Worst Fit Ascending and Best Fit Descending) were less (or comparable) when the scheduled staff time was briefer than the optimal value. This observation matches our conjecture (see Introduction) that Best Fit Descending and Worst Fit Descending tend to squeeze cases into the allocated OR time, and when the scheduled staff time is less than the allocated time, such behavior can cause more overtime (i.e., more hours worked late). The differences in productivity among the scheduling policies were nearly the same (0.2% less to 0.6% greater).

### Sensitivity Analyses

All 3 hypotheses listed in the Introduction were supported both by the “Tests of the hypotheses” (above) using the baseline scenario and these Sensitivity Analyses.

#### Sensitivity to the Initial Distribution

Under the baseline initial distribution Sym 1, the 2 ORs initially had extra capacity (Table 3, top left panel). In contrast, under the initial distribution Sym 2, the 2 ORs initially were almost full (Table 3, top right panel). The consequence was that differences in the overutilized time among the scheduling policies were less than for the baseline scenario. The mean differences in overutilized time between Worst Fit Ascending and Worst Fit Descending or between Worst Fit Ascending and Best Fit Descending were <1 minute per OR per workday. Because the ORs were almost full initially, the new cases were more likely to be scheduled into overutilized time. When a case was scheduled into overutilized time, the 3 scheduling policies were effectively the same. The result under this Sym 2 initial distribution did not affect our conclusion for Hypothesis 1. It also did not affect our conclusion for Hypothesis 2, given that it is an example of an atypical scenario resulting in small differences among scheduling policies.

Under the initial distributions UnSym 1 and UnSym 2 (Table 3, bottom panels), OR 2 was scheduled with fewer hours of cases initially than for the baseline initial distribution Sym 1. Using these 2 alternative initial distributions, the maximum difference between Worst Fit Ascending and the Descending policies in overutilized time was less than the gap in the baseline scenario (Table 4). The maximum difference in productivity was 0.6%, which was not substantive. Our conclusion for Hypothesis 2 remained unchanged, because the direction of the relationship between Worst Fit Ascending and the Descending policies held across scenarios; only whether the difference was substantive changed.

#### Sensitivity to the Cancellation Rate *q*

When the cancellation rate *q* was greater but not sufficiently large to decrease the OR allocation, there was less overutilized time for all 3 scheduling policies (Table 4, such as *q* = 0.075 vs *q* = 0.125 under Sym 1 initial distribution). The productivity also was less, even though the overutilized time was reduced. This was because the decrease in the total duration of cases (i.e., the numerator in the formula for productivity) exceeded the decrease in the labor cost (i.e., denominator in the formula for productivity). These findings indicate validity of the methodology.

In the experiments we presented in Table 4, greater cancellation rates caused smaller total workloads. However, hospitals with substantial cancellation rates have, in practice, more cases scheduled (i.e., most cancelled cases are rescheduled to be performed on future dates).^{2} Thus, the total workload remains almost the same.^{2} From a modeling perspective, among hospitals, there is a correlation between the cancellation rate and the arrival rate. To add this to our baseline Markov chain model, in a new set of experiments, when we increased the cancellation parameter *q*, we also increased the arrival parameter (μ), so that the workload would remain similar, despite different cancellation rates.

Table 6 summarizes the numerical results for this new set of experiments. When more cases were cancelled within 1 workday before the day of surgery (but the total workload remained similar), the overutilized time was greater and the productivity was less under the same scheduling policy. Case cancellations bring more uncertainty into the scheduling process and thus undermine the performance of each scheduling policy. When we compared the different cancellation rates (*q* = 0.0, 0.025, and 0.075), the differences among the 3 scheduling policies became smaller when more cases were cancelled. This observation shows that our conclusions for Hypotheses 1 and 2 were not affected by the model for cancellation. In particular, note that in Table 6, even when *q* = 0.0 (i.e., no cancellations), the 3 scheduling policies showed a similar direction of relationship as in the baseline scenario (i.e., our conclusions were unchanged).

#### Sensitivity to the Arrival Process

The results in Tables 4 and 5 are based on the number of new cases depending on the current number of scheduled cases. Table 7 shows the results for scenarios where the number of new cases was unrelated to the number of scheduled cases (i.e., the parameter *p*_{add} was a constant). The differences among the 3 scheduling policies became smaller. The greatest difference in overutilized time was <1 minute per OR per day, and the productivity differed by at most 0.2% among the 3 policies. These observations suggest that the dependence between arrival and scheduled number of cases (as observed empirically)^{1} is an important feature to consider when modeling case addition and cancellation within a workday before surgery.

Table 7 also shows the results for scenarios with alternative arrival settings (see Supplemental Tables B and C in the Supplemental Digital Content for the alternative settings, http://links.lww.com/AA/B280). Under these alternative arrival settings, we observed that the differences in overutilized time between Best Fit Descending and Worst Fit Descending became greater than under the baseline arrival setting. However, the differences in productivity were still quantitatively unimportant (<0.5%). For Hypothesis 2, Worst Fit Ascending still resulted in greater overutilized time (the largest gap between Worst Fit Ascending and Worst Fit Descending was 4.92 minutes per OR per day). Meanwhile, the difference in productivity was 1.1% per OR per day. These observations show that the conclusions of our hypotheses are not sensitive to the form of *p*_{add}.

#### Sensitivity to the Difference Between the Durations of Long and Short Cases

The results in Tables 4 and 5 are based on cases of durations of 4.5 hours and 1.5 hours. When the durations of the long and short cases were changed to 4.0 hours and 2.0 hours (a lesser difference in case durations), the SD of the final workload was less, as were the differences in the mean overutilized times and productivities among the 3 scheduling policies (Table 8). In contrast, when the durations of the long and short cases were changed to 5.0 hours and 1.0 hour, the SD of the final workload was greater, as were the differences in the mean overutilized times and productivities among scheduling policies. For example, when using the combination of 5.0 hours and 1.0 hour, the difference was 6.60 minutes in the mean overutilized time between Worst Fit Ascending and Best Fit Descending, and 1.6% in the productivity, both greater than the corresponding differences under the baseline setting. This was expected, because the case durations affect the bin-packing (packing cases into the ORs), which in turn affects the performance of the different scheduling policies. All conclusions of the 3 hypotheses were unchanged under the tested combinations of case durations.
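To illustrate how case durations interact with the bin-packing, the following is a minimal deterministic sketch of the 3 policies for 2 ORs. It is not the Markov chain model of the study: it ignores cancellations and random arrivals, the function names are hypothetical, and the fallback rule when no OR has enough remaining time (choose the OR with the most remaining time) is an assumption made for illustration. The remaining hours (5.0 and 2.0) are invented; the case durations are the baseline 4.5 and 1.5 hours.

```python
from typing import List


def choose_or(remaining: List[float], duration: float, best_fit: bool) -> int:
    """Return the index of the OR that receives a case of the given duration.

    Among ORs with enough remaining allocated time, Best Fit takes the one
    with the least remaining time and Worst Fit the one with the most.
    Assumption: if no OR fits, use the OR with the most remaining time.
    """
    fitting = [i for i, r in enumerate(remaining) if r >= duration]
    if fitting:
        pick = min if best_fit else max
        return pick(fitting, key=lambda i: remaining[i])
    return max(range(len(remaining)), key=lambda i: remaining[i])


def total_overutilized(
    remaining: List[float], cases: List[float], best_fit: bool, descending: bool
) -> float:
    """Schedule all cases and return the total overutilized hours."""
    remaining = list(remaining)  # copy so the caller's list is unchanged
    for duration in sorted(cases, reverse=descending):
        remaining[choose_or(remaining, duration, best_fit)] -= duration
    return sum(max(0.0, -r) for r in remaining)


# Hypothetical remaining allocated hours in OR 1 and OR 2.
start = [5.0, 2.0]
new_cases = [1.5, 4.5]  # one brief and one long case (baseline durations)

bfd = total_overutilized(start, new_cases, best_fit=True, descending=True)
wfd = total_overutilized(start, new_cases, best_fit=False, descending=True)
wfa = total_overutilized(start, new_cases, best_fit=False, descending=False)
print(bfd, wfd, wfa)
```

In this invented instance, both Descending policies place the long case first and fit both cases within the allocated time, whereas Worst Fit Ascending places the brief case into the emptier OR and then pushes the long case 1 hour into overutilized time.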

## DISCUSSION

In this article, we evaluated 3 hypotheses related to the scheduling of a series of cases within 1 workday before the day of surgery. The paper applies to (i.e., is limited to) hospitals, because hospitals have patients scheduled the working day before surgery and the day of surgery.^{3},^{4} Compared with the previous study >15 years ago,^{5} by applying the results of empirical analyses made possible by substantial advances in informatics,^{1},^{7},^{12–14},^{30} and by using Markov models to represent the decision options, we could explicitly consider the risk of case additions and cancellations.

### Applications

Results for Hypothesis 1 show why an important feature of an OR information system is a cue (e.g., to the scheduler) if a case will be scheduled into overutilized OR time. Effectively, this is as simple as a checklist item 1: “the case will be scheduled into overutilized time; confirm with the medical director.” As emphasized in Dexter and Epstein (2016),^{35} it is important that this be a medical director and that he or she understands the relevant science.^{1},^{4},^{35–39} Such an approach of 1 simple cue applies only when the hours into which cases are scheduled (i.e., the OR allocation) are calculated based on maximizing the efficiency of use of OR time.^{1},^{2},^{6},^{8–15} None of the 3 scheduling policies that we studied can be used if the optimal OR allocations are not applied, because all 3 policies depend on knowing when there is overutilized time in an OR. Prior studies addressed the organizational challenges and solutions to configure the right OR allocations (e.g., have an autocratic leader and education in the science of OR management).^{36},^{40–43}

Hypothesis 3 further highlights the importance of matching the staff scheduling to the hours that the staff work. When cognitive bias is present and staff scheduling is based on mean workload instead of OR allocations,^{7},^{12},^{15},^{18},^{41} all 3 scheduling policies result in significantly longer hours worked late (i.e., greater overtime), and productivity is reduced significantly (Table 5). Matching staff scheduling with the optimal OR allocation requires no additional staff, because they are already doing the work; the benefit is a reduction in their hours working late.

Finally, provided that staff scheduling is being done rationally, Hypothesis 2 suggests that there is a small but often substantive benefit in scheduling cases in descending sequence of duration (i.e., among new cases, first consider the longest duration case and then briefer cases). Since hospitals often have >1 new case on the working day before surgery, our findings suggest value to expanding the brief checklist, to compensate for the risk-averse decision maker, by including an item 2: “cases considered in descending sequence, when possible.” (By “when possible” we mean, for example, that it is sufficiently safe for patients to wait for surgery to start and that surgeon availability matches the proposed order of cases.) Nevertheless, from Hypothesis 1, the most important thing is not how these cases are scheduled into ORs (i.e., Best Fit or Worst Fit), but that they are not scheduled into overutilized OR time (checklist item 1), and when feasible considered in descending sequence of duration (checklist item 2).

## CONCLUSIONS

When explicitly considering case additions and cancellations within 1 workday before the day of surgery, we found that Best Fit Descending and Worst Fit Descending achieved less mean overutilized time and greater productivity than Worst Fit Ascending. As long as the scheduler does not schedule a case into overutilized time when less overutilized time could be achieved in another OR, the differences in the overutilized time and productivity among the 3 scheduling policies are small, but still substantive. The implication is, when possible (e.g., safe patient waiting times are met and the surgeon is available), to consider scheduling additional cases in descending sequence of duration. Scheduling office decision making within 1 day before surgery should be based on statistical forecasts of expected total OR workload (i.e., forecasts that include the addition of nonelective cases and the subtraction of cases that cancel). Cognitive bias in staff scheduling causes a significant reduction in productivity, but the differences among scheduling policies are nearly the same.

## DISCLOSURES

**Name:** Pengyi Shi, PhD.

**Contribution:** This author helped design the study, conduct the study, analyze the results, and write the manuscript. This author is the archival author for the MATLAB codes.

**Attestation:** Pengyi Shi has approved the final manuscript.

**Name:** Franklin Dexter, MD, PhD.

**Contribution:** This author helped design the study, analyze the data, and write the manuscript.

**Attestation:** Franklin Dexter has approved the final manuscript.

**Name:** Richard H. Epstein, MD, CPHIMS.

**Contribution:** This author helped conduct the study and write the manuscript. This author is the archival author for the data set from the academic hospital.

**Attestation:** Richard H. Epstein has approved the final manuscript.

## RECUSE NOTE

Franklin Dexter is the Statistical Editor for *Anesthesia & Analgesia*. This manuscript was handled by Dr. Steven L. Shafer, Editor-in-Chief, and Dr. Dexter was not involved in any way with the editorial process or decision.

**FOOTNOTES**

a Personal communication, Pieter Stepaniak, Tuesday, October 8, 2013. Around 12 noon, with 1 OR having allocated time open through 5:00 PM, a nonrisk-averse nurse would start the 4.5-hour add-on case, accepting the risk that this case might result in some overutilized time. The other 1.5-hour add-on case would be scheduled for the next available OR. At the end of the day, all cases will be done resulting in some over/underutilized time and no cancelled patients. The risk-averse nurse will take no risk at all for overutilized time and start the 1.5-hour case first. The 4.5-hour case would likely be cancelled and scheduled (preferably) to be a first start the next morning.

b The assumption of a binomial distribution for the arrivals is reasonable here because surgeons usually keep a pool of candidate patients in need of surgery. When schedulers try to add new cases to fill the OR on the day before the surgery, they choose cases from this pool (with certain probabilities), which essentially means the new arrivals follow a binomial distribution.

c Because 3 new cases correspond to an average of 9 hours of cases, for most of our tested scenarios, the 2 ORs are initially scheduled with cases occupying around half of the allocated time. Thus, the possibility of having >3 new arrivals to schedule in 1 period is small; otherwise, the OR allocation would be longer, based on the newsvendor solution. This assumption is also reasonable based on our empirical observation on the case addition per day (Table 2; Supplemental Digital Content, Supplemental Table A, http://links.lww.com/AA/B280).

d Indeed, to confirm this, we performed a sensitivity analysis by increasing the limit to 16 cases per OR. It resulted in no difference in the final performance (mean overutilized time and productivity, with 3 decimal places kept).

e The baseline initial distribution is listed as baseline in Table 3, with OR 1 and OR 2 initially scheduled with means of 7.57 and 6.71 hours of cases, respectively (i.e., a total of 14.28 hours of cases, or 87.7% of cases, are scheduled at the beginning of Period).

f We named the distribution “Sym 2” because both ORs are symmetrically, and nearly fully, scheduled by the start of the workday before surgery.

g Table 4 used the baseline setting for the case duration. When durations of the long and short cases were changed to 5.0 hours and 1.0 hour, respectively, the difference in the mean overutilized time and productivity between Best and Worst Fit Descending increased to 2.88 minutes and 0.7%, respectively. These differences were not substantively large, matching Hypothesis 1.

h This difference between Best Fit Descending and Worst Fit Ascending was even larger when the durations of the long and short cases were changed to 5.0 hours and 1.0 hour, respectively, further confirming Hypothesis 2. See the “Sensitivity Analyses” section of the Results.
