

Hospital and Surgeon Variation in Patient-reported Functional Outcomes After Lumbar Spine Fusion

A Statewide Evaluation

Khor, Sara MSc; Lavallee, Danielle C. PharmD, PhD; Cizik, Amy M. PhD, MPH; Bellabarba, Carlo MD; Dagal, Armagan MD; Hart, Robert A. MD; Howe, Christopher R. MD; Martz, R. Dean MD; Shonnard, Neal MD; Flum, David R. MD, MPH

doi: 10.1097/BRS.0000000000003299

Conventionally, clinical outcomes, most often mortality and complication rates, have been used as quality indicators to compare performance across hospitals and surgeons.1–3 Although these clinical outcomes are relevant for high-risk procedures, they reveal little about the efficacy of treatment for most patients undergoing lower-risk elective procedures such as lumbar spine fusion. For these procedures, patient-reported outcomes (PROs), such as functional improvement, may be a more useful marker of surgical quality because they often reflect the intended goals of surgery. Differences in PROs across hospitals and surgeons that persist after accounting for patient factors could indicate variation in health care system performance and an opportunity for quality improvement.

Lumbar fusions are increasingly common and have some of the highest aggregate hospital costs among operating room procedures.4 Approximately 60% of patients experience a meaningful improvement in function 1 year after lumbar fusion surgery; the remaining patients experience either no meaningful improvement or worse outcomes.5

In Washington State, a multistakeholder collaborative established by the legislature adopted PRO measurement for patients undergoing lumbar fusion as a community standard.6,7 PRO measures are included to document a minimum level of baseline disability and to demonstrate improved outcomes. Little is known, however, about the value of PROs as hospital or surgeon quality metrics. One recent study examined variation in functional outcomes after lumbar spine surgery across major academic centers using clinical trial data,8 but these findings may not reflect the variation seen in the community at large.

The purpose of this study is to examine variation in functional PROs after lumbar fusion surgery across surgeons and hospitals in Washington State. We also aimed to explore the potential impact of guiding patient selection using a PRO prediction tool.


Patient Population

The Foundation for Health Care Quality's Spine Care Outcomes Assessment Program (Spine COAP) is a prospective database that abstracts data from medical records on patient characteristics, perioperative and surgical details, and clinical outcomes for patients undergoing spine surgery in Washington State. Initiated in 2011, it represents approximately 75% of eligible elective spine fusion procedures in Washington.9 Spine COAP excludes patients whose procedures involved more than five spinal levels or whose indications for surgery were trauma, tumor, deformity, or infection. PROs were collected preoperatively (0–60 days before surgery) in the clinic and postoperatively through a patient preference-driven, multimodal approach (postal mail, e-mail, telephone, or text message) by the Comparative Effectiveness Research Translation Network (CERTAIN, University of Washington).10 All surveys were administered in English.

Patients eligible for this analysis were adults who underwent a lumbar fusion procedure between March 1, 2012 and August 1, 2016, and who completed the functional outcome questionnaire both at baseline (before surgery) and at the 12-month follow-up. The final date of follow-up data collection was November 1, 2017.

The research project used deidentified patient data and was determined to be exempt by the Human Subjects Division of the University of Washington institutional review board.

Primary Outcomes

Function was measured using the Oswestry Disability Index (ODI), a reliable, valid, and responsive PRO measure that is the recommended standard for assessing functional status of patients with spinal disorders.11,12 The ODI is a composite index on a 0 to 100 point scale, with a score of 100 reflecting maximum disability.
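The article reports the ODI only as a 0-to-100 composite and does not describe item-level scoring. As a minimal sketch, assuming the standard scoring rule for the 10-section ODI described by Fairbank and Pynsent (each section scored 0–5; the summed score is divided by 5 times the number of answered sections and multiplied by 100), the composite can be computed as follows. The function name `odi_score` is hypothetical and is not part of any Spine COAP or CERTAIN tooling.

```python
from typing import Optional, Sequence


def odi_score(section_scores: Sequence[Optional[int]]) -> float:
    """Composite ODI (0-100) from the 10 section scores (0-5 each).

    Assumes the standard rule: sum / (5 * sections answered) * 100.
    Unanswered sections are passed as None and dropped from the denominator.
    """
    answered = [s for s in section_scores if s is not None]
    if not answered:
        raise ValueError("at least one section must be answered")
    if any(not 0 <= s <= 5 for s in answered):
        raise ValueError("each section score must be between 0 and 5")
    return 100.0 * sum(answered) / (5 * len(answered))


# Ten answered sections summing to 23 -> 23 / 50 * 100
print(odi_score([3, 2, 2, 3, 2, 2, 3, 2, 2, 2]))  # 46.0
```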

The outcomes of interest in this study were functional improvement from baseline that met or exceeded the minimal important difference (MID) and a 12-month ODI score in the range of “minimal disability.” MID improvement was defined as a reduction of ≥15 points from baseline in the ODI.11,13 “Minimal disability” was defined as an ODI score ≤22 at 12 months, the threshold at or below which patients themselves consider their health state acceptable.14 Analyses of MID improvement included only patients with a preoperative ODI >15, and analyses of minimal disability included only patients with a preoperative ODI >22.
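The two binary outcomes and their eligibility rules can be expressed directly. This is a minimal sketch of the definitions stated above; the constant and function names are hypothetical, and returning `None` for ineligible patients is an illustrative convention, not the study's data model.

```python
from typing import Optional

MID_POINTS = 15              # MID: ODI reduction of >=15 points from baseline
MID_MIN_BASELINE = 15        # MID analysis requires baseline ODI > 15
MINIMAL_DISABILITY_MAX = 22  # minimal disability: 12-month ODI <= 22 (baseline must be > 22)


def mid_improvement(baseline: float, month12: float) -> Optional[bool]:
    """True/False for MID improvement; None if baseline ODI <= 15 (not eligible)."""
    if baseline <= MID_MIN_BASELINE:
        return None
    return baseline - month12 >= MID_POINTS


def minimal_disability(baseline: float, month12: float) -> Optional[bool]:
    """True/False for reaching minimal disability; None if baseline ODI <= 22."""
    if baseline <= MINIMAL_DISABILITY_MAX:
        return None
    return month12 <= MINIMAL_DISABILITY_MAX


# Improving from 46 to 30 meets the MID (16 points) but not minimal disability.
print(mid_improvement(46, 30), minimal_disability(46, 30))  # True False
```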

Statistical Analysis

Patient characteristics were summarized and compared across hospitals and surgeons; Kruskal-Wallis tests (the rank-based analog of one-way analysis of variance) were used for continuous variables and χ2 tests for categorical variables.
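To make the continuous-variable comparison concrete, the sketch below computes the Kruskal-Wallis H statistic with average ranks for ties and the usual tie correction; under the null hypothesis, H is approximately χ2-distributed with k − 1 degrees of freedom. The analyses were run in Stata, so this pure-Python `kruskal_wallis_h` is a hypothetical illustration rather than the authors' code (in practice, `scipy.stats.kruskal` returns the same tie-corrected statistic together with its P value).

```python
from typing import Sequence, Tuple


def kruskal_wallis_h(*groups: Sequence[float]) -> Tuple[float, int]:
    """Return (tie-corrected H, degrees of freedom) for k independent groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    # Pool observations with their group index and sort by value.
    pooled = sorted((value, idx) for idx, grp in enumerate(groups) for value in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0  # sum of (t^3 - t) over blocks of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j + 2) / 2  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical patient ages at three hospitals
h, df = kruskal_wallis_h([55, 61, 70], [48, 52, 59], [66, 72, 75])
print(round(h, 3), df)  # 5.689 2
```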

We reported unadjusted and adjusted rates of functional improvement and minimal disability for each hospital and surgeon, along with intraclass correlation coefficients (ICCs) describing the proportion of total variability accounted for by between-hospital and between-surgeon variance. We adjusted for patient characteristics shown to be important in a previous study: age, sex, insurance status, race, American Society of Anesthesiologists score, smoking status, prior spine surgery, diagnosis (spondylolisthesis, disc herniation, postlaminectomy/failed back syndrome, stenosis, pseudarthrosis, radiculopathy), opiate use, asthma, and baseline ODI score.5
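The paper does not specify how the ICCs were estimated. One common definition for a binary outcome fitted with a random-intercept logistic model, assumed here for illustration only, uses the latent-variable formulation in which the patient-level residual variance is fixed at π²/3. Both helper names are hypothetical.

```python
import math

LATENT_RESIDUAL_VAR = math.pi ** 2 / 3  # variance of the standard logistic distribution


def icc_logistic(between_var: float) -> float:
    """ICC = between-provider variance / (between-provider variance + pi^2/3)."""
    if between_var < 0:
        raise ValueError("variance must be non-negative")
    return between_var / (between_var + LATENT_RESIDUAL_VAR)


def between_var_for_icc(icc: float) -> float:
    """Inverse: random-intercept variance implied by a given ICC (0 <= icc < 1)."""
    if not 0 <= icc < 1:
        raise ValueError("icc must be in [0, 1)")
    return icc * LATENT_RESIDUAL_VAR / (1 - icc)


# Under this assumption, an ICC of 3.5% implies a random-intercept variance of about 0.12.
print(round(between_var_for_icc(0.035), 3))  # 0.119
```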

Outcomes were also reliability-adjusted using empirical Bayes techniques, which account for the portion of observed variation in outcomes that is due to chance alone.15 Details on the calculation of the risk-adjusted and reliability-adjusted rates are provided in the supplementary material.
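The supplementary calculations are not reproduced here. As a hedged sketch of the general empirical Bayes idea described in the reliability-adjustment literature,21 each provider's reliability is the share of its observed variance that reflects true between-provider differences, and its adjusted rate is pulled toward the overall mean in proportion to (1 − reliability). The variance inputs, the binomial noise assumption, and the function names below are illustrative assumptions, not values or code from this study.

```python
def reliability(between_var: float, within_var: float, n: int) -> float:
    """Signal / (signal + noise), where noise shrinks with the provider's caseload n."""
    if n <= 0:
        raise ValueError("caseload must be positive")
    return between_var / (between_var + within_var / n)


def shrunken_rate(observed: float, overall: float, between_var: float, n: int) -> float:
    """Empirical Bayes estimate: reliability-weighted average of provider and overall rates."""
    within_var = overall * (1 - overall)  # binomial variance at the overall rate (assumed)
    r = reliability(between_var, within_var, n)
    return r * observed + (1 - r) * overall


# With real between-provider variance, a high-volume provider keeps most of its signal...
print(round(shrunken_rate(0.787, 0.587, between_var=0.01, n=200), 3))
# ...but when between-provider variance is ~0 (ICC near zero), every rate collapses to the mean.
print(shrunken_rate(0.787, 0.587, between_var=0.0, n=200))  # 0.587
```

This is why, when adjusted ICCs are close to zero, reliability adjustment pulls every hospital and surgeon essentially onto the overall rate.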

For all hospital- and surgeon-level analyses, hospitals and surgeons with <10 cases were excluded.

Stratified By Likelihood of Functional Improvement

We calculated each patient's probability of functional improvement using a previously published and validated risk calculator.5 Patients were stratified into two groups by predicted likelihood of improvement: low chance (<50% chance of improvement) and high chance (≥50% chance of improvement). Observed-to-expected (O/E) ratios of improvement were then calculated for each group to examine variation across hospitals and surgeons after adjusting for patient characteristics. Subgroups with <5 patients were excluded from this evaluation.
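A minimal sketch of the stratified O/E calculation, assuming (as is conventional, though the paper does not spell it out) that the expected count in a subgroup is the sum of its patients' predicted probabilities. The input tuple layout and the name `oe_ratios` are hypothetical; the 50% cut point and the exclusion of subgroups with <5 patients follow the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def oe_ratios(
    patients: Iterable[Tuple[str, float, bool]],
    threshold: float = 0.5,
    min_n: int = 5,
) -> Dict[Tuple[str, str], float]:
    """O/E ratio of improvement per (provider, stratum).

    patients: (provider_id, predicted probability of improvement, improved?)
    Stratum is "low" if probability < threshold, else "high".
    Subgroups with fewer than min_n patients are dropped.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # observed count, expected count, n
    for provider, prob, improved in patients:
        stratum = "high" if prob >= threshold else "low"
        cell = totals[(provider, stratum)]
        cell[0] += int(improved)
        cell[1] += prob  # expected count = sum of predicted probabilities
        cell[2] += 1
    return {
        key: observed / expected
        for key, (observed, expected, n) in totals.items()
        if n >= min_n and expected > 0
    }


# Hypothetical data: H1 has 5 high-chance patients (3 improved) and 2 low-chance patients.
data = [("H1", 0.5, True)] * 3 + [("H1", 0.5, False)] * 2 + [("H1", 0.3, True)] * 2
print(oe_ratios(data))  # {('H1', 'high'): 1.2}
```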

All analyses were performed using Stata version 14 (StataCorp LP, College Station, TX, USA). All tests were two-sided, and P values <0.05 were considered significant. All mean values are reported with standard deviations.


Among 1082 patients with a baseline ODI, 32% (n = 345) did not have 1-year PROs (62 [18%] withdrew, six [2%] had no valid contact information, and 277 [80%] did not respond). The remaining 737 patients (40% male; mean age 63 ± 12 years; Table 1) underwent surgery at 17 hospitals with 58 surgeons. The evaluation included hospitals of various types (seven community and 10 academic hospitals), training capacities (three hospitals had a surgical residency program), volumes (14 hospitals were high volume, defined as having >200 licensed beds), and organization types (11 not-for-profit, four district, one county, and one proprietary hospital). Of the 58 surgeons, 36 were neurosurgeons, 21 were orthopedic surgeons, and one was in another specialty. Thirteen hospitals and 16 surgeons had ≥10 cases. The most common indications for surgery were stenosis (84%) and spondylolisthesis (70%), and 31% of the patients had previous spine surgery. Median baseline ODI score was 46 of 100 (interquartile range: 33–56). Age and sex did not differ significantly across hospitals or surgeons; smoking rates differed across hospitals but not across surgeons, and all other patient characteristics varied significantly across both.

Table 1. Overall Patient Characteristics and Range of Values by Hospitals and Surgeons

Twenty-one patients (2.8%) with a preoperative ODI ≤15 and 68 patients (9.2%) with a preoperative ODI ≤22 were excluded from the MID improvement and minimal disability analyses, respectively. Overall, 58.7% of patients experienced an MID improvement in function and 42.5% reached minimal disability at 12 months.

Hospital Analysis

The proportion of patients who experienced MID improvement in ODI at 12 months differed significantly across the 13 hospitals, ranging from 44.2% to 78.7%, with a small percentage of the total variability attributable to the hospital (ICC = 4.2%) (eFigure 1a). After adjusting for patient and surgical factors, the proportion of total variability explained by hospital-level variance was even smaller (rates ranged from 47.3% to 74.4%; ICC = 1.2%) (Figure 1A). The proportion of patients reaching minimal disability varied greatly across hospitals (range: 28.6%–52.9%; ICC = 1.0%) (eFigure 1b). After adjusting for patient and surgical factors, the range was 28.6% to 49.0% (Figure 1B), and the ICC was <0.01%. Further adjusting for sample size using reliability adjustment shrank all values to the mean, suggesting no difference in functional improvement or minimal disability rates across hospitals.

Figure 1:
Hospital variation. At 12 months following lumbar fusion surgery, risk-adjusted proportion of patients who (A) experienced a minimal important difference (MID) improvement in Oswestry Disability Index (intraclass correlation coefficient [ICC] = 0.012), and (B) reached minimal disability (ICC < 0.0001). Horizontal line indicates overall unadjusted rate. Hospitals with <10 cases are not shown.

Surgeon Analysis

MID improvement rates in ODI differed 2.5-fold across the 16 surgeons (range: 33.3%–83.9%; ICC = 6.5%) (eFigure 2a). After risk adjustment, the rates ranged from 41.7% to 100% (Figure 2A), with an ICC of 3.5%. Similarly, minimal disability rates at 12 months varied greatly across surgeons (range: 8.3%–70.0%; ICC = 4.4%) (eFigure 2b). After risk adjustment, the differences between rates were not significant (range: 23.1%–59.6%; ICC <0.1%) (Figure 2B). After further reliability adjustment, all values shrank to the mean, reflecting no statistically detectable differences in these outcomes between surgeons.

Figure 2:
Surgeon variation. At 12 months following lumbar fusion surgery, risk-adjusted proportion of patients who (A) experienced a minimal important difference (MID) improvement in Oswestry Disability Index (intraclass correlation coefficient [ICC] = 0.035), and (B) reached minimal disability (ICC < 0.001). Horizontal line indicates overall unadjusted rate. Surgeons with <10 cases are not shown.

Stratified By Likelihood of Functional Improvement

Applying a risk calculator from a prior study,5 64% (n = 438) of the patients were predicted to have a high chance of improvement (ie, likelihood ≥50%). Among these patients, O/E ratios ranged from 0.71 to 1.18 by hospital and from 0.67 to 1.21 by surgeon, and only one of 10 hospitals and two of 11 surgeons had O/E ratios significantly lower than one (Figure 3A and B). Variation in O/E ratios among patients with a low chance of success was much larger (0.16–1.87 by hospital and 0.43–1.90 by surgeon), with five of 10 hospitals and five of 11 surgeons having O/E ratios that deviated significantly from one.

Figure 3:
Observed-to-expected ratio of patients experiencing a minimal important difference improvement in Oswestry Disability Index (ODI) at 12 months (ODI decreased by ≥15) (A) across hospitals and (B) across surgeons, stratified by probability of improvement (low chance = probability of improvement <50%; high chance = probability of improvement ≥50%). Subgroups with n ≤5 were removed.


This is the first study to examine variation in PROs after lumbar spine surgery across hospitals and surgeons using a statewide prospective database. Overall, 58.7% of patients achieved an MID improvement in function and 42.5% reached minimal disability at 12 months. Unadjusted functional improvement rates differed by 35 percentage points across hospitals and 51 percentage points across surgeons. The proportion of patients reaching minimal disability differed by 24 percentage points across hospitals and 62 percentage points across surgeons. However, after accounting for patient and surgical factors, the variation between hospitals and between surgeons decreased greatly, and it became statistically insignificant after further reliability adjustment, suggesting that patient characteristics are the main drivers of variability in functional response among lumbar fusion patients. These findings are particularly relevant given the increasing interest in profiling the quality of hospitals and surgeons based on PROs. Interestingly, we found that hospital- and surgeon-level factors were important in explaining variation in PRO response among patients with a low chance of improvement (ie, those with <50% probability of MID improvement in ODI). Among these patients, five of 10 hospitals and five of 11 surgeons had O/E ratios for functional improvement significantly greater or less than one, whereas among patients with a high probability of improvement, only one hospital and two surgeons deviated significantly from expected.

There is increasing interest in improving functional outcomes in spine surgery. The extent to which functional outcomes vary between surgeons and centers has been unclear, as has the extent to which this variation is driven by patient, hospital, and surgeon factors. Previous studies examined variation in surgical outcomes across centers, but most either focused on mortality and complication rates rather than PROs,16 or did not involve spine surgery.17–20 Recently, a report from the Spine Patient Outcomes Research Trial (SPORT) found significant variation in risk-adjusted ODI change across 13 hospitals 1 year after lumbar surgery.8 Several factors may explain the difference between their findings and ours. First, our study is a statewide evaluation of academic and community hospitals, whereas SPORT included randomized and observational cohorts from major academic centers in 11 states. In addition, our study focused on patients who received lumbar fusion, whereas SPORT included all lumbar procedures (the proportion of patients receiving fusion ranged from 36% to 73% across centers). Second, their outcome was score change (a continuous variable) rather than MID improvement or reaching minimal disability (binary outcomes). Third, although both studies had similar overall sample sizes, the distribution of patients across facilities may have differed; low patient counts make differences among hospitals or providers difficult to detect. SPORT also did not use reliability adjustment to account for chance variation arising from small sample sizes. Recent evidence suggests that reliability adjustment is a critical consideration when examining variation in outcomes across hospitals and can yield more accurate estimates of risk-adjusted outcomes.21 Lastly, the SPORT model did not adjust for smoking status, insurance, or asthma, covariates previously found to be important.5

Our study showed that after accounting for patient factors, the variation in functional outcomes attributable to surgeon or hospital was very small (maximum ICC, 3.5%), suggesting that functional outcomes, despite being clinically important, may be of limited use in identifying higher-quality hospitals or surgeons. Indeed, demonstrating variability with adequate reliability (≥0.8) at an ICC of 3.5% would require approximately 110 patients per physician or hospital. This patient volume is usually attainable only in very large spine institutions or over a very long period. Because overall variability was driven mainly by patient characteristics, quality improvement efforts to reduce variation and improve overall functional outcomes may be more effective if focused at the patient level. For example, health care systems interested in improving functional outcomes after lumbar fusion may choose to address modifiable risk factors, such as cigarette smoking, before patients proceed to surgery.
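The paper does not show how the figure of roughly 110 patients was obtained. Assuming the standard Spearman–Brown relation between a provider's reliability R, its caseload n, and the ICC, R = n·ICC / (1 + (n − 1)·ICC), solving for n gives n = R(1 − ICC) / (ICC(1 − R)), which reproduces the estimate. The function names are hypothetical.

```python
import math


def reliability_from_n(n: float, icc: float) -> float:
    """Spearman-Brown: reliability of a provider's mean outcome over n patients."""
    return n * icc / (1 + (n - 1) * icc)


def n_for_reliability(target: float, icc: float) -> float:
    """Caseload needed to reach the target reliability: n = R(1 - ICC) / (ICC(1 - R))."""
    return target * (1 - icc) / (icc * (1 - target))


n = n_for_reliability(0.8, 0.035)
print(round(n, 1), math.ceil(n))  # 110.3 111
```

This relation is algebraically equivalent to reliability = signal / (signal + noise / n), so halving the ICC roughly doubles the caseload needed for the same reliability.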

An alternative approach is to limit spine fusion to patients most likely to have favorable functional outcomes. We developed a PRO prediction tool for lumbar fusion candidates that may be used to select individuals most likely to benefit from lumbar fusion, as well as to manage surgeon and patient expectations.5 However, to select these patients effectively, an acceptable threshold for the likelihood of success, below which surgery should not be recommended, must be set. Setting the threshold too low may expose patients to an expensive and invasive intervention with a low probability of success. Conversely, setting the threshold too high will exclude patients who could have experienced meaningful functional improvement or minimal disability. We explored two thresholds (50% and 70% chance of functional improvement or minimal disability) to illustrate how operating only on patients with a high likelihood of success could substantially affect surgical volumes. In our study, only 37% of patients who underwent lumbar fusion had a ≥50% chance of reaching minimal disability, and only 11% had a ≥70% chance (Table 2). Although these proportions varied widely across surgeons and hospitals, applying a 50% to 70% threshold could reduce surgical volume by 63% to 89%. Similarly, only 63% and 27% of patients had a ≥50% and ≥70% chance of achieving MID improvement in function, respectively (Table 3), and applying these thresholds might reduce surgical volume by 37% to 73%.
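The volume arithmetic is simply the complement of the share of patients at or above the cut point: if 37% qualify at a 50% threshold, volume falls by 63%. A minimal sketch follows; the predicted probabilities are invented for illustration and `volume_impact` is a hypothetical helper, not part of the published prediction tool.

```python
from typing import Sequence, Tuple


def volume_impact(predicted: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Return (share of patients eligible, share of surgical volume lost) at a threshold."""
    if not predicted:
        raise ValueError("need at least one patient")
    eligible = sum(1 for p in predicted if p >= threshold)
    share = eligible / len(predicted)
    return share, 1 - share


# Hypothetical predicted probabilities of minimal disability for 10 patients
probs = [0.12, 0.25, 0.31, 0.38, 0.44, 0.47, 0.52, 0.58, 0.66, 0.74]
for cutoff in (0.5, 0.7):
    eligible, lost = volume_impact(probs, cutoff)
    print(f"threshold {cutoff:.0%}: operate on {eligible:.0%}, volume falls {lost:.0%}")
```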

Table 2. Proportion of Patients Undergoing Operation at Different Probability Limits of Reaching Minimal Disability at 12 Months
Table 3. Proportion of Patients Undergoing Operation at Different Probability Limits of Achieving MID Improvement in ODI at 12 Months

Other challenges remain for the successful incorporation of prediction tools into spine practice, such as determining the optimal physician and patient education and the infrastructure needed to incorporate these tools,22 as well as ethical and equity concerns. As experience with PROs expands, it will be critical to integrate PRO data meaningfully into practice to improve care and value for patients and clinicians.

Lastly, we found significant hospital- and surgeon-level variation in O/E ratios in functional improvement among patients who are less likely to benefit from surgery (defined by a probability of improvement <50%), suggesting that hospital or surgeon type, experience, and skills may play a bigger role in influencing outcomes in this group. Future larger studies should examine whether and how surgeon or hospital factors affect functional outcomes among patients with low probability of success.

There are several limitations to our study. Results based on data from Washington State hospitals (90% white; 70% with private insurance) may not be generalizable to populations in other states. Moreover, 32% of patients who underwent lumbar fusion did not have a 1-year follow-up ODI. On further investigation, we found no difference in the mean predicted likelihood of improvement, based on baseline patient characteristics, between patients with and without 1-year data (57% vs 58%, respectively; P = 0.5). One-year dropout rates varied by hospital but not by surgeon, and among patients with missing 1-year PROs, the mean predicted likelihood of improvement did not differ across hospitals (P = 0.25) or across surgeons (P = 0.28). These findings suggest that missing follow-up data likely had a minimal effect on our results. We also had no information about conservative treatments received before surgery, which could affect postsurgical outcomes. In addition, although our overall statewide study population is large, surgical volume at the hospital or surgeon level is small, limiting the power to detect small differences between groups. Lastly, although the prediction model fit the data well, predictions are more uncertain for patients at the extremes of the likelihood scale, and important covariates may remain unmeasured.


There is increasing interest in incorporating PROs as part of the move toward value-based payment and quality improvement. Information about the variation of PROs across hospitals and surgeons, a key consideration in using these metrics for quality profiling, is limited. Our study demonstrates that variation in PROs across hospitals and surgeons was driven mainly by differences in the patient populations undergoing lumbar fusion, suggesting that PROs may not be helpful in profiling hospital or surgeon quality. Addressing modifiable patient risk factors or improving patient selection may decrease differences in outcomes across hospitals and providers and improve overall success rates, but the latter approach would have significant implications for surgical volumes.

Key Points

  • There is no significant difference in patient-reported functional outcomes after lumbar fusion surgery across hospitals and surgeons in Washington State after accounting for patient characteristics.
  • Variation in functional outcomes is mainly driven by patient characteristics, suggesting that functional outcomes may not be helpful in profiling hospital or surgeon quality.
  • Careful patient selection may reduce variation in success rates and improve overall quality, but would significantly reduce surgical volumes.


1. Gutacker N, Siciliani L, Moscelli G, et al. Choice of hospital: which type of quality matters? J Health Econ 2016; 50:230–246.
2. Birkmeyer N, Birkmeyer JD. Strategies for improving surgical quality—should payers reward excellence or effort? N Engl J Med 2006; 354:864–870.
3. Chou S, Deily ME, Li S, et al. Competition and the impact of online hospital report cards. J Health Econ 2014; 34:42–58.
4. Weiss AJ, Elixhauser A, Andrews RM. Characteristics of operating room procedures in U.S. hospitals, 2011: HCUP Statistical brief #170. In: Healthcare cost and utilization project (HCUP) statistical briefs. Rockville, MD: Agency for Healthcare Research and Quality, 2014.
5. Khor S, Lavallee D, Cizik AM, et al. Development and validation of a prediction model for pain and functional outcomes after lumbar spine surgery. JAMA Surg 2018; 153:634–642.
6. Dr. Robert Bree Collaborative – Accountable Payment Models Workgroup. Lumbar fusion surgery bundle. The Bree Collaborative. 2014:1–12.
7. The Dr. Robert Bree collaborative. Available at: Updated 2016.
8. Desai A, Bekelis K, Ball PA, et al. Variation in outcomes across centers after surgery for lumbar stenosis and degenerative spondylolisthesis in the Spine Patient Outcomes Research Trial. Spine (Phila Pa 1976) 2013; 38:678–691.
9. Lee MJ, Shonnard N, Farrokhi F, et al. The spine surgical care and outcomes assessment program (spine SCOAP): a surgeon-led approach to quality and safety. Spine (Phila Pa 1976) 2015; 40:332–341.
10. Flum DR, Alfonso-Cristancho R, Devine EB, et al. Implementation of a “real-world” learning health care system: Washington State's Comparative Effectiveness Research Translation Network (CERTAIN). Surgery 2014; 155:860–866.
11. Fairbank JC, Pynsent PB. The Oswestry Disability Index. Spine (Phila Pa 1976) 2000; 25:2940–2952.
12. Ghogawala Z, Resnick DK, Watters WC III, et al. Guideline update for the performance of fusion procedures for degenerative disease of the lumbar spine. Part 2: assessment of functional outcome following lumbar fusion. J Neurosurg Spine 2014; 21:7–13.
13. Parker SL, Adogwa O, Paul AR, et al. Utility of minimum clinically important difference in assessing pain, disability, and health state after transforaminal lumbar interbody fusion for degenerative lumbar spondylolisthesis. J Neurosurg Spine 2011; 14:598–604.
14. van Hooff ML, Mannion AF, Staub LP, et al. Determination of the Oswestry Disability Index score equivalent to a “satisfactory symptom state” in patients undergoing surgery for degenerative disorders of the lumbar spine: a Spine Tango registry-based study. Spine J 2016; 16:1221–1230.
15. MacKenzie TA, Grunkemeier GL, Grunwald GK, et al. A primer on using shrinkage to compare in-hospital mortality between centers. Ann Thorac Surg 2015; 99:757–761.
16. Lawson KA, Saarela O, Abouassaly R, et al. The impact of quality variations on patients undergoing surgery for renal cell carcinoma: a national cancer database study. Eur Urol 2017; 72:379–386.
17. Gutacker N, Bojke C, Daidone S, et al. Hospital variation in patient-reported outcomes at the level of EQ-5D dimensions: evidence from England. Med Decis Mak 2013; 33:804–818.
18. Neuburger J, Hutchings A, van der Meulen J, et al. Using patient-reported outcomes (PROs) to compare the providers of surgery: does the choice of measure matter? Med Care 2013; 51:517–523.
19. Varagunam M, Hutchings A, Black N. Do patient-reported outcomes offer a more sensitive method for comparing the outcomes of consultants than mortality? A multilevel analysis of routine data. BMJ Qual Saf 2015; 24:195–202.
20. Waljee JF, Ghaferi A, Finks JF, et al. Variation in patient-reported outcomes across hospitals following surgery. Med Care 2015; 53:960–966.
21. Dimick JB, Ghaferi AA, Osborne NH, et al. Reliability adjustment for reporting hospital outcomes with surgery. Ann Surg 2012; 255:703–707.
22. Klifto K, Klifto C, Slover J. Current concepts of shared decision making in orthopedic surgery. Curr Rev Musculoskelet Med 2017; 10:253–257.
23. Cizik AM, Lee MJ, Martin BI, et al. Using the spine surgical invasiveness index to identify risk of surgical site infection: a multivariate analysis. J Bone Joint Surg Am 2012; 94:335–342.
24. Mirza SK, Deyo RA, Heagerty PJ, et al. Development of an index to characterize the “invasiveness” of spine surgery: validation by comparison to blood loss and operative time. Spine (Phila Pa 1976) 2008; 33:2651–2661.

hospital variation; lumbar fusion; Oswestry Disability Index; outcome variation; patient-reported outcomes; prediction tool; quality indicators; quality profiling


Copyright © 2019 Wolters Kluwer Health, Inc. All rights reserved.