

Development and Validation of a Multidomain Surgical Complication Classification System for Adult Spinal Deformity

Klineberg, Eric O. MDa; Wick, Joseph B. MDa; Lafage, Renaud MScb; Lafage, Virginie PhDb; Pellise, Ferran MD, PhDc; Haddad, Sleiman MD, PhDc; Yilgor, Caglar MDd; Núñez-Pereira, Susana MD, PhDc; Gupta, Munish MDe; Smith, Justin S. MD, PhDf; Shaffrey, Christopher MDf; Schwab, Frank MDb; Ames, Christopher MDg; Bess, Shay MDh; Lewis, Stephen MDi; Lenke, Lawrence G. MDj; Berven, Sigurd MDk; International Spine Study Group

doi: 10.1097/BRS.0000000000003766

Adult spinal deformity (ASD) is common, with reported prevalence exceeding 60% among elderly patients.1 Surgical approaches to ASD vary widely in invasiveness and complication rates, and procedures are often performed in older patients with multiple comorbidities.2 As a result, complications occur frequently, with reported rates ranging from 37% to 71%.3–7 Complications are an important metric of surgical quality and safety, yet the literature varies considerably in expected complication rates, severity, and impact on outcomes. With the increasing prevalence of ASD surgery and health care's focus on value,8 spine surgeons must be able to accurately characterize the incidence of complications and their impact on surgical outcomes.2,7 Understanding complications also has important implications for preoperative counseling, setting appropriate patient expectations, shared decision making, and preoperative planning.

To date, complications in ASD surgery have been broadly categorized as “major” or “minor,”9,10 or classified by increasing severity.11 However, such categorizations are inadequate for understanding the impact of complications on outcomes.10 In addition, inconsistent definitions of complications have likely been responsible for the wide variation in reported ASD surgical complication rates.2 In response to the need for a better ASD-specific complication classification system, the International Spine Study Group (ISSG), AO Spine Deformity Knowledge Forum, and European Spine Study Group (ESSG) held a consensus meeting using a Delphi method to develop a new ASD complication classification system. Herein, we describe the development and validation of the ISSG-AO Spine Complications Classification System.


Materials and Methods

Classification System Development

A Delphi approach was used to develop the classification system. The Delphi process used a consensus conference with a panel composed of AO Knowledge Forum members, who are spine experts from around the globe. The panel's shared knowledge and experience were used to determine the relevant domains under which complications should be classified. Open-ended questions were first posed regarding what constitutes a complication classification and which elements should be included in the system. Consensus was reached through several rounds of in-person voting. Figure 1 shows the complication categories and subcategories included in the classification system. Intervention severity classification by invasiveness was based on the Clavien-Dindo complication system for general surgery12; similar classifications have been applied to spine surgery.11 A high level of granularity was desired to allow consistent definition of complication types, especially because previous systems that broadly categorize complications have been shown to be inadequate.10 Neurologic complications were specifically emphasized because they are an especially burdensome risk inherent to spinal surgery.

Complications are broadly classified as medical or spine-related adverse events. Medical complications are subclassified as cancer, cardiopulmonary, central nervous system, gastrointestinal, nonspine infection, musculoskeletal not related to the spine procedure, and renal. Spine complications are subclassified as implant-related, radiographic, neurologic, intraoperative, and wound-related. The classification system also records complication timing (intraoperative, in-hospital, or post-discharge), whether the event was related to the spine procedure, intervention details with invasiveness subclassified as none, mild, moderate, or severe, and complication resolution.
Care was taken to distinguish complications from expected clinical outcomes. For example, preoperative neurologic deficits are recorded so that unresolved preoperative deficits are not counted as complications of the surgery itself. See Supplemental Digital Content 1 for the ISSG-AO Spine Complications Classification System reporting form.
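The two-level structure described above (medical versus spine events, each with its own subcategories, plus timing, relatedness, intervention severity, and resolution) maps naturally onto a typed record. The Python sketch below is a hypothetical illustration only: the class, field, and label names are our own, the category lists and resolution states are taken from the text, and the intervention details that sit under each severity level (for example, a nonroutine ICU admission) are omitted. The authoritative field set is the reporting form in Supplemental Digital Content 1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Timing(Enum):
    INTRAOPERATIVE = "intraoperative"
    IN_HOSPITAL = "in-hospital"
    POST_DISCHARGE = "post-discharge"


class EventType(Enum):
    MEDICAL = "medical"
    SPINE = "spine"


# Subcategories permitted under each event type, as listed in the text.
CATEGORIES = {
    EventType.MEDICAL: frozenset({
        "cancer", "cardiopulmonary", "central nervous system",
        "gastrointestinal", "nonspine infection",
        "musculoskeletal not related to the spine procedure", "renal",
    }),
    EventType.SPINE: frozenset({
        "implant-related", "radiographic", "neurologic",
        "intraoperative", "wound-related",
    }),
}


class Intervention(Enum):
    # Ordered by invasiveness, following the severity levels in the text.
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


class Resolution(Enum):
    RESOLVED = "resolved"
    PARTIALLY_RESOLVED = "partially resolved"
    UNRESOLVED = "unresolved"
    DEATH = "death"


@dataclass(frozen=True)
class ComplicationRecord:
    """One adverse event; all field names here are hypothetical."""
    timing: Timing
    event_type: EventType
    category: str
    related_to_spine_procedure: bool
    intervention: Intervention
    resolution: Optional[Resolution] = None  # None until follow-up is recorded

    def __post_init__(self) -> None:
        # Reject a subcategory that does not belong to the chosen event type.
        if self.category not in CATEGORIES[self.event_type]:
            raise ValueError(
                f"{self.category!r} is not a {self.event_type.value} subcategory"
            )
```

For example, `ComplicationRecord(Timing.IN_HOSPITAL, EventType.SPINE, "wound-related", True, Intervention.MODERATE)` is accepted, whereas pairing `EventType.MEDICAL` with `"wound-related"` raises `ValueError`. Validating at construction is just one way to keep entries consistent with the hierarchy; the source does not prescribe any particular implementation.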

Figure 1:
Complication categories and subcategories within the ISSG-AO Spine Complications Classification System.

Validation Study With Example Cases

Ten example cases (Figure 2; see Supplemental Digital Content 2 for a complete list of example cases) based on actual clinical scenarios were developed and used to assess the classification system's accuracy and repeatability. Each case included one to three complications (mean, 2.2 per case), for a total of 22 complications, of which 40.9% (9) were medical and 59.1% (13) were spine-related. Five spine-related complications included neurologic deficits. Three complications required no intervention, nine required mild interventions, six moderate, and four severe. At final follow-up, 15 complications were resolved, three were partially resolved, three were unresolved, and one resulted in death.

Figure 2:
Example case #1 as given to event readers. Cases two to 10 were presented in a similar format.

Twenty-two spine care providers (“event readers”) were invited to apply the classification system to the 10 cases. Seventeen event readers, including 10 fellowship-trained spine surgeons, five trainees (residents/fellows), and two research coordinators, completed data collection forms for all cases. To assess intrarater reliability, event readers repeated their evaluations of the same 10 cases 2 weeks after their initial evaluation. “Round one” and “round two” refer to event readers’ responses in the first and second evaluations, respectively. Responses were compared with the answer key and were considered “at least” correct if the event reader chose correctly under the event category, and “exactly” correct if the reader chose the correct response under the event specifics category. For example, if the patient sustained a medical complication, a reader's response was at least correct if it matched the answer key under the “medical event” category, and exactly correct if it matched the answer key under “medical event specifics.” An intervention answer was at least correct if the reader chose the correct intervention severity, and exactly correct if the reader chose the correct intervention details (eg, a nonroutine ICU admission under the moderate severity category). In the subanalysis by event type, “mechanical” complications combined implant-related and radiographic complications, including proximal junctional kyphosis.
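The two-tier scoring rule can be stated compactly. The Python sketch below is our reading of the rule, not code used in the study: the `Answer` and `score` names are hypothetical, and we assume that an "exactly" correct answer must also have the correct category. It also encodes the convention that an item left blank is counted as incorrect.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Answer:
    """A reader's (or the answer key's) selection for one item.

    `category` is the coarse choice (e.g. event category or intervention
    severity); `specifics` is the detailed choice beneath it. None means
    the item was left blank.
    """
    category: Optional[str]
    specifics: Optional[str]


def score(response: Answer, key: Answer) -> Tuple[bool, bool]:
    """Return (at_least_correct, exactly_correct) for one response.

    Assumes an exact answer must also have the correct category, so
    exactly correct implies at least correct. A blank category never
    scores as correct.
    """
    at_least = response.category is not None and response.category == key.category
    exact = at_least and response.specifics == key.specifics
    return at_least, exact


# Intervention example from the text; the second response is a placeholder.
key = Answer("moderate", "nonroutine ICU admission")
print(score(Answer("moderate", "nonroutine ICU admission"), key))     # (True, True)
print(score(Answer("moderate", "other moderate intervention"), key))  # (True, False)
print(score(Answer(None, None), key))                                 # (False, False)
```

Returning both flags from one call keeps the "at least" and "exact" tallies consistent with each other, since the exact flag can only be true when the category flag is.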

Statistical Analysis

Descriptive statistics were used for validation study outcomes. The classification system's ability to capture complication data accurately at increasing levels of granularity was assessed by sequentially combining and comparing the percent of correct responses from each category; these combinations are reported as the M1, M2, and M3 data points (Figure 3). M1 combines the percent of correct responses from the timing, type, medical/spine event, and medical/spine event specifics categories. M2 combines M1 with the percent of correct responses from the neurologic complications category. M3 combines M2 with the percent of correct responses from the intervention category. M1, M2, and M3 were calculated for all 10 example cases and then subcategorized to evaluate the system's ability to capture complication events accurately according to whether the event was medical or spine-related, an isolated neurologic event, an event with a neurologic component, intraoperative, radiographic or implant-related, or a central nervous system event. M1, M2, and M3 were further subcategorized as “at least” or “exactly” correct, as described above. The Pearson correlation coefficient between case number and percent of correct responses was then calculated to assess whether accuracy improved as event readers gained experience with the system. Analyses were performed using SPSS version 22.0 (IBM Corp, Armonk, NY).
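The text does not give an explicit formula for "combining" categories. One reading that is consistent with Table 1, where each M value is never higher than any of the category columns it includes, is that a response counts toward M1, M2, or M3 only if it is correct on every category included at that level. The Python sketch below implements that assumed rule; the field names and the four sample responses are hypothetical. Subgroup values (medical, spine, mechanical, and so on) would follow from filtering `responses` before calling `level_accuracy`, and the "at least" and "exact" variants from scoring each field under the corresponding rule beforehand.

```python
from typing import Dict, List, Sequence

# Categories folded in at each level, following the M1-M3 definitions.
M1_FIELDS = ("timing", "type", "category", "specifics")
M2_FIELDS = M1_FIELDS + ("neurologic",)
M3_FIELDS = M2_FIELDS + ("intervention",)


def level_accuracy(responses: List[Dict[str, bool]], fields: Sequence[str]) -> float:
    """Percent of responses that are correct on every field in `fields`.

    Each response maps a category name to whether the reader got it right
    (under either the "at least" or the "exact" rule, scored beforehand).
    """
    if not responses:
        raise ValueError("no responses to score")
    hits = sum(all(r[f] for f in fields) for r in responses)
    return 100.0 * hits / len(responses)


# Four hypothetical reader responses to one complication event.
responses = [
    dict(timing=True, type=True, category=True, specifics=True, neurologic=True, intervention=True),
    dict(timing=True, type=True, category=True, specifics=True, neurologic=False, intervention=True),
    dict(timing=True, type=True, category=True, specifics=True, neurologic=True, intervention=False),
    dict(timing=True, type=False, category=True, specifics=True, neurologic=True, intervention=True),
]
for name, fields in (("M1", M1_FIELDS), ("M2", M2_FIELDS), ("M3", M3_FIELDS)):
    print(name, level_accuracy(responses, fields))  # M1 75.0, M2 50.0, M3 25.0
```

Under this rule the levels can only stay level or fall from M1 to M3, which matches the pattern the Results report.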

Figure 3:
Flowchart demonstrating response categories included in the M1, M2, and M3 analyses. The M1, M2, and M3 analyses include sequentially increasing level of detail to assess the ISSG-AO Spine Complications Classification System's ability to capture complication event details with an increasing level of granularity.


Results

Overall Accuracy of Capturing Events

In round one, the 17 event readers captured a total of 94.4% of complication events, with 16 of 22 events being captured by >95% of the event readers. Type of event (medical versus spine) was captured correctly in 87.8% of events, event category within medical or spine complications was captured correctly in 92.4% of events, and event specifics were captured correctly for 88.6% of events. Timing, intervention grading (mild, moderate, severe), and presence of neurologic complications were accurately captured for 89.1%, 81.3%, and 76.5% of events, respectively. In round two, type of event was captured correctly in 93.8% of events, category was captured correctly in 93.2% of events, and event specifics were captured correctly in 89.2% of events. Timing, grading of intervention, and presence of neurologic complications were accurately captured in 96.3%, 88.4%, and 77.9% of events, respectively.

Table 1 and Figure 4 show the system's ability to accurately categorize events as the level of complication detail increases (M1, M2, M3). Overall, M1 accuracy was 88.7% for at least correct answers and 82.4% for exactly correct answers. For M2, at least and exactly correct accuracy were 71.9% and 67.1%, respectively. For M3, at least correct accuracy was 65.2% and exactly correct accuracy was 47.6%.

TABLE 1 - “At Least” and “Exact” Accuracy With Complication Events Subdivided by Type of Complication (Medical, Surgical, Isolated Neurologic Events, Events With a Neurologic Component, Mechanical, and Central Nervous System), as well as M1, M2, and M3 Analysis for Each Event Type. Mechanical Complications Combine Implant-related and Radiographic Complications
Timing Type Category Subcategory Neurologic Intervention M1 M2 M3
Overall complications
 At least 98.6% 93.8% 94.6% 92.4% 81.3% 88.4% 88.7% 71.9% 65.2%
 Exact 96.3% 93.8% 93.2% 89.2% 77.9% 70.3% 82.4% 67.1% 47.6%
Medical-related complications
 At least 97.7% 90.6% 95.9% 92.4% 74.9% 88.3% 84.2% 64.9% 60.2%
 Exact 95.9% 90.6% 94.2% 89.5% 73.1% 57.9% 81.9% 63.2% 36.3%
Spine surgery-related complications
 At least 99.5% 96.7% 93.4% 92.3% 87.4% 88.5% 89.0% 78.6% 69.8%
 Exact 96.7% 95.7% 92.3% 89.0% 82.4% 81.9% 83.0% 70.9% 58.2%
Intraoperative complications
 At least 100% 91.8% 100% 100% 83.7% 87.8% 91.8% 79.6% 73.5%
 Exact 100% 91.8% 100% 100% 83.7% 81.6% 91.8% 79.6% 69.4%
Mechanical complications (implant-related and radiographic)
 At least 100% 100% 93.9% 93.9% 84.6% 87.7% 93.9% 78.5% 69.2%
 Exact 93.9% 100% 92.3% 89.2% 84.6% 80.0% 83.1% 72.3% 56.9%
Isolated neurologic complications
 At least 98.0% 96.1% 84.3% 80.4% 96.1% 86.3% 76.5% 76.5% 62.8%
 Exact 98.0% 96.1% 82.4% 74.5% 78.4% 80.4% 70.6% 58.8% 45.1%
Complications with neurologic component
 At least 98.8% 94.0% 89.2% 86.8% 91.6% 77.1% 80.7% 74.7% 57.8%
 Exact 98.8% 94.0% 88.0% 80.7% 77.1% 68.7% 77.1% 62.7% 42.2%
Central nervous system complications
 At least 95.9% 89.8% 100% 100% 81.6% 87.8% 87.8% 69.4% 63.3%
 Exact 95.9% 89.8% 100% 95.9% 75.5% 65.3% 87.8% 67.4% 44.9%

Figure 4:
M1-M3 “at least” and “exact” values. M1 combines percent correct responses from timing, type, medical/spine event, and medical/spine event specifics categories. M2 combines M1 with percent correct responses from the neurologic complications category. M3 combines M2 with percent correct responses from the intervention category.

Among complication events without a neurologic component, there were a total of 255 potentially correct answers that event readers could provide. For example, for an event without a neurologic component, the correct answer to the question “Was there a neurologic component to this adverse event?” was “no” (see Supplemental Digital Content 1, slide 3). Of the 255 potentially correct selections, readers provided 197 correct answers (77%) and 58 incorrect answers (23%). However, an answer was also counted as incorrect if the reader left the question blank (selecting neither “no” nor “yes”). In the neurologic category for events without a neurologic component, 53 blank answers were counted as incorrect. Blank answers therefore accounted for 91% of incorrect answers and 21% of all answers in the neurologic category for events without a neurologic component.
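The proportions in this paragraph follow directly from the reported counts. The short Python check below, with a hypothetical function name, reproduces them from the figures given in the text (255 possible answers, 197 correct, 53 left blank); it adds no new data.

```python
def blank_answer_breakdown(possible: int, correct: int, blank: int) -> dict:
    """Percentages for one question category, where blanks count as incorrect."""
    incorrect = possible - correct
    if blank > incorrect:
        raise ValueError("blank answers are a subset of incorrect answers")
    return {
        "incorrect": incorrect,
        "pct_correct": 100 * correct / possible,
        "pct_incorrect": 100 * incorrect / possible,
        "blank_pct_of_incorrect": 100 * blank / incorrect if incorrect else 0.0,
        "blank_pct_of_all": 100 * blank / possible,
    }


# Counts reported above for events without a neurologic component.
summary = blank_answer_breakdown(possible=255, correct=197, blank=53)
for name, value in summary.items():
    print(name, round(value))  # 58, 77, 23, 91, 21 in this order
```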

Accuracy Subanalysis by Event Type

Table 1 shows at least and exact accuracy with complication events subdivided by type of complication, as well as M1, M2, and M3 data points for each event type. The largest gap between at least and exactly correct answers was noted for intervention data in medical complications, at 88.3% versus 57.9%, respectively. In contrast, both at least and exact accuracy for spine complication events were >80% in every individual category and remained >70% when neurologic data were included (M2), but exact accuracy dropped to 58.2% when intervention data were included (M3).

For intraoperative, mechanical, and central nervous system (CNS) complications, exactly correct accuracy for category and event details ranged from 92.3% to 100%. Neurologic complications were also captured with a high level of accuracy; neurologic data from both isolated neurologic complications and complications with a neurologic component were captured with at least accuracy in >90% of events. However, accuracy decreased substantially when intervention data were combined with neurologic complication data. For example, exactly correct M3 accuracy was 36.3% for medical events, 58.2% for spine events, 45.1% for isolated neurologic events, 42.2% for events with a neurologic component, 56.9% for mechanical events, and 44.9% for CNS events.

Subanalysis by Event Recorder Type

Table 2 shows surgeon and research coordinator accuracy in capturing event details. For the entire data set, M3 at least accuracy was 67.3% for surgeons and 63.3% for research coordinators; M3 exact accuracy was 46.7% for surgeons and 48.4% for research coordinators.

TABLE 2 - Comparison of Analysis Accuracy Between Surgeons and Research Coordinators
Event Reporter Type Timing Type Category Subcategory Neurologic Intervention M1 M2 M3
At least accuracy
 Surgeon 99.4% 95.2% 95.8% 92.1% 83.0% 86.1% 88.5% 75.8% 67.3%
 Research coordinator 97.9% 92.6% 93.6% 92.6% 79.8% 90.4% 85.1% 68.6% 63.3%
Exact accuracy
 Surgeon 97.0% 95.2% 93.3% 87.3% 78.2% 66.1% 83.0% 69.1% 46.7%
 Research coordinator 95.7% 92.6% 93.1% 91.0% 77.7% 73.9% 81.9% 65.4% 48.4%

Accuracy Over Time

Figure 5 shows a trend toward increased accuracy with each subsequent case analyzed during round one, with r = 0.747 between case number and M3 at least accuracy.

Figure 5:
M1, M2, M3 accuracy versus increasing case number in round one of the validation analysis. Correlation coefficient for the M3 data points versus case number is r = 0.747.
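The experience trend was quantified with a Pearson correlation between case number and M3 "at least" accuracy. The study used SPSS; the Python sketch below only shows how the statistic is computed. The per-case accuracies behind r = 0.747 appear only graphically in Figure 5, so the example deliberately uses an artificial, perfectly linear series rather than study data. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


case_number = list(range(1, 11))
# Placeholder accuracies that rise by a fixed amount per case; not study data.
perfect_trend = [50 + 2 * c for c in case_number]
print(pearson_r(case_number, perfect_trend))  # 1.0
```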


Across rounds one and two, event readers submitted 3984 answers. Of these, 3060 (76.8%) were the same in both rounds, and accuracy in identifying complications within each case remained similar, with most cases having >75% accuracy. Consistency between rounds was similar for surgeons and research coordinators (77.6% and 76.1%, respectively). Repeatability was higher, at 86.9%, when answers were analyzed categorically (ie, whether answers between the first and second rounds remained “at least” correct, remained wrong, or remained the same). See Supplemental Digital Content 3 for a visual representation of answer repeatability between rounds one and two when analyzed by response category.
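Round-to-round consistency as reported here is a simple percent agreement. The minimal Python sketch below uses hypothetical names and answers, and it also reproduces the 76.8% figure from the reported counts. The categorical figure would presumably be obtained the same way after mapping each answer to its scoring category (for example, "at least" correct versus wrong) rather than comparing raw selections; that mapping is our assumption about how the analysis was done.

```python
from typing import Hashable, Sequence


def percent_agreement(round_one: Sequence[Hashable], round_two: Sequence[Hashable]) -> float:
    """Share (%) of items given the same answer in both rounds.

    This is raw percent agreement; unlike kappa, it does not correct for
    agreement expected by chance.
    """
    if len(round_one) != len(round_two) or not round_one:
        raise ValueError("rounds must be non-empty and the same length")
    same = sum(a == b for a, b in zip(round_one, round_two))
    return 100.0 * same / len(round_one)


# Hypothetical answers from one reader to four items in each round.
r1 = ["in-hospital", "renal", "mild", "no"]
r2 = ["in-hospital", "renal", "moderate", "no"]
print(percent_agreement(r1, r2))  # 75.0

# The overall figure reported above: 3060 matching answers out of 3984.
print(round(100 * 3060 / 3984, 1))  # 76.8
```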


Discussion

This study analyzed the ISSG-AO Spine Complications Classification System's accuracy and repeatability in capturing ASD surgical complication data. In 10 example cases including 22 complications analyzed by 17 event readers, readers captured 94.4% of complication events overall, and 16 of the 22 events were captured by >95% of readers. Event-specific details were captured in 88.6% of complications. Accuracy in capturing event details remained high, ranging from 92.3% to 100%, when complication events were subdivided by complication type. The system accurately captured complication type, timing, and category, as reflected by the M1 analysis; however, accuracy decreased from M1 to M3 as neurologic complication and intervention details were added. This decline was influenced by a large number of missing responses for neurologic data, which accounted for 91% of incorrect answers and 21% of all answers in the neurologic category for events without a neurologic component. Surgeons and research coordinators had similar accuracy in capturing event details, as demonstrated by similar M3 accuracy levels. The system demonstrated good repeatability (86.9%) between the first and second rounds, and accuracy tended to improve as event readers gained experience. The granularity and resulting complexity of our data precluded calculation of kappa values, preventing direct comparison with the inter- and intrarater reliability of prior classification systems. However, the ISSG-AO Spine Complications Classification System demonstrated a high degree of repeatability, similar to other classification systems, including the Spine Adverse Events Severity System (SAVES)11 and adaptations of the Clavien-Dindo classification system to orthopedic surgery.13

Overall, results suggest that the ISSG-AO Spine Complications Classification System allows for better classification of ASD complications than current practices, which are inadequate for evaluating complications’ impact on clinical outcomes.10 Although the system may be overly inclusive or “double count” complication events, this is unavoidable when designing a system to thoroughly capture and account for complications. The risk of being overly inclusive is preferable to missing complications, as the intention is to develop a repeatable, accurate system for capturing complications and subsequently characterizing their impact on outcomes.

The wide range of reported ASD complication rates, from 37% to 71%, underscores the need for a thorough, accurate, and repeatable ASD-specific complication classification system.2–6 Although attempts have been made to adapt other complication systems, such as the Clavien-Dindo system for general surgery, to orthopedic procedures,13 and spine complication systems have been described,11 there is no comprehensive, detailed, and repeatable system specifically intended for classifying complications in ASD surgery. The ISSG-AO Spine Complications Classification System therefore has significant potential to help surgeons study and understand the clinical implications of complications. This potential is especially important given the current focus on value-based care and the increasing incidence of ASD surgery.8

Although this study was not designed to detect differences in ability of research coordinators and surgeons to capture event data, our results suggest that research coordinators can utilize the complications system with accuracy comparable to that of surgeons. This is significant, as the ISSG-AO Spine Complications Classification System may allow nonclinicians to accurately compile complication data, ultimately increasing the proportion of patients for whom complications data are recorded. This can facilitate further research into ASD complications, including investigating the impact of specific complication categories and intervention severity on costs and outcomes, while also assisting with value analyses, preoperative planning, and patient counseling. Data quality and entry can be further improved by applying the system as a logic-based entry form, limiting the data fields to simplify data entry and minimize missing data. This is especially significant as missing data accounted for a large proportion of incorrect answers in the neurologic category.
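A logic-based entry form of the kind suggested above could refuse to save a record until every applicable question has an explicit answer, and could request detail fields only when they apply. The Python sketch below illustrates that idea under our own assumptions: the field names, the yes/no encoding, the placeholder values, and the skip rules are hypothetical and are not part of the published reporting form.

```python
from typing import Dict, List, Optional

# Hypothetical field names; the real form is in Supplemental Digital Content 1.
REQUIRED = ("timing", "type", "category", "specifics",
            "neurologic_component", "intervention_severity")


def validate_entry(entry: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means the entry may be saved."""
    problems = [f"missing: {field}" for field in REQUIRED if not entry.get(field)]

    # Require an explicit yes/no so a blank is never silently stored.
    neuro = entry.get("neurologic_component")
    if neuro and neuro not in ("yes", "no"):
        problems.append("neurologic_component must be 'yes' or 'no'")

    # Skip logic: detail fields are required only when they apply.
    if neuro == "yes" and not entry.get("neurologic_details"):
        problems.append("missing: neurologic_details")
    severity = entry.get("intervention_severity")
    if severity and severity != "none" and not entry.get("intervention_details"):
        problems.append("missing: intervention_details")
    return problems


entry = {"timing": "in-hospital", "type": "medical", "category": "renal",
         "specifics": "example specifics", "neurologic_component": None,
         "intervention_severity": "mild", "intervention_details": "example details"}
print(validate_entry(entry))  # ['missing: neurologic_component']
```

Blocking the save on a blank neurologic answer targets exactly the failure mode reported in the Results, where unanswered neurologic questions made up most incorrect responses.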

The validation study had additional useful findings. As accuracy improved with reader experience, cases used in this study may serve as a training set before real-world application of the system. Data obtained from this study may also be used as an audit tool to assess fidelity of data in future applications of the system.

This study has important limitations. Results were nonempirical, system performance was not directly compared with previous classification systems, and we did not attempt to use the system to predict outcomes. Furthermore, cases were simple, with an average of two complications per case, and were analyzed under idealized conditions, with descriptions provided to event readers rather than requiring readers to search for details in the electronic medical record. However, use of such cases for initial validation of the system minimized potential confounding effects, such as transcriptional errors in real clinical documentation. Use of these cases also ensured that event readers assessed a wide variety of complication types. Additional limitations include the small overall number of event readers and inadequate power to detect differences between reader types. Accuracy may have been biased by readers’ training backgrounds, as most were spine care providers. Despite these limitations, initial results from the present study are necessary to justify future studies of the system in real clinical populations. Additional strengths include the large number of data points analyzed, high response rates of event readers, and assessment of response repeatability.

In conclusion, the system demonstrated good accuracy and repeatability among surgeons and research coordinators in capturing complication event type, timing, and details. Future studies should assess real-world application of the system in understanding complication impacts on ASD surgical outcomes and costs.

Key Points

  • Complications are common in adult spinal deformity surgery.
  • Previously, no adult spinal deformity-specific complication classification system existed, limiting understanding of complications’ impact on costs and outcomes following adult spinal deformity surgery.
  • The International Spine Study Group, AO Spine, and European Spine Study Group used a Delphi method to develop a comprehensive classification system for adult spinal deformity surgery complications.
  • The new complication classification system was validated, demonstrating both accuracy and repeatability in capturing complications.


References

1. Schwab F, Dubey A, Gamez L, et al. Adult scoliosis: prevalence, SF-36, and nutritional parameters in an elderly volunteer population. Spine (Phila Pa 1976) 2005; 30:1082–1085.
2. Lenke LG, Fehlings MG, Shaffrey CI, et al. Neurologic outcomes of complex adult spinal deformity surgery: results of the prospective, multicenter Scoli-RISK-1 study. Spine (Phila Pa 1976) 2016; 41:204–212.
3. Jain A, Hassanzadeh H, Puvanesarajah V, et al. Incidence of perioperative medical complications and mortality among elderly patients undergoing surgery for spinal deformity: analysis of 3519 patients. J Neurosurg Spine 2017; 27:534–539.
4. Smith JS, Shaffrey CI, Glassman SD, et al. Risk-benefit assessment of surgery for adult scoliosis: an analysis based on patient age. Spine (Phila Pa 1976) 2011; 36:817–824.
5. Acosta FL Jr, McClendon J Jr, O'Shaughnessy BA, et al. Morbidity and mortality after spinal deformity surgery in patients 75 years and older: complications and predictive factors. J Neurosurg Spine 2011; 15:667–674.
6. Daubs MD, Lenke LG, Cheh G, et al. Adult spinal deformity surgery: complications and outcomes in patients over age 60. Spine (Phila Pa 1976) 2007; 32:2238–2244.
7. Kwan KYH, Bow C, Samartzis D, et al. Non-neurologic adverse events after complex adult spinal deformity surgery: results from the prospective, multicenter Scoli-RISK-1 study. Eur Spine J 2019; 28:170–179.
8. Arutyunyan GG, Angevine PD, Berven S. Cost-effectiveness in adult spinal deformity surgery. Neurosurgery 2018; 83:597–601.
9. McDonnell MF, Glassman SD, Dimar JR 2nd, et al. Perioperative complications of anterior procedures on the spine. J Bone Joint Surg Am 1996; 78:839–847.
10. Glassman SD, Hamill CL, Bridwell KH, et al. The impact of perioperative complications on clinical outcome in adult deformity surgery. Spine (Phila Pa 1976) 2007; 32:2764–2770.
11. Rampersaud YR, Neary MA, White K. Spine adverse events severity system: content validation and interobserver reliability assessment. Spine (Phila Pa 1976) 2010; 35:790–795.
12. Dindo D, Demartines N, Clavien PA. Classification of surgical complications: a new proposal with evaluation in a cohort of 6336 patients and results of a survey. Ann Surg 2004; 240:205–213.
13. Sink EL, Leunig M, Zaltz I, et al.; Academic Network for Conservational Hip Outcomes Research Group. Reliability of a complication classification system for orthopaedic surgery. Clin Orthop Relat Res 2012; 470:2220–2226.

Keywords: adult spinal deformity; adverse event; complications; costs; length of stay; neurologic deficit; outcomes; postoperative; radiographic; reoperation; revision; thoracolumbar


Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved.