Despite the increasing prevalence of sacral fractures, controversy persists regarding the appropriate management of these injuries.1–3 The absence of an appropriate conceptual framework for the classification of sacral fractures has constrained the communication, education, and research necessary to evaluate clinical outcomes after injury. Though numerous sacral fracture classifications have been created, none has been universally adopted because of various limitations and drawbacks.4–10 According to Maurice Müller, “A classification is useful only if it considers the severity of the bone lesion and serves as a basis for treatment and for evaluation of the results.”11 Previous classification systems lack prognostic value, focusing instead on injury location, morphology, or mechanism. Their inability to inform management has contributed to the limited high-quality evidence available to standardize the treatment of sacral fractures.
The Arbeitsgemeinschaft für Osteosynthesefragen (AO) Spine Knowledge Forum Trauma recently developed the AO Spine Sacral Classification System in an attempt to provide a concise yet comprehensive classification scheme for the standardization of treatment and prognostication of outcomes after sacral fractures.12 This system is separated into three main morphologic fracture types: type A (lower sacrococcygeal injuries), type B (posterior pelvic injuries), and type C (spino-pelvic injuries). Similar to previous AO Spine classification systems, it is designed in a hierarchical manner in which each morphologic type is subdivided into numerical subtypes of increasing injury severity.13,14 Case-specific modifiers and neurologic injury at the time of examination incorporate patient-specific data to individualize management within a universally applicable scheme (Figure 1).
An ideal classification system should be easily comprehensible and reliable among the diverse group of surgeons involved in the diagnosis and management of sacral fractures. A surgeon’s level of experience may have a significant effect on the reliability and accuracy of a classification system. Moreover, surgeons of different subspecialties may have varying levels of comfort with the imaging assessment of sacral injuries required for accurate diagnosis and classification. Accordingly, the current study aimed to investigate the influence of surgeons’ level of experience and subspecialty training on (1) the reliability and reproducibility of the novel AO Spine Sacral Classification System, and (2) the appropriate classification of sacral fractures according to this system.
The methodology for the development and description of the AO Spine Sacral Classification System has been previously described.12,15 A request was sent to all members of the AO Spine and AO Trauma community to recruit surgeons who routinely treat patients with sacral fractures for participation in the validation of the sacral fracture classification. Previously obtained imaging was reviewed and classified by members of the AO Knowledge Forum Trauma. Cases with complete agreement were deemed acceptable for use in the validation as the gold standard. High-resolution computed tomography (CT) images from 26 cases were assessed by 172 investigators representing a diverse array of surgical subspecialties and experience. At minimum, two cases of each fracture subtype were included. Prior to the validation, a training session was provided that included a video introduction, visual and verbal definitions of the classification system, and a 10-case practice assessment. Validation assessments were performed via web conference, where both key high-resolution images and axial/sagittal/coronal CT scan sequences of the fracture were presented. Each investigator independently performed two assessments 3 weeks apart. The case order was randomized in both assessments so that a consecutive series was not presented, given the hierarchical nature of the classification system.
Respondents were divided into groups based on their years of experience in practice (<5, 5–10, 11–20, >20 yrs) and surgical subspecialty (general orthopedics, neurosurgery, orthopedic spine, orthopedic trauma). The Cohen kappa (k) statistic was used to assess the reliability of classification between independent observers (interobserver agreement) and reproducibility between classifications of the same observer during separate evaluations (intraobserver reproducibility). Reliability and reproducibility were determined for each fracture morphology (A, B, C) and fracture subtype (A1, A2, A3, B1, B2, B3, C0, C1, C2, C3) and stratified by experience and subspecialty. The k coefficients were interpreted using the Landis and Koch grading system, with kappa less than 0.20 defined as slight reliability/reproducibility, 0.20 to 0.40 as fair reliability/reproducibility, 0.40 to 0.60 as moderate reliability/reproducibility, 0.60 to 0.80 as substantial reliability/reproducibility, and more than 0.80 as excellent reliability/reproducibility.16
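As an illustration of the statistic described above, the following is a minimal sketch of a two-rating Cohen kappa and the grading bands used in this study. It is not the analysis code used by the investigators. The function names, the example ratings, and the handling of band boundaries (a value exactly on a boundary is assigned to the lower band) are assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa for two equal-length lists of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of cases given the same label in both lists.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the marginal label frequencies of each list.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both lists contain one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


def landis_koch(kappa):
    """Grade kappa with the bands quoted in the text (boundary handling assumed)."""
    if kappa < 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "excellent"


# Hypothetical morphology ratings from one observer on two occasions.
first_assessment = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
second_assessment = ["A", "A", "A", "B", "B", "C", "C", "C", "C", "B"]
kappa = cohen_kappa(first_assessment, second_assessment)
grade = landis_koch(kappa)
```

Comparing the two assessments of a single observer, as here, corresponds to intraobserver reproducibility; comparing the ratings of two different observers on the same cases corresponds to interobserver reliability.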
Accuracy of classification was calculated as the percentage agreement with the predetermined gold standard fracture type (morphology and subtype) for each assessment. Cases with incomplete or poor-quality imaging were excluded from use. Gold standard agreement was stratified by surgical experience and subspecialty and compared via the Fisher exact test. General orthopedic and orthopedic trauma specialties were combined for the analysis of gold standard agreement because of the low number of participants in these groups. Statistical significance was defined as P < .05, and no adjustment for multiplicity was performed.
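The accuracy analysis can be sketched in the same spirit. The example below computes percentage agreement with a gold standard and a two-sided Fisher exact P value for a 2 × 2 table of correct versus incorrect classifications in two groups. The study compared more than two groups, so this 2 × 2 version is a simplification; the function names and all counts are hypothetical and are not study data.

```python
from math import comb


def percent_agreement(ratings, gold_standard):
    """Percentage of ratings that match the gold standard label."""
    if len(ratings) != len(gold_standard) or not ratings:
        raise ValueError("lists must be non-empty and of equal length")
    matches = sum(r == g for r, g in zip(ratings, gold_standard))
    return 100.0 * matches / len(ratings)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical subtype ratings for four cases against the gold standard.
accuracy = percent_agreement(["A1", "B2", "C0", "C1"], ["A1", "B2", "C1", "C1"])

# Hypothetical counts: group 1 has 95 correct and 5 incorrect classifications,
# group 2 has 85 correct and 15 incorrect.
p_value = fisher_exact_2x2(95, 5, 85, 15)
significant = p_value < 0.05  # threshold stated in the text
```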
Overall, 172 surgeons were invited to participate. Respondent demographic characteristics, including the number of years in practice and subspecialty, are shown in Table 1. The majority of respondents were fellowship-trained orthopedic spine surgeons (66.3%), followed by a large minority of neurosurgeons (23.3%). Orthopedic traumatologists and general orthopedists represented 5.2% and 4.7% of respondents, respectively (Table 1). Of the 172 surgeons, 158 completed the first assessment and 162 completed the second assessment, for a total of 8320 case assessments. The 26 cases for review consisted of 7 (26.9%) type A, 8 (30.8%) type B, and 11 (42.3%) type C fractures, with two cases each representing the A1, A2, B1, and C0 subtypes and three cases each representing the A3, B2, B3, C1, C2, and C3 subtypes.
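The totals reported above can be checked with a few lines of arithmetic. The sketch below assumes that every surgeon who completed an assessment rated all 26 cases; the variable names are illustrative.

```python
# Completed assessments and case count as reported in the text.
assessment_1_raters = 158
assessment_2_raters = 162
cases = 26

# Assumes each completing surgeon classified all 26 cases.
total_assessments = (assessment_1_raters + assessment_2_raters) * cases

# Number of cases per subtype as reported in the text.
cases_by_subtype = {
    "A1": 2, "A2": 2, "A3": 3,
    "B1": 2, "B2": 3, "B3": 3,
    "C0": 2, "C1": 3, "C2": 3, "C3": 3,
}

# Aggregate subtypes into morphologic types using the leading letter.
cases_by_type = {}
for subtype, count in cases_by_subtype.items():
    morphology = subtype[0]
    cases_by_type[morphology] = cases_by_type.get(morphology, 0) + count

# Share of the case set represented by each morphologic type, in percent.
share_by_type = {t: round(100 * n / cases, 1) for t, n in cases_by_type.items()}
```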
TABLE 1. Summary of Surgeon Respondent Demographics
Number of years in practice (n = 171)
Surgical subspecialty (n = 172)
∗Excluded from further analyses.
Surgeons from the various subspecialities and years in practice achieved an overall k = 0.87 for morphology and k = 0.77 for subtype classification, representing excellent and substantial reproducibility, respectively (Table 2). Classification reproducibility was comparable across all surgical subspecialties and years of practice experience.
TABLE 2. Intraobserver Reproducibility Mean Kappa Values by Surgeon Experience
Years in Practice: n = 39, n = 41, n = 57, n = 34
Surgical Subspecialty: n = 8, n = 40, n = 114, n = 9
Overall: n = 172
Respondents from all four practice experience groups (<5, 5–10, 11–20, >20 yrs) demonstrated excellent interobserver reliability when classifying overall morphology (k = 0.842/0.850, Assessment 1/Assessment 2) and substantial reliability in overall subtype (k = 0.719/0.751) in both assessments (Table 3). Across all experience groups, type A classification was associated with excellent reliability, whereas type B and C classifications demonstrated substantial to excellent reliability. Surgeons with 11 to 20 years of practice experience were the only group to display excellent interobserver reliability for type A (k = 0.944/0.969), B (k = 0.823/0.848), and C (k = 0.858/0.875) morphologies. Surgeons with 11 to 20 years of experience were also the only group with at least substantial interobserver reliability for all fracture subtypes in both assessments (Table 3). Fracture subtype A1 was classified with excellent reliability by all experience groups, whereas B1, C0, C1, C2, and C3 demonstrated moderate reliability in one or more experience groups (B1: >20; C0: >20; C1: <5, 5–10, >20; C3: <5, 5–10, >20) (Table 3).
TABLE 3. Interobserver Reliability by Surgeon Experience and Surgical Subspecialty
General orthopedists, neurosurgeons, and orthopedic spine surgeons exhibited excellent interobserver reliability in overall morphology classification and substantial reliability in overall subtype classification (Table 3). Orthopedic trauma surgeons classified overall morphology (k = 0.748) and subtype (k = 0.618) less reliably than the other subspecialists in the first assessment. All surgical subspecialties achieved excellent interobserver reliability for type A morphology and substantial interobserver reliability for type B and C morphologies. Subtype A1 had excellent classification reliability for all subspecialties. However, subtypes B1, B2, C0, C1, and C3 were all associated with moderate reliability for all subspecialties (Table 3). Orthopedic traumatologists had the highest frequency of moderate interobserver subtype reliability (six occasions: A2, A3, B1, B2, C0, C3).
Gold Standard Agreement
Surgeons in each experience category and subspecialty correctly classified fracture morphology in over 90% of cases and fracture subtype in over 80% of cases according to the gold standard (Table 4). Correct overall classification of fracture morphology (P1 = 0.024, P2 = 0.006; P1 = Assessment 1, P2 = Assessment 2) and subtype (P2 < 0.001) differed significantly by years of experience but not by subspecialty. Classification of type A morphology demonstrated a significant difference (P2 ≤ 0.001) when comparing surgical experience of the raters. Fracture subtype classification also varied significantly by surgeon experience for subtypes A2 (P1 = 0.015), B1 (P1 = 0.046), and B3 (P1 = 0.012) injury patterns. Classification of type A (P1 = 0.003) and B (P1 = 0.007) morphologies demonstrated a significant difference when comparing the surgical subspecialty of the raters. Within type B morphology, classification of subtype B1 (P1 = 0.005) differed significantly by surgical subspecialty.
TABLE 4. Percent Agreement With Gold Standard Fracture Classification by Surgeon Experience and Surgical Subspecialty (General Orthopedics and Orthopedic Trauma combined)
The ultimate goal of any classification system is to set the framework for evidence-based algorithms in the treatment of the pathology being classified. To do so, the classification system must facilitate communication, which requires that all users be able to apply the scheme accurately and reliably. In this study of 8320 case assessments, we investigated the effects of surgeon experience and subspecialty training on the reliability, reproducibility, and accuracy of sacral fracture classification using the novel AO Spine Sacral Classification System. Our results suggest that the AO Spine Sacral Classification System can be universally applied by surgeons of various subspecialties and differing levels of experience with satisfactory results.
In this study, surgeon years of experience were partitioned into four groups (<5, 5–10, 11–20, >20 yrs) with a relatively even distribution of participants. When evaluating interobserver reliability, those with >20 years of experience were found to classify fracture morphology and subtype with less reliability. When evaluating the accuracy of classification overall, significant differences were found between groups in both assessments for fracture morphology and in the second assessment for fracture subtype. Surgeons with 5 to 10 years of experience had increased difficulty correctly classifying fractures of A and C morphologies, and surgeons with more than 20 years of experience had increased difficulty correctly classifying all fracture subtypes overall in comparison with the other groups. Similar results were seen in the intraobserver reproducibility for fracture subtype overall, where surgeons with more than 20 years of experience were found to perform less reliably than those with fewer years of experience. These findings mirror the validation results of the AO Spine thoracolumbar injury classification system, where fractures were most frequently misclassified by the most experienced surgeons.17,18 A plausible explanation for these observations is that more experienced surgeons are less inclined to learn and follow a new classification system because of their familiarity with prior systems. However, surgeons with slightly less experience (11–20 yrs) were the only group to display excellent interobserver reliability for all morphologies and at minimum substantial interobserver reliability for all fracture subtypes, indicating the importance of a certain degree of experience in correctly classifying fractures. Additionally, experienced (albeit slightly less experienced) surgeons may still be adept at adopting and applying new classification systems.
Despite these significant differences, it is important to note that the overall reliability in this study remains substantial at minimum with high accuracy regardless of surgeon level of experience.
Overall, surgeon subspecialty did not appear to have a significant effect on the classification of sacral fractures. The interobserver reliability results demonstrated that orthopedic traumatologists performed less reliably in classifying fracture morphology and fracture subtype on the first assessment. However, meaningful improvement was seen on the second assessment in both categories, ultimately obtaining excellent reliability. General orthopedists had the lowest reproducibility scores for fracture morphology and fracture subtype, an observation that is unlikely to be clinically significant because all subspecialties demonstrated excellent and substantial reproducibility for fracture morphology and subtype, respectively. While there was no significant difference in the combined accuracy of all assessments between subspecialties, a few significant differences were found in the analysis of certain morphologies and fracture subtypes. Specifically, orthopedic spine surgeons had the greatest difficulty correctly classifying type B morphology fractures, particularly B1 fractures, despite excellent and substantial reliability, respectively. This may be due to the relative rarity of B1 fractures and possible inexperience in diagnosing isolated longitudinal fractures medial to the foramen on CT.19 Additionally, general orthopedists and orthopedic trauma surgeons had a lower (94.5%) accuracy in diagnosing fractures of A morphology. However, the excellent overall accuracy achieved calls into question the clinical relevance of these significant differences.
Not surprisingly, the simplest fracture patterns (type A) and those injuries that were most stable (lower sacrococcygeal injuries) demonstrated the highest reliability, reproducibility, and accuracy among all subspecialties and levels of experience. Spinopelvic injuries (type C) were the most challenging fracture morphology to accurately diagnose. Within this type, the most challenging subtypes were C0, C1, and C3. Whereas displaced lumbopelvic dissociations may be visible on plain radiographs, non-displaced spinopelvic injuries can be obscured by the relatively cephalad location of the fracture and the superimposed ilia, making them more challenging to diagnose without a high index of suspicion.20 While CT imaging of fractures was presented during the validation, MRI has been demonstrated to have an evolving role in the diagnosis of sacral insufficiency and stress fractures, which could have improved the reliability and accuracy results for spinopelvic injuries (type C).2
The respondents demonstrated improvements in interobserver reliability and accuracy between the first and second assessments across most sacral injury patterns, with few exceptions. The largest increases in both reliability and accuracy were noted for the C0, C1, and C3 subtypes, which had previously been noted to be the most difficult to classify across all raters. This improvement across assessments underscores a potential “learning effect.” As surgeons become more familiar with the classification system and incorporate its use into their daily practice, the reliability and accuracy may continue to improve. However, taking into account the first assessment alone, the results demonstrate acceptable accuracy, which underscores the applicability of the classification scheme to even the naïve surgeon.
This study is not without limitations. This investigation was performed in a retrospective manner based on previously obtained images. The true reliability and accuracy of a classification system are measured through its prospective application in real time. However, the logistics of performing over 8000 case assessments worldwide pose significant hurdles. Accordingly, we have performed an assessment that we believe to be as close as possible to the “real-life scenario,” taking into account these practical obstacles. For example, using a live web conference with a single pass through CT sections, raters had limited time to diagnose and classify fractures. Additionally, raters were not able to return to previous answers once submitted, a relative strength compared with surveys in which raters are able to compare fractures and classifications against one another by returning to previous cases/answers. Moreover, while the levels of experience were relatively evenly distributed, a higher proportion of orthopedic spine respondents participated compared with other subspecialties, which is consistent with AO Spine membership demographics. As a result, the general orthopedic and orthopedic trauma subspecialties had to be combined for statistical analysis. However, high agreement regarding management of patients has been previously shown between orthopedic spine and other subspecialty surgeons.21,22 Lastly, study participants were all members of AO Spine and/or AO Trauma, which may impart a participation bias towards academic and hospital-employed surgeons who may be more familiar with AO classification systems.
Overall, the AO spine sacral classification system appears to be universally applicable among surgeons of various subspecialties and levels of experience with acceptable reliability, reproducibility, and accuracy. Future prospective clinical studies are needed to evaluate the clinical relevance and the usefulness of classification categories before the system can be used as a management tool.
- Respondents from all levels of practice experience demonstrated excellent interobserver reliability when classifying overall morphology and substantial interobserver reliability in overall subtype.
- General orthopedists, orthopedic traumatologists, neurosurgeons, and orthopedic spine surgeons exhibited excellent interobserver reliability in overall morphology classification and substantial interobserver reliability in overall subtype classification.
- The AO spine sacral classification system appears to be universally applicable among surgeons of various subspecialties and levels of experience with acceptable reliability, reproducibility, and accuracy.
This study was organized and funded by AO Spine International through the AO Spine Knowledge Forum Trauma, a focused group of international spinal trauma experts acting on behalf of AO Spine. AO Spine is a clinical division of the AO Foundation which is an independent medically-guided non-profit organization. Study support was provided directly through the AO Spine Research Department. The authors would like to thank Olesja Hazenbiller (AO Spine) for her editorial and administrative assistance.
AO Spine Sacral Classification Group Members:
Ahmed Shawky Abdelgawaad
Akbar Jaleel Zubairi
Alex del Arco
Antonio Sanchez Rodriguez
Ashraf El Naga
Dave Anthony Dizon
De Falco Giovanni
Dewan Shamsul Asif
Duchén Rodríguez Luis Miguel
Elias Enmanuel Javier Martinez
Heiller Torres Valencia
Ignacio Fernández Bances
Janardhana Aithala Parampalli
Jose Joefrey Arbatin
Juan Carlos Ramos Torres
Juan Esteban Muñoz Montoya
Lady Lozano C
Maarten de Boer
Matias Pereira Duarte
Paul van Urk
Popescu Eugen Cezar
Rajesh Bahadur Lakhey
Sbaffi Pier Filippo
Sebastián Anibal Kornfeld
Subramaniam Macherla haribabu
Vijay Kumar Loya
1. Bydon M, De la Garza-Ramos R, Macki M, et al. Incidence of sacral fractures and in-hospital postoperative complications in the United States. Spine (Phila Pa 1976)
2. Wagner D, Ossendorf C, Gruszka D, Hofmann A, Rommens PM. Fragility fractures of the sacrum: how to identify and when to treat surgically? Eur J Trauma Emerg Surg
3. Rodrigues-Pinto R, Kurd M, Schroeder G, et al. Sacral fractures and associated injuries. Global Spine J
4. Strange-Vognsen HH, Lebech A. An unusual type of fracture in the upper sacrum. J Orthop Trauma
5. Denis F, Davis S, Comfort T. Sacral fractures: an important problem. Retrospective analysis of 236 cases. Clin Orthop Relat R
6. Isler B. Lumbosacral lesions associated with pelvic ring injuries. J Orthop Trauma
7. Roy-Camille R, Saillant G, Gagna G, et al. Transverse fracture of the upper sacrum. Spine (Phila Pa 1976)
8. Lehman RA, Kang DG, Bellabarba C. A new classification for complex lumbosacral injuries. Spine J
9. Bonnin JG. Sacral fractures and cauda equina lesions. Med World
10. Tile M. Pelvic ring fractures: should they be fixed? J Bone Joint Surg Br
11. Audigé L, Bhandari M, Hanson B, Kellam J. A concept for the validation of fracture classifications. J Orthop Trauma
12. Vaccaro AR, Schroeder G, Divi S, et al. Description and reliability of the AOSpine sacral classification system. J Bone Joint Surg
13. Vaccaro AR, Oner FC, Kepler C, et al. AOSpine thoracolumbar spine injury classification system: fracture description, neurological status, and key modifiers. Spine (Phila Pa 1976)
14. Vaccaro AR, Koerner J, Radcliff K, et al. AOSpine subaxial cervical spine injury classification system. Eur Spine J
15. Schroeder GD, Kurd M, Kepler C, et al. The development of a universally accepted sacral fracture classification: a survey of AOSpine and AOTrauma members. Global Spine J
16. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics
17. Sadiqi S, Oner FC, Dvorak M, et al. The influence of spine surgeons’ experience on the classification and intraobserver reliability of the novel AOSpine thoracolumbar spine injury classification system: an international study. Spine (Phila Pa 1976)
18. Rajasekaran S, Kanna R, Schroeder G, et al. Does the spine surgeon's experience affect fracture classification, assessment of stability, and treatment plan in thoracolumbar injuries? Global Spine J
19. Ebraheim NA, Biyani A, Salpietro B. Zone III fractures of the sacrum. Spine (Phila Pa 1976)
20. Porrino JA, Kohl CA, Holden D, et al. The Importance of Sagittal 2D reconstruction in pelvic and sacral trauma: avoiding oversight of U-shaped fractures of the sacrum. Am J Roentgenol
21. Grauer JN, Vaccaro AR, Beiner JM, et al. Similarities and differences in the treatment of spine trauma between surgical specialties and location of practice. Spine (Phila Pa 1976)
22. Canseco JA, Schroeder G, Patel P, et al. Regional and experiential differences in surgeon preference for the treatment of cervical facet injuries: a case study survey with the AO Spine Cervical Classification Validation Group. Eur Spine J