We present methods used in conducting the systematic, evidence-based reviews and formulation of expert panel consensus statements and recommendations on the classification, risks, risk factors, and treatment of adjacent segment pathology (ASP). The development of ASP and clinical ASP (CASP) requiring treatment is a significant problem in spine surgery. These phenomena are often seen following a spinal fusion where either the proximal or distal adjacent segment can degenerate and lead to further symptoms. There is a lack of precision regarding the terminology used to describe ASP. As stated in the introduction of this Focus Issue, the term ASP is proposed in this issue as an umbrella term to refer to the breadth of clinical and/or radiographical changes at adjacent motion segments that develop subsequent to a previous spinal intervention. Under this umbrella, radiological adjacent segment pathology (RASP) and CASP are then used to categorize radiographical features (e.g., degenerative changes on MRI) and clinical manifestations (e.g., new radiculopathy), respectively.
We undertook this project to enhance physicians' awareness of the classification, risks, risk factors, and treatment of ASP. It is our desire that clinicians will use the information from these systematic reviews together with an understanding of their own capacities and experience to assess and better inform patients with respect to the potential for developing and preventing ASP. This methods article attempts to provide a detailed description of the methods undertaken in the systematic search and analytical summary of RASP and CASP issues and to describe the process used to develop consensus statements and clinical recommendations regarding the development of ASP for this Focus Issue. Some nuances that were not consistent among the systematic reviews are not included in this article but are reported in the individual systematic review methods. The introduction of this Focus Issue highlights the importance of this endeavor and summarizes the results, consensus statements, and recommendations from the individual systematic reviews; therefore, these topics will not be addressed in this article.
MATERIALS AND METHODS
Topics and Clinical Questions
Topics were developed by an expert panel, which discussed and agreed upon important topics that were later translated into key clinical questions. These clinical questions were then assigned to expert author groups to refine as necessary and to specify the inclusion and exclusion criteria that would best address the clinical questions considering both clinical relevancy and the potential highest strength of evidence. Once the key clinical questions were finalized and the final inclusion/exclusion criteria were agreed upon, we began the systematic literature searches.
Literature Search and Article Selection
We identified articles for inclusion in 3 stages (Figure 1). In the first 2 stages, we conducted a systematic search of the PubMed and Cochrane Central Register of Controlled Trials1 electronic databases to identify articles reporting the classification, risks, risk factors, and treatment of ASP based on the clinical questions identified in each systematic review. The search period ended January or February 2012, depending on the systematic review. Searches were conducted using standard MeSH terms (controlled vocabulary) as well as specific free-text terms and combinations of terms related to the clinical conditions. We then hand-searched the bibliographies of key articles to ensure each topic was comprehensively examined. All possibly relevant articles were screened by a single individual using titles and abstracts. Those articles that did not meet a set of a priori retrieval criteria for each topic area were excluded. In the third stage, we retrieved the full-text articles of those remaining, and 2 individuals independently examined each article. Again, those articles that did not meet a set of a priori inclusion criteria for each topic area were excluded. Any disagreement between screeners was resolved through discussion. The articles selected form the evidence base for this report.
Most systematic reviews attempted to include articles on adult and/or adolescent patients (depending on the systematic review) who had surgery for degenerative spine conditions (e.g., degenerative disc disease, disc herniation, radiculopathy and/or myelopathy, kyphosis, deformity, or spondylolisthesis) when evaluating the risk and risk factors for ASP. One systematic review specifically included articles on patients who had spine fusion for reasons other than adult degeneration (i.e., congenital fusion, fusion in adolescents, and fusion for trauma). One systematic review included articles of population-based longitudinal studies on the incidence and prevalence of spine degeneration in the general population. For articles evaluating the treatment of ASP, we required that the primary purpose of the study was to evaluate outcomes of treating ASP. Whether an article was included for review depended on whether the study question was descriptive, one of therapy (i.e., whether one intervention or technology is superior to another), or one of prognosis (i.e., identifying the risk of ASP or prognostic factors that influence ASP) (Figures 2A–C). For questions on the comparative safety of an intervention or technology, we sought and included randomized controlled trials or comparative cohort studies. Comparative cohort studies were defined as those clinical studies comparing the treatment or technology of interest to another treatment or technology in the same underlying patient population. For systematic reviews evaluating risks and risk factors for the development of ASP, we sought to include articles from the highest-quality observational studies (e.g., cohort or case-control studies) that either provided raw data from which we could calculate effect estimates or reported results from multivariate analyses so that we could report adjusted effect estimates for individual risk factors.
For systematic reviews evaluating the treatment of ASP, we sought to include randomized trials or comparative cohort studies but included case series because of the paucity of articles evaluating the treatment of ASP.
In general, systematic reviews excluded articles that included subjects with ASP, tumor, infection, trauma, neuromuscular scoliosis, or ankylosing spondylitis, because we were interested in the development of degenerative conditions in the adjacent segments following an index surgery. For articles evaluating the treatment of ASP, ASP itself was not an exclusion criterion, although the other criteria still applied. Other exclusions included reviews, editorials, case reports, articles not written in English, cadaver studies, animal studies, and laboratory simulations. Individual systematic reviews may have applied other specific exclusion criteria, which are specified in their methods.
Investigators extracted the following data from included clinical articles into evidence tables: study design, study population characteristics, study interventions for therapeutic studies, prognostic factors for studies of prognosis, follow-up time, follow-up rates, and rates or prevalence of radiographical and/or clinical ASP. An attempt was made to reconcile conflicting information among multiple articles presenting the same data.
The analysis methods for each individual systematic review are distinct and therefore are not described here. A detailed explanation of the analysis methods, from descriptive statistics to meta-analysis, is provided in the individual systematic review methods sections.
Rating the Evidence of Each Article
For reviews that were more descriptive in nature, the level of evidence of individual articles may or may not have been rated, depending on the purpose. This is clarified in the individual reviews. For those aimed at therapy or prognosis, articles selected for inclusion were classified by level of evidence. The method used for assessing the quality of evidence of individual articles incorporates aspects of the rating scheme developed by the Oxford Centre for Evidence-Based Medicine2 and used with modification by The Journal of Bone & Joint Surgery, American Volume.3 The focus of this scheme includes both study design and other methodological constructs consistent with evaluating the risk of bias. Each individual study was rated by 2 different investigators against preset criteria, resulting in an evidence rating (Level of Evidence I, II, III, or IV). Disagreements were resolved through discussion. Criteria used to assess an article's evidence varied according to the study type (therapy or prognosis) (Tables 1 and 2). For therapeutic articles, we assessed the type of study design, how patients were selected for treatment (concealed allocation versus nonconcealed), whether the analysis was by intention to treat, the presence of blinding during outcome assessment, whether co-interventions were applied equally, the completeness of follow-up, whether the study sample size was large enough to demonstrate statistical significance, and whether the investigators made an attempt at controlling potential confounding. For prognostic studies, we evaluated the article to determine whether the study design was a cohort study, a case-control study, or a cross-sectional study. For cohort studies, we determined whether patients were at a similar point in the course of their disease or treatment, the completeness of follow-up, whether patients were followed long enough for the outcome of interest to occur, and whether the investigators made an attempt at controlling potential confounding.
For case-control studies, we assessed whether the incident cases were from a defined population and included all eligible cases during a specified time period, whether controls represented the population from which the cases arose, whether exposure preceded the outcome of interest, and whether the investigators accounted for other prognostic factors. For cross-sectional studies, we attempted to determine whether the study population was a representative sample of the population of interest, whether the exposure preceded the outcome of interest, whether the investigators accounted for other prognostic factors, and, for surveys, whether there was a return rate of at least 80%. Whether or not criteria were met is displayed for each author, as illustrated in Table 3 for therapeutic studies. All criteria met are marked. A blank for a criterion indicates that the criterion was not met, could not be determined, or was not reported by the author.
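As a purely illustrative aside, the marked/blank convention of Table 3 can be mimicked with a short sketch; the criterion labels and the rendering function below are our own shorthand for the therapeutic criteria listed above and are not part of the published tables.

```python
# Hypothetical sketch of the Table 3 convention: each therapeutic-study
# criterion is either met (marked "X") or left blank (not met, could not
# be determined, or not reported by the author).
THERAPEUTIC_CRITERIA = [
    "concealed allocation",
    "intention-to-treat analysis",
    "blinded outcome assessment",
    "co-interventions applied equally",
    "complete follow-up",
    "adequate sample size",
    "controlling for potential confounding",
]

def checklist_row(study, ratings):
    """Render one study's checklist row: 'X' marks a met criterion."""
    marks = ["X" if ratings.get(criterion) else " " for criterion in THERAPEUTIC_CRITERIA]
    return study + " | " + " | ".join(marks)
```

For example, a study meeting only 2 of the 7 criteria would show 2 marks, with the remaining cells blank.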
Overall Strength of Evidence
After individual article evaluation, the overall body of evidence with respect to each outcome was determined based on precepts outlined by the Grades of Recommendation Assessment, Development and Evaluation (GRADE) working group4 and recommendations made by the Agency for Healthcare Research and Quality (AHRQ).5 For all analytical systematic reviews (some descriptive reviews did not require this rigor), qualitative analysis was performed considering the following AHRQ-required and additional domains.6 Risk of bias was evaluated during the individual article evaluation described above in the section "Rating the Evidence of Each Article." After individual article review, the literature evidence was rated as "high" for each individual key question and outcome if the majority of the articles were Level I or II. It was rated as "low" if the majority were Level III or lower. We describe this as our "baseline" strength of evidence (Figure 3). We then considered the consistency, directness, and precision domains when deciding whether to "downgrade" the strength of the body of evidence (by 1 or 2 levels, depending on the degree and number of domain violations). "Consistency" refers to the degree of similarity in the effect sizes of different studies within an evidence base. If effect sizes indicated the same direction of effect and the range of effect sizes was narrow, the evidence base was judged to be consistent. If meta-analyses were conducted, we evaluated consistency with an "eyeball test." This test consisted of a visual appraisal of the forest plots by 2 independent reviewers. Each reviewer gave an opinion on whether the 2 groups were similar or different, and the 2 reviewers then reached consensus. Single-article evidence bases were judged "consistency unknown (single study)" and downgraded.
"Directness" concerned whether the evidence being assessed reflected a single, direct link between the interventions of interest and the ultimate health outcome; that is, a determination of whether the most clinically relevant outcome was measured or a surrogate outcome was assessed. Directness also applied to indirect comparisons of treatments when head-to-head comparisons of interest could not be made within individual studies. Precision of evidence pertained to the degree of certainty surrounding an estimate of effect for a specific outcome. We based this on whether the estimate of effect reached statistical significance and/or on inspection of the confidence intervals around effect estimates. When there were only 2 subgroups, the overlap of the confidence intervals of the 2 groups' summary estimates was considered. No overlap of the confidence intervals indicates statistical significance; however, the confidence intervals can overlap to a small degree and the difference still be statistically significant. Finally, if the strength of evidence was less than "HIGH," we "upgraded" the evidence if there was a dose-response association or a strong magnitude of effect.
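The point that overlapping confidence intervals do not preclude a statistically significant difference can be illustrated with a small, hypothetical calculation (assuming independent, approximately normal estimates with symmetric 95% confidence intervals; the function names below are ours and are not part of the review methods):

```python
import math

def se_from_ci(lo, hi, z=1.96):
    """Back out the standard error from a symmetric 95% confidence interval."""
    return (hi - lo) / (2 * z)

def intervals_overlap(ci_a, ci_b):
    """True if the two confidence intervals share any common range."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

def difference_significant(mean_a, ci_a, mean_b, ci_b, z=1.96):
    """Test the difference of two independent estimates at the 5% level."""
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    return abs(mean_a - mean_b) / se_diff > z
```

For instance, hypothetical group estimates of 10.0 (95% CI, 8.0–12.0) and 13.0 (95% CI, 11.5–14.5) have overlapping intervals, yet the difference of 3.0 yields z ≈ 2.35 and is statistically significant.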
The following 4 possible levels and their definitions were reported for each outcome of each key question:
- High—High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
- Moderate—Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
- Low—Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and likely to change the estimate.
- Insufficient—Evidence either is unavailable or does not permit a conclusion.
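As an illustration only, the baseline rating and the downgrade/upgrade adjustments described above can be sketched in code; the function names and the numeric ordering of levels are hypothetical conveniences, not part of the review methodology.

```python
# Illustrative sketch (hypothetical, not the published algorithm): levels
# ordered from weakest to strongest for downgrade/upgrade arithmetic.
LEVELS = ["Insufficient", "Low", "Moderate", "High"]

def baseline(article_levels):
    """'High' if the majority of articles are Level I or II, else 'Low'."""
    high_quality = sum(1 for level in article_levels if level in ("I", "II"))
    return "High" if high_quality > len(article_levels) / 2 else "Low"

def overall_strength(article_levels, downgrades=0, upgrade=False):
    """Apply downgrades (consistency, directness, precision violations; 1 or
    2 levels) and, only when the result is below 'High', an optional upgrade
    for a dose-response association or a strong magnitude of effect."""
    idx = LEVELS.index(baseline(article_levels))
    idx = max(0, idx - downgrades)
    if upgrade and LEVELS[idx] != "High":
        idx = min(len(LEVELS) - 1, idx + 1)
    return LEVELS[idx]
```

For example, an evidence base in which most articles are Level I or II starts at "High" and drops to "Moderate" after a single-domain downgrade.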
Delphi Process and Clinical Recommendations
To provide practical assessment and obtain a surgeon technical perspective, a panel of North American experts in the field of spine surgery and research was selected from the membership and teaching faculty of AOSpine North America, Inc., an educational foundation. The spine specialists selected are involved in a high level of peer-reviewed practice, teaching, clinical research, and basic science activities. Their selection for involvement in this specific project was based on the authors' active roles in presenting critical and thoughtful teaching material and in publishing within the North American spine community. The editors sought senior members of the faculty. Twenty-nine authors agreed to participate.
Guyatt et al7,8 highlight the importance of consensus expert opinion as it relates to evidence-based medicine, especially when higher levels of evidence fall short. Authors met through Web-based teleconferencing and in-person sessions. They then drew upon the panel's collective input during these sessions to reach consensus regarding statements and recommendations.
The consensus approach was modeled on the Delphi method, which structures group communication among a panel of geographically dispersed experts. The consensus technique allows experts to deal systematically with a complex problem and derive recommendations.7,9 A modified Delphi process was used to collect and distill knowledge from the group of experts by means of a series of meetings with controlled opinion feedback for the production of suitable information for decision making.9 Given the topics reviewed, many were not conducive to a clinical recommendation per se, in which case consensus statements were made. Clinical recommendations or consensus statements were made where appropriate using the GRADE/AHRQ approach, which imparts a deliberate separation of the quality of the evidence (i.e., high, moderate, low, or inconclusive) from the strength of the recommendation or consensus statement. When appropriate, statements or recommendations "for" or "against" were given "strong" or "weak" designations based on the strength of the evidence, the balance of benefits and harms, and values and patient preferences. In some instances, costs may have been considered.
A total of 3382 articles were identified from our search strategies. After applying a priori inclusion/exclusion criteria to these articles, 127 articles were deemed relevant to answer the specific clinical questions and were critically reviewed and summarized in tables (Table 4). Lack of precision in the terminology related to what has traditionally been called adjacent segment disease, together with critical evaluation of the definitions used across included articles, led to a consensus to use the term adjacent segment pathology (ASP) and to suggest it as a standard. No validated comprehensive classification system for ASP currently exists. Highlights of the analysis included the annual, 5-year, and 10-year risks of developing cervical and lumbar ASP after surgery; several important risk factors associated with the development of cervical and lumbar ASP; and the possibility that some motion-sparing procedures may be associated with a lower risk of ASP compared with fusion, despite kinematic studies demonstrating similar adjacent segment mobility following these procedures. Other highlights included a high risk of proximal junctional kyphosis (PJK) following long fusions for deformity correction, postsurgical malalignment as a potential risk factor for RASP, and the paucity of articles on the treatment of cervical and lumbar ASP. A breakdown of critically appraised articles by clinical question is reported in Table 4.
We performed a systematic search of the published literature on various topics concerning the classification, risks, risk factors, and treatment of RASP or CASP. Inclusion and exclusion criteria were used to identify the articles for selection. Most of the selected articles underwent critical appraisal and were assigned a level of evidence rating. We identified 3382 articles and critically appraised 127 relevant therapeutic and prognostic articles in an attempt to answer the clinical questions posed.
We restricted our search to articles written in English for practical reasons as we weighed the cost in terms of money and time versus the potential benefit of identifying, obtaining, and translating non-English language articles. Though the importance of non-English language trials is difficult to predict, one analysis suggests that excluding trials published in languages other than English has generally little effect on summary treatment effect estimates.10
Our approach to systematic review included a qualitative analysis for both individual article critical appraisal to assess risk of bias and the overall body of evidence, using precepts set forth by both GRADE and AHRQ. For individual article "risk of bias" assessment, we chose the rating scheme developed by the Oxford Centre for Evidence-Based Medicine and modified for the orthopaedic surgical discipline by The Journal of Bone & Joint Surgery, American Volume.2,3 In this scheme, study design plays a prominent role in the level of evidence. In addition to study design, other important characteristics of internal validity were assessed depending on the type of study being appraised (studies of therapy, prognosis, or diagnosis), and these were considered in the final rating. The reliability of this system has been demonstrated11,12; however, its validity has yet to be established. Poolman et al13 compared The Journal of Bone & Joint Surgery, American Volume rating with the Cochrane reporting quality score. Among randomized controlled trials (RCTs), they found that the Cochrane reporting quality scores did not differ significantly between Level I and Level II studies. We established a "baseline" body of literature rating as "high" if the majority of articles were Level I or Level II and as "low" if the majority were Level III or lower.
Clinical recommendations or consensus statements were made where appropriate using the GRADE/AHRQ approach, which imparts a deliberate separation of the quality of the evidence (i.e., high, moderate, low, or inconclusive) from the strength of the recommendation. The quality of evidence plays only a part, as the strength of the recommendation or statement reflects the extent to which we can be confident, across the range of patients for whom the recommendations or statements are intended, that the desirable effects of a management strategy outweigh the undesirable effects.
- Establishing the risks, risk factors, and treatment of radiographical and clinical ASP is important yet complicated with many challenges.
- Systematic search, critical review, and qualitative data synthesis were conducted on various clinical questions concerning the classification, risks, risk factors, and treatment for radiographical and clinical ASP.
- We identified 3382 articles and critically appraised 127 relevant therapeutic and prognostic articles in an attempt to answer the clinical questions posed.
We are indebted to Nancy Holmes, RN, and Chi Lam, MS, for their administrative assistance.
1. Dickersin K, Manheimer E, Wieland S, et al. Development of the Cochrane collaboration's central register of controlled clinical trials. Eval Health Prof 2002;25:38–64.
3. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am 2003;85-A:1–3.
4. Atkins D, Best D, Briss PA, et al. Grading quality of evidence and strength of recommendations. BMJ 2004;328:1490.
5. West S, King V, Carey TS, et al. Systems to rate the strength of scientific evidence. Evidence Report/Technology Assessment No. 47 (Prepared by the Research Triangle Institute-University of North Carolina Evidence-based Practice Center, Contract No. 290–97-0011). Rockville, MD: Agency for Healthcare Research and Quality; 2002.
6. Owens DK, Lohr KN, Atkins D, et al. AHRQ series paper 5: grading the strength of a body of evidence when comparing medical interventions—agency for healthcare research and quality and the effective health-care program. J Clin Epidemiol 2010;63:513–23.
7. Guyatt G, Vist G, Falck-Ytter Y, et al. An emerging consensus on grading recommendations? ACP J Club 2006;144:A8–9.
8. Guyatt GH, Oxman AD, Kunz R, et al. Going from evidence to recommendations. BMJ 2008;336:1049–51.
9. Linstone H, Turoff M. The Delphi Method: Techniques and Applications. Reading, MA: Addison-Wesley; 1975.
10. Juni P, Holenstein F, Sterne J, et al. Direction and impact of language bias in meta-analyses of controlled trials: empirical study. Int J Epidemiol 2002;31:115–23.
11. Bhandari M, Swiontkowski MF, Einhorn TA, et al. Interobserver agreement in the application of levels of evidence to scientific papers in the American volume of the journal of bone and joint surgery. J Bone Joint Surg Am 2004;86-A:1717–20.
12. Obremskey WT, Bhandari M, Dirschl DR, et al. Internal fixation versus arthroplasty of comminuted fractures of the distal humerus. J Orthop Trauma 2003;17:463–5.
13. Poolman RW, Struijs PA, Krips R, et al. Does a “Level I Evidence” rating imply high quality of reporting in orthopaedic randomised controlled trials? BMC Med Res Methodol 2006;6:44.