Focus Papers

Methods for the Systematic Reviews on Patient Safety During Spine Surgery

Dettori, Joseph R., MPH, PhD*; Norvell, Daniel C., PhD*; Dekutoski, Mark, MD; Fisher, Charles, MD, MPH; Chapman, Jens R., MD§

doi: 10.1097/BRS.0b013e3181d70494

Understanding the medical literature on patient-centered outcomes and safety is critical to making an informed, patient-centered decision about medical intervention. Because surgery inherently carries greater risk, this obligation is heightened. A solid understanding of complication type, incidence, risk factors, and impact is integral to the physician's role, and methods to reduce complications, or avoid them altogether, depend heavily on this understanding.

We present the methods used in conducting systematic, evidence-based reviews and developing expert panel recommendations on key challenges in spine surgical practice. We undertook this project to enhance physicians' awareness of the patient safety literature surrounding these controversial areas. It is our hope that spine surgeons will use the information from these reviews, together with an understanding of their own capacities and experience, to better inform patients about potential treatment outcomes, safety, and life impact.

Materials and Methods

Literature Search and Article Selection

We identified articles for inclusion in 2 steps (Figure 1). In the first step, we conducted a systematic search of the MEDLINE, EMBASE, and Cochrane Central Register of Controlled Trials1 electronic databases to identify studies reporting patient safety with respect to the clinical questions identified above. The search period differed depending on the completion of each article but generally ended between December 2008 and June 2009. Searches were conducted using standard MeSH terms (controlled vocabulary) as well as specific free-text terms and combinations of terms related to the clinical conditions. We then hand searched the bibliographies of key articles to ensure each topic was comprehensively examined. Two individuals independently screened all potentially relevant articles using titles and abstracts. Articles that did not meet a set of a priori retrieval criteria for each topic area were excluded, and any disagreement between screeners was resolved through discussion. In the second step, we retrieved and examined the full-text articles of those remaining and applied the same inclusion criteria once more. The articles selected form the evidence base for this report.

Figure 1: Study selection algorithm.
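The two-reviewer screening rule described above can be sketched as follows. This is a minimal illustration of the reconciliation logic only, not the tooling actually used; the citation IDs and decisions are hypothetical.

```python
# Hypothetical title/abstract screening decisions from two independent
# reviewers, keyed by citation ID: True = meets a priori retrieval criteria.
reviewer_a = {"c1": True, "c2": False, "c3": True}
reviewer_b = {"c1": True, "c2": False, "c3": False}

def screen(a, b):
    """Keep citations both reviewers include, exclude citations both
    reject, and flag disagreements for resolution through discussion."""
    included, excluded, discuss = [], [], []
    for cid in a:
        if a[cid] and b[cid]:
            included.append(cid)
        elif not a[cid] and not b[cid]:
            excluded.append(cid)
        else:
            discuss.append(cid)
    return included, excluded, discuss
```

Citations surviving this step would then go to full-text review, where the same inclusion criteria are applied once more.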

Inclusion Criteria

Whether an article was included for review depended on whether the study question was one of therapy (whether one intervention or technology is better than another), prognosis identifying risks or rates of complications, prognosis identifying prognostic factors that influence outcome, or diagnosis (Figure 2A–D). For questions on the efficacy or effectiveness of an intervention or technology, we sought and included randomized controlled trials or comparative cohort studies. Comparative cohort studies were defined as nonrandomized clinical studies comparing the treatment or technology of interest with another concurrent treatment or technology. Studies of prognosis identifying risks or rates of complications from spine surgery were included if both the numerator (number of cases with the complication or number of complications) and the denominator (number of patients at risk for the complication) were reported. For prognostic studies attempting to identify prognostic factors associated with outcome, we included those that compared a group exposed to the prognostic factor with a group not exposed, provided the outcome reported was the outcome of interest in answering the clinical questions. Diagnostic studies were included if they were cohort studies that compared the diagnostic test or criteria with a reference standard.

Figure 2: Algorithms used to determine whether studies were eligible for inclusion. A, Studies on therapy evaluating effectiveness. B, Studies on prognosis identifying risks. C, Studies on prognosis identifying prognostic factors. D, Studies on diagnosis.

Exclusion Criteria

We excluded studies not published in English; letters to the editor; opinion articles; articles without scientific data or a report of their methodology; studies of cadavers, nonhuman subjects, or animals; and laboratory simulations.

Rating the Evidence of Each Article

Articles selected for inclusion were classified by level of evidence. The method used for assessing the quality of evidence of individual studies as well as the overall quality of the body of evidence incorporates aspects of the rating scheme developed by the Oxford Centre for Evidence-based Medicine2 and used with modification by The Journal of Bone and Joint Surgery American Volume (J Bone Joint Surg Am),3 precepts outlined by the Grades of Recommendation Assessment, Development and Evaluation (GRADE) Working Group,4 and recommendations made by the Agency for Healthcare Research and Quality (AHRQ).5

Each individual study was rated by 2 different investigators against preset criteria that resulted in an evidence rating (level of evidence I, II, III, or IV); disagreements were resolved through discussion. The criteria used to assess an article's evidence varied according to the study type (therapy, prognosis, or diagnosis) (Table 1). For therapeutic articles, we assessed the type of study design; how patients were selected for treatment; if allocation was random, whether it was concealed; whether the analysis followed the intention-to-treat principle; the presence of blinding during outcome assessment; whether cointerventions were applied equally; the completeness of follow-up; whether the study sample size was large enough to demonstrate statistical significance; and whether the investigators attempted to control for potential confounding. For prognostic studies, we evaluated the study design; whether patients were at a similar point in the course of their disease or treatment; the completeness of follow-up; whether patients were observed long enough for the outcome of interest to occur; and whether the investigators attempted to control for potential confounding. For diagnostic studies, we determined the study design; whether the study included a broad spectrum of persons with the expected condition; whether the appropriate reference standard was used; whether the test and reference standard were described adequately to allow replication of the study; whether the tests were compared blindly with the appropriate reference standard; and whether the reference standard was applied objectively to all patients without the results of the test influencing its use. The presence or absence of each criterion is displayed for each article, as illustrated in Table 2 for therapeutic studies. All criteria met are marked; a blank indicates that the criterion was not met, could not be determined, or was not reported by the authors.

Table 1: Definition of the Different Levels of Evidence for Articles on Therapy and Prognosis

Table 2: Example of Methods Evaluation for Articles on Therapy

Data Extraction

Investigators extracted the following data from included clinical studies into evidence tables: study design, study population characteristics, study interventions for therapeutic studies, prognostic factors for studies of prognosis, diagnostic tests and reference standards for diagnosis, study complications or adverse events, and follow-up time. An attempt was made to reconcile conflicting information among multiple reports presenting the same data. The evidence tables formed the basis for the review on each topic.


Data Analysis

We attempted to pool complication rates across studies with similar outcomes and study designs, weighting by sample size. For all reviews, a qualitative analysis6 was performed considering the following 3 domains: quality of studies, quantity of studies, and consistency of results across studies.5 Quality is related to the level of evidence as described above. Quantity refers to the number of published studies similar in patient population, condition treated, and outcome assessed. Consistency refers to whether the results of the different studies lead to a similar conclusion (similar values and in the same direction). We judged whether the body of literature met a minimum standard for each of the 3 domains using the following criteria: for quality, at least 80% of the studies needed to be rated level of evidence I or II; for quantity, at least 3 published studies adequately powered to answer the study question were needed; for consistency, at least 70% of the studies had to have consistent results. Based on these criteria, the possible scenarios that could be encountered are detailed in Figure 3. Each scenario is ranked according to the impact that future research is likely to have on both the overall results and confidence in the results, using concepts from the GRADE Working Group.4 This ranking describes the "Overall Strength of Evidence" for the body of literature on a specific topic. For example, an overall strength of evidence rating of "high" means that the minimum criteria for each domain were met and is interpreted as "further research is very unlikely to change our confidence in the estimate of effect."

Figure 3: Definition of overall strength of evidence considering quality of studies, quantity of studies, and consistency of results across studies.
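As a minimal sketch, the sample-size-weighted pooling and the three minimum-standard checks described above can be expressed as follows. The study data are hypothetical and the function names are our own; only the 80%/3-study/70% thresholds come from the text.

```python
# Hypothetical studies: (level_of_evidence, n_events, n_at_risk, consistent)
studies = [
    (1, 4, 120, True),
    (2, 7, 200, True),
    (2, 3, 90, False),
]

def pooled_rate(studies):
    """Pool complication rates across studies, weighting by sample size
    (equivalent to summing events over total patients at risk)."""
    events = sum(e for _, e, n, _ in studies)
    at_risk = sum(n for _, _, n, _ in studies)
    return events / at_risk

def meets_minimum_standards(studies):
    """Apply the three-domain criteria:
    quality     -- at least 80% of studies rated level I or II,
    quantity    -- at least 3 adequately powered published studies,
    consistency -- at least 70% of studies with consistent results."""
    quality = sum(lvl <= 2 for lvl, *_ in studies) / len(studies) >= 0.80
    quantity = len(studies) >= 3
    consistency = sum(c for *_, c in studies) / len(studies) >= 0.70
    return quality, quantity, consistency

rate = pooled_rate(studies)            # 14/410, i.e., about 0.034
q, n, c = meets_minimum_standards(studies)
overall = "high" if all((q, n, c)) else "lower (per Figure 3 scenarios)"
```

In this hypothetical example only 2 of 3 studies are consistent (67%), so the consistency criterion fails and the overall strength of evidence would fall below "high"; the exact rank for each combination of met and unmet criteria follows the scenarios in Figure 3.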

Delphi Process and Clinical Recommendations

To provide practical assessment and a surgeon's technical perspective, a panel of North American experts in the field of spine surgery was selected from the membership and teaching faculty of AOSpine North America, Inc., an educational foundation. The surgeons selected are involved in a high level of peer-reviewed practice, teaching, and basic science activity. They were solicited for this specific project based on their active roles in presenting critical and thoughtful teaching material and in publishing within the North American spine community. The editors sought senior members of the faculty and excluded residents and fellows from the publication process. Twenty-six authors agreed to participate.

Guyatt et al highlight the importance of consensus expert opinion as it relates to evidence-based medicine, especially when higher levels of evidence fall short.7,8 Our authors met in 4 sessions to review the clinical questions and systematic review data (Table 3), then drew on the panel's collective input during these sessions to reach consensus on recommendations.

Table 3: Consensus Development Process

The consensus approach was modeled on the Delphi method, which is typically a group communication process among a panel of geographically dispersed experts. The technique allows experts to deal systematically with a complex problem and derive recommendations.7,9 A modified Delphi process was used to collect and distill knowledge from the group of experts through a series of meetings with controlled opinion feedback, producing suitable information for decision-making.9


Results

A total of 2020 citations were identified from our search strategies. After applying our inclusion/exclusion criteria, 273 studies were deemed relevant to answering the specific clinical questions and were critically reviewed and summarized in tables. A breakdown of critically appraised studies by clinical question is reported in Table 4.

Table 4: No. of Citations Identified and Studies Critically Reviewed, by Article Title


Discussion

We performed a systematic search of the published literature on various patient safety topics in spine surgery. Inclusion and exclusion criteria were used to select studies, which then underwent critical appraisal and were assigned a level of evidence rating. We identified 2020 citations and critically appraised 273 relevant therapeutic, prognostic, and diagnostic studies in an attempt to answer the clinical questions posed.

We restricted our search to articles written in English for practical reasons, weighing the cost in money and time against the potential benefit of identifying, obtaining, and translating non-English language articles. Though the importance of non-English language trials is difficult to predict, one analysis suggests that excluding trials published in languages other than English generally has little effect on summary treatment effect estimates.10

Our approach to systematic review included a qualitative analysis incorporating 3 domains that helped determine the overall strength of the body of literature addressing each of our clinical questions: study quality, study quantity, and study consistency. For quality, we chose the rating scheme developed by the Oxford Centre for Evidence-Based Medicine and modified for the orthopedic surgical discipline by The Journal of Bone and Joint Surgery, American Volume (J Bone Joint Surg Am).2,3 In this scheme, study design plays a prominent role in the level of evidence. In addition to study design, other important characteristics of internal validity were assessed depending on the type of study being appraised (therapy, prognosis, or diagnosis), and these were considered in the final rating. The reliability of this system has been demonstrated11,12; however, its validity has yet to be established. Poolman et al compared the J Bone Joint Surg Am rating with the Cochrane reporting quality score13 and found that among RCTs the Cochrane reporting quality scores did not differ significantly between level I and level II studies. We deemed the body of literature to have high quality if 80% of the studies were graded level I or II. To meet the criterion of ample quantity, we required at least 3 studies with sufficient power to detect the complications of interest. We found no consensus in the literature on the minimum number of studies needed to judge that this criterion was met; we chose 3 somewhat arbitrarily, believing that 2 studies are too few and that more than 3 studies with sample sizes large enough for relatively rare outcomes would be scarce. To reach the standard for consistency, we established that at least 70% of the studies had to give consistent results with similar effect measures. Seventy percent was chosen in acknowledgment that complication rates can vary among surgeons and among different hospital settings.14,15

Key Points

  • Patient safety among those undergoing spine surgeries is an important health care issue.
  • Systematic search, critical review, and qualitative data synthesis were conducted on various clinical questions concerning patient safety following spine surgery.
  • We identified 2020 citations and critically appraised 273 relevant therapeutic, prognostic, and diagnostic studies in an attempt to answer the clinical questions posed.


Acknowledgment

The authors are indebted to Nancy Holmes, RN, for her administrative assistance and to Ms. Erika Ecker, BS, for her assistance in searching the literature, abstracting data, and proofing.


References

1. Dickersin K, Manheimer E, Wieland S, et al. Development of the Cochrane Collaboration's CENTRAL Register of controlled clinical trials. Eval Health Prof 2002;25:38–64.
2. Phillips B, Ball C, Sackett D, et al. Levels of Evidence and Grades of Recommendation. Available at: Last accessed December 2, 2006.
3. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am 2003;85:1–3.
4. Atkins D, Best D, Briss PA, et al. Grading quality of evidence and strength of recommendations. BMJ 2004;328:1490.
5. West S, King V, Carey TS, et al. Systems to Rate the Strength of Scientific Evidence. Rockville, MD: Agency for Healthcare Research and Quality; 2002. Evidence Report/Technology Assessment No. 47 (Prepared by the Research Triangle Institute-University of North Carolina Evidence-based Practice Center, Contract No. 290–97–0011).
6. van Tulder M, Furlan A, Bombardier C, et al. Updated method guidelines for systematic reviews in the cochrane collaboration back review group. Spine 2003;28:1290–9.
7. Guyatt G, Vist G, Falck-Ytter Y, et al. An emerging consensus on grading recommendations? ACP J Club 2006;144:A8–9.
8. Guyatt GH, Oxman AD, Kunz R, et al. Going from evidence to recommendations. BMJ 2008;336:1049–51.
9. Linstone H, Turoff M. The Delphi Method: Techniques and Applications. Reading, MA: Addison-Wesley; 1975.
10. Juni P, Holenstein F, Sterne J, et al. Direction and impact of language bias in meta-analyses of controlled trials: empirical study. Int J Epidemiol 2002;31:115–23.
11. Bhandari M, Swiontkowski MF, Einhorn TA, et al. Interobserver agreement in the application of levels of evidence to scientific papers in the American volume of the Journal of Bone and Joint Surgery. J Bone Joint Surg Am 2004;86:1717–20.
12. Obremskey WT, Bhandari M, Dirschl DR, et al. Internal fixation versus arthroplasty of comminuted fractures of the distal humerus. J Orthop Trauma 2003;17:463–5.
13. Poolman RW, Struijs PA, Krips R, et al. Does a “Level I Evidence” rating imply high quality of reporting in orthopaedic randomised controlled trials? BMC Med Res Methodol 2006;6:44.
14. Fenton JJ, Mirza SK, Lahad A, et al. Variation in reported safety of lumbar interbody fusion: influence of industrial sponsorship and other study characteristics. Spine 2007;32:471–80.
15. Katz JN, Losina E, Barrett J, et al. Association between hospital and surgeon procedure volume and outcomes of total hip replacement in the United States medicare population. J Bone Joint Surg Am 2001;83:1622–9.

Keywords: systematic review; adverse events; spine surgery; complications; methods

© 2010 Lippincott Williams & Wilkins, Inc.