Evaluating Common Outcomes for Measuring Treatment Success for Chronic Low Back Pain

Chapman, Jens R., MD*; Norvell, Daniel C., PhD; Hermsmeyer, Jeffrey T., BS; Bransford, Richard J., MD*; DeVine, John, MD; McGirt, Matthew J., MD§; Lee, Michael J., MD

doi: 10.1097/BRS.0b013e31822ef74d
Outcomes in Chronic Low Back Pain

Study Design. Systematic review.

Objective. To identify, describe, and evaluate common outcome measures in patients with chronic low back pain (CLBP).

Summary of Background Data. The treatment of CLBP has been associated with multiple clinical challenges. Further complicating treatment is the myriad of outcome scores used to assess it. These scores have been used to examine different domains of patient satisfaction and quality of life in the literature. Critical assessment of the frequency, parity, and quality of these outcomes is essential to improve our understanding of CLBP.

Methods. A systematic review of the English-language literature was undertaken for articles published from January 2001 through December 31, 2010. Electronic databases and reference lists of key articles were searched to identify measures used to evaluate outcomes in six different domains in patients with CLBP. The titles and abstracts of the peer-reviewed literature of LBP were searched to determine which of these measures were most commonly reported in the literature and which have been validated in populations with CLBP.

Results. We identified 75 outcome measures cited to evaluate CLBP. Twenty-nine of these outcome measures were excluded because they had only a single citation, leaving 46 measures for the evaluation. The most commonly used functional outcomes were the Oswestry Disability Index, Roland Morris Disability Questionnaire, and range of motion. For pain, the Numeric Pain Rating Scale, Brief Pain Inventory, Pain Disability Index, McGill Pain Questionnaire, and visual analog scale were most commonly cited. For psychosocial function, the Fear Avoidance Beliefs Questionnaire, Tampa Scale for Kinesiophobia, and Beck Depression Inventory were most commonly used. For generic quality of life, short form 36, Nottingham Health Profile, short form 12, and Sickness Impact Profile were the most common measures. For objective measures, work status/return to work, complications or adverse events, and medications used were the most commonly cited. For preference-based measures, the EuroQol 5 dimensions and short form 6 dimensions were most commonly cited. The validity, reliability, responsiveness, universality, and potential proprietary requirements are summarized for each.

Conclusion. Outcome measures should be routinely assessed in patients with CLBP. The choice of appropriate outcome measure should be influenced by the study objectives and design, as well as properties of the particular measure within the context of CLBP.

Clinical Recommendations. Recommendation 1: When selecting the appropriate outcome measures for clinical or research purposes, consider domains that best measure what is most important to patients. Measures that are valid, reliable, and responsive to change should be considered first. Other considerations include the number of items required (especially in the context of multiple measures), whether the measure is validated in the relevant language, and the associated costs or fees. Strength: Strong

Recommendation 2: Domains of greatest importance include pain, function, and quality of life. If cost utilization is a priority, then preference-based measures should be considered. For pain, we recommend the VAS and NPRS because of their ease of administration and responsiveness. For function, we recommend the ODI and RMDQ. The SF-36 and its shorter versions are most commonly used and should be considered if quality of life is important. If cost utility is important, consider the EQ-5D or SF-6D. Psychosocial tests are best used as screening tools prior to surgery because of their lack of responsiveness. Complications should always be assessed as a standard of clinical practice. Return to work and medication use are complicated outcome measures and not recommended unless the specific study question is focused on these domains. Consider staff and patient burden when prioritizing one's battery of measures.

Multiple outcome measures have been utilized in the assessment of chronic low back pain. The outcome measures lack parity not only in the domains examined but also the quality of their assessment. This study critically assesses these outcome measures for chronic low back pain in regard to range, frequency of use, and quality.

*Department of Orthopaedic Surgery, Harborview Medical Center, Seattle, WA;

Spectrum Research, Inc., Tacoma, WA;

Departments of Spine Surgery and Orthopedic Residency, Eisenhower Army Medical Center, Ft Gordon, GA;

§Department of Neurosurgery, Vanderbilt University Medical Center, Nashville, Tennessee;

Department of Orthopaedic Surgery, University of Washington Medical Center, Seattle, WA.

Address correspondence and reprint requests to Michael J Lee, MD, Department of Orthopaedic Surgery, University of Washington Medical Center, Box 356500, 1959 Pacific Ave NE, Seattle, WA 98195; E-mail: mjl3000@uw.edu.

Acknowledgment date: May 6, 2011. First Revision date: June 14, 2011. Second Revision date: July 18, 2011. Acceptance date: June 21, 2011.

The manuscript submitted does not contain information about medical device(s)/drug(s).

Professional Organization and Foundation funds were received to support this work. No benefits in any form have been or will be received from a commercial party related directly or indirectly to the subject of this manuscript.

Analytic support for this work was provided by Spectrum Research, Inc., with funding from the AOSpine North America.

Chronic low back pain (CLBP) continues to have a major clinical and economic impact and its management continues to be the Sisyphus of challenges for the clinician. The diagnosis and treatment of CLBP have been surrounded by debate, and there is no clear consensus on optimal management. Perhaps equally daunting to the task of managing CLBP is the assessment of outcomes for CLBP treatment. Just as the proposed etiologies for CLBP are wide and varied, methods described for assessing outcomes for CLBP are equally diverse and a myriad of outcome scores have been utilized and reported in the literature. These outcome scores lack parity and measure different aspects of patient satisfaction. Some outcome scores have focused on pain, some have focused on function, and some others have focused on health-related quality of life of which their back pain may be a small component. In addition to the disparity of outcome type being measured, there is likely to be a spectrum of quality-of-outcome scores. We are unaware of a critical assessment of the quality-of-outcome measures for the treatment of CLBP in the literature.

As we progress in this era of “comparative effectiveness,” a clear understanding of how “effectiveness” is defined as it pertains to outcome scores is of paramount importance. Furthermore, a critical assessment of the quality of these outcome scores is of great value and may serve to guide clinicians and researchers when evaluating CLBP in the future. The purpose of the present study was to present a comprehensive review of the breadth, frequency, and quality-of-outcome scores used to evaluate CLBP. We sought to answer the following clinical questions: (1) What are the most common outcomes cited for measuring treatment success for CLBP in randomized controlled trials (RCTs)? (2) How frequently are these measures cited in all studies evaluating treatment for CLBP? (3) Among the most frequently cited measures, which are the most valid, reliable, and responsive?


MATERIALS AND METHODS

Electronic Literature Database

A systematic search was conducted in MEDLINE and the Cochrane Collaboration. The search results were limited to human studies published in the English language. Our search process was divided into three key steps to match our three objectives:

  1. To identify the most common outcome measures cited to evaluate treatment success, we identified all RCTs using the following MeSH terms: “Low Back Pain/rehabilitation” or “Low Back Pain/surgery” or “Low Back Pain/therapy,” excluding “cost,” “cancer,” “deformity,” “scoliosis,” “instability,” “infection,” or “trauma,” with publication dates from December 31, 2006, through December 31, 2011. From these RCTs, we compiled a list of all outcome measures that were used. These included patient-reported outcomes, clinician-based outcomes, and physiological outcomes in the following six domains (defined by the author group): functional, pain, psychosocial, generic/quality of life, objective measures, and preference-based measures. We restricted the first step to RCTs to limit the thousands of articles that would be identified if all study designs were included. We felt RCTs served as a reasonable surrogate for identifying both common and important outcomes for measuring CLBP treatment success (Figure 1).
  2. To evaluate the relative frequency of citations for these common outcome measures in all studies, we searched PubMed using the name of the measure combined with the search terms in step 1. The search results were limited to human studies published in the English language with publication dates of December 31, 2001, through December 31, 2010 (more details of the search strategy can be found in Supplemental Digital Content 1, http://links.lww.com/BRS/A543). The titles and abstracts of the studies identified were checked to verify that the measure of interest was reported. The total number of studies reporting on each outcome measure in the title or abstract was determined.
  3. For the measures with the highest frequency of search returns, we searched for studies that evaluated their validity, reliability, responsiveness, languages validated in, and proprietary requirements.
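
The first two search steps above can be sketched as a query builder. This is only an illustration: the authors' exact search strings are in the supplemental content, and the function name, the PubMed field tags chosen here, and the decision to apply the exclusion words to the title/abstract field are all assumptions.

```python
# Hypothetical sketch: assemble a PubMed-style boolean query from the MeSH
# terms and exclusion words listed in step 1. The field tags used below are
# an assumed rendering, not the authors' published search string.

INCLUDE_MESH = [
    "Low Back Pain/rehabilitation",
    "Low Back Pain/surgery",
    "Low Back Pain/therapy",
]
EXCLUDE_WORDS = [
    "cost", "cancer", "deformity", "scoliosis",
    "instability", "infection", "trauma",
]


def build_query(measure=None, start="2006/12/31", end="2011/12/31"):
    """Return a query string; pass `measure` to mimic step 2."""
    include = " OR ".join(f'"{term}"[MeSH Terms]' for term in INCLUDE_MESH)
    exclude = " OR ".join(f'"{word}"[Title/Abstract]' for word in EXCLUDE_WORDS)
    query = f"({include}) NOT ({exclude})"
    if measure is not None:
        # Step 2: combine the name of the measure with the step 1 terms.
        query = f'"{measure}"[Title/Abstract] AND {query}'
    dates = f'("{start}"[Date - Publication] : "{end}"[Date - Publication])'
    return f"{query} AND {dates} AND English[Language] AND Humans[MeSH Terms]"


step1_query = build_query()
step2_query = build_query(
    measure="Oswestry Disability Index", start="2001/12/31", end="2010/12/31"
)
```

Keeping the term lists as data makes it easy to rerun step 2 once per outcome measure with the same inclusion and exclusion terms.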

Data Extraction

Each retrieved citation was reviewed by two independently working reviewers (D.C.N., J.T.H.). Most articles were excluded on the basis of information provided by the title or abstract. Citations that appeared to be appropriate or those that could not be excluded unequivocally from the title and abstract were identified, and the corresponding full-text reports were reviewed by the two reviewers. Any disagreement between them was resolved by consensus. For the final selection of outcome measures, the following data were extracted and summarized: name of the measure, domain (e.g., function, pain, quality of life, complication), description, interpretation, valid (yes/no), reliable (yes/no), responsive (yes/no), language validated in, and proprietary status.
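
The extracted fields listed above map naturally onto a small record type. The sketch below is a hypothetical layout, not the authors' extraction form; in particular, using `None` to mean "not tested in a CLBP population" (as opposed to a yes/no answer) is an assumption introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OutcomeMeasure:
    """One extracted outcome measure (hypothetical record layout)."""
    name: str
    domain: str
    description: str = ""
    interpretation: str = ""
    # None means "not tested in a CLBP population" (an assumption of this sketch).
    valid: Optional[bool] = None
    reliable: Optional[bool] = None
    responsive: Optional[bool] = None
    languages_validated: List[str] = field(default_factory=list)
    proprietary: Optional[bool] = None

    def untested_properties(self) -> List[str]:
        """Names of the measurement properties that have no test result."""
        props = {
            "valid": self.valid,
            "reliable": self.reliable,
            "responsive": self.responsive,
        }
        return [name for name, value in props.items() if value is None]


# Values taken from the Results and Discussion text for the BDI.
bdi = OutcomeMeasure(
    name="Beck Depression Inventory",
    domain="psychosocial",
    valid=True,
    reliable=True,
    responsive=None,   # reported as not tested for responsiveness
    proprietary=False,
)
```

Separating "untested" from "tested and failed" keeps a record from implying that a measure is invalid when it has simply not been evaluated.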


Analysis

We identified and listed measures of treatment success reported in RCTs for treating CLBP. We divided these into the following domains: generic/quality of life, functional, pain, psychosocial, objective measures, and preference based. From this list, we report the frequency of use of these measures as reported in all studies on the treatment for CLBP over a 10-year period. For the top three to four measures in each domain, we summarized the content and whether the measure has been successfully tested for validity, reliability, and responsiveness in a CLBP population. Validity is commonly defined as the extent to which an instrument measures what it is intended to measure. Reliability is concerned with the consistency of the instrument. Responsiveness, also known as “sensitivity to change,” is a measure of how well an instrument can detect changes as a result of an intervention.
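
The definition of responsiveness above can be made concrete with a short sketch. The standardized response mean (mean change divided by the standard deviation of change) is one commonly used responsiveness statistic; it is shown here only as an illustration and is not a calculation performed in this review. The function name and the patient scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of the paired changes divided by the SD of those changes."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical ODI scores (0-100, higher = more disability) for five patients.
baseline = [40, 50, 60, 44, 56]
follow_up = [30, 38, 52, 30, 50]

srm = standardized_response_mean(baseline, follow_up)
# Changes are -10, -12, -8, -14, -6, so srm is about -3.16
# (negative because lower ODI scores mean improvement).
```

A larger absolute value indicates that the instrument registered a more consistent change across patients after the intervention.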


RESULTS

What Are the Most Common Outcomes for Measuring Treatment Success for CLBP?

A search of RCTs was done to determine the most common outcomes for measuring treatment success for CLBP. A total of 354 RCTs were identified. After the title and abstract of each RCT were searched, 75 outcome measures in six different domains were identified. After excluding the measures that were found in only one reference, the 46 most common outcome measures were included in the frequency-of-citations analysis. The most commonly cited functional outcome measures in a CLBP population were the Oswestry Disability Index (ODI) (n = 168), Roland Morris Disability Questionnaire (RMDQ) (n = 132), and range of motion (ROM) (n = 71) (Figure 2). The most commonly cited pain outcome measures were the Numeric Pain Rating Scale (NPRS) (n = 13), Brief Pain Inventory (BPI) (n = 10), Pain Disability Index (PDI) (n = 10), McGill Pain Questionnaire (MPQ) (n = 10), and the visual analog scale (VAS) (n = 9) (Figure 3). The most commonly cited psychosocial outcome measures were the Fear Avoidance Beliefs Questionnaire (FABQ) (n = 31), Tampa Scale for Kinesiophobia (TSK) (n = 14), and Beck Depression Inventory (BDI) (n = 11) (Figure 4). The most commonly cited generic quality-of-life outcome measures were the short form 36 (SF-36) (n = 151), Nottingham Health Profile (NHP) (n = 15), short form 12 (SF-12) (n = 14), and Sickness Impact Profile (SIP) (n = 12) (Figure 5). The most commonly cited objective measures were work status/return to work (n = 199), complications or adverse events (n = 195), and medications used (n = 191) (Figure 6). The most commonly cited preference-based measures were the EuroQol 5 dimensions (EQ-5D) (n = 16) and short form 6 dimensions (SF-6D) (n = 4) (Figure 7). There were a few citations reporting a “global effect” single-item measure; however, these questions were not consistent between studies. Therefore, we could not validly combine them into one global effect measure for the results section.
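
The counts reported above can be tabulated and ranked by domain, as in the following minimal sketch. The citation counts are copied from the text; the dictionary layout and the function name are hypothetical.

```python
# Citation counts as reported in the Results text (title/abstract hits).
CITATIONS = {
    "functional": {"ODI": 168, "RMDQ": 132, "ROM": 71},
    "pain": {"NPRS": 13, "BPI": 10, "PDI": 10, "MPQ": 10, "VAS": 9},
    "psychosocial": {"FABQ": 31, "TSK": 14, "BDI": 11},
    "generic quality of life": {"SF-36": 151, "NHP": 15, "SF-12": 14, "SIP": 12},
    "objective": {
        "work status/return to work": 199,
        "complications/adverse events": 195,
        "medications used": 191,
    },
    "preference-based": {"EQ-5D": 16, "SF-6D": 4},
}

MEASURES_IDENTIFIED = 75
SINGLE_CITATION_EXCLUDED = 29
measures_analyzed = MEASURES_IDENTIFIED - SINGLE_CITATION_EXCLUDED  # 46


def top_measures(domain, k=3):
    """Return the k most cited measures in a domain as (name, count) pairs."""
    counts = CITATIONS[domain]
    # sorted() is stable, so tied measures keep the order used in the text.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


for domain in CITATIONS:
    print(domain, top_measures(domain))
```

Laying the counts out this way makes the contrast between domains visible: the leading functional and generic measures have more than ten times as many citations as any single pain measure.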

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7


Among the Most Frequently Used Measures, Which Are the Most Valid, Reliable, and Responsive?

A summary of the number of items, properties, additional language availability, and proprietary status for the most commonly cited measures in each domain is available in Table 1. Detailed results related to each of these measures can be found in Table 2.

TABLE 1

TABLE 2


Functional Outcome Measures

Eighteen functional outcome measures were identified (Figure 2). The two most common were the ODI and RMDQ. Both measures have been validated, tested successfully for reliability, and found to be responsive in a CLBP population (Table 3). Range of motion was not assessed because of the heterogeneity of ROM measurements (e.g., goniometer, visual estimation, optoelectronic systems). The ODI has been validated in 14 different languages (German, French, Italian, Greek, Portuguese, Norwegian, Danish, Korean, Japanese, Chinese, Thai, Persian, Turkish, and Arabic). The RMDQ has been validated in nine different languages (German, French, Greek, Portuguese, Spanish, Swedish, Turkish, Persian, and Tunisian).

TABLE 3


Pain Outcome Measures

Eleven pain outcome measures were identified. The five most common were the NPRS, BPI, PDI, MPQ, and VAS. A detailed summary of each outcome measure is available in Table 4. Among these, only the BPI has been validated in a CLBP population. Only the PDI and VAS have been found to be reliable in a CLBP population, and both the NPRS and VAS have been found to be responsive in this population (Table 2). No studies were identified that produced negative results with respect to these psychometric properties. None of the pain outcome measures has been validated in other languages.

TABLE 4-a

TABLE 4-b


Psychosocial Outcome Measures

Seven psychosocial outcome measures were identified. The three most common were the FABQ, TSK, and BDI (Figure 4). A detailed summary of each outcome measure is available in Table 5. All three have been successfully validated and found to be reliable in populations with CLBP (Table 4). Both the FABQ and TSK have been found to be responsive in a CLBP population; however, the responsiveness of the FABQ was low. The BDI has not been tested for responsiveness. The FABQ has been validated in five different languages (German, Spanish, Turkish, Portuguese, and Norwegian). The TSK has been validated in three languages (Chinese, Italian, and Norwegian). The BDI has not been validated in any other language.

TABLE 5-a

TABLE 5-b


Generic Quality-of-Life Outcome Measures

Six generic quality-of-life outcome measures were identified. The four most common were the SF-36, NHP, SF-12, and SIP. All four have been successfully validated and found to be reliable in populations with CLBP (Table 6). Only the SF-36 has been found to be responsive in a CLBP population. The others have not been tested. The SF-36 has been translated into, but not necessarily validated in, 121 languages. The NHP has been validated in Norwegian. Like the SF-36, the SF-12 has been translated into many languages but not necessarily validated in them. The SIP has not been validated in any other language.

TABLE 6-a

TABLE 6-b

TABLE 6-c


Objective Outcome Measures

Three “objective” outcome measures were identified: return to work, complications/adverse events, and medications used (Figure 6). These measures were not assessed because of the heterogeneity of methods utilized.


Preference-Based Outcome Measures

Two preference-based outcome measures were identified: the EQ-5D and SF-6D. A summary of each outcome measure is available in Table 7. Neither outcome measure has been tested for validity, reliability, or responsiveness in a CLBP population. Neither the EQ-5D nor the SF-6D has been validated in any language other than English.

TABLE 7


DISCUSSION

The most common functional outcome measures cited in the literature for evaluating the effectiveness of treatment for CLBP are the ODI, RMDQ, and ROM. Both the ODI and RMDQ have been found to be valid, reliable, and responsive to treatment. The ODI is composed of 10 items and the RMDQ of 24 items. Of the two, only the ODI is proprietary. The tools and devices used for ROM assessment are extremely heterogeneous, and their evaluation was therefore beyond the scope of this article. In addition, physical measures, such as walking tests, are often recommended. Shuttle Walking Test scores were found to be significantly correlated with walking items on the ODI, EQ-5D, and SF-36; however, the test was found to be a less responsive measure¹ and is therefore not necessary in studies evaluating CLBP outcomes. These tests are very time intensive and are not recommended.

The most common pain outcome measures cited in the literature for evaluating the effectiveness of treatment for CLBP are the NPRS, BPI, PDI, MPQ, and VAS. Only the NPRS and VAS have been found to be responsive in the treatment of CLBP. The others have not been tested in this population. However, no studies have established the validity of the NPRS and VAS, although they are often considered the “gold standard” for pain. The VAS has been shown to be reliable; the NPRS has not been tested. The BPI has been tested for validity and the PDI for reliability. The MPQ has not been tested. Among these, only the BPI is proprietary. Pain may be the most responsive measure after spine surgery² and therefore is a critical measurement. Because the NPRS and VAS are nearly identical single-item measures, are the most widely used, and have been found to be responsive, they should receive the strongest consideration for measuring pain.

The most common psychosocial outcome measures cited in the literature for evaluating the effectiveness of treatment for CLBP are the FABQ, TSK, and BDI. All three measures have been validated in populations with CLBP and have been found to be reliable. Only the BDI has not been assessed for responsiveness. None of these are proprietary. One must consider whether depression is a domain that should be expected to change after CLBP treatment (i.e., treated as an outcome) or more appropriately a risk factor assessed prior to treatment to determine the potential prognosis of the patient.³

The most common generic/quality-of-life outcome measures were the SF-36, NHP, SF-12, and SIP. Only the SF-36 has been tested and found to be valid, reliable, and responsive in a CLBP population; however, the SF-36 is significantly less responsive than the VAS for pain and the ODI for function.³ Furthermore, the SF-36 has been found to be minimally responsive in a nonoperative trial, which may be attributed to instrument floor effects.¹ The SF-36 and SF-12 are proprietary.

Return to work, complications/adverse events, and medications used were the most common objective measures. An evaluation of their validity and reliability is either not possible or beyond the scope of this project. Complications should always be collected and monitored as part of standard clinical practice; however, return to work and medication use are prone as outcomes to several potential biases, not the least of which is the baseline status of these measures. Therefore, they should not be used unless the study objectives are focused specifically on these issues.

The most common preference-based outcome measures were the EQ-5D and the SF-6D. Both measures are proprietary. Neither outcome measure has yet been tested for validity, reliability, or responsiveness in a CLBP population. However, both instruments provide a measure of general health state utility in units of quality-adjusted life years (QALYs) and allow comparisons of effectiveness across all disease states in medicine. The index or “utility” scale is anchored on 0 (death) and 1 (full health), and is integrated with survival, so that it is not merely the number of years of life expectancy but also the quality of those years that is considered. The SF-6D was developed to bridge the gap between the SF-36 and the QALY approach and can be calculated from the SF-36.

Such preference-based health-state measures are becoming the gold standard for cost-effectiveness and value assessment. In the era of health-care reform and value-based purchasing, cost per QALY derived from the EQ-5D or SF-6D will likely be used increasingly to demonstrate the value of spine care, to compare spine outcomes with those in other disease states, and to serve as the primary utility measure in policy.
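
The way a utility score is integrated with survival can be shown with a short arithmetic sketch. Everything here is illustrative: the function names, the yearly utility values, and the cost figure are hypothetical, one utility value is assumed per year of follow-up, and no discounting is applied.

```python
def total_qalys(yearly_utilities):
    """Sum yearly utilities (0 = death, 1 = full health); each entry is one year."""
    return sum(yearly_utilities)


def cost_per_qaly_gained(extra_cost, qalys_treatment, qalys_comparator):
    """Incremental cost divided by the QALYs gained over the comparator."""
    gained = qalys_treatment - qalys_comparator
    if gained <= 0:
        raise ValueError("the treatment must gain QALYs for this ratio to be meaningful")
    return extra_cost / gained


# Hypothetical 3-year follow-up with utilities from an EQ-5D-style index.
treatment = total_qalys([0.70, 0.75, 0.80])   # about 2.25 QALYs
comparator = total_qalys([0.60, 0.60, 0.60])  # about 1.80 QALYs

# A hypothetical extra cost of 9,000 gives roughly 20,000 per QALY gained.
ratio = cost_per_qaly_gained(9000, treatment, comparator)
```

Because the result is expressed per QALY rather than per point on a spine-specific scale, the same ratio can be compared with treatments for unrelated conditions.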

This systematic review has limitations. First, we used the frequency of citations initially among randomized trials only, and then among all studies to derive the final set of outcome measures to be reviewed. This was our best attempt at identifying important measures without deriving them ourselves. It was beyond the scope of this article to individually evaluate the validity, reliability, and responsiveness of these measures. We relied on the results reported by authors. Furthermore, many measures were not specifically tested for validity, reliability, and responsiveness. Thus, if a measure was not deemed valid, this may simply reflect its lack of testing for validity and should not necessarily be interpreted as “invalid.” For measures deemed “validated,” it is important to consider which measure it was validated against. It is possible that a measure is deemed valid against a measure that is not deemed important to patients with CLBP. Reliability and responsiveness, however, are universally important and not dependent on another measure and therefore, arguably, the most important of the three properties to consider when selecting a tool for measuring change in status before and after treatment for CLBP.

To our knowledge, this is the first systematic review that describes the frequency of citations of common spine outcome measures with the additional parameters of validity, reliability, responsiveness, languages, and proprietary status of each in the literature. As the study of comparative effectiveness of various treatment modalities is increasingly emphasized, it is of paramount importance that the measures of these outcomes have appropriate comparability and quality. In the current study, we have critically assessed the frequency of use and quality of a large range of outcome measures used for CLBP and have made recommendations regarding their use. When selecting a battery of outcome measures, one must always consider the clinician and patient burden. Multiple measures may be useful, but at the expense of missing data or loss to follow-up if the burden is too high. In addition, the benefits of additional data may not outweigh the financial cost of processing them. These trade-offs must be considered. The recommendations from this review may serve as a guide toward the selection of outcome measures for the treatment of CLBP.



Key Points

  • Outcome measures should be routinely assessed in CLBP patients.
  • The choice of appropriate outcome measure should be influenced by the study objectives and design, as well as psychometric properties of the particular measure within the context of CLBP.
  • Overall logistical and patient burden should also influence decisions when multiple measures are being used.

Supplemental digital content is available for this article. Direct URL citation appears in the printed text and is provided in the HTML and PDF versions of this article on the journal's Web site (www.spinejournal.com).


References

1. Fairbank JC, Couper J, Davies JB, et al. The Oswestry low back pain disability questionnaire. Physiotherapy 1980;66(8):271–3.
2. Roland M, Morris R. A study of the natural history of back pain. Part I: development of a reliable and sensitive measure of disability in low-back pain. Spine 1983;8:141–4.
3. Jensen MP, Turner JA, Romano JM. What is the maximum number of levels needed in pain intensity measurement? Pain 1994;58:387–92.
4. Cleeland CS. Measurement of pain by subjective report. In: Chapman CR, Loeser JD, eds. Advances in Pain Research and Therapy. New York, NY: Raven Press; 1989:391–403.
5. Cleeland CS, Ryan KM. Pain assessment: global use of the Brief Pain Inventory. Ann Acad Med Singapore 1994;23:129–38.
6. Pollard CA. Preliminary validity study of the pain disability index. Percept Mot Skills 1984;59:974.
7. Melzack R. The McGill Pain Questionnaire: major properties and scoring methods. Pain 1975;1:277–99.
8. Waddell G, Newton M, Henderson I, et al. A Fear-Avoidance Beliefs Questionnaire (FABQ) and the role of fear-avoidance beliefs in chronic low back pain and disability. Pain 1993;52:157–68.
9. Kori SH, Miller RP, Todd DD. Kinesiophobia: a new view of chronic pain behavior. Pain Manag 1990;3:35–43.
10. Vlaeyen JW, de Jong J, Geilen M, et al. Graded exposure in vivo in the treatment of pain-related fear: a replicated single-case experimental design in four patients with chronic low back pain. Behav Res Ther 2001;39:151–66.
11. Beck AT, Ward CH, Mendelson M, et al. An inventory for measuring depression. Arch Gen Psychiatry 1961;4:561–71.
12. Ware JE Jr., Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30:473–83.
13. Hunt SM, McKenna SP, McEwen J, et al. The Nottingham Health Profile: subjective health status and medical consultations. Soc Sci Med [A] 1981;15(3 pt 1):221–9.
14. Ware J Jr., Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care 1996;34:220–33.
15. Bergner M, Bobbitt RA, Kressel S, et al. The sickness impact profile: conceptual formulation and methodology for the development of a health status measure. Int J Health Serv 1976;6:393–415.
16. Bergner M, Bobbitt RA, Carter WB, et al. The Sickness Impact Profile: development and final revision of a health status measure. Med Care 1981;19:787–805.
17. EuroQoL. EuroQoL—a new facility for the measurement of health-related quality of life. The EuroQol Group. Health Policy 1990;16:199–208.
18. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271–92.

Keywords:

chronic low back pain; outcomes; patient-reported; reliability; responsiveness; spine surgery; validity

© 2011 Lippincott Williams & Wilkins, Inc.