Generic Quality-of-Life Outcome Measures
Six generic quality-of-life outcome measures were identified. The four most common were the SF-36 (Short Form 36), NHP, SF-12 (Short Form 12), and SIP. All four have been successfully validated and found to be reliable in populations with CLBP (Table 6). Only the SF-36 has been found to be responsive in a CLBP population; the others have not been tested for responsiveness. The SF-36 has been translated into 121 languages but not necessarily validated in them. The NHP has been validated in Norwegian. Like the SF-36, the SF-12 has been translated into many languages but not necessarily validated in them. The SIP has not been validated in any other language.
Objective Outcome Measures
Three “objective” outcome measures were identified: return to work, complications/adverse events, and medications used (Figure 4). These measures were not assessed because of the heterogeneity of the methods used.
Preference-Based Outcome Measures
Two preference-based outcome measures were identified: the EQ-5D and the SF-6D. A summary of each outcome measure is available in Table 7. Neither outcome measure has been tested for validity, reliability, or responsiveness in a CLBP population. Neither the EQ-5D nor the SF-6D has been validated in any language other than English.
The most common functional outcome measures cited in the literature for evaluating the effectiveness of treatment for CLBP are the ODI, RMDQ, and ROM. Both the ODI and RMDQ have been found to be valid, reliable, and responsive to treatment. The ODI is composed of 10 items and the RMDQ of 24 items. Among these, only the ODI is proprietary. The tools and devices used for ROM assessment are extremely heterogeneous, and their evaluation was therefore beyond the scope of this article. In addition, physical measures, which include such evaluations as walking tests, are often recommended. Shuttle Walking Test scores were found to be significantly correlated with walking items on the ODI, EQ-5D, and SF-36; however, the test was found to be a less responsive measure¹ and is therefore not necessary in studies evaluating CLBP outcomes. These tests are also very time intensive and are not recommended.
The most common pain outcome measures cited in the literature for evaluating the effectiveness of treatment for CLBP are the NPRS, BPI, PDI, MPQ, and VAS. Only the NPRS and VAS have been found to be responsive in the treatment of CLBP; the others have not been tested in this population. However, no studies have established the validity of the NPRS and VAS, although they are often considered the “gold standard” for pain. The VAS has been shown to be reliable; the NPRS has not been tested. The BPI has been tested for validity and the PDI for reliability. The MPQ has not been tested. Among these, only the BPI is proprietary. Pain may be the most responsive measure after spine surgery² and is therefore a critical measurement. Because the NPRS and VAS are nearly identical single-item measures, are the most widely used, and have been found to be responsive, they should receive the strongest consideration for measuring pain.
The most common psychosocial outcome measures cited in the literature for evaluating the effectiveness of treatment for CLBP are the FABQ, TSK, and BDI. All three measures have been validated in populations with CLBP and have been found to be reliable. Only the BDI has not been assessed for responsiveness. None of these are proprietary. One must consider whether depression is a domain that should be expected to change after CLBP treatment (i.e., treated as an outcome) or is more appropriately a risk factor assessed prior to treatment to determine the potential prognosis of the patient.³
The most common generic/quality-of-life outcome measures were the SF-36, NHP, SF-12, and SIP. Only the SF-36 has been tested and found to be valid, reliable, and responsive in a CLBP population; however, the SF-36 is significantly less responsive than the VAS for pain and ODI for function.³ Furthermore, the SF-36 has been found to be minimally responsive in a nonoperative trial, which may be attributed to instrument floor effects.¹ The SF-36 and SF-12 are proprietary.
Return to work, complications/adverse events, and medications used were the most common objective measures. An evaluation of their validity and reliability is either not possible or beyond the scope of this project. Complications should always be collected and monitored as part of standard clinical practice; however, return to work and medication use are prone, as outcomes, to several potential biases, not the least of which is the baseline status of these measures. Therefore, they should not be used unless the study objectives are focused specifically on these issues.
The most common preference-based outcome measures were the EQ-5D and the SF-6D. Both measures are proprietary. Neither outcome measure has yet been tested for validity, reliability, or responsiveness in a CLBP population. However, both instruments provide a measure of general health state utility in units of quality-adjusted life years (QALYs) and allow comparisons of effectiveness across all disease states in medicine. The index or “utility” scale is anchored on 0 (death) and 1 (full health), and is integrated with survival, so that it is not merely the number of years of life expectancy but also the quality of those years that is considered. The SF-6D was developed to bridge the gap between the SF-36 and the QALY approach and can be calculated from the SF-36.
Such preference-based health-state measures are becoming the gold standard for cost-effectiveness and value assessment. In the era of health-care reform and value-based purchasing, cost per QALY derived from the EQ-5D or SF-6D will likely be used increasingly to demonstrate the value of spine care, to compare spine outcomes with those in other disease states, and to serve as the primary utility measure in policy.
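To make the arithmetic behind QALYs and cost per QALY concrete, the following is a minimal sketch in Python. It assumes the conventional definition of a QALY as utility multiplied by time spent in a health state, and it expresses cost per QALY as an incremental ratio between two treatment strategies, which is a standard health-economics construct rather than a method described in this review. The function names, utilities, durations, and costs are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple


def qalys(health_states: Iterable[Tuple[float, float]]) -> float:
    """Sum utility-weighted time over (utility, years) pairs.

    Utility is the index score (0 = death, 1 = full health), such as one
    derived from the EQ-5D or SF-6D; years is time spent in that state.
    """
    total = 0.0
    for utility, years in health_states:
        if years < 0:
            raise ValueError("duration in years must be non-negative")
        total += utility * years
    return total


def incremental_cost_per_qaly(cost_new: float, qalys_new: float,
                              cost_ref: float, qalys_ref: float) -> float:
    """Extra cost of the new strategy divided by the extra QALYs it yields."""
    gained = qalys_new - qalys_ref
    if gained <= 0:
        raise ValueError("no QALY gain; the ratio is not meaningful here")
    return (cost_new - cost_ref) / gained


# Hypothetical 5-year trajectories (made-up utilities, not study data).
treatment = [(0.60, 1.0), (0.80, 4.0)]   # one year at 0.60, then four at 0.80
comparator = [(0.60, 5.0)]               # five years at 0.60

q_treatment = qalys(treatment)
q_comparator = qalys(comparator)

# Hypothetical total costs for each strategy, in arbitrary currency units.
ratio = incremental_cost_per_qaly(30_000, q_treatment, 10_000, q_comparator)

print(f"QALYs: treatment {q_treatment:.2f}, comparator {q_comparator:.2f}")
print(f"Incremental cost per QALY gained: {ratio:,.0f}")
```

With these made-up inputs, the treatment arm accrues about 3.8 QALYs against about 3.0 for the comparator, so the additional 20,000 in cost works out to roughly 25,000 per QALY gained. Real analyses typically also discount future years and handle uncertainty, which this sketch deliberately omits.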
This systematic review has limitations. We used the frequency of citations, initially among randomized trials only and then among all studies, to derive the final set of outcome measures to be reviewed. This was our best attempt at identifying important measures without deriving them ourselves. It was beyond the scope of this article to individually evaluate the validity, reliability, and responsiveness of these measures; we relied on the results reported by authors. Furthermore, many measures were not specifically tested for validity, reliability, and responsiveness. Thus, if a measure was not deemed valid, this may simply reflect its lack of testing for validity and should not necessarily be interpreted as “invalid.” For measures deemed “validated,” it is important to consider which measure each was validated against. It is possible that a measure is deemed valid against a measure that is not deemed important to patients with CLBP. Reliability and responsiveness, however, are universally important and do not depend on another measure; they are therefore arguably the most important of the three properties to consider when selecting a tool for measuring change in status before and after treatment for CLBP.
To our knowledge, this is the first systematic review that describes the frequency of citations of common spine outcome measures with the additional parameters of validity, reliability, responsiveness, languages, and proprietary status of each in the literature. As the study of comparative effectiveness of various treatment modalities is increasingly emphasized, it is of paramount importance that the measures of these outcomes have appropriate comparability and quality. In the current study, we have critically assessed the frequency of use and quality of a large range of outcome measures used for CLBP and have made recommendations regarding their use. When selecting a battery of outcome measures, one must always consider the clinician and patient burden. Multiple measures may be useful but at the expense of missing data or loss to follow-up if the burden is too high. In addition, the benefits of processing more data may not outweigh the financial cost. These trade-offs must be considered. The recommendations from this review may serve as a guide toward the selection of outcome measures for the treatment of CLBP.
The SF-36 and its shorter versions are most commonly used and should be considered if quality of life is important. If cost utility is important, consider the EQ-5D or SF-6D. Psychosocial tests are best used as screening tools prior to surgery because of their lack of responsiveness. Complications should always be assessed as a standard of clinical practice. Return to work and medication use are complicated outcome measures and not recommended unless the specific study question is focused on these domains. Consider staff and patient burden when prioritizing one's battery of measures.
- Outcome measures should be routinely assessed in CLBP patients.
- The choice of appropriate outcome measure should be influenced by the study objectives and design, as well as the psychometric properties of the particular measure within the context of CLBP.
- Overall logistical and patient burden should also influence decisions when multiple measures are being used.
Supplemental digital content is available for this article. Direct URL citation appears in the printed text and is provided in the HTML and PDF versions of this article on the journal's Web site (www.spinejournal.com).
1. Fairbank JC, Couper J, Davies JB, et al. The Oswestry low back pain disability questionnaire. Physiotherapy 1980;66(8):271–3.
2. Roland M, Morris R. A study of the natural history of back pain. Part I: development of a reliable and sensitive measure of disability in low-back pain. Spine 1983;8:141–4.
3. Jensen MP, Turner JA, Romano JM. What is the maximum number of levels needed in pain intensity measurement? Pain 1994;58:387–92.
4. Cleeland CS. Measurement of pain by subjective report. In: Chapman CR, Loeser JD, eds. Advances in Pain Research and Therapy. New York, NY: Raven Press; 1989:391–403.
5. Cleeland CS, Ryan KM. Pain assessment: global use of the Brief Pain Inventory. Ann Acad Med Singapore 1994;23:129–38.
6. Pollard CA. Preliminary validity study of the pain disability index. Percept Mot Skills 1984;59:974.
7. Melzack R. The McGill Pain Questionnaire: major properties and scoring methods. Pain 1975;1:277–99.
8. Waddell G, Newton M, Henderson I, et al. A Fear-Avoidance Beliefs Questionnaire (FABQ) and the role of fear-avoidance beliefs in chronic low back pain and disability. Pain 1993;52:157–68.
9. Kori SH, Miller RP, Todd DD. Kinesiophobia: a new view of chronic pain behavior. Pain Manag 1990;3:35–43.
10. Vlaeyen JW, de Jong J, Geilen M, et al. Graded exposure in vivo in the treatment of pain-related fear: a replicated single-case experimental design in four patients with chronic low back pain. Behav Res Ther 2001;39:151–66.
11. Beck AT, Ward CH, Mendelson M, et al. An inventory for measuring depression. Arch Gen Psychiatry 1961;4:561–71.
12. Ware JE Jr., Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30:473–83.
13. Hunt SM, McKenna SP, McEwen J, et al. The Nottingham Health Profile: subjective health status and medical consultations. Soc Sci Med [A] 1981;15(3 pt 1):221–9.
14. Ware J Jr., Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability. Med Care 1996;34:220–33.
15. Bergner M, Bobbitt RA, Kressel S, et al. The sickness impact profile: conceptual formulation and methodology for the development of a health status measure. Int J Health Serv 1976;6:393–415.
16. Bergner M, Bobbitt RA, Carter WB, et al. The Sickness Impact Profile: development and final revision of a health status measure. Med Care 1981;19:787–805.
17. EuroQoL. EuroQoL—a new facility for the measurement of health-related quality of life. The EuroQol Group. Health Policy 1990;16:199–208.
18. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271–92.
Keywords: chronic low back pain; outcomes; patient-reported; reliability; responsiveness; spine surgery; validity
© 2011 Lippincott Williams & Wilkins, Inc.