Determining Health-Related Quality-of-Life Outcomes Using the SF-6D Following Total Hip Arthroplasty

Elmallah, Randa K. MD; Chughtai, Morad MD; Adib, Farshad MD; Bozic, Kevin J. MD, MBA; Kurtz, Steven M. PhD; Mont, Michael A. MD

Journal of Bone & Joint Surgery - American Volume: 15 March 2017 - Volume 99 - Issue 6 - p 494–498
doi: 10.2106/JBJS.15.01351
Scientific Articles

Background: Following total hip arthroplasty, patients’ perception of their postoperative improvement and health plays a large role in satisfaction with and success of the surgical procedure. The Short Form-6D (SF-6D) is a health-related quality-of-life measure that assigns numerical value to the perception of patients’ own health. The purpose was to determine SF-6D values of patients after total hip arthroplasty, to determine whether score changes were clinically relevant, and to compare these with postoperative functional improvements.

Methods: We evaluated 188 patients who underwent primary total hip arthroplasty at 7 institutions and who had a mean age of 69 years (range, 47 to 88 years) and a mean body mass index of 28.8 kg/m² (range, 19.8 to 38.9 kg/m²). The SF-6D values were obtained from patients’ SF-36 scores, and clinical relevance of value changes was determined using effect size. Based on previous research, effect sizes were considered small between 0.2 and 0.5, moderate between 0.6 and 0.8, and large at >0.8. Clinical correlation was assessed using the Lower-Extremity Activity Scale and Harris hip scores. Patients were assessed preoperatively and postoperatively at 6 months and 1, 2, 3, and 5 years.

Results: The SF-6D scores improved from preoperatively and achieved significance (p < 0.05) at all points. The effect size demonstrated good clinical relevance up to the latest follow-up: 1.27 at 6 months, 1.30 at 1 year, 1.07 at 2 years, 1.08 at 3 years, and 1.05 at 5 years. The Lower-Extremity Activity Scale improved at all follow-up points from preoperatively to 1.8 at 6 months, 2.0 at 1 year, 1.8 at 2 years, 1.5 at 3 years, and 1.6 points at 5 years. The Harris hip score improved to 38 points at 6 months, 40 points at 1 year, 38 points at 2 years, 39 points at 3 years, and 41 points at 5 years postoperatively. The improvements in the Lower-Extremity Activity Scale and the Harris hip score significantly positively correlated (p < 0.01) with the SF-6D scores at all time points.

Conclusions: SF-6D scores after total hip arthroplasty correlate with functional outcomes and have clinical relevance, as demonstrated by their effect size. Incorporating this straightforward and easy-to-use measurement tool when evaluating patients following total hip arthroplasty will facilitate future cost-utility analyses.

Level of Evidence: Therapeutic Level IV. See Instructions for Authors for a complete description of levels of evidence.

1Department of Orthopaedic Surgery, University of Mississippi, Jackson, Mississippi

2Department of Orthopaedic Surgery, Cleveland Clinic, Cleveland, Ohio

3Department of Orthopaedics, University of Maryland Medical Center Midtown Campus, Baltimore, Maryland

4Department of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, Texas

5Drexel University, Philadelphia, Pennsylvania

E-mail address for M.A. Mont:

The continued success of total hip arthroplasty and the projected rise in demand are dependent on improved postoperative functional scores, reduced lengths of stay, and reductions in pain1. However, patients’ perception of postoperative improvement in health plays a large role in their satisfaction and surgical success. Because this perception is subjective, it is difficult to quantify. Consequently, measures have been developed to assign numerical value to this abstract concept, with the aim of quantifying satisfaction and facilitating additional research on this topic.

The Short Form-6D (SF-6D) utility score is one such tool and belongs to a distinct subclass of quality-of-life measures. It is scored on a scale of 0 to 1, which allows comparison of the health benefits of different interventions across disease states. The SF-6D is derived from the SF-36 scoring system and incorporates 6 components: vitality, pain, mental health, social functioning, physical functioning, and role limitations2. These scores can be used in cost-effectiveness analyses and in determining quality-adjusted life years (QALYs), which help to establish whether the costs of undergoing total hip arthroplasty are justified3. These analyses are becoming more pertinent with increased pressure on the economic sustainability of the United States health-care system. However, understanding incremental changes in SF-6D values may be difficult for clinicians, particularly as these metrics are not regularly used in clinical practice. Therefore, to determine whether changes in SF-6D values are clinically relevant, the effect size is calculated for each change. The larger the effect size, the more likely it is that the change is clinically relevant4,5.
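
As a rough illustration of how a 0-to-1 utility score feeds into QALYs, the sketch below computes the area between a utility curve and a flat preoperative baseline using the trapezoidal rule. The function name `qaly_gain`, the assumption that utility changes linearly between assessments, and the example values are all hypothetical; they are not taken from this study and do not represent its method or results.

```python
from typing import Sequence


def qaly_gain(times_years: Sequence[float],
              utilities: Sequence[float],
              baseline: float) -> float:
    """Area between the utility curve and a flat baseline, in QALYs.

    Uses the trapezoidal rule, i.e. assumes utility changes linearly
    between assessments (an illustrative assumption).
    """
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need matching sequences with at least two points")
    total = 0.0
    for i in range(1, len(times_years)):
        width = times_years[i] - times_years[i - 1]
        if width <= 0:
            raise ValueError("times must be strictly increasing")
        gain_prev = utilities[i - 1] - baseline
        gain_curr = utilities[i] - baseline
        total += 0.5 * (gain_prev + gain_curr) * width
    return total


# Hypothetical patient: utility 0.60 before surgery, 0.80 from 6 months onward,
# followed for 2 years.
example = qaly_gain([0.0, 0.5, 1.0, 2.0], [0.60, 0.80, 0.80, 0.80], baseline=0.60)
print(round(example, 3))
```

A cost-utility analysis would then divide the incremental cost of the intervention by a QALY gain of this kind; that step needs cost data, which are outside the scope of this report.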

Although the SF-6D appears to be an easy-to-use quality-of-life measure, to our knowledge, there have been few studies that evaluate its feasibility and use in a clinical setting, particularly in patients who underwent total hip arthroplasty. Therefore, the purpose of this study was to determine the SF-6D utility values of patients who underwent total hip arthroplasty, to determine whether any changes in values were clinically important using effect size, and to correlate these findings with functional improvements in patients who underwent total hip arthroplasty.

Materials and Methods

We prospectively evaluated 188 patients who underwent primary total hip arthroplasty (194 total hip arthroplasties) between January 2006 and December 2009, at 7 different institutions, and who were recruited for a longitudinal post-market trial. As this was an industry-sponsored study, the data were collated and stored at Stryker. Patients were included if they underwent a primary total hip arthroplasty for a diagnosis of osteoarthritis and were excluded if they had an active infection within the affected hip joint prior to the total hip arthroplasty, had a body mass index (BMI) of ≥40 kg/m², had a neuromuscular or neurosensory deficit, were immunologically suppressed or receiving chronic corticosteroids for >30 days, or had a diagnosis of Paget disease, renal osteodystrophy, lupus erythematosus, rheumatoid arthritis, metabolic bone disease, or sickle cell anemia. The mean age of the patients was 69 years (range, 47 to 88 years) and the mean BMI was 28.8 kg/m² (range, 19.8 to 38.9 kg/m²). The cohort consisted of 61 men (32%) and 127 women (68%). This study was approved by the institutional review board at each center.

All patients underwent a primary total hip arthroplasty using a tapered, proximally coated, titanium, cementless stem design (Accolade TMZF, Stryker). The surgical procedures were performed by a fellowship-trained joint reconstruction surgeon. Patients were evaluated preoperatively and at 6 months, 1 year, 2 years, 3 years, and 5 years after the total hip arthroplasty. Table I shows the number of hips available for follow-up at each time point.

SF-6D and Utility Scores

The SF-36 questionnaire was completed by patients at each of the above time points. The SF-6D questionnaire is derived from the SF-36, and contains 6 categories: vitality, pain, mental health, social functioning, physical functioning, and role limitations. Each category is subdivided into 4 to 6 levels of severity5,6. Based on various combinations of severity from each of the 6 categories, this SF-6D questionnaire can produce up to 18,000 combinations of answers, otherwise known as health states. Each one of these health states is then correlated with a single index, or utility, score, between 0 and 1: 0 is a health state equivalent to death, and 1 denotes full health. Negative scores were possible for health states considered to be worse than death6.
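
The figure of 18,000 health states follows from multiplying the number of severity levels across the 6 categories. The short sketch below reproduces that arithmetic. The specific level count assigned to each category is an assumption based on the SF-6D classification described by Brazier et al. (reference 6); the text above states only that each category has 4 to 6 levels.

```python
from math import prod

# Levels per SF-6D category. These specific counts are an assumption taken
# from the SF-6D classification (reference 6); the text gives only "4 to 6".
SF6D_LEVELS = {
    "physical functioning": 6,
    "role limitations": 4,
    "social functioning": 5,
    "pain": 6,
    "mental health": 5,
    "vitality": 5,
}

# One health state is one choice of level in every category, so the number
# of possible states is the product of the level counts.
n_health_states = prod(SF6D_LEVELS.values())
print(n_health_states)  # 18000
```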

Effect Size

Although changes in the utility values derived from SF-6D scores may be statistically significant with large samples, it is important to determine whether these differences are clinically relevant, particularly as the units are likely to be unfamiliar to many clinicians. Clinical relevance was estimated using effect size, which measures the sensitivity of the utility scores to change and indicates whether these changes have a clinical impact (i.e., the greater the effect size, the greater the impact and the more relevant the change in scores)5. The effect size and the minimal clinically important difference are two of the main ways to interpret health-related quality-of-life changes. The effect size was used as it is a distribution-based measure, compared with the minimal clinically important difference, which is anchor-based and requires an independent measure to elucidate the meaning of a change. However, both the effect size and minimal clinically important difference have been shown to yield similar results2,7.

Using the Cohen d method, the effect size was calculated at each time point by dividing the mean change in utility scores by the standard deviations (weighted and pooled)4. This formula, in which the magnitude of change is divided by the standard deviation of the observed change, is described by Walters and Brazier2. Using this formula, effect size is classified as small (between 0.2 and 0.5), moderate (between 0.6 and 0.8), or large (above 0.8)2,5,8.
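
The calculation and classification described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, the pooled standard deviation is computed in the usual two-sample way (weighted by degrees of freedom), and the example data are invented. The thresholds in the text leave values between 0.5 and 0.6 unassigned; here they are treated as moderate, and values below 0.2 are labeled negligible, both as assumptions.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(pre: Sequence[float], post: Sequence[float]) -> float:
    """Mean change divided by the pooled (weighted) standard deviation."""
    n_pre, n_post = len(pre), len(post)
    if n_pre < 2 or n_post < 2:
        raise ValueError("each sample needs at least two observations")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_pre - 1) * variance(pre) + (n_post - 1) * variance(post)) / (
        n_pre + n_post - 2
    )
    return (mean(post) - mean(pre)) / sqrt(pooled_var)


def classify_effect_size(d: float) -> str:
    """Map an effect size to the categories used in the text."""
    magnitude = abs(d)
    if magnitude > 0.8:
        return "large"
    if magnitude > 0.5:   # assumption: the 0.5 to 0.6 gap is treated as moderate
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "negligible"   # assumption: the text does not name this range


# Invented utility scores for three patients before and after surgery.
pre_scores = [0.5, 0.6, 0.7]
post_scores = [0.7, 0.8, 0.9]
d = cohens_d(pre_scores, post_scores)
print(round(d, 2), classify_effect_size(d))
```

Applied to the effect sizes reported in this study (1.05 to 1.30), `classify_effect_size` returns "large" at every follow-up point.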

Functional Outcomes

To determine whether SF-6D changes correlated with changes in function after total hip arthroplasty, we evaluated functional outcomes using the Lower-Extremity Activity Scale and Harris hip scores9,10. Both the Lower-Extremity Activity Scale and Harris hip scores were obtained preoperatively and postoperatively at 6 months, 1 year, 2 years, 3 years, and 5 years.

Statistical Analysis

An intention-to-treat analysis was performed. The SF-6D-derived utility values were determined for each patient, and mean values were calculated for each follow-up. The effect size was calculated using the Cohen d method. Multiple imputation was used to account for missing follow-up observations. The missing values were assumed to be missing at random; therefore, the imputed values would have a similar distribution pattern to the observed data11. A p value of <0.05 was considered significant for all outcomes. Data were stored within a Structured Query Language (SQL) server (Microsoft) at Stryker, and statistical analyses were performed using SigmaStat version 3.0 (Systat Software).

Results

SF-6D-Derived Utility Values

When compared with the preoperative SF-6D score (0.614), there were significant improvements in the mean SF-6D-derived utility values at all time points postoperatively (p < 0.0001). The mean score was 0.788 at 6 months, but peaked at 1 year (0.799). It was 0.777 at 2 years, 0.786 at 3 years, and 0.774 at 5 years. However, differences between postoperative scores were not significant (p > 0.12), implying that the perceived gains in health were maintained throughout this time period (Table I and Fig. 1).
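
For readers who want the size of the gain at each visit, the differences between the reported postoperative means and the preoperative mean can be worked out directly. The sketch below does only that subtraction. The input values are the means quoted above; the variable names are illustrative, and because the number of hips available differs between time points, these differences of group means are an approximation and not a paired per-patient change.

```python
# Mean SF-6D-derived utility values quoted in the text.
PREOP_MEAN = 0.614
POSTOP_MEANS = {
    "6 months": 0.788,
    "1 year": 0.799,
    "2 years": 0.777,
    "3 years": 0.786,
    "5 years": 0.774,
}

# Difference from the preoperative mean, rounded to the reported precision.
changes = {
    visit: round(value - PREOP_MEAN, 3) for visit, value in POSTOP_MEANS.items()
}

for visit, change in changes.items():
    print(f"{visit}: +{change:.3f}")

# Visit with the largest mean gain over baseline.
peak_visit = max(changes, key=changes.get)
print("largest gain at", peak_visit)
```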

Effect Size

The effect size was large at all postoperative time points (1.27 at 6 months, 1.30 at 1 year, 1.07 at 2 years, 1.08 at 3 years, and 1.05 at 5 years), which demonstrated good clinical relevance. This supports our findings that the changes in the SF-6D-derived utility values at these time points were significant (p < 0.05) and clinically relevant.

Harris Hip Score and Lower-Extremity Activity Scores

The Harris hip score improved postoperatively to 38 points at 6 months, 40 points at 1 year, 38 points at 2 years, 39 points at 3 years, and 41 points at 5 years (p < 0.001). These scores significantly correlated (p < 0.01) with the SF-6D (Table II). When evaluating the Lower-Extremity Activity Scale, the scores also significantly improved from preoperatively to 1.8 points at 6 months, 2.0 points at 1 year, 1.8 points at 2 years, 1.5 points at 3 years, and 1.6 points at 5 years (p < 0.001). The mean scores also significantly correlated (p < 0.01) with the SF-6D (Table II).

Discussion

The success of total hip arthroplasty is measured by improvements in pain and function postoperatively. However, this success is also largely dependent on patients’ overall satisfaction with a procedure and their perceived improvement in their quality of life. Consequently, health-related quality-of-life measures, such as the SF-6D, have been developed to quantify these improvements. Our results demonstrated that in this patient population, the SF-6D score significantly improved postoperatively, and these improvements were clinically relevant, as demonstrated by the large effect size. In addition, the mean Harris hip score and Lower-Extremity Activity Scale significantly improved at all time points, and these improvements significantly positively correlated with the SF-6D scores.

It is important to note that the purpose of the SF-6D is not primarily to provide a shorter or faster outcome measurement; rather, it is a useful tool for cost analyses that provides a means to measure the improvement in quality of life from a patient’s perspective. The SF-36 (and SF-12) alone do not allow for this. The SF-6D condenses available SF-36 data and converts these functional outcome data into utility values, scaled from 0 to 1, which are abstract and less intuitive but are needed for economic analyses5. Currently, both versions of the SF-36 and SF-12 are the only measures available that provide a description of health and the capability to conduct an economic evaluation through the SF-6D. We attempted to relate these utility values to clinical findings in two ways to make them more tangible to clinicians. The first was to determine the effect size of the change in utility value, which indicates how clinically relevant the findings are. The effect sizes are separated into categories of magnitude, and clinical relevance is always demonstrated when the effect size is large. The second was to correlate the values with functional outcome measures, such as the Harris hip score and Lower-Extremity Activity Scale. This correlation was performed to give clinicians tangible evidence that utility values, or patients’ perceptions of their health, do correlate with functional outcomes.

To our knowledge, there have been few studies showing the use of the SF-6D in total hip arthroplasty. Osnes-Ringen et al. used the SF-6D to evaluate the cost-effectiveness of total hip arthroplasty, among other procedures, in patients with inflammatory arthropathy12. The SF-6D score improved significantly (by 0.06; p < 0.001) at 1 year after arthroplasty. In addition, the authors were able to use the SF-6D values to determine the cost-effectiveness of multiple procedures and concluded that undergoing total hip arthroplasty was cost-effective. Feeny et al. evaluated SF-6D changes in 63 patients after total hip arthroplasty and observed an improvement of 0.1 at 6 months postoperatively, which was clinically relevant as determined by the effect size (1.06)13.

In addition, Quintana et al. observed significant improvements in SF-6D utility scores following total hip arthroplasty of 0.16 at 6 months (effect size, 1.23) and 0.18 at 2 years (effect size, 1.38) (p < 0.001)14. Furthermore, these measures can aid in evaluating the effect of comorbidities on health-related quality of life. Sach et al. compared the effect of BMI on quality of life and demonstrated that obese patients had significantly lower SF-6D scores (p < 0.001)15.

Determining the validity of these measures is controversial, as there is no gold standard for evaluating the quality of life following an intervention3. Therefore, studies have justified the use of new scales by determining whether they are comparable with existing health-related quality-of-life measures. In patients who had a variety of illnesses, Brazier et al. demonstrated that the SF-6D is comparable with the EuroQol (EQ)-5D, which is a more commonly used measure of health-related quality of life that incorporates 5 dimensions (pain, anxiety and depression, self-care, usual activities, and mobility), each of which has 3 levels, allowing a total of 243 health states3,16,17. They concluded that there was significant agreement between the two, as the indices generated were within 0.05 of each other. In addition, the study by Sach et al. demonstrated that the SF-6D, EQ-5D, and EuroQol visual analog scale (VAS) were in agreement and provided comparable results15. Conversely, in the study by Feeny et al., the SF-6D was found to have low agreement with the Health Utilities Index, another quality-of-life measure13. However, one of the main advantages of using the SF-6D over other indices is that it is the only quality-of-life measure that allows the utilization of existing SF-36 data. This provides clinicians the opportunity to generate cost-utility analyses and to determine cost-effectiveness retrospectively. In addition, by using these outcomes, the SF-6D provides a bridge between clinician-assessed functional outcomes and the patient’s perceived improvement in health. Also, the SF-6D utilizes a much larger descriptive system than measures such as the EQ-5D, potentially allowing for a greater degree of sensitivity (18,000 compared with 243 health states).

Several concerns have been highlighted with the use of health-related quality-of-life measures. The SF-6D scores, or the use of utility values in general, are unfamiliar to many clinicians and most patients, and comprehending the clinical effect of a change in this score is likely to be difficult. In addition, improvements in quality of life are an abstract concept, and assigning numerical values to this may be challenging to grasp2,5. Furthermore, the preferences for the health states were determined by a sample population, and it can be argued that not all patients would assign the same preferences to the individual states, because of differences in backgrounds, life experiences, and postoperative expectations. In addition, there are several different methods to measure utility. Here, the standard gamble technique was used to derive health states; the time tradeoff and rating scale, or VAS, are also alternatively used18. However, despite these potential disadvantages, the SF-6D is one of the few straightforward, easily obtainable methods that provide clinicians quantifiable insight into a patient’s quality of life.

This study had several limitations. These patients had a short-term mean follow-up of 5 years, and we were thus unable to evaluate whether these improvements in quality of life were maintained after this period. Also, although the SF-36 is widely used, disadvantages include the lack of age-specific questions, and thus it is not clear whether it is equally appropriate across age levels. In addition, incorporating the use of utility tools, such as the SF-6D, into clinical practice is not without cost, and clinicians should be aware of this when using these for quality-of-life analyses. Furthermore, it may have been valuable to incorporate cost-effectiveness analyses of total hip arthroplasty using these SF-6D scores. However, our purpose was to report on the ease of use and clinical relevance of the SF-6D system in total hip arthroplasty, particularly as, to our knowledge, there have been few studies that have evaluated this. Ultimately, future studies should focus on using these values for cost-effectiveness analyses.

In conclusion, the SF-6D provides clinicians with a method of quantifying patient satisfaction and perception of their own health. This quality-of-life measure is particularly convenient in total joint arthroplasty, as it can be deduced from the SF-36, which is one of the most widely used evaluations of mental and physical health following a surgical procedure. Therefore, widely incorporating the SF-6D into future postoperative assessments is straightforward, and having these values readily available may make prospective cost-effectiveness analyses considerably easier.

Investigation performed at the Rubin Institute for Advanced Orthopedics, Center for Joint Preservation and Replacement, Sinai Hospital of Baltimore, Baltimore, Maryland

Disclosure: Stryker provided financial support for data analysis and manuscript production and the Accolade TMZF, manufactured by Stryker, was used as the stem design for all primary total hip arthroplasties performed in this study. This is an industry-sponsored study, and the data are stored at Stryker. On the Disclosure of Potential Conflicts of Interest forms, which are provided with the online version of the article, one or more of the authors checked “yes” to indicate that the author had a relevant financial relationship in the biomedical arena outside the submitted work (

References

1. Kurtz S, Ong K, Lau E, Mowat F, Halpern M. Projections of primary and revision hip and knee arthroplasty in the United States from 2005 to 2030. J Bone Joint Surg Am. 2007;89(4):780–5.
2. Walters SJ, Brazier JE. Comparison of the minimally important difference for two health state utility measures: EQ-5D and SF-6D. Qual Life Res. 2005;14(6):1523–32.
3. Brazier J, Roberts J, Tsuchiya A, Busschbach J. A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ. 2004;13(9):873–84.
4. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: Routledge; 1988.
5. Elmallah RK, Cherian JJ, Kristiansen IS, Thingstad M, Henriksen JE, Kvien TK, Dagfinrud H. Determining health-related quality-of-life outcomes using the SF-6D preference-based measure in patients following total knee arthroplasty. J Arthroplasty. 2015;30(7):1150–3. Epub 2015 Feb 7.
6. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ. 2002;21(2):271–92.
7. Preedy VR, Watson RR, editors. Handbook of disease burdens and quality of life measures. New York: Springer; 2010.
8. Larner AJ. Effect size (Cohen’s d) of cognitive screening instruments examined in pragmatic diagnostic accuracy studies. Dement Geriatr Cogn Dis Extra. 2014;4(2):236–41. Epub 2014 Jul 3.
9. Saleh KJ, Mulhall KJ, Bershadsky B, Ghomrawi HM, White LE, Buyea CM, Krackow KA. Development and validation of a lower-extremity activity scale. Use for patients treated with revision total knee arthroplasty. J Bone Joint Surg Am. 2005;87(9):1985–94.
10. Söderman P, Malchau H. Is the Harris hip score system useful to study the outcome of total hip replacement? Clin Orthop Relat Res. 2001;384:189–97.
11. Collins LM, Schafer JL, Kam CM. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol Methods. 2001;6(4):330–51.
12. Osnes-Ringen H, Kvamme MK, Kristiansen IS, Thingstad M, Henriksen JE, Kvien TK, Dagfinrud H. Cost-effectiveness analyses of elective orthopaedic surgical procedures in patients with inflammatory arthropathies. Scand J Rheumatol. 2011;40(2):108–15. Epub 2011 Jan 17.
13. Feeny D, Wu L, Eng K. Comparing short form 6D, standard gamble, and Health Utilities Index Mark 2 and Mark 3 utility scores: results from total hip arthroplasty patients. Qual Life Res. 2004;13(10):1659–70.
14. Quintana JM, Escobar A, Bilbao A, Arostegui I, Lafuente I, Vidaurreta I. Responsiveness and clinically important differences for the WOMAC and SF-36 after hip joint replacement. Osteoarthritis Cartilage. 2005;13(12):1076–83. Epub 2005 Sep 9.
15. Sach TH, Barton GR, Doherty M, Muir KR, Jenkinson C, Avery AJ. The relationship between body mass index and health-related quality of life: comparing the EQ-5D, EuroQol VAS and SF-6D. Int J Obes (Lond). 2007;31(1):189–96. Epub 2006 May 9.
16. EuroQol Group. EuroQol - a new facility for the measurement of health-related quality of life. Health Policy. 1990;16(3):199–208.
17. Brooks R. EuroQol: the current state of play. Health Policy. 1996;37(1):53–72.
18. Tijhuis GJ, Jansen SJ, Stiggelbout AM, Zwinderman AH, Hazes JM, Vliet Vlieland TP. Value of the time trade off method for measuring utilities in patients with rheumatoid arthritis. Ann Rheum Dis. 2000;59(11):892–7.

Copyright 2017 by The Journal of Bone and Joint Surgery, Incorporated