The American Academy of Orthopaedic Surgeons and the Société Internationale de Chirurgie Orthopédique et de Traumatologie recommend that an assessment of clinical complications, a physical examination of the hip, radiographic studies, and an assessment of well-being (pain, gait, some activities of daily living, and overall satisfaction) as reported by the patient be included in any outcome study. A disease-specific measure should be included in all outcome studies of hip arthroplasty. 16 Harris 11 introduced a rating scale with a maximum of 100 points, including the domains of pain, function, deformity, and motion. The Harris hip score was compared with the Larson and Shepard system, and it was found “reproducible and reasonably objective.” 11 Although the Harris hip score is one of the most widely used scoring systems, only a few minor validity tests and no reliability tests have been presented. 6,16,25 These earlier studies of the validity of the Harris hip score did not use modern definitions of validity, and the only validity tests that were done concerned the construction of the Harris hip score compared with other scores. 6,16,25 The aim of this study was to assess whether the Harris hip score is a valid, reproducible instrument that provides information about the clinical outcome of total hip replacement.
MATERIALS AND METHODS
The Western Ontario and McMaster University Osteoarthritis Index is a disease-specific, self-administered health measure developed to study patients with osteoarthritis in the hip or knee who have had arthroplasty or have been treated by nonsurgical intervention. 3,25 The index consists of the domains of pain, stiffness, and physical function. The Western Ontario and McMaster University Osteoarthritis Index is reliable, valid, and sensitive for detecting clinically important changes in health status after surgical intervention (responsiveness). 2,3,17,19,23 The three domains in the Western Ontario and McMaster University Osteoarthritis Index can be analyzed separately or aggregated into one score. Each question has five response alternatives with corresponding scores of 0 to 4 points. The maximum score in the Likert version is 20 points for pain, eight points for stiffness, and 68 points for physical function. In the current study, the results for the relative importance of the different domains were converted to a 0- to 100-point score for each specific domain. To facilitate comparison with the Medical Outcomes Study 36-Item Short-Form Health Survey and the Harris hip score, the score was inverted. After inversion, the maximum score of 100 points corresponds to a patient with a minimum of pain and stiffness and optimal function.
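The rescaling and inversion described above can be illustrated with a short Python sketch. It assumes a simple linear transformation of the raw Likert domain score, which the text does not spell out; the names WOMAC_MAX and womac_to_inverted_100 are hypothetical and used only for illustration.

```python
# Maximum raw Likert scores per WOMAC domain, as stated in the text.
WOMAC_MAX = {"pain": 20, "stiffness": 8, "function": 68}


def womac_to_inverted_100(domain, raw_score):
    """Rescale a raw WOMAC domain score to 0-100 and invert it.

    Assumption: a linear rescaling. After inversion, 100 points means a
    minimum of pain or stiffness, or optimal function.
    """
    max_raw = WOMAC_MAX[domain]
    if not 0 <= raw_score <= max_raw:
        raise ValueError(f"{domain} raw score must lie between 0 and {max_raw}")
    return 100.0 * (max_raw - raw_score) / max_raw


# Hypothetical patient: raw pain score 5 of 20, raw function score 17 of 68.
print(womac_to_inverted_100("pain", 5))       # 75.0
print(womac_to_inverted_100("function", 17))  # 75.0
```

With this convention, higher values indicate a better result on all three instruments, which is what makes the direct comparisons in the study possible.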
The Medical Outcomes Study 36-Item Short-Form Health Survey is a generic, self-administered questionnaire. It has been used for measuring effects that could be a direct function of disease and treatment, that is, health-related quality of life. 5,27 The Medical Outcomes Study 36-Item Short-Form Health Survey consists of 36 questions divided into eight domains: physical function, social function, role-physical, role-emotional, bodily pain, general health, mental health, and vitality. The Medical Outcomes Study 36-Item Short-Form Health Survey was translated into Swedish and tested for validity and reliability by Sullivan et al. 25 The raw score was transformed to a 0 to 100 scale, as recommended by the Swedish manual, with a high value indicating a better result. The Medical Outcomes Study 36-Item Short-Form Health Survey is the most frequently used health status measure in North America and has been used after total joint replacement in the hip and knee. 17,20,27
The Harris hip score is a disease-specific test used to provide an evaluation system for various hip disabilities and methods of treatment. 11 This rating system is administered by staff rather than by the patient. The Harris hip score gives a maximum of 100 points, and the domains include pain, function, deformity, and motion. Pain and function were the two basic considerations and received the heaviest weighting (44 and 47 points, respectively). Range of motion (ROM) and deformity are seldom of primary importance and thus received five and four points, respectively. Function was subdivided into activities of daily living (14 points) and gait (33 points). A total Harris hip score below 70 points was considered a poor result, 70 to 80 fair, 80 to 90 good, and 90 to 100 excellent.
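The weighting and grading scheme above can be written down as a small Python sketch. The names HARRIS_MAX_POINTS and harris_grade are hypothetical. Because the stated ranges share their end points (80 and 90 each appear in two categories), the sketch assumes that a boundary value falls into the higher category; the text does not settle this.

```python
# Maximum points per Harris hip score domain, as stated in the text.
HARRIS_MAX_POINTS = {"pain": 44, "function": 47, "motion": 5, "deformity": 4}

# The four domain maxima add up to the 100-point total.
assert sum(HARRIS_MAX_POINTS.values()) == 100


def harris_grade(total_score):
    """Map a total Harris hip score to the result category used in the text.

    Assumption: exact boundary values (70, 80, 90) go to the higher category.
    """
    if not 0 <= total_score <= 100:
        raise ValueError("total Harris hip score must lie between 0 and 100")
    if total_score < 70:
        return "poor"
    if total_score < 80:
        return "fair"
    if total_score < 90:
        return "good"
    return "excellent"


print(harris_grade(65))  # poor
print(harris_grade(93))  # excellent
```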
The Statistical Package for the Social Sciences version 8.0 for Windows (SPSS Inc, Chicago, IL) was used for the statistical analyses.
Questionnaires with good content validity are expected to have few answers at the lowest and highest possible scores (floor and ceiling effects). 17 Content validity was tested by directly comparing the Harris hip score, the Western Ontario and McMaster University Osteoarthritis Index, and the Medical Outcomes Study 36-Item Short-Form Health Survey and by studying floor and ceiling effects and the mean, median, and standard deviation values in the individual domains.
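Floor and ceiling effects can be expressed as the proportion of patients at the lowest and highest possible score. The following Python sketch shows one way to compute them; the function name floor_ceiling and the example scores are hypothetical, and a 0 to 100 scale is assumed.

```python
def floor_ceiling(scores, lowest=0.0, highest=100.0):
    """Return (floor, ceiling): the fraction of scores at each extreme."""
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    floor = sum(1 for s in scores if s == lowest) / n
    ceiling = sum(1 for s in scores if s == highest) / n
    return floor, ceiling


# Hypothetical domain scores for five patients on a 0-100 scale.
example_scores = [100, 100, 85, 0, 60]
print(floor_ceiling(example_scores))  # (0.2, 0.4)
```

A domain in which every patient receives the maximum score would give a ceiling proportion of 1.0 and carries no information for distinguishing patients.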
Construct validity can be evaluated by correlating the questionnaire scores with the characteristics of the patients (comorbidity, perceived overall health status, and changes in activities). Patients who have high comorbidity, poorer perceived overall health status, or changes in activities are expected to receive a poorer score; this generally is found with generic questionnaires. Pearson’s correlation between the total score of the instrument being tested (the Harris hip score) and the domain of interest in the other questionnaires (such as function in the Western Ontario and McMaster University Osteoarthritis Index and the Medical Outcomes Study 36-Item Short-Form Health Survey) was considered significant at the 1% level (p < 0.01). 17
Total scores for the Harris hip score, Western Ontario and McMaster University Osteoarthritis Index, and the Medical Outcomes Study 36-Item Short-Form Health Survey were calculated for all patients. They also were calculated for two subgroups defined with the Medical Outcomes Study 36-Item Short-Form Health Survey as the gold standard: one group scoring less than 70 and one group scoring more than 70. Total scores also were calculated separately for men and women, for patients evaluated 2 and 10 years postoperatively, and for patients older and younger than 70 years. The patients were grouped according to the Charnley classification: A, one hip affected; B, both hips affected; and C, multiple joint disease or other disabilities leading to difficulties in walking. 8 Differences among these groups were compared using the Mann-Whitney U test.
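For readers unfamiliar with the Mann-Whitney U test, the following Python sketch computes the U statistic for two groups from rank sums, using mean ranks for ties. It is a minimal illustration only: it does not compute a p value, it is not the SPSS procedure used in the study, and the function names and example scores are hypothetical.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b); the smaller value is usually reported as U."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical total scores for two patient groups.
group_low = [55, 62, 68]
group_high = [72, 80, 91, 95]
print(mann_whitney_u(group_low, group_high))  # (0.0, 12.0)
```

A U value of 0 for one group means that every score in that group is lower than every score in the other group, which is the most extreme separation possible.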
Construct validity also was tested for divergent validity and convergent validity in the domains of pain and physical function and in the total score using Pearson’s and Spearman’s correlation coefficients. The hypothesis was that the same domains should correlate more highly with each other, for example pain in the Western Ontario and McMaster University Osteoarthritis Index, the Medical Outcomes Study 36-Item Short-Form Health Survey, and the Harris hip score, than with other domains, such as function.
Criterion validity is present when the scores correlate with an accepted measure (gold standard) of the condition being evaluated. An acceptable level of Spearman rho for criterion validity is more than 0.40 and p < 0.001. The Medical Outcomes Study 36-Item Short-Form Health Survey is one of the best validated and commonly used general scoring systems for many conditions such as total hip replacement. 9,17 Thus, the Medical Outcomes Study 36-Item Short-Form Health Survey was used as the gold standard in the current study.
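The criterion described above can be sketched in Python as Spearman's rho (Pearson's correlation applied to mean ranks) followed by a check against the stated thresholds. The function names and example scores are hypothetical, and the p value is taken as an input because computing it is outside the scope of this sketch.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r; returns NaN if either variable has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on the mean ranks."""
    return pearson_r(midranks(x), midranks(y))


def meets_criterion(rho, p_value, min_rho=0.40, max_p=0.001):
    """Apply the thresholds stated in the text (rho > 0.40 and p < 0.001)."""
    return rho > min_rho and p_value < max_p


# Hypothetical total scores for five patients.
sf36_total = [35, 50, 62, 70, 88]
harris_total = [48, 65, 60, 82, 95]
print(round(spearman_rho(sf36_total, harris_total), 2))  # 0.9
```

Returning NaN for a constant variable mirrors the situation in which no correlation can be computed because all patients received the same score.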
To study test and retest reliability, 58 patients were evaluated two times within 4 weeks using the three questionnaires. The patients answered the items in the Western Ontario and McMaster University Osteoarthritis Index and the Medical Outcomes Study 36-Item Short-Form Health Survey the same week as a physician and a physiotherapist examined them using the Harris hip score. Total score, domains, and items were calculated with Pearson’s and Spearman’s correlation coefficients.
Interobserver reliability between physicians and physiotherapists was tested. The patients were divided into two groups, one physician and one physiotherapist examining each group. The procedure was repeated after 4 weeks. Goodman-Kruskal’s gamma, Pearson’s and Spearman’s correlations for items, domains, and total score of the Harris hip score were calculated. Classification of patients according to Charnley, measurement of exact bone length, and the Trendelenburg test are not parts of the Harris hip score but have been used commonly in clinical practice and thus were included in the study of interobserver reliability.
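Goodman-Kruskal's gamma for two raters can be illustrated with the following Python sketch, which counts concordant and discordant patient pairs and ignores tied pairs. The function name and the example ratings are hypothetical.

```python
import math


def goodman_kruskal_gamma(ratings_a, ratings_b):
    """Return gamma = (C - D) / (C + D) over all pairs of patients.

    C counts pairs ordered the same way by both raters, D counts pairs
    ordered in opposite ways. Pairs tied on either rating are ignored.
    """
    concordant = discordant = 0
    n = len(ratings_a)
    for i in range(n):
        for j in range(i + 1, n):
            product = (ratings_a[i] - ratings_a[j]) * (ratings_b[i] - ratings_b[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        return math.nan  # every pair is tied, for example a constant item
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical item scores given by two examiners to five patients.
physician = [4, 3, 4, 2, 1]
physiotherapist = [4, 3, 3, 2, 1]
print(goodman_kruskal_gamma(physician, physiotherapist))  # 1.0
```

In this example gamma equals 1.0 although the two examiners disagree on one patient, because the pairs involving that disagreement are tied and therefore dropped. This is one reason gamma values can look more favorable than the raw agreement would suggest.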
Internal consistency reliability was tested for the questionnaires and within the different domains in the Western Ontario and McMaster University Osteoarthritis Index, the Medical Outcomes Study 36-Item Short-Form Health Survey, and the Harris hip score. Cronbach’s alpha coefficient was used for this purpose.
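Cronbach's alpha can be computed from the item variances and the variance of the summed score. The Python sketch below uses the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total); the function name and the example item scores are hypothetical.

```python
import math
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a domain.

    item_scores holds one list per item, each with one score per patient.
    Returns NaN when the summed score has zero variance.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(patient) for patient in zip(*item_scores)]
    total_variance = variance(totals)
    if total_variance == 0:
        return math.nan
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical domain with three items scored for four patients.
items = [
    [0, 1, 3, 4],
    [1, 1, 3, 4],
    [0, 2, 3, 3],
]
print(round(cronbach_alpha(items), 2))
```

The NaN branch corresponds to a domain in which all patients receive the same score, where alpha cannot be computed.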
Pearson’s correlation coefficient was used to compare domains in the Western Ontario and McMaster University Osteoarthritis Index. The following guidelines were used to interpret the correlation coefficients (r): poor correlation (r < 0.3), moderate correlation (0.3 < r < 0.6), good correlation (0.6 < r < 0.8), and excellent correlation (r > 0.8). 3
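The interpretation guidelines above can be expressed as a small Python helper. The function name interpret_r is hypothetical. Two assumptions are made because the text leaves them open: the absolute value of r is used, and a value exactly on a boundary (0.3, 0.6, or 0.8) is assigned to the higher category.

```python
def interpret_r(r):
    """Label a correlation coefficient using the guidelines in the text.

    Assumptions: the magnitude of r is used, and exact boundary values
    fall into the higher category.
    """
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    magnitude = abs(r)
    if magnitude < 0.3:
        return "poor"
    if magnitude < 0.6:
        return "moderate"
    if magnitude < 0.8:
        return "good"
    return "excellent"


print(interpret_r(0.58))  # moderate
print(interpret_r(0.94))  # excellent
```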
The study consisted of two cohorts. Cohort I, consisting of 62 patients not related to the investigators, was selected randomly by a computer algorithm from Sahlgrenska University Hospital for inclusion in the study. Four patients were excluded because of intellectual deficiencies (stroke and language difficulties). Eighteen patients had undergone total hip arthroplasty 10 years earlier, using the Charnley, Lubinus IP, Christiansen, or Exeter polished prostheses. Forty patients had undergone surgery 2 years earlier using the Spectron prosthesis. All patients were asked to fill in the Western Ontario and McMaster University Osteoarthritis Index and the Medical Outcomes Study 36-Item Short-Form Health Survey the same week as they were evaluated using the Harris hip score. The mean age was 70.6 years (range, 52–86 years), and 20 were men. The 2- and 10-year groups had the same proportion of men and women, and the mean ages for each group were 73.0 and 69.6 years, respectively.
Cohort II, consisting of 1056 patients who had undergone total hip replacement in Sweden, 2–10 years earlier, with equal distribution of patients per year, was selected randomly from the Swedish National Board of Health and Welfare’s Discharge register. The patients were asked to complete the Western Ontario and McMaster University Osteoarthritis Index and the Medical Outcomes Study 36-Item Short-Form Health Survey. From this cohort, a stratified selection (age and gender matched) of 344 patients from nine cities were examined by one independent physiotherapist and one independent physician. The most commonly used prostheses were the Charnley, the Lubinus SP II, and the Scan Hip implants. The mean age at followup was 75 years (range, 33–97 years), and 46% were men. Eighty-six percent had surgery for osteoarthritis. Power analysis predicted that, with 35 patients, there should be at least a 70% chance of detecting a correlation between health status instrument score differences if one existed, assuming a Pearson’s r value of 0.40. 19
RESULTS
All 58 patients in Cohort I answered the full questionnaire, but a few patients did not answer some of the individual items (as many as 10%). The response rates in the national Cohort (II) were 93% for the Western Ontario and McMaster University Osteoarthritis Index and 84% for the Harris hip score.
The content of the Harris hip score and the Western Ontario and McMaster University Osteoarthritis Index is closely related (pain, function, and stiffness), whereas the Medical Outcomes Study 36-Item Short-Form Health Survey contains additional domains (vitality, social function, mental health, sleep, and role-function). Few floor values were seen, but there were several domains that contained ceiling values. The domains in the Harris hip score showed high ceiling values, and the mean score for deformity was the maximum (100 points). The mean score for pain was higher in each questionnaire than that for function, and the generic test (the Medical Outcomes Study 36-Item Short-Form Health Survey) showed the lowest mean values for pain, function, and total score.
When comparing patients with values higher and lower than 70, using the Medical Outcomes Study 36-Item Short-Form Health Survey as the gold standard, the generic instrument gave a significantly greater difference than the disease-specific questionnaires (p < 0.001). The same difference between the generic instrument (the Medical Outcomes Study 36-Item Short-Form Health Survey) and the disease-specific questionnaires (the Western Ontario and McMaster University Osteoarthritis Index and the Harris hip score) was seen for gender and the Charnley categories (p < 0.01). However, there were no significant differences among the three questionnaires regarding age and years since surgery. These results indicate that disease-specific instruments (the Western Ontario and McMaster University Osteoarthritis Index and the Harris hip score) show the clinical results of hip surgery without the influence of general health, such as high comorbidity.
The pain domain in the Harris hip score correlated at least as well with pain in the Western Ontario and McMaster University Osteoarthritis Index and the Medical Outcomes Study 36-Item Short-Form Health Survey as with function in the Western Ontario and McMaster University Osteoarthritis Index and the Medical Outcomes Study 36-Item Short-Form Health Survey. The same results were obtained when comparing the domains of function in the three scores with pain domains (Tables 1, 2). That is, corresponding domains correlate better than different domains, supporting high construct validity. The Western Ontario and McMaster University Osteoarthritis Index had the highest domain total score values (r = 0.91 for pain and r = 0.93 for function in Cohort I). In general, the disease-specific scores correlated better with each other than with the Medical Outcomes Study 36-Item Short-Form Health Survey.
The Spearman’s rho was more than 0.40 when the total score for the Medical Outcomes Study 36-Item Short-Form Health Survey (used as the gold standard) and the Harris hip score (0.69) or the Western Ontario and McMaster University Osteoarthritis Index (0.73) were correlated (Tables 1, 2). This also was true for the correlation between the Medical Outcomes Study 36-Item Short-Form Health Survey domains and the same domains in the Harris hip score or the Western Ontario and McMaster University Osteoarthritis Index (Spearman’s rho range, 0.41–0.67).
Total score, domains, and each item were studied for test and retest reliability in Cohort I with Pearson’s and Spearman’s correlation coefficients between two examinations, 3 to 4 weeks apart. In the Harris hip score, physicians’ and physiotherapists’ evaluations were studied. The reliability for total score was excellent for the self-administered questionnaires (the Western Ontario and McMaster University Osteoarthritis Index r = 0.90 and the Medical Outcomes Study 36-Item Short-Form Health Survey r = 0.92) and for the Harris hip score (physician r = 0.94, physiotherapist r = 0.95).
The reliability in domains also was high, and the disease-specific tests showed the highest values. For the Harris hip score, the physiotherapist and orthopaedic surgeon showed excellent test and retest reliability in the domains of pain (0.93 and 0.98, respectively) and function (0.95 and 0.93, respectively). The correlation was significant at the 0.01 level (two-tailed). Deformity and ROM showed high values, but when items were compared in these domains there was no significant correlation in the motion domain, except for flexion and abduction. No correlation could be found for deformity because all patients received a full score. The sitting item (can the patient sit comfortably in a chair for 1 or ½ hour, or is he or she unable to sit comfortably?) showed the lowest significant value, and pain had the highest value.
The Trendelenburg test and bone length evaluation (5-mm intervals) showed moderate correlation when measured by the physician (0.58 and 0.49, respectively) and by the physiotherapist (0.37 and 0.55, respectively). These tests were performed according to Harris’ original article. 11
The results showed good to excellent correlation (range, 0.74–1.00) for interobserver reliability. For the items, sitting had the lowest value in the function domain (0.67). In the motion domain, flexion and abduction showed significant correlation between the physician and physiotherapist, although there was no correlation for the other items in the motion domain. Correlation in the deformity domain was not computed because deformity was constant.
Cronbach’s alpha coefficient showed high internal consistency reliability in each domain. The Western Ontario and McMaster University Osteoarthritis Index test received the highest values. The deformity domain was not computed because the scale had zero variance. Range of motion showed the lowest value (0.52).
DISCUSSION
The results in the studied cohorts indicate that all scoring systems used, including the Harris hip score, have high validity and are reproducible and reliable. An earlier study showed that the responsiveness of disease-specific tests was higher than that of generic questionnaires. 22 However, the problem is that many of the disease-specific scores have not been well established regarding validity and reliability, including the Harris hip score. 1,7,16,21,26 Convergent validity was shown by McGrory et al 18,19 between hip motion and the Harris hip score items put on socks and tie a shoe and between the Harris hip score item ROM and the Western Ontario and McMaster University Osteoarthritis Index function score (p < 0.05; n = 28). High correlation also was found between the Harris hip score and the Western Ontario and McMaster University Osteoarthritis Index pain and function domains (p = 0.001). 18,19
Several recently presented disease-specific tests (Murray’s 12-item score, Johanson’s score, and the Western Ontario and McMaster University Osteoarthritis Index) have been validated compared with the older ones (Harris hip score) before routine clinical use. 3,10–12 The Western Ontario and McMaster University Osteoarthritis Index probably is the most widely tested scoring system and has been translated into several languages. 3 However, some of the older scoring systems have been used extensively, and the Harris hip score is the most widely used hip questionnaire throughout the world.
This study showed there were no major differences in content validity among the scores. The hypothesis that a generic questionnaire would be more sensitive to changes in general health was supported regarding gender, Charnley classification, and the severity of the patient’s problem as measured with the Medical Outcomes Study 36-Item Short-Form Health Survey as a reference. The differences related to patient age and followup did not vary among the questionnaires, indicating that the Harris hip score had the same responsiveness as the Western Ontario and McMaster University Osteoarthritis Index and the Medical Outcomes Study 36-Item Short-Form Health Survey. However, the study was too small for measuring responsiveness, which should be measured in the same patients before and after surgery. For interobserver reliability, multicenter testing with the same patients being evaluated by several healthcare professionals would be more appropriate.
Concerning different parts of the Harris hip score, pain and function had high validity and reliability. One exception was sitting, which had low values in several of the validity and reliability tests in this study. This also was detected in the German version of the Western Ontario and McMaster University Osteoarthritis Index. 24 Harris developed the deformity domain for patients with severe deformity after traffic accidents. 11 Patients who had total hip arthroplasty seldom have severe deformity after surgery, so it was not possible to measure this domain. All patients scored four points in this sample of 58 patients who had total hip arthroplasty (100% ceiling values). The ROM domain showed high ceiling values (84%), and the only part of the domain that scored significant reliability at an acceptable level was hip flexion. Extension, rotation, abduction, and adduction had low or no correlation in construct validity, test and retest reliability, interobserver reliability, and internal consistency reliability. For patients who had undergone modern total hip replacement procedures, the deformity and motion domains in the Harris hip score seem unnecessary. For specific categories (infected hip prosthesis, where the prosthesis has been extracted), the deformity and motion domain in the Harris hip score may be of value to the orthopaedic surgeon. Harris considered this to be the case in a 1969 article. 11
The value of using the correct correlation coefficient (Pearson’s r and Spearman’s rho have been used most often) has been discussed in numerous articles. 3,4,6,9,13–15,24,26 In the current study, a scatterplot showed that the distribution of the material was skewed, so Spearman’s rho should be used. The results also showed that the gamma coefficient should be interpreted carefully.
A generic instrument should be used when results of hip arthroplasty are compared with results of other interventions. Disease-specific measures focus on the disorder under consideration and the patient’s problems related to it and thus may be more relevant to the patient and the physician than generic instruments and better at detecting the effect of treatment. 4 The Harris hip score is the most widely used scoring system for evaluating hip arthroplasty. The current study indicates high validity and reliability for the Harris hip score. The motion domain had lower reliability, but this domain contributes a maximum of only five points to the total score. The total Harris hip score can be used as it is.
REFERENCES
1. Andersson G: Hip assessment: A comparison of nine different methods. J Bone Joint Surg 54B:621–625, 1972.
2. Bellamy N, Buchanan W, Goldsmith CH, Campbell J, Stitt LW: Validation study of WOMAC: A health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip and the knee. J Rheumatol 15:1833–1840, 1988.
3. Bellamy N, Wells G, Campbell J: Relationship between severity and clinical importance of symptoms in osteoarthritis. Clin Rheumatol 10:138–143, 1991.
4. Bombardier C, Melfi CA, Paul J, et al: Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care 33(Suppl 4):AS131–AS144, 1995.
5. Brazier JE, Harper R, Jones, et al: Validating the SF-36 health survey questionnaire: New outcome measure for primary care. Br Med J 305:160–164, 1992.
6. Bryant MJ, Kernohan JR, Nixon JR, Mollan RAB: A statistical analysis of hip scores. J Bone Joint Surg 75B:705–709, 1993.
7. Callaghan JJ, Dysart SH, Savory CF, Hopkinson WJ: Assessing the results of hip replacement. A comparison of five different rating systems. J Bone Joint Surg 72B:1008–1009, 1990.
8. Charnley J: Numerical Grading of Clinical Results. In Charnley J (ed). Low Friction Arthroplasty of the Hip. Berlin, Springer-Verlag 23–24, 1997.
9. Chetter IC, Spark JI, Dolan P, Scott DJA, Kester RC: Quality of life analysis in patients with lower limb ischaemia: Suggestion for European standardization. Eur J Vasc Endovasc Surg 13:597–604, 1997.
10. Dawson J, Fitzpatrick R, Carr A, Murray D: Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg 78B:185–190, 1996.
11. Harris WH: Traumatic arthritis of the hip after dislocation and acetabular fractures: Treatment by mold arthroplasty. J Bone Joint Surg 51A:737–755, 1969.
12. Johanson NA, Charlson ME, Szatrowski TP, Ranawat CS: A self-administered hip-rating questionnaire for the assessment of outcome after total hip replacement. J Bone Joint Surg 74A:587–597, 1992.
13. Katz JN, Phillips CR, Poss R, et al: The validity and reliability of a total hip arthroplasty outcome evaluation questionnaire. J Bone Joint Surg 77A:1528–1534, 1995.
14. Katz JN, Phillips CR, Poss R, et al: Correspondence. J Bone Joint Surg 78A:1445–1448, 1996.
15. Krabbe PFM, Essink-Bot M-L, Bonsel GJ: On the equivalence of collectively and individually collected responses. Med Decis Making 16:120–132, 1996.
16. Laupacis A, Bourne R, Rorabeck C, et al: The effect of elective total hip replacement on health-related quality of life. J Bone Joint Surg 75A:1619–1626, 1993.
17. Martin DP, Engelberg R, Agel J, Swiontkowski F: Comparison of the musculoskeletal function assessment questionnaire with the Short Form-36, the Western Ontario and McMaster Universities Osteoarthritis Index, and the Sickness Impact Profile Health-Status Measures. J Bone Joint Surg 79A:1323–1335, 1997.
18. McGrory BJ, Freiberg AA, Shinar AA, Harris WH: Correlation of measured range of motion following total hip arthroplasty and responses to a questionnaire. J Arthroplasty 11:565–571, 1996.
19. McGrory BJ, Harris WH: Can the Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index be used to evaluate different hip joints in the same patients? J Arthroplasty 11:841–844, 1996.
20. Ritter MA, Albohm MJ, Keating M, Faris PM, Meding JB: Comparative outcomes of total joint arthroplasty. J Arthroplasty 10:737–741, 1995.
21. Ritter MA, Fechtman RW, Keating EM, Faris PM: The use of hip score for evaluation of the results of total hip arthroplasty. J Arthroplasty 5:187–189, 1990.
22. Rorabeck CH, Bourne RB, Laupacis A, et al: A double-blind study of 250 cases comparing cemented with cementless total hip arthroplasty. Clin Orthop 298:156–164, 1994.
23. Söderman P, Malchau H: Validity and reliability of Swedish WOMAC Osteoarthritis Index. A self-administered disease-specific questionnaire (WOMAC) versus generic instruments (SF-36 and NHP). Acta Orthop Scand 71:39–46, 2000.
24. Stucki G, Meier D, Stucki S, et al: Evaluation einer Deutschen version des WOMAC (Western Ontario und McMaster Universities) arthroseindex. Z Rheumatol 55:40–49, 1996.
25. Sullivan M, Karlsson J, Ware JR: The Swedish SF-36 Health Survey—I. Evaluation of data quality, scaling assumptions, reliability and construct validity across general populations in Sweden. Soc Sci Med 41:1349–1358, 1995.
26. Sun Y, Sturmer T, Gunther KP, Brenner H: Reliability and validity of clinical outcome measurements of osteoarthritis of the hip and knee—A review of the literature. Clin Rheumatol 16:185–198, 1997.
27. Ware JE, Sherbourne CD: The MOS 36-Item Short-Form Health Survey (SF-36). Med Care 30:473–483, 1992.