Scoring systems that evaluate subjective and objective factors together and rate the final result categorically are commonplace in orthopaedics1-8. These systems reflect the patient's functional status from the surgeon's point of view. Patient-completed health status questionnaires have become popular as a means of evaluating function from the patient's point of view9-12. Comparison of physician-based scoring systems with health status instruments can facilitate the investigation of the causes of disagreement between the physician's and patient's points of view.
While using physician-based elbow scoring systems for the evaluation of patients in several scientific investigations, we observed that pain seemed to dominate the scoring systems. This was a cause of some concern because of the strong influence of psychosocial aspects of illness on the perception of pain13,14. If physician-based scoring systems do not adequately reflect objective measures of elbow function, they will not provide a basis for comparison of elbow function according to the physician's and patient's points of view, and they may undervalue objective improvements achieved by operative intervention.
In this study, we analyzed the influence of objective and subjective factors on the quantitative ratings of elbow function generated by several physician-based and patient-based rating systems in order to test the hypothesis that pain has a strong influence on these scores and that the rating systems are therefore less sensitive to objective measures of elbow dysfunction.
Materials and Methods
During a five-year period, health status data and elbow ratings were systematically gathered during evaluations of patients at various stages of recovery as part of various prospective and retrospective trials, all approved by our Human Research Committee. Inclusion criteria for the present study were (1) a previously sustained intra-articular fracture of the elbow that had been treated operatively, (2) an age of eighteen years or older, (3) a date of evaluation more than six months following the most recent surgery, and (4) no interpositional or prosthetic arthroplasty. One hundred and four patients satisfied these criteria and represent the study cohort.
There were fifty-two women and fifty-two men with an average age of forty-six years (range, eighteen to seventy-nine years). The right arm was involved in forty-five patients and the left arm, in fifty-nine. The dominant elbow was affected in seventy-five patients (72%). Sixty-six patients were employed outside the home at the time of the injury; forty-four performed desk-based work, and twenty-two were laborers. Six patients were unemployed, four were disabled, fourteen were retired, eleven were homemakers, one was a volunteer, and two were students.
The initial injury was the result of a fall from a standing height in forty-nine patients and the result of a higher-energy injury in fifty-five patients. The injuries included an intra-articular fracture of the distal part of the humerus in fifty patients, a fracture-dislocation of the elbow in thirty-one (a posterior fracture-dislocation of the olecranon in eleven and a posterior dislocation of the elbow with intra-articular fractures in twenty), a simple elbow dislocation in nine, and an isolated radial head fracture in fourteen. The patients were evaluated at a mean of fifty-eight months (range, six to 273 months) after the injury and forty-six months (range, six to 136 months) after the latest surgery.
An investigator who was not involved in the patient's care evaluated each patient according to the American Shoulder and Elbow Surgeons Elbow Evaluation (ASES)2. The evaluation consisted of an interview, a physical examination, radiographs, the completion of three physician-based elbow scoring systems (Mayo Elbow Performance Index [MEPI]3, Broberg and Morrey rating system4, and American Shoulder and Elbow Surgeons Elbow Evaluation [ASES]2), and the administration of an upper-extremity-specific health status questionnaire (Disabilities of the Arm, Shoulder and Hand [DASH]) and a general health status questionnaire (Short Form-36 [SF-36]). Arthrosis was graded according to the system of Broberg and Morrey4.
We employed the pain subscales of the ASES2 as a quantitative measure of pain for use in all analyses. Patients rated their pain on five Likert 11-point ordinal scales ranging from 0, indicating no pain, to 10, indicating the worst imaginable pain. The five scales were (1) pain when it is at its worst, (2) pain at rest, (3) pain when lifting a heavy object, (4) pain when doing a task with repeated elbow movements, and (5) pain at night. We added the scores of these five scales for a summary pain score ranging from 0 to 50 points, with 0 points indicating no pain.
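The summary pain score described above is a simple sum of the five 0-to-10 ratings. A minimal sketch of that arithmetic follows; the function name and the example ratings are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def summary_pain_score(ratings: Sequence[int]) -> int:
    """Sum five 0-10 pain ratings into a 0-50 summary score (0 = no pain)."""
    if len(ratings) != 5:
        raise ValueError("expected exactly five pain ratings")
    for rating in ratings:
        if not 0 <= rating <= 10:
            raise ValueError("each rating must be between 0 and 10")
    return sum(ratings)


# Hypothetical patient, in the order listed in the text:
# worst pain, pain at rest, lifting a heavy object, repeated movements, night.
example_score = summary_pain_score([6, 1, 5, 4, 2])
```

For the hypothetical ratings shown, the summary score is 18 of a possible 50 points.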
Physician-Based Scoring Systems
The MEPI3 is one of the most commonly used physician-based elbow-rating systems. This index divides 100 points among a physician assessment of pain (45 points), ulnohumeral motion (20 points), stability (10 points), and the ability to perform five functional tasks (25 points). Pain is rated as none (45 points), mild (30 points), moderate (15 points), or severe (0 points) by the physician on the basis of an interview with the patient. The total score ranges from 5 to 100 points, with higher scores indicating better function. Categorical ratings are assigned, with 90 to 100 points considered to be excellent; 75 to 89 points, good; 60 to 74 points, fair; and <60 points, poor.
The rating system of Broberg and Morrey4 is also a 100-point system, based on motion (40 points), strength (20 points), stability (5 points), and pain (35 points). The physician rates pain as none (35 points); mild with activity but requiring no medication (28 points); moderate with or after activity (15 points); or severe at rest, requiring constant medication, and disabling (0 points). In the categorical rating, 95 to 100 points indicates an excellent outcome; 80 to 94 points, a good outcome; 60 to 79 points, a fair outcome; and <60 points, a poor outcome.
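The categorical cutoffs quoted above for the MEPI and for the Broberg and Morrey system differ, so the same raw score can fall into different categories. The following sketch encodes only the cutoffs stated in the text; the dictionary keys, function name, and example score are illustrative assumptions.

```python
from bisect import bisect_right

# Lower bounds of the fair, good, and excellent categories, as quoted in the text.
CUTOFFS = {
    "MEPI": (60, 75, 90),
    "Broberg-Morrey": (60, 80, 95),
}
LABELS = ("poor", "fair", "good", "excellent")


def categorical_rating(system: str, score: float) -> str:
    """Map a raw 100-point score to its categorical rating for the given system."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many lower bounds the score has reached.
    return LABELS[bisect_right(CUTOFFS[system], score)]


# Hypothetical raw score of 92 points under each system.
mepi_label = categorical_rating("MEPI", 92)
broberg_label = categorical_rating("Broberg-Morrey", 92)
```

With these cutoffs, a hypothetical raw score of 92 points is rated excellent on the MEPI but only good on the Broberg and Morrey system.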
The ASES2 is a 100-point scale that combines an assessment of pain based on the patient's completion of five 11-point Likert scales (25 points); the patient's assessment, on five 11-point Likert scales, of the same five functional tasks used in the MEPI (30 points); ulnohumeral and radioulnar motion (30 points); strength (10 points); and stability (5 points). This instrument is therefore based partly on the perspective of the patient and partly on factors measured and evaluated by the physician. The scores range from 0 to 100 points, with higher scores indicating better function. No categorical ratings are assigned.
Health Status Questionnaires
The DASH questionnaire10 was developed by the American Academy of Orthopaedic Surgeons in collaboration with the Council of Musculoskeletal Specialty Societies and the Institute for Work and Health as an outcomes instrument specific to the upper extremity, and it is applicable to a wide variety of problems10. The questionnaire contains thirty items: twenty-one evaluate difficulty with specific tasks, five evaluate symptoms, and one each evaluates social function, work function, sleep, and confidence. The score ranges from 0 to 100 points, with higher scores indicating worse upper-extremity function.
The SF-369 is the most commonly used general health status measure. The physical (PCS) and mental (MCS) component summary scores were calculated and used for this analysis. Both component scores range from 0 to 100 points and are standardized to population norms. A score of 50 points is equal to the mean score for the general population. Every 10 points above or below 50 represents one standard deviation from the mean for the general population.
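Because the component summary scores are standardized to a mean of 50 points and a standard deviation of 10 points, a score can be restated as a distance from the population mean. The sketch below shows that conversion; the function name and the example score are hypothetical.

```python
POPULATION_MEAN = 50.0  # norm-based mean stated in the text
POPULATION_SD = 10.0    # 10 points correspond to one standard deviation


def sd_from_population_mean(component_score: float) -> float:
    """Express an SF-36 component summary score in standard deviations from the norm."""
    return (component_score - POPULATION_MEAN) / POPULATION_SD


# Hypothetical PCS score of 45 points.
example_offset = sd_from_population_mean(45)
```

A hypothetical PCS score of 45 points lies half a standard deviation below the general population mean.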
Statistical Analysis
Continuous data are presented in terms of the mean, standard deviation, and range. Eleven demographic and clinical variables were examined with respect to each of the outcome instruments; these included age, gender, injury to the dominant side, distal humeral fracture, time since the last surgery, number of operations subsequent to the original treatment of the injury, total arc of flexion and extension, total arc of pronation and supination, ulnar neuropathy, arthrosis, and pain. The Pearson product-moment correlation coefficient was used to evaluate the association between continuous predictor variables and each outcome instrument as well as between the outcome instruments themselves. A power analysis (nQuery Advisor program, version 4.0; Statistical Solutions, Saugus, Massachusetts) indicated that a minimum sample size of 100 patients would provide 90% statistical power (β = 0.1) to detect a significant moderate correlation (absolute r ≥ 0.50) with use of a Pearson coefficient to correlate each health outcome score with the patients' subjective pain scores and a Bonferroni significance level of 0.01 to account for the different outcome instruments. Subsequent operations were defined according to three levels (index procedure only, one subsequent operation, and more than one subsequent operation), and the Spearman rho correlation was used to test the influence of this variable on each outcome score. In addition to measuring correlation, the univariate analysis involved Student t tests for comparing outcome scores between men and women and according to the presence or absence of categorical variables.
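The Pearson product-moment coefficient used in the univariate analysis can be illustrated with a short calculation. This is a sketch of the standard formula only, not the software actually used for the study; the paired values are invented for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations: summary pain score and elbow score.
pain = [0, 5, 12, 20, 31, 44]
elbow_score = [100, 95, 85, 70, 60, 40]
r = pearson_r(pain, elbow_score)
```

In this invented sample, higher pain scores accompany lower elbow scores, so the coefficient is strongly negative.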
Multivariate analysis was based on two statistical approaches. First, multivariate analysis of variance was performed to identify which of the eleven variables were independently associated with scores on each outcome instrument in the entire cohort of 104 patients. The F test was used to judge the significance of each variable. A backward stepwise procedure was utilized, with testing of all eleven predictor variables as candidates to determine the final models, and goodness-of-fit was assessed with use of adjusted r squared15. A multivariate model containing the significant independent predictors was established for each health outcome of interest (Broberg and Morrey system, MEPI, ASES, DASH, SF-36 PCS, and SF-36 MCS). Since the arc of ulnohumeral motion and pain were found to be important predictors of each outcome score (with the exception of the SF-36 MCS), the second part of the statistical modeling approach was to construct predictive equations with use of these variables, with an ulnohumeral arc of 100° as the cutoff. Finally, on the basis of the results of the multiple regression analyses, the relationships of an individual patient's pain score and flexion-extension arc (<100° or ≥100°) with their DASH and MEPI outcome scores were depicted visually16. The data were analyzed with use of the SPSS software package (version 12.0; SPSS, Chicago, Illinois).
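Adjusted r squared penalizes a model for each predictor it contains, which matters when comparing models with and without pain. The sketch below uses the standard textbook definition; the unadjusted value of 0.75 is a hypothetical input, not a result from this study.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("too few observations for the number of predictors")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / residual_df


# Hypothetical unadjusted R^2 of 0.75 for a four-predictor model in 104 patients.
example_adjusted = adjusted_r_squared(0.75, n_obs=104, n_predictors=4)
```

For these assumed inputs, the adjustment lowers the value only slightly (to about 0.74), because the cohort is large relative to the number of predictors.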
Results
The scores (mean and standard deviation) for the physician-based measures were 81 ± 16 points (range, 30 to 100 points) for the MEPI, 85 ± 12 points (range, 50 to 100 points) for the Broberg and Morrey system, and 82 ± 15 points (range, 31 to 100 points) for the ASES. The scores for the patient-based evaluations were 20 ± 19 points (range, 0 to 73 points) for the DASH, 45 ± 11 points (range, 13 to 63 points) for the PCS of the SF-36, and 49 ± 8 points (range, 25 to 59 points) for the MCS of the SF-36.
Correlations Among Evaluation Instruments
Scores derived with the six evaluation instruments had moderate-to-high levels of correlation with one another. The raw scores derived with the elbow scoring systems (MEPI, Broberg and Morrey, and ASES) demonstrated excellent agreement (Pearson correlation coefficient, 0.86 < r < 0.89; p < 0.001). The health status instruments (DASH and SF-36) showed moderate-to-good correlation with the surgeon-based systems, with the upper-extremity-specific DASH scores showing stronger correlations (Pearson correlation coefficient, -0.81 < r < -0.65; p < 0.001) than those shown by the PCS (Pearson correlation coefficient, 0.55 < r < 0.61; p < 0.001) and the MCS (Pearson correlation coefficient, 0.48 < r < 0.55; p < 0.001) of the SF-36.
Predictors of MEPI Scores
In the univariate analysis, the MEPI was strongly correlated with the ASES pain scores (r = -0.82; p < 0.001) and moderately correlated with the range of flexion-extension (r = 0.40; p < 0.001), the range of pronation-supination (r = 0.38; p < 0.001), and the number of subsequent operations (rSpearman = -0.26; p < 0.01).
Multivariate analysis revealed age (F = 4.28; p < 0.05), flexion-extension (F = 7.34; p < 0.01), pronation-supination (F = 5.10; p < 0.05), and pain (F = 184.32; p < 0.001) to be independent predictors of MEPI scores (Table I). The model with these four variables accounted for 73% of the variability in the MEPI scores. A model including pain alone accounted for 66% of the variability in the MEPI scores. When pain was excluded from the model, the best model accounted for only 22% of the variability in the MEPI scores.
The relative importance of the independent variables of pain and motion is illustrated by the equation: Y[MEPI] = -1.1 × (ASES pain score) + 6 points (flexion-extension arc < 100°) or + 12 points (flexion-extension arc ≥100°) + 85 (Fig. 1).
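The predictive equation above can be evaluated directly. The sketch below transcribes its coefficients; the function and parameter names are illustrative, and the example inputs are hypothetical patients rather than study data.

```python
def predicted_mepi(ases_pain_score: float, flexion_extension_arc_deg: float) -> float:
    """Evaluate the MEPI equation quoted in the text.

    Y[MEPI] = -1.1 * pain + (6 if arc < 100 degrees else 12) + 85
    """
    motion_points = 12 if flexion_extension_arc_deg >= 100 else 6
    return -1.1 * ases_pain_score + motion_points + 85


# Hypothetical extremes: no pain with a functional arc, and
# maximal pain (50 points) with a restricted arc.
best_case = predicted_mepi(0, 120)
worst_case = predicted_mepi(50, 80)
```

According to this equation, crossing the 100° threshold changes the predicted score by 6 points, whereas moving across the full 50-point pain range changes it by 55 points.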
Predictors of Broberg and Morrey Scores
The results of the univariate and multivariate analyses of the Broberg and Morrey evaluation system were similar to those of the MEPI. The univariate analysis showed a significant correlation with the number of subsequent operations (rSpearman = -0.27; p < 0.01), moderate correlation with flexion-extension (r = 0.54; p < 0.001) and pronation-supination (r = 0.50; p < 0.001), and strong correlation with the ASES pain score (r = -0.77; p < 0.001).
The independent predictors of the Broberg and Morrey score in the multivariate analysis were age (F = 4.86; p < 0.05), flexion-extension (F = 31.90; p < 0.001), pronation-supination (F = 21.89; p < 0.001), and pain (F = 178.38; p < 0.001) (Table I). The model with these four variables accounted for 79% of the variability in the Broberg and Morrey scores. The model with pain alone accounted for 59% of the variability in the Broberg and Morrey scores, whereas the model without pain accounted for 41% of the variability.
Predictors of ASES Scores
The ASES scores correlated with ASES pain scores (r = -0.76; p < 0.001), flexion-extension (r = 0.56; p < 0.001), pronation-supination (r = 0.49; p < 0.001), and number of subsequent operations (rSpearman = -0.37; p < 0.001) in the univariate analysis. The ASES was the only measure that showed significant correlation with arthrosis (t = 2.17; p < 0.05) and ulnar neuropathy (t = 2.46; p < 0.05) in the univariate analysis, with the presence of arthrosis or ulnar neuropathy associated with worse ASES scores.
In the multivariate analysis, only age (F = 9.19; p < 0.05), flexion-extension (F = 36.57; p < 0.001), pronation-supination (F = 20.28; p < 0.001), and pain (F = 178.00; p < 0.001) were independent predictors of the ASES scores (Table I). The multivariate model accounted for 79% of the variability in the ASES scores. When age and range of motion were excluded from the model, 57% of the variability in the ASES scores was accounted for by pain alone. A model without pain accounted for 41% of the variability in the ASES scores.
Predictors of DASH Scores
Univariate analysis of continuous variables showed significant (p < 0.01) but only moderate correlation of DASH scores with the number of operations subsequent to the index procedure (rSpearman = 0.32), flexion-extension (r = -0.42; p < 0.001), and pronation-supination (r = -0.34; p < 0.01). The strongest correlation was between the DASH scores and the ASES pain scores (r = 0.61; p < 0.001). Univariate analysis of dichotomous variables showed a significant influence of ulnar neuropathy (t = 2.22; p = 0.03).
Multivariate analysis identified pain (F = 49.1; p < 0.001) and flexion-extension (F = 15.96; p < 0.001) as significant independent predictors of the DASH scores (Table II). The model with pain and flexion-extension accounted for 45% of the variability in the DASH scores, whereas the model with pain alone accounted for 36% of the variability in the DASH scores and the model with motion alone accounted for only 17% of the variability.
We used a cutoff point of 100° for the flexion-extension arc to derive the following equation to illustrate the independent relationship of pain and flexion-extension with the final DASH scores: Y[DASH] = 0.93 × (ASES pain score) - 5 points (flexion-extension arc < 100°) or - 10 points (flexion-extension arc ≥100°) + 17 (Fig. 2).
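The DASH equation above also makes it possible to express the motion threshold in units of pain. The sketch below transcribes the coefficients from the equation; the function names and example inputs are illustrative assumptions.

```python
PAIN_COEFFICIENT = 0.93  # DASH points per point of ASES pain score
INTERCEPT = 17


def predicted_dash(ases_pain_score: float, flexion_extension_arc_deg: float) -> float:
    """Evaluate the DASH equation quoted in the text (higher = worse function)."""
    motion_points = -10 if flexion_extension_arc_deg >= 100 else -5
    return PAIN_COEFFICIENT * ases_pain_score + motion_points + INTERCEPT


def pain_points_matching_motion_threshold() -> float:
    """Change in pain score that shifts the predicted DASH as much as crossing 100 degrees."""
    motion_difference = 5  # |-10 - (-5)| DASH points
    return motion_difference / PAIN_COEFFICIENT


# Hypothetical patients: pain-free with a functional arc, and
# maximal pain (50 points) with a restricted arc.
pain_free = predicted_dash(0, 120)
severe_pain = predicted_dash(50, 80)
equivalent_pain_change = pain_points_matching_motion_threshold()
```

Under this equation, regaining an arc of at least 100° lowers the predicted DASH score by the same amount as a decrease of roughly 5.4 points on the 50-point pain scale.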
Predictors of SF-36 PCS Scores
Univariate analysis demonstrated correlations between the SF-36 PCS scores and flexion-extension (r = 0.22; p < 0.05), pronation-supination (r = 0.29; p < 0.05), number of subsequent operations (rSpearman = -0.25; p < 0.05), and pain (r = -0.57; p < 0.001).
In the multivariate analysis, older age (F = 18.8; p < 0.001), a smaller pronation-supination arc (F = 4.4; p < 0.05), and a higher ASES pain score (F = 57.4; p < 0.001) were identified as significant independent predictors of a worse PCS score (Table II). This model accounted for 45% of the variability in the PCS scores. A model with pain alone accounted for 32% of the variability in the PCS scores, and the model without pain accounted for 12% of the variability.
Predictors of SF-36 MCS Scores
The SF-36 MCS scores had a moderate correlation with pain (r = -0.50; p < 0.001) and pronation-supination (r = 0.21; p < 0.05) in the univariate analysis.
The multivariate analysis showed that pain also dominated the MCS scores. The model with pain (F = 47.6; p < 0.001) and age (F = 13.9; p < 0.001) accounted for 35% of the variability in the MCS scores, but only 3% of the variability in the scores was accounted for when pain was removed from the model.
Discussion
These data confirm our hypothesis that pain dominates measures of elbow function and health status. Pain was the strongest predictor of all physician-based and patient-based scores. Objective factors alone were much poorer predictors of final scores and added relatively little predictive value to that provided by pain alone in most of the multivariate statistical models. These statements are most applicable to patients who have recovered from intra-articular elbow trauma (the focus of this study), but they are likely to be generalizable to other elbow conditions.
Our data agree with those of Turchin et al.17, who compared five physician-derived elbow-scoring systems and reported a lack of agreement in the categorical rankings derived with those systems but good correlation among the raw scores. Our finding of greater responsiveness of the upper-extremity-specific health status measure (DASH) compared with that of the general health status measure (SF-36) with regard to detecting clinical change in the function of the upper extremity has also been noted previously17-19. In our study, the surgeon-based scores showed moderate correlation with the health status measures, with a better correlation with the upper-extremity-specific measures than with the general health status measures. One interpretation of those findings is that the various instruments have good agreement with each other and seem to be measuring elbow function consistently. An alternative interpretation, supported by our analysis, is that the instruments are all being driven and dominated by the variable of pain.
The strong influence of pain on outcome measures has been noted for other upper-extremity conditions. Tomaino et al. noted that a satisfactory postoperative function score following limited wrist fusion was more dependent on pain relief than on residual motion20. Midha et al. found two significant predictors (p < 0.01) of various outcome measures for assessing patients with ulnar neuropathy at the elbow: pain accounted for 60% of the variation in the scores whereas objective measures of strength and function accounted for only 17% of the variability in the scores21. Karnezis and Fragkiadakis identified grip strength as the only significant objective predictor of posttraumatic wrist function (p < 0.01) as assessed with a wrist-specific health status measure22. In the absence of neuromuscular disorders, grip strength has a strong relationship with pain.
The experience and expression of pain are strongly influenced by psychological and sociological factors and are not always explained by objective factors13,14. As a result of the strong influence of pain on elbow ratings and health status measures, objective improvements in elbow function achieved by operative procedures may be devalued by the use of these systems. For example, a patient with a complex fracture-dislocation of the elbow who regains nearly normal motion, strength, and stability would be considered to have an excellent result according to the standards of any orthopaedic surgeon, but several such patients seen in our practices and evaluated in our research had poor standardized elbow scores and health status ratings because of substantial pain. Some patients had a clear secondary gain issue such as a lawsuit, an insurance claim, or narcotic dependence that we believed to cause the discrepancy between the objective and subjective evaluations of the results. The reasons for the discrepancy were less clear for other patients, but they seemed to be related to less obvious psychosocial factors such as psychiatric illnesses (e.g., anxiety or depression), stress and dissatisfaction (at work or of a personal nature), or a maladaptive personality and poor coping mechanisms. These factors have been noted in patients with chronic pain and idiopathic pain13,14,23,24.
We suggest the following as an alternative to elbow scoring systems that combine objective and subjective factors. The results of reconstructive elbow procedures can often be evaluated on the basis of one primary objective goal such as the restoration of elbow mobility. The change in the upper-extremity health status could then be compared with the change in the objective outcome measure (e.g., elbow motion). Any discrepancies could then be evaluated to determine what is preventing the patient from feeling that the elbow is more functional when improvements in objective measures of function (e.g., increased motion) have been observed. In other words, this approach would facilitate an evaluation of the relationship between achieving the goal of surgery (e.g., motion) and improvement in function (e.g., upper-extremity-specific health status), while accounting independently for other potentially important objective factors (such as instability, arthrosis, or ulnar neuropathy) and subjective factors (such as pain, depression, anxiety, and maladaptive coping skills or personality disorders) with use of multivariate statistical techniques.
In support of their research or preparation of this manuscript, one or more of the authors received grants or outside funding from the AO Foundation (unrestricted grant), Joint Active Systems, Fulbright Scholarship, and Dutch Anna Fonds Scholarship. None of the authors received payments or other benefits or a commitment or agreement to provide such benefits from a commercial entity. No commercial entity paid or directed, or agreed to pay or direct, any benefits to any research fund, foundation, educational institution, or other charitable or nonprofit organization with which the authors are affiliated or associated.
Investigation performed at the Orthopaedic Hand and Upper Extremity Service, Massachusetts General Hospital, Boston, Massachusetts
1. , Kay SP. The Hand Injury Severity Scoring System. J Hand Surg [Br]. 1996;21: 295-8.
2. , Richards RR, Zuckerman JD, Blasier R, Dillman C, Friedman RJ, Gartsman GM, Iannotti JP, Murnahan JP, Mow VC, Woo SL. A standardized method for assessment of elbow function. Research Committee, American Shoulder and Elbow Surgeons. J Shoulder Elbow Surg. 1999;8: 351-4.
3. , editor. The elbow and its disorders. 2nd ed. Philadelphia: Saunders; 1993.
4. , Morrey BF. Results of delayed excision of the radial head after fracture. J Bone Joint Surg Am. 1986;68: 669-74.
5. , editor. Guides to the evaluation of permanent impairment. 4th ed. Chicago: American Medical Association; 1993.
6. , van den Ende CH, Eygendaal D, Jolie IM, Hazes JM, Rozing PM. Clinical reliability and validity of elbow functional assessment in rheumatoid arthritis. J Rheumatol. 1999;26: 1909-17.
7. , Hanker G, Bayer M. Repair of the rotator cuff. End-result study of factors influencing reconstruction. J Bone Joint Surg Am. 1986;68: 1136-44.
8. , Liang MH, Daltroy L, Rudicel S, Richmond J. American Academy of Orthopaedic Surgeons lower limb outcomes assessment instruments. Reliability, validity, and sensitivity to change. J Bone Joint Surg Am. 2004;86: 902-9.
9. , Snow KK, Kosinski M, Gandek B. SF-36 health survey: manual and interpretation guide. Boston: The Health Institute; 1993.
10. , Amadio PC, Bombardier C. Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand) [corrected]. The Upper Extremity Collaborative Group (UECG). Am J Ind Med. 1996;29:602-8. Erratum in: Am J Ind Med. 1996;30: 372.
11. , Geyer M. The subjective shoulder rating system. Arch Orthop Trauma Surg. 1997;116: 324-8.
12. , Alvarez C, Griffin S. The development and evaluation of a disease-specific quality-of-life questionnaire for disorders of the rotator cuff: The Western Ontario Rotator Cuff Index. Clin J Sport Med. 2003;13: 84-92.
13. , Renier CM, Palcher JA. Chronic pain, depression, and quality of life: correlations and predictive value of the SF-36. Pain Med. 2003;4: 331-9.
14. , Martelli MF, Baker JM. Psychological assessment of persons with chronic pain. NeuroRehabilitation. 2000;14: 69-83.
15. , Secic M. How to report statistics in medicine: annotated guidelines for authors, editors, and reviewers. Philadelphia: American College of Physicians; 1997. p 81-92.
16. , Berry G, Matthews JNS. Statistical methods in medical research. 4th ed. Malden, MA: Blackwell Publishing; 2001.
17. , Beaton DE, Richards RR. Validity of observer-based aggregate scoring systems as descriptors of elbow pain, function, and disability. J Bone Joint Surg Am. 1998;80: 154-62.
18. , Richards RR. Measuring function of the shoulder. A cross-sectional comparison of five questionnaires. J Bone Joint Surg Am. 1996;78: 882-90.
19. , Richards RS, Donner A, Bellamy N, Roth JH. Responsiveness of the short form-36, disability of the arm, shoulder, and hand questionnaire, patient-rated wrist evaluation, and physical impairment measurements in evaluating recovery after a distal radius fracture. J Hand Surg [Am]. 2000;25: 330-40.
20. , Miller RJ, Burton RI. Outcome assessment following limited wrist fusion: objective wrist scoring versus patient satisfaction. Contemp Orthop. 1994;28: 403-10.
21. , Noble J, Patel V, Ho PH, Munro CA, Szalai JP. Prospective analysis of relationships of outcome measures for ulnar neuropathy at the elbow. Can J Neurol Sci. 2001;28: 239-44.
22. , Fragkiadakis EG. Association between objective clinical variables and patient-rated disability of the wrist. J Bone Joint Surg Br. 2002;84: 967-70.
23. , Bondegaard Thomsen A, Olsen AK, Sjogren P, Bech P, Eriksen J. Pain epidemiology and health related quality of life in chronic non-malignant pain patients referred to a Danish multidisciplinary pain center. Pain. 1997;73: 393-400.
24. , Jacobsson LT, Herrstrom P, Petersson IF. Health status as measured by SF-36 reflects changes and predicts outcome in chronic musculoskeletal pain: a 3-year follow up study in the general population. Pain. 2004;108: 115-23.