Perspectives on Modern Orthopaedics

Evidence-based Orthopaedic Surgery: What Is Evidence Without the Outcomes?

Suk, Michael MD, JD, MPH; Norvell, Daniel C. PhD; Hanson, Beate MD, MPH; Dettori, Joseph R. PhD, MPH; Helfet, David MD

Journal of the American Academy of Orthopaedic Surgeons: March 2008 - Volume 16 - Issue 3 - p 123-129

More than 200 general orthopaedic musculoskeletal outcomes instruments (ie, questionnaires, tools) are being used for research or clinical purposes,1 with more than 100 measures applicable to the spine alone.2 These outcomes instruments, which are intended to assess function and quality of life in patients presenting with orthopaedic conditions, can play an important role in the development of new procedures, techniques, and protocols, as well as provide some measure of quality. However, the musculoskeletal literature is filled with clinical justifications based on outcome results (eg, excellent, good, poor) that can be difficult to verify at best and misleading at worst. Further, without a common language and standards, it is nearly impossible to adequately compare results. Important considerations in assessing outcomes in the field of orthopaedics include selecting the appropriate outcomes measure, determining the critical components of an outcomes instrument, understanding the different types and significance of the outcomes measures, and knowing how to evaluate currently recommended outcome measures. Continued outcomes research is needed to provide orthopaedic surgeons with a standard approach for the review and selection of appropriate outcomes measures.

Current Challenges

Results described as “good” or “excellent” can be heard in physicians' offices and rehabilitation clinics and are commonly found in the musculoskeletal literature. But determining the exact meaning of these words and their relationship to reproducible results can be difficult.

In an attempt to move away from the vernacular and toward the scientific, researchers and clinicians have developed a variety of outcomes instruments to collect relevant data and to provide an objective basis for outcomes results. Terms such as good and excellent are often used to describe a spectrum of outcomes that might reflect a mix of objective (clinician-based) results and subjective (patient-reported) scores, a practice that is confusing, if not altogether opaque, to most readers. The challenge is in understanding the terminology, learning how to critically evaluate outcomes measurement instruments, and becoming skillful in selecting the outcomes instruments that best fit the clinical problem being evaluated.

The Importance of Selecting the Appropriate Outcomes Measure

Taking into account the published, scientific results of an appropriate outcomes measure related to a specific clinical entity is a critical step in recommending a course of treatment for any musculoskeletal condition. This can be a challenging task, however. One treatment protocol or intervention may be deemed better than another based on a specific desired end point (eg, range of motion), but not as good based on another end point (eg, pain relief).

Once a surgeon has determined that surgery is required, he or she is faced with the critical decision of which device or intervention strategy best suits the needs of the patient. For example, nearly 10 different implants are available that purportedly serve the same function in managing proximal humerus fractures. For a surgeon who is uninformed about the utility of each, selecting an implant is like facing several closed doors that appear identical and committing to one without knowing what lies behind any of them. The informed surgeon would recognize the distinguishing features of each door, which indicate the strengths and limitations of each treatment method, making the decision of which door to open much easier.

Selecting an outcomes instrument is even more challenging. For the shoulder joint alone, there are more than 40 instruments to choose from, and the list is growing.1 Bourne et al3 discussed three factors that were pivotal in stimulating the outcomes movement: the increasing cost of medical care, appropriateness of care, and the phenomenon of geographic variation with respect to surgical procedures. Outcomes research has evolved from an emerging science to one that may play a pivotal role in managing our health care systems.3 Evidence-based medicine as a measure of quality is gaining greater importance, as is illustrated by the debate surrounding pay-for-performance. Consequently, the selection of an appropriate outcomes measure may have far-reaching implications for the future of orthopaedic care.

The Critical Components of a Quality Outcomes Instrument

Three components should be considered when assessing the overall quality of an outcomes instrument: content, methodology, and clinical utility. Content is what the instrument is trying to measure. Are the questions relevant to the particular population of patients? Does the interpretation of the score make sense to you? The content of the instrument can be divided into three categories: type, scale, and interpretation (Figure 1). Type refers to whether the instrument is clinician-based or patient-reported. Scale refers to the measurements or questions that make up the instrument and the way in which it is scored. Interpretation refers to the meaning behind the score. Do higher scores indicate a better outcome? Do certain scores pertain to an excellent or poor outcome?
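The three content categories can be captured in a simple record. The sketch below is a hypothetical illustration, not part of any published instrument or software: the class name `InstrumentContent`, its fields, and its `normalized` method are assumptions introduced here. It shows why interpretation matters: once the direction of scoring is recorded, a score from a measure on which lower is better (such as the PRWE, scored 0 to 100) can be placed on a common 0 to 100 scale on which higher always means a better outcome.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class InstrumentContent:
    """Hypothetical record of an instrument's content: type, scale, interpretation."""
    name: str
    kind: Literal["clinician-based", "patient-reported"]  # type
    n_items: int              # scale: number of questions or measurements
    score_min: float          # scale: lowest possible score
    score_max: float          # scale: highest possible score
    higher_is_better: bool    # interpretation: direction of the score

    def normalized(self, score: float) -> float:
        """Map a raw score onto 0-100, where 100 always denotes the best outcome."""
        if not self.score_min <= score <= self.score_max:
            raise ValueError(f"score {score} outside {self.score_min}-{self.score_max}")
        span = self.score_max - self.score_min
        if self.higher_is_better:
            return 100 * (score - self.score_min) / span
        return 100 * (self.score_max - score) / span


# PRWE: patient-reported, 15 questions, 0-100, lower score = higher function.
prwe = InstrumentContent("PRWE", "patient-reported", 15, 0, 100, higher_is_better=False)
# A made-up clinician-based scale, 0-50, higher = better (illustration only).
example = InstrumentContent("hypothetical scale", "clinician-based", 10, 0, 50, True)

print(prwe.normalized(30))     # 70.0
print(example.normalized(40))  # 80.0
```

Recording the direction explicitly, rather than assuming it, guards against the common error of reading a low PRWE score as a poor result.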

Figure 1:
Algorithm demonstrating the three components of the content of an outcomes instrument: type, scale, and interpretation. (Adapted with permission from Suk M, Hanson BP, Norvell DC, Helfet DL: What makes a quality outcomes instrument? in: AO Handbook of Musculoskeletal Outcomes Measures and Instruments. Davos, Switzerland: Thieme, 2005.)

The methodology of the instrument can be divided into three categories: validity, reliability, and responsiveness (Figure 2). Validity is commonly defined as the extent to which an instrument measures what it is intended to measure. For example, in the sport of archery, validity would be represented by the accuracy of the shots: on average, how close the shots come to the bull's-eye of the target.

Figure 2:
Algorithm demonstrating the three components of the methodology of an outcomes instrument: validity, reliability, and responsiveness. (Adapted with permission from Suk M, Hanson BP, Norvell DC, Helfet DL: What makes a quality outcomes instrument? in: AO Handbook of Musculoskeletal Outcomes Measures and Instruments. Davos, Switzerland: Thieme, 2005.)

Reliability refers to the consistency of the instrument, that is, its ability to measure something the same way twice. Extending the archery example, reliability would be represented by how closely successive shots are grouped, regardless of where they hit the target. Ideally, all shots would land in the bull's-eye of the target, which would express both validity and reliability. A close grouping of shots on the periphery of the target is analogous to an instrument that is reliable but not valid. In orthopaedics, a resident who is learning trauma surgery must first learn to reduce a fracture successfully, then learn to do so consistently.
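One way to make the archery analogy concrete is sketched below, assuming each shot is recorded as an (x, y) coordinate with the bull's-eye at the origin. The function name and all shot coordinates are hypothetical. The distance from the centre of the group to the bull's-eye stands in for validity (a large offset means poor validity), and the average distance of the shots from their own centre stands in for reliability (a large spread means poor reliability).

```python
import math


def group_offset_and_spread(shots, bullseye=(0.0, 0.0)):
    """Summarize a group of (x, y) shots.

    offset: distance from the group's centre (mean position) to the bull's-eye,
            the analogue of (lack of) validity
    spread: mean distance of each shot from the group's centre,
            the analogue of (lack of) reliability
    """
    n = len(shots)
    centre = (sum(x for x, _ in shots) / n, sum(y for _, y in shots) / n)
    offset = math.dist(centre, bullseye)
    spread = sum(math.dist(shot, centre) for shot in shots) / n
    return offset, spread


# Hypothetical shot coordinates, in arbitrary units.
valid_and_reliable = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]
reliable_not_valid = [(7.75, 7.75), (8.25, 7.75), (7.75, 8.25), (8.25, 8.25)]

print(group_offset_and_spread(valid_and_reliable))  # offset 0.0, spread ~0.35
print(group_offset_and_spread(reliable_not_valid))  # offset ~11.3, spread ~0.35
```

Both groups are equally tight (equally reliable), but only the first is centred on the bull's-eye. A group centred on the bull's-eye but widely scattered would show the reverse pattern: a small offset with a large spread.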

Responsiveness refers to the ability of the instrument to change with the status of the patient. An instrument that measures the functional status of a patient should be sensitive enough to reflect improvement, as in the case of a patient who returns to work with some disability 6 months postoperatively, then progresses to full work with no disability after 12 months.
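Responsiveness is often summarized with indices that relate the average change in score to the variability of the scores; the effect size and the standardized response mean (SRM) are two commonly reported examples. The article does not prescribe either index, so the following is a generic sketch: the function name and the three patients' scores are made up for illustration and are not study data.

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up, higher_is_better=True):
    """Return (effect_size, standardized_response_mean) for paired scores.

    effect size = mean change / SD of the baseline scores
    SRM         = mean change / SD of the individual change scores
    Changes are signed so that a positive value always means improvement.
    Both standard deviations must be non-zero.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired baseline/follow-up scores")
    sign = 1 if higher_is_better else -1
    changes = [sign * (after - before) for before, after in zip(baseline, follow_up)]
    mean_change = mean(changes)
    return mean_change / stdev(baseline), mean_change / stdev(changes)


# Made-up scores for three patients (higher = better), purely for illustration.
es, srm = responsiveness([40, 50, 60], [60, 68, 82])
print(es, srm)  # 2.0 10.0
```

For a measure on which lower scores mean better function, such as the PRWE, passing `higher_is_better=False` keeps improvement positive. Larger values of either index indicate an instrument that tracks change in the patient's status more sensitively.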

Even an outcomes instrument that contains the appropriate content and that demonstrates adequate methodology may be difficult for the patient to complete or for a clinician or researcher to administer or analyze.

Thus, clinical utility also plays a role in the quality of an outcomes instrument (Figure 3). It is essential that an instrument be acceptable to the patient (ie, patient-friendly) to minimize additional stress in a patient who is already living with health problems.4 An acceptable instrument is also likely to achieve a higher response rate, making the data collected easier to interpret, more generalizable, and less prone to nonresponse bias. Outcomes measures are not often evaluated for patient friendliness, and there are no established standards for what constitutes this concept. Nonetheless, patient friendliness should be considered when selecting an appropriate outcomes instrument. Questions to consider when determining patient friendliness include the following: Can the outcomes instrument be completed in a relatively short period of time? Are the questions clear, concise, and easy to understand? Will the patient be uncomfortable answering the questions?

Figure 3:
Algorithm demonstrating the two components that make up the clinical utility of an outcomes instrument: patient friendliness (acceptability) and clinician friendliness (feasibility). (Adapted with permission from Suk M, Hanson BP, Norvell DC, Helfet DL: What makes a quality outcomes instrument? in: AO Handbook of Musculoskeletal Outcomes Measures and Instruments. Davos, Switzerland: Thieme, 2005.)

In addition to patient burden, it is important to consider the impact of collecting and processing the information on clinical researchers and staff.5–7 This is also known as feasibility. Several questions should be considered when determining whether the outcomes instrument exhibits clinician friendliness and feasibility: Is this outcomes instrument completed by the physician/staff or by the patient? What is the staff effort and cost in administering, recording, and analyzing the data? How much time is required to train the staff in administering the instrument?

Types of Outcomes Measure

Outcomes measures are either clinician-based or patient-reported. Clinician-based outcomes are usually physiologic and can be measured directly by the clinician. Examples of physiologic outcomes include muscle strength, joint range of motion, gait abnormalities, limb length, and bony alignment. These physiologic measures, which are considered “hard” or “objective,” often serve as surrogate outcomes (ie, substitutes for an end point that measures directly how a patient feels, functions, and survives) that are used to infer functional ability. With patient-reported outcomes, the concern is with the patient's perception of her or his functional ability, symptoms, and quality of life. These outcomes have been considered “soft” or “subjective,” and there has been reluctance to trust these types of outcomes measures.

Clinician-based Outcomes

Some physicians and health care administrators believe that clinician-based outcomes are inherently objective. After all, a clinician can directly measure motion, strength, and alignment. However, the key determinant of whether an outcome is objective is not necessarily who makes the assessment; rather, objectivity depends on the reliability or reproducibility of a finding.8,9 Although some measures are more reliable than others, substantial variability remains in many clinician-based outcomes measures. For example, interobserver agreement in determining motion of the spine10,11 and extremities12–15 is often poor. Muscle strength measurements can be difficult to reproduce, both when strength is tested manually16 and when a dynamometer is used.17–19

Variability in the interpretation of simple imaging studies has also been documented.20,21 In contrast, the reproducibility of many patient-reported outcomes is quite high.1,2 By the standard of reliability, then, patient-reported outcomes can be at least as reliable as, if not more reliable than, clinician-based outcomes measures, making them as objective as, or more objective than, clinician-based measures.
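Interobserver agreement on categorical judgments, such as reading a radiograph as normal or abnormal, is often summarized with Cohen's kappa, which corrects raw percentage agreement for the agreement expected by chance. The article does not state which statistic the cited studies used, so this is a generic sketch; the function name and the two readers' ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters who rated the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Proportion of cases on which the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category for every case
    return (observed - expected) / (1 - expected)


# Hypothetical readings of the same 10 radiographs by two observers.
reader_1 = ["abnormal"] * 5 + ["normal"] * 5
reader_2 = ["abnormal"] * 3 + ["normal"] * 5 + ["abnormal"] * 2

print(cohens_kappa(reader_1, reader_2))  # about 0.2, despite 60% raw agreement
```

The example shows why raw agreement can flatter a measure: the two readers agree on 6 of 10 films, but half of that agreement would be expected by chance alone.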

Patient-reported Outcomes in Orthopaedic Surgery and Research

To properly evaluate medical and surgical interventions and identify whether one treatment is better than another, traditional clinician-based outcomes measures must be complemented by measures that focus on the patient's concerns.22 Interest in patient-reported outcomes has been fueled by the increased socioeconomic and clinical importance of chronic conditions, for which the objectives of treatment are to restore or improve function while preventing functional decline.23

The US Food and Drug Administration recently released a Draft Guidance encouraging the use of patient-reported outcomes in clinical trials for new medical products because (1) some treatment effects are known only to the patient; (2) there is a desire to know the patient perspective about the effectiveness of a treatment; or (3) systematic assessment of the patient's perspective may provide valuable information that can be lost when that perspective is filtered through a clinician's evaluation of the patient's response to clinical interview questions.24

Patient-reported outcomes are classified as either general (generic) or disease-specific measures of health-related quality of life. General measures are designed to be used across different diseases and across different demographic and cultural subgroups.25 They are usually multidimensional and are designed to give a comprehensive and general overview of health-related quality of life. The most well-known general measure of health-related quality of life is the Medical Outcomes Study Short Form-36 (SF-36).26 General measures of health-related quality of life permit comparisons across populations with different health conditions27 and are more likely to reveal the unexpected effects of an intervention.25,27 An important limitation is that they tend to be less responsive than specific measures of health-related quality of life to changes in health status,28,29 making them less likely to uncover the effects of a specific intervention.

Measures of health-related quality of life specific to musculoskeletal disease are focused on aspects of health that are specific to an injury (eg, fracture), disease (eg, osteoarthritis), anatomic area (eg, knee), or population of interest (eg, athletes). Several advantages have been reported for disease-specific measures,4 which were developed to have relevance when used with a specific disease or region of the body. This specificity has been shown to contribute to a more responsive measure.28,29 In general, disease-specific measures are more useful for detecting small changes as well as important changes that occur over time in the particular disease studied.25,30 For example, a hip-specific instrument designed for patients with osteoarthritis should be particularly responsive to important changes in patients treated with total hip arthroplasty because it is focused on only the most relevant items. Assuming that the instrument has clear relevance to the health problem of the patient,4 it might also be argued that such an instrument gains greater patient acceptance, which in turn leads to higher response and data collection rates.

If resources allow, both a generic and a disease-specific patient-reported outcomes measure should be administered to ensure adequate assessment of a patient's entire health-related quality of life.31,32

Evaluating Existing Outcomes Measures

Outcomes measures are designed to provide clinicians and researchers in the field of orthopaedics the data necessary for self-improvement and for critical evaluation of their own patient outcomes. For the practicing clinician, evaluating the results of one's own medical or surgical intervention and assessing the results reported by others in the literature is often as important as understanding the intervention itself. For epidemiologists and clinical researchers, outcomes measurements are essential to advancing education and developing new techniques. Several questions should be considered when evaluating, selecting, and developing the most appropriate outcome instruments for a given situation (Table 1).

Table 1:
Questions to Ask When Evaluating an Outcomes Measure or Instrument

The surgeon who is considering whether it is more effective to treat an elderly patient with a displaced distal radius fracture with surgical fixation or cast immobilization may find a recent article in which the authors conclude that cast immobilization is better than surgical fixation.

Critical evaluation of the authors' methods may reveal that the authors performed a randomized clinical trial with a 97% follow-up rate and a sample size >500 patients. Would that be enough to convince the surgeon to treat all of his or her patients nonsurgically? If the authors highlighted the higher complication rate in the surgically treated group as an important contributor to their conclusion, the surgeon would need to determine whether complication rates are necessarily the best outcome on which to base superiority claims. What else might be important? Suppose the surgical group had a higher union rate and better motion than did the nonsurgical group at 6-month follow-up. Might that change the mind of the surgeon reading the article? Is it possible for a patient with a fibrous nonunion to do just as well as one who goes on to normal bony union?

Alternatively, the surgeon should consider the utility of a patient-reported outcome. In this example, elderly patients with a distal radius fracture had similar Patient-Rated Wrist Evaluation (PRWE) scores in both the surgical and nonsurgical groups at 1-year follow-up. When evaluating the PRWE, the surgeon notes that it measures pain and function on a scale of 0 to 100 points; that it has been found to be valid, reliable, and responsive in several distal radius populations; and that it is moderately patient-friendly (15 questions) and strongly clinician-friendly because it is a patient-reported outcome (ie, completed solely by the patient). The PRWE therefore passes the surgeon's test for a quality outcomes measure in this patient population.
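Part of what makes the PRWE clinician-friendly is that its total is simple to compute. The sketch below assumes the commonly published scoring rule, which this article does not spell out: five pain items and ten function items, each rated 0 to 10, with the pain items summed (0 to 50) and the function items summed and halved (0 to 50), for a total of 0 to 100 in which lower is better. The function name is hypothetical, the rule for handling missing items is deliberately not implemented, and the official PRWE user manual should be checked before relying on it.

```python
def prwe_score(pain_items, function_items):
    """Total PRWE score (0-100, lower = better) under the assumed scoring rule.

    pain_items:     5 ratings, each 0-10, summed        -> pain subscale 0-50
    function_items: 10 ratings, each 0-10, summed / 2   -> function subscale 0-50
    Incomplete questionnaires are rejected rather than imputed.
    """
    if len(pain_items) != 5 or len(function_items) != 10:
        raise ValueError("expected 5 pain items and 10 function items")
    if any(not 0 <= r <= 10 for r in [*pain_items, *function_items]):
        raise ValueError("each item must be rated 0-10")
    pain = sum(pain_items)
    function = sum(function_items) / 2
    return pain + function


# Hypothetical responses from one patient.
print(prwe_score([3, 2, 4, 1, 2], [2] * 10))  # 22.0
```

Because the questionnaire is self-completed and scored by simple arithmetic, staff effort is limited to collecting the form and entering 15 numbers.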

As illustrated by this example, different conclusions regarding treatment superiority may be drawn depending on the outcomes measures being interpreted (ie, physician- versus patient-based). Nonsurgical management may be favored when complication rates alone are considered. Surgical management may be favored when the rate of nonunion alone is considered (ie, a clinician-based outcome), whereas neither approach may be clearly favored when a “validated” patient-reported outcome (ie, the PRWE) yields similar results in both groups.

Overview of Validated Orthopaedic Outcomes Measures

Given the plethora of outcomes measures reported in the literature, it is impossible to list all of the orthopaedic outcomes measures in existence.1 However, for each anatomic area, we have identified five or six of the most widely validated patient-reported outcomes measures in the general orthopaedic literature. These measures are commonly used and, in most cases, have been subjected to important psychometric testing, including validity, reliability, and responsiveness testing in several populations. Because of the large number of outcomes measures reported for the spine, measures related to the treatment of spinal disease or injury were not reviewed. Measurement instruments are listed in Tables 2 through 9 and are separated by anatomic area. They are categorized by type (ie, patient-reported or clinician-based outcome), content, scale, interpretation, population tested, and method of validation (available at

Future Directions

Outcomes research in orthopaedic medicine measures what the patient thinks of the results of the medical care that he or she has received. Traditional physician-based research is focused on the evaluation of range of motion, strength, radiographs, and the conclusion of the physician as to the treatment result. Outcomes research is meant not to replace the usual methods of research in the evaluation and treatment of musculoskeletal disease but to add another dimension of evaluation.33

With the emergence of patient-reported outcomes and a firm understanding of how to select an appropriate measure, clinicians and researchers are getting closer to achieving excellence in orthopaedic outcomes research. However, even the best existing patient-reported outcomes may pose challenges in the future. Assessing success or failure based on a score alone can be misleading because the expectations of patients and surgeons vary. For instance, an elderly patient treated surgically for a distal radius fracture who scores 30 on the PRWE (the lower the score, the higher the function) may have a better outcome than a young patient who scores 20. Even though the scores suggest the contrary, the patient with a PRWE score of 20 may have the poorer outcome if he or she considers the surgery a relative failure, even if the surgeon considers it a relative success. Conversely, the elderly patient's score of 30 may represent a good outcome if it meets his or her expectations. Each patient has different expectations, and these expectations influence perceived outcomes.

To adequately address this conundrum, the surgeon may want to consider what the outcome score of a specific measure indicates. The score may provide some sense of a patient's final outcome relative to those of other patients who have been treated; however, each patient presents with different baseline characteristics. Developing methods to account for these differences, rather than relying on the mean outcomes of validated instruments, is an interesting adjunct to current outcomes research in orthopaedics. Developing a method of bridging the gap between existing validated outcomes measures and patient expectations may enable surgeons to select the appropriate measure and apply some form of adjustment to the final outcome to account for patient and surgeon expectations. We are currently conducting a clinical trial from which we hope to develop a prediction tool to determine whether the 12-month outcome was a success or failure based on baseline patient expectations. Such a tool may help move orthopaedic outcomes research to the next level of excellence.

Summary

Evaluating and selecting an outcomes measure becomes increasingly challenging as the number of measures available increases. Selection of the best measure is dependent on the purpose of its use and the population for which it is intended. The surgeon should first consider the content of the measure. Does the scale address the appropriate questions for the study population? Was anything left out? Is the interpretation of the overall score satisfying? Then the surgeon should ask whether the measure has been validated in a population in which he or she is interested. If so, how did it perform? Is it reliable, and is it able to demonstrate important clinical change after an intervention has been introduced? Will the surgeon's patients and staff find the measure acceptable and feasible?

All of these questions must be considered when comparing measurement instruments. One measure may be better in certain situations than in others. Even with appropriate selection, the outcomes measure may still be unsatisfactory. Patient expectations are an important aspect of treatment failure and success. Understanding the interaction between patient and surgeon expectations and the final patient outcome may be the key to the future of orthopaedic outcomes research and better patient care.

References

Citation numbers printed in bold type indicate references published within the past 5 years.

1. Suk M, Hanson B, Norvell D, Helfet D: AO Handbook of Musculoskeletal Outcomes Measures and Instruments. Davos, Switzerland: Thieme Publishing, 2005.
2. Chapman J, Hanson B, Dettori J, Norvell D: The Spine Manual of Outcomes Measures and Instruments. Davos, Switzerland: Thieme Publishing, 2007.
3. Bourne RB, Maloney WJ, Wright JG: An AOA critical issue: The outcome of the outcomes movement. J Bone Joint Surg Am 2004;86:633-640.
4. Fitzpatrick R, Davey C, Buxton MJ, Jones DR: Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess 1998;2:1-74.
5. Aaronson NK: Assessing the quality of life of patients in cancer clinical trials: Common problems and common sense solutions. Eur J Cancer 1992;28A:1304-1307.
6. Lansky D, Butler JB, Waller FT: Using health status measures in the hospital setting: From acute care to ‘outcomes management’. Med Care 1992;30(5 suppl):MS57-MS73.
7. Erickson P, Taeuber RC, Scott J: Operational aspects of quality-of-life assessment: Choosing the right instrument. Pharmacoeconomics 1995;7:39-48.
8. Feinstein AR: Clinical biostatistics: XLI. Hard science, soft data, and the challenges of choosing clinical variables in research. Clin Pharmacol Ther 1977;22:485-498.
9. Deyo RA: Using outcomes to improve quality of research and quality of care. J Am Board Fam Pract 1998;11:465-473.
10. Nelson MA, Allen P, Clamp SE, de Dombal FT: Reliability and reproducibility of clinical findings in low-back pain. Spine 1979;4:97-101.
11. Miller SA, Mayer T, Cox R, Gatchel RJ: Reliability problems associated with the modified Schöber technique for true lumbar flexion measurement. Spine 1992;17:345-348.
12. Edwards TB, Bostick RD, Greene CC, Baratta RV, Drez D: Interobserver and intraobserver reliability of the measurement of shoulder internal rotation by vertebral level. J Shoulder Elbow Surg 2002;11:40-42.
13. Hoving JL, Buchbinder R, Green S, et al: How reliably do rheumatologists measure shoulder movement? Ann Rheum Dis 2002;61:612-616.
14. Youdas JW, Bogard CL, Suman VJ: Reliability of goniometric measurements and visual estimates of ankle joint active range of motion obtained in a clinical setting. Arch Phys Med Rehabil 1993;74:1113-1118.
15. Bovens AM, van Baak MA, Vrencken JG, Wijnen JA, Verstappen FT: Variability and reliability of joint measurements. Am J Sports Med 1990;18:58-63.
16. Hayes K, Walton JR, Szomor ZL, Murrell GA: Reliability of 3 methods for assessing shoulder strength. J Shoulder Elbow Surg 2002;11:33-39.
17. Möller M, Lind K, Styf J, Karlsson J: The reliability of isokinetic testing of the ankle joint and a heel-raise test for endurance. Knee Surg Sports Traumatol Arthrosc 2005;13:60-71.
18. Moreland J, Finch E, Stratford P, Balsor B, Gill C: Interrater reliability of six tests of trunk muscle function and endurance. J Orthop Sports Phys Ther 1997;26:200-208.
19. Agre JC, Magness JL, Hull SZ, et al: Strength testing with a portable dynamometer: Reliability for upper and lower extremities. Arch Phys Med Rehabil 1987;68:454-458.
20. Koran LM: The reliability of clinical methods, data and judgments (second of two parts). N Engl J Med 1975;293:695-701.
21. Deyo RA, McNiesh LM, Cone RO III: Observer variability in the interpretation of lumbar spine radiographs. Arthritis Rheum 1985;28:1066-1070.
22. Slevin ML, Plant H, Lynch D, Drinkwater J, Gregory WM: Who should measure quality of life, the doctor or the patient? Br J Cancer 1988;57:109-112.
23. Byrne M: Cancer chemotherapy and quality of life. BMJ 1992;304:1523-1524.
24. US Food and Drug Administration: Guidance for Industry: Patient-Reported Outcome Measures. Use in Medical Product Development to Support Labeling Claims [Draft Guidance]. Rockville, MD: U.S. Department of Health and Human Services, 2006.
25. McSweeny AJ, Creer TL: Health-related quality-of-life assessment in medical care. Dis Mon 1995;41:1-71.
26. Ware JE Jr, Sherbourne CD: The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection. Med Care 1992;30:473-483.
27. Kessler RC, Mroczek DK: Measuring the effects of medical interventions. Med Care 1995;33(4 suppl):AS109-AS119.
28. Guyatt GH, Feeny DH, Patrick DL: Measuring health-related quality of life. Ann Intern Med 1993;118:622-629.
29. Wright JG, Young NL: A comparison of different indices of responsiveness. J Clin Epidemiol 1997;50:239-246.
30. Patrick DL, Deyo RA: Generic and disease-specific measures in assessing health status and quality of life. Med Care 1989;27(3 suppl):S217-S232.
31. Fletcher A, Gore S, Jones D, Fitzpatrick R, Spiegelhalter D, Cox D: Quality of life measures in health care: II. Design, analysis, and interpretation. BMJ 1992;305:1145-1148.
32. Guyatt G, Feeny D, Patrick D: Issues in quality-of-life measurement in clinical trials. Control Clin Trials 1991;12(4 suppl):81S-90S.
33. Simmons BP, Swiontkowski MF, Evans RW, Amadio PC, Cats-Baril W: Outcomes assessment in the information age: Available instruments, data collection, and utilization of data. Instr Course Lect 1999;48:667-685.
© 2008 by American Academy of Orthopaedic Surgeons