50 Years Ago in CORR
Ascertaining a patient's level of disability, or his or her response to treatment, has long been important to doctors. Until the 20th century, however, such assessments were typically based on reports of individual patients or small case series rather than on formal collection of information across populations. E. Amory Codman was perhaps one of the earliest to recognize the need for systematic collection and storage of patient data [3, 4, 6]. In his classic articles on the development of a registry for bone sarcomas [4, 5], he stressed the need for routine documentation of specific pieces of information. That information, useful as it was for research, could not easily be quantified to indicate the disability of an individual patient or of a population. Beginning in 1949, Robert Merle d’Aubigné and colleagues [7, 8, 10, 11] proposed, in a series of articles, three slightly different methods to quantify or categorize the disabilities of patients with hip disorders. These scales were likely the first attempts to summarize readily the level of function of patients with musculoskeletal disabilities, and modified versions of them continue to be used today. Carroll Larson published what became known as the “Iowa hip score” in 1963 (Charts 1 and 2). In the scores proposed by Merle d’Aubigné, each scoring level required the patient to have a certain level of pain, mobility, and function. While many patients fit within that scheme, others did not: a patient might, for example, have severe pain yet good mobility and function. Further, function was gauged only by the ability to walk. Larson's method had the advantage of scoring each parameter separately rather than requiring a given score to reflect a composite of parameters. It also included other aspects of function (such as bathing without help) and deformity (Charts 1 and 2).
A subsequent hip scoring system based on such individual items was published by William Harris in 1969 [9]. It differed in detail from Larson's score but not in the concept of scoring each item separately. The scoring systems proposed by Merle d’Aubigné, Larson, and Harris all weighted their parameters arbitrarily, without any validation of those weightings. All three systems continue to be reported today (sometimes in modified forms), although all three are physician-generated. Such systems are not only prone to bias on the part of the rating physician but may also fail to capture the things most important to patients.
In the past few decades, therefore, patient-generated ratings have been increasingly introduced [1, 13], and these are now the standard for reporting levels of patient disability. Since the three hip scoring systems were first described, the field of outcome measurement has become far more sophisticated and complex, demanding the expertise of a wide array of scientists. Early investigators such as Merle d’Aubigné, Larson, and Harris evidently did not consider validating their scores or determining their reliability. Any rigorous assessment involves multiple forms of validation, including face validity (the instrument appears to measure what one intends, or “makes sense”), content validity (the instrument captures all aspects of what one wants to measure), construct validity (the instrument correlates with, or actually measures, what one wants to measure), and criterion-related validity (the instrument predicts an outcome). (Readers should be aware that these definitions vary considerably in the field of psychometrics.) Some early outcome instruments were later validated, but typically in limited ways, and even contemporary outcome instruments are frequently inadequately validated. Patient satisfaction is arguably among the most important outcomes, yet valid and reliable ways of assessing it have eluded us. In an analysis of 195 studies from 139 journals, Sitzia [12] found that only 46% of studies reported any validity or reliability data, and only 6% reported content validity, criterion or construct validity, and reliability. He concluded, “With few exceptions, the study instruments in this sample demonstrated little evidence of reliability or validity. Moreover, study authors exhibited a poor understanding of the importance of these properties in the assessment of satisfaction.” Despite the progress in developing better outcome instruments, we have a long way to go.
Nonetheless, rating systems such as the one proposed by Larson advanced the manner in which surgical treatments of musculoskeletal disorders could be assessed and added some objectivity to what had previously been purely subjective interpretation.
1. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol. 1988;15:1833-1840.
2. Biau DJ, Brand RA. Robert Merle d’Aubigné, 1900-1989. Clin Orthop Relat Res. 2009;467:2-6. doi:10.1007/s11999-008-0571-2
3. Brand RA. Ernest Amory Codman, MD, 1869-1940. Clin Orthop Relat Res. 2009;467:2763-2765. doi:10.1007/s11999-009-1047-8
4. Codman EA. The registry of bone sarcoma as an example of the end-result idea in hospital organization. Bull Am Coll Surg. 1924;8:34-38.
5. Codman EA. Registry of bone sarcoma: Part I. Twenty-five criteria for establishing the diagnosis of osteogenic sarcoma; Part II. Thirteen registered cases of “five year cures” analyzed according to these criteria. Surg Gynecol Obstet. 1926;42:381-393.
6. Codman EA. The classic: the registry of bone sarcomas as an example of the end-result idea in hospital organization. 1924. Clin Orthop Relat Res. 2009;467:2771-2782. doi:10.1007/s11999-009-1049-6
7. d’Aubigné RM, Postel M. Functional results of hip arthroplasty with acrylic prosthesis. J Bone Joint Surg Am. 1954;36:451-475.
8. d’Aubigné RM, Postel M. The classic: functional results of hip arthroplasty with acrylic prosthesis. 1954. Clin Orthop Relat Res. 2009;467:7-27. doi:10.1007/s11999-008-0572-1
9. Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am. 1969;51:737-755.
10. Merle d’Aubigné R, Cauchoix J, Ramadier JV. [Quantitative evaluation of the function of the hip: application to the study of the results of operations mobilizing the hip]. Rev Chir Orthop.
11. Merle d’Aubigné R, Cauchoix J, Ramadier, Postel. [250 arthroplasties of the hip with interposition of inert substances; first results]. Mem Acad Chir (Paris).
12. Sitzia J. How valid and reliable are patient satisfaction data? An analysis of 195 studies. Int J Qual Health Care. 1999;11:319-328. doi:10.1093/intqhc/11.4.319
13. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473-483. doi:10.1097/00005650-199206000-00002