50 Years Ago in CORR®: Rating Scale for Hip Disabilities Carroll B. Larson, MD, CORR 1963;31:85-93

Brand, Richard A., MD1

Clinical Orthopaedics and Related Research: February 2013 - Volume 471 - Issue 2 - p 697–699
doi: 10.1007/s11999-012-2710-z

1 Clinical Orthopaedics and Related Research, Philadelphia, PA, USA

Ascertaining a patient’s level of disability or response to treatment has long been important to doctors. Until the 20th century, however, such assessments were typically based on reports of individual patients or small case series, without formal collection of information across populations. E. Amory Codman was perhaps one of the earliest to recognize the need for systematic collection and storage of data on patients [3, 4, 6]. In his classic articles on the development of a registry for bone sarcomas [4, 5], he stressed the need for routine documentation of specific pieces of information. That information, useful as it was for research purposes, was not easily quantified so as to indicate the disability of an individual patient or of a population. Robert Merle d’Aubigné and colleagues [7, 8, 10, 11], in a series of articles beginning in 1949, proposed three slightly different methods to quantify or categorize the disabilities of patients with hip disorders [2]. These scales were likely the first attempts to readily summarize the levels of function of patients with musculoskeletal disabilities, and versions of them continue to be used today. Carroll Larson published what became known as the “Iowa hip score” in 1963 (Charts 1 and 2). In the scores proposed by Merle d’Aubigné, each scoring level required the patient to have a certain level of pain, mobility, and function. While many patients would fit within that scheme, others would not because, for example, their pain was severe while their mobility and function remained high rather than low. Further, function was gauged only by the ability to walk. The method proposed by Larson had the advantage of scoring each parameter separately rather than requiring a given score to reflect a composite of parameters. Further, it included other aspects of function (such as bathing without help) and deformity (Charts 1 and 2).
A subsequent hip scoring system based on such individual aspects was published by William Harris in 1969 [9]. That score differed from Larson’s in its details but not in the concept of scoring each item separately. The scoring systems proposed by Merle d’Aubigné, Larson, and Harris weighted the parameters arbitrarily, without any validation of those weightings. All three scoring systems continue to be reported today (sometimes in modified forms), although all three are physician-generated. Such systems are not only prone to bias on the part of the rating physician but also may not capture the things most important to patients.



In the past few decades, therefore, patient-generated ratings have been increasingly introduced [1, 13], and these are now the standard for reporting the levels of patient disability. Since the three hip scoring systems were first described, the field of measuring outcomes has become far more sophisticated and complex, demanding the expertise of a wide array of scientists. Early investigators, such as Merle d’Aubigné, Larson, and Harris, evidently did not consider validating their scores or determining their reliability. Multiple forms of validation characterize any rigorous assessment, including: face validity (the instrument measures what one wants or “makes sense”), content validity (the instrument captures all aspects of what one wants to measure), construct validity (the instrument correlates with what one wants to measure or will actually measure it), and criterion-related validity (the instrument predicts an outcome). (Readers should be aware that these sorts of definitions vary considerably in the field of psychometrics.) Some early outcome instruments were later validated, but typically in limited ways. Even contemporary outcome instruments are frequently inadequately validated. Patient satisfaction is arguably among the most important of outcomes, yet finding valid and reliable ways of assessing satisfaction has eluded us. In an analysis of 195 studies from 139 journals, Sitzia [12] found only 46% of studies reported any validity or reliability data, and only 6% reported content validity and criterion or construct validity and reliability. He concluded, “With few exceptions, the study instruments in this sample demonstrated little evidence of reliability or validity. Moreover, study authors exhibited a poor understanding of the importance of these properties in the assessment of satisfaction.” Despite the progress in developing better outcome instruments, we have a long way to go.

Nonetheless, patient rating systems, such as that proposed by Larson, advanced the manner in which surgical treatments of musculoskeletal disorders could be assessed and added some objectivity to what had previously been subjective interpretations.

