“Patient-based outcome measure” is a shorthand term referring to the array of questionnaires, interview schedules, and other related methods of assessing health, illness, and benefits of health care interventions from the patient's perspective. Patient-based outcome measures, addressing constructs such as health-related quality of life, subjective health status, and functional status, are increasingly used as primary or secondary end points in clinical trials.1 However, concern over the psychometric properties of outcome measures is not the prerogative of researchers alone; it is equally important to clinicians who use outcome measures to obtain baseline information, assess progress, and inform treatment planning.2
In rehabilitation, the measurement of “outcome” has gained increasing importance in recent years, driven primarily by the need for evidence-based practice rather than provision of services based on tradition and anecdote.
In the foreword to a textbook on outcome measures, Professor Alan Jette states that “in the face of mounting pressures to demonstrate that what they do works, researchers and clinicians within the rehabilitation professions are aggressively pursuing clinical outcomes research … outcome research findings are being used increasingly in physical rehabilitation to form evidence-based decisions regarding clinical practice.” 3
Specifically in the field of amputee and prosthetic rehabilitation, there has been a parallel increase in the use of outcome measures. However, a multitude of measures is currently used by researchers and clinicians, and there is no consensus regarding the most appropriate, or gold standard, measure or measures in this field.4–7 Further, it is important to distinguish between outcome measures that have adequate evidence and statistical estimates of validity and reliability and those that lack such evidence. An evidence-based approach to selecting outcome measures involves making judgments about the quality of the validity and reliability studies, interpreting the findings, and deciding whether they are applicable to one's specific practice.2
The goal of this project was to conduct a structured review of the international literature on lower limb prosthetic outcome measures. The specific objectives were:
- To determine which validated instruments are available in English to measure global lower limb prosthetic outcomes.
- To identify what these instruments attempt to measure.
- To examine the relative strengths and weaknesses of these measures.
- To attempt to rigorously appraise the scientific content of the reviewed publications.
- To make recommendations about the use of particular outcome measures in prosthetic rehabilitation.
Our review is based on searches of the RECAL and Medline databases for sources from 1995 to the end of April 2005. RECAL Information Services specializes in the provision of bibliographic information about the scientific, technical, and clinical literature in the field of prosthetics, orthotics, and related physical medicine and rehabilitation. The literature is indexed with the RECAL Thesaurus, which uses the specialized terminology of these fields. RECAL may be accessed through the following Web site: http://www.recal.org.uk. RECAL is the most comprehensive international database in the fields of prosthetics, orthotics, and rehabilitation technology and is based at the University of Strathclyde, United Kingdom. Individual search strategies were developed for each of these databases:
RECAL SEARCH STRATEGY
- Amput* and lower limb* and instrument* and eval*
- Amput* and outcome* and instrument*
- Amput* and outcome* and patient satisfaction
- Amput* and outcome* and [predict* or indicat*]
- Amput* and lower limb* and outcome* and rehabil*
- Amput* and health status* and [measure* or assess* or examin or eval]
- Amput* and instrument*
- Lower limb* and prosthe* and instrument*
MeSH TERMS FOR MEDLINE SEARCH STRATEGY (THROUGH PUBMED INTERFACE)
- Outcome Assessment (Health Care)
- Quality of Life
- Health Status
- Treatment Outcome
- Health Status Indicators
- Amputation Stumps
- Amputation Traumatic
- amput* [text term]
- Lower Extremity
- lower limb [text term]
- prosthe* [text term]
- Artificial Limbs
- (1 OR 2 OR 3 OR 4 OR 5 OR 6) AND (7 OR 8 OR 9 OR 10) AND (11 OR 12 OR 13) AND (14 OR 15)
The search strategies were deliberately broad, designed to capture as many potentially relevant articles as possible. The search itself was not restricted by language, but time constraints meant that only English-language articles were included in the review. The initial search identified 335 articles. An additional five articles were identified by hand-searching personal literature collections, for a total of 340 articles.
Studies were considered to be eligible for inclusion in the review if they met the following inclusion criteria:
- The outcome measure must have been used with lower limb amputees (through-ankle or above) fitted with prostheses.
- Studies must not include only or mainly congenital amputees.
- The primary purpose of the outcome measure must be the measurement of function or quality of life (social functioning).
- The outcome measure must assess outcomes that can reasonably be thought of as being directly influenced by the fitting of a prosthesis.
- The outcome measure must not be used only to predict prosthetic potential; the measure must be used after fitting.
- The outcome measure may be used in routine care or in more experimental studies. Pilot studies are included.
- The study must be described in a full paper (i.e., no posters, abstracts or newsletters).
- The study must have been published in or after 1995.
- The study must involve more than 20 patients.
- The full paper must be in English.
Inclusion criteria were based on the authors' collective experience of working with national and international groups involved in evidence-based practice and structured appraisal.
Most of these criteria are self-explanatory, but the third requires additional explanation. We were interested in locating studies whose primary aim was to describe and evaluate an instrument, method, or system that could be used to assess the effect of prosthetic fitting on physical function or mobility and quality of life. Measures that assessed function or quality of life only as a secondary aim (e.g., a pain measure that considered function) were excluded. Studies that were primarily surveys presenting a wide range of information on lower limb amputees, including some outcome data, were excluded from our main analysis but are listed in Survey Papers at the end of this article.
Abstracts for the 340 articles were scanned for relevance by at least two reviewers. Full text copies of all potentially relevant studies were obtained. These articles were then considered for inclusion in the review by at least two reviewers. Any discrepancies between reviewers arising from the inclusion assessment were resolved by discussion with a third reviewer.
A form was developed to extract information from the eligible articles regarding study characteristics and findings. The form drew on a number of existing forms, particularly the checklist for assessing preference-based measures of health developed by Brazier et al.8 and the work of Finch et al.3 It is also very similar to a checklist developed by Jerosch-Herold2 and published when our study was almost complete. The similarity of these independently developed forms suggests that the items extracted are the key items needed when critically appraising an outcome measure. The form used in the current study covered six main areas: general information, practicality, reliability, validity, scaling, and potential for bias. Reliability and validity were broken down into three and four categories, respectively, following Finch et al.3 The reliability categories, with a short definition of each, were:
- Test-retest reliability: two or more assessments over an interval during which patients are believed to be stable with regard to physical function or quality of life. Expressed as the intraclass correlation coefficient (ICC), which typically varies from 0 to 1. Values below 0.40 are considered poor, values between 0.40 and 0.75 fair to good, and values above 0.75 excellent.9
- Interrater reliability: two or more raters assess patients at the same time and compare results. Also expressed as ICC.
- Internal consistency: the extent to which all items in a scale or test measure the same concept. For example, do all parts of a functional outcome measure actually measure function, rather than mixing function with quality of life? Expressed as Cronbach's alpha, which typically varies from 0 to 1. Values below 0.60 are considered unacceptable, 0.60 to 0.70 poor, and 0.70 to 0.90 good, whereas a value over 0.90 suggests redundancy in the measure.10
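To make these reliability bands concrete, the Python sketch below computes Cronbach's alpha with the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), and maps ICC and alpha values onto the bands quoted above. The function names, the handling of values that fall exactly on a band boundary, and the example responses are illustrative assumptions, not data or methods from any of the reviewed studies. ICC values are assumed to have been estimated elsewhere.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and the same number of items per respondent")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def interpret_icc(icc):
    """Bands quoted in the text; counting exactly 0.75 as 'fair to good' is an assumption."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"


def interpret_alpha(alpha):
    """Bands quoted in the text; the treatment of exact boundaries is an assumption."""
    if alpha < 0.60:
        return "unacceptable"
    if alpha < 0.70:
        return "poor"
    if alpha <= 0.90:
        return "good"
    return "possible redundancy"


# Hypothetical responses: four respondents answering three items scored 0-3.
responses = [[2, 3, 3], [1, 1, 2], [3, 3, 3], [0, 1, 1]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3), interpret_alpha(alpha))
print(interpret_icc(0.82))
```

With these made-up responses alpha is about 0.96, which the bands would flag as possible redundancy rather than as a strength of the measure.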
The validity categories included:
- Face validity: does the measure appear to be measuring what it is intended to measure? The extent of face validity is generally simply expressed in words.
- Content validity: does the measure contain a comprehensive sample of items that completely assess the domain (e.g., physical function) of interest? Also generally expressed in words.
- Construct validity: the extent to which a measure is consistent with other ways of assessing physical function or quality of life. For example, level of amputation and age affect functional outcome, so a functional outcome measure would be expected to correlate with them. Convergent, discriminant, and known group validity are forms of construct validity, although convergent validity becomes a measure of criterion validity when the comparison is with a gold standard (see below). Construct validity is expressed as a correlation coefficient, r, which ranges from -1 to 1. In general, r ≥ 0.6 is considered good to excellent, whereas r < 0.6 is poor to moderate.
- Criterion validity: the extent to which a measure provides results that are consistent with a gold standard. There is no clear gold standard outcome measure for lower limb amputees, but we (and others) have considered some form of timed walking test as an objective approximation to a gold standard. Where authors have provided a comparison to a walking test, we have classed this as a measure of criterion validity and, often, so have the study authors. Concurrent and predictive validity, terms often seen in the outcome literature, are both measures of criterion validity. Criterion validity is also expressed as a correlation coefficient, r.
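Both construct and criterion validity are reported here as a correlation coefficient, so a minimal Python sketch of Pearson's r, together with the r ≥ 0.6 band from the text, may help. Applying the band to the absolute value of r is an assumption on our part: when a lower score is better on one measure (for example, a timed test) and a higher score is better on the other, a strong relationship appears as a negative r. The paired data are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / sqrt(sxx * syy)


def interpret_r(r):
    """Band from the text applied to |r|; using the absolute value is an assumption."""
    return "good to excellent" if abs(r) >= 0.6 else "poor to moderate"


# Hypothetical data: an outcome-measure score and 6-minute walk distance (m)
# for five patients.
scores = [20, 25, 31, 35, 40]
walk_m = [150, 210, 240, 300, 320]
r = pearson_r(scores, walk_m)
print(round(r, 2), interpret_r(r))
```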
Additional information on the reliability and validity of outcome measures can be found in the excellent article by Jerosch-Herold.2
The data extraction form was used by at least two reviewers to extract data from each included study, with any differences resolved by discussion. Extracted data were then tabulated. Most studies clearly identified a single primary outcome measure, but a small number gave two or three measures equal importance. In such cases, a separate table entry was made for each outcome measure investigated; thus, some studies appear more than once in the data tables. Where an article was unclear about a data item, the authors were not contacted; instead, “Not stated” or “Unclear” was entered in the table. Study quality was considered informally, but because of time constraints we decided against formally assessing quality and assigning a grade to each study. In addition, a simple rating is unlikely to capture the complexity of the task.
Of the 340 articles identified in our search, 60 appeared to meet our inclusion criteria, and full-text articles were obtained. Of these, 28 met all our inclusion criteria and are included in the review (see References: Included Studies). Most articles were excluded because they were clearly irrelevant, were survey papers, or involved 20 or fewer lower limb amputees. Articles that appeared relevant but were rejected after consideration of the full text are listed in Table 1 (see also References: Excluded Studies). Tables 2 to 7 give an overview of the 28 included studies. Of these, 9 were conducted in the United Kingdom, 7 in the United States, 6 in Canada, 3 in Italy, 2 in Ireland, and 1 in the Netherlands. Twenty-five primary measures were studied (Appendix 1).
Twenty-one papers were excluded as “surveys”; that is, the authors used one or more outcome measures to obtain information on their amputee population rather than to report the psychometric properties of the measures themselves (see References: Surveys). A striking feature is the number of different measures employed (n = 19), with only six (Short Form 36 or Short Form 12, Barthel Index, Prosthetic Profile of the Amputee, Sickness Impact Profile, Functional Independence Measure, and Harold Wood/Stanmore Scale) used by more than one author.
Other measures cited were Frenchay Activities Index, Russek's Code, Trinity Amputation and Prosthesis Experience Scales, International Classification of Impairment, Disability, and Handicap (ICIDH), Musculo-Skeletal Functional Assessment, Mini-Mental State Examination, Brief Symptom Inventory, Amputee Medical Rehabilitation Society (AMRS), Rivermead Mobility Index, Hospital Anxiety and Depression Scale, Nottingham ADL, Prosthesis Evaluation Questionnaire, and “demographic data.”
Two papers reviewed the national extent of use of outcome measures in Britain11 and Canada.5
DESCRIPTION OF MEASURES AND SUMMARY OF CHECKLIST RESULTS
Timed Up and Go Test.
The timed up and go (TUG) test measures mobility by assessing many of its basic components.3 It was developed for use with the elderly. The subject is observed rising from an armchair, walking 3 m, and returning to the chair, on a standard carpet. The result is recorded in seconds, and the test takes 1 to 2 minutes to carry out. The TUG has been reported to be quick, reliable (inter- and intrarater), and valid in a variety of conditions.3
The TUG has been found to have excellent test-retest and inter-rater reliability and poor to moderate construct validity with lower limb amputees, and it is recommended for routine clinical practice and research.12
Timed Walk Tests.
Timed walk tests (TWTs) measure function in terms of mobility and have been used with a variety of clinical conditions,3,13 including lower limb amputees.14,15 Walking can be timed in several different ways, testing either speed over a short distance (e.g., 10 meters,15 which can include a 180° turn16) or cardiovascular fitness/endurance, in which the subject is asked to walk as far as he or she can in a given time (e.g., 2,15 6,6 or 10 minutes3,13). Speed tests are reported in seconds or meters per second, and endurance tests as the distance walked. Standard instructions should be used, and all tests require observation of the patient by a test administrator. TWTs are known to be valid and reliable with a variety of clinical conditions and are frequently used as the gold standard comparator test.6,15,17,18
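Because TWT results come in different units (seconds for a fixed distance, meters for a fixed time), converting both to an average speed in meters per second puts them on a common scale. The sketch below assumes speed is computed over the full course distance; the function names and the example results are hypothetical.

```python
def walking_speed(distance_m, time_s):
    """Average walking speed in meters per second."""
    if distance_m <= 0 or time_s <= 0:
        raise ValueError("distance and time must be positive")
    return distance_m / time_s


def timed_test_speed(distance_m, minutes):
    """Average speed for an endurance test reported as distance walked in a fixed time."""
    return walking_speed(distance_m, minutes * 60)


# Hypothetical results: 10 m covered in 12.5 s; 300 m covered in a 6-minute test.
print(walking_speed(10, 12.5))             # speed test, m/s
print(round(timed_test_speed(300, 6), 2))  # endurance test, m/s
```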
Checklist results indicated poor to moderate construct validity, depending on the comparison measure. The TWT appears to differentiate between levels (p < 0.05) and is recommended for clinical use.15
Amputee Mobility Predictor with Prosthesis.
The Amputee Mobility Predictor with Prosthesis (AMPPro) is a predictive tool to assess the ambulatory potential of lower limb amputees, and it can also be used as an evaluative tool to measure function during or after rehabilitation.6 It consists of 21 items in six domains: sitting balance, transfers, standing balance, gait, stairs, and use of an assistive device. Most items offer three scoring choices, and the total score ranges from 0 to 42. It is completed based on observed performance and takes 10 to 15 minutes, and the score is easily totaled.
The AMPPro appears to be very reliable and to have validity ranging from poor (when compared with age and a comorbidity index) to good (when compared with the Amputee Activity Score and the 6-minute TWT), and it is recommended for clinical and research use.6
Locomotor Capabilities Index.
The Locomotor Capabilities Index (LCI) measures a lower limb amputee's locomotor capabilities with a prosthesis during and after rehabilitation.19 It consists of 14 items divided into two subscales: basic and advanced activities. Each item is scored on a four-point ordinal scale, giving a maximum total score of 42, with a maximum of 21 for each subscale. In a newer version, the LCI5, the upper ordinal level is split into two according to the use or nonuse of walking aids.20 The higher the LCI score, the greater the amputee's capabilities. It is a self-report tool, takes 5 minutes to complete, and scores are simple to total. It is available in several languages. The LCI is widely used.18,20–22
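The LCI arithmetic can be sketched in a few lines of Python. Two details are inferences rather than statements from the text: that the four-point scale is coded 0 to 3 (14 items × 3 = 42) and that each subscale has seven items (21 / 3 = 7). The assumption that basic items come first in the list, and the example responses, are hypothetical. This sketch does not cover the LCI5.

```python
LCI_ITEMS = 14
LCI_ITEM_RANGE = range(0, 4)  # four-point ordinal scale, assumed to be coded 0-3


def score_lci(items, n_basic=7):
    """Return basic, advanced, and total LCI scores.

    Assumes items are ordered with the basic activities first; the 0-3 coding
    and the 7/7 split are inferred from the maxima of 42 and 21.
    """
    if len(items) != LCI_ITEMS:
        raise ValueError(f"expected {LCI_ITEMS} item scores, got {len(items)}")
    if any(score not in LCI_ITEM_RANGE for score in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    basic = sum(items[:n_basic])
    advanced = sum(items[n_basic:])
    return {"basic": basic, "advanced": advanced, "total": basic + advanced}


# Hypothetical respondent: largely independent on basic items, less so on advanced ones.
print(score_lci([3, 3, 3, 3, 2, 3, 3, 2, 1, 1, 2, 0, 1, 2]))
# {'basic': 20, 'advanced': 9, 'total': 29}
```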
Both the LCI and LCI5 demonstrate good internal consistency, test-retest reliability and construct validity, and the LCI5 has been shown to reduce the ceiling effect associated with the LCI by 50%.20 It is recommended for clinical and research use.20–22
Russek's Code.
Russek's Code measures mobility of lower limb amputees fitted with a prosthesis.19 It is a six-point scale used to indicate an individual's functional ability with his or her prosthesis. It is a direct observation tool; that is, the grading is based on the patient's actual performance during the preceding 1 to 2 weeks, and no further analysis is required. The time taken to complete it is not reported.
Russek's Code is stated to have face and content validity and reliability, although these properties have not been investigated.23 It appears to have construct validity (known groups) but requires large samples. It is not recommended for use with amputees.23
Special Interest Group in Amputee Medicine.
The Special Interest Group in Amputee Medicine (SIGAM) measures function of lower limb amputees fitted with a functional or cosmetic prosthesis in terms of mobility. It was developed from the Harold Wood/Stanmore Mobility Grades to improve accuracy of grade allocation.24 It includes a benchmark distance of 50 meters and uses a questionnaire and algorithm. Each item in the questionnaire is a closed-ended question. The time taken to complete is not stated, and the final grading is assigned using the algorithm.
It is reliable (test-retest and inter-rater), responsive to change in mobility during the first 6 months after amputation, appears to have construct and criterion validity, and it is recommended for routine clinical practice.24
NOT GENERIC, NOT AMPUTEE SPECIFIC
Rivermead Mobility Index.
The Rivermead Mobility Index (RMI) is a measure of function in terms of the capacity to perform mobility activities; it was developed and reported to be valid for use with stroke and various neurological conditions.13 It consists of 15 items: 14 closed-ended questions and one direct observation. The items cover a range of mobility activities ranked according to their difficulty. It is a self-report instrument (self-administered or by interview), with one item requiring direct observation. The RMI takes 2 to 5 minutes to complete; data analysis involves totaling the item scores (range 0 to 15).13,19
Used with lower limb amputees, the RMI is reliable (internal consistency,25 test-retest and interrater16) and responsive to change with rehabilitation.25,26 However, ceiling effects have been reported,16 and although two groups of authors have found good to excellent construct validity,16,26 a Rasch analysis raised some concerns, and its use with amputees was not recommended.16
Barthel Index.
The Barthel Index is a well-documented and recognized measure of function.3,13,19,27 It was originally developed for use with neurological conditions but is now used with a wide range of conditions, including amputation. It evaluates 10 activities of daily living, based on the individual's actual performance during the previous 24 to 48 hours, ascertained by direct observation, interview (face-to-face or telephone, with the respondent or a significant other), or self-administration. It takes 5 minutes to carry out an interview or 20 to 60 minutes to observe. No training is needed, but guidelines are available with the original version. Items are rated on three-point ordinal scales in increments of five points. The scale is weighted: values are assigned to each item based on the time and amount of physical assistance needed. The total score varies from 0 (total dependence) to 100 (complete independence). It is available in many languages.
The Barthel Index is stated to have face and content validity and to be reliable with lower limb amputees but appears to have limited construct validity (known groups) and ceiling effects, and it is not recommended for use with amputees.23
Functional Independence Measure.
The Functional Independence Measure (FIM) is a well-documented and recognized multidimensional functional status tool, designed originally for neurological patients and now used with all rehabilitation diagnostic groups.3,13,19,27 It contains 18 items covering six areas of basic life activity, each rated for level of dependency on a seven-point ordinal scale. Total FIM scores range from 18 to 126, with higher scores indicating greater independence. The FIM must be administered by a trained health care professional and can be based on observed performance or self-report (face-to-face or telephone interview) at different points during an individual's rehabilitation. A detailed, updated guide is available; the tool takes 20 to 45 minutes to administer; and a total score is easily calculated. It is available in 11 languages. The FIM is thought to be unsuitable for use with lower limb amputees because of ceiling effects and lack of responsiveness.19
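Since ceiling effects are the main objection to the FIM with amputees, a small Python sketch can show both the total-score arithmetic (18 items rated 1 to 7, giving 18 to 126) and one simple way of checking for a ceiling: the proportion of a sample scoring the maximum. The function names and the example sample are hypothetical, and no cut-off for when a ceiling becomes a problem is implied.

```python
FIM_ITEMS = 18
FIM_LEVELS = range(1, 8)  # seven-point ordinal scale, 1 to 7
FIM_MAX_TOTAL = FIM_ITEMS * max(FIM_LEVELS)  # 126


def fim_total(items):
    """Sum 18 FIM item ratings into a total between 18 and 126."""
    if len(items) != FIM_ITEMS:
        raise ValueError(f"expected {FIM_ITEMS} item ratings, got {len(items)}")
    if any(level not in FIM_LEVELS for level in items):
        raise ValueError("each item must be rated 1 to 7")
    return sum(items)


def proportion_at_ceiling(totals, max_total=FIM_MAX_TOTAL):
    """Share of a sample scoring the maximum possible total."""
    if not totals:
        raise ValueError("no scores supplied")
    return sum(1 for t in totals if t == max_total) / len(totals)


# Hypothetical sample of four patients, two of whom score the maximum.
totals = [fim_total([7] * 18), fim_total([7] * 18), fim_total([6] * 18), fim_total([5] * 18)]
print(totals, proportion_at_ceiling(totals))
```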
FIM is reported to be reliable,28,29 but the same researchers found its construct validity to be poor to moderate.
Office of Population Censuses and Surveys Scale.
The Office of Population Censuses and Surveys Scale (OPCS) measures functional capacity based on the World Health Organization (WHO) International Classification of Impairment, Disability and Handicap and was designed for use with disabled people in the community.13 It consists of 13 disability scales containing a total of 108 items. Each disability scale has been weighted so that different disabilities can be compared. An overall “severity of disability” score or category (1–10) can be derived from the individual disability scales using a formula. It is a self-report tool (self-administered or by interview). The time taken is not clear, and data analysis requires a computer.
OPCS is stated to be valid (content and face) and reliable.29 It appears to have construct validity and to be responsive to change during rehabilitation, and it is recommended for use with inpatients.29
Amputee Activity Score.
The Amputee Activity Score (AAS) is a measure of function intended for outpatient lower limb amputees fitted with a prosthesis; its use was first reported in 1981. It has eight subscales and 20 items. Each item is either a closed-ended question or is scored on a three-, four-, or five-point ordinal scale rating frequency of participation in an activity. The AAS is a self-report tool (face-to-face interview). The time taken to complete it is unclear, and data analysis is fairly complex, requiring a guide to assist.29 The score is totaled (possible range is unclear); the greater the score, the better the level of activity. The AAS has been found to be reliable and is stated to have content and face validity.
AAS is responsive to change in mobility with rehabilitation and at follow-up, but construct validity is unclear; it is recommended for community practice.29
Functional Measure for Amputees.
The Functional Measure for Amputees (FMA) measures function of lower limb amputees in terms of prosthetic wear, use, and function with a prosthesis. It was modeled using selected elements of the Prosthetic Profile of the Amputee (PPA). It consists of 13 closed-ended questions (48 items) including the LCI. It is scored using a guide, and there is no overall score. The FMA is a self-report tool (face-to-face, telephone, and mail), the time to complete it is unclear, and data analysis requires a computer. It is stated to have face and content validity because it is modeled on the PPA. Construct validity of the LCI has been reported,19 but additional validity testing of the FMA as a whole has not been done.
Test-retest reliability is moderate to good, and 9 of the 13 questions have good to excellent reliability; it is recommended for routine clinical practice.30
Houghton Scale.
The Houghton Scale measures function of lower limb amputees fitted with a prosthesis in terms of wear and use of the prosthesis.19 It consists of four items: the amount of time the prosthesis is used, the manner in which it is used, whether an assistive device is used outside, and the individual's perception of stability while walking outside on a variety of terrain. Items are scored on a four-point ordinal scale, with the perception-of-stability item based on binary yes/no questions. The responses are easily summed to give a score from 1 to 12; the scale is designed for self-report (in clinic or by mail), and the time taken to complete it is unclear.17,18
It has content and face validity,17 poor to good construct validity depending on the comparison measure,17,18 some responsiveness to change (although item 4 on its own is not responsive),17 some floor and ceiling effects,17,18 good test-retest reliability, and adequate internal consistency.17,18 It is recommended for routine clinical use.17
Prosthetic Profile of the Amputee.
The Prosthetic Profile of the Amputee (PPA) measures function of adult unilateral lower limb amputees (prosthetic users and nonusers) in terms of predisposing, enabling, and facilitating factors related to prosthetic use after discharge from the hospital.19 It contains 38 questions (151 items) with six basic subsections. The questions are predominantly closed-ended using nominal, ordinal, and ratio scales, except for the LCI. There is no composite score for the PPA or the six basic subsections. It is a self-report tool (face-to-face, by telephone). Training is needed for interviews, and there is a guide available for that and for scoring. The time needed to complete the tool is 25 minutes, and data analysis requires a computer. It is available in several languages. The PPA has been reported to have face, content, and construct validity and excellent reliability.19
Problems have been reported with self-administration and patients' understanding of the questions.22 These authors recommended its use for research and interview administration only.
NOT GENERIC, NOT AMPUTEE SPECIFIC
Frenchay Activities Index.
The Frenchay Activities Index (FAI) measures function in terms of participation in social activity and was developed for use with patients who have had a stroke.13 It has three subscales with 15 items; a newer version, the FAI 18, adds three items. The summary score is derived by adding the individual item responses, which capture frequency of participation, giving a total score ranging from 0 to 45. It is a self-report instrument in which the respondent considers actual activity during the recent past, not potential activity. Guidelines are available, and data analysis involves simply totaling the scores. The time taken for completion is not clear. It has been found to be valid and reliable for use with patients who have had a stroke.13,19
The FAI is reported to be reliable (internal consistency and test-retest) but to have poor to moderate construct validity with the lower limb amputee.18 The FAI 18 shows no advantages over the FAI.18
QUALITY OF LIFE
Patient Generated Index.
The Patient Generated Index (PGI) is a measure of quality of life designed for use with a variety of clinical conditions and since applied to lower limb amputees fitted with a prosthesis. Respondents choose the five most important areas or activities of their life that have been affected by the amputation and its treatment. They rate how badly affected they are in each chosen area on a scale from 0 to 10 and then indicate the importance of each area by “spending” 12 points across them. The index is generated by multiplying each rating by the number of points awarded to that area, dividing each product by 12 (i.e., weighting each rating by its proportion of the points), and summing these values to give a score between 0 and 10. A higher score indicates a better quality of life. The PGI is designed for self-report or interview; the time taken to complete it is not reported; and data analysis, as described above, requires a calculator.
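The PGI weighting step is easiest to follow as a worked calculation. The Python sketch below implements the sum of rating × points / 12 over the chosen areas, with basic checks that ratings lie between 0 and 10 and that exactly 12 points were spent. The function name and the example respondent are hypothetical.

```python
def patient_generated_index(ratings, points, total_points=12):
    """Weighted PGI score: each 0-10 rating weighted by its share of the points spent."""
    if len(ratings) != len(points):
        raise ValueError("each area needs both a rating and a points allocation")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must lie between 0 and 10")
    if any(p < 0 for p in points) or sum(points) != total_points:
        raise ValueError(f"points must be non-negative and sum to {total_points}")
    return sum(r * p / total_points for r, p in zip(ratings, points))


# Hypothetical respondent: five chosen areas, their 0-10 ratings, and 12 points spent.
ratings = [4, 6, 8, 5, 10]
points = [4, 3, 2, 2, 1]
print(round(patient_generated_index(ratings, points), 2))  # 70 / 12 -> 5.83
```

Because the weights always sum to 1, the index cannot leave the 0 to 10 range of the individual ratings.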
The test-retest reliability of the PGI is moderate, and its construct validity varies from poor (relative to Physical Component Summary (PCS) of Short Form 12) to moderate (relative to Mental Component Summary (MCS) of Short Form 12).31 It is not suitable for routine clinical use because some patients encounter difficulties understanding and completing it.31
Short Form 36 and Short Form 12.
The Short Form 36 (SF-36) and its abbreviated version, the Short Form 12 (SF-12), are well-documented and recognized measures of health-related quality of life in the general population and across rehabilitation, severity-of-illness, and sociodemographic groups.3,19,27 Both questionnaires cover eight subscales, with a total of 36 items in the SF-36 and 12 in the SF-12. Some questions are closed-ended, and some are scored on an ordinal scale. Scores are summed to give an individual score for each of the eight subscales and for the two summary domains of physical and mental functioning. A higher score indicates better quality of life. Both questionnaires are intended for self-report, either by self-administration or interview. The SF-36 takes 10 to 15 minutes and the SF-12 2 minutes to complete. Data analysis takes “a few minutes with a computer.” Computer software is available, and there is a guide for scoring. Both questionnaires have been translated into more than 50 languages.
There are no reported validity or reliability data for use with amputees.4
Sickness Impact Profile.
The Sickness Impact Profile (SIP) is a well-documented and recognized measure of quality of life in functional and behavioral terms, used in all rehabilitation diagnostic groups on completion of rehabilitation.3,19,27 It has 136 items in 12 subscales forming two distinct domains: physical and psychological health. The SIP68 is a shortened version containing 68 items in six subscales. Each item is scored, and the total score is expressed as a percentage of the maximum possible score. It is possible to calculate a global SIP score, a profile of scores across the categories, and a score for each of the two general domains. The SIP is a self-report instrument (self-administered or by interview), and training is required for the interviewer. It takes 20 to 30 minutes to complete, and data analysis takes 5 to 10 minutes using a calculator. The SIP is available in Spanish. It has been reported to be valid and reliable, although not in amputees.3,19,27 The SIP is often considered the gold standard against which other scales are evaluated.3
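The “percentage of the maximum possible score” calculation, applied per category and overall, can be sketched as follows. This is a SIP-style illustration only: it assumes that each item carries a weight and that a respondent either endorses it or not, and the category names and weights in the example are invented. The published SIP item weights and the grouping of categories into the two domains are not reproduced here.

```python
def percent_of_maximum(weights, endorsed):
    """Sum of weights for endorsed items as a percentage of the maximum possible."""
    if len(weights) != len(endorsed) or not weights:
        raise ValueError("need one endorsement flag per weighted item")
    maximum = sum(weights)
    return 100 * sum(w for w, e in zip(weights, endorsed) if e) / maximum


def sip_style_profile(categories):
    """Category scores plus an overall score, each as a percentage of its maximum.

    categories maps a category name to (item_weights, endorsed_flags).
    """
    profile = {name: percent_of_maximum(w, e) for name, (w, e) in categories.items()}
    all_weights = [w for weights, _ in categories.values() for w in weights]
    all_flags = [e for _, flags in categories.values() for e in flags]
    profile["overall"] = percent_of_maximum(all_weights, all_flags)
    return profile


# Hypothetical categories and item weights, for illustration only.
example = {
    "ambulation": ([2.0, 3.0], [True, False]),
    "sleep and rest": ([5.0], [True]),
}
print(sip_style_profile(example))
# {'ambulation': 40.0, 'sleep and rest': 100.0, 'overall': 70.0}
```

Note that the overall percentage is computed from all item weights pooled together, not by averaging the category percentages.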
No data are reported to validate its use with lower limb amputees.32
Attitude to Artificial Limb Questionnaire.
The Attitude to Artificial Limb Questionnaire (AALQ) was specifically designed to measure quality of life of lower limb amputees fitted with a prosthesis.33 It contains 10 items measuring satisfaction with prosthesis, walking ability, attitude of others to them, and restoration of body image. Each item is scored on a five-point ordinal scale measuring the respondent's agreement from “not at all” to “completely.” The total range of possible scores is from 0 to 50, where low scores indicate negative responses. The questionnaire is designed for self-completion or interview (face to face). The time taken to complete the tool is not stated, and data analysis involves simply totaling the scores for each item.
Internal consistency of the AALQ is good,33 but other psychometric properties remain unreported.
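Internal consistency is reported here for the AALQ and for several of the scales below. It is most often summarized with Cronbach's alpha, although the text does not state which statistic each study used. The sketch below uses only the Python standard library. Rows are respondents and columns are items.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need a complete table with at least two items")
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values of roughly 0.70 or above are commonly quoted as acceptable for group-level use. That threshold is a convention and not a property of the formula.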
Amputation Related Body Image Scale.
The Amputation Related Body Image Scale (ARBIS) measures quality of life of amputees in terms of body image disturbance. It is an 11-item scale reflecting concerns reported by amputees regarding their prostheses, the attitude of others to them, social activity, being an amputee, and having a residual limb. The respondents rate the frequency of each thought or behavior during the last 6 months on a five-point Likert scale, ranging from “never” to “all the time.” It is designed for self-report. The time taken to complete it is unclear, and data analysis involves simply summing the 11 scores.
Internal consistency of the ARBIS is good; it has reported face and content validity and discriminant construct validity when correlated with the Perceived Social Stigma Scale (PSSS).34
Body Image Questionnaire.
The Body Image Questionnaire (BIQ) measures quality of life in terms of body image. It was adapted from the body image questionnaire used in the treatment of eating disorders for use with lower limb amputees fitted with a prosthesis. It contains 17 items asking respondents about their feelings with regard to their body shape, the shape of their prostheses, the attitude of others to them, and the impact amputation has had on their social activity. Each item is rated on a six-point ordinal scale measuring the respondent's agreement from “never” to “always.” Total scores range from 17 to 102, and low scores indicate satisfaction with body image. The questionnaire is designed for self-completion or interview (face to face in clinic). The time needed to complete it is unclear, and data analysis involves simply totaling the scores for each item. The original questionnaire has known validity and reliability.
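Several of these questionnaires (BIQ, AALQ, ARBIS) are scored by simple totaling. A range check before summing is therefore a cheap guard against data-entry errors. The sketch below assumes BIQ items are coded 1 to 6. We infer that coding from the stated total range of 17 to 102, not from the instrument's documentation. The function names are hypothetical.

```python
from typing import Sequence


def summed_score(item_scores: Sequence[int], n_items: int, lowest: int, highest: int) -> int:
    """Total of item ratings after checking the item count and per-item range."""
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(item_scores)}")
    for score in item_scores:
        if not lowest <= score <= highest:
            raise ValueError(f"item score {score} outside {lowest}-{highest}")
    return sum(item_scores)


def biq_total(item_scores: Sequence[int]) -> int:
    # 17 items; a 17-102 total implies each item is coded 1-6 (our inference).
    # Lower totals indicate greater satisfaction with body image.
    return summed_score(item_scores, n_items=17, lowest=1, highest=6)
```

The same `summed_score` helper could serve the ARBIS (11 items), once the numeric coding of its five response options is confirmed.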
The BIQ has good internal consistency33 but requires additional testing to establish other aspects of reliability and validity.
Orthotics and Prosthetics National Outcomes Tool.
The Orthotics and Prosthetics National Outcomes Tool (OPOT) is a new tool developed for use with lower limb amputees. It measures three constructs: health-related quality of life, client satisfaction, and functional ability (prosthetist's perception).19 The quality-of-life section contains the entire SF-12. The physical functioning and bodily pain sections were expanded to 35 items (24 taken from the SF-36 and 11 new items), and all are scored as previously described for the SF-36. The satisfaction section consists of 13 items covering service delivery, prosthetic appearance and comfort, ability to walk, and quality of life, with each response scored from 0 to 100. The prosthetist's perception of functional ability covers three ambulatory abilities (stairs, walking, and use of assistive devices), with each response scored from 0 to 100. The OPOT is partly a self-report instrument, and the prosthetist's section is based on observed performance. The time taken to complete it is unclear. A guide with coding instructions is available for the quality-of-life section, and data are analyzed “in a few minutes” using a computer.
The OPOT has good to excellent construct validity and good internal consistency.35 Other aspects of validity, reliability, and responsiveness require additional investigation. It is recommended for research use only.35
Prosthesis Evaluation Questionnaire.
The Prosthesis Evaluation Questionnaire (PEQ) measures prosthetic-related quality of life.19 It consists of 82 items grouped into nine subscales. In addition, there are individual questions outside the subscales covering satisfaction, pain, transfers, prosthetic care, self-efficacy, and importance. The PEQ scales are independent of one another, so it is reasonable to use only the scales of interest. However, every question that makes up a scale must be included whenever that scale is used.36 The questions refer to the 4 weeks immediately preceding administration and are scored using a visual analogue scale (100-mm line). There is no total score. A guide gives coding instructions for all questions and groups them into subscales. Each subscale score is computed by excluding any unanswered questions and averaging the remaining answers. The PEQ is a self-report instrument. The time taken to complete it is not reported. It is also available in French. The PEQ has been published and is reported to be valid.19
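The subscale rule described above can be sketched in Python: drop unanswered questions, then average the rest. The subscale names in the usage example are placeholders. Returning `None` for a subscale with no answered items is our own choice; check the PEQ guide36 for how such cases are meant to be handled.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence


def peq_subscale_score(answers: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of answered VAS items (0-100 mm); unanswered items are None and excluded."""
    answered = [a for a in answers if a is not None]
    for a in answered:
        if not 0 <= a <= 100:
            raise ValueError(f"VAS mark {a} mm is outside the 100-mm line")
    return mean(answered) if answered else None


def peq_profile(subscales: Dict[str, List[Optional[float]]]) -> Dict[str, Optional[float]]:
    # One score per administered subscale; the PEQ has no total score.
    return {name: peq_subscale_score(items) for name, items in subscales.items()}
```

For example, `peq_profile({"subscale_a": [80, None, 60]})` gives 70 for that subscale, because the unanswered middle question is ignored.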
Additional testing has shown the PEQ to have good reliability (internal consistency and test-retest) and good to excellent construct validity.18,37,38
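Test-retest reliability is commonly expressed as an intraclass correlation coefficient (ICC). The text does not say which ICC form the PEQ studies used. The sketch below therefore shows one common choice: the two-way random-effects, absolute-agreement, single-measure form (ICC(2,1) in Shrout and Fleiss's notation), computed from the two-way ANOVA mean squares. Rows are respondents and columns are test occasions.

```python
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    n = len(data)          # subjects
    k = len(data[0])       # occasions (e.g. test and retest)
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-subjects mean square
    msc = ss_cols / (k - 1)                 # between-occasions mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Identical scores on both occasions give an ICC of 1. A uniform 2-point rise at retest (for example rows `[10, 12]`, `[20, 22]`, `[30, 32]`) lowers it to about 0.98, even though the ranking of respondents is unchanged. This is why absolute agreement, rather than consistency, suits test-retest work.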
Perceived Social Stigma Scale.
The Perceived Social Stigma Scale (PSSS) measures quality of life in terms of an amputee's perceived social stigma resulting from loss of limb(s).34 It consists of 22 items derived from a large pool of attributes that embody common negative stereotypes associated with disabled people. Each item is rated on a four-point Likert scale. The total score is obtained by reverse-scoring the antonym items and then summing all 22 scores. The PSSS is a self-report questionnaire. The time taken to complete it is unclear, and data analysis consists simply of summing the scores as described.
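The reverse-then-sum rule can be sketched as follows. The text does not say which items are the antonyms or how the four response options are coded. The sketch therefore assumes a 1 to 4 coding and takes the antonym positions as a parameter. The indices in the usage example are hypothetical.

```python
from typing import Collection, Sequence


def psss_total(item_scores: Sequence[int], antonym_items: Collection[int],
               lowest: int = 1, highest: int = 4) -> int:
    """Sum 22 ratings after reverse-scoring antonym items (0-based positions)."""
    if len(item_scores) != 22:
        raise ValueError(f"expected 22 items, got {len(item_scores)}")
    total = 0
    for position, score in enumerate(item_scores):
        if not lowest <= score <= highest:
            raise ValueError(f"item {position} score {score} out of range")
        # Reversal maps lowest <-> highest, e.g. 1 <-> 4 on a 1-4 scale.
        total += (lowest + highest - score) if position in antonym_items else score
    return total
```

For instance, suppose the first two items are antonyms and all 22 responses are 1. The total is then 4 + 4 + 20 = 28.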
The internal consistency of the PSSS is good, and it is reported to have good face and content validity.34 It demonstrates discriminant construct validity relative to the ARBIS. That is, the low correlation between scores on the two scales supports the view that they measure separate psychological constructs: body image and perceived social stigma.34
Trinity Amputation and Prosthesis Experience Scales.
The Trinity Amputation and Prosthesis Experience Scales (TAPES) measures the health-related quality of life of lower limb amputees fitted with a prosthesis.39,40 It consists of nine subscales containing 38 items. In addition, there are questions regarding phantom limb pain, residual limb pain, and general health. Each subscale item is rated on a three- or five-point ordinal scale, and higher scores indicate greater adjustment, restriction, and satisfaction. How the pain and general health questions are rated is unclear. The questionnaire is designed for self-completion (by mail) and takes 5 to 10 minutes to complete. The data analysis process is unclear.
TAPES has good internal consistency, but other aspects of reliability and validity remain largely unreported.39,40 It is reported to be suitable for routine clinical practice and research.39,40
PRESENTATION AND TERMINOLOGY
A multitude of measures are currently being used in this field,4–7 but, paradoxically, there is little advice or guidance on how to select the most appropriate measure. Measures with reported flaws (e.g., the ceiling effect reported for the Barthel Index23) are still being used.
A recent article on outcome measures2 states that in evidence-based health care, it is important to judge the quality of validity and reliability studies and then to decide whether a measure is appropriate for use in practice. However, in this review we found many of the publications difficult to read and peppered with jargon, potentially limiting the ability of full-time clinicians to make these judgments. This is particularly true of construct validity.27 In addition, we think that excessive or overly complex statistical information is off-putting for clinicians with little research experience. These factors create significant barriers to implementing research findings in routine clinical practice.
Specifically on terminology, we sought the opinion of a senior information officer, the manager of the RECAL database, on the practical difficulties encountered in scanning and extracting literature in this field.
“Outcome measures have considerable potential as a mechanism to evaluate quality, improve effectiveness and link practice to professional accountability. In order to realize this potential there is a need for greater clarity and precision in the use of outcomes terminology. Precise language allows shared meaning and increases the power of outcomes” (H. Smart, 2005, personal communication).
PROPERTIES OF OUTCOME MEASURES
Twenty-five outcome measures were listed as primary outcome measures in the 28 included studies. Many of these were also used as secondary or comparative measures. However, 12 were mentioned only once, usually in an article by the developer of the measure.
For the purposes of this review, we grouped the outcome measures into three categories: mobility, function, and quality of life.
MEASURES OF MOBILITY
Two generic measures, the timed walking test14,15 and the timed up and go test,12 are the most widely used and frequently serve as comparative measures. They have been well tested on other patient groups and appear to be valid and reliable measures that are easy to use in a clinical setting. For elderly patients, the timed up and go test may be a better measure of overall disability, including balance, because it incorporates sit-to-stand and a turn.12 It would be helpful to obtain evidence about which of these measures is most appropriate for each distinct amputee group (e.g., elderly dysvascular transfemoral amputees and younger unilateral transtibial amputees). Nevertheless, we think that some form of TWT could be recommended as a gold standard measure of amputee mobility.
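Timed walking tests are reported in one of two ways. Either the time taken to cover a set distance is recorded, or the distance covered in a set time is recorded (as in the 2-minute walk test14). Converting either form to a mean walking speed puts results on a common scale. The sketch below is illustrative, and the function names are hypothetical.

```python
def speed_over_fixed_distance(distance_m: float, elapsed_s: float) -> float:
    """Mean speed (m/s) when a set distance is walked and the time is recorded."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return distance_m / elapsed_s


def speed_over_fixed_time(covered_m: float, duration_s: float = 120.0) -> float:
    """Mean speed (m/s) for a fixed-duration test; 120 s corresponds to a 2-minute walk."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return covered_m / duration_s
```

For example, walking 10 m in 8 s and covering 150 m in 2 minutes both correspond to a mean speed of 1.25 m/s.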
In the mobility group, there are four amputee-specific measures: LCI and LCI5,20,21,23,25 the AMPPro,6 SIGAM,24 and Russek's classification.23 Russek's classification showed a lack of sensitivity to change and ceiling effects, and we think its use should be discontinued. When its use as a comparative measure is counted, the LCI was mentioned numerous times. The LCI5 adds a category of walking alone with or without aids. This appears to reduce the ceiling effect previously noted with the LCI, and the result is a simple, easy-to-use, and clinically acceptable scale.20 The AMPPro is a promising instrument that would be strengthened by additional reliability testing in larger groups. There is one nongeneric, nonamputee-specific measure in this group, the RMI. The reported results for it are somewhat conflicting, and the authors of one article16 do not recommend its use with amputees.
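Ceiling and floor effects are cited above for Russek's classification and the LCI. They are usually quantified as the percentage of respondents scoring at the maximum (or minimum) possible score. The sketch below applies a 15% flag. That threshold is often used in psychometric quality criteria and serves here only as an illustrative default.

```python
from typing import Sequence, Tuple


def floor_ceiling(scores: Sequence[float], min_score: float, max_score: float,
                  flag_pct: float = 15.0) -> Tuple[float, float, bool]:
    """Percent of respondents at the floor and at the ceiling, plus a ceiling flag."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    floor_pct = 100 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == max_score) / n
    return floor_pct, ceiling_pct, ceiling_pct > flag_pct
```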
MEASURES OF FUNCTION
The three generic measures in this group (FIM, Barthel Index, and OPCS) are rated poorly by the authors and considered unsuitable for clinical or research use.28,29 Panesar et al.29 considered the OPCS suitable for inpatients, but we think this long and rather complex scale would be difficult to use in routine practice.
In the amputation-specific category, the PPA shows good measurement properties but is long and, in part, difficult to understand.22 Its derivative, the FMA, although reliable and shorter,30 requires additional testing.
The AAS has been found to be valid, reliable, and responsive to change. However, it is somewhat difficult to interpret and is recommended only for community-based amputees.29 The Houghton Scale is recommended for clinical use,18 although it shows some floor and ceiling effects.
The FAI and FAI 18 in the nongeneric, nonamputee-specific category need additional testing before they can be recommended for use with amputees.
MEASURES OF QUALITY OF LIFE
Two generic measures, the Short Form 36 (or 12) and the SIP (or SIP68), have been widely tested on other patient groups and have generally been found to be valid and reliable.4,32 They are likely to be useful for amputees, but it would be helpful if their properties were properly tested and reported in this group. These two measures are the most frequently cited comparator measures in this review. Of the amputee-specific measures, the PEQ is known to be valid and reliable.37,38 However, it is long and its scoring is not entirely straightforward, which limits its usefulness in routine clinical practice. Initial results from the OPOT35 suggest good validity and internal consistency, although its length makes it, in our view, unappealing for clinical practice. The articles concerned with the remaining quality-of-life scales (TAPES, BIQ, AALQ, PSSS, and ARBIS) provided only limited evidence of measurement properties and were often written by the developer of the scale. However, these are new scales, and more evidence may be forthcoming.
At the outset of this review, we aimed to appraise the scientific content of the reviewed articles rigorously, but we ultimately chose not to do so. This was partly pragmatic (lack of time) and partly because of the complexity of the task. Judging the quality of an outcome measure is not simply a matter of reliability and validity. It requires qualitative judgments about the totality of evidence for a measure.2 A simple rating is unlikely to capture this complexity.
However, we have several observations to make regarding, broadly speaking, methodological issues:
- The patient population often is poorly described. For example, there is inconsistent use of percentages and actual numbers to describe the population, and there is frequent mixing of populations (e.g., uni/bilateral, vascular/traumatic).
- There can be a lack of clarity between “response” and “completion” rates, and analysis of differences between responders and nonresponders is often missing. We recommend Dillman's book Mail and Telephone Surveys: The Total Design Method, which provides excellent clarification of these matters.41
- Many studies were excluded from the review because of small numbers. Other studies included in the review used subgroup analyses during the course of the project, which often reduced the initial population substantially. It is difficult to draw firm conclusions from investigations of small groups.2
- We sometimes question the appropriateness of the comparative measure(s).
- Some authors share our observation that statements such as “an outcome measure is known to be ‘valid’ and/or ‘reliable’” cannot always be substantiated or justified.2,27
- Conclusions can overstate study findings.
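The distinction between response and completion rates comes down to the denominator. The sketch below uses one common convention. The response rate is questionnaires returned over questionnaires sent to eligible participants. The completion rate is fully completed questionnaires over those returned. Definitions vary between sources, so a study should state which it uses. The function and its parameters are hypothetical.

```python
from typing import Dict


def survey_rates(sent: int, returned: int, fully_completed: int) -> Dict[str, float]:
    """Response and completion rates (%) computed on different denominators."""
    if not 0 <= fully_completed <= returned <= sent or sent == 0:
        raise ValueError("expected 0 <= completed <= returned <= sent, with sent > 0")
    rates = {"response_rate": 100 * returned / sent}
    # Completion is judged only among returned questionnaires.
    rates["completion_rate"] = 100 * fully_completed / returned if returned else 0.0
    return rates
```

Suppose 200 questionnaires are sent, 120 are returned, and 90 are fully completed. The response rate is then 60% and the completion rate 75%, yet only 45% of the original sample supplied usable data.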
We also noted that the method of delivery is sometimes unclear and the time taken to complete a questionnaire is often missing. In addition, describing a questionnaire as “easy” or “difficult” to complete is not an objective assessment.
Large numbers of outcome measures are in use, and there is no gold standard. There is little agreement regarding which measure to use and when. However, there are some measures in this review that have proven reliability and validity and seem relatively easy to use.
Given the complexity of many of the studies, it is unlikely that clinicians will use the research findings to assist in making an informed choice of outcome measures. They are simply too difficult to read. Terminology is inconsistent and can be confusing; a lexicon of terms is urgently required.
For measuring mobility, the ease and objectivity of a timed walking test is appealing. Specifically for an elderly population, including amputees, a test that incorporates a sit-to-stand and a turn, such as the TUG, seems appropriate. Currently, we believe that the addition of the LCI5 would provide important information on community mobility.
Generic, nonamputee-specific measures of function and quality of life are inappropriate for lower limb amputees. Amputee-specific measures such as the PPA, Houghton Scale, and PEQ have been shown to be valid and reliable tools; however, as with other measures, ease of use remains an issue with some.
The authors acknowledge the generous support of the American Academy of Orthotists and Prosthetists in funding this literature search and review. The authors also thank the RECAL Information Services and secretarial staff at the University of Strathclyde for invaluable assistance in acquiring the data and preparing the text.
1. Fitzpatrick R, Davey C, Buxton MJ, et al. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess
2. Jerosch-Herold C. An evidence-based approach to choosing outcome measures: a checklist for the critical appraisal of validity, reliability and responsiveness studies. Br J Occup Ther
3. Finch E, Brooks D, Stratford P, et al. Physical Rehabilitation Outcome Measures. Baltimore: Lippincott Williams & Wilkins; 2002.
4. Pezzin LE, Dillingham TR, MacKenzie EJ. Rehabilitation and the long-term outcomes of persons with trauma-related amputations. Arch Phys Med Rehabil
5. Deathe B, Miller WC, Speechley M. The status of outcome measurement in amputee rehabilitation in Canada. Arch Phys Med Rehabil
6. Gailey RS, Roach KE, Applegate EB, et al. The Amputee Mobility Predictor: an instrument to assess determinants of the lower-limb amputee's ability to ambulate. Arch Phys Med Rehabil
7. Rommers GM, Vos LD, Groothoff JW, et al. Mobility of people with lower limb amputations: scales and questionnaires: a review. Clin Rehabil
8. Brazier J, Deverill M, Green C, et al. A review of the use of health status measures in economic evaluation. Health Technol Assess
9. Fleiss JL. Statistical Methods for Rates and Proportions (2nd ed). New York: John Wiley; 1981.
10. DeVellis RF. Scale development: theory and applications. Newbury Park, CA: Sage; 1991.
11. Dawes D. The use of outcome measures through England, Ireland & Wales. BACPAR Newsletter
12. Schoppen T, Boonstra A, Groothoff JW, et al. The timed “up and go” test: reliability and validity in persons with unilateral lower limb amputation. Arch Phys Med Rehabil
13. Wade DT. Measurement in neurological rehabilitation. Oxford: Oxford University Press; 1992.
14. Brooks D, Parsons J, Hunter JP, et al. The 2-minute walk test as a measure of functional improvement in persons with lower limb amputation. Arch Phys Med Rehabil
15. Datta D, Ariyaratnam R, Hilton S. Timed walking test – an all-embracing outcome measure for lower-limb amputees? Clin Rehabil
16. Ryall NH, Eyres SB, Neumann VC, et al. Is the Rivermead Mobility Index appropriate to measure mobility in lower limb amputees? Disabil Rehabil
17. Devlin M, Pauley T, Head K, et al. Houghton Scale of prosthetic use in people with lower-extremity amputations: reliability, validity, and responsiveness to change. Arch Phys Med Rehabil
18. Miller WC, Deathe AB, Speechley M. Lower extremity prosthetic mobility: a comparison of 3 self-report scales. Arch Phys Med Rehabil
19. Gauthier-Gagnon C, Grise MC. Tools for outcome measurement in lower limb amputation. Montreal: University of Montreal; 2001.
20. Franchignoni F, Orlandini D, Ferriero G, et al. Reliability, validity, and responsiveness of the Locomotor Capabilities Index in adults with lower-limb amputation undergoing prosthetic training. Arch Phys Med Rehabil
21. Gauthier-Gagnon C, Grise M-C, Lepage Y. The Locomotor Capabilities Index: content validity. J Rehabil Outcomes Measurement
22. Streppel KRM, De Vries J, Van Harten WH. Functional status and prosthesis use in amputees, measured with Prosthetic Profile of the Amputee (PPA) and the short version of the Sickness Impact Profile (SIP68). Int J Rehabil Res
23. Treweek SP, Condie ME. Three measures of functional outcome for lower limb amputees: a retrospective review. Prosthet Orthot Int
24. Ryall NH, Eyres SB, Neumann VC, et al. The SIGAM mobility grades: a new population-specific measure for lower limb amputees. Disabil Rehabil
25. Franchignoni F, Brunelli S, Orlandini D, et al. Is the Rivermead Mobility Index a suitable outcome measure in lower limb amputees? A psychometric validation study. J Rehabil Med
26. Traballesi M, Paolucci S, Lubich S, et al. Non-traumatic above-knee amputation in elderly patients: results of rehabilitation and prognostic factors. Europa Medicophysica
27. McDowell I, Newell C. A Guide to Rating Scales and Questionnaires. Oxford: Oxford University Press; 1996.
28. Leung EC-C, Rush PJ, Devlin M. Predicting prosthetic rehabilitation outcome in lower limb amputee patients with the functional independence measure. Arch Phys Med Rehabil
29. Panesar BS, Morrison P, Hunter J. A comparison of three measures of progress in early lower limb amputee rehabilitation. Clin Rehabil
30. Callaghan BG, Sockalingam S, Treweek SP, et al. A post-discharge functional outcome measure for lower limb amputees: test-retest reliability with transtibial amputees. Prosthet Orthot Int
31. Callaghan BG, Condie ME. A post-discharge quality of life outcome measure for lower limb amputees: test-retest reliability and construct validity. Clin Rehabil
32. MacKenzie EJ, Bosse MJ, Castillo RC, et al. Functional outcomes following trauma-related lower-extremity amputation. J Bone Joint Surg
33. Fisher K, Hanspal R. Body image and patients with amputations: does the prosthesis maintain the balance? Int J Rehabil Res
34. Rybarczyk B, Nyenhuis DL, Nicholas JJ, et al. Body image, perceived social stigma, and the prediction of psychosocial adjustment to leg amputation. Rehabil Psychol
35. Hart DL. Orthotics and Prosthetics National Office Outcomes Tool (OPOT): initial reliability and validity assessment for lower extremity prosthetics. J Prosthet Orthot
36. Guide for the Use of the Prosthesis Evaluation Questionnaire [Monograph]. Seattle: Prosthetics Research Study; 1998. 7 pp.
37. Harness N, Pinzur MS. Health related quality of life in patients with dysvascular transtibial amputation. Clin Orthop
38. Legro MW, Reiber GD, Smith DG, et al. Prosthesis evaluation questionnaire for persons with lower limb amputations: assessing prosthesis-related quality of life. Arch Phys Med Rehabil
39. Gallagher P, MacLachlan M. Development and psychometric evaluation of the Trinity Amputation and Prosthesis Experience Scales (TAPES). Rehabil Psychol
40. Gallagher P, MacLachlan M. The Trinity Amputation and Prosthesis Experience Scales and quality of life in people with lower-limb amputation. Arch Phys Med Rehabil
41. Dillman D. Mail and telephone surveys: the total design method. New York: John Wiley and Sons Inc; 1978.
REFERENCES: INCLUDED STUDIES
REFERENCES: EXCLUDED STUDIES
List of Outcome Measures
AALQ: Attitude to Artificial Limb Questionnaire
AAS: Amputee Activity Score
AMP: Amputee Mobility Predictor
ARBIS: Amputation Related Body Image Scale
BIQ: Body Image Questionnaire
FAI: Frenchay Activities Index
FIM: Functional Independence Measure
FMA: Functional Measure for Amputees
LCI: Locomotor Capabilities Index
OPCS: Office of Population Censuses and Surveys Scale
OPOT: Orthotics and Prosthetics National Outcomes Tool
PEQ: Prosthesis Evaluation Questionnaire
PGI: Patient Generated Index
PPA: Prosthetic Profile of the Amputee
PSSS: Perceived Social Stigma Scale
RMI: Rivermead Mobility Index
Russek's Code (Classification)
SIGAM: Special Interest Group in Amputee Medicine
SF-36, SF-12: Short Form 36, Short Form 12
SIP: Sickness Impact Profile
TAPES: Trinity Amputation and Prosthesis Experience Scales
TUG: Timed Up and Go Test
TWT: Timed Walk Tests