
Development, Validity, Reliability, and Responsiveness of a New Leg Ulcer Measurement Tool

Woodbury, M. Gail PhD; Houghton, Pamela E. PhD; Campbell, Karen E. MScN; Keast, David H. MD

Advances in Skin & Wound Care: May 2004 - Volume 17 - Issue 4 - p 187-196


Epidemiologic studies performed in the United States and other countries suggest that chronic leg ulcers occur in 1% to 2% of the population. 1,2 Venous insufficiency is the most common underlying cause, occurring in 40% to 50% of patients with lower extremity ulcers. 1,2 Over the course of a year, 7% of people with diabetes will develop foot ulcers. 3 These ulcers often result in lower leg amputation, with serious functional and lifestyle repercussions. 4 Not only are leg ulcers associated with significant human consequences, but they also represent a tremendous financial burden to health care. For example, the cost per healed wound in the United Kingdom was estimated to range from £342 to £6741, depending on the treatment. These estimates were based on 12 published multinational leg ulcer studies involving 842 ulcers. 5 In Canada, the annual cost of home care expenditures for leg ulcers in 1 urban region was estimated to be $1.3 million. 6 In the United States, the cost for treatment of leg ulcers for working-age individuals with diabetes averaged $2687 per patient per year, or $4595 per ulcer per episode. 7

Numerous therapies have been developed over the last 30 to 40 years to accelerate closure of chronic wounds. Assessing the effectiveness of these therapies requires a measurement tool that will describe the current condition of the wound and detect any improvement or deterioration in wound status over time. Many of the recently developed wound assessment tools were designed specifically to evaluate pressure ulcers. Some of these tools include the Pressure Sore Status Tool (PSST), 8 the Pressure Ulcer Scale for Healing (PUSH Tool), 9 the Sussman Wound Healing Tool (SWHT), 10 the Sessing Scale, 11 the Wound Healing Scale (WHS), 12 and the Photographic Wound Assessment Tool (PWAT). 13 Many of these tools have been found to provide reproducible evaluations of pressure ulcers 14,15; however, only the Sessing Scale and PWAT have been shown to detect changes in pressure ulcer status over time. 16

Tools utilized specifically to assess pressure ulcer status do not necessarily provide accurate evaluation of other common types of chronic ulcers, such as diabetic foot ulcers and venous leg ulcers. Assessment tools that describe the severity of diabetic foot ulcers have been developed, 17,18 and a staging system for wound bed preparation applicable to venous ulcers has been proposed. 19 Although these classification systems for lower extremity ulcers may be useful in predicting patient outcomes, such as amputation 17 or complete wound closure, 19 they were neither designed nor validated to detect improvement or deterioration in wound status over time.

Wound care professionals have limited choices when evaluating leg ulcers because of a lack of validated measurement tools specifically designed for these wounds. In practice, many clinicians use pressure ulcer assessment tools designed to measure wounds that are characteristically and morphologically different from venous ulcers. Clinicians who measure wound size and use it as the indicator of change in wound status have similar issues. Therefore, wound care clinicians need evaluative tools designed specifically to assess leg ulcer status and change over time so that they can accurately evaluate the effectiveness of their interventions. The goal of the present study was to develop and validate a Leg Ulcer Measurement Tool (LUMT) that would be used for this purpose.

A review of literature was undertaken to determine items/domains to be included in the LUMT. Existing scales were reviewed, and some elements were incorporated into the initial drafts. The clinical observations and experience of local wound care experts were employed. This resulted in a comprehensive list of items related to leg ulcers and much discussion of the contents to include. It was decided that the tool should include only items that had the potential to change with wound improvement.

A pen and paper instrument was developed. This instrument consists of 14 clinician-rated items and 3 patient- or proxy-rated items (Figure 1). Each item has 5 ordered response categories, coded 0 to 4, with the intervals between responses designed to be equal. The clinician-rated part of the tool can be summed to derive a total score that ranges from 0 to 56. A score of 0 indicates that the wound has closed. After the form was developed, it was pretested on 10 to 15 inpatients of the hospital and 15 to 20 outpatients at the wound management clinic to determine the feasibility of collecting all items and the amount of time needed to collect the data.
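The scoring rule described above can be sketched in a few lines. This is only an illustration, assuming the 14 clinician-rated item scores are supplied as a list of integers; the function name `lumt_total_score` is hypothetical and is not part of the published tool.

```python
def lumt_total_score(clinician_items):
    """Sum the 14 clinician-rated LUMT items, each coded 0 to 4.

    The total ranges from 0 (wound closed) to 56.
    """
    if len(clinician_items) != 14:
        raise ValueError("expected 14 clinician-rated item scores")
    for score in clinician_items:
        if score not in (0, 1, 2, 3, 4):
            raise ValueError(f"item score out of range: {score!r}")
    return sum(clinician_items)
```

The 3 patient- or proxy-rated items are deliberately left out of the sum, because the text states that only the clinician-rated part is totaled.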

Figure 1

A panel consisting of local wound care specialists (1 dermatologist, 1 podiatrist, and 9 enterostomal therapy or wound nurses with a combined 128 years of wound experience) assessed content validity. A cover letter and brief questionnaire detailed the specific information required for the study of content validity. This included questions about format, content, and the response categories for each item. The wound care specialists reviewed and discussed the proposed LUMT items and responses with the authors. Changes were made based on comments from the panel. Consensus was reached that all suitable domains were included and that the response options were appropriate—ie, represented characteristics that could change over time, were listed in correct order, and had equal intervals between responses. The responses of the content validity panel members were not quantified; their consensus ensured content validity of the instrument.


Subject recruitment and informed consent

Outpatients of the wound management clinic who had chronic leg ulcers were asked to participate in the evaluation of the LUMT. Persons who were ambulatory, physically able to participate in a half day of testing, and whose leg ulcers included a variety of sizes and etiologies were sought to evaluate the entire LUMT scale. The validation study was fully explained to subjects, who also received a letter giving them information about the study, and informed consent was obtained.

Study design

An initial evaluation day was held so that all subjects could be evaluated and reevaluated on a single day to determine reliability and concurrent criterion validity. The research nurse removed the subjects’ dressings and covered each wound with saline-moistened gauze. Subjects were seated with their legs extended on examination beds while evaluators circulated from one subject to the next according to a predetermined random order, using the LUMT to rate each wound. Measurement of wound surface area was obtained by using an acetate tracing and planimeter, a method that has been validated in all common etiologies of ulcers. 20,21 This was considered the “gold standard” for measurement of wound size.

An assessor who was blind to the LUMT assessments (ie, not involved in the LUMT reliability assessments) obtained and measured the acetate tracings using a planimeter. For concurrent criterion validity, the total LUMT score and scores assigned to the LUMT size item were compared with the surface area tracing. For intrarater reliability, all subjects were rated up to 4 times by the same evaluators according to an ordered schedule. New forms were used for each evaluation in an effort to blind the evaluators to their previous ratings for the subject. For interrater reliability, 4 wound care specialists and 2 inexperienced evaluators who were blind to the ratings of the other evaluators rated all subjects. For all assessments, precautions were taken to avoid cross-contamination of the wounds. Fresh gloves were used for each evaluation; waterless hand cleanser was used between evaluations; and no measurement instruments were transferred between patients.

The expert evaluators all had considerable clinical experience in wound care. They included a family physician, a registered nurse, a physical therapy educator, and a senior level physical therapy graduate student. Two of the specialists were members of the team that developed the LUMT; the other 2 experts were using the LUMT for the first time. The inexperienced raters were physical therapy undergraduate students in the final year of their program. They were educated about leg ulcers and the use of the LUMT in 2 1-hour sessions using photographs.

To evaluate responsiveness of the LUMT, subjects were reevaluated approximately monthly for 4 months. A registered nurse specializing in wound care assessed them using the LUMT and surface area tracings.

Sample size requirement

It was determined that 22 subjects who were measured up to 4 times each were needed to evaluate both intrarater and interrater reliability with 0.80 statistical power at the 0.05 significance level. 22 This was based on a value of 0.6 for rho (ρ) under the null hypothesis (H0), the minimum value for the reliability that the authors would consider acceptable, and a population value for ρ of 0.8. Sample size calculation was not performed for the determination of responsiveness.

Statistical analyses

Concurrent criterion validity was evaluated by correlating the results obtained at baseline for the total LUMT and the size item of the LUMT with the measurement of surface area as determined by acetate tracing. Because the total LUMT consists of many domains or items that are unrelated to wound size, the extent of the relationship between the surface area tracing measurement of size and the total LUMT score was not expected to be strong; the correlation with the LUMT size item was expected to be stronger. In the present study, a Pearson product moment correlation coefficient greater than 0.75 was considered to be sufficient to demonstrate concurrent criterion validity.
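As an illustration of the criterion described above, the following sketch computes a Pearson product moment correlation coefficient and compares it with the 0.75 threshold. The function names and the default argument are assumptions made for this example; they do not reproduce the authors' analysis software.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has no variance")
    return sxy / math.sqrt(sxx * syy)


def meets_validity_criterion(r, threshold=0.75):
    """True when r exceeds the threshold used in the present study (0.75)."""
    return r > threshold
```

In use, `x` would hold the surface area tracings and `y` either the LUMT size item scores or the total LUMT scores for the same subjects at baseline.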

The statistical methodology for the reliability evaluation has been described. 23 Reliability was expressed in terms of intraclass correlation coefficients (ICCs). The ICC is a measure of association that indicates the agreement of scores measured by different raters or more than once by the same rater. The ICC describes the variance due to differences among patients divided by the total variance. Variance estimates for the coefficients were derived from two-way analysis of variance (ANOVA). 24 For the present study, it was decided that interrater and intrarater reliability would be expressed in terms of the ICC (2,1), which is based on a random effects ANOVA model and, therefore, is generalizable to other raters and to other ratings by the same raters, respectively. Values of the ICC can vary from 0 to 1, with 1 indicating perfect reliability. Different ranges of the reliability values have been characterized with respect to the degree of agreement they imply. 25,26 Using the characterization from Fleiss, 26 values of the ICC below 0.40 represent poor agreement, values between 0.40 and 0.75 represent fair to good agreement, and values greater than 0.75 may be considered excellent agreement beyond chance. Values greater than 0.75 are considered reliable in the present study.
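The ICC (2,1) can be computed from the mean squares of a two-way ANOVA using the formula given by Shrout and Fleiss. 24 The sketch below is a minimal illustration under stated assumptions: a complete subjects-by-raters table with no missing ratings, and hypothetical function names. It is not the authors' analysis code, and the study's handling of unequal numbers of repeated measurements is not reproduced here.

```python
def icc_2_1(ratings):
    """ICC (2,1) for a complete table: one row per subject, one column per rater.

    Uses the Shrout and Fleiss mean squares: BMS (between subjects),
    JMS (between raters), and EMS (residual).
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters (or repeated ratings)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)
    jms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def fleiss_category(icc):
    """Label an ICC using the Fleiss ranges quoted in the text."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"
```

For intrarater reliability, the columns would hold repeated ratings by one rater rather than ratings by different raters. Because rater variance enters the denominator, systematic differences between raters lower the coefficient, which is what makes it generalizable beyond the raters studied.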

Not all raters could conduct 4 repeated measurements for all subjects because of patient and rater fatigue. Individual estimates of intrarater reliability were therefore based on the minimum number of repeated measurements per rater obtained for all subjects. From these individual ICC estimates, the mean ICC estimate was calculated.

Absolute reliability was expressed in terms of the standard error of measurement (SEM), which is the square root of the error variance. 27 The formulae for the calculation of SEM for interrater and intrarater reliability have been previously published. 23
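In symbols, the two quantities defined in this section can be written as follows. The subscripted variance symbols are notation assumed here for illustration; the specific interrater and intrarater formulae used in the study are those of reference 23 and are not reproduced.

```latex
\[
\mathrm{ICC} = \frac{\sigma^2_{\text{patients}}}{\sigma^2_{\text{total}}},
\qquad
\mathrm{SEM} = \sqrt{\sigma^2_{\text{error}}}
\]
```

The ICC is a unitless ratio, whereas the SEM is expressed in the units of the instrument itself (LUMT points).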

Several methods exist for reporting the responsiveness of measurement instruments. In the present study, the LUMT was considered responsive if the responsiveness coefficient 27 was greater than 0.75.



Twenty-two subjects participated in the assessments of concurrent validity and reliability. Characteristics of the subjects at baseline are illustrated in Table 1. Three subjects did not complete all subsequent monthly reevaluations; 1 died and 1 withdrew before the first reevaluation, and 1 was lost to follow-up before the second reevaluation. Therefore, 19 subjects were evaluated to determine responsiveness of the LUMT.

Table 1

Concurrent criterion validity

The correlation coefficient for the relationship between the measurement of surface area, as determined by acetate tracing, and the LUMT total score was r = 0.43, a fair to good correlation. The relationship between the surface area measurement and the LUMT size item yielded a correlation coefficient of r = 0.82, an excellent correlation.


Reliability coefficients derived from repeated measures ANOVA and the measurement error, SEM, are reported in Tables 2 and 3 for the total LUMT and for the individual items. For intrarater reliability, the same mean ICC value for the total LUMT score was obtained for the experienced raters and the inexperienced raters (0.96, considered excellent). Most items had coefficients greater than 0.75. For the experts, the only mean ICC below 0.75 was for leg edema location. For the students, the mean ICCs for edema location, edema type, periulcer skin viability, and granulation tissue amount were below 0.75. Values of the SEM were similar for the experts and students for the total LUMT score (approximately 2.0); most were less than 0.5 for the individual items.

Table 2
Table 3

The interrater reliability coefficients for the total LUMT score were 0.77 and 0.89 for the experienced and the inexperienced raters, respectively; these values are considered excellent. (These ICC [2,1] coefficients are lower than those reported previously due to use of a different coefficient model.) For both experts and students, several individual items had ICCs greater than 0.60 (substantial), others had ICCs greater than 0.41 (moderate), and 3 (periulcer skin viability, leg edema type, and location) had ICCs less than 0.40 (slight to fair).

The SEMs for the total LUMT score for expert raters and for the inexperienced raters were 4.8 and 3.3, respectively. These values, which express the measurement error in the same units as the original instrument, represent error magnitude of approximately the value of 1 LUMT item. Almost all of the individual items had SEM values less than 1.0; many were less than 0.75. Because a value of 1 on a scale from 0 to 4 is 25% of the scale, all these SEM values are acceptable.
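The comparison of an SEM with the width of its scale is simple arithmetic, sketched below. The function name is hypothetical and is introduced only for this illustration.

```python
def sem_percent_of_scale(sem, scale_min, scale_max):
    """Express an SEM as a percentage of the range of the scale it refers to."""
    scale_range = scale_max - scale_min
    if scale_range <= 0:
        raise ValueError("scale_max must be greater than scale_min")
    return 100.0 * sem / scale_range
```

For an individual item, `sem_percent_of_scale(1.0, 0, 4)` gives 25.0, matching the 25% quoted in the text. Applied to the total score range of 0 to 56, the reported SEMs of 4.8 and 3.3 correspond to roughly 8.6% and 5.9% of the scale.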


The total group of 19 subjects was divided into 3 groups based on the direction of change in surface area tracings: healers (those who had decreased surface area; n=8), nonhealers (those whose wounds remained the same or became larger; n=6), and no change (those whose wounds were almost closed and remained almost closed for the duration of the study; n=5). Based on the total LUMT score, the responsiveness coefficient was 0.84, after controlling for the baseline LUMT score and dividing the group into healers, nonhealers, and those with no change.

The LUMT scores of these groups are shown in Figure 2.

Figure 2


The LUMT is the first instrument developed specifically to evaluate leg ulcer appearance. Therefore, there are no criteria against which to evaluate the total LUMT for concurrent criterion validity. Some experts consider surface area tracing to be the best criterion available 28 because of its relationship with actual wound healing and closure of the wound. 29,30 Although surface area tracing provides an appropriate comparison and an excellent relationship with the size item of the LUMT, its moderate relationship with the total LUMT reflects the fact that the total score consists of many items besides wound size.

Although consideration was given to using the Kappa statistic to determine the reliability of the total LUMT and/or the individual items, using the ICC (for continuous data) was appropriate because the intervals between response categories for each item were designed to be equal, thereby producing an interval measurement scale. In addition, the ICC is commonly used and recognized in this area. The ICC (2,1) model was used because the raters were considered to be randomly selected from all possible raters (for interrater reliability) and the ratings of the individual raters were considered to be a random selection of all possible ratings of those raters (for intrarater reliability). As a result, the interrater and intrarater reliability ICCs are generalizable to use of the LUMT by other potential raters.

For the assessment of reliability, the value of the ICC will be higher (closer to 1) when the full spectrum of leg ulcers is represented in the sample. For this reason, the subjects in the sample of the present study were selected to represent wounds that ranged from larger, more extensive wounds (open) to fully healed wounds (closed). The wounds ranged in size from 0 to 19 cm², with a median of 1.2 cm². Assuming that subjects were stable over time and that raters could reproduce their measurements, the ICCs should be close to 1.

The ideal situation for assessing the reliability of a measurement tool is to reevaluate the leg ulcers in a short time frame, over which no change in wound appearance is likely to occur. For the present study, successive evaluations by the raters were conducted over a 4-hour period in random order, such that one rater might complete the evaluations within an hour while another rater might take 3½ hours to complete the evaluations.

In some situations, wounds did change over the evaluation period. For example, edema location and type gradually changed in patients whose compression bandages were removed for the assessments. Other ways in which wound appearance could change during the evaluation period include the following: macerated skin appeared to improve when it dried; dermatitis appearance changed; erythema could become less red if it was due to a reaction to adhesive; skin dehydration could go either way; and dry wound beds were debrided when moistened with saline. Therefore, real change occurred in some of the individual LUMT items over the successive evaluations, as reflected by the lower reliability coefficients. Although this was a disadvantage for examining reliability, it suggests that raters could detect these changes in wound appearance.

For the individual items with lower coefficients, real change occurred over approximately 4 hours, during which successive evaluations were made by 6 evaluators. Hindsight indicates that in a population with chronic leg ulcers, the appearance of both the leg and the leg ulcer can change when dressings and/or compression are removed. Therefore, the mechanics of assessing reliability are extremely complicated and require further consideration.

The total LUMT coefficients for interrater reliability are high enough to suggest that more than 1 rater can use the LUMT reliably on successive evaluations. However, having the same evaluator perform successive assessments would reduce measurement error. For intrarater reliability, the high reliability coefficients for total score and individual items suggest that the same rater can use the LUMT reliably on successive evaluations. Most individual items (for each rater or when the mean of raters is considered) had coefficients greater than 0.75.

The sample size determination for the reliability evaluation provided the number of subjects (22) required and the number of repeated measurements (4). Although it was not possible for experienced raters to conduct all the scheduled repeated measurements due to patient and rater fatigue, the inclusion of student ratings maintained the statistical power at 0.80.

Having more subjects for the responsiveness evaluation would have been preferable. Nevertheless, the present study provides preliminary information indicating that the LUMT is responsive to change. One would have greater confidence, however, in an estimate based on larger numbers of subjects.

The similar reliability results obtained by inexperienced and experienced raters indicate the adequacy of 2 1-hour sessions of training using photographs to teach students to assess wounds using the LUMT. Given the higher interrater reliability total score for students, the authors recommend training to improve consistency and interpretation of LUMT criteria and the form. For example, experienced wound care specialists not familiar with the LUMT might benefit from 2 1-hour sessions similar to those used to train student raters.

In the present study, responsiveness was presented using the responsiveness coefficient because its interpretation is similar to that of the reliability coefficients, ie, the more reliable or responsive an instrument is, the closer the value of the coefficient is to 1. The excellent responsiveness coefficient of 0.84 suggests that the LUMT would be able to detect change over time.

Subjects from the outpatient wound clinic were chosen for the heterogeneity needed for the reliability assessment, and they produced heterogeneous results. This was not the best group for assessment of responsiveness because some leg ulcers did not have the same potential to heal. Initially, it had been anticipated that all wounds would improve over time in response to the standard care provided in the wound clinic. This care is consistent with published national best practice principles. 31–34 However, healing was not the goal of care in all cases; for some subjects, the goal was to minimize infection. For this reason, subjects were organized into more homogeneous groups of healers and nonhealers to evaluate responsiveness. Responses are more likely to be homogeneous in a randomized controlled trial setting in which the sample is chosen based on potential for healing.

The LUMT is easy to use: It takes about 3 minutes to complete after training. Therefore, this tool would be appropriate for use in research and in clinical practice, and it would provide a full and complete description of wound appearance.


Content validity of the LUMT has been demonstrated by the endorsement of the wound care specialist panel. Concurrent criterion validity has been illustrated by the excellent relationship between the LUMT size item and surface area tracings, and by the moderate relationship between the LUMT total score and tracings.

The clinician-rated section of the LUMT is reliable when used by different raters or by the same rater, both experienced and inexperienced. Increased clinical training in the use of the LUMT is required to improve reliability. Responsiveness of the LUMT has been demonstrated by its ability to adequately detect change in leg ulcer appearance over time and to demonstrate a difference between healers and nonhealers.

The LUMT is appropriate for research and clinical use. Further testing of responsiveness is advised.


The authors acknowledge Beverly Phillips, Terri Labate, Wilma Sterling, Abbey Thawer, Brett Lyons, and Vik Chaabra for their assistance with the study.


1. Baker SR, Stacey MC, Jopp-McKay AG, Hoskin SE, Thompson PJ. Epidemiology of chronic venous ulcers. Br J Surg 1991;78:864–7.
2. Lorimer KR, Harrison MB, Graham ID, Friedberg E, Davies B. Assessing venous ulcer population characteristics and practices in a home care community. Ostomy Wound Manage 2003;49(5):32–43.
3. Abbott CA, Vileikyte L, Williamson S, Carrington AL, Boulton AJM. Multicentre study of the incidence of and predictive risk factors for diabetic neuropathic foot ulceration. Diabetes Care 1998;21:1071–5.
4. Armstrong DG, Lavery LA, Harkless LB. Validation of a diabetic wound classification system. The contribution of depth, infection, and ischemia to risk of amputation. Diabetes Care 1998;21:855–9.
5. Harding K, Cutting K, Price P. The cost-effectiveness of wound management protocols of care. Br J Nurs 2000;9(19Suppl):S5, S8, S10 passim.
6. Friedberg EH, Harrison MB, Graham ID. Current home care expenditures for persons with leg ulcers. J Wound Ostomy Continence Nurs 2002;29:186–92.
7. Holzer SE, Camerota A, Martens L, Cuerdon AT, Crystal-Peters J, Zagari M. Costs and duration of care for lower extremity ulcers in people with diabetes. Clin Ther 1998;20:169–81.
8. Bates-Jensen B. New pressure ulcer status tool. Decubitus 1990;3(3):14–5.
9. Thomas DR, Rodeheaver GT, Bartolucci AA et al. Pressure ulcer scale for healing: derivation and validation of the PUSH tool. Adv Wound Care 1997;10(5):96–101.
10. Sussman C, Swanson G. Utility of the Sussman wound healing tool in predicting wound healing outcomes in physical therapy. Adv Wound Care 1997;10(5):74–7.
11. Ferrell BA, Artinian BM, Sessing D. The Sessing scale for assessment of pressure ulcer healing. J Am Geriatr Soc 1995;43:37–40.
12. Krasner D. Wound healing scale, version 1.0: a proposal. Adv Wound Care 1997;10(5):82–5.
13. Houghton PE, Kincaid CB, Campbell KE, Woodbury MG, Keast DH. Photographic assessment of the appearance of chronic pressure and leg ulcers. Ostomy Wound Manage 2000;46(4):20–30.
14. Thomas DR. Existing tools: are they meeting the challenges of pressure ulcer healing? Adv Wound Care 1997;10(5):86–90.
15. Woodbury MG, Houghton PE, Campbell KE, Keast DH. Pressure ulcer assessment instruments: a critical appraisal. Ostomy Wound Manage 1999;45(5):42–55.
16. Houghton PE, Woodbury MG. Assessment of wound and appearance of chronic pressure ulcers. In: Krasner DL, Rodeheaver GT, Sibbald RG, editors. Chronic Wound Care: A Clinical Source Book for Healthcare Professionals. 3rd ed. Wayne, PA: HMP Communications; 2001.
17. Armstrong DG, Lavery LA, Harkless LB. Validation of a diabetic wound classification system: the contribution of depth, infection, and ischemia to risk of amputation. Diabetes Care 1998;21:855–9.
18. Wagner FW Jr. The dysvascular foot: a system for diagnosis and treatment. Foot Ankle 1981;2(2):64–122.
19. Falanga V. Classification for wound bed preparation and stimulation of chronic wounds. Wound Repair Regen 2000;8:347–52.
20. Bohannon RW, Pfaller BA. Documentation of wound surface area from tracings of wound perimeters: clinical report on three techniques. Phys Ther 1983;63:1622–4.
21. Griffin JW, Tolley EA, Tooms RE, Reyes RA, Clifft JK. A comparison of photographic and transparency-based methods for measuring wound surface area. Phys Ther 1993;73:117–22.
22. Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med 1987;6(4):441–8.
23. Eliasziw M, Young SL, Woodbury MG, Fryday-Field K. Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Phys Ther 1994;74(8):89–100.
24. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86(2):420–8.
25. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
26. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York: John Wiley & Sons; 1981.
27. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. Oxford: Oxford University Press; 1989.
28. Rodeheaver GT, Stotts NA. Session II: Methods for assessing change in pressure ulcer status. Adv Wound Care 8(4):28–34.
29. van Rijswijk L, Multi-center Leg Ulcer Study Group. Full-thickness leg ulcers: patient demographics and predictors of healing. J Fam Pract 1993;36:625–32.
30. Skene AI, Smith JM, Doré CJ, Charlett A, Lewis JD. Venous leg ulcers: a prognostic index to predict time to healing. BMJ 1992;305:1119–21.
31. Sibbald RG, Williamson D, Orsted HL, et al. Preparing the wound bed–debridement, bacterial balance, and moisture balance. Ostomy Wound Manage 2000;46(11):14–35.
32. Dolynchuk K, Keast D, Campbell K, et al. Best practices for the prevention and treatment of pressure ulcers. Ostomy Wound Manage 2000;46(11):38–52.
33. Kunimoto B, Cooling M, Gulliver W, Houghton P, Orsted H, Sibbald RG. Best practices for the prevention and treatment of venous leg ulcers. Ostomy Wound Manage 2001;47(2):34–50.
34. Inlow S, Orsted H, Sibbald RG. Best practices for the prevention, diagnosis and treatment of diabetic foot ulcers. Ostomy Wound Manage 2000;46(11):55–68.
© 2004 Lippincott Williams & Wilkins, Inc.