Development and Validation of a Photonumeric Scale for Evaluation of Volume Deficit of the Hand

Jones, Derek MD; Donofrio, Lisa MD; Hardas, Bhushan MD, MBA; Murphy, Diane K. MBA; Carruthers, Jean MD; Carruthers, Alastair MA, BM, BCh, FRCPC, FRCP(Lon); Sykes, Jonathan M. MD; Creutz, Lela PhD; Marx, Ann MD; Dill, Sara MD

doi: 10.1097/DSS.0000000000000850
Original Article

BACKGROUND A validated scale is needed for objective and reproducible comparisons of hand appearance before and after treatment in practice and clinical studies.

OBJECTIVE To describe the development and validation of the 5-point photonumeric Allergan Hand Volume Deficit Scale.

METHODS The scale was developed to include an assessment guide, verbal descriptors, morphed images, and real-subject images for each grade. The clinical significance of a 1-point score difference was evaluated in a review of image pairs representing varying differences in severity. Interrater and intrarater reliability was evaluated in a live-subject validation study (N = 296) completed during 2 sessions occurring 3 weeks apart.

RESULTS A score difference of ≥1 point was shown to reflect a clinically significant difference (mean [95% confidence interval] absolute score difference, 1.12 [0.99–1.26] for clinically different image pairs and 0.45 [0.33–0.57] for not clinically different pairs). Intrarater agreement between the 2 validation sessions was almost perfect (mean weighted kappa = 0.83). Interrater agreement was almost perfect during the second session (0.82, primary end point).

CONCLUSION The Allergan Hand Volume Deficit Scale is a validated and reliable scale for physician rating of hand volume deficit.

*Division of Dermatology, University of California at Los Angeles, Los Angeles, California;

Department of Dermatology, Yale University School of Medicine, New Haven, Connecticut;

Allergan plc, Irvine, California;

Departments of §Ophthalmology and Visual Sciences, and

Dermatology and Skin Science, University of British Columbia, Vancouver, British Columbia, Canada;

UC Davis Medical Group, Sacramento, California;

#Peloton Advantage, LLC, Parsippany, New Jersey

Address correspondence and reprint requests to: Derek Jones, MD, Skin Care and Laser Physicians of Beverly Hills, 9201 W. Sunset Boulevard, Suite 602, Los Angeles, CA 90069, or e-mail:

Supported by Allergan plc, Dublin, Ireland. Editorial support for this article was provided by Peloton Advantage, Parsippany, New Jersey, and was funded by Allergan plc. The authors received an honorarium for participating in scale development and validation. B. Hardas, A. Marx, and D.K. Murphy are employees of Allergan plc. L. Creutz provided medical writing services at the request of the authors, which was funded by Allergan plc. The remaining authors have indicated no significant interest with commercial supporters.

The opinions expressed in this article are those of the authors.

The authors received no honorarium or other form of financial support related to the development of this article.

With aging, atrophy of the subdermal fat and dermis of the hands can lead to the appearance of prominent bones, tendons, and veins on the dorsum of the hand.1,2 In addition, hands are exposed to high levels of UV solar radiation, which can cause irregular surface pigmentation and thinning of the dermis because of the gradual loss and disorganization of supporting collagen, elastin fibers, and connective tissue.1,3 Other environmental factors (e.g., cigarette smoking)4 and genetics5 may accelerate skin aging. As more patients undergo facial rejuvenation treatments, discrepancies in the appearance of a youthful face and aged hands may become bothersome and reveal a patient's true age.2,6 Accordingly, greater numbers of aesthetically aware patients are seeking hand rejuvenation treatments.

Several treatments are used to restore lost volume and minimize the appearance of veins and tendons in the hand, including injectable hyaluronic acid,6 poly-L-lactic acid,7 and calcium hydroxylapatite8; autologous fat transfer2; vein treatment (sclerotherapy)2; chemical peels2; and laser and light therapies.2 One photonumeric scale has been validated for photographic9 and live-subject10 assessments of the severity of hand aging. Based on photographic assessment, hyaluronic acid was shown to be effective for hand rejuvenation6; live-subject assessments demonstrated sensitivity of the scale for detecting clinically meaningful and aesthetically pleasing changes in hand appearance after treatment with a calcium hydroxylapatite–based dermal filler.11 However, although that scale includes morphed images to represent each scale grade, it does not include representative real-world images or a range of skin types.9,10

This report describes the development and validation of a new photonumeric scale designed to rate the severity of volume deficit in the hands (Allergan Hand Volume Deficit Scale) using a combination of real- and morphed-subject images over a range of Fitzpatrick skin types. The objectives of this study were to determine the clinically significant difference in scale scores and to establish the interrater and intrarater reliability of this scale for rating hand volume deficits in live subjects.



Methods

Scale Development

Figure 1 summarizes key steps in the creation and validation of the Allergan Hand Volume Deficit Scale. A 9-member team comprising 5 external members (3 board-certified dermatologists, 1 board-certified facial plastic surgeon, and 1 board-certified oculoplastic surgeon) and 4 Allergan employees (2 dermatologists, 1 plastic surgeon, and 1 clinical scientist) developed the scale from a pool of subject images captured by Canfield Scientific, Inc. (Canfield, Fairfield, NJ). A total of 396 untreated men and women aged 18 years or older with Fitzpatrick skin Types I through VI and in good general health volunteered for image capture. All subjects provided informed photo consent before image collection. Subjects were excluded if they had anything that would interfere with visual assessment of the area of interest. Two-dimensional (2D) images of right hands were obtained using a 2D custom camera system for hand imaging (Hand Device and Nikon D90 SLR). Images of the right hand were cropped from the fingertips to 2 cm proximal to the wrist to ensure that the dorsum of the hand was the primary focus and fully visible.

Figure 1

Scale descriptors were created for each of the 5 grades of the scale (Table 1). Two members of the Allergan team met with each member of the scale development team for preliminary input on each scale grade. After preliminary scale grades were established, all 9 individuals involved in scale creation had a collaborative discussion about the scale grades and descriptors. The wording for each grade was then finalized by the Allergan team.



An assessment guide with a line drawing of anatomic markers demarcating the dorsal hand area from the metacarpophalangeal joints to 1 cm distal to the wrist was created by Canfield based on detailed instructions from the Allergan team regarding anatomic markers (Figure 2). The drawing was then revised by Canfield multiple times after careful review by the Allergan team.

Figure 2

A base image to demonstrate Grade 2 hand volume deficit was selected, and this image was morphed to represent all 5 grades of the scale. A Canfield graphics technician morphed the hand area of interest in the base image to match the descriptors provided for Grades 0, 1, 3, and 4. Alignment of the morphed images with the scale descriptors was achieved through an interactive process with the Allergan team.

A forced ranking review was performed to delineate the range of severity between Grades 2 and 3 and to confirm the selection of the best representative image to be used as Grade 2 on the scale. The 5 external scale developers performed a web-based forced ranking exercise on preselected images that represented the upper and lower boundaries of Grades 2 and 3.

To determine whether there was a clinically significant difference between grades of the scale, the 5 external scale developers were asked to perform an on-line clinical significance review. Multiple image pairs were selected to represent varying degrees of differences in severity (ranging from no difference to a 4-point difference). During the session, the scale developers determined whether there was a clinically significant difference (Yes/No) between images for each pair. After the session, the individual images from all image pairs were randomly mixed in with other images to be used in the morphed image scale validation (described in the following paragraph) and assigned a score by the external scale developers so that score differences between each image in each pair could be calculated.

The morphed image scale was validated by having the 5 external scale developers use the scale to rate randomized images representing all grades of the scale during 2 web-based sessions occurring at least 3 days apart. A total of 293 images were rated (120 images in Session 1 and 173 images in Session 2). The scale had acceptable interrater and intrarater agreement (>0.5), so scale development proceeded using the morphed images.

For both the clinical significance review and the morphed image scale validation review, scale developers were provided uniform hardware by Canfield to complete the reviews. Before the reviews, the external scale developers completed web-based PowerPoint training to familiarize themselves with the hardware, the review platform, and the purpose of the clinical significance and morphed image validation reviews. The external scale developers were not allowed to discuss the review with one another, and each completed the image review independently.

After the morphed scale was created, 2 subjects' photographs representing each grade of the scale were selected to represent diversity in sex and Fitzpatrick skin type per grade. The final scale includes scale descriptors for each grade, an assessment guide, the morphed images, and the real-subject images (Figure 3).

Figure 3


Scale Validation

The interrater and intrarater reliability of the final scale was evaluated in a live-subject rating validation study. Eight physician raters experienced in using aesthetic photonumeric scales, who were not involved in scale development, participated in two 2-day live validation sessions occurring 3 weeks apart. Before the first live evaluation session, all physician raters were trained on the use of the scale in an interactive group training session using 4 example subjects. Only right hands were rated to align with the hand shown in the scale. Right hands were used because more people are right-handed, and the appearance of the dominant hand is usually worse than that of the nondominant hand. Raters were instructed to rate hands primarily based on tendons rather than veins. The only distinction determined by veins is that between Grade 0 (no visible tendons or veins) and Grade 1 (no protruding tendons; veins are visible and may be mildly protruding). Raters were also instructed that hands with any tendon showing (excluding metacarpophalangeal joints) should be rated at least Grade 2.

All subjects who qualified for the initial image capture events were invited to attend the live validation sessions. Because the subjects were participating in validation sessions for facial scales on the same day, they were instructed to arrive at the study center clean-shaven, to remove make-up and jewelry, to wear dark pants or jeans and a provided black T-shirt, to not drink alcohol excessively before the sessions, to try not to alter their usual routine (e.g., their facial care routine and normal sleep or hydration patterns) between sessions, and to not have tanning sessions or extensive sun exposure between sessions. On arrival at the study center for the first live validation session, subjects signed informed consent and were then assessed for eligibility, age, sex, race (as reported by the subject), and Fitzpatrick skin type (determined by the investigator). Subjects were excluded if they had any of the following: their photographs included in the scale; anything that would interfere with visual assessment of the hands; any treatment with toxin/fillers or surgery that would alter hand appearance within 2 weeks of the first evaluation session, or plans to have one of these procedures between the 2 evaluation sessions; or a diagnosis of pregnancy. Two-dimensional images of each subject's right hand were collected at the first live validation session using a hand device and Nikon D90 SLR camera. The first 5 subjects rated during the first validation session were considered run-in training subjects and were excluded from the analysis.

During the first and second live scale validation sessions, each physician rater evaluated all subjects on all scales (7 additional scales for other anatomic features were evaluated at the same sessions and are reported separately12–18). Raters had separate evaluation stations with an examination lamp, table, and a stool for subject seating, supplies, and the photonumeric scale mounted and displayed for use in subject evaluation. Subjects presented themselves to each rater individually and proceeded from 1 rating station to the next in the same order until evaluated by all 8 raters. Raters were instructed to not discuss ratings with subjects or other raters. The raters took at least a 10-minute break every hour and at least a 30-minute lunch break to avoid rater fatigue.



Statistical Analysis

To determine the utility of the scale grades for detecting clinically significant differences in hand volume deficit, absolute score differences for the image pairs deemed “clinically different” or “not clinically different” during scale development were summarized (mean, SD, range, 95% confidence interval [CI]). For the live scale validation study, intrarater reliability was assessed by comparing each rater's scores from rounds 1 and 2 using weighted kappa with Fleiss–Cohen weights.19 Kappa scores within the range of 0.0 to 0.20 indicate slight agreement, 0.21 to 0.40 indicate fair agreement, 0.41 to 0.60 indicate moderate agreement, 0.61 to 0.80 indicate substantial agreement, and 0.81 to 1.00 indicate almost perfect agreement.20 Interrater agreement was measured using the intraclass correlation coefficient (ICC[2,1]) with 95% CIs, calculated using the formula described by Shrout and Fleiss.21 The a priori primary end point for the interrater agreement analysis was ICC(2,1) for the second rating session. SAS version 9.3 (Cary, NC) was used for all statistical analyses.
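The intrarater statistic described above can be illustrated with a short Python sketch. It is not the SAS code used in the study; it assumes grades are coded as integers 0 to 4 and uses the quadratic (Fleiss–Cohen) disagreement weight ((i − j)/(k − 1))². The function names `weighted_kappa` and `landis_koch_label` are hypothetical, and the "poor" label for negative values follows the Landis and Koch convention rather than anything stated in this article.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, n_categories=5):
    """Cohen's weighted kappa with Fleiss-Cohen (quadratic) weights.

    ratings_a, ratings_b: integer grades 0..n_categories-1 given to the
    same subjects on two occasions (assumed coding, not from the article).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    k = n_categories
    joint = Counter(zip(ratings_a, ratings_b))  # counts of (grade_a, grade_b)
    marg_a = Counter(ratings_a)                 # marginal counts, occasion A
    marg_b = Counter(ratings_b)                 # marginal counts, occasion B
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            # Disagreement weight = 1 - Fleiss-Cohen agreement weight.
            disagreement = ((i - j) / (k - 1)) ** 2
            observed += disagreement * joint[(i, j)] / n
            expected += disagreement * (marg_a[i] / n) * (marg_b[j] / n)
    if expected == 0.0:
        raise ValueError("kappa is undefined when all ratings share one grade")
    return 1.0 - observed / expected


def landis_koch_label(value):
    """Map an agreement coefficient to the bands listed in the text."""
    if value < 0.0:
        return "poor"  # Landis-Koch convention; not used in the article
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"
```

Because the weights grow with the square of the grade difference, a 2-grade disagreement between sessions is penalized four times as heavily as a 1-grade disagreement.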


Sample Size Considerations

The sample size for the live-subject validation sessions was calculated using the method described by Bonett.22 With up to 10 raters and an ICC of 0.5, a total of 66 subjects were needed for the scale to have a 95% CI with a width of 0.2 for interrater reliability. Considering potential loss of subjects between the 2 rounds, at least 80 subjects were to be enrolled. Because 296 subjects were eligible for the hand scale validation analysis, the number of subjects evaluated using the scale was substantially larger than the preplanned sample size of 80, and the overall number of assessments for some grades of this scale was larger than that for the other grades. To minimize imbalance in the number of subjects across scale grades and to meet the sample size requirement, the mean score across the 8 raters for each subject was used to assign an overall grade for each subject. A subset of 81 subjects with minimal imbalance across the grades (∼16 subjects per scale for each of the 5 scale grades) was randomly selected from the eligible subjects using a prespecified procedure. This random selection of the subset was performed 20 times. Interrater and intrarater agreements calculated for each of the 20 subsets were combined using SAS procedure PROC MIANALYZE to obtain the overall interrater and intrarater agreements.
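The figure of 66 subjects can be checked with the closed-form approximation usually attributed to Bonett, n = 8z²(1 − ρ)²[1 + (k − 1)ρ]² / [k(k − 1)w²] + 1, where ρ is the planning ICC, k the number of raters, and w the desired CI width. The sketch below assumes this is the formula the authors applied; the function name `bonett_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def bonett_sample_size(icc, n_raters, ci_width, confidence=0.95):
    """Approximate number of subjects needed so that the CI for an ICC
    has the requested full width (assumed form of Bonett's approximation)."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    k = n_raters
    numerator = 8.0 * z ** 2 * (1.0 - icc) ** 2 * (1.0 + (k - 1) * icc) ** 2
    denominator = k * (k - 1) * ci_width ** 2
    # Round up to the next whole subject.
    return math.ceil(numerator / denominator + 1.0)


# Planning values stated in the text: ICC 0.5, up to 10 raters, width 0.2.
print(bonett_sample_size(icc=0.5, n_raters=10, ci_width=0.2))  # prints 66
```

With the planning values given in the text, the approximation evaluates to about 65.6 before rounding, which agrees with the 66 subjects reported.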



Results

Clinical Significance Determination by Scale Developers

The mean (95% CI) absolute difference in scale scores was 1.12 (0.99–1.26) for clinically different image pairs and 0.45 (0.33–0.57) for pairs deemed not clinically different (Table 2). The 95% CIs for the pairs deemed to be clinically different did not overlap with the CIs for the pairs deemed not clinically different, confirming that a 1-point difference in scores is clinically significant.
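The comparison above rests on two summary intervals that do not overlap. A minimal Python sketch of that check follows. The article does not state how the CIs were computed, so the normal-approximation interval used here is an assumption, the function names are hypothetical, and the score differences are made-up toy values rather than study data.

```python
from statistics import NormalDist, mean, stdev


def mean_ci(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal-approximation CI.
    The interval type is an assumption; the article does not specify it."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    m = mean(values)
    half_width = z * stdev(values) / n ** 0.5
    return m, m - half_width, m + half_width


def intervals_overlap(ci_a, ci_b):
    """True if two (mean, lower, upper) intervals share any point."""
    return ci_a[1] <= ci_b[2] and ci_b[1] <= ci_a[2]


# Toy absolute score differences (hypothetical, for illustration only).
different_pairs = [1, 1, 2, 1, 1, 1, 2, 1, 1, 1]
not_different_pairs = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

ci_different = mean_ci(different_pairs)
ci_not_different = mean_ci(not_different_pairs)
print(intervals_overlap(ci_different, ci_not_different))  # prints False
```

Non-overlapping intervals are a conservative sign that the two groups of pairs differ in mean score difference; a formal two-sample test would be a separate analysis.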




Live-Subject Scale Validation

Of the 296 subjects eligible for scale validation analysis, 288 subjects were selected in at least 1 of the 20 random subsets for analysis of intrarater and interrater agreement. Demographic characteristics of subjects in the final scale validation set are shown in Table 3. Most subjects were women (67%), Caucasian (79%), and had Fitzpatrick skin Type III (26%) or IV (33%). Median age was 48 years, and a broad span of age was represented (range: 18–83 years).
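Agreement statistics from the 20 random subsets were pooled with PROC MIANALYZE, which applies Rubin's combining rules to a set of point estimates and their variances. The sketch below shows those rules in Python. The function name `pool_estimates` and the example numbers are hypothetical; the per-subset estimates and standard errors are not reported in this article.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine per-subset estimates with Rubin's rules.

    estimates: one point estimate (e.g., an ICC) per subset
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching lists with at least two subsets")
    pooled = mean(estimates)       # average of the point estimates
    within = mean(variances)       # average within-subset variance
    between = variance(estimates)  # between-subset variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical values for three subsets, for illustration only.
pooled, total = pool_estimates([0.80, 0.82, 0.84], [0.001, 0.001, 0.001])
```

The pooled value is simply the mean of the subset estimates; the total variance adds the spread between subsets to the average within-subset variance, so the pooled CI widens when subsets disagree.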



Intrarater agreement between the 2 live-subject rating sessions was almost perfect (mean weighted kappa = 0.83) (Table 4). Interrater agreement was substantial (ICC = 0.78) during the first rating session and almost perfect (ICC = 0.82) during the second rating session (primary end point) (Table 4).
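For readers who want to reproduce an interrater statistic of the kind reported above, the following Python sketch computes ICC(2,1) (two-way random effects, single rater, absolute agreement) from the mean squares defined by Shrout and Fleiss. It is an illustration rather than the study's SAS code, the function name `icc_2_1` is hypothetical, and it omits the CI calculation.

```python
def icc_2_1(scores):
    """ICC(2,1) from a complete subjects-by-raters table.

    scores: list of rows, one row per subject, one column per rater.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares for a two-way layout without replication.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)           # between-subjects mean square
    jms = ss_raters / (k - 1)             # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
```

Because the rater mean square appears in the denominator, systematic differences between raters (one rater scoring consistently higher) lower ICC(2,1), which is why it is described as measuring absolute agreement rather than consistency.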





Discussion

This study demonstrated substantial to almost perfect interrater and intrarater agreement for the Allergan Hand Volume Deficit Scale, suggesting that multiple assessments for the same subject and across different raters are reliable. A 1-point difference in ratings was shown to reflect clinically significant differences, indicating that the scale has sufficient sensitivity for detecting clinically significant changes in volume deficit of the hands.

This scale assesses volume deficit on the dorsum of the hands, an area for which patients seek aesthetic treatment. The scale includes verbal descriptors for each grade and a diagram delineating the hand area of interest. These factors likely contributed to the high interrater reliability and may translate to ease of use by clinicians. The use of morphed images to represent each grade helps to focus the rater's attention on the change from 1 grade to the next, as all other features remain constant across scale grades. The inclusion of real-world images representing a diverse range of skin types across sexes and races is also important, as morphed images may not always translate clinically to the broad array of physical appearances or physical changes observed in the aging hand. The scale ratings do not take into consideration the appearance of skin discoloration because the Allergan Hand Volume Deficit Scale was designed to rate only the severity of hand volume loss, which may be treated with filler treatments. When using the scale, each hand should be rated separately, as volume loss in the left and right hands may differ in individual patients because of increased use of the dominant hand.

The Merz Hand Grading Scale (MHGS) has been validated for photographic and live assessment of hands. In a randomized blinded study, 3 physician raters used the MHGS to rate the hands of 84 live subjects.10 The study demonstrated overall intrarater reliability (weighted kappa) of 0.74 and interrater reliability (kappa) ranging from 0.59 to 0.71. In this study, the intrarater agreement was 0.83 (weighted kappa) and the interrater agreement was 0.82 (ICC) for the Allergan Hand Volume Deficit Scale.

In the authors' experience, some patients present with hand aging as an isolated concern, but it is much more common for patients to have had therapeutic improvement in facial appearance and present with concerns about the incongruities between their aged hands and their less aged face. Their response to treatment is generally positive if they have been appropriately informed regarding the potential degree of improvement and possible side effects. The use of a validated scale for formalized and reproducible consultation procedures can help to prepare patients for potential treatment outcomes and may thus improve patient satisfaction.23


Study Limitations

The verbal descriptors for each grade of the Allergan Hand Volume Deficit Scale are subjective; however, the descriptors were developed and refined through extensive feedback among 9 experts, minimizing inherent subjectivity. The clinical significance of scale scores was determined solely by the scale developers; although a 1-point change on the scale was considered significant to the scale developers, it may or may not be meaningful to patients. A less than 1-point change may be meaningful to patients desiring a subtle change, whereas other patients may perceive only dramatic changes as meaningful; hence, this scale is not recommended for patient self-assessment of meaningful improvement. The Michigan Hand Outcomes Questionnaire has an aesthetics subscale for the assessment of patient satisfaction with hand appearance and may be helpful for assessing patient satisfaction before and after hand treatment.24



Conclusion

The Allergan Hand Volume Deficit Scale demonstrated almost perfect interrater and intrarater agreement among physicians, and 1-point score differences were shown to reflect clinically significant differences in hand volume deficit. This volume deficit scale includes user-friendly diagrams, detailed verbal descriptions, and morphed- and real-subject images representative across sexes and skin types. The scale's standardized ratings may be uniformly applied in day-to-day clinical practice and potentially in clinical trials because of its validation in live subjects and use of both morphed and unaltered images.



Acknowledgments

The authors thank the following physicians for completing the scale validation study: David E. Bank, MD, FAAD; Sue Ellen Cox, MD; Timothy M. Greco, MD, FACS; Z. Paul Lorenc, MD, FACS; David J. Narins, MD, PC, FACS; William B. Nolan, MD; Robert A. Weiss, MD; and Margaret Weiss, MD. Statistical support was provided by Yijun Sun, PhD, and Shraddha Mehta, PhD, of Allergan plc, Irvine, CA.



References

1. Shamban AT. Combination hand rejuvenation procedures. Aesthet Surg J 2009;29:409–13.
2. Fabi SG, Goldman MP. Hand rejuvenation: a review and our experience. Dermatol Surg 2012;38:1112–27.
3. Fisher GJ, Varani J, Voorhees JJ. Looking older: fibroblast collapse and therapeutic implications. Arch Dermatol 2008;144:666–72.
4. Freiman A, Bird G, Metelitsa AI, Barankin B, et al. Cutaneous effects of smoking. J Cutan Med Surg 2004;8:415–23.
5. Monnat RJ Jr. “…Rewritten in the skin”: clues to skin biology and aging from inherited disease. J Invest Dermatol 2015;135:1484–90.
6. Dallara JM. A prospective, noninterventional study of the treatment of the aging hand with Juvederm Ultra 3 and Juvederm Hydrate. Aesthet Plast Surg 2012;36:949–54.
7. Rendon MI, Cardona LM, Pinzon-Plazas M. Treatment of the aged hand with injectable poly-l-lactic acid. J Cosmet Laser Ther 2010;12:284–7.
8. Sadick NS. A 52-week study of safety and efficacy of calcium hydroxylapatite for rejuvenation of the aging hand. J Drugs Dermatol 2011;10:47–51.
9. Carruthers A, Carruthers J, Hardas B, Kaur M, et al. A validated hand grading scale. Dermatol Surg 2008;34(Suppl 2):S179–S83.
10. Cohen JL, Carruthers A, Jones DH, Narurkar VA, et al. A randomized, blinded study to validate the Merz Hand Grading Scale for use in live assessments. Dermatol Surg 2015;41(Suppl 1):S384–S8.
11. Bertucci V, Solish N, Wong M, Howell M. Evaluation of the Merz hand grading scale after calcium hydroxylapatite hand treatment. Dermatol Surg 2015;41(Suppl 1):S389–S96.
12. Carruthers J, Jones D, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of volume deficit of the temple. Dermatol Surg 2016;42(Suppl 10):S203–10.
13. Sykes JM, Carruthers A, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for assessment of chin retrusion. Dermatol Surg 2016;42(Suppl 10):S211–18.
14. Donofrio L, Carruthers A, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of facial skin texture. Dermatol Surg 2016;42(Suppl 10):S219–26.
15. Carruthers J, Donofrio L, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of facial fine lines. Dermatol Surg 2016;42(Suppl 10):S227–34.
16. Jones D, Carruthers A, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of transverse neck lines. Dermatol Surg 2016;42(Suppl 10):S235–42.
17. Carruthers A, Donofrio L, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of static horizontal forehead lines. Dermatol Surg 2016;42(Suppl 10):S243–50.
18. Donofrio L, Carruthers J, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of infraorbital hollows. Dermatol Surg 2016;42(Suppl 10):S251–58.
19. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measure of reliability. Educ Psychol Meas 1973;33:613–9.
20. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
21. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
22. Bonett DG. Sample size requirements for estimating intraclass correlations with desired precision. Stat Med 2002;21:1331–5.
23. Jandhyala R. Improving consent procedures and evaluation of treatment success in cosmetic use of incobotulinumtoxinA: an assessment of the treat-to-goal approach. J Drugs Dermatol 2013;12:72–80.
24. Chung KC, Pillsbury MS, Walters MR, Hayward RA. Reliability and validity testing of the Michigan hand outcomes Questionnaire. J Hand Surg Am 1998;23:575–87.
© 2016 by the American Society for Dermatologic Surgery, Inc. Published by Wolters Kluwer Health, Inc. All rights reserved.