The appearance of lax skin on the body, whether it is a result of aging or mechanical stretching of the skin after excess weight gain, can be distressing and is a major cosmetic concern for many people. Treatments to tighten the skin were once restricted to surgical procedures, but individuals with mild-to-moderate skin laxity are increasingly seeking noninvasive options that have minimal downtime and no scarring.
The aim of noninvasive skin-tightening procedures is to improve dermal strength and elasticity by remodeling the dermis through neocollagenesis and elastogenesis. This is achieved by using targeted energy to penetrate varying depths of the dermis, while leaving the outer epidermal layer undamaged.1 A range of noninvasive skin-tightening devices is available and has been cleared by the US FDA and in Europe. These devices use energy from a variety of sources including monopolar and bipolar radiofrequency, broadband and laser light sources, ultrasound, and most recently microfocused ultrasound with visualization (MFU-V).2–5
Skin laxity frequently occurs in areas where a large amount of adipose tissue is covered by a relatively thin layer of skin such as the buttocks, thighs, the region above the knee, and the upper arms. In these areas, the skin is subject to the mechanical action of weight exerted by adipose tissue and other subcutaneous structures as well as the effects of gravity. Treatments such as MFU-V have proven effective at improving skin laxity as well as the appearance of cellulite in the buttocks and upper thighs,6,7 but there is currently no validated skin laxity scale for these areas to allow for objective and consistent assessment of treatment outcomes. In this article, the authors present the skin laxity grading scales for the objective quantification of skin laxity severity in the posterior thigh and buttocks, and anterior thigh and knee areas.
Creation of the Skin Laxity Scales
The methods for the development of the skin laxity scales followed the methodology for the creation and validation of other Merz Aesthetic Scales8–11 and are outlined in detail in a related publication in this issue.12 In brief, a photographic database of 120 female subjects representing a range of skin laxity severities in the posterior thighs, buttocks, anterior thighs, and knees was established, and from these images, one was selected to serve as the base image for each treatment area. Additional images were then selected from the database to superimpose varying degrees of skin laxity severity onto the base image for the creation of two 5-point photonumeric scales: Skin Laxity—At Rest for Posterior Thighs and Buttocks, and Skin Laxity—At Rest for Anterior Thighs and Knees (Figure 1). For both scales, all images were obtained in the resting state (a dynamic scale was not produced).
All subjects were photographed by a professional photographer using a Nikon D800 camera/70 to 200 mm lens (Nikon Corporation, Tokyo, Japan). The angle of the lights and distances between the platform, lights, and camera were all standardized and confirmed for each photography session (Figure 2). The areas to be captured covered posterior thighs and buttocks, and the anterior thighs and knees. Images included both posterior and oblique (45° angle) views of both sides and were taken at rest only. Microrelief images were also obtained.
The skin laxity scales differ slightly from other Merz Aesthetic Scales, in that 2 additional reference images were included for each severity grade. These were designed to act as a photo guide to be used alongside the photonumeric scale to aid physicians with the grading process. The reference images covered all Fitzpatrick skin types.
Demographic data were collected for all subjects including age, ethnicity, body mass index (BMI) class, smoking status, Fitzpatrick skin phototypes, and self-reported exposure to sunlight (based on a 5-point rating scale where 0 = never and 4 = very often). All subjects were informed of the objectives and targets of the study and gave consent to their photographs being rated, analyzed, and used in publications for scientific purposes.
Validation of the Skin Laxity Scales
After the creation of the scales, 16 international experts in the field of aesthetics conducted a psychometric validation to assess their validity and reliability. As described for the Cellulite Dimple scales also published in this issue,12 a validation booklet was produced which contained images of 50 subjects displaying all levels of skin laxity in the buttock, posterior and anterior thigh, and knee area, presented alongside the 2 skin laxity grading scales: Skin Laxity—At Rest for Posterior Thighs and Buttocks, and Skin Laxity—At Rest for Anterior Thighs and Knees. As described above, the booklets also contained images of 2 reference subjects alongside each severity grade image to assist with grading. Two sets of validation booklets were produced for each expert, so that ratings could be performed in 2 validation cycles at least 2 weeks apart to reduce any memory effects between sessions.
Subjects' images were presented in a randomized sequence in the booklets for each of the 2 sessions by assigning a page number to the respective subject's image. Raters were blinded with respect to the chosen subjects, subject identifiers, subject characteristics, and subject randomization sequences in the booklets. The ratings of each aesthetic expert were entered directly into the booklet for each scale.
During the validation of the skin laxity scales, one expert inadvertently used an earlier version of the scales for rating the subjects, and the respective ratings were therefore invalid. As a result, rating data for the statistical analysis were only available for 15 aesthetic experts. All rating data from the validation booklets were entered into a database using the double-entry method and subjected to quality control. Rating results from the first and second validation sessions were summarized by descriptive statistics including number of ratings (n), arithmetic mean, SD, median, range, and number of missing values.
Inter-rater and Intra-rater Reliability
The reliability between pairs of aesthetic experts and experts overall (inter-rater reliability) and the reliability between the first and second validation sessions for each aesthetic expert and aesthetic experts overall (intra-rater reliability) were assessed by the intraclass correlation coefficient (ICC) 2.1 of Shrout and Fleiss.13 To account for the fact that the validated scales are ordinal scales and the ICC derivation assumes a quantitative scale, weighted kappa values (Fleiss–Cohen weights) were also derived.14,15 The quality of reliability was defined by the following ICC ranges for interpretation of both inter-rater and intra-rater results16,17: ICC values of 0.00 to 0.20 denote slight reliability, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and ≥0.81 almost perfect reliability. Lower ICC values indicate variability in the assessment of subjects (e.g., different ratings for the same subject by the raters). Based on the above classifications, ICC values of >0.60 were considered to demonstrate a high consistency of scale ratings.
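As an illustration of the reliability statistic described above, the following Python sketch computes ICC 2.1 (two-way random effects, absolute agreement, single rater) from a complete subjects-by-raters table and maps a coefficient onto the interpretation bands listed in the text. It is not the SAS code used in the study. The function names `icc_2_1` and `reliability_band`, the requirement of complete data, and the label returned for negative coefficients are assumptions made for this example. The demonstration table is the 6-subject, 4-rater example commonly reproduced from Shrout and Fleiss,13 not data from this study.

```python
def icc_2_1(ratings):
    """ICC 2.1 for a complete table: one row per subject, one column per rater."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def reliability_band(value):
    """Map a coefficient onto the interpretation ranges quoted in the text."""
    v = round(value, 2)
    if v < 0:
        return "below the listed ranges"  # assumption: the text lists no band below 0
    if v <= 0.20:
        return "slight"
    if v <= 0.40:
        return "fair"
    if v <= 0.60:
        return "moderate"
    if v <= 0.80:
        return "substantial"
    return "almost perfect"


# Worked example table (rows = subjects, columns = raters); not study data.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
print(round(icc, 2), reliability_band(icc))  # 0.29 fair
```

Because the rater mean square enters the denominator, systematic differences between raters lower ICC 2.1, which is why it measures absolute agreement rather than consistency of ranking alone.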
Bivariate scatter plots (bubble plots) for validation session 1 versus validation session 2 were also generated for representation of intra-rater reliability for all aesthetic experts combined.
Validity of the scales was explored by means of Spearman correlation coefficients with bias adjustment for subject demographic variables including age, height, weight, BMI, smoking status, Fitzpatrick skin classification, and self-rated level of sun exposure. The correlation coefficients were calculated by validation session for each aesthetic expert and over all aesthetic experts.
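The rank correlation used for the validity analysis can be sketched in a few lines of Python. This is a plain Spearman coefficient (Pearson correlation of average ranks, so tied grades are handled), and it does not reproduce the bias adjustment mentioned above. The function names `average_ranks` and `spearman` and the example values are hypothetical and serve only as an illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical BMI values and laxity grades (0-4) for five subjects
bmi = [19.5, 22.0, 24.1, 27.3, 31.0]
grade = [0, 1, 1, 3, 4]
print(round(spearman(bmi, grade), 3))  # 0.975
```

A positive coefficient, as in this toy example, indicates that higher values of the demographic variable tend to accompany higher laxity grades; a negative coefficient indicates the reverse.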
All analyses were written, validated, and performed using SAS version 9.3.
For each validation session and each skin laxity scale, there were 750 planned ratings for the 15 aesthetic experts (15 aesthetic experts × 50 subjects rated). A few aesthetic experts did not provide a rating for each subject, but missing data were few (<1%) in both validation sessions. For most experts, the mean duration between the 2 validation sessions was 3 weeks.
All the subjects were women with a mean age of 35.8 ± 13.8 years in the Skin Laxity—Posterior Thighs and Buttocks population and 33.8 ± 13.1 years in the Skin Laxity—Anterior Thighs and Knees population. Mean BMI values were 23.5 ± 4.6 and 23.3 ± 4.3 kg/m², respectively, and current smokers accounted for 22.0% and 28.0% of participants, respectively. Fitzpatrick skin types I to VI were all represented across both scales, but the most common was type III (38.0% and 40.0%, respectively). For both skin laxity scale populations, 76% reported exposure to sunlight “seldom,” “seldom to sometimes,” or “sometimes.”
Of the 16 aesthetic experts (9 women and 7 men), 12 were dermatologists, 3 were plastic surgeons, and 1 was an ophthalmologist.
For the “Skin Laxity—Posterior Thighs and Buttocks” scale, the grading of aesthetic experts at validation session 1 covered all severity scores from Grade 4 “very severe” (12.7% of women) to Grade 0 “none” (11.7% of women). For validation session 2, grading ranged from Grade 4 (12.4%) to Grade 0 (9.7%) (Figure 3). In both validation sessions, most women were graded as having mild or moderate skin laxity. Mean ratings were comparable between validation sessions 1 and 2 at 1.9 (SD: 1.18) and 1.9 (SD: 1.25), respectively, indicating mild-to-moderate skin laxity on the posterior thighs and buttocks.
For the “Skin Laxity—Anterior Thighs and Knees” scale, the grading of experts at validation session 1 ranged from Grade 4 “very severe” (15.5% of women) to Grade 0 “none” (13.3% of women) (Figure 3). For validation session 2, grading ranged from Grade 4 (16.0%) to Grade 0 (11.1%). Most women in both validation sessions were graded as having mild or moderate skin laxity. Mean ratings were again comparable for validation sessions 1 and 2 at 1.9 (SD: 1.25) for both sessions, indicating mild-to-moderate skin laxity on the anterior thighs and knees.
The ICC and weighted kappa values for overall inter-rater reliability of the 2 skin laxity scales are presented by validation session in Table 1. Weighted kappa and ICC values for inter-rater reliability in validation sessions 1 and 2 were very similar for Skin Laxity—Posterior Thighs and Buttocks and identical for Skin Laxity—Anterior Thighs and Knees. Overall inter-rater reliability was determined to be substantial (0.61–0.80) at both validation sessions for the Skin Laxity—Posterior Thighs and Buttocks scale. For Skin Laxity—Anterior Thighs and Knees, inter-rater reliability was substantial at validation session 1 and almost perfect (≥0.81) at validation session 2. For both scales, inter-rater reliability was ≥0.7 in both validation sessions.
The ICC and weighted kappa values for intra-rater reliability of the 2 skin laxity scales are presented in Table 2. For the Skin Laxity—Posterior Thighs and Buttocks scale, overall intra-rater reliability was determined to be almost perfect (≥0.81) based on ICC 2.1 and substantial (0.61–0.80) based on the weighted kappa. For the Skin Laxity—Anterior Thighs and Knees scale, intra-rater reliability was almost perfect based on both ICC 2.1 and weighted kappa values. Intra-rater reliability of individual aesthetic experts for the Posterior Thighs and Buttocks and Anterior Thighs and Knees scales ranged from 0.67 to 0.95 and 0.60 to 0.94, respectively.
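For two sets of ordinal grades, such as one expert's ratings of the same subjects at the first and second validation sessions, a weighted kappa with Fleiss–Cohen (quadratic) weights can be sketched as follows. This is a minimal illustration and not the analysis code used in the study. The function name `weighted_kappa_quadratic`, the coding of grades as integers 0 to 4, and the example ratings are assumptions, and the pooling across experts used for the overall values is not reproduced.

```python
def weighted_kappa_quadratic(first, second, n_grades=5):
    """Cohen's weighted kappa with weights w_ij = 1 - (i - j)^2 / (c - 1)^2."""
    if len(first) != len(second) or not first:
        raise ValueError("both rating lists must be non-empty and of equal length")
    n = len(first)

    # Joint proportions of (first grade, second grade) pairs
    table = [[0.0] * n_grades for _ in range(n_grades)]
    for a, b in zip(first, second):
        table[a][b] += 1 / n

    row = [sum(table[i]) for i in range(n_grades)]
    col = [sum(table[i][j] for i in range(n_grades)) for j in range(n_grades)]

    observed = expected = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            weight = 1 - (i - j) ** 2 / (n_grades - 1) ** 2
            observed += weight * table[i][j]       # weighted observed agreement
            expected += weight * row[i] * col[j]   # weighted chance agreement

    if expected == 1:
        raise ValueError("kappa is undefined when all ratings fall in one grade")
    return (observed - expected) / (1 - expected)


# Hypothetical grades from one rater at two sessions (one disagreement by 1 grade)
session_1 = [0, 1, 2, 3, 4]
session_2 = [1, 1, 2, 3, 4]
print(round(weighted_kappa_quadratic(session_1, session_2), 3))  # 0.941
```

With quadratic weights, a disagreement of one grade is penalized only lightly, whereas a disagreement of several grades is penalized heavily, which suits an ordinal 5-point scale.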
Bubble plots for all experts pooled, illustrating the frequency of rating combinations between the first and second validation session for the “Skin Laxity—Posterior Thighs and Buttocks” scale, are shown in Figure 4. There were 430 of 742 ratings with perfect agreement and 30 of 742 ratings with a difference of more than 1 grade. The location of the high-frequency ratings on the diagonal line of the bubble plot demonstrates the high intra-rater reliability. The bubble plot for ratings of “Skin Laxity—Anterior Thighs and Knees” shows 436 of 742 ratings with perfect agreement and 25 of 742 ratings with a difference of more than 1 grade (Figure 5). For both scales, the results for the 2 validation sessions were similar.
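The counts underlying such a bubble plot are simple to tabulate. The Python sketch below counts each combination of first-session and second-session grades (the bubble sizes), the ratings with perfect agreement, and the ratings that differ by more than 1 grade, and then converts the counts reported above into percentages. The function names `agreement_summary` and `percent` and the short example lists are hypothetical; only the counts 430, 30, 436, 25, and 742 come from the text.

```python
from collections import Counter


def agreement_summary(session1, session2):
    """Tabulate paired grades: bubble sizes, exact matches, differences > 1 grade."""
    if len(session1) != len(session2):
        raise ValueError("paired ratings must have equal length")
    bubbles = Counter(zip(session1, session2))  # (grade 1, grade 2) -> frequency
    total = sum(bubbles.values())
    exact = sum(c for (a, b), c in bubbles.items() if a == b)
    beyond_one = sum(c for (a, b), c in bubbles.items() if abs(a - b) > 1)
    return bubbles, exact, beyond_one, total


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical paired grades for illustration only
s1 = [0, 1, 2, 2, 3, 4, 1, 3]
s2 = [0, 1, 2, 3, 3, 2, 1, 4]
bubbles, exact, beyond_one, total = agreement_summary(s1, s2)
print(exact, beyond_one, total)  # 5 1 8

# Percentages implied by the counts reported in the text
print(percent(430, 742), percent(30, 742))  # 58.0 4.0 (posterior thighs and buttocks)
print(percent(436, 742), percent(25, 742))  # 58.8 3.4 (anterior thighs and knees)
```

Expressed this way, roughly 58% of paired ratings agreed exactly on each scale, and about 3% to 4% differed by more than 1 grade.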
Validity of the Scales
The Spearman correlations between the skin laxity scale ratings and subject demographic characteristics are shown in Table 3. For the Skin Laxity—Posterior Thighs and Buttocks scale, several positive Spearman correlation coefficients were observed; the strongest was for BMI, followed in descending order by age, weight, and sun exposure. A negative correlation was observed for height. There were no relevant associations for Fitzpatrick skin type or smoking status.
For the Skin Laxity—Anterior Thighs and Knees scale, positive Spearman correlation coefficients were observed for age, BMI, weight, and sun exposure. A negative correlation was again observed for height as well as for Fitzpatrick skin type. There was no relevant association for smoking status.
A high correlation was observed between the 2 Skin Laxity scales and the Cellulite Dimple at Rest and Dynamic scales, which are also published in a related article in this journal issue.12
Women of all ages are increasingly seeking nonsurgical aesthetic treatments to tighten skin in the thighs, knees, and buttocks to achieve a toned body appearance and/or because skin laxity in these areas may stand out in contrast to a smooth appearance in other exposed areas of the body such as the face, hands, neck, and décolletage. This has led to a number of innovations in energy-based devices such as ultrasound, radiofrequency, and infrared laser devices, which can deliver controlled heat to the dermis to stimulate neocollagenesis and improve skin laxity. Microfocused ultrasound with visualization is one modality developed to meet the growing public demand for noninvasive skin-lifting and skin-tightening procedures. To standardize clinical evaluations, quantify results, guide best techniques, and measure the longevity of the treatment effects, assessment tools are required to grade skin laxity both before and after treatments. However, until now, no specifically designed grading scales were available. To address this need, Merz Aesthetics has developed and validated 2 skin laxity grading scales for the posterior thighs and buttocks, and the anterior thighs and knees using accepted criteria for the reporting of reliability studies.18 The scales are intended to be intuitive, easy to use, and reliable outcome evaluation tools for use by aesthetic experts.
Validation of these scales was performed by 15 aesthetic experts who graded real-life photographs against images in a 5-point photonumeric scale over 2 validation sessions. The results showed substantial inter-rater reliability (degree of rating agreement between aesthetic experts) for the Posterior Thighs and Buttocks scale, and substantial to almost perfect inter-rater reliability for the Anterior Thighs and Knees scale, with consistent results between validation sessions. Intra-rater reliability (degree of rating agreement for a single aesthetic expert at different rating sessions) was almost perfect for both scales. Reliability is a useful measure of how consistently values are rated and is a major determinant of a scale's utility and application.18 It is affected by the degree of discrimination between the different scale grades that users are required to make. The skin laxity scales and other Merz Aesthetic scales use a 5-point photonumeric grading system because this has been determined to provide the optimal degree of discrimination between the different severity levels of the aesthetic trait of interest with the highest degree of reliability. This assumption was confirmed in the current study where there was high agreement between aesthetic experts for patients presenting with a range of skin laxity severities.
For evaluation scales to be accurate, they must reflect the target population assessed. Reliability of the scales was demonstrated across a heterogeneous population covering the whole spectrum of age, BMI, Fitzpatrick skin phototypes (I–VI), and skin laxity severities that a physician is likely to encounter in clinical practice. Regardless of differences in skin color, texture, and degree of laxity, the scales have proven to be highly reliable and reproducible.
The robustness of the scales was further demonstrated by the consistency of the ratings among a large group of 15 experts who had received no specific training on their use. The statistical model used in the analysis (ICC 2.1) assumes a random subset of raters from a population of experts in aesthetic medicine. The results can therefore be assumed to be a conservative estimate of inter-rater reliability, and real inter-rater reliability may be even higher.
Skin laxity is an inevitable consequence of aging. Histologic studies of lax skin show dermal atrophy, primarily due to loss of collagen, degradation of elastin fibers, and loss of hydration.19–21 The breakdown is exacerbated by extrinsic factors, such as ultraviolet radiation and excess weight gain. In the current study, the validity of the scales was also explored by means of correlations of the scale grades with subject demographic variables that might influence skin laxity. For both scales, a high correlation was observed between worsening skin laxity and increasing age and BMI. Positive correlations were also observed for weight and level of sun exposure, and for the Anterior Thighs and Knees scale only, there was a negative correlation for Fitzpatrick skin type. Similar to the cellulite dimple severity scales,12 a small negative correlation was found between severity of skin laxity and subject's height; that is, shorter women tended to have more severe skin laxity, probably because the available area is more limited.
There was no correlation between the skin laxity scales and smoking status. A high correlation was observed between the 2 skin laxity scales and the recently developed cellulite dimple scales (the development and validation of these scales is published in an accompanying article in this issue).12 Previous studies have shown that cellulite severity is influenced by skin laxity, particularly in older individuals.22,23 Improving dermal strength and elasticity are therefore also important goals for treatments targeting cellulite dimples.12
To the authors' knowledge, there are currently no other aesthetic scales that have been specifically designed to evaluate skin laxity in the knee, upper thigh, and buttock areas. The cellulite severity scale developed by Hexsel and colleagues24 included a skin laxity component, but only as part of the overall cellulite scale. When used in combination with standardized photographic equipment and parameters, the current skin laxity scales will prove of value in clinical practice as well as for ongoing research. The photographic documentation, without the need for any measurements, is easy to use in clinical practice and allows for rapid and consistent subject assessment. Further studies are now warranted to evaluate the use of the scales for live assessment of subjects rather than from 2-dimensional photographic images. Their use for communicating the success of skin laxity treatments, as well as for establishing a common benchmark for research into treatment results with energy-based devices and other procedures targeting skin laxity, should also be investigated.
The results of this validation study confirm that the newly developed Merz Aesthetics skin laxity grading scales are a reliable and reproducible scoring system for aesthetic evaluation of clinical photographs of skin laxity on the posterior thighs and buttocks, and anterior thighs and knees in conjunction with standardized photographic methods. The scales have been validated using photographs and should be of practical value for assessing live patients, although this remains to be confirmed in clinical trials. Further evaluation in live patients is also needed to establish the ability of the scales to assess clinical outcomes after skin laxity treatment.
1. Zelickson B, Ross EV, Strasswimmer J. Definition and proposed mechanisms of non-invasive skin tightening. In: Alam M, Dover JS, editors. Non-surgical Skin Tightening and Lifting. Philadelphia: Saunders, Elsevier; 2009.
2. Carruthers J, Fabi S, Weiss R. Monopolar radiofrequency for skin tightening: our experience and a review of the literature. Dermatol Surg 2014;40(Suppl 12):S168–73.
3. Loesch MM, Somani AK, Kingsley MM, Travers JB, et al. Skin resurfacing procedures: new and emerging options. Clin Cosmet Investig Dermatol 2014;7:231–41.
4. Alam M, White LE, Martin N, Witherspoon J, et al. Ultrasound tightening of facial and neck skin: a rater-blinded prospective cohort study. J Am Acad Dermatol 2010;62:262–9.
5. Fabi SG, Burgess C, Carruthers A, Carruthers J, et al. Consensus recommendations for combined aesthetic interventions using botulinum toxin, fillers, and microfocused ultrasound in the neck, décolletage, hands, and other areas of the body. Dermatol Surg 2016;42:1199–208.
6. Goldberg DJ, Hornfeldt CS. Safety and efficacy of microfocused ultrasound to lift, tighten, and smooth the buttocks. Dermatol Surg 2014;40:1113–7.
7. Casabona G, Pereira G. Microfocused ultrasound with visualization and calcium hydroxylapatite for improving skin laxity and cellulite appearance. Plast Reconstr Surg Glob Open 2017;5:e1388.
8. Flynn TC, Carruthers A, Carruthers J, Geister TL, et al. Validated assessment scales for the upper face. Dermatol Surg 2012;38:309–19.
9. Geister TL, Bleßmann-Gurk B, Rzany B, Harrington L, et al. Validated assessment scale for platysmal bands. Dermatol Surg 2013;39:1217–25.
10. Landau M, Geister TL, Leibou L, Blessmann-Gurk B, et al. Validated assessment scales for décolleté wrinkling and pigmentation. Dermatol Surg 2016;42:842–52.
11. Rzany B, Carruthers A, Carruthers J, Flynn TC, et al. Validated composite assessment scales for the global face. Dermatol Surg 2012;38:294–308.
12. Hexsel D, Fabi SG, Sattler G, Bartsch R, et al. Validated assessment scales for cellulite dimples on the buttocks and thighs in female patients. Dermatol Surg 2019;45:S2–11.
13. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
14. Fleiss JL, Cohen J, Everitt B. Large sample standard errors of kappa and weighted kappa. Psychol Bull 1969;72:323.
15. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 1973;33:613–9.
16. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
17. Shrout PE. Measurement reliability and agreement in psychiatry. Stat Methods Med Res 1998;7:301–17.
18. Kottner J, Audigé L, Brorson S, Donner A, et al. Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96–106.
19. Uitto J. The role of elastin and collagen in cutaneous aging: intrinsic aging versus photoexposure. J Drugs Dermatol 2008;7:s12–6.
20. Nürnberger F, Müller G. So-called cellulite: an invented disease. J Dermatol Surg Oncol 1978;4:221–9.
21. Piérard GE, Nizet JL, Piérard-Franchimont C. Cellulite: from standing fat herniation to hypodermal stretch marks. Am J Dermatopathol 2000;22:34–7.
22. Rosenbaum M, Prieto V, Hellmer J, Boschmann M, et al. An exploratory investigation of the morphology and biochemistry of cellulite. Plast Reconstr Surg 1998;101:1934–9.
23. Stavroulaki A, Pramantiotis G. Cellulite, smoking and angiotensin-converting enzyme (ACE) gene insertion/deletion polymorphism. J Eur Acad Dermatol Venereol 2011;25:1116–7.
24. Hexsel DM, Dal'forno T, Hexsel CL. A validated photonumeric cellulite severity scale. J Eur Acad Dermatol Venereol 2009;23:523–8.