Cellulite is the term used to describe the uneven, dimpled skin that typically appears on the surface of the thighs and buttocks and which is estimated to affect between 80% and 90% of women at some point during their lives.1 While not a pathologic condition, it is an issue of cosmetic concern for many women.2 Women are believed to be particularly susceptible to the condition because the fibrous septae in the subcutaneous adipose tissue are oriented perpendicularly in relation to the skin surface.3,4 Between these fibrous strands, fat is stored in large globular adipocytes. It is believed that increased tension in the fibrous septae as a result of either expansion of the fat cells or shortening of the septae due to connective tissue changes, such as trauma, leads to retraction at their cutaneous insertion points causing the typical cellulite dimples.5,6 The raised areas between the dimples represent the projection of underlying adipocytes.7 In men, altered fat distribution and a crisscross rather than perpendicular organization of the septae make the development of cellulite much less likely.3 The likelihood of cellulite developing is increased by a number of factors including a predisposing genetic background, hormonal changes or imbalances, impaired microcirculation, medications that cause water retention, a sedentary lifestyle, unhealthy eating habits, and Caucasian ethnic background.8–10 Cellulite appearance is also worsened by age-associated skin laxity.11–13
In recent years, a better understanding of the etiology of cellulite has led to the development of new treatment approaches that target the underlying cause of the condition.6,14,15 As new pharmacological and technological medical advances reach the market, reliable and specific methods of cellulite assessment become necessary to identify subjects appropriate for therapy and to measure treatment outcomes. Currently available scales do not meet this need16,17 because they are not specific for cellulite dimples and because they are time-consuming for use in daily clinical practice. In this article, the authors present the cellulite dimple grading scales for the objective quantification of the severity of cellulite dimples in both static (relaxed or “at rest”) and dynamic states, as well as the validity and reliability of these photonumeric scales.
Subject Selection and Photographic Imaging
A photographic database of the buttocks and thighs of 120 female subjects was established to provide representative images across the complete spectrum of cellulite dimple severity. The women were aged 18 to 65 years with a body mass index (BMI) in the range 18 to 42 kg/m2, Fitzpatrick skin Types I to VI, and even cellulite contour irregularities on both sides. Individuals were excluded if they had any dermatosis, scarring, or tattoos on the buttock or thigh area or if they had received any previous aesthetic treatments or procedures in these areas. Subject demographic data were collected including age, ethnicity, BMI class, smoking status, Fitzpatrick skin phototype, and self-reported exposure to sunlight (based on a 5-point rating scale where 0 = never and 4 = very often). All subjects were informed of the objectives and targets of the study and gave consent to their photographs being rated, analyzed, and used in publications for scientific purposes.
All subjects were photographed by a professional photographer using a Nikon D800 camera/70- to 200-mm lens (Nikon Corporation, Tokyo, Japan). Photographs were standardized as to framing, lighting, and subject orientation. The angle of the lights and distances between the platform, lights, and camera were all standardized and confirmed for each photography session. The area to be captured covered the buttocks and the upper thighs up to about 8 to 10 cm below the gluteal crease (infragluteal sulcus). Images included both posterior and oblique (45° angle) views of both sides and were taken at rest and with maximum contraction of the musculus gluteus maximus (dynamic state). A microrelief image was also obtained.
Creation of the Cellulite Dimple Scales
The process of scale creation followed the methodology used for the creation of the other Merz Aesthetic Scales.18–21 In brief, the subjects' images were screened, and one subject's image was chosen as the base image for scale creation. Additional images were then selected from the photographic database to superimpose varying degrees of cellulite dimple severity onto the base image to create composite computer-generated images for the cellulite dimple scale. The software used to produce the superimposed images was Adobe Photoshop. Several versions were reviewed with aesthetic experts/physicians and improved stepwise until a final version was acknowledged and agreed upon for validation. Photographs used for creation of the assessment scales could not be used for the validation process. The final rating scale was a 5-point cellulite severity scale with a score ranging from 0 to 4 (Figure 1). The cellulite dimple scales differ slightly from other Merz Aesthetic scales in that 2 additional reference images were included for each severity grade. These were designed to act as a photo guide to be used alongside the photonumeric scale to aid physicians with the grading process. The reference images covered all Fitzpatrick skin types.
Validation of the Cellulite Dimple Scales
After the creation of the scales, a psychometric validation was conducted to determine their validity and reliability for assessment of photos of cellulite severity and to determine whether they would be appropriate measurement tools for use in clinical practice. The validation was performed by 16 international experts in the field of aesthetics. The experts each rated a validation booklet containing images of 50 subjects displaying all cellulite dimple severity grades presented alongside the 2 cellulite dimple grading scales: Cellulite Dimples—At Rest, and Cellulite Dimples—Dynamic. The booklets contained different sets of 50 images, so that those used in the “At Rest” booklet could not also be used in the “Dynamic” booklet. The booklets were designed in an A4 landscape, double-page, spiral-bound format, and each had unique identifiers (rater's name and randomization number). Experts were given a general introduction on the procedure and methods for validation, but no specific training. Two sets of validation booklets were produced for each expert, so that ratings could be performed in 2 validation cycles at least 2 weeks apart.
Subjects' images were presented in a randomized sequence in the booklets for each of the 2 sessions by assigning a page number to the respective subject's image. Raters were blinded with respect to the chosen subjects, subject identifiers, subject characteristics, and subject randomization sequences in the booklets. The ratings of each aesthetic expert were entered directly into the booklet for each scale.
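The page assignment described above can be illustrated with a minimal sketch. It assumes one independent shuffle per session; the function name, subject labels, and seeds are hypothetical and are not taken from the study.

```python
import random

def randomized_page_order(subject_ids, seed):
    """Map page numbers (1-based) to subject ids in a shuffled order."""
    rng = random.Random(seed)      # seeded so the sequence is reproducible
    shuffled = list(subject_ids)   # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return {page: sid for page, sid in enumerate(shuffled, start=1)}

# Hypothetical labels for the 50 subjects shown in one booklet.
subjects = [f"S{i:02d}" for i in range(1, 51)]

# One independent sequence per validation session (seeds are illustrative).
session1_pages = randomized_page_order(subjects, seed=1)
session2_pages = randomized_page_order(subjects, seed=2)
```

Keeping the page-to-subject mapping separate from the booklet itself is what lets raters stay blinded while the ratings remain traceable to subjects afterward.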
All rating data from the validation booklets underwent a double-data entry by independent and qualified professionals followed by data entry verification. Rating results from the first and second validation sessions were summarized by descriptive statistics including number of ratings (n), arithmetic mean, SD, median, range, and number of missing values.
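As an illustration only, the descriptive summary listed above could be computed as follows. The study's analyses were run in SAS; this Python sketch, its function name, and its example ratings are assumptions, with `None` standing in for a missing rating.

```python
import statistics

def describe(ratings):
    """Summarize ratings on the 0-4 scale; None marks a missing value."""
    observed = [r for r in ratings if r is not None]
    return {
        "n": len(observed),
        "mean": statistics.mean(observed),
        "sd": statistics.stdev(observed),    # sample SD (n - 1 denominator)
        "median": statistics.median(observed),
        "range": (min(observed), max(observed)),
        "missing": len(ratings) - len(observed),
    }

# Hypothetical ratings, not study data.
example_ratings = [0, 1, 1, 2, None, 3, 4, 2]
summary = describe(example_ratings)
```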
Inter-rater and Intra-rater Reliability
For both scales, the reliability between the aesthetic experts (inter-rater reliability) and the reliability between the first and second validation sessions for each aesthetic expert (intra-rater reliability) were assessed by the intraclass correlation coefficient (ICC) 2.1 of Shrout and Fleiss.22 To account for the fact that the validated scales are ordinal scales and the ICC derivation assumes a quantitative scale, weighted kappa values (Fleiss-Cohen weights) were also derived.23,24 Intraclass correlation coefficient 2.1 assumes that all subjects are rated by the same aesthetic experts, who are assumed to be a random subset of all possible aesthetic experts. The following ICC ranges were used for interpretation of both inter-rater and intra-rater results25,26: 0.00 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and ≥0.81 almost perfect. Lower ICC values indicate variability in the assessment of subjects (e.g., different ratings for the same subject by the raters). Based on the above classifications, ICC values of >0.60 were considered to demonstrate a high consistency of scale ratings.
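The two agreement statistics named above can be sketched as follows. This is a minimal illustration, not the SAS code used in the study. It assumes complete data (every subject rated by every rater), and it computes weighted kappa for one pair of rating vectors only, since the text does not describe how kappa was pooled across 16 raters. All function names are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) of Shrout and Fleiss: two-way random effects, single rater.

    ratings: list of n rows (subjects), each holding k scores (raters).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

def weighted_kappa_fc(a, b, categories=5):
    """Weighted kappa with Fleiss-Cohen (quadratic) weights for two raters.

    a, b: equal-length lists of integer grades in 0..categories-1.
    """
    c, n = categories, len(a)
    obs = [[0.0] * c for _ in range(c)]
    for x, y in zip(a, b):
        obs[x][y] += 1 / n
    pa = [sum(obs[i]) for i in range(c)]
    pb = [sum(obs[i][j] for i in range(c)) for j in range(c)]
    def weight(i, j):
        return 1 - (i - j) ** 2 / (c - 1) ** 2
    po = sum(weight(i, j) * obs[i][j] for i in range(c) for j in range(c))
    pe = sum(weight(i, j) * pa[i] * pb[j] for i in range(c) for j in range(c))
    return (po - pe) / (1 - pe)

def interpret(value):
    """Map a coefficient to the interpretation bands quoted in the text.

    Assumption: the value is rounded to 2 decimals first, and anything
    below 0.21 (including negative values) is labeled "slight".
    """
    v = round(value, 2)
    if v >= 0.81:
        return "almost perfect"
    if v >= 0.61:
        return "substantial"
    if v >= 0.41:
        return "moderate"
    if v >= 0.21:
        return "fair"
    return "slight"

# Worked example data from Shrout and Fleiss (6 targets, 4 judges).
sf_example = [
    [9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
    [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7],
]
sf_icc = icc_2_1(sf_example)
```

With quadratic weights, a disagreement of 2 grades is penalized 4 times as heavily as a disagreement of 1 grade, which is why this form of weighted kappa tends to track the ICC closely on ordinal scales.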
Bivariate scatter plots (bubble plots) for validation session 1 versus validation session 2 were also generated for representation of intra-rater reliability.
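The data behind such a bubble plot are simply the frequencies of each (session 1, session 2) grade pair. A minimal sketch follows, assuming that pairs with a missing rating in either session are dropped; the function name and example ratings are hypothetical.

```python
from collections import Counter

def rating_pair_counts(session1, session2):
    """Count (session 1 grade, session 2 grade) pairs for a bubble plot.

    Returns the pair frequencies, the number of exact agreements, the
    number of pairs differing by more than 1 grade, and the number of
    complete pairs. None marks a missing rating.
    """
    pairs = [(a, b) for a, b in zip(session1, session2)
             if a is not None and b is not None]
    counts = Counter(pairs)
    exact = sum(c for (a, b), c in counts.items() if a == b)
    far = sum(c for (a, b), c in counts.items() if abs(a - b) > 1)
    return counts, exact, far, len(pairs)

# Hypothetical ratings by one expert, not study data.
s1 = [0, 1, 2, 3, 4, 2, None]
s2 = [0, 1, 3, 3, 2, 2, 1]
counts, exact, far, total = rating_pair_counts(s1, s2)
```

Each key in `counts` gives a bubble position and each value gives its size; bubbles on the diagonal correspond to identical grades in both sessions.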
Validity of the scales was explored by means of Spearman correlation coefficients with bias adjustment of at rest and dynamic scale ratings against subject demographic variables including age, height, weight, BMI, smoking status, Fitzpatrick skin classification, and self-rated level of sun exposure. Correlation of the cellulite dimple scales with the skin laxity severity scales27 was also determined. The correlation coefficients were calculated by validation session for each aesthetic expert and over all aesthetic experts. In addition, the Spearman correlation coefficients with bias adjustment between the at rest and dynamic outcome measures were calculated by validation session over all aesthetic experts.
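For orientation, a plain Spearman rank correlation can be sketched as below. The bias adjustment mentioned above is not reproduced here because the text does not specify it, so this shows only the unadjusted coefficient. The function names and the BMI and grade values are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical subjects: BMI against cellulite dimple grade (0-4).
bmi = [19.5, 22.0, 24.3, 27.8, 31.2, 35.0]
grade = [0, 1, 1, 2, 3, 4]
rho = spearman(bmi, grade)
```

Average ranks matter here because a 5-point scale produces many ties; ranking ties arbitrarily would distort the coefficient.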
All analyses were written, validated, and performed using SAS version 9.3.
For each validation session and each cellulite scale, there were 800 planned ratings (16 aesthetic experts × 50 subjects rated). For most experts, there was a duration of 3 to 4 weeks between the 2 validation sessions. A few aesthetic experts did not provide a rating for each subject, but missing data were few (<1%) in both validation sessions.
All the subjects were women with a mean age of 33.2 ± 12.3 years in the Cellulite Dimples—At Rest population and 34.0 ± 13.9 years in the Cellulite Dimples—Dynamic population. Mean BMI values were 23.5 ± 4.6 kg/m2 and 23.1 ± 4.4 kg/m2, respectively. All Fitzpatrick skin Types (I–VI) were represented, but the most frequent was Fitzpatrick skin Type III. Exposure to sunlight “seldom,” “seldom to sometimes,” or “sometimes” was reported by 78% and 84% of women, respectively, and 22% and 24%, respectively, were current smokers.
Of the 16 aesthetic experts (9 women and 7 men), 12 were dermatologists, 3 were plastic surgeons, and 1 was an ophthalmologist.
For the “Cellulite Dimples—At Rest” scale, the grading of aesthetic experts at validation session 1 covered all severity scores from Grade 4 “very severe” (12.3% of women) to Grade 1 “mild” (34.4% of women); 15% had no dimples. For validation session 2, grading ranged between Grade 4 (9.0%) and Grade 1 (38.6%); 11.6% had no dimples. Mean ratings were comparable between validation sessions 1 and 2 at 1.8 (SD: 1.26) and 1.7 (SD: 1.15), respectively, indicating mild-to-moderate cellulite dimples.
For the “Cellulite Dimples—Dynamic” scale, the grading of experts at validation session 1 ranged from Grade 4 “very severe” (11.5% of women) to Grade 1 “mild” (20.4% of women); 5.8% had no dimples. For validation session 2, grading ranged between Grade 4 (12.5%) and Grade 1 (20.8%); 7.1% had no dimples. Mean ratings were again comparable for validation sessions 1 and 2 at 2.2 (SD: 1.07) and 2.2 (SD: 1.11), respectively, indicating moderate cellulite dimples.
The ICC and weighted kappa values for overall inter-rater reliability of the 2 cellulite dimple scales are presented by validation session in Table 1. Weighted kappa and ICC values for inter-rater reliability were very similar and showed qualitatively the same results. Overall inter-rater reliability was determined to be almost perfect (≥0.81) at both validation sessions for the Cellulite Dimples—At Rest scale and substantial (0.61–0.80) at both validation sessions for the Cellulite Dimples—Dynamic scale. For both scales, inter-rater reliabilities were slightly higher in validation session 1 compared with session 2.
The ICC and weighted kappa values for intra-rater reliability of the 2 cellulite dimple scales are presented in Table 2. Overall intra-rater reliability was determined to be almost perfect (≥0.81) for the Cellulite Dimples—At Rest scale and substantial (0.61–0.80) for the Cellulite Dimples—Dynamic scale. Intra-rater reliability of individual aesthetic experts for the At Rest and Dynamic scales ranged from 0.69 to 0.93 and 0.57 to 0.89, respectively. For Cellulite Dimples—At Rest, intra-rater reliability was ≥0.70 for all experts. For Cellulite Dimples—Dynamic, intra-rater reliability was ≥0.70 in 87.5% of experts and ≥0.60 in 93.8% of experts. With such large numbers of experts, individual reliability comparisons are expected to sometimes vary by chance, but the majority of the reliability estimates indicated at least substantial reliability.
A bubble plot for all experts pooled, illustrating the frequency of rating combinations between the first and second validation session for the Cellulite Dimples—At Rest scale, is shown in Figure 2. There were 477 of 793 ratings with perfect agreement and 24 of 793 ratings with a difference of more than 1 grade. The location of the high-frequency ratings on the diagonal line of the bubble plot demonstrates the high intra-rater reliability. The bubble plot for ratings of “Cellulite Dimples—Dynamic” shows 425 of 794 ratings with perfect agreement and 16 of 794 ratings with a difference of more than 1 grade (Figure 3).
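The counts reported above translate into agreement proportions as follows. The sketch uses only the figures given in the text; the dictionary keys and function name are illustrative.

```python
# Counts as reported in the text for the pooled session 1 vs session 2 pairs.
reported = {
    "at_rest": {"pairs": 793, "exact": 477, "more_than_one_grade": 24},
    "dynamic": {"pairs": 794, "exact": 425, "more_than_one_grade": 16},
}

def agreement_shares(c):
    """Convert agreement counts into proportions of all rating pairs."""
    one_grade = c["pairs"] - c["exact"] - c["more_than_one_grade"]
    return {
        "exact": c["exact"] / c["pairs"],
        "one_grade_apart": one_grade / c["pairs"],
        "more_than_one_grade": c["more_than_one_grade"] / c["pairs"],
    }

shares = {scale: agreement_shares(c) for scale, c in reported.items()}

for scale, s in shares.items():
    print(scale, {name: f"{value:.1%}" for name, value in s.items()})
```

On these counts, roughly 60% of At Rest pairs and 54% of Dynamic pairs agree exactly, and about 97% to 98% of pairs on either scale fall within 1 grade of each other.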
Validity of the Scales
Relevant Spearman correlations between the cellulite dimple scale ratings and subject demographic characteristics are shown in Table 3. For both scales, positive Spearman correlation coefficients were observed for BMI, age, and weight, and a negative correlation was observed for height. There was also a high correlation between the 2 cellulite dimple scales and the recently released skin laxity scales for the buttock, thigh, and knee area, which are being published in an accompanying article in this issue27 (Table 4).
The results of this validation study demonstrate that the newly developed Merz Aesthetics cellulite dimple grading scales are a reliable and reproducible scoring system for aesthetic evaluation of cellulite dimples on the buttocks and thighs. The scales provide 5-point photonumeric assessments with photo guides of cellulite severity at rest and in a dynamic state.
For evaluation scales to be accurate, they must reflect the target population assessed. The subjects included in this study represented the whole spectrum of cellulite severity grades and covered a large age range, BMI levels, as well as all Fitzpatrick skin types. The scales were therefore evaluated across all cellulite severity grades in a heterogeneous population similar to what a physician might encounter in clinical practice. To further reflect clinical practice, no specific detailed training in advance of the validations was performed to gauge whether the newly created scales are robust assessment tools for general use.
The reproducibility of the scales for the assessment of cellulite dimple severity was based on the evaluation of 50 sets of photographs by 16 aesthetic experts on 2 separate occasions separated by at least 2 weeks. The 2-week interval between validation sessions was implemented to minimize recall bias by the raters. For most raters, the second validation session took place 3 to 4 weeks after the first, further reducing the potential for memory effects between sessions.
Data recorded on a rating scale are the subjective judgment of the rater, and the generality of a set of ratings is therefore a concern. For the scales to be of use, it is important to demonstrate that the obtained ratings are not peculiar to one rater's subjective judgment, but representative of a group of raters as a whole. Knowledge of inter-rater reliability is therefore crucial when evaluating the generality of a set of ratings as it represents the extent to which the different aesthetic experts tend to make exactly the same judgments about the rated subject. For representation of cellulite dimples at rest, there was almost perfect inter-rater reliability (ICC ≥0.81) in both validation sessions. The weighted kappa coefficients were approximately equivalent to the ICC 2.1 values, confirming the high inter-rater reliability. For most aesthetic scales, dynamic representation is the most consistent and reliable method of assessing the severity of an aesthetic trait. In contrast to the “at rest” scale, inter-rater reliability for cellulite severity in the dynamic state did not reach almost perfect agreement, but it was still substantial (ICC 0.61–0.80). This may suggest that cellulite severity is more difficult to rate in a dynamic state and that both scales should be used when evaluating a patient to ensure accurate grading. For both scales, inter-rater reliabilities were slightly higher in validation session 1 compared with session 2, but in both sessions, reliability was nevertheless almost perfect for cellulite dimples at rest and substantial for cellulite dimples in the dynamic state. Similar small variations in inter-rater reliability between validation sessions are not unusual and have also been observed with other aesthetic scales.18,28,29 It is likely that had the raters been trained in the use of the scales before validation, even higher consistency between raters in the 2 validation sessions would have been observed.
Reliability between the first and second validation sessions for the same aesthetic expert (intra-rater reliability) also showed almost perfect intra-rater ICC values (≥0.81) for the Cellulite Dimples—At Rest scale. For the Cellulite Dimples—Dynamic scale, intra-rater reliability was substantial (0.61–0.80) overall. With such large numbers of experts, individual reliability comparisons are expected to sometimes vary by chance, but the majority of the reliability estimates indicated at least substantial reliability.
Validity of the cellulite dimple scale scores was also explored by means of correlations with the scales themselves and other variables that might be expected to influence cellulite severity. There was a high correlation between the 2 cellulite scales themselves, and with a separate scale assessing skin laxity in the buttock and thigh region.27 Other factors with a high correlation with cellulite dimples were BMI, followed by age and weight, supporting the concept that while not causal, cellulite may be worsened by aging and weight gain. Fitzpatrick skin type, sun exposure, and smoking status showed no relevant correlation with cellulite severity.
The cellulite dimple scales have been specifically developed as a tool to assist physicians offering treatments that target cellulite dimples. While they can also be used to give an overall impression of cellulite severity, they cannot be generalized to all cellulite-related deformities. Cellulite can also be influenced by skin laxity, particularly in older individuals,11,12 and a separate publication in this issue details the development of a new skin laxity scale for the buttock and thigh area that can be used in conjunction with the dimple scale when assessing cellulite severity and deciding on the best treatment options.27 The cellulite dimple scales differ from other cellulite severity scales16,17 available in the literature in their specificity for cellulite dimples and in their simplicity. The Nürnberger and Müller16 classification was developed in 1978 and has 4 severity grades. It is based on observations both at rest and in a dynamic state. Hexsel and colleagues17 included the Nürnberger and Müller classification in the Cellulite Severity Scale, which also comprises the 4 most important clinical features of cellulite (number of evident depressions, depth of depressions, morphological appearance of skin surface alterations, and grade of skin laxity). The severity of each of the 5 scale items is graded from 0 to 3, allowing a final sum of scores that range numerically from 1 to 15. Based on the final numeric score, cellulite is classified as mild, moderate, or severe.17 The scale of Hexsel and colleagues was created to provide an objective method of measuring cellulite severity based on its main characteristics and to guide the choice of different treatment modalities, but the 5 different scale items can be time-consuming to assess. Although it lacks some sensitivity,30 this was the first attempt at a better clinical evaluation of skin laxity.
The cellulite dimple scale was developed on the premise that treatments that target cellulite dimples act in a different manner from those that target skin laxity. While both improve cellulite severity, their differing mechanisms of action necessitate separate validation scales. The current study validates the cellulite severity scale for photographic assessment of cellulite and confirms the validity of the scales to objectively rate cellulite severity regarding its depressed lesions. Further studies are now warranted that investigate the accuracy of the scales for measuring the efficacy of treatments that target cellulite dimples such as the US FDA-cleared Cellfina System, a minimally invasive procedure that involves subcision of the fibrous septae and results in improvements in the appearance of cellulite on the buttocks and thighs with no loss of benefit for up to 3 years,15,31 the Cellulaze laser-based treatment for the release of fibrous septae,32 and manual subcision for cellulite, which is the basis of the above-cited technologies.33,34
An accurate classification of cellulite is important when planning a therapeutic strategy, both in deciding which patients are suitable for treatment and in assessing treatment outcomes. The cellulite dimple scales provide physicians and researchers with a simple, accurate, and reliable assessment tool for both clinical and research purposes.
1. Luebberding S, Krueger N, Sadick NS. Cellulite: an evidence-based review. Am J Clin Dermatol 2015;16:243–56.
2. Hexsel D, Hexsel CL. Social impact of cellulite and its impact on quality of life. In: Goldman MP, Bacci PA, Leibaschoff G, Hexsel D, et al, editors. Cellulite Pathophysiology and Treatment. New York, NY: Taylor & Francis; 2006; pp. 1–5.
3. Querleux B, Cornillon C, Jolivet O, Bittoun J. Anatomy and physiology of subcutaneous adipose tissue by in vivo magnetic resonance imaging and spectroscopy: relationships with sex and presence of cellulite. Skin Res Technol 2002;8:118–24.
4. Gensanne D, Josse G, Theunis J, Lagarde JM, et al. Quantitative magnetic resonance imaging of subcutaneous adipose tissue. Skin Res Technol 2009;15:45–50.
5. Mirrashed F, Sharp JC, Krause V, Morgan J, et al. Pilot study of dermal and subcutaneous fat structures by MRI in individuals who differ in gender, BMI, and cellulite grading. Skin Res Technol 2004;10:161–8.
6. Hexsel DM, Abreu M, Rodrigues TC, Soirefmann M, et al. Side-by-side comparison of areas with and without cellulite depressions using magnetic resonance imaging. Dermatol Surg 2009;35:1471–7.
7. Hexsel D, Siega C, Schilling-Souza J, Porto MD, et al. A comparative study of the anatomy of adipose tissue in areas with and without raised lesions of cellulite using magnetic resonance imaging. Dermatol Surg 2013;39:1877–86.
8. de la Casa Almeida M, Suarez Serrano C, Rebollo Roldán J, Jiménez Rejano JJ. Cellulite's aetiology: a review. J Eur Acad Dermatol Venereol 2013;27:273–8.
9. Rossi AB, Vergnanini AL. Cellulite: a review. J Eur Acad Dermatol Venereol 2000;14:251–62.
10. Leszko M. Cellulite in menopause. Prz Menopauzalny 2014;13:298–304.
11. Rosenbaum M, Prieto V, Hellmer J, Boschmann M, et al. An exploratory investigation of the morphology and biochemistry of cellulite. Plast Reconstr Surg 1998;101:1934–9.
12. Stavroulaki A, Pramantiotis G. Cellulite, smoking and angiotensin-converting enzyme (ACE) gene insertion/deletion polymorphism. J Eur Acad Dermatol Venereol 2011;25:1116–7.
13. Lorencini M, Camozzato F, Hexsel D. Skin aging and cellulite in women. In: Farage MA, Miller KW, Maibach HI, editors. Textbook of Aging Skin. Heidelberg: Springer-Verlag Berlin Heidelberg; 2016; pp. 1–9.
14. Green JB, Cohen JL, Kaufman J, Metelitsa AI, et al. Therapeutic approaches to cellulite. Semin Cutan Med Surg 2015;34:140–3.
15. Kaminer MS, Coleman WP III, Weiss RA, Robinson DM, et al. A multicenter pivotal study to evaluate tissue stabilized-guided subcision using the Cellfina device for the treatment of cellulite with 3-year follow-up. Dermatol Surg 2017;43:1240–8.
16. Nürnberger F, Müller G. So-called cellulite: an invented disease. J Dermatol Surg Oncol 1978;4:221–9.
17. Hexsel DM, Dal'forno T, Hexsel CL. A validated photonumeric cellulite severity scale. J Eur Acad Dermatol Venereol 2009;23:523–8.
18. Flynn TC, Carruthers A, Carruthers J, Geister TL, et al. Validated assessment scales for the upper face. Dermatol Surg 2012;38:309–19.
19. Geister TL, Bleßmann-Gurk B, Rzany B, Harrington L, et al. Validated assessment scale for platysmal bands. Dermatol Surg 2013;39:1217–25.
20. Landau M, Geister TL, Leibou L, Blessmann-Gurk B, et al. Validated assessment scales for décolleté wrinkling and pigmentation. Dermatol Surg 2016;42:842–52.
21. Rzany B, Carruthers A, Carruthers J, Flynn TC, et al. Validated composite assessment scales for the global face. Dermatol Surg 2012;38:294–308.
22. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
23. Fleiss JL, Cohen J, Everitt B. Large sample standard errors of kappa and weighted kappa. Psychol Bull 1969;72:323.
24. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 1973;33:613–9.
25. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
26. Shrout PE. Measurement reliability and agreement in psychiatry. Stat Methods Med Res 1998;7:301–17.
27. Kaminer MS, Casabona G, Sattler G, Bartsch R, et al. Validated assessment scales for skin laxity on the posterior thighs, buttocks, anterior thighs, and knees in female patients. Dermatol Surg 2019;45:S12–21.
28. Donofrio L, Carruthers J, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of infraorbital hollows. Dermatol Surg 2016;42(Suppl 1):S251–8.
29. Sykes JM, Carruthers A, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for assessment of chin retrusion. Dermatol Surg 2016;42(Suppl 1):S211–8.
30. De La Casa Almeida M, Suarez Serrano C, Jiménez Rejano JJ, Chillón Martínez R, et al. Intra- and inter-observer reliability of the application of the cellulite severity scale to a Spanish female population. J Eur Acad Dermatol Venereol 2013;27:694–8.
31. Kaminer MS, Coleman WP III, Weiss RA, Robinson DM, et al. Multicenter pivotal study of vacuum-assisted precise tissue release for the treatment of cellulite. Dermatol Surg 2015;41:336–47.
32. DiBernardo BE, Sasaki GH, Katz BE, Hunstad JP, et al. A multicenter study for cellulite treatment using a 1440-nm Nd:YAG wavelength laser with side-firing fiber. Aesthet Surg J 2016;36:335–43.
33. Hexsel DM, Mazzuco R. Subcision: a treatment for cellulite. Int J Dermatol 2000;39:539–44.
34. Hexsel D, Dal Forno T, Hexsel C, Schilling-Souza J, et al. Magnetic resonance imaging of cellulite depressed lesions successfully treated by subcision. Dermatol Surg 2016;42:693–6.