Development of the Adult Vulvar Lichen Sclerosus Severity Scale—A Delphi Consensus Exercise for Item Generation

Sheinis, Michal, BSc1,2; Selk, Amanda, MD, MSc, FRCSC2,3

Journal of Lower Genital Tract Disease: January 2018 - Volume 22 - Issue 1 - p 66–73
doi: 10.1097/LGT.0000000000000361
Original Research Articles: Vagina and Vulva

Objective To generate a list of items through international expert consensus consisting of both symptoms and clinical signs for inclusion in an adult vulvar lichen sclerosus severity scale.

Methods This study was carried out as a three-stage Delphi consensus exercise. After an extensive literature review, any items used to determine disease severity in previous clinical trials were compiled into a survey. The Delphi participants were recruited from the International Society for the Study of Vulvovaginal Disease; most were gynecologists, and most had been in practice for more than 20 years. Participants were asked to rate the importance of these items. Consensus was defined as 75% agreeing that an item was very important or essential toward determining disease severity. Participants were also asked to indicate their preferred method of measurement for these items.

Results Of approximately 400 members of the International Society for the Study of Vulvovaginal Disease, 66 participated in the study. Of the 14 symptoms presented, 7 reached consensus for inclusion. Of the 23 signs presented, 11 reached consensus for inclusion and 1 reached consensus for exclusion. Of the six architectural changes presented, all six reached consensus for inclusion. No consensus was reached regarding method of measurement for any of the symptoms and signs that reached consensus for inclusion.

Conclusion International consensus was reached for a variety of items for use in an adult vulvar lichen sclerosus severity scale that will be further developed and tested. Ideally, this scale will be used in clinical practice and in research to allow for high-quality trials.

Through expert international consensus, 24 items are proposed for inclusion in a future validated adult vulvar lichen sclerosus severity scale.

1Faculty of Medicine, University of Toronto, Toronto, Ontario, M5S 1A8, Canada; 2Department of Obstetrics and Gynecology, Mount Sinai Hospital, Toronto, Ontario, M5G 1X5, Canada; and 3Department of Obstetrics and Gynecology, University of Toronto, Toronto, Ontario, M5G 1E2

Lichen sclerosus is a chronic inflammatory skin condition that most commonly affects the anogenital region in women. Lichen sclerosus can be asymptomatic in some patients; however, in others, it can result in severe itch, burning, dyspareunia, and irreversible anatomical changes with the potential to interfere with voiding and sexual function.1

Over the past few decades, there have been many clinical trials testing treatments of vulvar lichen sclerosus. To date, there is no agreed-on standard way to measure lichen sclerosus disease severity. Researchers have tried measuring disease severity using differing combinations of patients’ symptoms, physical examination findings, quality of life, sexual functioning measures, histological characteristics, and immunohistochemical staining. Researchers conducting multiple trials in this field have used many of the same measures for disease severity,2–9 yet there is still variation among individual research groups spanning different projects and even greater variation between research groups.

A major issue with the measurement tools used in previous studies is their lack of objective definitions for various levels of severity of signs and symptoms. Furthermore, these scales have not been tested for reliability and validity prior to being used in treatment trials. To perform high-quality treatment trials, a high-quality scale is required.

Owing to the wide variation in measurement of lichen sclerosus severity among clinical studies, it is very difficult to directly compare between studies. Creation of a standard severity scale would provide a tool for the clinical and research communities to describe lichen sclerosus, and a common language to test different treatments in future randomized controlled trials.

This study is the first step in creating an adult vulvar lichen sclerosus severity scale. It sought to generate a list of items for inclusion in such a scale by combining an extensive literature review, patient input, and expert consensus.

Literature Review for Preliminary Item Generation

An extensive literature review was conducted with the use of MEDLINE and EMBASE databases. These databases were searched with a combination of medical subject headings (MeSH) as well as keywords to include every variation on spelling for lichen sclerosus along with a search for known lichen sclerosus treatments (Appendix A shows complete search strategy).

All clinical trials studying vulvar lichen sclerosus treatment were reviewed, and the measures of severity that were used to determine impact of treatment were extracted manually. This list was reviewed by the principal investigator, and items that were more frequently discussed in the literature were compiled, whereas those that were unfeasible to measure in a clinical setting for all patients (e.g., histological and immunohistochemical characteristics) and terms not using standard medical language were removed.

Further Item Generation

The principal investigator posted on the International Society for the Study of Vulvovaginal Disease (ISSVD) Facebook page and sent messages through member newsletters and message boards to request suggestions of further categories to include (e.g., symptoms, signs, quality of life measures) as well as specific items to include (e.g., under symptoms: pruritus). Patient input was solicited through the international online support group for lichen sclerosus via an e-mail to its director (The Association for Lichen Sclerosus & Vulval Health).10

Delphi Consensus Exercise

Upon completion of the category and item generation phase, a questionnaire was created. A Delphi consensus exercise was embarked on to elicit expert consensus with a series of three online surveys. The experts selected in this case were members and fellows of the ISSVD who actively care for patients with vulvar lichen sclerosus. The individuals who participated were the most appropriate to provide expert consensus given their clinical experience in treating the disease. These were also the individuals who would be the most likely users of the severity scale after its establishment.

Research ethics board approval was attained through the Mount Sinai Hospital, Toronto, Ontario, Canada board (REB 16-0065-E). The ISSVD members were invited to participate via an e-mail invitation, which included a consent form. Consent was assumed if participants chose to move forward with the survey as indicated in the invitation letter and the introductory page of the survey. The survey was distributed using the online tool “SurveyMonkey”.11

Each of the three survey rounds was conducted over a 2-week period. In the first round of the e-Delphi exercise, basic demographic information was collected about participants, including specialty of practice, number of years in practice, and country of practice. Participants were asked to rate the importance of a series of symptoms and signs for assessing disease severity on a five-point Likert scale from 1 (not important at all) to 5 (essential for assessing disease severity). An opportunity was provided for participants to add additional categories or items they felt to be important as well as to indicate which method they felt to be appropriate for measuring signs and symptoms (i.e., categorical distinction [presence or absence of that sign/symptom] or severity scale [e.g., a Likert scale]).

After each round, medians, interquartile ranges, and percentages were calculated, and these data were provided to participants. Items that did not reach consensus for inclusion or exclusion were carried forward to the next round. Participants were also notified of which questions had been amended, added, or excluded after analysis.

Upon completion of the first round, items that 75% of participants had rated as “very important” (4 on the Likert scale) or “essential” (5 on the Likert scale) were considered to have reached consensus for inclusion. Items that 75% of participants had rated as “not important at all” (1 on the Likert scale) or “not very important” (2 on the Likert scale) were determined to have reached consensus for exclusion. The 75% cutoffs used were in accordance with similar studies using the Delphi technique.12,13

In the second round, the survey was repeated with a few amendments. Questions that were determined to be confusing or unhelpful were removed. Additional items suggested by participants in the first round were incorporated into the second round. The scale was shifted after review of the results of the first round, as the study investigators were concerned that the group would not be able to move toward a consensus with the neutral response (“somewhat important”: 3 on the Likert scale) available to respondents with a five-point scale. In the second round, a four-point Likert scale was used, where 75% of participants had to agree that the item was “very important” (3 on the Likert scale) or “essential” (4 on the Likert scale) for the item to reach consensus for inclusion, or that the item was “not important at all” (1 on the Likert scale) or “not very important” (2 on the Likert scale) for the item to reach consensus for exclusion. In the third round, this process was repeated a final time. The scale was again shifted to allow participants the option to either “include” or “exclude” items that had not reached consensus in the first or second rounds.
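The consensus rules described for the three rounds can be illustrated with a minimal Python sketch. It is not the authors' analysis (which was carried out in Microsoft Excel); the function and variable names are hypothetical. It assumes that "75%" means at least 75% of respondents, that the same 75% cutoff applied to the binary third round, and that third-round answers are coded 1 for "exclude" and 2 for "include".

```python
# Rating values that count toward inclusion or exclusion in each round.
# Rounds 1 and 2 follow the text; the numeric coding of round 3
# (1 = "exclude", 2 = "include") is an assumption made for illustration.
ROUND_RULES = {
    1: {"include": {4, 5}, "exclude": {1, 2}},  # five-point Likert scale
    2: {"include": {3, 4}, "exclude": {1, 2}},  # four-point Likert scale
    3: {"include": {2}, "exclude": {1}},        # include/exclude choice
}

# Assumed to mean "at least 75% of respondents".
CONSENSUS_THRESHOLD = 0.75


def classify_item(ratings, round_number, threshold=CONSENSUS_THRESHOLD):
    """Return 'include', 'exclude', or 'no_consensus' for one item."""
    if not ratings:
        raise ValueError("no ratings supplied")
    rules = ROUND_RULES[round_number]
    n = len(ratings)
    share_include = sum(r in rules["include"] for r in ratings) / n
    share_exclude = sum(r in rules["exclude"] for r in ratings) / n
    if share_include >= threshold:
        return "include"
    if share_exclude >= threshold:
        return "exclude"
    # In rounds 1 and 2 such items are carried forward to the next round.
    return "no_consensus"
```

For example, `classify_item([5, 4, 4, 5, 3, 4, 5, 4], round_number=1)` classifies an item rated 4 or 5 by seven of eight hypothetical respondents. Note that removing the neutral option in round 2 forces every response into one of the two counted groups, which is the stated reason for shifting the scale.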

After the final analysis, the results were circulated for formal feedback and comments from the participants.

Statistical Analysis

Data were analyzed with Microsoft Excel 2013 for measures of central tendency (i.e., mean, mode, and median) as well as level of dispersion (standard deviation and interquartile range).
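The summary measures listed above can be reproduced outside Excel. The following is a minimal sketch using Python's standard `statistics` module; the function name is hypothetical. The source does not state which Excel functions were used, so two choices here are assumptions: the sample standard deviation, and quartiles computed with the "inclusive" interpolation method.

```python
import statistics


def describe_ratings(ratings):
    """Summarize one item's Likert ratings (central tendency and dispersion)."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    # The "inclusive" method is an assumption about how quartiles were computed.
    q1, _q2, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ratings),
        "median": statistics.median(ratings),
        "mode": statistics.mode(ratings),
        # Sample standard deviation (n - 1 denominator); also an assumption.
        "stdev": statistics.stdev(ratings),
        "iqr": q3 - q1,
    }
```

Calling `describe_ratings` on the list of ratings for a single item returns a dictionary with the five measures; at least two ratings are needed for the standard deviation and quartiles to be defined.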

Literature Search

A literature search conducted with the use of MEDLINE and EMBASE databases yielded 359 and 638 results, respectively (Appendix A shows full description of the literature search). When duplicated articles were removed and irrelevant articles screened out with the use of titles and abstracts, 338 articles remained. In total, an exhaustive list of 103 items was generated, spanning the categories of symptoms, signs, histological findings, immunohistochemical markers, quality of life, and sexual functioning. The various scales used to measure the severity of the items within these categories were also compiled. Results of this search are summarized in Table 1. Although input was elicited from members and fellows of the ISSVD and patient members of the Association for Lichen Sclerosus & Vulval Health, no items were suggested by either of these groups.



With regard to symptoms, scales for measurement included continuous14 and discrete2,4–9,15–34 scales varying between 4 and 10 points. These scales focused on specific symptoms with itching, burning, and dyspareunia being the most common. Qualitative measures of minimal, moderate, or severe were also used without any associated numerical values or definitions of the labels used.35

Clinical appearance was measured in much the same way as symptoms. The most common signs measured were hyperkeratosis, erosion, atrophy, erythema, and purpuric lesions and/or itching-related excoriations. Both discrete15 and continuous scales4–9,18,20,21,23,27,29,32,36–42 were used to measure these signs. In addition to determining the specific sign, some studies added measurements of lesion size28,29,38,41,43 as well as the duration of the lesion.44,45 As with measurement of symptoms, some studies measured signs within qualitative categories of mild, moderate, and severe or early-, mid-, and end-stage disease, with some providing definitions of two of these labels (early and end stage,46 or moderate and severe47), and others not defining these labels at all.48 Some studies combined symptomatic complaint scores with clinical signs scores.41,42

Less commonly used measures of disease severity included the description of histological features such as epidermal atrophy, hyperkeratosis, and dermal inflammation14,25,48–53 and the measurement of various inflammatory markers with immunohistochemical staining.14,20,21,26,46,48,51,54–58 Finally, some studies focused on the quality-of-life effects of this disease as well as interference with sexual functioning.18,22,35,48,59,60

Delphi Consensus Exercise

Of approximately 400 members and fellows of the ISSVD, 66 participated in the three rounds of the survey. Retention of participants was 100% from the first round to the third round of the survey. Most participants were practicing gynecologists (67%) and had more than 20 years in practice (52%), and the largest group practiced in the United States (45%), although there was also good representation from dermatologists; the exercise included participants from 15 countries across the world (Table 2). Over the three rounds of the survey, 24 items reached consensus for inclusion, including 7 symptoms, 11 signs, and 6 architectural changes (Table 3). Upon completion of the last round of the survey, 18 items remained for which no consensus had been reached (Table 4). With regard to methods of measurement, 72% of the participants wished to measure symptoms with a five-point severity scale (Table 4). When asked whether to measure frequency of symptoms (described in items 4 and 5 in Table 4), 82% of the participants agreed that this should be done. With regard to measuring special symptoms (including quality of life and changes in sexual functioning), most participants (52%) preferred to measure these with a five-point severity scale rather than a pre-existing validated scale (Table 5). There was a fairly even distribution of how practitioners preferred to measure signs and architectural changes (Table 4), although most agreed that these items must be measured to determine disease severity. With regard to architectural changes, most participants (87%) felt that either a photo or a colored-in diagram should be used to record architectural changes in the context of determining disease severity.









Although there have been many studies in the past to propose methods for measurement of lichen sclerosus severity,2–9,14–61 this is the first study to present a list of items for inclusion in a scale based on an international expert consensus using the iterative Delphi consensus protocol.

Lichen sclerosus treatment varies widely, especially with regard to maintenance therapy.62 A meta-analysis in 2012 concluded that the limited evidence available supports clobetasol propionate, mometasone furoate, and pimecrolimus for treating adult vulvar lichen sclerosus.63 Since then, many additional randomized controlled trials have been completed studying treatments including clobetasol propionate, mometasone furoate, fibroblast lysate cream, and topical tacrolimus5,18,20,37,63–66; however, because different research groups have conducted their trials with their own nonstandardized severity scales, comparison of results is challenging and imprecise.

Disease severity scales feature prominently in dermatological research, with scales for diseases such as leprosy, hyperhidrosis, and psoriasis.67–69 However, in the subspecialized field of gynecological dermatology, such scales are lacking; and for conditions that also affect regions other than the vulva, such as lichen planus, a scale exists for oral lichen planus alone.70

To produce higher-quality research in a randomized controlled fashion, to improve observational research, and to compare data between different populations across the world, the creation and implementation of a standardized adult lichen sclerosus severity scale is essential. This study proposes 24 items for inclusion in a future lichen sclerosus severity scale. The categories of items proposed included symptoms, signs, and architectural changes. Although previous studies had considered the use of histological and immunohistochemical markers,14,20,21,25,26,46,48–58 these categories were not used in the initial survey because including them would require a biopsy at every patient visit. The fact that these categories were not proposed by any of the experts throughout the consensus exercise suggests that this view is shared by experts worldwide. Assessing biopsy changes in response to treatment may be objective, but it is unclear whether this outcome is important to patients, and it likely should be used only as an adjunct to other measures.

Most of the participants in this study were practitioners having more than 20 years of experience in the field. The participants were also those who actively see patients with vulvar lichen sclerosus as part of their clinical practice, which lends greater credibility to the results. The 100% retention rate through the three rounds of the survey demonstrates a commitment on behalf of the participants to advancing the field.

Despite its strengths, this study has several limitations. First, although this study sought to include patients’ input, this endeavor was unsuccessful, which removes a very important voice from the development of the scale. However, as the list of items included in the scale was based on an extensive literature review, it was thought that most if not all the items that would have been suggested by patients would likely have already been considered when generating the first survey. Second, the type of scale used to rate the items was changed between rounds of the survey (from a five-point Likert, to a four-point Likert, to a choice between two options) for the purpose of moving the group toward consensus. The change of the scale may have affected the quality of the data collected. Third, the expert group consisted mostly of gynecologists, which may have skewed the results toward one specialty. However, given that gynecologists are more likely to encounter the more severe cases of lichen sclerosus, it was thought that this would not affect the tool’s use too heavily, but future testing is required to see if this is true. Furthermore, although this study was conducted in English, for some of the international experts, English was not their first language. This could have affected the understanding of some questions and the answers provided. Despite three rounds, consensus was not reached on several items, and there was an even split into three groups over the ideal way to measure anatomical changes. It is a matter of concern that participants did not want to use already validated scales to measure outcomes such as sexual function and quality of life; this speaks to a lack of understanding of the difficulty in creating high-quality measurement tools. Finally, the principal investigator and research coordinator were responsible for determining which items to incorporate into the survey based on the literature review, which allowed for personal bias.

This study is only the first of many steps that must be undertaken for the completion of a lichen sclerosus severity scale. Before testing the items generated through this study, patient focus groups will need to be conducted for their crucial input on relevant symptoms, and the symptom list will subsequently be expanded before testing the symptom section with patients. Future steps include testing the scale for inter-rater and intrarater reliability, examining ratings by dermatologists versus gynecologists, completing a factor analysis to combine similar items, removing redundant and unhelpful items, and ensuring feasibility of use in a clinical setting. Currently, there are too many items, and some overlap; both problems will be reduced through testing the scale. The various ways to measure anatomical changes will need to be tested to see which method gives the most reliable and reproducible results. Upon completion of these steps, further studies will be undertaken to determine whether the scale is useful for assisting with medication selection and determining prognosis. The ultimate goal of these further studies will be to produce a severity scale that is high enough in quality for use in research and, concurrently, sufficiently user-friendly that it will have high uptake clinically.

It was felt that the results thus far would be helpful to clinicians and researchers involved in the care of women with vulvar lichen sclerosus. In a recent editorial, Foster et al. (2017) highlighted the absence of core outcome sets, which they define as “a minimum set of outcomes that is used in clinical trials and observational studies… that will enable trials or studies to be compared in meta-analyses”, within the field of vulvovaginal disease, making the production of high-quality research trials next to impossible.71 Although there is a long way to go toward the completion of a validated scale for measuring adult vulvar lichen sclerosus severity and creating an internationally accepted core outcome set for adult vulvar lichen sclerosus, this study brings us one step closer toward achieving this goal.

Thank you to our colleagues from around the world for providing their expert opinions. Without their dedication, enthusiasm, and contribution, this project would not have been possible.

Dr Tolu O Adedipe, Melanie Altas, Jeff Andrews, MD, FRCSC, Pedro Vieira Baptista, Deborah Bartholomew, MD, Debra Birenbaum, Celine Bouchard, Carol Bunten, MD. The Vancouver Clinic, Matthé PM Burger, Carla Carpenter, DO, Carmine Carriero, Christine Conageski, M E Cruickshank, Claire S Danby, MD, Tania Day, Graeme Dennerstein, FRCOG, FRANZCOG, Arucha Ekeowa-Anderson, Diane Elas, MSN, ARNP, Robyn B Faye, MD, FACOG, NCMP, IF, Theodore Fellenbaum, MD, FACOG, A/Prof Gayle Fischer, José Fonseca-Moutinho, Theresa Freeman-Wang, Karen L Gibbon, FRCP, Erin Gross, Anne Lise Helgesen, Helen Henzell, Susan Kelly, Carly Kirshen, Catherine M. Leclair, MD, Joana Lima-Silva, Jamie B MacKelfresh, NA Madnani, Lynette J Margesson, MD, FRCPC, Melissa Mauskar, Mary Gail Mercurio, MD, Merle Monsein, Beth Morrel, Micheline Moyal-Barracco, Hon A/Prof Amanda Oakley, Dimitrios Papoutsis, Mario Preti, M Luann Racher, Gianluigi Radici, MD, PhD, Cara Berg Raunick, DNP, Sandra Rivero, MD, Edna Lima Rizado, Darion M Rowan, Amanda Selk, Priya Selva-Nayagam, Ms V Shesha, Mark Spitzer, MD, Danielle Staecker, MD, Marc Steben, Amy L Stenson, Elizabeth G Stewart, Colleen K Stockdale, Bram ter Harmsel, MD, PhD, Jay R. Trabin, MD, Lucia Treviño-Rangel, MD, Catherine F Vanderloos, MD, Aruna Venkatesan, A Vitorino, Anuja Vyas, John Willems, MD.

Database search strategy for both MEDLINE and EMBASE: MeSH heading “Lichen Sclerosus et Atrophicus” was searched with subclassifications of “therapy”, “classification”, and “drug therapy”. This term was combined with “lichen sclerosis et atrophicus” as a keyword with a subclassification of “disease” with OR as well as keyword “lichen scleros?s” to include all spellings of the term and MeSH heading “Lichen Sclerosus et Atrophicus” without any subclassifications. These two terms were further combined with the MeSH heading “Clobetasol”, keyword “Clobetasol propionate”, MeSH heading “Tacrolimus”, keyword “tacrolimus”, MeSH heading “glucocorticoids” in exploded form, keyword “topical steroid*”, and keyword “glucocorticoid*” with an OR term.

The results were then limited to “female”.


lichen sclerosus et atrophicus; lichen sclerosus; lichen sclerosis; severity scale; Delphi Consensus
