Gender dysphoria is the distress caused by a discrepancy between a person’s gender identity and his or her sex assigned at birth1; when the condition is not treated, quality of life of transgender patients is reduced, and the risk of suicide is high.2 The number of patients diagnosed with gender dysphoria and seeking clinical care is increasing3 and estimated to be approximately one in 60,000.4

The treatment of gender dysphoria consists of a combination of different therapeutic approaches, such as hormonal therapy, surgical therapy, and psychotherapy.1 Surgery performed for the purpose of lessening gender dysphoria is called gender confirmation surgery. Core procedures for transgender women are vaginoplasty and breast augmentation; in addition, facial feminization surgery is increasingly requested. Core procedures for transgender men are penis reconstruction and mastectomy. Sometimes, patients also request minor surgical interventions, such as liposuction, gluteal augmentation, and hair reconstruction.1

Proper evaluation of the results of gender confirmation surgery is important to verify the efficacy of a specific surgical treatment, to compare different surgical techniques and preoperative and postoperative management, to understand the problems existing within the current practice, and to enable improvements. Historically, outcomes of gender confirmation surgery have been evaluated mainly by the caregiver’s subjective assessment and by presenting the number of postoperative complications. In recent years, however, with the development of patient-centered care, rating the outcomes from the patient’s perspective has been highlighted. According to Kuiper and Cohen-Kettenis, “an evaluation of SRS [sex reassignment surgery] can be made only on the basis of subjective data, because SRS is intended to solve a problem that can not be determined objectively.”5 As a consequence, Chung and Pusic, among others, have emphasized the usefulness of patient-reported outcome measures; these are instruments (e.g., questionnaires, structured interviews) designed to report a patient’s health condition without external bias, such as the clinician’s interpretation.6

When implementing patient-reported outcome measures, evidence of the instrument’s validity, reliability, and responsiveness in the target population is necessary to secure accurate reporting of the patient’s experience. Validity (measuring what is supposed to be measured), reliability (producing consistent results under similar conditions), and responsiveness (sensitivity to change over time) are established by developmental and validation processes, including psychometric testing.

Patient-reported outcome measures can be of different types: generic, evaluating psychosocial or psychiatric aspects, function specific, diagnosis specific, or ad hoc.7 Generic instruments, such as the 36-Item Short-Form Health Survey, the World Health Organization Quality of Life Scale, and the Subjective Happiness Scale, are intended to measure quality of life and are expected to be reliable in any given population. The drawback of using a generic patient-reported outcome measure in the evaluation of surgery is the higher number of factors influencing the result, making these measures less responsive to the surgical intervention in itself. Patient-reported outcome measures evaluating psychosocial and psychiatric factors, such as the Symptom Checklist 90 and the Body Uneasiness Test, are less appropriate for measuring surgical change, because these focus on topics that are not specifically affected by surgery. In contrast, function-specific patient-reported outcome measures, such as the Female Sexual Function Index, the BREAST-Q, and the Patient and Observer Scar Assessment Scale, evaluate selected factors that may be directly influenced by surgical intervention; this makes them more precise but only valid in the populations in which they have been psychometrically tested. A further type of patient-reported outcome measure consists of diagnosis-specific instruments, such as the Utrecht Gender Dysphoria Scale and the Body Image Scale; in the case of gender dysphoria, these instruments are commonly influenced by all the other parts of the treatment for gender dysphoria, such as hormonal therapy and psychotherapy, and are not specific to surgery. Lastly, ad hoc patient-reported outcome measures are instruments without formal development and testing, and thus without confirmed validity.

The aim of this study was to identify the literature in which structured patient-reported outcome measures have been used to evaluate the results of gender confirmation surgery, and to systematically evaluate the validity of these instruments. Articles on the subject of vaginoplasty, penile reconstruction, breast augmentation, and mastectomy were included, as these are the core procedures in gender confirmation surgery. Excluded were articles on the evaluation of face, throat, vocal cord, and miscellaneous surgery, which are less requested and therefore less studied in the literature. The results of this systematic review will contribute to the development of patient-reported outcome measures suitable for the evaluation of functional outcome after gender confirmation surgery (e.g., sexual and urinary function, aesthetic result, and self-image).


In this systematic review, predefined patient, intervention, comparison, outcome, and setting criteria were formulated to search the literature in a structured manner (Table 1). The literature search was performed in collaboration with librarians educated to perform systematic bibliographic searches at Sahlgrenska University Hospital Library. We searched the medical literature using PubMed (1968 to February of 2017), Scopus (1960 to February of 2017), Cochrane Library (1996 to May of 2016), and PsycInfo (1965 to February of 2017). The following key terms were used: "breast reduction," "chest reconstruction," "chest surgery," "chest-wall contouring surgery," "reduction mammoplasty," "mastoplasty," "mastectomy," "mastectomies," "mastopexy," "mammaplasty," "mammoplasty," "breast augmentation," "metoidioplast*," "penile reconstruction," "penile construction," "phalloplast*," "vaginoplast*," "phalloplasties," "neophalloplast*," "neo-phalloplast*," "neovagin*," "neo-vagin*," "gender re-assignment," "gender reassignment," "gender confirming," "gender confirmation," "sex re-assignment," "sex reassignment," "sex confirming," "sex confirmation," "genital re-assignment," "genital reassignment," "sex reassignment surgery," "transgender," "transsexual," "transsexualism," "male-to-female," "female-to-male," "MTF," and "FTM."
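As an illustration of how such a term list can be turned into a reproducible search string, the following is a minimal sketch only. The article lists the key terms but does not report the Boolean logic used to combine them, so the two-group structure (procedure terms AND population terms), the subset of terms shown, and the names `PROCEDURE_TERMS`, `POPULATION_TERMS`, `or_block`, and `build_query` are all assumptions made for this example.

```python
# Hypothetical sketch: the source gives the key terms but NOT how they were
# combined. The OR-within-group / AND-between-groups logic is an assumption.

# Small illustrative subsets of the key terms listed in the text.
PROCEDURE_TERMS = ["mastectomy", "breast augmentation", "phalloplast*", "vaginoplast*"]
POPULATION_TERMS = ["gender confirmation", "transgender", "transsexual"]


def or_block(terms):
    """Quote multi-word terms and join all terms with OR inside parentheses."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(procedures, populations):
    """Require at least one procedure term AND at least one population term."""
    return or_block(procedures) + " AND " + or_block(populations)


query = build_query(PROCEDURE_TERMS, POPULATION_TERMS)
print(query)
```

Keeping the terms in named lists makes it easier to rerun the same search in each database and to document exactly which terms were used.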

Articles reporting on the following were excluded: revision surgery; secondary surgery; hysterectomy or vaginectomy only; studies only assessing personality, sexual orientation, or patient-reported experience measures; studies presenting outcomes reported by someone other than the patient himself or herself; and studies implementing semistructured techniques of measuring outcomes without a defined set of questions. Two independent researchers performed all steps of the search to find studies matching the predetermined inclusion criteria. Any disagreement was resolved by discussion. The search process is presented in a flowchart (Fig. 1).

The patient-reported outcome measures implemented in the included articles were assessed regarding their development and validation processes in a transgender population; when there was no available information on this matter, the corresponding author was contacted. In cases where no contact could be established, the instrument in question was considered to be without secured validity (ad hoc).

Generic patient-reported outcome measures, and patient-reported outcome measures evaluating psychiatric or psychosocial aspects, were considered valid in a transgender population if they were stated to be valid in a general population. Gender dysphoria–specific patient-reported outcome measures were evaluated only with reference to a transgender population. Function-specific patient-reported outcome measures were also evaluated with regard to a transgender population, because these are only applicable in populations in which they have been tested.
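The validity rule described above can be written as a small decision function. This is a sketch under stated assumptions, not the reviewers' actual procedure: the `Instrument` class, its field names, and the category labels are hypothetical, and the validation flags in the two example records are illustrative inputs chosen to match statements made elsewhere in this article.

```python
from dataclasses import dataclass


@dataclass
class Instrument:
    """Hypothetical record for one patient-reported outcome measure."""
    name: str
    kind: str  # "generic", "psychosocial", "diagnosis_specific", or "function_specific"
    valid_in_general_population: bool = False
    valid_in_transgender_population: bool = False


def considered_valid(instrument):
    """Apply the rule from the text.

    Generic and psychosocial/psychiatric instruments count as valid if they
    are stated to be valid in a general population; diagnosis-specific and
    function-specific instruments need validation in a transgender population.
    """
    if instrument.kind in ("generic", "psychosocial"):
        return instrument.valid_in_general_population
    return instrument.valid_in_transgender_population


# Illustrative inputs only (flags are assumptions for the example).
sf36 = Instrument("36-Item Short-Form Health Survey", "generic",
                  valid_in_general_population=True)
fsfi = Instrument("Female Sexual Function Index", "function_specific",
                  valid_in_general_population=True)

print(considered_valid(sf36))  # generic, valid in a general population
print(considered_valid(fsfi))  # function specific, not tested in transgender patients
```

An instrument for which no validation information could be obtained would keep both flags at their default of `False` and therefore be treated as ad hoc.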


The systematic search identified a total of 2079 articles; in addition, five articles were found through separate searches.8–150 After application of the inclusion and exclusion criteria, 202 articles were submitted to full-text review, and 109 articles were excluded, resulting in 93 included articles (Fig. 1). In this material, a total of 110 instruments were identified, of which 64 were ad hoc; six were generic; and 24 evaluated psychiatric, social, or psychosocial aspects. The remaining 16 instruments evaluated function in part or exclusively, and had undergone some degree of formal development and/or validation processes. These instruments can be categorized into three subgroups: instruments valid in other patient groups (n = 9), ad hoc instruments with some formal development/validation (n = 5), and diagnosis-specific instruments (n = 2). (See Table, Supplemental Digital Content 1, which shows an instrument overview.5,9,12–17,21,27,61,62,64,66,68,69,71,73–75,79–150)

The development and validation processes of the function-specific ad hoc instruments are listed in Table 2; the evaluated domains of these and the function-specific instruments, which are valid in other patient groups, are presented in Tables 3 and 4.
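The article and instrument counts reported in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are hypothetical, and the figure for records not reaching full-text review is derived here rather than reported in the article.

```python
# Counts taken from the text; variable names are illustrative.
identified = 2079 + 5          # systematic search plus separate searches
full_text_reviewed = 202
excluded_at_full_text = 109
included = full_text_reviewed - excluded_at_full_text

# Derived, not reported in the article: records that did not reach full-text review.
not_full_text_reviewed = identified - full_text_reviewed

instrument_counts = {
    "ad hoc": 64,
    "generic": 6,
    "psychiatric, social, or psychosocial": 24,
    "function (some development/validation)": 16,
}
function_subgroups = {
    "valid in other patient groups": 9,
    "ad hoc with some development/validation": 5,
    "diagnosis specific": 2,
}

total_instruments = sum(instrument_counts.values())

print(identified, included, total_instruments)
```

Running the check confirms that the stated subtotals add up to the reported totals of 93 included articles and 110 instruments.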

Ad Hoc

A total of 64 instruments without formal development or validation in any population were identified8–73; these covered the areas of function, appearance, body image, and psychosocial factors, and had been used to evaluate all different types of gender confirmation surgery. One ad hoc instrument is the Biographical Questionnaire for Transsexuals and Transvestites; this is a well-recognized instrument used in many clinics, but is without formal development and validation.

Generic, Psychosocial, and Psychiatric Instruments

Six generic instruments used to evaluate gender confirmation surgery were identified, investigating quality of life, happiness, and general satisfaction; the 36-Item Short-Form Health Survey was the most commonly used generic instrument. In addition, a total of 24 different instruments were used to evaluate psychosocial and psychiatric factors, of which the Symptom Checklist 90 was the most frequently used.

Function-Specific Instruments Valid in Other Patient Groups

Nine instruments had originally been developed to evaluate function and symptoms in other populations; seven of these were in the reviewed material used to assess outcome after vaginoplasty and were focused on sexual function and desire, urinary function, and genital self-image; one had been used in the evaluation of donor morbidity after radial free flap phalloplasty, and one had been used to evaluate the result after breast augmentation. None of the instruments had any formal development or validation in a transgender population.

Function-Specific Ad Hoc Instruments with Some Formal Development/Validation

Five of the ad hoc instruments identified in the included articles had some formal development or validation in a transgender population, such as the Sammons Body Image and Sexual Pleasure Questionnaire74 and the Wierckx Questionnaire75 used to assess sexual function before and after gender confirmation surgery. However, these processes were scarce, including only one or a few steps of the recommended procedures for developing reliable instruments.76

Diagnosis-Specific Instruments

Two validated instruments that were developed to assess gender dysphoria as a whole were identified in the included articles: the Utrecht Gender Dysphoria Scale and the Body Image Scale. Both are validated in and developed for transgender populations, but were originally not meant to assess the impact of gender confirmation surgery exclusively and are affected by therapeutic interventions such as hormonal therapy, voice therapy, and psychotherapy.


Proper measurement of patient-reported outcomes after gender confirmation surgery is important to evaluate the effect of specific surgical treatments, and to compare different surgical techniques and preoperative and postoperative management, to enable improvements within current practice. This systematic review highlights the absence of patient-reported outcome measures that are specific enough to assess the impact of gender confirmation surgery exclusively and concurrently proven to be valid in the transgender population.

Comparison with Other Studies

Patient-reported outcome measures used to assess quality of life and patient satisfaction following gender confirmation surgery have been analyzed in a recent publication by Barone et al.,77 stating the need for new instruments to measure patient satisfaction and quality of life after gender confirmation surgery. Their search identified 796 articles, of which 19 were included in the review, containing a total of 17 instruments; in comparison, this systematic search identified 2084 articles, of which 93 articles were included in the review, containing 110 instruments. The lower number of articles selected in their systematic review, and of evaluated patient-reported outcome measures, may be because of overly restricted search terms or because only PubMed was searched; further analysis of their process is difficult to perform because they did not predefine patient, intervention, comparison, outcome, and setting criteria.

Ad Hoc Instruments

The most common type of patient-reported outcome measure identified in this review is the ad hoc instrument created to be used only for a specific study. These instruments are composed of questions that often make sense for the functional evaluation but have not undergone any formal development and validation processes; consequently, one cannot be confident in their validity and reliability. The Biographical Questionnaire for Transsexuals and Transvestites is an extensive ad hoc instrument composed of items on sociodemographic information, psychosocial aspects and sexuality, and function-oriented items (e.g., frequency of orgasm, lubrication, and satisfaction with the results of gender confirmation surgery). It has been regularly used as part of the patient assessment in many centers, but is without formal development and validation. Both of the studies using the Biographical Questionnaire for Transsexuals and Transvestites59,73 included selected parts of the instrument only, focusing on sexual function and experiences. Furthermore, the fact that different measures are used in different studies is an additional obstacle in the comparison of results; this systematic review identified a total of 64 different ad hoc instruments used to evaluate gender confirmation surgery in 67 separate studies,68,70–73,99 thus ruling out the opportunity to confidently analyze the results and conclusions of these in relation to one another.

Generic, Psychosocial, and Psychiatric Instruments

In addition to diagnosis-specific patient-reported outcome measures, generic instruments and instruments strictly evaluating psychosocial factors or psychiatric morbidity can be considered applicable in any given population and thus also in transgender patients. Just like the previously mentioned diagnosis-specific instruments, the generic, psychosocial, and psychiatric instruments are too nonspecific for evaluating the outcome of surgery without the bias of other factors. However, they report on quality of life and psychological well-being, which are likely to be influenced by gender confirmation surgery, as it is intended to reduce gender dysphoria. Consequently, generic and psychologically focused instruments are meaningful in the total evaluation of gender dysphoria, but not sensitive enough to detect the specific changes occurring after gender confirmation surgery. As stated by Pusic et al., “condition- or surgery-specific measures allow greater responsiveness to intervention-related change when compared with generic measures.”78

Function-Specific Instruments Valid in Other Patient Groups

Instruments that are formally developed and validated in other populations were used to investigate appearance; sexual function and desire; urinary, bowel, and genital function; and results after breast augmentation; however, none of these has a confirmed validity in the transgender population and consequently none can be used to reliably evaluate outcome of gender confirmation surgery. Patient-reported outcome measures that are specific to function or diagnosis generate reliable data only in the populations for which they have been developed; therefore, further psychometric testing is needed before confidently using these in other populations. Whether or not an instrument is valid in a specific population is not always obvious. More specifically, an assessment has to be made of whether the population in which the instrument previously has been tested is similar enough to the population in question.

Function-Specific Ad Hoc Instruments with Some Formal Development/Validation

Five of the encountered function-specific ad hoc instruments that were explicitly developed to be used in transgender patients (Sammons,74 Wierckx et al.,75 Costantino et al.,79 Morrison et al.,80 and Melloni et al.69) have undergone some formal development or validation process (e.g., qualitative interviews with transgender patients to define the included items, slightly increasing their validity, reliability, or responsiveness). However, these processes are scarce compared with the review criteria suggested by the Scientific Advisory Committee of the Medical Outcomes Trust, an entity created to independently identify and review health status and quality-of-life instruments; these review criteria include clearly specified methods for item generation, item reduction, and psychometric evaluation.76

Diagnosis-Specific Instruments

The diagnosis-specific patient-reported outcome measures identified in this review were the Utrecht Gender Dysphoria Scale and the Body Image Scale, of which the Body Image Scale was the most frequently used. These instruments are designed to evaluate the extent of gender dysphoria, which is influenced by many factors other than just gender confirmation surgery (e.g., hormonal therapy and psychotherapy). Thus, despite being developed specifically for transgender patients, they are not, on their own, suitable instruments for evaluating surgery. The Utrecht Gender Dysphoria Scale focuses on one’s perception of life in relation to gender (e.g., “I hate menstruating because it makes me feel like a girl” and “I feel unhappy because I have a male body”). Only a few of the items focus on the attitude with regard to specific organs (e.g., “I hate having breasts” and “I dislike having erections”); however, none of these concern reconstructed organs (e.g., the vagina following vaginoplasty), reducing the benefit of the Utrecht Gender Dysphoria Scale when evaluating gender confirmation surgery. The Body Image Scale records the patient’s feelings (Likert scale ranging from 1 to 5, where 1 = very satisfied and 5 = very dissatisfied) about his or her appearance with regard to specific body parts (e.g., face, hands, and chest) and general characteristics (e.g., weight and stature). It evaluates the patient’s general perception of areas that are affected in gender confirmation surgery but does not investigate these feelings any further; in addition, it records factors that are not influenced by gender confirmation surgery, such as body hair and height.

Research Limitations

A criterion of greater than or equal to 10 included patients was set to exclude patient-reported outcome measures that had only been used in case reports or small patient groups; in addition, only instruments with three or more items were included. Instruments with only one or two items were considered too small to qualify for this review. However, many studies report on one or two questions to evaluate patient outcomes; therefore, our criteria may theoretically have eliminated items of interest.
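The two size thresholds described above amount to a simple filter. The following sketch is illustrative only: the function name `meets_size_criteria` and the example records are hypothetical and do not correspond to any study in the review.

```python
MIN_PATIENTS = 10  # greater than or equal to 10 included patients
MIN_ITEMS = 3      # instruments with three or more items


def meets_size_criteria(n_patients, n_items):
    """Return True if a study/instrument pair passes both thresholds."""
    return n_patients >= MIN_PATIENTS and n_items >= MIN_ITEMS


# Hypothetical examples, not taken from the reviewed literature.
candidates = [
    {"label": "case series, 4 patients, 12-item questionnaire", "n_patients": 4, "n_items": 12},
    {"label": "cohort, 50 patients, 2 satisfaction questions", "n_patients": 50, "n_items": 2},
    {"label": "cohort, 30 patients, 10-item questionnaire", "n_patients": 30, "n_items": 10},
]

kept = [c["label"] for c in candidates
        if meets_size_criteria(c["n_patients"], c["n_items"])]
print(kept)
```

The second hypothetical record shows the limitation noted above: a large study that reports only one or two questions is excluded even though its items might be of interest.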

Further Research

The lack of valid patient-reported outcome measures, sensitive enough to evaluate gender confirmation surgery, has motivated our research team to develop new instruments focusing on patients’ postoperative psychological and physical outcomes. This systematic review is the first step of our research plan to define, validate, and adopt new patient-reported outcome measures for daily clinical practice.


Patient-reported outcome measures are instruments that measure patients’ symptoms and feelings without external bias, and are advantageous for assessing subjective health issues, such as gender dysphoria, from a patient’s perspective. This systematic review highlights the absence of patient-reported outcome measures that are valid for the transgender population and sensitive enough to evaluate gender confirmation surgery without the influence of other gender-confirming interventions, such as hormonal therapy and psychotherapy. In current clinical practice and research studies, the use of patient-reported outcome measures includes instruments that are reliable in a transgender population but with low responsiveness to surgical intervention, or instruments that are specific for factors affected by surgery, but without proven validity in transgender patients. The latter group (i.e., ad hoc instruments and instruments that are valid in other populations) is used to assess functional outcome after gender confirmation surgery. Both types of instrument in this group may include relevant items but have not undergone formal psychometric testing in a transgender population and, as a consequence, are without confirmed validity and reliability for this specific purpose. Basing research on instruments without confirmed validity inevitably decreases the validity of the study itself; in fact, all previous research evaluating patient-reported outcomes after gender confirmation surgery can be considered to have a low level of evidence. In addition, the high number of unreliable instruments used in the current literature not only causes uncertain results but also precludes dependable comparison between different studies. To obtain valid patient-reported outcome measures, specific for evaluating the results of gender confirmation surgery, development of new instruments or adaptation and psychometric testing of existing instruments is needed.


