Historically, the evaluation of aesthetic outcome after breast surgery has been highly subjective. Objective evaluation of surgical results after breast surgery is necessary if we are to perform critical analyses and refine reconstructive techniques. Over the last decade, patient-reported outcome instruments, with greater focus on patient satisfaction and quality-of-life after oncologic treatment and reconstruction of the breast, have been developed and validated in several languages.1,2 The use of these questionnaires has become increasingly popular.
The Breast-Q (BQ) is one such patient-based questionnaire developed to evaluate outcome after breast surgery. It has 5 different modules (augmentation, reduction/mastopexy, mastectomy, reconstruction, and breast-conserving therapy), and includes 2 domains: health-related quality-of-life and patient satisfaction.3,4
We recently described a new instrument for outcome assessment: the Telemark Breast Score (TBS) (Fig. 1). This instrument is based on 2D photographs taken from 3 standard views, and assesses surgical outcome in terms of volume (size), shape (upper pole, ptosis and aesthetic proportion), and symmetry. This tool enables the professional observer to transform a subjective impression into a reproducible objective score. Data from test–retests, previously reported by us, have shown the TBS to be reliable for assessment after breast-conserving surgery and microsurgical reconstruction after mastectomy.5 Until now, however, the TBS has not been validated against the patient’s own opinion regarding the surgical result.
The aim of this study was to evaluate the external validity of the TBS for patients who had undergone secondary breast reconstruction after mastectomy, using the deep inferior epigastric perforator (DIEP) flap. The study was based on 2 BQ domains (BQ1a–p and BQ3a–g) specifically addressing patient satisfaction with their breasts and with the general outcome. The Local Ethics Committee (Norway) approved the study (REK nr: 2013/1107), and it was registered at ClinicalTrials.gov with ID number NCT02853227.
MATERIALS AND METHODS
The photographs of 31 consecutive patients who were operated on and irradiated for breast cancer between 2008 and 2012, and who subsequently underwent delayed microsurgical breast reconstruction (DIEP flap) at Telemark Hospital, Norway, were eligible for analysis. Primary reconstructions with DIEP are not routine in the Scandinavian health care program for breast cancer patients. For this reason, we invited patients with microsurgical secondary DIEP reconstructions to participate in the study. All patients agreed to participate. Evaluation of the photographs was performed by an independent experienced plastic surgeon who had previously participated in the test–retests mentioned above.5 The independent plastic surgeon was blinded to the outcome of the BQ assessments. Aesthetic assessment of the photographs using TBS was done 2 weeks after the photographs were taken. Demographic data of the study group are shown in Table 1.
The validated Norwegian version of the BQ module for reconstructive surgery was used to assess patient satisfaction with outcome during the winter of 2015. Thirty-one patients returned the questionnaire a mean of 2 years after DIEP flap reconstruction (range: 1–4 years). The photographs were taken and the BQ was completed by the patients on the same day.
We used the subsets of questions included in 2 of the BQ domains in the established BQ module for breast reconstruction, “satisfaction with breasts” (BQ domain 1, BQ1) and “satisfaction with outcome” (BQ domain 3, BQ3), because these were considered most suitable for the purpose of external validity of the TBS (Tables 2, 3).
Incomplete replies to BQ1 were obtained from 3 patients, and to BQ3 from 2 patients.
TBS validity was evaluated by examining the concordance between assessments from TBS and BQ, expressed as Svensson’s Measure of Disorder (D) and Monotonic Agreement (MA), presented with 95% jackknife confidence intervals6 (Table 4).
The D-value indicates the proportion of disordered paired observations (surgeon and patient) among all possible combinations of pairs and can assume values between 0 and 1. If D = 0, all pairs are concordant, and if D = 1, all pairs are discordant. MA, which is a function of the measure D (MA = 1 − 2D), indicates the difference between the proportion of ordered pairs and the proportion of disordered pairs, ranging between −1 and 1. If MA = −1, all pairs are disordered, and if MA = 1, all pairs are ordered.
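The definitions above can be illustrated with a minimal Python sketch. It follows only the verbal description given here (a pair of patients is disordered when the surgeon and the patient rank them in opposite directions) and is not necessarily the exact estimator of Svensson6 or the code used in this study. The function names and the example ratings are hypothetical, and the treatment of ties as "not disordered" is an assumption.

```python
from itertools import combinations


def measure_of_disorder(surgeon, patient):
    """Proportion of disordered pairs (D) among all pairs of patients.

    Assumption: a pair is disordered only when the two raters rank the
    two patients in strictly opposite directions; ties are not disordered.
    """
    if len(surgeon) != len(patient):
        raise ValueError("ratings must be paired")
    pairs = list(combinations(range(len(surgeon)), 2))
    if not pairs:
        raise ValueError("need at least two paired observations")
    disordered = sum(
        1
        for i, j in pairs
        if (surgeon[i] - surgeon[j]) * (patient[i] - patient[j]) < 0
    )
    return disordered / len(pairs)


def monotonic_agreement(d):
    """MA = 1 - 2D, as defined in the text."""
    return 1 - 2 * d


# Hypothetical ordinal ratings for five patients (not study data).
tbs_scores = [1, 2, 3, 4, 5]
bq_scores = [1, 3, 2, 4, 5]
d = measure_of_disorder(tbs_scores, bq_scores)
print(d, monotonic_agreement(d))
```

In this hypothetical example, one of the 10 possible pairs is disordered, so D is about 0.1 and MA about 0.8.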
No established thresholds exist for D and MA that categorize the degree of validity. The optimum values are 0 for D and 1 for MA. When raters make assessments on 2 instruments with very similar operationalizations, one expects D to be very close to 0 and MA to be close to 1, which means that almost all pairs are concordant.
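As an illustration of how a jackknife confidence interval for MA might be obtained, the following Python sketch uses the standard leave-one-out jackknife standard error with a normal approximation. The exact jackknife variant used in this study is not described, so this is an assumption, as are the function names and the example data; Python is used here only for illustration.

```python
from itertools import combinations
from math import sqrt


def monotonic_agreement(x, y):
    """MA = 1 - 2D, with D the proportion of disordered pairs."""
    pairs = list(combinations(range(len(x)), 2))
    disordered = sum(
        1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )
    return 1 - 2 * disordered / len(pairs)


def jackknife_ci(x, y, z=1.96):
    """Return (estimate, lower, upper) for MA.

    Assumption: leave-one-out jackknife standard error combined with a
    normal approximation (z = 1.96 for a 95% interval). The interval is
    not truncated to the range [-1, 1].
    """
    x, y = list(x), list(y)
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    estimate = monotonic_agreement(x, y)
    # Recompute MA with each patient left out in turn.
    leave_one_out = [
        monotonic_agreement(x[:k] + x[k + 1:], y[:k] + y[k + 1:])
        for k in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    se = sqrt((n - 1) / n * sum((v - mean_loo) ** 2 for v in leave_one_out))
    return estimate, estimate - z * se, estimate + z * se


# Hypothetical ratings (not study data).
print(jackknife_ci([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

An interval whose lower bound lies above zero would indicate a preponderance of ordered pairs under these assumptions.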
All data were analyzed using R for Mac.7
RESULTS
Total Scores and Median Scores
The level of concordance between the sum score of TBS and the sum scores of BQ1 and BQ3 is shown in Table 5. When the TBS sum (∑) score was compared with the BQ1 ∑ score and with the BQ3 ∑ score, the proportions of disordered pairs were D = 0.48 and D = 0.43, respectively. The MA values for these comparisons were 0.04 and 0.14, respectively. Because the confidence intervals include zero, it is not possible to say with statistical certainty that the proportion of ordered pairs was higher than the proportion of disordered pairs. A lower proportion of disordered pairs was seen when comparing median scores than when comparing sum scores.
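The relationship between the reported D and MA values can be checked with a short Python sketch using MA = 1 − 2D. The helper names are hypothetical, and the interval bounds in the last lines are invented for illustration only, because the actual confidence limits are given in Table 5 and not in the text.

```python
def ma_from_d(d):
    """Monotonic agreement from the measure of disorder: MA = 1 - 2D."""
    return 1 - 2 * d


def interval_includes_zero(lower, upper):
    """True when a confidence interval for MA covers zero."""
    return lower <= 0 <= upper


# D-values reported in the text for the sum score comparisons.
for label, d in (("TBS sum vs BQ1 sum", 0.48), ("TBS sum vs BQ3 sum", 0.43)):
    print(label, round(ma_from_d(d), 2))

# Hypothetical interval bounds (not taken from Table 5).
print(interval_includes_zero(-0.2, 0.3))
print(interval_includes_zero(0.1, 0.5))
```

The two computed MA values round to 0.04 and 0.14, in agreement with the values reported above.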
TBS Item Median Score versus BQ1 and BQ3 Median Scores
The proportion of disordered pairs (D) ranged from 0.16 to 0.34 when comparing TBS scores with the median score for domain 1 of BQ (Table 6). The lowest proportion of disordered pairs was seen when comparing items relating to “upper pole” in the TBS with the median score of the BQ item “patient satisfaction with their breast in general” in BQ1. The highest discordance was seen when comparing items related to ptosis.
The proportion of disordered pairs ranged from D = 0.09 to 0.21 when comparing median scores for TBS items with those for BQ3 questions (Table 7). The lowest discordance was seen when comparing items relating to overall aesthetics of the left breast with how the patient experienced the results of breast reconstruction in the BQ. The highest discordance was seen when comparing items related to ptosis.
Overall Concordance between All TBS Items and Items in BQ1 and BQ3
Concordance between TBS items and items in BQ1 and BQ3, respectively, was analyzed (data not shown). Table 6 shows each TBS item compared with the BQ1 or BQ3 items that gave the lowest proportion of discordant pairs (D-value).
Concordance between TBS items and BQ1 items regarding patient satisfaction with outcome showed the lowest proportion of discordant pairs when comparing item size with BQ1h (satisfaction with softness of breast), item size with BQ1l (satisfaction with how the breast feels to touch), followed by “upper pole right breast” with BQ1i (satisfaction with how similar the breasts are in size), and “upper pole left breast” with BQ1h (satisfaction with softness of breast). For comparisons with the other 5 TBS items, the D-values varied between 0.15 and 0.29 for the BQ item showing the lowest proportion of discordant pairs (Table 8).
Comparison of TBS items with BQ3 items regarding how the patient experienced the outcome of breast reconstruction revealed that, for all TBS items, the lowest proportion of discordant pairs was seen when compared with the item BQ3a (reconstruction is a much better option than to not have a breast). When compared with “ptosis-left breast,” BQ3d showed the same proportion of discordance as BQ3a. In these comparisons, D-values varied from 0.01 to 0.11, that is, a relatively small proportion of disordered pairs. The highest proportion of disordered pairs was seen when comparing TBS items with items BQ3e, BQ3f, and BQ3g, which describe patient expectation (Table 8).
DISCUSSION
This study investigates the external validity of the TBS by comparing answers from the TBS questionnaire with those from 2 selected groups of questions from the BQ questionnaire. Data were analyzed at 3 levels: sum score, median score, and individual TBS and BQ items. The lowest degree of concordance was seen when comparing TBS sum score estimates with the sum scores of BQ items, followed by comparison of the median scores of the 2 instruments.
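The difference between the sum score and median score levels can be illustrated with a small Python sketch. The item scores below are hypothetical and the helper name is an assumption; the sketch only shows how one patient's item ratings might be condensed in the two ways before concordance is computed.

```python
from statistics import median


def summarize(item_scores):
    """Condense one patient's item ratings into a sum and a median score."""
    return {"sum": sum(item_scores), "median": median(item_scores)}


# Hypothetical item ratings for two patients on one instrument (not study data).
patient_a = [4, 4, 3, 4, 1]
patient_b = [3, 3, 3, 3, 3]

print(summarize(patient_a))
print(summarize(patient_b))
```

In this hypothetical case, patient A has the higher sum score as well as the higher median, but a single low item rating pulls the sum close to that of patient B while leaving the median unchanged, which shows how the two summaries can respond differently to the same ratings.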
The lowest degree of concordance seen in the sum score comparisons level could be expected because the instruments are not primarily designed to measure the same variables. This may explain why sum scores are more divergent than median scores at the item level. When comparing median scores, it was seen that the proportion of disordered pairs was lower for BQ3 than for BQ1. Because the sample size was small, it was not statistically possible to confirm differences in D-values between items from BQ1 and BQ3.
In general, there was a lower proportion of disordered pairs for BQ3 than for BQ1 in comparison with TBS. A possible explanation for this is that the patient and the surgeon weigh the importance of certain items differently in their subjective assessment of surgical outcome. Should the patient receive improved information on outcome, even higher concordance might be obtained for the questions on expectation (BQ3e–g). The definition of ptosis remains a matter of discussion and even BQ3 D questions on ptosis are difficult to compare. It is not surprising, therefore, that less concordance was seen for this item.
The relatively low concordance for some items in this study, compared with results reported from other validity studies, may be explained by the fact that even though the 2 instruments investigate, to some extent, the same underlying variables, these variables may be operationalized differently from the professional and nonprofessional points of view.
Although the surgeon and the patient probably take different factors into account when they answer the TBS and BQ questionnaires, both instruments seem to provide valuable information when evaluating surgical outcome in a consistent manner.
After comparison of all TBS items and BQ domain items, we found only 6 pairs for which we could not statistically confirm that the pairs were more concordant than discordant.
This means that for all other comparisons there was a preponderance of concordant pairs, even though the degree varied, indicating that assessments from the 2 instruments follow each other.
As stated previously,1,8–10 the main disadvantage of a clinical assessment tool is that what is considered to be a successful result may not concur with the patients’ opinion. Furthermore, when measuring subjective parameters bias may be considerable. This may be seen as large interobserver variability not only between clinical observers, but also between clinicians and patients.
BQ is a validated instrument reflecting the patient’s point of view, and TBS is a clinical tool in which questions are asked and answered from a professional perspective. Even though BQ and TBS are not exactly comparable in the formulation of their questions, the results follow each other.
In conclusion, the results of this study suggest that the TBS can be recommended as a valid tool for the assessment of outcome after breast reconstruction.
The authors wish to acknowledge Anna Maria Kling for help with the statistical analyses and the Telemark Hospital for financial support in conducting this study.
1. Pusic AL, Lemaine V, Klassen AF, et al. Patient-reported outcome measures in plastic surgery: use and interpretation in evidence-based medicine. Plast Reconstr Surg. 2011;127:1361–1367.
2. Chen CM, Cano SJ, Klassen AF, et al. Measuring quality of life in oncologic breast surgery: a systematic review of patient-reported outcome measures. Breast J. 2010;16:587–597.
3. Pusic AL, Klassen AF, Scott AM, et al. Development of a new patient-reported outcome measure for breast surgery: the BREAST-Q. Plast Reconstr Surg. 2009;124:345–353.
4. Cano SJ, Klassen AF, Scott AM, et al. The BREAST-Q: further validation in independent clinical samples. Plast Reconstr Surg. 2012;129:293–302.
5. Begic A, Stark B. The Telemark Breast Score: a reliable method for the evaluation of results after breast surgery. Plast Reconstr Surg. 2016;138:390e–400e.
6. Svensson E. Concordance between ratings using different scales for the same variable. Stat Med. 2000;19:3483–3496.
7. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2015. https://www.R-project.org/
8. Alderman A, Chung K. Measuring outcomes in aesthetic surgery. Clin Plastic Surg. 2013;40:297–304.
9. Eriksen K, Nordstrand Lindgren E, Olivecrona H, et al. Evaluation of volume and shape of breasts: comparison between traditional and three-dimensional techniques. J Plast Surg Hand Surg. 2011;45:14–22.
10. Ching S, Thoma A, McCabe RE, et al. Measuring outcomes in aesthetic surgery: a comprehensive review of the literature. Plast Reconstr Surg. 2003;111:469–480; discussion 481.