Sun, Clement S. MS*†; Cantor, Scott B. PhD‡; Reece, Gregory P. MD†; Crosby, Melissa A. MD†; Fingeret, Michelle C. PhD†§; Markey, Mia K. PhD*¶
Women planning to undergo breast reconstruction may have several medically appropriate surgical options.1–3 It can be difficult for patients to choose a reconstruction strategy because trade-offs must be made between several issues, including quality of life, cost (in terms of out-of-pocket expenses), and time lost to reconstruction and associated medical procedures. To decide on a reconstruction strategy, the patient should consider all possible outcomes of breast reconstruction and the probabilities that those outcomes will occur. Furthermore, she must evaluate the value of each outcome. However, the number of possible breast reconstruction outcomes is large,1 thereby making it infeasible to evaluate each possibility. Most physicians spend between 46 and 60 minutes during the initial consult alone.4
Multiattribute utility theory (MAUT)5,6 enables the quantification of outcomes given the preferences of the patient. This quantification can then be used to assist with medical decision making. We are unaware of any studies that apply this theory in breast reconstruction decision making; however, it has been applied in other areas of healthcare such as prostate cancer treatment and neonatal intensive care.7–13 Yet, in all cases, MAUT was applied without evaluating its performance in modeling preferences. If MAUT accurately models women’s preferences in the breast reconstruction decision-making setting, it may be employed by future computational decision support systems to aid in patient decision making about breast reconstruction. Although this study was conducted on healthy participants, the findings may be extended to all patients who have not been informed of a diagnosis of breast cancer. The application of MAUT may enable plastic surgeons to quickly and efficiently identify breast reconstruction choices that are more preferred by the patient.
MATERIALS AND METHODS
Defining an Outcome: The 7 Attributes
An outcome is composed of a set of attributes and associated values. The BREAST-Q is a validated and commonly employed instrument for assessing satisfaction and quality-of-life outcomes of breast reconstruction. The BREAST-Q assesses 5 patient-reported outcome measures (or attributes): (1) satisfaction with one’s breasts, (2) psychological well-being, (3) well-being of the chest, (4) well-being of the abdomen, and (5) sexual well-being.14–17 For each measure, the BREAST-Q yields a raw point total, which may be converted into a score (between 0 and 100) using a proprietary scoring algorithm. The BREAST-Q has a number of breast reconstruction variants. As the participants in this study had never had breast cancer or mastectomy, we deployed the preoperative version.
In addition to the BREAST-Q measures, we considered the attributes of (6) cost in terms of out-of-pocket expenses and (7) time lost to reconstruction or associated medical procedures. Out-of-pocket cost is measured in US dollars and time is measured in days.
The study investigated the breast reconstruction outcome preferences of 36 women between May 1 and July 18, 2013, in the Seattle, WA, Austin, TX, and Houston, TX, areas. Participants in this study were healthy human volunteers who did not have a history of breast cancer. We obtained Institutional Review Board approval from The University of Texas MD Anderson Cancer Center to conduct this research. The protocol title is recorded as “Validating a Multiattribute Utility Function for Breast Reconstruction Decision-Making.” The MD Anderson protocol ID number is PA12-1179, approved on February 28, 2013. A similar protocol was separately submitted to the Institutional Review Board at The University of Texas at Austin, which determined that it was not human subjects research because no identifying or health information was to be collected.
Structured Preference Elicitation Consultation
Individual consultations were conducted in person with the first author. During the structured consultation, the participant performed a series of tasks: (1) assess BREAST-Q state and clarify the attributes, (2) assess risk tolerance for cost and time, (3) clarify preferences for all 7 attributes, and (4) rank 10 sets of 3 standardized hypothetical breast reconstruction outcomes.
Evaluating Preference Model Performance
We quantified the performance of each preference model in terms of its consistency, correctness resolution, and error resolution. The consistency of a model is the extent to which its rankings of the breast reconstruction outcomes matched the rankings of the participants. At the end of the consultation, we asked each participant to rank 10 sets of randomly generated hypothetical breast reconstruction outcomes in order of preference from best to worst. When generating these outcomes, we limited the possible values to between the best and worst values as given in Table 1. Each set contained 3 outcomes (Fig. 1).18 The participants were not shown the model rankings to avoid bias. Within each set, we compared the rankings of each pair of outcomes, 3 pairs per set for a total of 30 pairs. Ideally, a model would be 100% consistent with the participant, meaning that the model agreed with all the participant’s rankings. By chance alone, one would expect a consistency of 50%. Hence, any consistency greater than 50% demonstrates some ability to accurately model a participant’s preferences. A model that is more consistent better represents patients’ preferences about breast reconstruction and produces rankings that are more likely to be useful for supporting patient decision making.
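The pairwise consistency described above can be illustrated with a minimal sketch. The function name, the data layout (ranks with 1 as best, one list per set), and the example values are assumptions made for illustration; they are not taken from the study's software. Ties in u-values are counted here as the model preferring the second outcome of the pair, which is also an assumption.

```python
from itertools import combinations

def pairwise_consistency(participant_ranks, model_u):
    """Fraction of outcome pairs ordered the same way by participant and model.

    participant_ranks: one list per set, holding the participant's rank of
                       each outcome (1 = best).
    model_u:           one list per set, holding the model's u-value of
                       each outcome (higher = better).
    """
    agree = 0
    total = 0
    for ranks, us in zip(participant_ranks, model_u):
        # A set of 3 outcomes yields 3 pairs; 10 sets yield 30 pairs.
        for i, j in combinations(range(len(ranks)), 2):
            participant_prefers_i = ranks[i] < ranks[j]
            model_prefers_i = us[i] > us[j]
            agree += participant_prefers_i == model_prefers_i
            total += 1
    return agree / total

# Hypothetical data: 2 sets of 3 outcomes each.
ranks = [[1, 2, 3], [2, 1, 3]]
u_values = [[0.80, 0.55, 0.30], [0.70, 0.60, 0.20]]
consistency = pairwise_consistency(ranks, u_values)  # 5 of 6 pairs agree
```

In this example the model disagrees with the participant on one pair in the second set, so 5 of the 6 pairs agree.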
Each preference model expresses the value of an outcome as a u-value, which is a number between 0 and 1. The higher the u-value, the better the outcome. The u-value may be thought of as a preference probability. The correctness resolution is the u-value (probability of preference) difference with which a model correctly differentiates 2 outcomes. For instance, if a model correctly orders 2 outcomes with u-values of 0.30 and 0.35, the model has a correctness resolution of 0.05. A correctness resolution of <0.03 is preferred because this amounts to a difference of less than 3% between outcomes, a difference to which people are usually indifferent.19,20
The error resolution is the u-value difference with which a model incorrectly differentiates 2 outcomes. In other words, if a model incorrectly orders 2 outcomes of 0.30 and 0.35, then the error resolution is 0.05. The error resolution should ideally be less than or equal to the correctness resolution.
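As a rough illustration of the two resolution measures, the sketch below takes pairs of model u-values in which the first element belongs to the outcome the participant ranked higher. It returns the smallest correctly resolved difference and the largest incorrectly resolved difference. The function name and the pair layout are hypothetical.

```python
def resolutions(pairs):
    """Return (minimum correctness resolution, maximum error resolution).

    pairs: iterable of (u_preferred, u_other), where u_preferred is the
           model's u-value for the outcome the participant ranked higher.
    A value of None means no pair fell into that category.
    """
    correct = [a - b for a, b in pairs if a > b]  # model agrees
    errors = [b - a for a, b in pairs if a < b]   # model disagrees
    min_correct = min(correct) if correct else None
    max_error = max(errors) if errors else None
    return min_correct, max_error

# Hypothetical pairs: two correct orderings and one incorrect ordering.
min_c, max_e = resolutions([(0.35, 0.30), (0.80, 0.55), (0.60, 0.70)])
```

Here the smallest correctly ordered gap is about 0.05 and the largest misordered gap is about 0.10.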
Assessing Preference Weights for Attributes
An individual may weigh one attribute more or less heavily than the others. For example, a breast cancer patient who has substantial caregiver responsibilities for family members might value time over out-of-pocket cost while a patient with fewer such demands on her time might value cost over time. We assessed the preference weight, k, for each attribute, using the von Neumann-Morgenstern standard gamble.21 The bigger the k, the more preferred the attribute.
To eliminate the factor of numeracy and reduce the effect of cognitive biases such as anchoring and adjustment22 in obtaining these k values, we employed a probability wheel (AnalyCorp, Stanford, Calif.), which presents the probability as the area of a pie chart slice rather than as a numerical value.19,23
To quantify a participant’s preference weight distribution, we calculated
$$H = \sum_{i=1}^{7} k_i \ln k_i$$
where the ks are normalized such that they sum to one.24 For instance, if the participant places equal weight on cost, time, satisfaction, psychological well-being, chest well-being, abdominal well-being, and sexual well-being, the entropy will equal a minimum of –1.95. If a participant places all the weight on one attribute only, such as satisfaction with her breasts, the entropy will equal a maximum of zero.
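The two boundary values quoted above can be checked numerically. The sketch assumes the measure is the sum of k·ln(k) over the normalized weights, a sign convention inferred from the stated minimum of –1.95 and maximum of zero. The function name is hypothetical.

```python
import math

def weight_entropy(k):
    """Sum of k_i * ln(k_i) over weights normalized to sum to one.

    Terms with zero weight contribute nothing (the limit of k*ln(k) is 0).
    """
    total = sum(k)
    normalized = [w / total for w in k]
    return sum(w * math.log(w) for w in normalized if w > 0)

equal_weights = weight_entropy([1, 1, 1, 1, 1, 1, 1])   # -ln(7), about -1.95
single_weight = weight_entropy([0, 0, 1, 0, 0, 0, 0])   # all weight on one attribute
```

With 7 equally weighted attributes the result is –ln 7, which rounds to –1.95; with all weight on one attribute it is zero.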
Modeling Patient Preferences with Multiattribute Utility Theory
A preference model includes 2 components: (1) the risk model for each attribute and (2) the multiattribute utility function that performs trade-offs among attributes and calculates the summary value. We considered 3 risk models and 3 multiattribute utility functions. We integrated the patient’s risk attitude and preferences into 9 preference models.
Attribute values such as time and cost must first be converted into a u-value. We considered 3 types of risk models: (1) risk neutral, (2) risk averse, and (3) hybrid risk averse-risk preferring, or sigmoidal (Fig. 2). For instance, with respect to out-of-pocket reconstruction cost, a risk neutral participant would place a value of $1000 on a 50-50 chance of a reconstruction that costs $2000 and a reconstruction that costs $0. A risk-averse participant would value the same gamble at more than $1000; that is, she would accept a certain cost above $1000 to avoid the gamble. A risk-preferring participant would value the gamble at less than $1000.
The risk neutral model is simply a straight line representing the expected value or average. For the risk-averse modeling of out-of-pocket cost, we chose the exponential curve
[Equation not recovered: exponential utility curve for out-of-pocket cost, u_cost(x)]
where x is some cost value, a and b are chosen such that u_cost(x_worst) = 0 and u_cost(x_best) = 1, and the “perceived monetary wealth”25 is obtained by determining a maximum value, x, such that the decision maker is indifferent between receiving nothing and a gamble with a 50% chance at x and a 50% chance at –x/2. Similarly, the risk-averse modeling of time used the same exponential curve
[Equation not recovered: exponential utility curve for time, u_time(t)]
where t is some time value, a and b are chosen such that u_time(t_worst) = 0 and u_time(t_best) = 1, and the perceived time wealth is obtained in the same fashion as for cost.
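The exact curve used in the study is not reproduced in this text, so the following is only a sketch under a stated assumption: a common exponential form u(x) = a − b·exp(x/ρ), where ρ stands in for the perceived wealth and a and b are fixed by the two boundary conditions given above. The function names and the example values (cost range of $0 to $2000, ρ = 1000) are hypothetical.

```python
import math

def make_exponential_utility(x_best, x_worst, rho):
    """Risk-averse utility for an attribute where lower values are better.

    ASSUMED form (not confirmed by the source): u(x) = a - b * exp(x / rho).
    a and b are chosen so that u(x_worst) = 0 and u(x_best) = 1.
    """
    e_worst = math.exp(x_worst / rho)
    e_best = math.exp(x_best / rho)
    b = 1.0 / (e_worst - e_best)
    a = b * e_worst

    def u(x):
        return a - b * math.exp(x / rho)

    def certainty_equivalent(target_u):
        # Certain attribute value whose utility equals target_u.
        return rho * math.log((a - target_u) / b)

    return u, certainty_equivalent

# Hypothetical example: cost between $0 (best) and $2000 (worst).
u_cost, ce_cost = make_exponential_utility(x_best=0.0, x_worst=2000.0, rho=1000.0)
u_mid = u_cost(1000.0)   # utility of a certain $1000 cost
ce = ce_cost(0.5)        # certain cost equivalent to the 50-50 gamble
```

Under this assumed form, a certain cost of $1000 has a u-value above 0.5, and the certain cost judged equivalent to the 50-50 gamble between $0 and $2000 lies above $1000, which matches the description of a risk-averse participant. The same function applies to time by passing days instead of dollars.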
As we know the participants’ BREAST-Q states, we chose the logarithmic curve for the satisfaction, psychological well-being, chest well-being, abdominal well-being, and sexual well-being attributes. For instance, for psychological well-being, the model was
[Equation not recovered: logarithmic utility curve for psychological well-being]
where ψ represents a psychological well-being value, and a and b are chosen such that
[Equation not recovered: conditions that determine a and b]
As an alternative to the risk averse and risk neutral models, we also considered a hybrid risk averse-risk preferring or sigmoidal model for the satisfaction, psychological, chest, abdominal, and sexual well-being attributes. The BREAST-Q proprietary scoring algorithm provides this risk model by producing a score between 0 and 100, given the raw point total of these attributes. We treated the BREAST-Q scores, as percentages, as the u-values.
Multiattribute Utility Functions
There are several candidate multiattribute utility functions that utilize participant preferences for attributes to produce an overall u-value for a given outcome. This overall u-value is, in effect, a single summary value for that outcome. Due to the number of attributes, it was only feasible to evaluate 3 functions because others require exponentially more weight assessments (eg, the multilinear requires 126). These functions were the multiplicative,5,6,11 additive,5,6,11,19,26 and power additive.27 Each would require assessment of 7 preference weights. The power additive function requires an eighth risk tolerance assessment.27
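To make the combination step concrete, the sketch below implements the additive function and the multiplicative function in their standard textbook (Keeney-Raiffa) forms. The source text does not show the exact expressions the authors used, so treat these forms, the function names, and the example weights as assumptions. The power additive function is omitted because its form is not given here. The last helper reproduces the count of 126 quoted above, assuming it corresponds to 2^n − 2 for n = 7 attributes.

```python
import math

def additive_utility(k, u):
    """Weighted sum of attribute u-values; assumes the weights k sum to 1."""
    return sum(ki * ui for ki, ui in zip(k, u))

def scaling_constant(k, iterations=200):
    """Nonzero root K > -1 of 1 + K = prod(1 + K * k_i), found by bisection.

    Returns 0.0 when the weights sum to (nearly) 1, where the
    multiplicative form collapses to the additive one.
    """
    s = sum(k)
    if abs(s - 1.0) < 1e-4:
        return 0.0
    f = lambda K: math.prod(1.0 + K * ki for ki in k) - (1.0 + K)
    if s > 1.0:
        lo, hi = -1.0 + 1e-12, -1e-7      # root lies in (-1, 0)
    else:
        lo, hi = 1e-7, 1.0                # root lies above 0
        for _ in range(200):
            if f(hi) > 0:
                break
            hi *= 2.0
        else:
            raise ValueError("no scaling constant found for these weights")
    f_lo_positive = f(lo) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def multiplicative_utility(k, u):
    """Standard multiplicative form: (prod(1 + K*k_i*u_i) - 1) / K."""
    K = scaling_constant(k)
    if K == 0.0:
        return additive_utility(k, u)
    return (math.prod(1.0 + K * ki * ui for ki, ui in zip(k, u)) - 1.0) / K

def multilinear_constants(n):
    """Assumed count of weights for the multilinear form: 2**n - 2."""
    return 2 ** n - 2

# Hypothetical weights for 3 attributes that sum to more than 1.
k_example = [0.4, 0.4, 0.4]
K_example = scaling_constant(k_example)
best_case = multiplicative_utility(k_example, [1.0, 1.0, 1.0])
```

With these forms, an outcome at the best level on every attribute receives an overall u-value of 1 and an outcome at the worst level on every attribute receives 0, regardless of whether the weights sum to 1.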
RESULTS
For the 36 participants, the mean age was 51.2 years with a range of 32 to 72. With regard to race, 66.7% were White, 30.6% were Asian, and 2.8% were Black. One White participant was Hispanic (2.8%). Regarding the highest level of completed education, one participant (2.8%) graduated from high school, 4 (11.1%) had a 2-year college degree, 20 (55.6%) had a 4-year college degree, 9 (25%) had a master’s degree, and 2 (5.6%) had a doctoral degree.
The risk attitude of participants for out-of-pocket cost and time varied widely between risk neutral and extreme risk aversion (Figs. 3 and 4, respectively). Risk aversion was stronger and more common with respect to cost than time. Participants had a wide variety of BREAST-Q scores and preferences for attributes. The median time required to complete all sections of the consultation was 46 minutes, with a wide range from 23 minutes to 127 minutes (Table 2).
We examined the consistency of the 9 combinations of risk models and multiattribute utility functions (Table 3 and Fig. 5). The best-performing preference model was the risk-averse multiplicative model, with an average consistency of 78.9%. Although consistencies were lower for the remaining models, particularly the sigmoidal power additive model, the differences were not statistically significant by the Wilcoxon signed-rank test. We achieved a power ≥0.80 for most of the models; the exceptions were the risk neutral models. The risk neutral multiplicative model (post hoc power = 0.78) would require 78 samples, and the additive and power additive models (post hoc power = 0.43) would each require 206 samples, to reach a power of 0.80. For 2 participants, the consistency results were poor, ranging between 33% and 57%. For another 2, however, 100% consistency was achieved.
Minimum correctness resolutions were very small, ranging on average between 0.010 and 0.023, meaning that the models could successfully rank overall breast reconstruction outcomes that differed by 1.0–2.3%. However, maximum error resolutions were large, ranging on average from 0.172 to 0.305, meaning that the models incorrectly ranked outcomes that differed by 17.2–30.5%. Depending on the choice of preference model, the consistency for any one individual could vary by up to 16.7%, with an average of 7.6%. We also examined the number of times that a particular model performed the best (or tied for best) for a given individual. The risk-averse multiplicative model performed the best for 22 participants (Table 3).
We found one factor that might influence consistency: the distribution of preference weights for attributes. There was a significant positive correlation between consistency and the entropy of the preference weight distribution (R = 0.382, P = 0.024). In other words, the more equally a participant valued all the attributes, the less agreement there was between the participant and the preference models. The Spearman rank correlation between education and consistency was not statistically significantly different from zero (R = −0.2956, P = 0.080). However, the power for the given sample size was only 0.5, which suggests that the relationship with education may need to be reassessed in the future with a sample size of at least N = 67. Time needed to complete the consultation did not seem to have an effect on agreement between the participant and preference models (R = −0.10, P = 0.57, power = 0.15, required N = 614).
DISCUSSION
Patients cannot be expected to evaluate every possible outcome in breast reconstruction. Through assessing and modeling their preferences with MAUT, we may computationally perform this vital step in breast reconstruction decision making.
Previous studies have made untested assumptions when applying MAUT in the healthcare setting. We have found that these assumptions, while not incorrect, are too stringent in making decisions about breast reconstruction. Although most people tend to be risk-averse, the results demonstrate that incorporating risk attitude is not critical in breast reconstruction preference modeling. Specifically, individual patients may be assumed to be risk neutral, which eliminates the need for risk attitude assessment, streamlines the consultation, and saves time. Contrary to previous assertions,5,6,11 choice of a specific multiattribute utility function did not contribute to a significant difference in consistency. Furthermore, as the power additive function requires an additional risk tolerance assessment and yields no improved consistency, it can be eliminated as a candidate.
It would be highly unlikely to achieve 100% consistency for every participant. To begin, there is no perfect method for eliciting patient preferences or for modeling them. More importantly, preferences change: over the course of the consultation, several participants mentioned how their preferences were altered as they were forced to think about trade-offs (eg, they began to value out-of-pocket cost less and well-being more). It may be better to consider one’s preferences over several days,26–28 which is possible given that patients can have 2 or more encounters with their surgeon before surgery. In addition, several participants expressed appreciation that this process helped them understand their preferences.
An interesting matter is the high error resolutions that were observed. We assumed that the participants’ rankings of outcomes were without error. However, they might misrank outcomes, by accident or embarrassment, resulting in abnormally large error resolutions. Upon examining several participants’ preferences and rankings, we found several instances that strongly indicated that this was the case. Taking into account the participant’s preferences, we found that the participant’s rankings subjectively did not make sense when they disagreed with the preference models at differences between outcome u-values above 0.10 (eg, reconstruction A was chosen by the participant over reconstruction B, even though B had a u-value 0.10 higher than A).
If we assume that participants incorrectly ordered outcomes with error resolutions >0.10, then consistency improves another 14.8% on average (15.7% for the risk-averse multiplicative model). Therefore, the theoretical consistency may be as high as 94.6%. There are a number of sources that contribute to error in consistency (Fig. 6). We are only able to control for error originating from the preference models (hence this study) and, to a limited extent, preference elicitation.
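One way to read this adjustment is sketched below: disagreements in which the model's u-value gap exceeds 0.10 are attributed to participant error and counted as agreements. This is an interpretation for illustration, not the authors' code. The function name, pair layout, and example values are hypothetical.

```python
def adjusted_consistency(pairs, threshold=0.10):
    """Consistency after attributing large-gap disagreements to the participant.

    pairs: iterable of (u_preferred, u_other), where u_preferred is the
           model's u-value for the outcome the participant ranked higher.
    A pair counts as agreement if the model orders it the same way, or if
    the model disagrees by more than `threshold`.
    """
    pairs = list(pairs)
    agree = sum(1 for a, b in pairs if a > b or (b - a) > threshold)
    return agree / len(pairs)

# Hypothetical pairs: 2 agreements, 1 large-gap and 1 small-gap disagreement.
example_pairs = [(0.35, 0.30), (0.80, 0.55), (0.60, 0.75), (0.50, 0.52)]
raw = adjusted_consistency(example_pairs, threshold=float("inf"))  # no adjustment
adjusted = adjusted_consistency(example_pairs)                      # threshold 0.10

# Arithmetic behind the 94.6% figure quoted above.
theoretical = 78.9 + 15.7
```

In the example, the raw consistency is 2 of 4 pairs and the adjusted consistency is 3 of 4, because only the disagreement with a gap of 0.15 is reclassified.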
The correlation between consistency and preference weight distribution is not surprising. If a decision maker considers fewer attributes, making trade-offs becomes easier. Another way to view this is that the more strongly one feels about certain attributes, the easier it is to make trade-offs than if one were completely indifferent.
No statistically significant correlation was found between consistency and education. However, the observed P-value was small (P = 0.080) despite the limited sample size (power = 0.57). There may be a negative relationship (R = −0.30) between years of education and consistency. It may be valuable to investigate this with a larger sample size; in particular, with more observations from the extremes (only high school education, doctoral degree).
Although this type of analysis may be useful for most women preparing to undergo breast reconstruction, there is a small minority for whom these efforts would be unproductive given that MAUT was inconsistent with roughly 6% of the participants.
Breast oncologists and reconstructive surgeons spend a great deal of time and effort counseling and educating patients regarding decisions about breast reconstruction.4 Encoding preferences in the form of probability through the method demonstrated here is arguably the most effective use of a woman’s time, requiring less than 34 minutes on average. Rather than focusing on the technical details of breast reconstruction and using imprecise language to communicate outcomes, we focused on clarifying what women know best: their own preferences, expressed in terms of probability. We presented an approach for mathematically assessing women’s preferences and evaluated different methods of modeling those preferences to calculate a summary outcome value. For the majority of the participants, the models performed reasonably well in representing their preferences.
Making trade-offs is hard. This issue is exacerbated by the sheer number of possible outcomes, which would overwhelm even the most rational, considerate, and persistent decision maker. The use of MAUT in modeling a patient’s preferences is an important step toward a decision support system that may ease making decisions about breast reconstruction. Further studies should be conducted on patients who have knowledge of their illness. A longitudinal study of change in preferences over time is similarly warranted.