Application of the E-Value to Assess Bias in Observational Research in Plastic Surgery : Plastic and Reconstructive Surgery


Plastic Surgery Focus: Special Topics


Baxter, Natalie B. B.S.E.; Kocheril, Alex P.; Chung, Kevin C. M.D., M.S.

Plastic and Reconstructive Surgery: November 2022 - Volume 150 - Issue 5 - p 1151-1158
doi: 10.1097/PRS.0000000000009624


Observational studies make up nearly 50 percent of the plastic surgery literature and play a major role in shaping clinical practice across medicine.1 However, any study must be correctly designed, and its limitations identified, to avoid biased conclusions. For example, observational research supporting the use of hormone replacement therapy to bolster the cardiovascular health of postmenopausal women revealed the consequences of inadequate study design, as the apparent benefits were later refuted by evidence from randomized controlled trials.2 This scenario highlighted the advantage of randomized controlled trials, in which random allocation balances both known and unknown patient factors between treatment groups so that differences in outcomes can be attributed to the treatment. Nevertheless, randomized controlled trials are often not feasible in plastic surgery because of ethical concerns and logistical issues.3 Therefore, observational data are commonly used to evaluate the efficacy and safety of new techniques.4

Despite the need for robust observational research methods, 20 percent of observational studies in plastic surgery do not explicitly control for sources of bias.4 Confounding bias is of grave concern because an unmeasured factor can mask a true association between an exposure and an outcome or create a spurious one. Fortunately, statisticians have derived a measure, the “E-value,” that quantifies the minimum strength of association that an unmeasured confounder would need to have with both the exposure and the outcome to fully explain away an observed exposure-outcome association (Fig. 1).5 The E-value has already been applied in various fields, including cardiology, general surgery, psychiatry, and oncology.6–8 For example, a study evaluating the association between exercise and risk of liver cancer found that high physical activity was associated with a 25 percent reduction in liver cancer risk.9 Although the finding was plausible, the authors recognized that an unknown factor could undermine the results, given that participants were not randomized and the analysis was conducted retrospectively. In addition, they had not controlled for hepatitis infection, which is associated with a 15- to 20-fold increase in the risk of liver cancer. They calculated an E-value of 1.5, indicating that an unmeasured confounder would need to be associated with both physical activity and liver cancer by a risk ratio of at least 1.5 to explain away the association. Given that hepatitis infection plausibly met this criterion, they concluded that unmeasured confounding could influence the conclusion of the study. Had a larger E-value been calculated, a stronger confounder would have been required, making it less likely that an unknown confounder could change the association.
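To make an E-value of 1.5 concrete on the risk-ratio scale, the point-estimate formula in Table 1 can be inverted: solving E = RR + √[RR × (RR − 1)] for RR gives RR = E² / (2E − 1). The Python sketch below is our own illustration, and its function names are hypothetical rather than taken from the cited study. It shows that an E-value of 1.5 corresponds to an observed risk ratio of 1.125, or about 0.89 for a protective association.

```python
import math


def evalue_point(rr: float) -> float:
    """E-value for a risk-ratio point estimate; protective ratios are inverted first."""
    if rr <= 0:
        raise ValueError("risk ratios must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


def rr_for_evalue(e: float) -> float:
    """Risk ratio (>= 1) whose E-value equals e, from RR = e**2 / (2e - 1)."""
    if e < 1:
        raise ValueError("E-values cannot be below 1")
    return e ** 2 / (2 * e - 1)


rr_harmful = rr_for_evalue(1.5)   # 1.125
rr_protective = 1 / rr_harmful    # about 0.889
print(round(rr_harmful, 3), round(rr_protective, 3))
print(round(evalue_point(rr_protective), 3))  # round trip back to 1.5
```

The inversion is a convenient way to translate any reported E-value back into the weakest observed association that could have produced it.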

Fig. 1.
The effects of measured confounders are taken into account when calculating an adjusted risk ratio. The E-value quantifies the potential for unmeasured confounding to overturn the association defined by the risk ratio.

It is critical to identify whether certain factors in plastic surgery, such as patient characteristics or aspects of the investigative setting, lead to worse outcomes in one study versus another. Systematic reviews, which compare the results of primary studies that evaluate the same treatment-outcome associations using diverse methodologies, can facilitate such efforts. Thus, we aimed to identify systematic reviews that reported the strength of different treatment-outcome associations in plastic surgery so that we could then calculate corresponding E-values. Specifically, we hypothesized that the E-values of pooled results in a systematic review would be higher than those from individual observational studies, indicating a lower risk of unmeasured confounding.


E-Value Equation

The E-value is a continuous function of the risk ratio, also known as relative risk, which compares the probability of a specific event in an exposed group with that in a comparison group (Table 1).5 The risk ratio can be calculated directly only when the total number of people exposed to a treatment is known precisely, as in prospective studies. Under certain circumstances, other effect estimates may also be used as the input to the E-value equation. (See Table, Supplemental Digital Content 1, which shows circumstances under which different effect estimates may be used to calculate the E-value.) These include odds ratios, which compare the odds of an outcome (the probability that it occurs divided by the probability that it does not) in one group with the odds in another. Unlike risk ratios, odds ratios can be calculated in both retrospective and prospective analyses. The hazard ratio, which compares the instantaneous rates at which an event occurs in two groups over follow-up, may also be used. Odds, hazard, and risk ratios (effect estimates) may take on values less than, greater than, or equal to 1. A ratio below 1 indicates that the risk, odds, or hazard of an event is lower in the exposed group, whereas a ratio above 1 indicates that the outcome is more likely in the exposed group. For example, a risk ratio of 1.75 means that the exposed group is 1.75 times as likely (75 percent more likely) to have the outcome. A ratio equal to 1 implies that the odds, risk, or hazard is the same in the exposed and comparison groups.
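When only odds or hazard ratios are available, VanderWeele and Ding describe approximate conversions to the risk-ratio scale before the E-value is computed. The Python sketch below encodes those approximations as we understand them: for a rare outcome (conventionally, under 15 percent by the end of follow-up), odds and hazard ratios are taken as approximations of the risk ratio; for a common outcome, RR ≈ √OR and RR ≈ (1 − 0.5^√HR) / (1 − 0.5^√(1/HR)). The function name, its parameters, and the 15 percent cutoff are our assumptions and should be checked against Supplemental Digital Content 1 and the original paper before use.

```python
import math


def approx_risk_ratio(estimate: float, kind: str, common_outcome: bool) -> float:
    """Approximate a risk ratio from an RR, OR, or HR before computing an E-value.

    kind: "RR", "OR", or "HR".
    common_outcome: True if the outcome is common (assumed here: >= 15 percent
    by the end of follow-up), False if it is rare.
    """
    if estimate <= 0:
        raise ValueError("ratio estimates must be positive")
    if kind not in ("RR", "OR", "HR"):
        raise ValueError(f"unknown effect estimate type: {kind!r}")
    if kind == "RR" or not common_outcome:
        # Risk ratios need no conversion; with a rare outcome, OR and HR approximate the RR.
        return estimate
    if kind == "OR":
        # Square-root approximation for a common outcome.
        return math.sqrt(estimate)
    # Hazard ratio with a common outcome.
    return (1 - 0.5 ** math.sqrt(estimate)) / (1 - 0.5 ** math.sqrt(1 / estimate))


print(approx_risk_ratio(2.0, "OR", common_outcome=False))           # 2.0
print(round(approx_risk_ratio(2.0, "OR", common_outcome=True), 3))  # 1.414
print(round(approx_risk_ratio(2.0, "HR", common_outcome=True), 3))
```

The converted value is then passed to the risk-ratio equations in Table 1; both conversions pull a common-outcome estimate toward 1, which lowers the resulting E-value.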

Table 1. - E-Value Equations for Risk Ratios*
Direction of Risk Ratio | Estimate or CI | Computation of the E-Value
RR > 1 | Estimate | E-value = RR + √[RR × (RR − 1)]
RR > 1 | CI | If LL ≤ 1, then E-value of CI = 1; if LL > 1, then E-value of CI = LL + √[LL × (LL − 1)]
RR = 1 | Estimate | E-value = 1; there is no observed association for unmeasured confounding to explain away
RR < 1 | Estimate | Let RR* = 1/RR; E-value = RR* + √[RR* × (RR* − 1)]
RR < 1 | CI | If UL ≥ 1, then E-value of CI = 1; if UL < 1, then let UL* = 1/UL and E-value of CI = UL* + √[UL* × (UL* − 1)]
LL, lower limit of the CI; RR, risk ratio; RR*, inverse of RR; UL, upper limit of the CI; UL*, inverse of UL.
*Adapted from VanderWeele TJ, Ding P. Sensitivity analysis in observational research: Introducing the E-value. Ann Intern Med. 2017;167:268–274.

The E-value has a minimum possible value of 1, which signifies that no unmeasured confounding is needed to explain away the association between an exposure and an outcome. As the effect estimate moves farther from 1 in either direction, the E-value increases; for risk ratios well above 1, it grows approximately linearly, approaching 2 × RR − 0.5 (with 1/RR in place of RR for protective estimates). A larger E-value indicates that a stronger confounder is necessary to explain away the association. It is also important to consider the precision of the estimate, because the E-value is calculated from estimated, imprecise data. Therefore, VanderWeele and Ding recommended also calculating an E-value for the confidence limit closest to the null, that is, the lower limit when the risk ratio is above 1 and the upper limit when it is below 1 (Table 1).
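The equations in Table 1 translate directly into code. In the Python sketch below (our own illustration; the function names are hypothetical), `evalue_estimate` handles the point estimate and `evalue_ci` applies the rule of using the confidence limit closest to the null, returning 1 when the interval already includes 1. The confidence limits in the usage lines are invented for illustration.

```python
import math


def _evalue(rr: float) -> float:
    """E-value for a risk ratio on the harmful side of the null (rr >= 1)."""
    return rr + math.sqrt(rr * (rr - 1))


def evalue_estimate(rr: float) -> float:
    """E-value for a risk-ratio point estimate (Table 1, 'Estimate' rows)."""
    if rr <= 0:
        raise ValueError("risk ratios must be positive")
    return _evalue(rr if rr >= 1 else 1 / rr)


def evalue_ci(rr: float, lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null (Table 1, 'CI' rows)."""
    if not lower <= rr <= upper:
        raise ValueError("expected lower <= rr <= upper")
    if rr >= 1:
        return 1.0 if lower <= 1 else _evalue(lower)
    return 1.0 if upper >= 1 else _evalue(1 / upper)


print(round(evalue_estimate(1.75), 2))        # about 2.90
print(round(evalue_ci(1.75, 1.20, 2.55), 2))  # about 1.69, from the lower limit
print(evalue_ci(0.80, 0.60, 1.05))            # 1.0: the CI includes the null
```

Because the two values answer slightly different questions (explaining away the estimate versus shifting the interval to include the null), reporting both is consistent with VanderWeele and Ding's recommendation.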

Literature Search and Screening

We conducted a systematic search of the PubMed/MEDLINE and Embase databases to identify meta-analyses of observational studies published in leading plastic surgery journals, including Plastic and Reconstructive Surgery (impact factor, 4.24), the Journal of Plastic, Reconstructive & Aesthetic Surgery (impact factor, 2.39), and Annals of Plastic Surgery (impact factor, 1.35), based on 2018 impact factors. We also included articles from Plastic and Reconstructive Surgery Global Open, which did not have an impact factor at the time of the search on May 10, 2020. We used the advanced search function of each database to restrict results to these journals, and we used the available filters to limit the search to meta-analyses. All studies published on or before the search date were eligible and underwent screening.

Two individuals (N.B.B. and M.B.) screened titles and abstracts to exclude any studies that were not published in the aforementioned journals or were not meta-analyses, meaning that they did not contain a quantitative synthesis of data from multiple primary investigations. Subsequently, two authors (N.B.B. and A.K.) performed full-text screening of the remaining studies to identify the type of analyses conducted. We included articles that reported pooled effect estimates, which meta-analyses frequently calculate to summarize the results of multiple independent studies that evaluated the same exposure and outcome.10 These articles typically used random-effects models, which account for heterogeneity between studies and give greater weight to studies that provide more information. For example, large studies with more precise effect estimates (smaller variances) receive larger weights and thus contribute more to the pooled result than smaller studies. In addition, to be included in our review, a study had to report its effect estimates as odds, hazard, or risk ratios, as these are the only values that can be used to calculate the E-value. Thus, studies that contained only pooled estimates of mean differences or difference ratios were excluded. Although meta-analyses, in the most traditional sense, consist solely of data from randomized controlled trials, studies that contained only data from randomized controlled trials were excluded, given that the E-value pertains to observational data.5
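The weighting described above can be sketched with the DerSimonian-Laird random-effects estimator, which is one common choice; the included meta-analyses may have used other estimators or software. In this Python sketch (the function name and example data are hypothetical), each study's log risk ratio is weighted by the inverse of its within-study variance plus the estimated between-study variance τ², so larger, more precise studies receive more weight.

```python
import math


def pool_random_effects(log_rr, se):
    """DerSimonian-Laird random-effects pooling of log risk ratios.

    log_rr: per-study log risk ratios; se: their standard errors.
    Returns (pooled risk ratio, tau^2, random-effects weights).
    """
    k = len(log_rr)
    if k == 0 or k != len(se):
        raise ValueError("need one standard error per study")
    w = [1 / s ** 2 for s in se]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0  # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in se]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_rr)) / sum(w_re)
    return math.exp(pooled), tau2, w_re


# Hypothetical studies: risk ratios and standard errors of their logs.
rrs = [1.8, 1.2, 2.5]
ses = [0.15, 0.40, 0.30]
pooled_rr, tau2, weights = pool_random_effects([math.log(r) for r in rrs], ses)
print(round(pooled_rr, 2), round(tau2, 3), [round(w, 1) for w in weights])
```

When the studies disagree more than their standard errors can explain, τ² grows and the weights become more equal, which is how random-effects models temper the influence of the largest studies.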

Data Extraction and Analysis

Along with the publication date and journal of each included meta-analysis, we recorded whether risk, odds, or hazard ratios were reported and which treatments and outcomes were compared. For each meta-analysis, one author (A.K.) extracted the individual effect estimates from each primary study along with the pooled effect estimates (Fig. 2). A second author (N.B.B.) also extracted effect estimates from every third included study to verify that data were obtained correctly. We used the extracted effect estimates to calculate E-values according to the equations in Table 1. We then compared the distributions of E-values from pooled versus individual effect estimates by calculating the mean, median, and interquartile range of the values. In addition, we performed a Mann-Whitney U test to compare the pooled versus individual E-values. Results were considered significant at p < 0.05. All analyses were performed in RStudio version 1.4.1106.
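The analysis itself was run in R. As a rough, dependency-free Python analogue (our own sketch, not the authors' code; the E-values in the usage lines are hypothetical), the summary statistics and a two-sided Mann-Whitney U test can be computed as follows. The U returned is the statistic for the first sample, which R reports as W. The test uses the normal approximation with tie-averaged ranks but no tie or continuity correction, so p-values may differ slightly from R's `wilcox.test`; the quartile method is set to match R's default quantile definition.

```python
import math
import statistics


def describe(values):
    """Mean, SD, median, and interquartile range of a list of E-values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


pooled = [1.25, 2.4, 3.1, 3.3, 5.0]            # hypothetical E-values
individual = [1.16, 2.9, 3.9, 6.7, 9.8, 14.2]  # hypothetical E-values
print(describe(pooled))
print(mann_whitney_u(pooled, individual))
```

Because the test is rank based, it compares the two distributions as a whole rather than their means, which suits heavily skewed E-values such as those from the individual studies.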

Fig. 2.
Data extraction process. RR, risk ratio.


We identified 223 unique articles, of which 45 met the inclusion criteria (Fig. 3). (See Table, Supplemental Digital Content 2, which shows the included studies.) Each included meta-analysis contained an average of three pooled assessments of observational data, and each pooled assessment covered an average of seven individual effect estimates. Twenty-four studies reported odds ratios and 21 studies reported risk ratios. The E-values of the pooled effect estimates ranged from 1.11 to 19.49, with a mean ± SD of 3.82 ± 2.65 (median, 3.11; interquartile range, 2.72). The E-values of the individual effect estimates from the primary studies ranged from 1.00 to 321.50, with a mean ± SD of 8.74 ± 19.91 (median, 3.87; interquartile range, 6.06). E-values from pooled effect estimates differed significantly from those from individual effect estimates (Mann-Whitney U = 96432; p = 0.0001).

Fig. 3.
Study selection process.

Figure 4 displays the distributions of E-values calculated across all studies, from both the individual and the pooled effect estimates. For the pooled effect estimates, 24.8, 22.9, and 20.3 percent of E-values fell between 1.00 and 1.99, 2.00 and 2.99, and 3.00 and 3.99, respectively. For the individual effect estimates from the primary studies, 22.2, 17.4, and 11.8 percent of E-values fell within these same ranges.
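Percentages like these come from binning each E-value by its integer part. A minimal sketch, assuming values reported to two decimals so that the bin "1.00 to 1.99" is the interval [1, 2); the helper name and example values are hypothetical.

```python
import math
from collections import Counter


def bin_percentages(evalues):
    """Percentage of E-values in each unit-width bin, keyed by the bin's lower edge."""
    if not evalues:
        raise ValueError("no E-values to bin")
    counts = Counter(math.floor(e) for e in evalues)
    total = len(evalues)
    return {lower: 100 * n / total for lower, n in sorted(counts.items())}


print(bin_percentages([1.11, 1.9, 2.5, 3.99]))  # {1: 50.0, 2: 25.0, 3: 25.0}
```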

Fig. 4.
Distribution of E-values from pooled and individual effect estimates.


In this scoping review of 45 meta-analyses, we sought to evaluate the use of the E-value for plastic surgery research. E-values calculated from individual observational studies varied substantially across the literature, with a mean of 8.74 and an interquartile range of 6.06. Contrary to our hypothesis, E-values calculated from pooled data were considerably lower, with a mean of 3.82 and an interquartile range of 2.72. At face value, these findings suggest that consolidating results in meta-analysis increases the risk that unmeasured confounding could undermine an established treatment-outcome association, as a lower E-value indicates that a weaker confounder is needed to explain away the association. However, a limitation of the E-value is that its interpretation depends heavily on the context of the study, including the methods, treatments, and outcomes assessed.11 There is no universal threshold that separates small E-values from large ones.

This pitfall has been highlighted in other assessments of the E-value. For example, Blum and colleagues evaluated the use of the E-value across the peer-reviewed literature.12 They identified two studies of perinatal outcomes that reported similar E-values, yet one concluded that unmeasured confounding was likely whereas the other did not. Beyond the risk of misinterpretation, E-values can be expected to be smaller in some fields of inquiry than in others, because typical effect sizes and the plausible strength of confounders differ across fields. For instance, although a factor such as smoking status would have a larger impact on the risk of wound infection than patient gender, the relative importance of these factors may differ for other conditions. Ioannidis and colleagues explained that small effect estimates are common in fields such as genetic epidemiology.11 In fact, large effect estimates may even raise alarm and suggest that a study is biased.

In addition to unmeasured confounding, limitations of observational research include measurement error and selection bias, which arises when the study sample does not represent the intended population. These factors can inflate effect estimates and lead investigators to calculate higher E-values that understate the risk of unmeasured confounding. This may explain why we calculated higher E-values from the effect estimates reported in individual studies than from those in meta-analyses: selection bias is more likely in smaller studies, which contribute less to the pooled effect estimate. For example, consider one meta-analysis we identified that compared the complication rates associated with various types of acellular dermal matrices for implant-based breast reconstruction.13 The authors included five observational studies that reported risk ratios for seroma formation ranging from 0.65 to 3.61 when two different acellular dermal matrices were compared. Entering these data into the E-value equation, we determined that an unmeasured confounder would need to be associated with both the exposure and the outcome by a risk ratio of between 1.16 and 6.68, depending on the study, to explain away the associations reported in these studies. Meanwhile, the E-value based on the pooled risk ratio was 1.25, indicating that an unmeasured confounder would need to be associated with both the exposure and the outcome by a risk ratio of at least 1.25 to explain away the pooled association. The authors acknowledged that two of the five studies were heterogeneous with respect to patient- and operation-related characteristics, such as age and mean tissue expander size. Had the authors determined, perhaps using data from the extant literature, that these factors increase the risk of seroma formation, they could have provided concrete reasons for the varied results they identified across studies. Alternatively, if potential confounders could not be identified, the other limitations of observational research may have caused the variation in results.
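The E-value depends on how far the risk ratio lies from 1, not on its direction, so the smallest E-value in this set need not come from the smallest risk ratio. The Python sketch below is our illustration; only the risk ratios 0.65 and 3.61 and the E-values 1.16 and 1.25 come from the text. It recomputes the E-values at the two ends of the risk-ratio range and inverts the formula, RR = E² / (2E − 1), to show which risk ratios would yield E-values of 1.16 and 1.25.

```python
import math


def evalue(rr: float) -> float:
    """E-value for a risk-ratio point estimate (protective ratios inverted)."""
    rr = rr if rr >= 1 else 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


def rr_for_evalue(e: float) -> float:
    """Risk ratio >= 1 whose E-value is e."""
    return e ** 2 / (2 * e - 1)


for rr in (0.65, 3.61):
    print(f"RR {rr}: E-value {evalue(rr):.2f}")  # 2.45 and 6.68

for e in (1.16, 1.25):
    rr = rr_for_evalue(e)
    print(f"E-value {e}: RR {rr:.3f} or {1 / rr:.3f}")
```

Read this way, the E-value of 1.16 must come from a study whose risk ratio was roughly 1.02 (or 0.98), and the pooled E-value of 1.25 corresponds to a pooled risk ratio of roughly 1.04 (or 0.96); these risk ratios are inferred from the formula and are not reported in the text.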

In the aforementioned meta-analysis, the E-values we calculated for the primary studies varied nearly six-fold, suggesting either that some authors were not using robust study methods or that unmeasured confounders were not accounted for (Table 2). However, we also identified meta-analyses that had relatively consistent E-values across primary studies. For example, Ohkuma and colleagues compared breast reconstruction outcomes after free flaps were planned with preoperative computed tomographic angiography versus Doppler ultrasonography.14 Their meta-analysis included five studies that reported risk ratios for flap necrosis, with risk ratios ranging from 0.90 to 1.03 and E-values ranging from 1.21 to 1.46. The E-value calculated from the pooled risk ratio of 0.97 was 1.21. This meta-analysis falls under category C in Table 2. Although the E-values are relatively low, the consistency of the risk ratios across studies supports the validity of the weak association between imaging technique and risk of fat necrosis; it is unlikely that identifying and controlling for additional unknown confounders would change the observed association and increase the E-value.
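The values reported for this meta-analysis can be checked with the same formula. The Python sketch below (our illustration) reproduces the E-values for the lowest, highest, and pooled risk ratios, then computes the ratio of the largest to the smallest E-value as a crude, hypothetical summary of spread. Neither the text nor Table 2 defines a numeric threshold for "significant variation," so this summary is descriptive only.

```python
import math


def evalue(rr: float) -> float:
    """E-value for a risk-ratio point estimate (protective ratios inverted)."""
    rr = rr if rr >= 1 else 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


ohkuma = {"lowest RR": 0.90, "highest RR": 1.03, "pooled RR": 0.97}
for label, rr in ohkuma.items():
    print(f"{label} {rr}: E-value {evalue(rr):.2f}")  # 1.46, 1.21, 1.21

# Hypothetical spread summary: largest / smallest E-value across primary studies.
print(f"Ohkuma et al. spread: {1.46 / 1.21:.2f}-fold")
print(f"Acellular dermal matrix spread: {6.68 / 1.16:.2f}-fold")
```

The contrast (about 1.2-fold versus nearly 6-fold) mirrors the distinction between categories C and A in Table 2.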

Table 2. - Possible Interpretations of the E-Value
Possible Result | Conclusion | Implication
A. There is significant variation in E-value across studies. | Authors are not using robust study methods and/or research has not elucidated the unmeasured confounders that influence the effect of the exposure on different cohorts. | Further research is needed to identify unknown confounders that make certain study designs more prone to bias.
B. There is little variation in E-value across studies and the E-value is high. | Research on this topic is high quality and there is little risk of unmeasured confounding. | Different studies yield consistent results despite different methods. This suggests that certain observational methods that are usually considered low quality can provide valid results.
C. There is little variation in E-value across studies and the E-value is low. | Research on this topic is likely high quality. Although there appears to be a risk of unmeasured confounding, it is also possible that the treatment-outcome association is truly weak. | Different studies yield consistent results despite different methods. Although the low E-value alone implies a risk of unmeasured confounding, the consistency across studies suggests that the results are valid.

The E-value is beneficial because it provides a quantitative assessment of the potential for unmeasured confounding, in contrast to existing sensitivity analysis techniques that are more subjective. However, given its potential drawbacks and misinterpretations, we believe the E-value is best used in studies in which a confounder is known to exist but cannot be measured directly. This is in line with the current use of the metric in other fields; Blum and colleagues reported that nearly 40 percent of articles reporting E-values named specific confounders that were not addressed in the analysis.12 Approximately half of these articles then concluded that the unmeasured confounder was likely to affect the results. Even if authors cannot determine whether they are overlooking certain confounders, the E-value may still be calculated and interpreted in light of the strength of measured confounders. Furthermore, the statisticians who derived the E-value recommend reporting E-values for the confidence interval in addition to the effect estimate.5 Nevertheless, calculating the E-value alone is not a sufficient sensitivity analysis. Careful consideration of measured and unmeasured confounders, the study population, and the treatment-outcome association under investigation is necessary to judge the robustness of a study. In addition, it is critical to perform sensitivity analyses in meta-analyses to determine whether studies have a high risk of bias and should be excluded from pooled assessments.

This study has several limitations. It is possible that we missed relevant meta-analyses during our search, although we searched two separate databases. Our analysis was also vulnerable to the biases inherent in the E-value. For example, our interpretations of low and high E-values were subjective and did not take into account the unique context of each individual study. In addition, we did not perform a quality assessment of the studies included in this review. However, our primary goal was to explore the types of plastic surgery studies for which the E-value may be useful and to extract data for E-value calculations. Moreover, several studies have already investigated the quality of observational research and systematic reviews in plastic surgery and concluded that there is substantial room for improvement. We have shown that the E-value may be applied in nearly any observational study or meta-analysis that reports odds, hazard, or risk ratios. It can aid future researchers in performing bias assessments and help fill these gaps in quality.

Unlike more subjective sensitivity analyses, the E-value provides plastic surgery researchers with quantitative values for assessing study bias. Given the substantial amount of outcomes research in plastic surgery that is observational in study design, the E-value may be particularly useful for uncovering the subtle differences between studies that evaluate the same procedure techniques and postoperative complications. Subsequently, investigators may also uncover unmeasured confounders that undermine the quality of observational research in plastic surgery.


The authors thank Melissa Beyrand for assistance with preliminary article screening for this study.


1. Sugrue CM, Joyce CW, Carroll SM. Levels of evidence in plastic and reconstructive surgery research: Have we improved over the past 10 years? Plast Reconstr Surg Glob Open. 2019;7:e2408.
2. Hersh AL, Stefanick ML, Stafford RS. National use of postmenopausal hormone therapy: Annual trends and response to recent evidence. JAMA. 2004;291:47–53.
3. Hassanein AH, Herrera FA, Hassanein O. Challenges of randomized controlled trial design in plastic surgery. Can J Plast Surg. 2011;19:e28–e29.
4. Agha RA, Lee SY, Jeong KJL, Fowler AJ, Orgill DP. Reporting quality of observational studies in plastic surgery needs improvement: A systematic review. Br J Surg. 2015;102:1213.
5. VanderWeele TJ, Ding P. Sensitivity analysis in observational research: Introducing the E-value. Ann Intern Med. 2017;167:268–274.
6. Wadhera RK, Khatana SAM, Choi E, et al. Disparities in care and mortality among homeless adults hospitalized for cardiovascular conditions. JAMA Intern Med. 2020;180:357–366.
7. Sheehy O, Zhao JP, Bérard A. Association between incident exposure to benzodiazepines in early pregnancy and risk of spontaneous abortion. JAMA Psychiatry. 2019;76:948–957.
8. Fisher DP, Johnson E, Haneuse S, et al. Association between bariatric surgery and macrovascular disease outcomes in patients with type 2 diabetes and severe obesity. JAMA. 2018;320:1570–1582.
9. Baumeister SE, Leitzmann MF, Linseisen J, Schlesinger S. Physical activity and the risk of liver cancer: A systematic review and meta-analysis of prospective studies and a bias analysis. J Natl Cancer Inst. 2019;111:1142–1151.
10. Akobeng AK. Understanding systematic reviews and meta-analysis. Arch Dis Child. 2005;90:845–848.
11. Ioannidis JPA, Tan YJ, Blum MR. Limitations and misinterpretations of E-values for sensitivity analyses of observational studies. Ann Intern Med. 2019;170:108–111.
12. Blum MR, Tan YJ, Ioannidis JPA. Use of E-values for addressing confounding in observational studies: An empirical assessment of the literature. Int J Epidemiol. 2020;49:1482–1494.
13. Lee KT, Mun GH. A meta-analysis of studies comparing outcomes of diverse acellular dermal matrices for implant-based breast reconstruction. Ann Plast Surg. 2017;79:115–123.
14. Ohkuma R, Mohan R, Baltodano PA, et al. Abdominally based free flap planning in breast reconstruction with computed tomographic angiography: Systematic review and meta-analysis. Plast Reconstr Surg. 2014;133:483–494.

Supplemental Digital Content

Copyright © 2022 by the American Society of Plastic Surgeons