
Methods: Original Article

# Confounding and Bias in the Attributable Fraction

Darrow, Lyndsey A.; Steenland, N. Kyle

doi: 10.1097/EDE.0b013e3181fce49b

## Abstract

The population attributable fraction, also referred to as the attributable risk or excess fraction, is a widely used measure of the public health burden of a risk factor or set of risk factors. Introduced by Levin in 1953,1 the population attributable fraction (AF) combines relative risk and population prevalence of an exposure to estimate the proportion of disease cases that could be prevented by eliminating the exposure. These estimates can then help to inform public health intervention.

Appropriate methods for calculating a population attributable fraction in the presence of confounding have been discussed in the literature.2–4 For a dichotomous exposure, an unbiased estimate of the population AF can be calculated using the following formula (Formula 1):

$$\mathrm{AF} = p_c \left( \frac{RR - 1}{RR} \right)$$

where $p_c$ is the proportion of cases exposed, and the RR is adjusted for confounding. A second commonly implemented method of calculating the population AF is (Formula 2):

$$\mathrm{AF} = \frac{p_p (RR - 1)}{1 + p_p (RR - 1)}$$

where $p_p$ is the proportion of the total population exposed. However, unlike Formula 1, Formula 2 is valid only in the absence of confounding or effect modification. When either is present, an unbiased estimate of the AF using Formula 2 requires calculation of stratum-specific attributable fractions for each stratum, which are then weighted by the proportion of cases in each stratum and combined (Formula 3):5

$$\mathrm{AF} = \sum_i W_i \, \frac{p_i (RR_i - 1)}{1 + p_i (RR_i - 1)}$$

where $p_i$ is the proportion of the population in stratum i that is exposed, $RR_i$ is the risk ratio in stratum i, and $W_i$ is the proportion of the diseased persons (cases) in stratum i. Formula 1 and Formula 3 are mathematically equivalent.5,6 Practical implementation of this weighted-sum approach can be challenging in situations where there are many confounders, some of which may not be dichotomous. A method using regression has been described by Bruzzi et al,7 but this requires that all variables are categorical.
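The three formulas can be compared numerically. The following is a minimal Python sketch, not the authors' code: the function names and the two-stratum counts (1000 persons, one dichotomous confounder, a common exposure-disease RR of 2.0) are hypothetical values chosen only for illustration.

```python
def af_formula1(p_cases_exposed, rr_adjusted):
    """Formula 1: proportion of cases exposed times (RR - 1) / RR."""
    return p_cases_exposed * (rr_adjusted - 1.0) / rr_adjusted


def af_formula2(p_pop_exposed, rr):
    """Formula 2: valid only without confounding or effect modification."""
    excess = p_pop_exposed * (rr - 1.0)
    return excess / (1.0 + excess)


def af_formula3(strata):
    """Formula 3: case-weighted sum of stratum-specific Formula 2 values.

    strata is a list of (W_i, p_i, RR_i) tuples.
    """
    return sum(w * af_formula2(p, rr) for w, p, rr in strata)


# Hypothetical population of 1000 with one dichotomous confounder C.
# C absent:  600 persons, 150 exposed, 75 cases (30 of them exposed).
# C present: 400 persons, 300 exposed, 140 cases (120 of them exposed).
total_cases = 75 + 140
rr = 2.0  # same exposure-disease RR in both strata (assumed)

af1 = af_formula1((30 + 120) / total_cases, rr)
af3 = af_formula3([
    (75 / total_cases, 150 / 600, rr),
    (140 / total_cases, 300 / 400, rr),
])
# Misuse: adjusted RR combined with the overall exposure prevalence.
af2_misused = af_formula2((150 + 300) / 1000, rr)

print(f"Formula 1: {af1:.3f}")
print(f"Formula 3: {af3:.3f}")
print(f"Formula 2 with adjusted RR: {af2_misused:.3f}")
```

With these illustrative counts, Formulas 1 and 3 agree (about 0.349), whereas Formula 2 with the adjusted RR gives about 0.310.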

As previous authors have noted, a frequent error in the calculation of the AF is the use of adjusted risk ratios in Formula 2 instead of the weighted-sum approach in Formula 3.2,3 This commonly occurs when investigators do not have access to the raw data used to estimate relative risks for a given exposure, but wish to calculate an AF for that exposure based on relative risks reported in the literature and on estimates of exposure prevalence in the general population (often taken from ancillary surveys). For example, the World Health Organization (WHO) by necessity estimates the global disease burden for specific risk factors using literature reviews that summarize adjusted relative risks and exposure prevalence estimates.8 Incorrect methods are sometimes used even when the raw data are available to calculate the AF correctly.9

The magnitude and direction of bias when using Formula 2 instead of Formula 3 to calculate an attributable fraction in the presence of confounding have rarely been examined. In one example, the inappropriate use of age- and sex-adjusted relative risks in Formula 2 led to a 17% upward bias in the AF for the relationship between obesity and mortality.2 Although it is intuitive that the direction and magnitude of bias in the AF should be related to the direction and magnitude of confounding, this relationship has not been well described.

Using various raw-data scenarios, we have investigated the relationship between the direction and magnitude of confounding in the original or source data, and the direction and magnitude of bias in the attributable fraction when using Formula 2 (biased) versus Formula 3 (unbiased). We have also investigated whether the relationship between the confounding risk ratio and bias in the attributable fraction is influenced by the strength of the exposure-disease association or the prevalence of the exposure in the overall population (the 2 parameter inputs in the AF calculation). We examined these questions using risk ratios, but results are also applicable to rate ratios and odds ratios in rare-disease contexts.

## METHODS

For simplicity, we first investigated the relationship between confounding and AF bias in the context of a single, dichotomous confounder (C). We manipulated the associations between confounder and disease (D) and confounder and exposure (E) to change the direction and magnitude of confounding, using a general framework described by Arah et al.10 In all these scenarios, the exposure is positively associated with the disease (ie, exposure-disease RR >1.0); we did not consider exposures that were negatively associated with disease because calculation and interpretation of the prevented fraction differs from the attributable fraction.11,12 However, we note that an attributable fraction can be also calculated for preventive exposures simply by defining exposure as a failure to have the protective factor.11

### Raw Data Scenarios

Generation of the raw data is described in detail in the eAppendix (https://links.lww.com/EDE/A433). We generated data under various scenarios of confounding. We also varied the prevalence of exposure and the exposure-disease RR because these are the 2 components of the AF calculation. Each data set was composed of data arranged in a pair of 2 × 2 tables. Under each scenario, we assessed the relationship between bias in the AF and the magnitude and direction of confounding. The total population size was fixed at 1000 for all scenarios, and the risk ratio for the exposure-disease relationship was assumed constant across confounder strata (ie, no effect modification). The Table lists the various scenarios for which we examined the relationship between confounding and AF bias. We also created data scenarios involving 2 dichotomous confounders to explore the generalizability of the results beyond the situation of a single, dichotomous confounder. The eAppendix (https://links.lww.com/EDE/A433) also illustrates how these data were generated.
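As a rough illustration of what a pair of 2 × 2 tables under one scenario might look like, the sketch below builds expected cell counts from a handful of inputs. This is not the procedure in the eAppendix, which is not reproduced here; the parameterization (confounder prevalence, exposure prevalence within each confounder stratum, baseline risk, confounder-disease RR) and all names and default values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    exposed_cases: float
    exposed_total: float
    unexposed_cases: float
    unexposed_total: float


def make_strata(n_total=1000, p_confounder=0.4,
                p_exp_given_c0=0.25, p_exp_given_c1=0.75,
                baseline_risk=0.1, rr_cd=2.0, rr_ed=2.0):
    """Return expected counts for the C-absent and C-present 2 x 2 tables.

    rr_cd is the confounder-disease RR and rr_ed the exposure-disease RR;
    rr_ed is the same in both strata (no effect modification).
    """
    strata = []
    for has_c, p_exp in ((False, p_exp_given_c0), (True, p_exp_given_c1)):
        n = n_total * (p_confounder if has_c else 1.0 - p_confounder)
        risk_unexposed = baseline_risk * (rr_cd if has_c else 1.0)
        if risk_unexposed * rr_ed > 1.0:
            raise ValueError("implied risk among the exposed exceeds 1")
        n_exposed = n * p_exp
        n_unexposed = n - n_exposed
        strata.append(Stratum(
            exposed_cases=n_exposed * risk_unexposed * rr_ed,
            exposed_total=n_exposed,
            unexposed_cases=n_unexposed * risk_unexposed,
            unexposed_total=n_unexposed,
        ))
    return strata


strata = make_strata()
for label, s in zip(("C absent", "C present"), strata):
    print(label, s)
```

Changing `p_exp_given_c0` and `p_exp_given_c1` alters the confounder-exposure association, and changing `rr_cd` alters the confounder-disease association, which together move the confounding RR above or below 1.0.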

### Calculation of the Confounding RR and AF Bias

For each scenario of generated cell counts, we assessed confounding and bias in the AF. Confounding was quantified using the confounding risk ratio, which is the ratio of the crude RR to the Mantel-Haenszel adjusted RR. We calculated a “true” attributable fraction by applying the weighted-sum method from Formula 3 above. We calculated a biased AF by incorrectly using adjusted risk ratios in Formula 2. Bias in the AF was expressed as a ratio of the biased AF to the unbiased AF. In secondary analyses, we also assessed AF bias expressed as an absolute difference of the biased AF and the unbiased AF.
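The quantities defined above can be computed directly from a pair of 2 × 2 tables. The sketch below is one possible implementation under stated assumptions: each table is a tuple of (exposed cases, exposed total, unexposed cases, unexposed total), and the counts are hypothetical rather than taken from the Table or the eAppendix.

```python
def crude_rr(tables):
    """Risk ratio after collapsing over confounder strata."""
    a = sum(t[0] for t in tables)
    n1 = sum(t[1] for t in tables)
    c = sum(t[2] for t in tables)
    n0 = sum(t[3] for t in tables)
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(tables):
    """Mantel-Haenszel summary risk ratio across strata."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in tables)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in tables)
    return num / den


def formula2_af(p_exposed, rr):
    excess = p_exposed * (rr - 1.0)
    return excess / (1.0 + excess)


def unbiased_af(tables):
    """Formula 3: stratum-specific AFs weighted by each stratum's share of cases."""
    total_cases = sum(a + c for a, n1, c, n0 in tables)
    af = 0.0
    for a, n1, c, n0 in tables:
        weight = (a + c) / total_cases
        p_exposed = n1 / (n1 + n0)
        rr_i = (a / n1) / (c / n0)
        af += weight * formula2_af(p_exposed, rr_i)
    return af


def biased_af(tables):
    """Formula 2 misapplied: adjusted RR with overall exposure prevalence."""
    n_exposed = sum(t[1] for t in tables)
    n_total = sum(t[1] + t[3] for t in tables)
    return formula2_af(n_exposed / n_total, mantel_haenszel_rr(tables))


# Hypothetical strata: (exposed cases, exposed total, unexposed cases, unexposed total)
tables = [(30, 150, 45, 450), (120, 300, 20, 100)]

confounding_rr = crude_rr(tables) / mantel_haenszel_rr(tables)
bias_ratio = biased_af(tables) / unbiased_af(tables)
bias_difference = biased_af(tables) - unbiased_af(tables)

print(f"confounding RR = {confounding_rr:.2f}")
print(f"AF bias ratio  = {bias_ratio:.2f}")
print(f"AF bias (abs.) = {bias_difference:+.3f}")
```

In this example the confounder is positively associated with both exposure and disease, so the confounding RR exceeds 1.0 and the bias ratio falls below 1.0.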

## RESULTS

Figure 1 illustrates the general conditions that would lead to positive confounding (confounding RR >1.0) and negative confounding (confounding RR <1.0), and the resulting direction of bias in the AF. In situations where the confounder is positively associated with both exposure and disease (no. 1), or negatively associated with both (no. 2), there is positive confounding (crude RR > adjusted RR). Conversely, when the confounder-disease and confounder-exposure associations are in opposite directions (no. 3 and no. 4), the resulting overall confounding is negative (crude RR < adjusted RR). Again, we consider only scenarios where the exposure is positively associated with the disease because the attributable fraction is not an appropriate measure for a preventive exposure.

FIGURE 1. Possible confounding scenarios and the resulting confounding RR and AF bias.

We present results from 827 scenarios in Figures 2 and 3. For a small number of scenarios (<1%), the constraints used to generate the data led to negative counts within cells of the 2 × 2 tables; these scenarios were excluded. In the figures, bias in the attributable fraction is plotted against the overall confounding risk ratio (RR), which is influenced by both the confounder-disease and the confounder-exposure associations. These associations (specified in the Table) led to scenarios with confounding RRs ranging from 0.28 to 3.38. An AF bias of 1.3 indicates a 30% upward bias in the AF when using the biased method relative to the weighted-sum method; an AF bias of 0.85 indicates a 15% downward bias.

FIGURE 2. Confounding RR (crude RR/adjusted RR) and attributable fraction bias (biased AF/unbiased AF) by prevalence of exposure, when exposure-disease RR = 2.0. A, Confounding RR ≤1. B, Confounding RR ≥1.

FIGURE 3. Confounding RR (crude RR/adjusted RR) and attributable fraction bias (biased AF/unbiased AF) by exposure-disease RR, when prevalence of exposure = 0.5. A, Confounding RR ≤1. B, Confounding RR ≥1.

Overall, when the confounding RR is greater than 1.0, the biased AF underestimates the true AF (ie, the ratio is less than 1). When the confounding RR is less than 1.0, the biased AF overestimates the true AF (ie, the ratio is greater than 1). Bias increases with the magnitude of confounding, increasing as the confounding RR departs from 1.0 (no confounding) in either direction. Prevalence of the confounder influences bias in the AF only through the confounding RR; a strong confounder with a low prevalence yields the same AF bias as a weak confounder with a high prevalence if the confounding RR is the same in the 2 scenarios. The bias is greater when the prevalence of exposure in the total population is lower and when the exposure-disease association is weaker (ie, RRs closer to 1.0), conditions that tend to make the AF smaller. Overall, the amount of bias under most realistic scenarios of confounding is on the order of 10%–20%. The Pearson correlation coefficients, assessing the degree of linear relationship between the confounding RR and the AF bias ratio for each curve shown in Figures 2 and 3, range from −0.79 to −0.84 (these are negative because of the inverse relationship between the direction of confounding and the direction of AF bias; Fig. 1). Results from specific scenarios are presented in detail later in the text.

Results plotted in Figures 2 and 3 are based on confounding scenarios 1 and 3 in Figure 1, in which the confounder is also positively associated with the disease. However, we also investigated confounding scenarios 2 and 4, and found identical patterns. Regardless of the relationships driving the overall confounding RR, positive confounding RRs (confounding RR >1.0) always led to underestimation of the AF and negative confounding RRs (confounding RR <1.0) always led to overestimation of the AF.

### Confounding and AF Bias by Prevalence of Exposure

Figure 2 illustrates the relationship between confounding and bias in the AF for various scenarios of population exposure prevalence (0.25, 0.50, 0.67, and 0.75). When the exposure prevalence is smallest, and consequently the AF is smallest, bias in the AF is greatest. As the denominator representing the true attributable fraction gets smaller, discrepancies with the biased AF numerator are magnified and the bias ratio increases or decreases dramatically. For example, the highest degree of bias observed was when the “true” AF was smallest at 8%. This pattern is evident for both positive (confounding RR >1.0) and negative (confounding RR <1.0) confounding. For the scenarios presented in Figure 2, the exposure-disease RR is fixed at 2.0. However, we did investigate the prevalence of exposure under other exposure-disease relationships to confirm that the same patterns were observed.

### Confounding and AF Bias by Magnitude of the Exposure-disease Risk Ratio

We examined the relationship between confounding and AF bias in the context of different exposure-disease risk ratios (1.1, 1.5, 2.0, 2.5, and 3.0). As shown in Figure 3, the relationship between confounding and bias in the AF shows a similar pattern across different risk ratios. However, for the same degree of confounding, smaller risk ratios show greater bias in the AF. This is the same phenomenon observed above for prevalence of exposure; as the RR approaches 1.0, the true AF (the denominator) gets smaller, and the bias is magnified. For the scenarios presented in Figure 3, the prevalence of exposure is fixed at 0.5. However, we did investigate the influence of the exposure-disease risk ratio under alternative prevalence of exposure scenarios to confirm that the same patterns were observed.

### Confounding and Absolute AF Bias

As an alternative to quantifying bias in the AF using a relative measure (biased AF/unbiased AF), we also examined absolute bias in the AF (biased AF − unbiased AF) in relation to the confounding RR. For the data scenarios shown in Figures 2 and 3, the absolute difference in the AF ranged from −15% to +16%. For a given exposure-disease RR and exposure prevalence, the pattern was the same: increasing confounding led to increasing absolute AF bias, with underestimation of the AF for confounding RR >1.0 and overestimation of the AF for confounding RR <1.0. However, unlike the pattern shown in Figure 2 for relative AF bias, a lower prevalence of exposure did not necessarily lead to higher absolute AF bias, and smaller exposure-disease RRs did not necessarily lead to higher absolute AF bias. Results of this analysis are presented in more detail in the eFigure (https://links.lww.com/EDE/A433).

### Multiple Confounders

To explore the generalizability of the results beyond the situation of a single confounder, we also investigated scenarios involving 2 dichotomous confounders (ie, the exposure-disease RR was still confounded if adjusted for only one of the confounders). As in the single-confounder scenarios, we varied the degree and direction of confounding, the prevalence of exposure, and the exposure-disease RR (described in the eAppendix, https://links.lww.com/EDE/A433). Under 259 two-confounder scenarios (scenarios with negative cell counts were excluded), bias in the AF was identical to the single-confounder scenarios for a given confounding RR, prevalence of exposure, and exposure-disease RR. That is, Figures 2 and 3 represent both the 1-confounder and 2-confounder scenario results.

## DISCUSSION

In the presence of confounding, estimates of attributable fraction are biased when adjusted risk ratios are inappropriately used in Formula 2. Bias in the attributable fraction on the order of 20% might be considered substantial, given that estimates of AF are bounded by 0 and 1. For example, under a scenario with 20% negative bias (relative AF bias = 0.80), an observed AF of 40% in the presence of positive confounding will underestimate the true AF of 50%. The absolute difference between the biased AF and unbiased AF for the data scenarios considered ranged from −15% to +16%. Because these biased methods are persistent in the epidemiologic literature,9 it is useful to understand the degree and direction of expected bias in the attributable fraction when these incorrect methods of calculation are implemented. The direction of bias in the AF is dependent on the direction of the confounding, with confounding risk factors that positively confound the exposure-disease relationship (crude RR > adjusted RR) leading to underestimation of the AF, and confounding risk factors that negatively confound the exposure-disease relationship (crude RR < adjusted RR) leading to overestimation. We found that the confounder-exposure and confounder-disease relationships influenced AF bias only through their aggregate effect on the overall confounding RR. The magnitude of bias in the AF is dependent on (1) strength of the confounding as measured by the confounding RR, (2) prevalence of exposure in the population, and (3) strength of the exposure-disease risk ratio. There is greater relative bias with (1) stronger confounding, (2) weaker exposure-disease associations (RRs closer to 1.0), and (3) lower exposure prevalence.

The influence of exposure prevalence and exposure-disease relative risk on the magnitude of relative AF bias is mathematically intuitive. We defined bias as the ratio of the miscalculated AF relative to the “true” AF, calculated using unbiased methods. For scenarios where the denominator representing the true AF was small, small absolute differences between the numerator (biased AF) and denominator (unbiased AF) result in high relative bias. Because exposure prevalence and relative risk explicitly determine the attributable fraction, scenarios with lower exposure prevalence and lower relative risks lead to smaller “true” AF denominators and higher relative bias. When we defined bias in the AF as an absolute difference (biased AF − unbiased AF), we did not observe the same pattern of smaller exposure prevalence or smaller exposure-disease RR leading to more bias. We also investigated whether prevalence of disease influenced bias in the AF (results not shown); it did not, which is intuitive, given that attributable fraction is by definition calculated only among the diseased population.
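To make the arithmetic concrete, consider two hypothetical cases (illustrative numbers, not results from the scenarios examined) that share the same absolute difference of 0.04 between the biased and the unbiased AF:

$$\frac{0.12}{0.08} = 1.50 \qquad \text{versus} \qquad \frac{0.44}{0.40} = 1.10$$

The absolute bias is identical, but the relative bias is 50% when the true AF is 0.08 and only 10% when the true AF is 0.40.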

The consistent pattern of overestimation of the AF in scenarios where the confounder negatively confounds the exposure-disease RR (ie, confounding RR <1, or crude RR <adjusted RR), and underestimation of the AF when the confounder positively confounds the exposure-disease relationship (ie, confounding RR >1, or crude RR >adjusted RR) can be explained by the weighting used to calculate the unbiased AF in Formula 3. For example, if the confounder and exposure are positively associated with one another (and both are positively associated with disease, as in Fig. 1, scenario 1), then strata where the confounder is present will (1) have a greater case-load and therefore contribute more weight to the weighted sum because the stratum-specific case proportions are the weights, and (2) have a higher prevalence of exposure and therefore a higher stratum-specific AF. If strata with larger AFs contribute more weight to the weighted sum, the unbiased (“true”) AF will be higher. Thus, as the unbiased AF (the denominator) gets larger, the biased numerator (using Formula 2) increasingly underestimates it. The reverse is true when the confounder is negatively associated with the disease (Fig. 1, scenario 3). These same kinds of relationships can be intuited for scenarios 2 and 4 (Fig. 1).
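The weighting argument can be traced with numbers. The sketch below uses hypothetical counts for scenario 1 (confounder positively associated with both exposure and disease) and prints each stratum's case weight, exposure prevalence, and stratum-specific AF; the tuple layout and variable names are assumptions for illustration.

```python
# Hypothetical counts: (exposed cases, exposed total, unexposed cases, unexposed total)
tables = {
    "C absent": (30, 150, 45, 450),
    "C present": (120, 300, 20, 100),
}

total_cases = sum(a + c for a, n1, c, n0 in tables.values())

summary = {}
for label, (a, n1, c, n0) in tables.items():
    weight = (a + c) / total_cases        # W_i: share of all cases
    p_exposed = n1 / (n1 + n0)            # p_i: exposure prevalence in stratum
    rr_i = (a / n1) / (c / n0)            # stratum-specific risk ratio
    excess = p_exposed * (rr_i - 1.0)
    af_i = excess / (1.0 + excess)        # stratum-specific AF
    summary[label] = (weight, p_exposed, af_i)
    print(f"{label}: W = {weight:.3f}, p = {p_exposed:.2f}, AF = {af_i:.3f}")

# Formula 3: case-weighted sum of the stratum-specific AFs.
weighted_af = sum(w * af for w, _, af in summary.values())
print(f"weighted-sum AF = {weighted_af:.3f}")
```

Here the stratum with the confounder present carries both the larger weight and the larger stratum-specific AF, which pulls the weighted sum upward.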

A single dichotomous confounder is not realistic for most epidemiologic studies. We therefore also examined 2-confounder scenarios, and found results identical to the one-confounder results. For a given prevalence of exposure and exposure-disease RR, the confounding RR determined bias in the AF. Although our conclusions are limited to the specific data scenarios examined, the fact that results were identical for the 1-confounder and 2-confounder scenarios provides some evidence of generalizability to multiconfounder situations.

The weighted-sum approach also provides an unbiased estimate of the AF in the presence of effect modification.5 Although not assessed here, effect modification will also influence the degree of bias in the AF when using Formula 2 compared with Formula 3.2 We did not explore effect modification because the possible scenarios are effectively limitless, making generalizations difficult. As Flegal et al2 have demonstrated, extrapolation of the estimated AF from the source population to a target population introduces further bias if differences in the distribution of effect modifiers between 2 populations are not accounted for.

We conducted our analyses in the context of harmful exposures (ie, risk ratios greater than 1.0) because the attributable fraction, by definition, reflects the fraction of disease burden caused by the exposure. Results are equally applicable to preventive exposures if “exposure” is defined as a failure to have the protective factor. Interpretation of the AF in this context is the proportion of disease cases attributable to “nonexposure.”11
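The recoding for a preventive exposure can be sketched as follows. The helper name and the input values (a protective factor with RR = 0.5 and prevalence 0.6) are hypothetical, and Formula 2 is applied here only because the example assumes no confounding or effect modification.

```python
def recode_protective(rr_protective, prevalence_protective):
    """Redefine exposure as absence of a protective factor.

    Returns the risk ratio and prevalence for "nonexposure" to the
    protective factor, relative to those who have the factor.
    """
    if not 0.0 < rr_protective < 1.0:
        raise ValueError("expected a protective RR between 0 and 1")
    if not 0.0 <= prevalence_protective <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    return 1.0 / rr_protective, 1.0 - prevalence_protective


def af_formula2(p_exposed, rr):
    excess = p_exposed * (rr - 1.0)
    return excess / (1.0 + excess)


# Hypothetical protective factor: RR = 0.5, present in 60% of the population.
rr_nonexposure, p_nonexposure = recode_protective(0.5, 0.6)
af_nonexposure = af_formula2(p_nonexposure, rr_nonexposure)
print(f"RR = {rr_nonexposure:.1f}, prevalence = {p_nonexposure:.1f}, "
      f"AF = {af_nonexposure:.3f}")
```

The resulting AF is read as the proportion of cases attributable to lacking the protective factor.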

Results appear to be consistent with at least 2 real-world examples comparing the biased attributable fraction to unbiased methods. Flegal et al2 demonstrated an upward bias of 17% in the AF reported by Allison et al13 in their assessment of obesity and mortality in the United States. In this specific study context, the exposure of interest, obesity, was negatively associated with male sex and age, 2 important potential confounders. In this case, the confounders were negatively associated with exposure and positively associated with disease (Fig. 1, scenario 3), which would lead to a confounding RR <1.0 and a positively biased AF, as observed.

Greenland14,15 presents another example of AF bias in the opposite direction using data from a study of fetal monitoring and cesarean section rates. In his example, the confounder (arrested labor) was positively associated with the exposure of interest (fetal monitoring) and positively associated with the outcome of interest (cesarean-section) (Fig. 1, scenario 1) and led to an underestimation of the AF when using the incorrect method (Formula 2), as our results would predict. Furthermore, the magnitude of the underestimation reported by Greenland (15%) is also consistent with our results for the given degree of confounding (confounder-exposure RR of 1.5), an exposure prevalence of 0.5 and the exposure-disease risk ratio of 1.24 (Fig. 3).

The findings reported here can potentially help interpret biased attributable fractions reported in the literature. Many studies do report crude and adjusted RRs, from which a confounding RR can be calculated. Our findings suggest that when the crude RR is greater than the adjusted RR, the AF is underestimated, and when the crude RR is less than the adjusted RR, the AF is overestimated. Knowing the direction of bias can help one better interpret incorrectly calculated AFs. Although the magnitude of bias depends on the specific scenario, the scenarios presented here provide a sense of the range of the magnitude of bias under certain conditions (prevalence of exposure and exposure-disease RR). Nonetheless, we reiterate the conclusions of previous authors2–5,9,14: appropriate formulas should be used to calculate the attributable fraction in the presence of confounding.
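For a published report that gives both crude and adjusted RRs, this rule of thumb could be applied as in the sketch below. The function name, tolerance, and example values are hypothetical, and the rule is taken from the scenarios examined here (harmful exposures, adjusted RR above 1.0), so it should not be read as a general guarantee.

```python
def expected_af_bias_direction(crude_rr, adjusted_rr, tol=1e-9):
    """Return the confounding RR and the expected direction of AF bias
    when an adjusted RR has been used in Formula 2."""
    if crude_rr <= 0 or adjusted_rr <= 0:
        raise ValueError("risk ratios must be positive")
    if adjusted_rr <= 1.0:
        raise ValueError("rule applies to harmful exposures (adjusted RR > 1)")
    confounding_rr = crude_rr / adjusted_rr
    if abs(confounding_rr - 1.0) <= tol:
        direction = "no bias expected"
    elif confounding_rr > 1.0:
        direction = "AF likely underestimated"
    else:
        direction = "AF likely overestimated"
    return confounding_rr, direction


# Hypothetical reported values.
print(expected_af_bias_direction(crude_rr=2.8, adjusted_rr=2.0))
print(expected_af_bias_direction(crude_rr=1.5, adjusted_rr=2.0))
```

The magnitude of the bias still depends on the exposure prevalence and the exposure-disease RR, so the output indicates direction only.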

Incorrect methods are often implemented when relative risk and exposure prevalence estimates are collected from published reports and the data necessary to correctly calculate the AF are unavailable (eg, the WHO's Global Burden of Disease). In these cases, in addition to bias introduced by using the wrong method to calculate the AF, portability of the AF outside of a specific study population relies on the relative risks, exposure prevalence, and distribution of confounders being similar for the study population and target population. Attributable fraction estimates can be very sensitive to minor changes in population characteristics.2 Furthermore, cohort studies often focus on groups of highly exposed persons that do not reflect the exposure distribution of larger, more general populations.4 These issues of portability between populations may dwarf the biases discussed here due to incorrect methods of AF calculation.

If calculated correctly, the attributable fraction is a useful measure of public health burden that can help guide policymakers toward the most promising public health interventions. Unfortunately, errors in estimation of the attributable fraction are common in the epidemiologic literature. In situations where it is not feasible to correct these biased calculations by revisiting the data, knowledge about the direction and magnitude of expected bias in the AF can contribute to a more informed interpretation of attributable fraction estimates.

## ACKNOWLEDGMENTS

We thank Dana Flanders for his helpful comments.

## REFERENCES

1. Levin ML. The occurrence of lung cancer in man. Acta Unio Int Contra Cancrum. 1953;9:531–541.
2. Flegal KM, Graubard BI, Williamson DF. Methods of calculating deaths attributable to obesity. Am J Epidemiol. 2004;160:331–338.
3. Rockhill B, Newman B, Weinberg C. Use and misuse of population attributable fractions. Am J Public Health. 1998;88:15–19.
4. Steenland K, Armstrong B. An overview of methods for calculating the burden of disease due to specific risk factors. Epidemiology. 2006;17:512–519.
5. Benichou J. A review of adjusted estimators of attributable risk. Stat Methods Med Res. 2001;10:195–216.
6. Benichou J. Methods of adjustment for estimating the attributable risk in case-control studies: a review. Stat Med. 1991;10:1753–1773.
7. Bruzzi P, Green SB, Byar DP, Brinton LA, Schairer C. Estimating the population attributable risk for multiple risk factors using case-control data. Am J Epidemiol. 1985;122:904–914.
8. Lopez AD, Mathers CD, Ezzati M, Jamison DT, Murray CJ. Global and regional burden of disease and risk factors, 2001: systematic analysis of population health data. Lancet. 2006;367:1747–1757.
9. Uter W, Pfahlberg A. The application of methods to quantify attributable risk in medical practice. Stat Methods Med Res. 2001;10:231–237.
10. Arah OA, Chiba Y, Greenland S. Bias formulas for external adjustment and sensitivity analysis of unmeasured confounders. Ann Epidemiol. 2008;18:637–646.
11. Gargiullo PM, Rothenberg RB, Wilson HG. Confidence intervals, hypothesis tests, and sample sizes for the prevented fraction in cross-sectional studies. Stat Med. 1995;14:51–72.
12. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.
13. Allison DB, Fontaine KR, Manson JE, Stevens J, VanItallie TB. Annual deaths attributable to obesity in the United States. JAMA. 1999;282:1530–1538.
14. Greenland S. Bias in methods for deriving standardized morbidity ratio and attributable fraction estimates. Stat Med. 1984;3:131–141.
15. Neutra RR, Greenland S, Friedman EA. Effect of fetal monitoring on cesarean section rates. Obstet Gynecol. 1980;55:175–180.


© 2011 Lippincott Williams & Wilkins, Inc.