Attributable Fraction Across Different Exposures
AFs for different exposures may be combined to determine an overall AF (eg, the overall occupational AF for leukemia due to exposure to ethylene oxide, benzene, and radiation). If 3 exposures are statistically independent (ie, experiencing one makes an individual no more or less likely to experience the others), and their joint effects are multiplicative, then,8

$$AF_{overall} = 1 - (1 - AF_1)(1 - AF_2)(1 - AF_3)$$
Simply summing the exposure-specific AFs may result in an overall AF >1, because it “double counts” cases that could be avoided by removing either one or the other exposure.17 If exposure-specific AFs are small, however, a simple sum will approximate AFoverall.
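The contrast between the simple sum and the combined AF can be illustrated with a short sketch. It assumes the product form 1 − Π(1 − AFi) that follows from independence and multiplicative joint effects; the function name `combined_af` and all numeric values are hypothetical and chosen only for illustration.

```python
from math import prod

def combined_af(afs):
    """Overall AF for independent exposures with multiplicative joint effects.

    Assumed form: 1 - product of (1 - AF_i) over the exposure-specific AFs.
    """
    return 1.0 - prod(1.0 - af for af in afs)

# Hypothetical exposure-specific AFs (illustrative only, not from any study).
small_afs = [0.02, 0.03, 0.01]
large_afs = [0.50, 0.40, 0.30]

for afs in (small_afs, large_afs):
    # For small AFs the sum is close to the combined AF; for large AFs
    # the sum can exceed 1 while the combined AF cannot.
    print(f"sum = {sum(afs):.4f}, combined = {combined_af(afs):.4f}")
```

With the small values the two quantities nearly coincide, whereas with the large values the sum exceeds 1 and the combined AF does not.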
Source and Target Populations: Portability
The previous section implicitly assumed that we want to estimate the fraction of cases attributable to exposure in precisely the same population in which we have results from an epidemiologic study. An example would be a population-based case–control study representative of the population for which we sought to estimate the attributable fraction. However, often we seek to use the results from one or more epidemiologic studies (source populations) to estimate the attributable fraction for a different (target) population. For example, a case–control study (or studies) done in one area of a country may be the source population, but the entire population of the country may be the target population, for which the case–control study is not necessarily representative.18 Similarly, a cohort study of a specific exposed population may be the source of an RR, which is then used to estimate an AF in the general population.19,20 Alternatively, AFs based on data in one country may be applied to another country.1 Target populations may even be hypothetical future populations, for example, if we want to estimate future disease burdens (eg, future mesothelioma due to past asbestos exposures21). In this regard, Murray and Lopez22 make the distinction between attributable burden (the current burden) and avoidable burden (the future burden); the latter is potentially preventable through intervention.
This distinction between source and target population raises some important questions.
First, because the attributable fraction depends not only on the relative risk due to exposure, but also the fraction of the population exposed, and somewhat on the distribution of confounders in the population, all these factors must be the same in the source and target populations for the AF estimated from the source population to apply also to the target population. Thus, the AF is not generally portable from one population to another.
In many cases, there is no RR available for the target population, but the exposure prevalence in the target population is known. In this situation, it is common to use the RR from the source population (on the assumption that this is a reasonable approximation of the RR in the target population) together with the exposure prevalence in the target population. Hence, using formula 3, we have:

$$AF_{target} = \frac{p_{target}(RR_{source} - 1)}{1 + p_{target}(RR_{source} - 1)} \qquad (6)$$
If there are other risk factors associated with exposure in the target population (confounders), then again one should use the “weighted-sum” method described here. This means using the confounder-adjusted RR from the source, separately for each target population stratum as defined by these confounders, and then obtaining an overall weighted-average AF from the stratum-specific AFs. This method assumes no effect modification of RRs by these risk factors in the source population, as discussed subsequently.
If the source RR estimates are by level of exposure, then the process described here can be applied to an extension of formulas (4) or (4a) analogous to the extension of formula (3) to formula (6), thus allowing not only proportion of exposed to vary between source and target populations, but also proportions at different levels of exposure.
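The source-to-target calculation and its weighted-sum extension can be sketched as follows. This is a minimal illustration, assuming that formula 3 has the familiar form p(RR − 1)/[1 + p(RR − 1)] and that the weights in the weighted-sum method are the proportions of target-population cases falling in each confounder stratum; function names and all numbers are hypothetical.

```python
def af_from_prevalence(p_exposed, rr):
    """AF from a population exposure prevalence and an RR (assumed formula 3 form)."""
    excess = p_exposed * (rr - 1.0)
    return excess / (1.0 + excess)

def af_weighted_sum(strata, rr):
    """Weighted-sum AF over confounder strata of the target population.

    strata: list of (case_weight, p_exposed) pairs, where case_weight is the
    assumed proportion of all target-population cases in that stratum.
    rr: confounder-adjusted RR from the source, applied to every stratum
    (ie, no effect modification is assumed).
    """
    total_weight = sum(weight for weight, _ in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("case weights must sum to 1")
    return sum(weight * af_from_prevalence(p, rr) for weight, p in strata)

# Hypothetical inputs: RR taken from a source study, prevalences from the target.
rr_source = 2.0
print(af_from_prevalence(0.10, rr_source))         # single target prevalence
print(af_weighted_sum([(0.6, 0.05), (0.4, 0.20)],  # two confounder strata
                      rr_source))
```

If RRs differ by level of an effect modifier, the same structure applies with a stratum-specific RR passed for each stratum instead of a single summary RR.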
There may be problems of portability for the RRs as well as for the proportions exposed. For example, if relative risks for occupational exposures are from cohort studies, they may implicitly assume a rather strict (high) definition of “exposed.” Applying such relative risks to a target population using general census data on occupation to estimate the proportion exposed would be inappropriate because the relative risks would be inappropriate for this different definition of exposure. This is so because cohort studies may represent more highly exposed workers (with correspondingly high RRs), whereas the average level of exposure among all exposed workers in the population is lower (with correspondingly lower RRs). Similar problems may occur for chronic disease when the latency between exposure and disease differs in source and target population (ie, longer latency in the source population resulting in higher RRs for disease requiring long latency).
RRs may also not be portable because of effect modification by other population characteristics such as smoking or diet. For example, if there were more smokers in the target population than in the study population, and there was a positive interaction between smoking and the exposure, the overall RR would be higher in the target population than in the study population. If there is such effect modification, to obtain an unbiased AF estimate in the target population, it is necessary to use the RRs specific to each level of the effect modifier in the source population for calculating AF in the target population through the “weighted-sum” method.12
Attributable Fractions Based on Different Study Designs
In general, the formulas for the AF in the first section can be based on data from any of the 3 common study designs (cohort, case–control, prevalence), although in the case of prevalence (cross-sectional) studies, the AF will represent the proportion of prevalent rather than incident cases that could be avoided if exposure were absent.
One common method of calculating AFs is to take the RRs from cohort studies of exposed (source) populations and to estimate the prevalence of the exposure in question in the target population using secondary sources. As noted above, exposed cohorts will frequently have exposure patterns not typical of the exposed among the general population, adding extra difficulty to source–target extrapolations beyond, say, population-based case–control studies. An additional problem sometimes arises in source–target extrapolation from cohort studies when the exposure prevalence estimate for the target population comes from a survey of those “currently exposed” when, for chronic diseases with a required latency, what is needed is an estimate of the “ever exposed.” Sometimes, the percentage “ever exposed” can be estimated from the “currently exposed.”19
Case–control studies may also provide the basis for AFs. If they are population-based, they may also provide a good estimate of the proportion of cases that are exposed in the general population (which may be the target population).
Some disease–exposure associations are so strong that (RR − 1)/RR ≈ 1, so AF = pc (RR − 1)/RR ≈ pc. In this case, the AF is approximately the proportion of cases exposed. This might be assumed, for example, in estimating the proportion of liver cancer due to occupational exposure to vinyl chloride monomer or the proportion of mesothelioma due to exposure to asbestos.
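A brief numeric sketch shows how quickly AF = pc(RR − 1)/RR approaches pc as the RR grows. The proportion of exposed cases (0.8) and the RR values are hypothetical.

```python
def af_from_exposed_cases(pc, rr):
    """AF from the proportion of cases exposed (pc) and the RR."""
    return pc * (rr - 1.0) / rr

pc = 0.8  # hypothetical proportion of cases that are exposed
for rr in (2, 5, 20, 50):
    # As RR increases, (RR - 1)/RR tends to 1 and the AF tends to pc.
    print(f"RR = {rr:>2}: AF = {af_from_exposed_cases(pc, rr):.3f}")
```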
Variance of the Attributable Fraction
Calculating the variance of the AF is often not straightforward, because it involves 2 estimated parameters, possible confounding, and possible extrapolation from source to target population. We focus on the variance of the AF for the entire population (AFpop, formulas 2 and 3), which is usually the AF of interest (rather than the AFexp). When confounding affects the estimation of RRs used in calculating AFpop (formula 2 or 3), there are in general no closed-form formulas for the variance, and some approximate form must be used. There are a number of variance formulas depending on the method of calculating the AF and whether one is concerned with a target population that is the same as or differs from the source population.
AFpop (formula 2). An approximate expression for the variance23,24 of the AFpop in the source population that uses the prevalence of exposure among cases is
Formula (7) can be used to get the upper and lower confidence limits of the AF (ULAF and LLAF). ULAFpop is 1 − exp(LL[ln(1 − AFpop)]) and LLAFpop is 1 − exp(UL[ln(1 − AFpop)]), where the corresponding limits of ln(1 − AFpop) from (7) are ln(1 − AFpop) ± 1.96√var[ln(1 − AFpop)]. Formula (7) can be used whether or not the RR is adjusted for confounding, assuming approximate homogeneity of the RR across confounder strata. It is based on a single summary RR and can be used when the RR is a Mantel-Haenszel RR or is derived from a regression model. The exposure status of the cases in the source population must be known. Formula (7) can be used for data from case–control, cohort, or prevalence studies.24
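The back-transformation from the ln(1 − AFpop) scale can be sketched as below. The variance itself is taken as an input (it would come from formula 7, which is not reproduced here); the function name and the example values are hypothetical.

```python
import math

def af_confidence_limits(af, var_log_one_minus_af, z=1.96):
    """Wald limits for an AF, computed on the ln(1 - AF) scale.

    var_log_one_minus_af: variance of ln(1 - AF), supplied by the user.
    The upper limit of ln(1 - AF) maps to the LOWER limit of the AF,
    and vice versa, because 1 - exp(x) decreases as x increases.
    """
    if af >= 1.0:
        raise ValueError("AF must be below 1 for ln(1 - AF) to be defined")
    center = math.log(1.0 - af)
    half_width = z * math.sqrt(var_log_one_minus_af)
    lower_af = 1.0 - math.exp(center + half_width)
    upper_af = 1.0 - math.exp(center - half_width)
    return lower_af, upper_af

# Hypothetical AF and variance, for illustration only.
low, high = af_confidence_limits(0.30, 0.01)
print(f"AF = 0.30, 95% limits: {low:.3f} to {high:.3f}")
```

Working on the log scale keeps the upper limit below 1, which a symmetric interval on the AF scale would not guarantee.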
AFpop (Formula 3) Use of Exposure Prevalence Data From a Separate Target Population, No Confounding in the Target Population. When pp is used rather than pc (AFpop formula 3), pp may come from ancillary survey data on a target population rather than from the study population. A formula to estimate the variance (or standard deviation) of an AFpop in the target population based on pp and incorporating the variance of a pp taken from a survey is available25:
where T = number in survey and O = pp/(1−pp). This formula assumes that the RR in the source population is applicable (portable) to the target population and that there is no confounding in the target population. This formula can be used to obtain 95% Wald limits (which must be converted back to the original AF scale), but it is unwieldy and gets more complicated if either RR or O is estimated through a regression analysis or a complex survey.
Other Methods for the Variance of AFpop in the Source Population in the Presence of Confounding. Benichou10 gives several references for calculating the variance of the AF (either formula 2 or 3) calculated using the weighted-sum approach. These methods involve in general assuming specific distributions of RRs, depending on the study design used to derive the RRs, and then using the delta method to obtain approximate variances within each stratum. Benichou also provides references for obtaining variances when the RRs are obtained from regression models, again based on the delta method. Other authors26,27 have suggested bootstrap methods in conjunction with logistic regression models.
A Monte Carlo Approach to Calculating the AFpop and Its Variance. An alternative approach to using these formulas is to use a Monte Carlo approach for calculating the variance of the AF.25 This approach can incorporate the usual random error (sampling error) in the RR and exposure prevalence as well as other potential sources of uncertainty such as uncontrolled confounding or extrapolation from a source to a target population.
Consider the simple situation in which the only source of uncertainty is random error and one seeks only to estimate the variance of the AF in the source population, and there is no confounding. A distribution for an exposure/disease-specific RR (eg, log normal) and for the exposure prevalence of cases (AF formula 2) or the population exposure prevalence (AF formula 3) may be specified (eg, binomial). Then repeated draws from these distributions will yield repeated AFs, and then a point estimate and distribution for the AF can be obtained as well as a Monte Carlo 95% interval.
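The simple situation just described might be simulated as in the following sketch, which uses AF formula 2 in the form pc(RR − 1)/RR. It assumes a log-normal distribution for the RR and a binomial distribution for the number of exposed cases; the function name, the inputs (RR of 2.0, standard error of ln RR of 0.2, 100 exposed among 400 cases), and the use of the median as the point estimate are all illustrative assumptions.

```python
import math
import random
import statistics

def simulate_af(rr_hat, se_log_rr, exposed_cases, total_cases,
                draws=2000, seed=1):
    """Monte Carlo distribution of AF = pc (RR - 1) / RR.

    RR is drawn from a log-normal distribution centered on ln(rr_hat);
    the number of exposed cases is drawn from a binomial distribution.
    Returns (median, 2.5th percentile, 97.5th percentile).
    """
    rng = random.Random(seed)
    pc_hat = exposed_cases / total_cases
    afs = []
    for _ in range(draws):
        rr = rng.lognormvariate(math.log(rr_hat), se_log_rr)
        # Binomial draw built from independent Bernoulli trials.
        exposed = sum(rng.random() < pc_hat for _ in range(total_cases))
        pc = exposed / total_cases
        afs.append(pc * (rr - 1.0) / rr)
    afs.sort()
    lower = afs[int(0.025 * draws)]
    upper = afs[int(0.975 * draws) - 1]
    return statistics.median(afs), lower, upper

point, lower, upper = simulate_af(2.0, 0.2, 100, 400)
print(f"AF = {point:.3f} (95% Monte Carlo interval {lower:.3f} to {upper:.3f})")
```

Further sources of uncertainty could be added by drawing additional parameters inside the loop, at the cost of having to specify a distribution for each.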
If there is confounding in the source population, and one wishes to use formula (3) based on the study population exposure prevalence, the AF can be calculated using the weighted-sum method described previously. This can again be done by Monte Carlo simulation, specifying distributions of RR and exposure prevalence in each confounder stratum, and combining strata with the chosen weights. Again, after repeated draws and repeated calculations of the AF, Monte Carlo limits can be obtained.
Additional uncertainty regarding inference to a target population can be incorporated into the Monte Carlo simulations. Greenland25 gives some examples of how this might be done.
An alternative to simulation, in which a distribution of RRs and exposure prevalence is assumed, is a bootstrap approach in which the variance of the AF can be obtained through repeated resampling with replacement from the original data.25 However, this requires that the investigator has access to that data.
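A bootstrap of this kind might look like the sketch below for unmatched case–control data. It is only an illustration under stated assumptions: the odds ratio is used as the estimate of the RR (reasonable when the disease is rare), cases and controls are resampled separately, percentile limits are used, and the data (60 of 200 cases and 30 of 200 controls exposed) are hypothetical.

```python
import random

def af_case_control(cases, controls):
    """AF = pc (OR - 1) / OR from 0/1 exposure indicators.

    Returns None when a zero cell makes the odds ratio undefined.
    """
    a = sum(cases)            # exposed cases
    b = len(cases) - a        # unexposed cases
    c = sum(controls)         # exposed controls
    d = len(controls) - c     # unexposed controls
    if min(a, b, c, d) == 0:
        return None
    odds_ratio = (a * d) / (b * c)
    pc = a / len(cases)
    return pc * (odds_ratio - 1.0) / odds_ratio

def bootstrap_af_limits(cases, controls, reps=2000, seed=1):
    """Percentile 95% limits from resampling cases and controls with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        resampled_cases = rng.choices(cases, k=len(cases))
        resampled_controls = rng.choices(controls, k=len(controls))
        estimate = af_case_control(resampled_cases, resampled_controls)
        if estimate is not None:
            estimates.append(estimate)
    estimates.sort()
    n = len(estimates)
    return estimates[int(0.025 * n)], estimates[int(0.975 * n) - 1]

# Hypothetical individual-level exposure indicators (1 = exposed).
cases = [1] * 60 + [0] * 140
controls = [1] * 30 + [0] * 170
print(af_case_control(cases, controls), bootstrap_af_limits(cases, controls))
```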
Additional Source of Variance in Summary Attributable Fractions Across Risk Factors. All AF calculations require the assumption that the RRs used are causal. Investigators may want to combine AFs from several risk factors to derive a summary AF across all of them, and judgment about which AFs are causal becomes an important source of variance. For example, investigators may seek to calculate an AF for all cancers due to occupation. A more liberal interpretation of associations likely to be causal will result in a higher summary AF. In 2 recent studies, the AF for cancer due to occupational exposures among men was 5% in one study19 versus 14% in the other,20 primarily due to the inclusion of more exposure/disease associations judged to be causal in the latter study (Appendix, available with the online version of the journal). Both studies agreed on 19 well-established associations, but the study with the higher AF accepted another 19 less well-established associations as causal. It would be possible to incorporate this additional uncertainty in estimating the variance of the AF by using a Monte Carlo approach that assigns distributions to beliefs about the causality of each association; this would result in wider confidence limits.
MEASURES USED IN CONJUNCTION WITH ATTRIBUTABLE FRACTIONS: YEARS-OF-LIFE-LOST AND DISABILITY-ADJUSTED LIFE-YEARS
Measures of YLLs or of DALYs are often calculated for specific diseases without regard to the exposures causing the disease. We review these measures and note their use, under appropriate assumptions, to encompass YLLs for a specific risk factor that causes a specific disease (simply by multiplying the measures by the corresponding AF).
This use of YLLs and DALYs is becoming more common.1–5
Years of Life Lost
YLLs are the years of life lost due to premature mortality from a specific cause of death. YLLs were originally used to allow meaningful comparisons of burden across different causes of death—without any reference to specific exposures or risk factors. Their key feature is that they take time lost to premature mortality into account. YLLs are sometimes called YPLL or years of potential life lost. There are many ways of calculating YLLs.28 All of them involve multiplying the number of deaths in a given population by some life expectancy (eg, the life expectancy at the average age of death for the cause of interest in the target population).
$$YLL = N \times L \qquad (9)$$

where N = total deaths from the specific cause and L = standard life expectancy after the average age at death from the cause of interest.
Differences arise regarding what one assumes for age-specific life expectancy. WHO uses age-specific life expectancies from Japan, which has the longest overall life expectancy of any country (life expectancy at birth is 80 for men and 83 for women). However, others (such as the CDC) truncate that life expectancy at some specific age: the CDC truncates life expectancy at age 75 when calculating YLLs. This calculation gives more weight to YLLs at earlier ages. Such truncation not only results in different absolute numbers of YLLs (lower with earlier truncation), but also shifts the proportion of YLLs for different diseases depending on the distributions of the age of death for these diseases. For example, with no truncation, heart disease had the highest proportion of YLLs in the United States in 1986 (41%), whereas cancer had 33%.28 With truncation at age 75, heart disease decreased to 32%, whereas cancer was 34%. The absolute number of YLLs decreased approximately 50% due to this truncation.
Once one has calculated the YLLs for a given cause, the YLLs due to a specific risk factor (attributable YLLs) can be calculated simply by multiplying the YLLs by the AF for the risk factor (or risk factors) in question for that cause.
Formula (9) is an approximate one because life expectancy is not a linear function of age. One can obtain a more accurate estimate of YLL by use of life table methods. In this approach, the numbers of deaths in each age group, Ni (usually in 1- or 5-year intervals), are multiplied by the life expectancy at that age, Li, and the products are summed across age groups:

$$YLL = \sum_i N_i L_i$$
Miller and Hurley29 have provided a clear exposition of how to calculate YLLs in this way.
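The two YLL calculations and the attributable YLLs can be sketched as follows. The simple version multiplies total deaths by a single life expectancy; the life-table version sums deaths times life expectancy over age groups. Function names, age groups, death counts, life expectancies, and the AF of 0.10 are all hypothetical.

```python
def yll_simple(deaths, life_expectancy):
    """YLL as total deaths times a single remaining life expectancy."""
    return deaths * life_expectancy

def yll_life_table(deaths_by_age, expectancy_by_age):
    """YLL as the sum over age groups of deaths times life expectancy at that age."""
    if len(deaths_by_age) != len(expectancy_by_age):
        raise ValueError("one life expectancy is needed per age group")
    return sum(n * l for n, l in zip(deaths_by_age, expectancy_by_age))

def attributable_burden(burden, af):
    """Burden (eg, YLLs) attributable to a risk factor: AF times total burden."""
    return af * burden

# Hypothetical deaths and remaining life expectancies for 3 age groups.
deaths_by_age = [10, 40, 150]
expectancy_by_age = [40.0, 25.0, 10.0]

total_yll = yll_life_table(deaths_by_age, expectancy_by_age)
print(total_yll)                             # YLLs for the cause of death
print(attributable_burden(total_yll, 0.10))  # YLLs attributable to the risk factor
```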
DALYs offer the advantage of enabling calculation of a burden of disease that includes both morbidity and mortality and that can be compared across different diseases. For a specific disease,

$$DALY = YLL + YLD$$

where YLL is the years of life lost due to premature mortality, as discussed previously, and YLD is the years of life lived with disability due to disease incidence. YLD is defined as:

$$YLD = I \times DW \times L \qquad (11)$$
where I = number of incident cases, DW = disability weight, and L = mean duration of disease. If data on incidence are not available, they are sometimes estimated from mortality data and case-fatality rates.
Like YLLs, YLDs may be calculated for a specific disease independent of risk factors. The resulting DALYs are then multiplied by the AF for a specific risk factor to determine the DALYs attributable to that risk factor (there is a simplifying assumption here that disease progression is the same for any given risk factor, although in some cases, a disease category might be really a mixture of heterogeneous diseases, each with a different clinical course and each caused by different risk factors).
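A minimal sketch of the DALY arithmetic follows, assuming YLD is the product of incident cases, disability weight, and mean duration, and that DALYs are the sum of YLLs and YLDs. All inputs (2900 YLLs, 500 incident cases, a disability weight of 0.2, a mean duration of 4 years, and an AF of 0.10) are hypothetical.

```python
def yld(incident_cases, disability_weight, mean_duration_years):
    """Years lived with disability: I x DW x L."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie between 0 and 1")
    return incident_cases * disability_weight * mean_duration_years

def daly(yll_value, yld_value):
    """DALYs for a disease: YLLs plus YLDs."""
    return yll_value + yld_value

# Hypothetical disease-level inputs.
disease_daly = daly(2900.0, yld(500, 0.2, 4.0))
attributable_daly = 0.10 * disease_daly  # DALYs attributable to the risk factor
print(disease_daly, attributable_daly)
```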
The disability weights (DWs) seek to weight the disability of living with different diseases (in some cases, the weight is reduced for treated disease). They are necessarily based on subjective judgment, generally originating from a panel of experts. Table 1 gives some examples of disability weights with the highest possible being 1.0.
DALYs calculated for a range of specific diseases without reference to risk factors are available from WHO.30
WHO uses a somewhat more complicated method for calculating YLLs and YLDs than that presented in formulas (9) and (11). This method downweights or discounts years farther away from the year of death on the theory that these years are less valuable. Formulas (9) and (11) become

$$YLL = \frac{N}{r}\left(1 - e^{-rL}\right)$$

$$YLD = \frac{I \times DW}{r}\left(1 - e^{-rL}\right)$$

where r = 0.03 is the discount rate.
The disability weights and the discount rate can highly influence DALYs.31 Given the subjective nature of the disability weight and discount rate, as a practical matter, one might consider conducting sensitivity analyses that vary these 2 variables.
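A sensitivity analysis of this kind could be sketched as below. The sketch assumes continuous discounting, in which L undiscounted years are replaced by (1 − e^(−rL))/r discounted years, with no age weighting; that form, the function names, and the grid of discount rates and disability weights are assumptions made for illustration.

```python
import math

def discounted_years(years, rate):
    """Present value of a stream of `years` years under continuous discounting.

    With rate 0 this returns the undiscounted number of years.
    """
    if rate == 0:
        return float(years)
    return (1.0 - math.exp(-rate * years)) / rate

def discounted_yld(incident_cases, disability_weight, duration_years, rate):
    """YLD with the mean duration replaced by its discounted equivalent."""
    return incident_cases * disability_weight * discounted_years(duration_years, rate)

# Hypothetical disease: 500 incident cases lasting 10 years on average.
for rate in (0.0, 0.03, 0.06):
    for weight in (0.1, 0.2, 0.3):
        value = discounted_yld(500, weight, 10.0, rate)
        print(f"r = {rate:.2f}, DW = {weight:.1f}: YLD = {value:8.1f}")
```

Tabulating the results over such a grid makes it easy to see which of the 2 value choices drives the estimate for a given disease.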
DALYs are similar to other measures taking quality of life into account such as quality-adjusted life-years (QALYs).32,33 QALYs adjust each year lived for the existence of specific health impairment (rather than considering disease-specific disability, as in DALYs) usually based on some rating scale (eg, impairment of mobility or usual activities, existence of pain or anxiety/depression). Ratings typically come from surveys of patients or caregivers. QALYs are age-specific, being the product of life expectancy times the impairment factor (1 for no impairment and 0 for total impairment, ie, death).34 They are frequently used to compare interventions (eg, one drug vs another to treat breast cancer), and to calculate a cost–utility ratio between 2 interventions ([cost of intervention A − cost of intervention B]/[QALYs resulting from intervention A − QALYs resulting from intervention B]).
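The cost–utility ratio described above is a simple quotient of differences, sketched here with hypothetical costs and QALYs for 2 interventions.

```python
def cost_utility_ratio(cost_a, cost_b, qalys_a, qalys_b):
    """Incremental cost per QALY gained for intervention A relative to B."""
    qaly_difference = qalys_a - qalys_b
    if qaly_difference == 0:
        raise ValueError("interventions yield equal QALYs; ratio is undefined")
    return (cost_a - cost_b) / qaly_difference

# Hypothetical values: A costs 20 000 more and yields 0.5 more QALYs than B.
print(cost_utility_ratio(50_000, 30_000, 6.0, 5.5))
```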
As with AFs, estimation of YLLs or DALYs using these methods is based on assumptions that are typically only approximately met and not testable.35 For example, they assume that among the exposed who succumb to the disease in question, vulnerability to other causes of death is not also heightened. These limitations should not be forgotten, but neither should they prevent use of these measures as broad-brush descriptors of disease burden.
We have attempted to provide an overview of the issues in estimating the public health burden of specific risk factors for disease. We believe that estimation of the public health burden is useful for policymakers and for the public, because it indicates the potential benefits of intervention, and that it should be encouraged among epidemiologists.
Several issues make estimation of risk factor burden—whether as AFs, attributable YLLs, or attributable DALYs—somewhat more difficult than the usual epidemiologic estimation of standard measures such as rate ratios and odds ratios. One issue is the standard question of causal inference (ie, are the RRs causal?). AFs have the advantage of forcing epidemiologists to make more clear their judgments about causal inference (ie, does a given study or studies provide sufficient evidence for causality to justify calculating public health burden with the implied possible intervention to reduce exposure?).
A related issue is whether the risk factor for which the AF is calculated is in fact susceptible to an intervention.36 Depending on the feasibility of the intervention, the AF may be either meaningless, poorly defined, or overestimated. For example, being over age 60 will be a risk factor with a very high AF for most chronic diseases, but it makes no sense to calculate such an AF because we cannot intervene to prevent aging. Obesity (body mass index >30) is a seemingly straightforward risk factor. However, an AF for obesity is poorly defined because it is not clear what intervention would result in eliminating this risk factor. If it were increased physical activity, then the risk factor of interest (and corresponding RRs) should be lack of physical activity, not obesity (which is itself presumably an intermediate variable). In addition, it is unlikely that interventions to prevent obesity will completely eliminate obesity, so that in practice an AF for obesity will be an overestimate of what can be achieved by current interventions (although in theory a new pill might be invented which would make it possible to eliminate obesity altogether).
An issue specific to some AFs is the additional complexity stemming from the problem of portability of parameters from the source population to the target population. This last issue is analogous to the “external validity” or “generalizability” of findings from any epidemiologic study, but it takes on a more quantitative aspect because a quantitative measure of burden must be estimated for the target population.
Another issue involves the nature of the AF itself. Although the AF is usually thought of as a measure to assess public health burden, it is worth recalling that the AF represents the fraction of cases attributable to exposure rather than the absolute number of cases, and the absolute number may be the more important statistic (DALYs, unlike AFs, are an absolute measure). When comparing the public health burden due to a single exposure across different diseases, it will be necessary to multiply the AF by the total number of cases due to each disease in the population to obtain the actual number of cases attributable to the exposure. An exposure with a high AF for a disease of low incidence may cause fewer cases than for another more common disease for which the exposure has a low AF. Wacholder37 gives the example of the BRCA mutation, which has a high attributable fraction for ovarian cancer and a low attributable fraction for breast cancer but will cause more cases of breast cancer because breast cancer is much more common than ovarian cancer. Wacholder, in an argument analogous to the traditional one regarding the risk difference versus the risk ratio, proposes that the “attributable community risk” (ACR = I – I0, the numerator of formula 3a) may be more relevant to public health than the AF itself.
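The point about fractions versus absolute numbers can be made concrete with a small sketch. The AFs and case counts below are hypothetical and are not taken from Wacholder's example; they only mimic its structure (a high AF for a rarer disease and a low AF for a more common one).

```python
def attributable_cases(af, total_cases):
    """Absolute number of cases attributable to the exposure."""
    return af * total_cases

# Hypothetical (AF, total cases) pairs for one exposure and 2 diseases.
diseases = {
    "rarer disease, high AF": (0.50, 100),
    "more common disease, low AF": (0.05, 5000),
}

for name, (af, total_cases) in diseases.items():
    cases = attributable_cases(af, total_cases)
    print(f"{name}: AF = {af:.2f}, attributable cases = {cases:.0f}")
```

In this illustration the disease with the 10-fold lower AF accounts for 5 times as many attributable cases.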
The AF (or the ACR) provides a bridge by which results of epidemiologic studies can be made relevant to public health policy. It also forces epidemiologists to examine the relevance of their work: if relative risks and proportions exposed are small, implications for public health may be minor and the usefulness of research questionable. Another question arising naturally in the calculation of the AF is whether the risk factor can be eliminated by intervention, and if so, what is the benefit to society from eliminating or reducing the risk factor? The downside of urging epidemiologists to confront these questions routinely is that it takes epidemiologists as impartial scientists and thrusts them more clearly into the political arena of public health.38 The advantage is that it forces them to think of the impact of their work in society as a whole.37,39,40 Public health agencies are increasingly using measures of public health burden to make the case for (or against) specific interventions; epidemiologists are somewhat behind.
1.Ezzati M, Lopez A, Rodgers A, et al. Comparative Risk Assessment Collaborating Group. Lancet
2.McKenna M, Michaud C, Murray C, et al. Assessing the burden of disease in the US using DALYs. Am J Prev Med
3.Centers for Disease Control and Prevention. Smoking Attributable Mortality Morbidity and Economic Costs (SAMMEC). Available at: http://apps.nccd.cdc.gov/sammec. Accessed May 3, 2006.
4.Ebrahim S, McKenna M, Marks J. Sexual behavior: related adverse health burden in the US. Sex Transm Infect
5.Centers for Disease Control and Prevention. Alcohol-attributable deaths and years of potential life lost, US, 2001. MMWR Morb Mortal Wkly Rep
6.Greenland S, Robins J. Conceptual problems in the definition and interpretation of attributable fractions. Am J Epidemiol
7.Levin M. The occurrence of lung cancer in man. Acta Unio Int Contra Cancrum
8.Miettinen O. Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol
9.Greenland S. Bias in methods for deriving standardized morbidity ratio and attributable fraction estimates. Stat Med
10.Benichou J. A review of adjusted estimators of attributable risk. Stat Methods Med Res
11.Rockhill B, Newman B, Weinberg C. Use and misuse of population attributable fractions. Am J Public Health
12.Flegal K, Graubard B, Williamson D. Methods of calculating deaths attributable to obesity. Am J Epidemiol
13.Bruzzi P, Green S, Byar D, et al. Estimating the population attributable risk for multiple risk factors using case–control data. Am J Epidemiol
14.Flegal K, Graubard B, Williamson D, et al. Excess deaths associated with underweight, overweight, and obesity. JAMA
15.Walter S. Prevention for multi-factorial diseases. Am J Epidemiol
16.Wacholder S, Benichou JM, Heineman E, et al. Attributable risk: advantages of a broad definition of exposure. Am J Epidemiol
17.Enterline P. Sorting out multiple causal factors in individual cases. In: Chiazze L, Lundin F, eds. Epidemiologic Methods for Occupational and Environmental Health Studies. Ann Arbor, MI: Ann Arbor Science Publications; 1983;177–184.
18.Vineis P, Thomas T, Hayes R, et al. Proportion of lung cancers in males due to occupation, in different areas of the USA. Int J Cancer
19.Steenland K, Burnett C, Ward E, et al. Dying for work: the magnitude of US mortality from selected causes associated with occupation. Am J Ind Med
20.Nurminen M, Karjalainen A. Epidemiologic estimate of the proportion of fatalities related to occupational factors in Finland. Scand J Work Environ Health
21.Peto J, Hodgson J, Matthews F, et al. Continuing increase in mesothelioma mortality in Britain. Lancet
22.Murray C, Lopez A. On the comparable quantification of health risks: lessons from the Global Burden of Disease Study. Epidemiology
23.Rothman K, Greenland S. Modern Epidemiology. Philadelphia: Lippincott-Raven; 1998.
24.Greenland S. Variance estimators for attributable fraction estimates consistent in both large strata and sparse data. Stat Med
25.Greenland S. Interval estimation by simulation as an alternative to and extension of confidence intervals. Int J Epidemiol
26.Llorca J, Rodriguez M. A comparison of several procedures to estimate the confidence interval for attributable risk in case–control studies. Stat Med
27.Kooperberg C, Petitti D. Using logistic regression to estimate the adjusted attributable risk of low birthweight in an unmatched case–control study. Epidemiology
28.Gardner J, Sanborn J. Years of potential life lost (YPLL)—what does it measure? Epidemiology
29.Miller B, Hurley J. Life table methods for quantitative impact assessments in chronic mortality. J Epidemiol Community Health
30.World Health Organization. Global burden of disease statistics. Available at: www.who.int/whosis. Accessed May 3, 2006.
31.Arnesen T, Kapiriri L. Can the value choices in DALYs influence global priority-setting? Health Policy
32.Gold M, Stevenson D, Fryback D. HALYs and QALYs and DALYs, oh my: similarities and differences in summary measures of population health. Annu Rev Public Health
33.Murray C, Salomon J, Mather C. A critical examination of summary measures of population health. Bull World Health Organ
35.Robins J, Greenland S. Estimability and estimation of expected years of life lost due to a hazardous exposure. Stat Med
36.Hernán M. Hypothetical interventions to define causal effects-afterthought or prerequisite? Am J Epidemiol
37.Wacholder S. The impact of prevention effort on the community. Epidemiology
38.Rothman K, Adami H, Trichopoulos D. Should the mission of epidemiology include the eradication of poverty? Lancet
39.Pearce N. Traditional epidemiology, modern epidemiology, and public health. Am J Public Health
40.Rockhill B. Theorizing about causes at the individual level while estimating effects at the population level. Epidemiology
© 2006 Lippincott Williams & Wilkins, Inc.