
There are typically 2 motivations for dose-response (or exposure-response) analyses in epidemiology. First, investigators are interested in causal inference. One of Bradford Hill’s criteria for judging causality is a “positive” dose-response relationship. A positive dose-response, defined as one that increases in a reasonably monotonic fashion, is more convincing than a simple excess of disease risk among the exposed versus the nonexposed. A second motivation for dose-response analyses is risk assessment, ie, estimating the increase in risk that results from a given increase in exposure. This usually requires a quantitative dose-response model, preferably based on epidemiologic data when they are available.^{1}

We discuss some techniques for evaluating dose-response relationships and some issues that arise in doing so (see also the review by Stayner et al.^{2}). By way of example, we use a cohort study of workers exposed to dioxin and followed, over time, for cancer mortality. In the example, the parameter of interest is the rate ratio, modeled using Cox regression. The exposure of interest is cumulative exposure, typical of many chronic disease studies.

We consider categorical analyses, analyses using splines, and simple parametric models. When considering the last, we focus on the usual log-linear model for rates or rate ratios. We do not consider additive models, linear relative risk models, or biologically based models, because these are less commonly used by epidemiologists. However, many of the points made here would apply to those models as well.

The usual log-linear model is one in which the log of the rate is modeled as a linear function of cumulative exposure:

rate = exp(β_{0} + β_{1}·cumexp), so that RR = exp(β_{1}·cumexp)

where RR is the rate ratio, and exp(β_{0}) is the baseline or referent rate. *Cumexp* is cumulative exposure, which can be categorical or continuous.

It has been our experience in a large number of occupational cohort studies, conducted at the National Institute for Occupational Safety and Health (NIOSH),^{3–8} that rate ratios tend to increase in a linear fashion at lower exposures and then plateau or tail off at higher exposures. Possible reasons for this phenomenon include 1) mismeasurement of high exposures, 2) saturation of metabolic pathways at high exposures, 3) depletion of susceptibles at high exposures, and 4) bias resulting from the healthy worker survivor effect. A fuller discussion of this phenomenon can be found in Stayner et al.^{9}

A model that captures this “plateau” effect is what we will call the “log-log” model (a form of the more general power model), in which the log of the rate is a linear function of the log of exposure:

log(rate) = β_{0} + β_{1}·log(cumexp + k), so that RR = [(cumexp + k)/k]^{β_{1}}

where k is a small constant that can represent a background level of exposure, or that is present so as to avoid taking the logarithm of 0 if there are nonexposed individuals in the study.
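The contrast between the two models can be sketched in a few lines of code. The coefficients below are illustrative only, not values fitted to the dioxin data; the constant k = 6 echoes the assumed background serum level used later in the example.

```python
import math

def rr_log_linear(cumexp, beta1):
    """RR under the log-linear model: log(rate) = b0 + b1*cumexp."""
    return math.exp(beta1 * cumexp)

def rr_log_log(cumexp, beta1, k):
    """RR under the log-log model: log(rate) = b0 + b1*log(cumexp + k).

    Relative to an unexposed referent (cumexp = 0), RR = ((cumexp + k)/k)**b1.
    """
    return ((cumexp + k) / k) ** beta1

# Illustrative coefficients: the log-log RR rises steeply at low exposure
# and then flattens, while the log-linear RR keeps rising exponentially.
for x in (0, 10, 100, 1000, 10000):
    print(x, round(rr_log_linear(x, 0.0005), 2), round(rr_log_log(x, 0.1, 6.0), 2))
```

Plotting either function on the axes of Figure 1 reproduces the generic shapes discussed above.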

Graphs of dose-response using the log-linear or log-log models take on different shapes depending on whether the axes are untransformed relative risks (RRs) versus cumulative exposure, or whether log(RR) or log(*cumexp*) is used. Figure 1 illustrates the shape of the generic curves for the log-linear and log-log models using different axes. In both Figures 1A and 1B, the 2 curves have approximately the same shape whether or not the scale of the rate ratio is logged (in the latter case, the log-linear curve is a straight line). One can see the tailing off of the rate ratio at high exposures in the model using the log of cumulative exposure (log-log model). The shapes of the curves change, however, when cumulative exposure is log-transformed (Fig. 1C), and it is the log-log curve that appears more linear. Choice of how to display model results is a matter of convenience, but could partly depend on the range of exposure; a log-normal exposure distribution with a long tail on the right lends itself to a log exposure scale.

### Categorical Models

Typically, investigators use categorical models to “see” the shape of the dose-response without imposing a shape through a particular parametric model. This is often done by connecting the categorical rate ratios with a straight line to sketch a “curve,” even though this curve is fictional; in fact, categorical analyses result in step functions in which the rate ratio is assumed (often unrealistically) to be constant within categories. Although we recommend that investigators routinely conduct categorical analyses, such analyses are often only a first step. For risk assessment purposes, and for reporting results in a concise and universal form that can be used by others, it is useful to seek a simple parametric model that fits the data.

In categorical analyses, the number and choice of cutpoints will influence the shape of the observed dose-response. Too few categories could obscure important aspects of the shape of the dose-response, whereas too many can do the same by adding random noise to the rate ratios for each category. Furthermore, the placement of the cutpoints will have a large influence on the perceived shape. To avoid arbitrariness, it is best to choose cutpoints based on *a priori* criteria before analysis is begun. One such method is to choose categories based on placing an equal number of cases in each category; this approach provides a similar variance for rate ratios across categories. An alternative method would be to choose categories based on equally spaced percentiles from the exposure distribution of all subjects or of noncases within a cohort. Disadvantages of this latter approach include: 1) possibly unequal variance of rate ratios between categories, especially if there is not much difference between the mean exposure within percentiles at the low end of exposure (typical of log-normal exposure distributions); and 2) the lack of an obvious manner for determining the cumulative exposure for noncases when cumulative exposure is time-dependent and subjects can appear in multiple risk sets (as in Cox analyses). In the example below, we have chosen cutpoints based on the distribution of exposure among the cases. Ideally, the dataset is sufficiently large so that it is reasonably robust to these different choices. One of the advantages of smoothers such as splines is that they, to some degree, avoid these choices.
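The first method above, choosing cutpoints that place roughly equal numbers of cases in each category, reduces to taking empirical quantiles of the case exposure distribution. The sketch below uses a simple nearest-rank quantile rule for illustration; a real analysis would also need conventions for ties and for time-dependent exposure.

```python
def case_quantile_cutpoints(case_exposures, n_categories):
    """Cutpoints placing roughly equal numbers of cases in each category."""
    xs = sorted(case_exposures)
    cuts = []
    for i in range(1, n_categories):
        # empirical quantile at i/n_categories (simple nearest-rank rule)
        idx = int(round(i * len(xs) / n_categories)) - 1
        cuts.append(xs[max(0, min(idx, len(xs) - 1))])
    return cuts

def categorize(x, cuts):
    """Return the 0-based exposure category of x given the cutpoints."""
    return sum(x > c for c in cuts)
```

With 5 categories this yields quintile cutpoints of the case exposures, as used for Figure 2A below.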

Figure 2 shows categorical data for a study of dioxin and cancer.^{3} This study included 3538 men exposed to dioxin (TCDD) who were followed for all cancer mortality. The exposure of interest was cumulative serum dioxin (parts per trillion-years, or ppt-years), estimated through a job-exposure matrix and serum dioxin data on 170 subjects in the cohort. Cumulative exposures were highly skewed, typical of occupational exposures, ranging at the end of exposure from 6 to 210,054 ppt-years (median 98, mean 1589). A 6-ppt background level was assumed. The mean duration of exposure was 2.7 years (standard deviation, 4.4). All models presented here included 3 categorical variables for year of birth, in addition to the exposure variables of interest.

Figure 2A shows a step function for the dioxin data using 5 categories, with cutpoints formed by dividing the cancer deaths into quintiles of cumulative exposure (in ppt-years). The highly skewed exposure data motivate the use of a log scale for exposure. Although a step function is a true representation of the results of a categorical analysis (assuming no change in rate ratio within a category), it is common to use some single point within the category (eg, a mean or median) to represent the data, and these points can be connected to provide a curve made up of linear pieces. Figures 2B and 2C show such categorical analyses using the medians of either 5 or 10 categories. The 5-category curve appears consistent with a monotonically increasing dose-response. The use of 10 categories shows some downturns in the curve. However, the use of 10 categories increases random variation in each category, as evidenced by the increased width of the confidence intervals.

### Splines

Splines are parametric curves that serve as useful complements to categorical analyses. Splines fall into the general class of “smoothers,” which includes LOESS and other types of generalized additive models.^{10} Splines provide a smooth estimate of the dose-response curve and impose few constraints on its shape; however, they have the drawback of not providing interpretable parameters. Ordinary splines can be programmed by the investigator in most regression programs, although investigators will often need to produce the graphs themselves, based on the predicted values of the model. More complex penalized spline curves can now be fit using S-PLUS (Insightful Corporation, Seattle, WA) in Cox regression, which produces a graph of dose-response automatically. Multivariate modeling with splines is frequently done by constructing splines for the exposure variable while leaving other variables in their customary form in a log-linear model.

Splines are piece-wise polynomials with the pieces defined by cutpoints (called “knots”). They are parametric curves, although the parameters are usually not easily interpretable and therefore of little inherent interest. The polynomials are typically linear, quadratic, or cubic, with the last being perhaps the most common. The number of knots determines the degree of smoothing; the more knots, the less smoothing. There are typically a small number of knots (between 3 and 8) that can be specified by the investigator. A good discussion can be found in Harrell et al.^{11} or Greenland.^{12} Figure 3 illustrates a simple linear spline with 3 knots based on the equation in the legend. This is a piece-wise linear function in which the slopes of each piece are estimated separately (although in the equation, the slope of each piece is summed with all previous ones, a procedure also used in quadratic and cubic splines).
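The summed-slope parameterization just described can be written out directly. This is a sketch of the spline’s functional form only; the coefficients would in practice be estimated by the regression model.

```python
def linear_spline(x, knots, betas):
    """Piece-wise linear spline:

        f(x) = betas[0] + betas[1]*x + sum_j betas[j+2]*(x - knots[j])+

    where (u)+ = max(0, u). The slope after knots[j] is betas[1] plus all
    hinge coefficients up to j, matching the summed-slope form in the text.
    """
    y = betas[0] + betas[1] * x
    for knot, b in zip(knots, betas[2:]):
        y += b * max(0.0, x - knot)
    return y
```

For example, with knots at 1 and 2 and betas [0, 1, 1, -2], the slope is 1 before the first knot, 2 between the knots, and 0 afterward.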

We discuss 2 classes of splines: restricted cubic splines and penalized splines. Restricted cubic splines^{11} are restricted to be linear at each end (before the first knot and after the last), to avoid instability in fitting polynomials at the extremes, and are constrained to join smoothly at the knots. They can be used in any regression program and provide a prediction equation with estimated parameters. They have the same form as the linear spline in Figure 3, except that the estimated functions between knots are cubic. Figure 4 illustrates restricted cubic splines with 3 and 6 knots, chosen to be symmetric (10, 50, 90th percentiles, or 5, 23, 41, 59, 77, 95th percentiles, respectively), using the same dioxin/cancer data as the categorical analyses. Note that the spline with 3 knots looks more like a straight line (high degree of smoothing), whereas the spline with 6 knots has more “wiggles” (less smoothing) and follows the pattern of the categorical analysis based on deciles.
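The restricted cubic spline basis of Harrell^{11} can be computed directly and its terms entered as regressors in the log-linear model. The sketch below implements the basis construction only (with a conventional range-based scaling); by construction the curve is linear beyond the end knots.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell), linear beyond the end knots.

    Returns [x, C_1(x), ..., C_{k-2}(x)] for k = len(knots) knots, where

        C_j(x) = (x-t_j)+^3 - (x-t_{k-1})+^3 * (t_k-t_j)/(t_k-t_{k-1})
                             + (x-t_k)+^3   * (t_{k-1}-t_j)/(t_k-t_{k-1})

    so that the cubic and quadratic terms cancel beyond the last knot.
    """
    def p3(u):  # truncated cubic (u)+^3
        return max(0.0, u) ** 3
    tk, tk1 = knots[-1], knots[-2]
    basis = [x]
    for j in range(len(knots) - 2):
        tj = knots[j]
        c = (p3(x - tj)
             - p3(x - tk1) * (tk - tj) / (tk - tk1)
             + p3(x - tk) * (tk1 - tj) / (tk - tk1))
        basis.append(c / (tk - knots[0]) ** 2)  # conventional scaling
    return basis
```

Fitting a log-linear model on these basis terms gives the curves of Figure 4; with k knots there are k − 1 exposure parameters.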

Penalized splines are a more recent type of spline, available in some software packages (eg, the “pspline” function for Cox regression in S-PLUS), and are somewhat more complicated mathematically than traditional splines.^{13,14} Penalized splines fit a linear combination of B-splines while applying a penalty for lack of smoothness. B-splines (the B stands for basis function) consist of overlapping polynomial pieces that join together at the knots. A linear combination of B-splines forms a smooth curve. The degree of smoothing is chosen by the user through specification of the “degrees of freedom” (df; not the traditional degrees of freedom of a χ^{2} distribution). Typically, the knots are chosen by the program and are more numerous than in traditional cubic splines. The model is

log(rate) = β_{0} + Σ_{k}β_{k}B_{k}(X)

where B_{k}(X) is a B-spline of degree q (default = 3 in S-PLUS). The user specifies the amount of smoothing by choosing the degrees of freedom (default = 4 in S-PLUS; the number of B-splines or basis functions is approximately 3 times the degrees of freedom). A prediction equation can be obtained from S-PLUS only with some difficulty.

The key to penalized splines is that a penalty is imposed on the sum of squared differences between adjacent β_{k}’s, a penalty that is incorporated in the likelihood (actually, the penalized maximum likelihood). In S-PLUS, the user imposes this penalty by choosing the degrees of freedom. The use of a penalty is analogous to modeling the β_{k}’s as random effects, ie, β_{k} ∼ N(0, σ^{2}), where σ^{2}, the variance permitted among the β_{k}’s, determines the amount of smoothing. Figure 5 shows 2 penalized splines for the dioxin/cancer data with different amounts of smoothing (ie, different degrees of freedom).

One possible advantage of a penalized versus restricted cubic spline is that the shape of the latter depends on the user’s choice of knot location, whereas the former has more knots spread across the range of exposure (and the knots are chosen by the program in S-PLUS). Different knot locations in restricted cubic splines with the same number of knots do not change the shape of the curve very much if knots are chosen to be symmetric across exposure, but can affect the shape if knots are not symmetric. Figure 6 shows the dioxin data and 2 restricted cubic splines, both with the same number of knots but placed in somewhat different regions of the x-axis. These curves do not differ greatly.

Another common smoothing method is a LOESS plot. LOESS is a nonparametric procedure for generating a moving weighted average of the outcome variable (or a moving weighted regression of the outcome on the exposure) within defined local regions of the x-axis, where the weights decrease as one moves away from the center of each specific region and become 0 beyond the range of the region. LOESS estimation, originally developed for linear regressions, is now available for logistic regression through the GAM procedure in SAS (SAS, Cary, NC) and S-PLUS (the “lo” function in “gam”), but, to our knowledge, is not available for conditional logistic regression or Cox regression.
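The core of such a smoother is a local weighted average with tricube weights, sketched below. Real LOESS fits a local weighted regression within each window; the weighted average shown here is the degree-0 special case, kept minimal for illustration.

```python
def local_weighted_average(x0, xs, ys, span):
    """Tricube-weighted moving average of ys at the point x0.

    Points farther than `span` from x0 get zero weight; within the span,
    weights decrease smoothly with distance (the tricube kernel).
    """
    num = den = 0.0
    for x, y in zip(xs, ys):
        d = abs(x - x0) / span
        if d < 1.0:
            w = (1.0 - d ** 3) ** 3  # tricube weight, 0 beyond the span
            num += w * y
            den += w
    return num / den if den else float("nan")
```

Evaluating this at a grid of x0 values traces out a smooth curve analogous to a LOESS fit.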

### Degree of Smoothing

How much smoothing is desirable? There are statistical criteria such as the Akaike Information Criterion (AIC), in which the model likelihood is penalized proportionally to the number of estimated parameters. However, choosing the degree of smoothing is as much a philosophical as a statistical question and has no simple answer. Greenland^{12} has noted that “with enough well-chosen categories, cubic splines can closely approximate virtually any smooth curve. This advantage seems of doubtful utility for epidemiologic analysis, however, because plausible trends and dose-response curves are usually very simple in form.” Even assuming true dose-response curves are indeed simple in form (and one could argue that a U-shaped curve is not simple in form), observational data are subject to confounding, measurement error, interaction with other exposures, and random variation, all of which could affect some parts of the dose-response curve more than others. It could be desirable to smooth out such curvature (“wiggles”) in search of the “true” simpler dose-response curve.

On the other hand, if such smoothing is done without sufficient care, it might guide the investigator toward a simple parametric form that is a wrong one, something which might have been avoided by less smoothing and more exploration of the “wiggles.” This issue is analogous to choosing the number (and placement) of cutpoints in categorical analyses, discussed previously (one could also argue that it is analogous to the degree of acceptable heterogeneity in meta-analyses and the need to explore rather than smooth over such heterogeneity). Some dose-response relationships are, in fact, somewhat complicated, and splines or other smoothers used with limited amounts of smoothing have proved useful in understanding them. Perhaps the best known examples are dose-response curves for air pollution and mortality in time-series analyses, in which seasonal variation in weather creates a wave-like background pattern in mortality, and long-term decreases in pollution create linear downward trends.^{10,15} Another example would be the U-shaped curve for the still poorly understood relationship between cholesterol and mortality, in which Schwartz has argued that smoothing curves outperformed categorical analyses in detecting a complex relationship.^{10}

### Choosing a Parametric Curve

Splines, along with categorical analyses, can offer a visual guide to choosing a simpler parametric model. Simpler models are easier to apply in risk assessment and can be preferred on grounds of parsimony. A simpler model can be justified on statistical grounds as well if the model fit of the (parametric) spline is no better than that of the simpler parametric model. Figure 7 shows the log-linear and log-log models for the dioxin data in Figures 2 and 4–6. Visually, the log-log model clearly appears to be the better fit, conforming to the categorical data and the splines seen in earlier figures. Statistically, the corresponding chi-square statistics for the change in model likelihood with the addition of the exposure terms were 0.2 (1 df, *P* = 0.65) for the log-linear model, 8.4 (1 df, *P* = 0.004) for the log-log model, and 9.3 (4 df, *P* = 0.05) for the cubic spline model with 3 knots. These goodness-of-fit data indicate an adequate fit by the log-log model and no improvement by the spline over the log-log model (*P* = 0.82).
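For the 1-df comparisons above, the likelihood ratio test P-value can be computed from the chi-square statistic with the standard library alone, since for 1 df the chi-square tail probability reduces to a complementary error function:

```python
import math

def lrt_pvalue_1df(chi_square):
    """P-value for a 1-df likelihood ratio test.

    For X ~ chi-square(1), P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))

# The 1-df chi-squares reported in the text:
print(round(lrt_pvalue_1df(0.2), 2))  # log-linear model -> 0.65
print(round(lrt_pvalue_1df(8.4), 3))  # log-log model    -> 0.004
```

These reproduce the *P* = 0.65 and *P* = 0.004 reported above; multi-df tests (eg, the 4-df spline) would need the general chi-square distribution.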

The use of statistical tests to evaluate one dose-response model versus another seems a legitimate use of hypothesis testing, and choosing a single best model based on a large improvement in goodness-of-fit seems appropriate. However, when there are a variety of models that fit the data reasonably well, it might not be wise to pick the “best” model on purely statistical grounds, ie, based on a marginal improvement in a log likelihood (see below).

Another legitimate use of a hypothesis test would be a simple trend test, determining whether the dose-response model improves on a null model of no increase in risk with increasing exposure. Detailed analyses of the shape of dose-response would not be justified when there is little probability that the dose-response differs from the null.

### Is the Best-Fitting Model the Best Model?

In the dioxin data, the log-log model provides a reasonable fit on statistical grounds. However, risk assessment for dioxin is focused on very low exposures, eg, the risk at twice background versus background levels (10 ppt vs. 5 ppt in the serum). This corresponds to exposures in the 375–750 ppt-year range over a 75-year lifetime (5.9–6.6 on the log scale, close to the origin in Fig. 8). In this low-exposure area, the log-log model has a high slope. With a small increase in exposure, risk increases very rapidly, perhaps unreasonably so.

One alternative is a piece-wise linear model with 2 pieces. Here, 2 lines are fit to the data and joined at a single cutpoint, estimated as the point where the slope changes. We used an iterative method to pick the best cutpoint for a piece-wise linear model, judging the best cutpoint based on model fit (for a discussion of picking such cutpoints, see Ulm^{16}). The dose-response curve for high exposures is assumed to be largely independent of the slope for lower exposures (a debatable assumption).
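A minimal version of this cutpoint search can be sketched with ordinary least squares standing in for the Cox partial likelihood (an assumption made purely for illustration; the actual analysis refits the Cox model at each candidate cutpoint and compares likelihoods):

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col and m[col][col]:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_piecewise(xs, ys, cutpoint):
    """OLS fit of y = b0 + b1*x + b2*(x - cutpoint)+ ; returns (betas, sse)."""
    rows = [[1.0, x, max(0.0, x - cutpoint)] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    betas = solve3(xtx, xty)
    sse = sum((y - sum(b * v for b, v in zip(betas, r))) ** 2
              for r, y in zip(rows, ys))
    return betas, sse

def best_cutpoint(xs, ys, candidates):
    """Scan candidate cutpoints, keeping the one with the best fit."""
    return min(candidates, key=lambda c: fit_piecewise(xs, ys, c)[1])
```

The two slopes are b1 (below the cutpoint) and b1 + b2 (above it), mirroring the summed-slope form of the linear spline in Figure 3.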

Figure 8 shows the piece-wise model (with a 15-year lag) together with the log-log curve. Comparing their goodness-of-fit, the piece-wise model improved the model likelihood (chi-square) by 8.2 (1 df, or 2 df if one considers the cutpoint as another parameter), versus an improvement of 8.4 (1 df) for the log-log model. Thus, the piece-wise model does not fit the data as well as the log-log curve. However, the rapid increase in risk with only a small increase in dose in the low-dose region, seen in the log-log curve, could be implausible biologically and results in high estimates of excess risks with slight increases of exposure over background levels (see below). On grounds of biologic plausibility, one might prefer the results from the piece-wise linear model.

Up to now, we have emphasized a “data-driven” approach to the selection of a best-fitting parsimonious parametric model for the data. However, when several models provide adequate fit, selection of the best model based on purely statistical grounds could be arbitrary. To quote Breslow on risk assessment and modeling,^{17} “the rationale for selection of a model for analysis is the belief that the savings in variance afforded by the model assumptions will offset the increase in bias that results from the assumptions being incorrect. The common practice of allowing the data to dominate the model selection process, whether by stepwise entry of variables into a regression equation or evaluation of goodness-of-fit data in the observable effect range, is often inappropriate. A preferable strategy is to determine which of a broad class of models are reasonably consistent with the observed data and to select among these based on prior understanding of the subject matter. If the lack of such understanding precludes a definitive model choice, it is much wiser to admit this openly and accept the resultant uncertainty than it is to sweep the whole issue under the rug by presenting a single ‘best’ model.”

### Calculation of Excess Lifetime Risk Based on the Dose-Response Model

Once a model (or a set of models) has been chosen and the dose-response analyses completed, one can estimate the increase in rate of disease per unit of exposure. Risk assessors and regulators will then use this information to predict individual lifetime risk for specific levels of exposure. When the data are based on humans (ie, epidemiology) and not on animals, this is a straightforward process of converting rates from populations to individual risks. Aside from a few uniquely occupational diseases such as silicosis, most diseases occur in both exposed and nonexposed populations. This typically means calculating an excess lifetime risk of an outcome above and beyond a background risk. For example, the Occupational Safety and Health Administration (OSHA) is often interested in setting exposure limits that permit no more than a 1 per 1000 excess lifetime risk of serious disease or death after a lifetime 45-year exposure. The results of the dose-response model can easily be used to convert rates to risks using standard formulas^{18}:

risk(t) = 1 − exp[−∫_{0}^{t} rate(u)du]

where *t* corresponds to time. Time here would be age, so that risk might be estimated over a usual lifetime, based on standard lifetime expectancies (eg, through age 75 or 79, for men and women in the United States).
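With age-banded rates, the integral becomes a discrete sum over age intervals. The sketch below is a minimal illustration of this rate-to-risk conversion, ignoring competing risks (which are addressed next):

```python
import math

def lifetime_risk(rates, widths):
    """Cumulative risk from age-specific rates: 1 - exp(-sum(rate * width)).

    rates:  mortality rates (per person-year) for successive age intervals;
    widths: the corresponding interval lengths in years.
    """
    cumulative_hazard = sum(r * w for r, w in zip(rates, widths))
    return 1.0 - math.exp(-cumulative_hazard)

# Eg, a constant rate of 0.002/year over a 55-year span (ages 20-75):
print(round(lifetime_risk([0.002], [55]), 3))  # -> 0.104
```

Note that the risk is always below 1 even when the summed rate exceeds 1, which is why rates are converted rather than summed directly.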

This formula can be adapted to estimate the *excess* risk over time using a rate ratio rather than a rate. Furthermore, competing risks of death from any cause can be taken into account, which will slightly decrease the calculated excess risk. Gail^{19} has provided the following formula to calculate excess risk over time given a rate ratio (which could increase during exposure as cumulative exposure increases):

Excess risk = Σ_{i} rr_{i}q_{c,i} Π_{j<i}[1 − q_{a,j} − (rr_{j} − 1)q_{c,j}] − Σ_{i} q_{c,i} Π_{j<i}(1 − q_{a,j})

Excess risk refers to cumulative excess risk of death from the cause of interest by age 79, *q*_{c} is the cause-specific mortality rate for the nonexposed general population, *q*_{a} is the overall all-causes mortality rate for the nonexposed, *rr*_{i} is the age-specific rate ratio for the exposed versus the nonexposed (in our case, this is a function of cumulative exposure, which increases from ages 20–65), and *i* and *j* index age. An alternative formula used in BEIR IV^{20} is also available (see the appendix, available with the electronic version of this article at www.epidem.com), which gives approximately the same results.
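A life-table version of this calculation can be sketched as follows. This is our reading of a Gail-type competing-risks calculation, not a transcription of the published equation; the inputs are age-specific annual death probabilities, and names are illustrative.

```python
def excess_risk(qc, qa, rr):
    """Life-table excess risk of death from the cause of interest.

    qc: age-specific cause-specific death probabilities, nonexposed;
    qa: age-specific all-cause death probabilities, nonexposed;
    rr: age-specific rate ratios for the exposed.
    Returns risk for the exposed minus risk for the nonexposed, with
    competing risks reflected in the all-cause survival terms.
    """
    risk_exposed = risk_unexposed = 0.0
    surv_e = surv_u = 1.0  # probabilities of surviving all causes so far
    for qci, qai, rri in zip(qc, qa, rr):
        risk_exposed += surv_e * rri * qci
        risk_unexposed += surv_u * qci
        surv_e *= 1.0 - qai - (rri - 1.0) * qci  # exposed all-cause survival
        surv_u *= 1.0 - qai
    return risk_exposed - risk_unexposed
```

When all rate ratios equal 1 the excess risk is 0, and elevated rate ratios are partly offset by the faster depletion of the exposed cohort, the slight decrease from competing risks noted above.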

Table 1 shows the result of applying this formula to derive the lifetime (through age 75) risk of death from cancer after a constant lifetime dioxin exposure at double the current standard background level (10 ppt vs. 5 ppt). A doubling of exposure over background might occur from eating a diet high in fish, for example. Excess risk is predicted using either the log-log or the piece-wise linear model.

Table 1 shows different predictions of the 2 different models in the low dose range, which is of interest to the U.S. Environmental Protection Agency (EPA) in making an estimate of risk at environmental levels. At low levels of exposure, rates and risks increase very rapidly with increasing exposure in the log-log model but more slowly in the piece-wise model. One might argue that a doubling of background levels of serum dioxin (TCDD), which are quite low to begin with (5 ppt), is unlikely to result in an increase of 1% of cancer mortality above the background risk of 12% (the risk for the general population at background dioxin levels). For this reason, regulators might prefer the piece-wise model even though it did not fit as well as the log-log model. A prudent course for the epidemiologist would be to present results from both models.

## SUMMARY

Epidemiologic dose-response analyses are useful for causal inference and for risk assessment. Categorical analyses and use of smoothing techniques such as splines are useful for assessing the shape of the dose-response curve and can help in choosing a parametric model appropriate for risk assessment. We recommend the routine use of both categorical analyses and some type of smoothing curve in data analyses, followed by exploration of a range of suitable parametric models that fit the data well. All of these should be candidates for graphic presentation in publications focused on dose-response relationships. The best fitting model might not necessarily be the best model for risk assessment purposes.