Attention in epidemiologic research has recently been refocused on distal factors, many of which are social, that affect health and disease. 1,2 By “distal,” we refer to the extent that a causal factor is removed from the physiologic onset of disease, in time, space, or number of steps in a chain of events. The current focus on distal factors, particularly those removed from the level of the individual, represents the reemergence of a social-production-of-disease theory that has waxed and waned in epidemiology for two centuries, championed, among others, by Virchow in the mid-19th century 3 and Cassel in the mid-20th century. 4

To illustrate multilevel causation of disease, Poole and Rothman recently described a potential pathway leading from socioeconomic repression of homosexual males at the societal level, to the behavior of male prostitution, to HIV exposure, to depressed T-cell count, to AIDS. 5 This chain of events details transmission from social structures down through the level of the individual to the level of cellular processes in the pathogenesis of disease. Another popular causal model of this type is the “Barker Hypothesis,” which posits that socioeconomic deprivation of pregnant women leads to placental insufficiency, precipitating fetal growth retardation, manifested in adult offspring as insulin resistance and, finally, cardiovascular disease. 6

Most epidemiologic analyses, if extended “up-stream”, would demonstrate similar paths from broad socioeconomic conditions. 7 Are cigarettes, for example, any more legitimate a cause of lung cancer than distal socioeconomic factors, such as low education and psychosocial stress, that lead people to pursue this coping strategy? Just as smoking behavior is demonstrably sensitive to social expectations and policies, social conditions are consequential to many health conditions, and the notion of causality is restricted unnecessarily by focusing only on those factors most proximally associated with the outcome. Indeed, some have asserted that a restricted focus on factors at the individual level has more to do with the ideology of individual free choice than with the scientific considerations of causality. 8

With the rekindled interest in social causation of disease, social factor effects are currently estimated in many studies. For example, Lantz and colleagues reported the mortality hazard rate ratio associated with low income as 2.77 (95% CI = 1.74–4.42), adjusted for covariates including age, sex, and education. 9 This result was described as the “independent” effect of income. Similar estimates are reported with increasing regularity and engender numerous philosophical and methodologic dilemmas. 10 We focus here on one such issue, which is the extent to which common adjustment techniques provide valid causal inferences concerning distal social factors. Is there a reasonable interpretation of the “independent” effect of a socioeconomic quantity such as income when the parameter is estimated from a standard multiple regression model? We demonstrate that conditioning on covariates without regard to structured interrelations among the relevant quantities (measured and unmeasured) can lead to biased estimation of causal effects, and we provide a simple conceptual example of this bias when covariate structure is ignored or mis-specified.

### Epidemiologic Confounding and Adjustment

It is widely held that the randomized controlled trial is the “gold standard” for establishment of causality because the treatment is assigned without regard to any measured or unmeasured characteristic of the individual. The cumulative experiences of the treatment groups at follow-up therefore cannot plausibly be functions of systematic differences in covariate distributions; that is, beyond those imbalances which occur by chance. 11 The contrast in outcome distributions observed between groups can be logically attributed only to one systematic variant in the design—the imposition of treatment. The treated and untreated groups therefore each serve as valid substitute populations for the other’s unobserved (counterfactual) experience, and an estimated causal effect formed by the contrast between the groups can be expected to approach the true unobservable contrast between treated and untreated states for the target population of interest. 12

In observational studies, however, there may exist some vector of covariates *Z*, measured or unmeasured, that are associated with (but not affected by) a point-exposure of interest (*X*) and causally precede the outcome (*Y*), and thereby *confound* their observed relation. Observing a crude correlation between *X* and *Y* does not distinguish a causal effect from a statistical dependency that merely results, for example, from their common perturbation by *Z*. We are generally interested in the possible causal relation: an experimental scenario in which forcing *X* to some specific value *x* _{1} would result in a different probability distribution of *Y* than if we had forced *X* to some alternate value *x* _{2} [Appendix 1]. We thus define confounding as a divergence between two kinds of conditional probability distributions of *Y*: the distribution given that we find *X* at the value *x* (estimable from the data), and the distribution given that we intervene to force *X* to take the value *x*. Using Pearl’s “SET” notation to represent the latter “potential outcome” distribution, 13 $P(Y = y \mid \mathrm{SET}[X = x])$, we can express this definition of confounding as:

$$P(Y = y \mid \mathrm{SET}[X = x]) \neq P(Y = y \mid X = x)$$

or, in commonly used abbreviated probability notation, which will be used throughout,

$$P(y \mid \mathrm{SET}[X = x]) \neq P(y \mid x) \tag{1}$$

If such confounding can be attributed entirely to covariate(s) *Z*, we may adjust by standardizing on *Z* to estimate the potential outcome distribution of *Y* given *X*: 14

$$P(y \mid \mathrm{SET}[X = x]) = \sum_{z} P(y \mid x, z)\,P(z) \tag{2}$$

where summation is over observed values *Z* =*z*, and *Z* is unaffected by *X*. 15,16
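The standardization in Eq 2 can be sketched numerically. The following is a minimal illustration, assuming a binary exposure, a binary covariate, and a binary outcome; the counts are hypothetical and are not taken from any study, and the function names `standardized_risk` and `crude_risk` are introduced here only for illustration.

```python
# Hypothetical counts keyed by (x, z, y); these are NOT data from the article.
counts = {
    (0, 0, 0): 80, (0, 0, 1): 20,   # X=0, Z=0: risk 0.20
    (0, 1, 0): 30, (0, 1, 1): 20,   # X=0, Z=1: risk 0.40
    (1, 0, 0): 35, (1, 0, 1): 15,   # X=1, Z=0: risk 0.30
    (1, 1, 0): 50, (1, 1, 1): 50,   # X=1, Z=1: risk 0.50
}


def standardized_risk(counts, x, y=1):
    """Eq 2: sum over z of P(y | x, z) * P(z), estimated from cell counts."""
    total = sum(counts.values())
    z_values = {z for (_, z, _) in counts}
    risk = 0.0
    for z in z_values:
        n_z = sum(n for (_, zz, _), n in counts.items() if zz == z)
        n_xz = sum(n for (xx, zz, _), n in counts.items() if xx == x and zz == z)
        # An empty (x, z) cell would make P(y | x, z) undefined (sparse-data problem).
        risk += (counts.get((x, z, y), 0) / n_xz) * (n_z / total)
    return risk


def crude_risk(counts, x, y=1):
    """Unadjusted P(y | x)."""
    n_x = sum(n for (xx, _, _), n in counts.items() if xx == x)
    n_xy = sum(n for (xx, _, yy), n in counts.items() if xx == x and yy == y)
    return n_xy / n_x


crude_rd = crude_risk(counts, 1) - crude_risk(counts, 0)
adjusted_rd = standardized_risk(counts, 1) - standardized_risk(counts, 0)
print(f"crude risk difference:        {crude_rd:.4f}")
print(f"standardized risk difference: {adjusted_rd:.4f}")
```

In these made-up counts *Z* is associated with both exposure and outcome, so the crude and standardized risk differences diverge. Whether the standardized value has a causal interpretation still depends on the structural assumptions discussed in the text.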

### Why Standard Adjustment for Confounding Works

Causal effects may be defined as contrasts between the probability distributions of the outcome *Y* under various intervention regimens for exposure *X*. 17 ^{p.70} For example, if the contrast is a difference, then [*P* (*y* |SET[*X* =*x* _{1}]) − *P* (*y* |SET[*X* =*x* _{2}])] ≠ 0 indicates a causal effect of *X* on *Y*. 18 Likewise, if the contrast is the ratio *P* (*y* |SET[*X* =*x* _{1}]) / *P* (*y* |SET[*X* =*x* _{2}]), then the null value is one.

Following Pearl, we state the causal effect most generally as *P* (*y* |*SET* [*X* =*x*]) which, as a function of *x* and *y*, contains information on all possible contrasts among all possible values of *X*. This definition is expressed in terms of the SET[*X* =*x*] intervention, but in observational studies we generally cannot assume that the process by which *X* came to take the specific value *x* corresponds to an experimental intervention. Rather, this process often involves the influence of extraneous variable(s) *Z*, and when *Z* also affects *Y*, we may have confounding as defined above.

The formula in Eq 2 is an appropriate adjustment to control for confounding if within each stratum of *Z*, observed exposure *X* is statistically independent of the potential response (*Y* |*SET* [*X* =*x*]) for each imposed value *x*. That is, for each value *x* to which variable *X* is hypothetically “SET”, the distributions of (*Y* |*SET* [*X* =*x*]), conditional on each covariate stratum (*Z* =*z*) and observed exposure level (*X* =*x’*), are all equal to the distribution of (*Y* |*SET* [*X* =*x*]), conditional on just covariate stratum (*Z* =*z*).

Expressed symbolically:

$$(Y \mid \mathrm{SET}[X = x],\; Z = z,\; X = x') \;\sim\; (Y \mid \mathrm{SET}[X = x],\; Z = z)$$

for each *x’* ∈ *X*, *z* ∈ *Z*, and *x* ∈ *X*, where “∼” denotes “has the same distribution as.” This condition, called “weakly ignorable treatment assignment” within strata of *Z*, is sufficient to ensure that standard adjustment using *Z* yields an unbiased estimate of the causal effect of *X* on *Y*. 17 ^{p.246}, 18,19

### Why Standard Adjustment for Confounding Sometimes Doesn’t Work

We have expressed causal effects in terms of hypothetical interventions *SET* [*X* =*x*], and have defined confounding to be the inequality in the outcome probability distribution between conditioning on this intervention regimen and on the passive observation that *X* =*x*. Ideally, we want to know something about these hypothetical interventions on the basis of available observations. Given an observed sample with measured *X*, *Y*, and *Z* values, we would like to estimate what would have happened if we had gone around actually assigning people to some specific regimen of exposure *X*. The result of this hypothetical intervention is the salient aspect of the causal relation that we hope to identify, and yet it is impossible to ascertain directly whenever the intervention is impractical or impossible as a real experiment. In social epidemiology, we ultimately want to know the result on some outcome of an intervention in which we facilitate education, augment income, or enact some other social policy, but controlled experiments are generally impractical or unethical. A goal of observational social epidemiology, therefore, is to express the causal effect *P* (*y* |SET[*X* =*x*]) as some function of the observed joint probability distribution *P* (*y*, *x*, *z*). If we can do this, the causal effect is identifiable from observational data. 17 ^{pp.114–118} If we cannot find such an expression, then our results remain potentially confounded, and sensitivity analysis may help to bound the magnitude of bias in causal estimates resulting from assumed degrees of departure from identifiability. 20

As described in Eq 2, we use standardization to express the causal quantity of interest in terms of the passively observed (*Y*, *X*, *Z*) data. This adjustment fails in a number of circumstances, however. First, we cannot expect an unbiased causal effect estimate when *Z* (or some subset of *Z*) remains unmeasured. There is no clever solution to this problem, other than to endeavor to measure sufficiently many variables *Z* such that conditional independence of *X* and (*Y* |SET[*X* =*x*]) can be achieved or approached. It is therefore incumbent upon researchers using observational data to assure that their covariate data are sufficiently rich to support causal conclusions. Although valid inference is sometimes possible even when important covariates remain unmeasured, their absence more often results in nonidentifiable causal structures, or in sensitivity analyses that reveal considerable uncertainty regarding the causal parameter of interest.

It is well appreciated that confounders need to be measured in observational research, but there are other circumstances in which the standard adjustment fails. Furthermore, these circumstances are particularly relevant to the sorts of distal effects that interest social epidemiologists because social factors explicitly entail causal chains through intermediates. In particular, the standard adjustment may be biased if *Z* is itself affected by *X* (that is, *Z* is a causal intermediate). Observed intermediates can sometimes be useful in adjusting for confounding, but the standard adjustment in Eq 2 is not adequate in these circumstances.

Another difficulty encountered is the problem of multivariate *Z*. For the adjustment defined in Eq 2, we need to estimate the probabilities *P* (*y* |*x*, *z*) and *P* (*z*), but this becomes potentially problematic when *Z* is the vector (*Z* _{1}, *Z* _{2},..., *Zn*) and *n* is large, as often occurs. Many cells in the multi-dimensional domain of *Z* are likely to be sparsely populated, and this situation generally requires modeling assumptions for stable estimation from the observed data. Even the propensity score methodology, which transforms multi-dimensional stratification to a single dimension, does not entirely avoid this issue because modeling assumptions are still generally required to estimate the propensity scores (probabilities of treatment assignment conditional on *Z*). 14

### Covariate Structure in Analysis of Observational Data

Given the observed joint data set of exposure *X*, outcome *Y*, and covariates *Z*, we still need additional knowledge concerning the structured relations among these variables to adjust properly for possible confounding of the causal effect of *X* on *Y*. Following path analysis and structural equations traditions, we encode these structured relations in the form of graphs, which are diagrammatic representations of recursive functional relations between the variables. 13,17,21 As an example, consider the diagram of the causal paths among the five variables in Figure 1. The nodes in the graph represent the variables *X* _{1},..., *X* _{5}, with arrows representing their systematic functional dependence on preceding variables. Each variable is also assumed to be affected by a random perturbation (ε_{1},..., ε_{5}), not shown, which can be thought of as the effects of unmeasured variables *U*. In general, the perturbations are not statistically independent (for example, the consequence of *X* _{i} and *X* _{j} being affected by a common unmeasured *U* _{k}), but initially we assume they are independent. With this assumption, the complete functional description of the system of variables is given by *X* _{i} =*f* _{i}(*A* _{i}, ε_{i}), where *A* _{i} are the “parents” of *X* _{i} (variables which emit arrows terminating at *X* _{i}). It follows that the joint probability distribution of the variables is decomposable into a product of conditional probabilities:

$$P(x_1, \ldots, x_n) = \prod_{i} P(x_i \mid a_i) \tag{3}$$

For example, in Figure 1, *A* _{4} = (*X* _{2}, *X* _{3}), and its realization *a* _{4} = (*x* _{2}, *x* _{3}). The joint probability of the observation that *X* _{1} =*x* _{1},..., *X* _{5} =*x* _{5} for the graph in Figure 1 may therefore be expressed as:

$$P(x_1, x_2, x_3, x_4, x_5) = P(x_1 \mid a_1)\,P(x_2 \mid a_2)\,P(x_3 \mid a_3)\,P(x_4 \mid x_2, x_3)\,P(x_5 \mid a_5) \tag{4}$$

where the remaining parent sets *a* _{1}, *a* _{2}, *a* _{3}, and *a* _{5} are read from the arrows in Figure 1.

The graph in Figure 1 does not uniquely determine *P* (*x* _{1}, *x* _{2}, *x* _{3}, *x* _{4}, *x* _{5}) inasmuch as the ε_{i} and the functions *f* _{i} are not specified. Equation 4, however, does impose constraints, and the graph represents all possible joint distributions that can be factored in the form of Eq 4. An intervention of the form *SET* [*X* _{i} =*x* _{i}], in which *X* _{i} is forced by an external mechanism to assume the value *x* _{i}, overrules the natural processes determined by the functional relation *f* _{i} between the variables, and therefore acts to remove the equation *X* _{i} =*f* _{i}(*A* _{i}, ε_{i}) from the model, replacing it simply with the value *x* _{i}. The product shown in Eq 3 is thereby altered by this intervention such that the factor *P* (*x* _{i}|*a* _{i}) is cancelled out. That is, the joint probability *P* (*x* _{1},..., *x* _{n}) shown in Eq 3 is modified by an intervention to appear as: 13,17 ^{pp.72–73},22

$$P(x_1, \ldots, x_n \mid \mathrm{SET}[X_i = x_i^*]) = \begin{cases} \dfrac{P(x_1, \ldots, x_n)}{P(x_i^* \mid a_i)} & \text{if } x_i = x_i^* \\[1ex] 0 & \text{if } x_i \neq x_i^* \end{cases} \tag{5}$$

which can be rewritten, to focus only on the variables that are not “SET,” as:

$$P(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n \mid \mathrm{SET}[X_i = x_i^*]) = \prod_{j \neq i} P(x_j \mid a_j) \tag{6}$$

with *x* _{i} fixed at *x* _{i}* wherever *X* _{i} appears among the parents *a* _{j}.

The zero in the lower portion of the bracket in Eq 5 merely specifies that when *X* _{i} is forced to take the value *x* _{i}*, then *P* (*x* _{1},..., *x* _{i},..., *x* _{n}|*SET* [*X* _{i} =*x* _{i}*]) = 0 for any *other* specific value *x* _{i}. The upper expression in Eq 5 corresponds to the common-sense ramification of an intervention on *X* _{i}. Whereas there was previously a probability distribution for *X* _{i} dependent on its parents, *A* _{i}, it is now certain that *X* _{i} will take the value *x* _{i}*, irrespective of what values were realized by the *A* _{i} or the random perturbation ε_{i}. This probability is therefore divided out by placing it into the denominator. For example, in the causal structure depicted in Figure 1, suppose an intervention on *X* _{2} forces the variable to the value *x* _{2}. In light of this intervention, the new expression for the joint probability of the remaining variables can be derived from Eq 6:

$$P(x_1, x_3, x_4, x_5 \mid \mathrm{SET}[X_2 = x_2]) = P(x_1 \mid a_1)\,P(x_3 \mid a_3)\,P(x_4 \mid x_2, x_3)\,P(x_5 \mid a_5) \tag{7}$$

where each *a* _{j} denotes the realized parents of *X* _{j} in Figure 1.
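The effect of removing one factor from the product can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it uses a hypothetical three-variable binary model (A → B, A → C, B → C) with made-up conditional probabilities, not the five-variable graph of Figure 1, and the helper names are introduced here for the sketch.

```python
# Hypothetical binary model: A -> B, A -> C, B -> C (not the graph in Figure 1).
parents = {"A": (), "B": ("A",), "C": ("A", "B")}

# cpt[var][parent values] = P(var = 1 | parents); all numbers are made up.
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.7},
    "C": {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.8},
}


def factor(var, assignment):
    """P(var = assignment[var] | parents of var), i.e. one term of the product."""
    p1 = cpt[var][tuple(assignment[p] for p in parents[var])]
    return p1 if assignment[var] == 1 else 1.0 - p1


def joint(assignment):
    """Observational joint probability: product over all variables."""
    result = 1.0
    for var in parents:
        result *= factor(var, assignment)
    return result


def joint_under_set(assignment, set_var):
    """Truncated product: the factor for the intervened variable is dropped.

    assignment[set_var] must hold the value forced by the intervention.
    """
    result = 1.0
    for var in parents:
        if var != set_var:
            result *= factor(var, assignment)
    return result


def p_c1_given_set_b(b):
    """P(C = 1 | SET[B = b]), summing the truncated product over A."""
    return sum(joint_under_set({"A": a, "B": b, "C": 1}, "B") for a in (0, 1))


def p_c1_given_b(b):
    """P(C = 1 | B = b) under passive observation."""
    num = sum(joint({"A": a, "B": b, "C": 1}) for a in (0, 1))
    den = sum(joint({"A": a, "B": b, "C": c}) for a in (0, 1) for c in (0, 1))
    return num / den


print("P(C=1 | SET[B=1]) =", round(p_c1_given_set_b(1), 4))
print("P(C=1 | B=1)      =", round(p_c1_given_b(1), 4))
```

Because A affects both B and C in this toy model, the observed conditional probability and the interventional probability differ, which is the divergence the text defines as confounding.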

The relation between intervention and the joint probability of structurally related variables defined in Eq 6 was identified as Robins’ “g-computation algorithm”, 22,23 and rediscovered by Pearl in the context of nonparametric structural equations models. 13,17

Graphs on which the solution rests are formed from substantive knowledge about the study variables, and cannot generally be derived from observational data alone. Additional knowledge may change the graph structure, and therefore the appropriate solution for a desired causal effect estimate. This fluidity reinforces the notion that prior knowledge is essential in determining the range of solutions for causal estimates.

### A 3-Variable Example in Social Epidemiology

The concepts described above will be illustrated with an example. For heuristic purposes, we assume a universe of only 3 measured quantities:*E* = education, *I* = income and *D* = mortality (death). Additionally, there may be various unmeasured covariates that will be dealt with subsequently. We focus on the causal effect of education (*E*) on mortality (*D*), and impose certain restrictions on the possible structure of *E*, *I*, and *D* : (1) *D* cannot emit directed arcs terminating on *E* or *I*, in conformance with our belief that education and income act causally before death, and (2) The causal diagram is acyclic; that is, it cannot contain any closed loops. Even under these restrictions, *E* and *D* can be connected in a variety of arrangements, either by direct effects (*E* →*D*), indirect effects (*E* →*I* and *I* →*D*), or associations derived from confounding (for example, *U* _{k} →*E* and *U* _{k} →*D*). The causal effect of education on mortality is expressed as *P* (*d* |*SET* [*E* =*e*]), the set of distributions of *D* when *E* is fixed at various levels *e* via intervention. This expression contains all the information needed to evaluate contrasts between any two particular values of *E*, say, *e* _{1} and *e* _{2}. This contrast measures the effect of *E* on *D* that we wish to identify on the basis of observational data, recognizing that this effect can be direct, indirect through *I*, and/or can be confounded by *I* or by unmeasured variables.

Given restrictions (1) and (2), there are seven potential structural arrangements of interest, shown graphically as directed acyclic graphs in Figure 2. Two additional permissible configurations were omitted because they entail no causal effect of *E* on *D*. Note that 6 of the 7 graphs in Figure 2 involve a direct effect, *E* →*D*. Graph [6] involves a strictly indirect effect, relayed through *I*, and graph [1] contains both direct and indirect effects. In graph [2], there is confounding by measured variable *I* as well as a direct effect of *E*. Although this set of configurations exhausts the permitted logical possibilities, some of these graphs may be preferred on substantive grounds. For example, graph [2] seems undesirable for most applications, given the substantive knowledge that education is often fixed relatively early in adult life, whereas earned income is causally subsequent to educational level throughout most of adulthood. Clearly there is some extent to which individual income influences educational attainment, but if we wish to consider these exposures as simple (as opposed to longitudinal) quantities, we may wish to consider this causal effect (*I* →*E*) to be negligible. The unmeasured variable “parental income,” for example, might be a more salient causal determinant of an individual’s educational achievement.

We now consider unmeasured confounding variables introduced into the 7 possible structural arrangements depicted in Figure 2. There are 8 possible configurations of confounding effects by unmeasured variables, representing all combinations of the presence or absence of a common unmeasured parent (or ancestor) for each of the three possible pairings of the measured variables (Figure 3). To avoid clutter, unmeasured variable nodes are omitted, while their directed arcs terminating on measured variables are depicted by dashed double-headed arrows. In graph [H], there is no confounding by unmeasured variables, a situation that may be interpreted as indicating that all of the random perturbations (ε) affecting measured variables are statistically independent.

These 7 causal structures and the 8 potentially confounding unmeasured variable configurations can occur in any combination, leading to 56 possible graphical representations of the relation between the 3 variables in which there is some effect of *E* on *D*. In 35 out of the 56 structures (62%), the causal effect of *E* on *D*, *P* (*d* |*SET* [*E* =*e*]), is not identifiable; because of confounding by unmeasured variables, there exists no unique function of the observed variables that is equivalent to the effect of physically intervening to force *E* to hold the value *e*. As expected, nonidentifiable structures tend to occur when there are more confounded relations present. Table 1 shows which structures are identifiable, with nonidentifiable structures indicated by an “×.”
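The counts quoted here (7 structures, 8 confounding patterns, 56 combinations) can be checked by brute-force enumeration. The sketch below assumes one particular encoding of the restrictions: *D* emits no arcs, *E* and *I* are joined by at most one directed arc, and a structure is kept only if a directed path leads from *E* to *D*. This encoding is an assumption made for illustration; the code does not assess identifiability, which requires a separate graphical analysis.

```python
from itertools import product

# Assumed encoding: the E-I pair has no arc or one arc in either direction;
# I->D and E->D are each present or absent; D never emits an arc.
EI_OPTIONS = ("none", "E->I", "I->E")


def has_effect_of_e_on_d(ei, i_to_d, e_to_d):
    """True if there is a directed path from E to D (direct, or through I)."""
    return e_to_d or (ei == "E->I" and i_to_d)


structures = [
    (ei, i_to_d, e_to_d)
    for ei, i_to_d, e_to_d in product(EI_OPTIONS, (False, True), (False, True))
    if has_effect_of_e_on_d(ei, i_to_d, e_to_d)
]

# Presence or absence of an unmeasured common cause for each pair:
# (E, I), (E, D), (I, D).
confounding_patterns = list(product((False, True), repeat=3))

combined = [(s, c) for s in structures for c in confounding_patterns]

print("structures with an effect of E on D:", len(structures))
print("of which include a direct E->D arc: ", sum(1 for s in structures if s[2]))
print("unmeasured-confounding patterns:    ", len(confounding_patterns))
print("combined graphs:                    ", len(combined))
```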

When there is no confounding by unmeasured variables, (Figure 3, graph [H]), all causal structures yield identifiable effects for *E*. In the presence of 3 confounding arcs (Figure 3, graph [A]), identification of a causal effect is not achieved under any configuration of the variables. The presence of 2 confounding arcs (Figure 3, graphs [B], [C], [D]) allows for identification of a causal effect in only one structure from Figure 2 (graph [7]). Even with only a single confounding arc (Figure 3, graphs [E], [F], [G]), the causal effect of *E* is nonidentifiable when the unmeasured-variable confounding occurs between *E* and *D* with a direct effect *E* →*D*, and when unmeasured-variable confounding occurs between *E* and *I* with an indirect effect through *I*.

To demonstrate the importance of causal structure on estimates of effect, we select two causal structures from among the 21 graphs with identifiable effects of *E* on *D* (Figure 4). Scenario 1 (Figure 2, graph [1] and Figure 3, graph [G]) is a structure involving both direct and indirect (through *I*) effects of *E* on *D*, and an unmeasured variable *U* affecting both *I* and *D*. There is some reasonableness to this structure because education is generally understood to have both material and behavioral consequences. 24 The unmeasured variable *U* may be taken to represent an aspect of chronic health status, which affects adult income through physical disability and predisposes to mortality. This structure is implausible in assuming no additional confounding, but we limit our choice of examples to those with identifiable causal effects and seek simplicity over realism for purposes of illustration.

In the case of Scenario 1, how can we express the causal effect *P* (*d* |*SET* [*E* =*e*]) as a function of the observed variables? The answer is simply that there is no confounding, and therefore observing *E* =*e* is indistinguishable from forcing *E* =*e*. The graph for Scenario 1 in Figure 4 confirms that factors ε*E* determining how *E* comes to take its value are unrelated to outcome *D* except indirectly through their effect on *E*. Since *P* (*d* |*E* =*e*) =*P* (*d* |*SET* [*E* =*e*]), the confounding definition in (1) is not met, and no adjustment is warranted. The distribution of *D* when we observe *E* =*e* is the same as it would be if we had imposed this value through an intervention, and the unbiased total causal effect may therefore be estimated directly from the observed data.

Scenario 2 (Figure 2, graph [6], and Figure 3, graph [E]) involves unmeasured-variable confounding between *E* and *D*, but the effect of *E* is rendered identifiable by the measurement of intermediate *I*. This structure corresponds to something like the Barker Hypothesis described above if, for example, *U* represents “parental income.” The structure specifies that parental income affects educational level directly, and also influences probability of mortality directly (“fetal programming”), 6 but is independent of income, except for the indirect effect of income relayed through education. To assume identifiability of the causal effect of *E* we must believe that this independence between *I* and *U* is at least approximately true.

How does one express *P* (*d* |*SET* [*E* =*e*]) as a function of the observed variables in Scenario 2? Following Eq 6, one may state the causal effect of interest in terms of the structured relations represented, summing over observed values of *I* for a marginal effect estimate:

$$P(d \mid \mathrm{SET}[E = e]) = \sum_{i} P(d \mid i, \mathrm{SET}[E = e])\,P(i \mid \mathrm{SET}[E = e]) \tag{8}$$

One may then make replacements to eliminate “SET” statements from the right-hand side of the expression. For example, what is *P* (*i* |*SET* [*E* =*e*])? There is no arrow leading into *I* except that from *E*. Therefore, whether *E* should take its value through the natural mechanism or from the intervention *SET* [*E* =*e*] is irrelevant; because the relation between *I* and *E* is unconfounded, one can replace *P* (*i* |*SET* [*E* =*e*]) with *P* (*i* |*e*). Further substitutions follow from additional conditional independencies encoded in the graph. Conditioned on a given value of *E*, the relation between *I* and *D* is unconfounded; given *E* =*e*, *D* takes its value as a function of *I*, regardless of how *I* came to hold that value. Accordingly, *P* (*d* |*i*, *SET* [*E* =*e*]) is equivalent to *P* (*d* |*SET* [*I* =*i*], *SET* [*E* =*e*]), which may in turn be replaced by the equivalent expression *P* (*d* |*SET* [*I* =*i*]) because *E* affects *D* only through *I*. Finally, the substitution for the “SET” statement in *P* (*d* |*SET* [*I* =*i*]) follows from Eq 2, since this is the standard adjustment for a covariate (*E*) that is a determinant of both the exposure of interest (*I*) and the outcome (*D*). This set of substitutions leads to an equivalent expression for the causal effect of *E* on *D* on the right-hand side that is a function only of observed variables:

$$P(d \mid \mathrm{SET}[E = e]) = \sum_{i} P(i \mid e) \sum_{e'} P(d \mid i, e')\,P(e') \tag{9}$$

where *e’* is a dummy index for summing over the observed levels of *E*. 17 ^{pp.81–83}
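The adjustment for Scenario 2 can be checked on a small simulated structure. The sketch below assumes a hypothetical binary model in which an unmeasured *U* affects *E* and *D*, *E* affects *I*, and *I* affects *D*; every probability is invented for illustration. It computes the exact observed joint distribution of (*E*, *I*, *D*), applies the two-stage summation described in the text, and compares the result with the interventional probability known from the model and with the standard adjustment of Eq 2 using *I* as the covariate.

```python
from itertools import product

# Hypothetical model parameters (all made up): U -> E, U -> D, E -> I, I -> D.
p_u = {0: 0.6, 1: 0.4}                       # P(U = u)
p_e1_given_u = {0: 0.3, 1: 0.8}              # P(E = 1 | U = u)
p_i1_given_e = {0: 0.2, 1: 0.7}              # P(I = 1 | E = e)
p_d1_given_iu = {(0, 0): 0.30, (0, 1): 0.50,
                 (1, 0): 0.10, (1, 1): 0.30}  # P(D = 1 | I = i, U = u)


def bern(p1, value):
    """Probability that a binary variable with P(1) = p1 equals `value`."""
    return p1 if value == 1 else 1.0 - p1


# Observed joint P(e, i, d): U is summed out because it is unmeasured.
joint = {
    (e, i, d): sum(
        p_u[u] * bern(p_e1_given_u[u], e) * bern(p_i1_given_e[e], i)
        * bern(p_d1_given_iu[(i, u)], d)
        for u in (0, 1)
    )
    for e, i, d in product((0, 1), repeat=3)
}


def prob(e=None, i=None, d=None):
    """Marginal probability of the specified values in the observed joint."""
    return sum(p for (ee, ii, dd), p in joint.items()
               if (e is None or ee == e) and (i is None or ii == i)
               and (d is None or dd == d))


def scenario2_adjustment(e, d=1):
    """Sum_i P(i | e) * Sum_e' P(d | i, e') * P(e'), from observed data only."""
    total = 0.0
    for i in (0, 1):
        inner = sum(prob(e=e2, i=i, d=d) / prob(e=e2, i=i) * prob(e=e2)
                    for e2 in (0, 1))
        total += prob(e=e, i=i) / prob(e=e) * inner
    return total


def standard_adjustment(e, d=1):
    """Eq 2 with I treated as the covariate: Sum_i P(d | e, i) * P(i)."""
    return sum(prob(e=e, i=i, d=d) / prob(e=e, i=i) * prob(i=i) for i in (0, 1))


def truth(e, d=1):
    """Interventional P(d | SET[E = e]) computed from the full model, using U."""
    return sum(
        bern(p_i1_given_e[e], i)
        * sum(bern(p_d1_given_iu[(i, u)], d) * p_u[u] for u in (0, 1))
        for i in (0, 1)
    )


for e in (0, 1):
    print(e, round(truth(e), 4), round(scenario2_adjustment(e), 4),
          round(standard_adjustment(e), 4))
```

In this toy model the structured adjustment reproduces the interventional probabilities, whereas the standard adjustment does not, because it conditions on the intermediate *I* and leaves the path through *U* open. The size and direction of the discrepancy depend entirely on the invented parameters.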

We now know what adjustments are appropriate to answer causal questions about the effect of education on mortality for the selected scenarios. An unbiased estimate of the causal effect *P* (*d* |*SET* [*E* =*e*]) is obtained by *P* (*d* |*E* =*e*) if Scenario 1 is true, and by Eq 9 if Scenario 2 is true. How far off would we be if we applied the wrong adjustment formula, or if we assumed a causal structure that was incorrect?

### Example from the National Longitudinal Mortality Study (NLMS)

The data for this example are from the Public Use File (Release 2) of the National Longitudinal Mortality Study (NLMS), a prospective study of cause-specific mortality among noninstitutionalized U.S. residents, with baseline data on socioeconomic factors including education and family income. 25,26 The file was formed by searching for deaths among participants in five Current Population Surveys conducted by the U.S. Census Bureau in 1979–1981, and contains data on 637,162 persons followed until death or for 9 years (3288 days), with 42,919 subjects matched for mortal events to the National Death Index. 27

Analyses were conducted on participants 18 years of age and older (N = 450,483), among whom there were 42,190 deaths (9.4%). Although weights were calculated for the NLMS to allow inference to U.S. population in 1980, no sample weighting is used in these analyses. For brevity, only results for men are described below. Variables considered in this example are all-cause mortality, age (18–98 years), reported baseline educational achievement, and reported baseline family income. Eight education categories in the NLMS were reduced in the present analysis to 3 levels: <12 years (29.1%), 12 years (38.9%) and >12 years (32.0%), with a total of 3276 (0.7%) missing values. Reported family income (in 1980 dollars) was categorized in the NLMS into seven categories, collapsed here into 3 levels: <$10,000 (26.5%), $10,000–19,999 (31.5%) and ≥$20,000 (42.0%), with a total of 26,995 (6.0%) missing income values. The three levels of education and income are labeled below as “Low” (L), “Medium” (M) and “High” (H).

A cumulative risk approach was used to calculate probability of death *P* (*D* =*dead*) during the follow-up interval in each possible 10-year age stratum with a “moving window” approach. Briefly, an initial window was constructed for ages 18–27, and the joint sex-, factor- and age-specific risks of death calculated for these strata simply as the proportions of deaths during follow-up (that is, cumulative 9-year mortality risks). The window was then shifted right by one year to ages 19–28, and a set of similar calculations made. This process was repeated until the upper boundary of the age window reached the maximum age. At each point, expressed as the mid-point of the age window, we calculated risk differences and adjusted risk differences as estimates of causal effects. Although a number of weighting schemes have been suggested to improve the point estimates in moving kernel approaches, 28 we used the unweighted proportion (a rectangular kernel) because this estimate is the actual stratum-specific proportion, and therefore is identical to values obtained in histograms or stratified analyses. Analyses were conducted using Stata Statistical Software. 29
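The moving-window calculation can be sketched as follows. This is a minimal illustration in Python rather than the Stata code used for the analyses; the function name `moving_window_risks` and the record layout (baseline age, death indicator) are assumptions, and the caller is assumed to have already restricted the records to a single sex and factor stratum.

```python
def moving_window_risks(records, min_age=18, max_age=98, width=10):
    """Unweighted (rectangular-kernel) cumulative risk in sliding age windows.

    records: iterable of (baseline_age, died) pairs for ONE stratum,
             where died is True if the person died during follow-up.
    Returns a list of (window_midpoint, n_at_risk, n_deaths, risk) tuples;
    risk is None when the window contains no one.
    """
    records = list(records)
    results = []
    lower = min_age
    # Slide the window one year at a time until its upper bound hits max_age.
    while lower + width - 1 <= max_age:
        upper = lower + width - 1
        outcomes = [died for age, died in records if lower <= age <= upper]
        n = len(outcomes)
        deaths = sum(outcomes)
        risk = deaths / n if n else None
        results.append(((lower + upper) / 2, n, deaths, risk))
        lower += 1
    return results


# Tiny hypothetical stratum, for illustration only.
example = [(18, False), (20, True), (27, False), (28, True), (30, False)]
for midpoint, n, deaths, risk in moving_window_risks(example, max_age=30):
    print(midpoint, n, deaths, risk)
```

A risk difference at a given midpoint is then the difference between the risks returned for two strata (for example, low and high education) at that same midpoint.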

## Results

Figure 5 displays four quantities for adult men. Lightly shaded vertical bars indicate the analysis cohort size available at the mid-point of each 10-year age window, with values along the left vertical axis. Darker vertical bars indicate the number of deaths in each 10-year age window, also along the left axis. The lines portray estimated age-specific causal effects of *E*, given that Scenario 1 is true. The dotted line shows the value for *P* (*dead* |*SET* [*E* =*L*]) which, assuming Scenario 1, is estimated by *P* (*dead* |*E* =*L*). The solid line shows corresponding values for *P* (*dead* |*SET* [*E* =*H*]). Any contrast, such as a difference or ratio, between the lines at any point expresses the magnitude of the unbiased age-specific causal effect of education, assuming veracity of the causal model in Scenario 1 and validity of the measurements.

Figure 6 is analogous to Figure 5, except that calculations are made under the assumption that Scenario 2 is the true causal configuration. The dotted line shows values for *P* (*dead* |*SET* [*E* =*L*]), estimated by Eq 9 as

$$\sum_{i} P(i \mid E = L) \sum_{e'} P(\mathit{dead} \mid i, e')\,P(e')$$

The solid line shows corresponding estimates for *P* (*dead* |*SET* [*E* =*H*]).

Because causal effect contrasts are difficult to read directly from these graphs, we select a particular contrast, the risk difference between high and low education. Figure 7 shows this difference, adjusted for income. The studded line (—•—) shows age-specific effects of education estimated using the standard adjustment formula in Eq 2, as would be applied by investigators who ignore structural dimensions of the data or who assume a different causal structure for which standard adjustment is valid, such as Figure 2, graph [2] and Figure 3, graph [H]. The dashed line shows a valid adjustment under covariate structure assumptions that are encoded in Scenario 1, and the solid line shows corresponding age-specific effects of education estimated on the basis of Scenario 2. An example illustrating the computational steps involved is provided in Appendix 2.

Absolute magnitudes of effect differ in relative importance over the life-course. How consequential is a difference of 0.05 in the probability of death during follow-up? For baseline ages 18–27, with crude probability of death during follow-up equal to 0.012, this difference would be dramatic. For those 80–89 years old at baseline, however, with a 0.721 crude probability of death during follow-up, a change of 0.05 is not so impressive. Another way to represent the data, therefore, is to scale the effect contrast by the crude sex- and age-specific probability of death. This gives the magnitude of the education effect as a proportion of the overall probability of death at a given age (Figure 8). For example, if Scenario 1 were true, the mortality reduction achieved by intervening to move individuals from low to high educational status is roughly half the magnitude of the overall mortality for 55-year-olds, and the standard adjustment strategy would substantially underestimate this effect. Later, the education effect is overwhelmed by the high overall mortality that prevails in the final decades of life. Whereas the causal contrast depicted here is the difference in outcome probabilities between setting everyone to low education and setting everyone to high education, the age-specific crude mortality probability used to scale the effect is dependent on the actual mix of educational levels at each age, and therefore the scaled comparison is dependent on this mix. This is analogous to the “comparative mortality figure” obtained from direct standardization of death rates.
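As a rough worked illustration of why scaling matters, the same absolute difference of 0.05 can be divided by the two crude probabilities quoted above. This is arithmetic on the figures in the text only, not an additional result:

$$\frac{0.05}{0.012} \approx 4.2 \qquad \text{versus} \qquad \frac{0.05}{0.721} \approx 0.07$$

so the difference is about four times the crude risk at baseline ages 18–27, but only about 7% of the crude risk at ages 80–89.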

Finally, we would like to know how far off we might be from an unbiased estimate when applying the standard adjustment strategy in situations where it is not structurally appropriate. We define “structural bias” in the causal contrast estimate as the difference between a structurally appropriate adjustment and the standard adjustment. In Figure 9, we present the structural bias given that either Scenario 1 or Scenario 2 is the true causal structure.

The absolute magnitudes of structural bias differ in relative importance over the life-course. We therefore scale the bias by the sex- and age-specific probability of death. This graph gives the magnitude of bias as a proportion of the crude probability of death at a given age (Figure 10). In this example, use of standard adjustment strategies when Scenario 1 is true tends to underestimate effects, whereas it tends to overestimate effects when Scenario 2 is true. Therefore, the structurally naive estimate cannot generally be assumed to be “conservative” or “biased toward the null.” As expected, structural bias becomes relatively less consequential as absolute mortality probability rises with age. Instability in the effect and bias estimates at younger ages reflects a relatively small number of deaths (small denominator probabilities). Bounds on the estimates, such as point-wise confidence intervals, can be added if desired to quantify this variability (Appendix 3).
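The definition of structural bias can be illustrated for a single age window. The sketch below uses the effect estimates for 55- to 64-year-old men from Appendix 2; the sign convention (structured estimate minus standard estimate) and all names are assumptions made for illustration.

```python
def structural_bias(structured_effect, standard_effect):
    """Structurally appropriate estimate minus the standard-adjustment estimate.

    Sign convention (an assumption of this sketch): a positive value means
    the standard adjustment underestimates the effect.
    """
    return structured_effect - standard_effect

crude = 4709 / (20380 + 4709)   # crude probability of death, men aged 55-64
standard = 0.030                # standard adjustment estimate (Appendix 2)
structured = {"scenario_1": 0.084, "scenario_2": 0.048}

biases = {name: structural_bias(effect, standard)
          for name, effect in structured.items()}
scaled_biases = {name: bias / crude for name, bias in biases.items()}

for name in structured:
    print(f"{name}: bias = {biases[name]:.3f}, scaled = {scaled_biases[name]:.2f}")
```

In this one window the standard estimate falls below both structured estimates. The tendencies described above concern the pattern across the full age range, which a single window need not reproduce.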

## Discussion

By focusing on hypothetical interventions, we provide a meaningful causal definition for the types of quantities that are increasingly investigated by social epidemiologists. Although some authors have expressed discomfort with this conceptualization, no meaningful alternative framework has yet been proposed. 30 Furthermore, real interventions on social factors are the ultimate goals of policy initiatives, such as student loan programs to increase educational attainment and program payments to provide minimum income levels. The impacts of such policies on health outcomes are more clearly understood in terms of the counterfactual contrasts described above than in terms of the partial regression coefficients that are more commonly reported as measures of “independent effects.” In addition, the adjustment strategy is consequential in providing a valid estimate of the true counterfactual contrast, and requires the specification of a causal model. Traditional regression methods also imply structural assumptions, but these are generally not explicitly considered, and the simple example shown here suggests that reliance on a default structure can be hazardous.

One deficiency of this exercise is that the 3-variable models are absurdly oversimplified. It is implausible that there is no common unmeasured factor that affects both education and income, for example. Indeed, one reviewer recommended that we refer to the structures in Figure 4 as “toy” examples to emphasize their extreme simplicity. We stress that the example is intended primarily to be conceptual. Adding a large number of covariates to achieve a realistic causal structure would add considerable complexity to the problem, as the number of possible structural models to consider would become very large. Adding additional confounding arcs would make the models more plausible, but would also render many causal effects of interest nonidentifiable. One can begin to appreciate the opportunities for being led astray in the typical analysis of dozens of variables, even before considering additional uncertainties such as measurement error and mis-specification of statistical model form.

This discussion of causal effects and specification of appropriate adjustment techniques is relevant to many areas of epidemiologic research, not merely to the study of social factors. The analysis of social factors, however, generally involves variables that are presumed to be arranged in causal chains or to have complex interrelations and interdependencies, and this potentially complex covariate structure provides greater opportunity for mis-specification of causal effects. Although mis-specification can occur in many epidemiologic settings, the potential for incorrect adjustment in social epidemiology, where exposures are “distal” and involve specified intermediate pathways, may be particularly profound.

Social factors have long been a subject of epidemiologic inquiry, and the current revitalization of this research program has witnessed great strides in conceptualization and measurement of new exposures, 31 application of novel multivariate analytic techniques, 32 and further development of underlying social theory. 33 Nevertheless, the strategy for conceptualizing, estimating and interpreting measures of causal effect for social factors has remained largely stagnant, relying on statistical traditions that were designed to describe data rather than elucidate causal processes. 17 pp.133–171, 34 The public health promise of epidemiologic research is the scientific understanding of etiologic processes to inform appropriate population-level interventions. 35 Social factor data represent a particular challenge, because of their inherent complexity and tendency to manifest in distinct ways in different contexts. 36 Nonetheless, the clear centrality of social factors such as education and income to human health dictates that we strive to improve upon existing methodology. In particular, an appreciation of how background knowledge of covariate structure influences the choice of valid covariate adjustment strategy, the focus of the present discussion, is an important step in allowing social epidemiologic research to have a real and positive impact on public health and social policy.

## Acknowledgments

Sander Greenland, Irva Hertz-Picciotto, Charles Poole and James Robins provided generous and constructive critique.

## Appendix 1

“Probabilities” as used in this paper, denoted with the *P* () operator, are to be understood in the context of a hypothetical infinite superpopulation which constitutes the target population and from which the observed population is considered to be a random sample. Our interest is in estimating causal effect parameters in this superpopulation. Furthermore, in our methodological development we ignore sampling errors in the observational data by assuming a sufficiently large sample size. In Appendix 3, we compute point-wise confidence intervals for the causal effect estimates in the analysis example in order to provide some indication of the variability introduced by sampling.

## Appendix 2

In order to demystify these calculations, we work an example in this appendix for a single age window, 55–64 years. Table A2.1 shows the number of 55- to 64-year-old male participants who were classified under each outcome status (alive or dead) at the end of the follow-up period.

Step 1: P(e)

- P(E = L) = (7660 + 2259)/(20380 + 4709) = 0.395
- P(E = M) = (6950 + 1480)/(20380 + 4709) = 0.336
- P(E = H) = (5770 + 970)/(20380 + 4709) = 0.269

Step 2: P(i)

- P(I = L) = (3583 + 1462)/(20380 + 4709) = 0.201
- P(I = M) = (6335 + 1574)/(20380 + 4709) = 0.315
- P(I = H) = (10462 + 1673)/(20380 + 4709) = 0.484

Step 3: P(i | e)

- P(I = L | E = L) = (2350 + 1024)/(7660 + 2259) = 0.340
- P(I = L | E = M) = (918 + 322)/(6950 + 1480) = 0.147
- P(I = L | E = H) = (315 + 116)/(5770 + 970) = 0.064
- P(I = M | E = L) = (2931 + 790)/(7660 + 2259) = 0.375
- P(I = M | E = M) = (2366 + 524)/(6950 + 1480) = 0.343
- P(I = M | E = H) = (1038 + 260)/(5770 + 970) = 0.193
- P(I = H | E = L) = (2379 + 445)/(7660 + 2259) = 0.285
- P(I = H | E = M) = (3666 + 634)/(6950 + 1480) = 0.510
- P(I = H | E = H) = (4417 + 594)/(5770 + 970) = 0.743

Step 4: Structured Adjustment for Scenario 1: P(d | e)

- P(dead | E = L) = 2259/(7660 + 2259) = 0.228
- P(dead | E = M) = 1480/(6950 + 1480) = 0.176
- P(dead | E = H) = 970/(5770 + 970) = 0.144

Step 5: Standard Adjustment for Effect of E given I

- P(dead | SET[E = L]) = (1024/(2350 + 1024)) · (0.201) + (790/(2931 + 790)) · (0.315) + (445/(2379 + 445)) · (0.484) = 0.204
- P(dead | SET[E = M]) = (322/(918 + 322)) · (0.201) + (524/(2366 + 524)) · (0.315) + (634/(3666 + 634)) · (0.484) = 0.181
- P(dead | SET[E = H]) = (116/(315 + 116)) · (0.201) + (260/(1038 + 260)) · (0.315) + (594/(4417 + 594)) · (0.484) = 0.175

Step 6: Standard Adjustment for Effect of I given E

- P(dead | SET[I = L]) = (1024/(2350 + 1024)) · (0.395) + (322/(918 + 322)) · (0.336) + (116/(315 + 116)) · (0.269) = 0.280
- P(dead | SET[I = M]) = (790/(2931 + 790)) · (0.395) + (524/(2366 + 524)) · (0.336) + (260/(1038 + 260)) · (0.269) = 0.199
- P(dead | SET[I = H]) = (445/(2379 + 445)) · (0.395) + (634/(3666 + 634)) · (0.336) + (594/(4417 + 594)) · (0.269) = 0.144

Step 7: Structured Adjustment for Scenario 2. Each term is the product of P(i | e) from Step 3 and P(dead | SET[I = i]) from Step 6, for I = L, I = M, and I = H in turn.

- P(dead | SET[E = L]) = (0.340) · (0.280) + (0.375) · (0.199) + (0.285) · (0.144) = 0.211
- P(dead | SET[E = M]) = (0.147) · (0.280) + (0.343) · (0.199) + (0.510) · (0.144) = 0.183
- P(dead | SET[E = H]) = (0.064) · (0.280) + (0.193) · (0.199) + (0.743) · (0.144) = 0.163

Causal effects are contrasts between the outcome probabilities for each level of the hypothetical intervention, so that, for example, the effect of education high versus low, as depicted in Figure 7, is (0.204 − 0.175) = 0.030 for the standard adjustment method (0.030 rather than 0.029 because the difference is taken before rounding), (0.228 − 0.144) = 0.084 if Scenario 1 is true, and (0.211 − 0.163) = 0.048 if Scenario 2 is true. The scaled effect measures, such as those in Figure 8, are divided by the crude age-specific probability of mortality, which in this age window is 4709/(20380 + 4709) = 0.188.
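The seven steps can be reproduced with a short script. The sketch below is illustrative only: the cell counts are transcribed from the numerators and denominators in the steps above, and the variable names are assumptions rather than the authors' own notation.

```python
LEVELS = ("L", "M", "H")

# (alive, dead) counts keyed by (education, income), transcribed from the
# numerators and denominators in Steps 1-7 (men aged 55-64).
COUNTS = {
    ("L", "L"): (2350, 1024), ("L", "M"): (2931, 790), ("L", "H"): (2379, 445),
    ("M", "L"): (918, 322), ("M", "M"): (2366, 524), ("M", "H"): (3666, 634),
    ("H", "L"): (315, 116), ("H", "M"): (1038, 260), ("H", "H"): (4417, 594),
}

n_ei = {cell: alive + dead for cell, (alive, dead) in COUNTS.items()}
dead_ei = {cell: dead for cell, (alive, dead) in COUNTS.items()}
n_e = {e: sum(n_ei[(e, i)] for i in LEVELS) for e in LEVELS}
n_i = {i: sum(n_ei[(e, i)] for e in LEVELS) for i in LEVELS}
total = sum(n_ei.values())

p_e = {e: n_e[e] / total for e in LEVELS}                          # Step 1
p_i = {i: n_i[i] / total for i in LEVELS}                          # Step 2
p_i_given_e = {(i, e): n_ei[(e, i)] / n_e[e]
               for e in LEVELS for i in LEVELS}                    # Step 3
p_dead = {cell: dead_ei[cell] / n_ei[cell] for cell in COUNTS}     # P(dead | e, i)

# Step 4: Scenario 1, crude P(dead | e).
scenario1 = {e: sum(dead_ei[(e, i)] for i in LEVELS) / n_e[e] for e in LEVELS}
# Step 5: standard adjustment, sum over i of P(dead | e, i) P(i).
standard = {e: sum(p_dead[(e, i)] * p_i[i] for i in LEVELS) for e in LEVELS}
# Step 6: effect of income, sum over e of P(dead | e, i) P(e).
set_income = {i: sum(p_dead[(e, i)] * p_e[e] for e in LEVELS) for i in LEVELS}
# Step 7: Scenario 2, sum over i of P(i | e) P(dead | SET[I = i]).
scenario2 = {e: sum(p_i_given_e[(i, e)] * set_income[i] for i in LEVELS)
             for e in LEVELS}

crude = sum(dead_ei.values()) / total
for name, risks in (("standard", standard), ("scenario 1", scenario1),
                    ("scenario 2", scenario2)):
    effect = risks["L"] - risks["H"]
    print(f"{name}: effect = {effect:.3f}, scaled = {effect / crude:.2f}")
```

Because the script carries unrounded probabilities through every step, intermediate values can differ from the hand calculation in the last digit, but the three contrasts agree with 0.030, 0.084, and 0.048 to three decimals.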

## Appendix 3

We have avoided variance calculations in this article, but point-wise confidence intervals are readily computed for these effect estimates.

#### Scenario 1

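We have focused on the contrast between setting everyone to *E* = *L* versus setting everyone to *E* = *H*. In Scenario 1, this is just the difference between two binomial proportions (the crude conditional probabilities), each with variance *p*(1 − *p*)/*n*. Using the numbers from Appendix 2:

$$\hat{p}_{E=L} = \frac{2259}{9919} = 0.228, \qquad \hat{p}_{E=H} = \frac{970}{6740} = 0.144,$$

$$\mathrm{Var}(E = L) = \frac{\hat{p}_{E=L}\,(1 - \hat{p}_{E=L})}{9919}, \qquad \mathrm{Var}(E = H) = \frac{\hat{p}_{E=H}\,(1 - \hat{p}_{E=H})}{6740}.$$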

We are interested in the variance of this difference contrast, (0.228 − 0.144) = 0.084:

$$\mathrm{Var}(E = L \text{ vs } E = H) = \mathrm{Var}(E = L) + \mathrm{Var}(E = H) = 0.00001745 + 0.000018288 = 0.000035738,$$

or a standard error of 0.00598. So, the large-sample 95% confidence interval for this difference would be:

$$0.084 \pm 1.96 \times 0.00598 \approx (0.072,\ 0.096).$$
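The Scenario 1 interval is a standard large-sample interval for a difference of two independent binomial proportions, and can be sketched as follows. The function name is hypothetical, and the counts are those for *E* = *L* and *E* = *H* in Appendix 2.

```python
import math

def risk_difference_ci(dead_1, n_1, dead_0, n_0, z=1.96):
    """Large-sample CI for the difference of two independent binomial proportions."""
    p1, p0 = dead_1 / n_1, dead_0 / n_0
    variance = p1 * (1 - p1) / n_1 + p0 * (1 - p0) / n_0
    se = math.sqrt(variance)
    diff = p1 - p0
    return diff, se, (diff - z * se, diff + z * se)

# Deaths and totals for low and high education, men aged 55-64 (Appendix 2).
diff, se, (lower, upper) = risk_difference_ci(2259, 7660 + 2259, 970, 5770 + 970)
print(f"difference = {diff:.3f}, SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Working from unrounded proportions, this sketch gives a standard error of roughly 0.006, close to the 0.00598 reported above; small differences in the last digits are to be expected.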

#### Scenario 2

The contrast of interest is:

$$P(\text{dead} \mid \text{SET}[E = L]) - P(\text{dead} \mid \text{SET}[E = H]) = \sum_{i} \left[ P(i \mid E = L) - P(i \mid E = H) \right] \sum_{e} P(\text{dead} \mid e, i)\, P(e),$$

which expands to an expression in 15 statistically independent variables (various sample conditional and unconditional probability estimators). We use the “delta method” to obtain an estimate of the variance of this contrast by taking the first partial derivatives of the difference with respect to these 15 independent variables, and using the first-order term of the Taylor series expansion of the difference about the expected value “point” of the 15 independent variables, $P_0 = [E(X_1), \ldots, E(X_{15})]$.

Using the numbers in the example from Appendix 2, we are interested in the variance of the difference (0.211 − 0.163) = 0.048. The variance for this contrast, calculated using the method described above, equals 0.0000085849, or a standard error of 0.00293. The large-sample 95% confidence interval for this difference is therefore:

$$0.048 \pm 1.96 \times 0.00293 \approx (0.042,\ 0.054).$$
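One way to carry out a delta-method calculation of this kind is sketched below. It is not the authors' code: it assumes multinomial sampling within each block of estimated probabilities (P(e), P(i | e), and P(dead | e, i)), treats the blocks as uncorrelated, and uses analytic first derivatives of the Scenario 2 contrast. All names are illustrative. Under these assumptions the result should be close to, though not necessarily identical with, the standard error reported above.

```python
import math

LEVELS = ("L", "M", "H")
# (alive, dead) counts by (education, income), men aged 55-64 (Appendix 2).
COUNTS = {
    ("L", "L"): (2350, 1024), ("L", "M"): (2931, 790), ("L", "H"): (2379, 445),
    ("M", "L"): (918, 322), ("M", "M"): (2366, 524), ("M", "H"): (3666, 634),
    ("H", "L"): (315, 116), ("H", "M"): (1038, 260), ("H", "H"): (4417, 594),
}

def multinomial_term(probs, grads, n):
    """Delta-method contribution g' Cov g for one multinomial block of size n."""
    mean = sum(p * g for p, g in zip(probs, grads))
    return (sum(p * g * g for p, g in zip(probs, grads)) - mean ** 2) / n

n_ei = {k: a + d for k, (a, d) in COUNTS.items()}
n_e = {e: sum(n_ei[(e, i)] for i in LEVELS) for e in LEVELS}
N = sum(n_e.values())

p_e = {e: n_e[e] / N for e in LEVELS}
p_i_e = {(i, e): n_ei[(e, i)] / n_e[e] for e in LEVELS for i in LEVELS}
p_d = {k: COUNTS[k][1] / n_ei[k] for k in COUNTS}

# Pieces of the contrast: sum_i [P(i|L) - P(i|H)] * sum_e P(dead|e,i) P(e).
m = {i: sum(p_d[(e, i)] * p_e[e] for e in LEVELS) for i in LEVELS}
w = {i: p_i_e[(i, "L")] - p_i_e[(i, "H")] for i in LEVELS}
contrast = sum(w[i] * m[i] for i in LEVELS)

variance = 0.0
# Blocks P(i | E = L) and P(i | E = H): derivatives are +m_i and -m_i.
for e, sign in (("L", 1.0), ("H", -1.0)):
    variance += multinomial_term([p_i_e[(i, e)] for i in LEVELS],
                                 [sign * m[i] for i in LEVELS], n_e[e])
# Block P(e): the derivative is sum_i w_i P(dead | e, i).
variance += multinomial_term(
    [p_e[e] for e in LEVELS],
    [sum(w[i] * p_d[(e, i)] for i in LEVELS) for e in LEVELS], N)
# Nine binomial blocks P(dead | e, i): the derivative is w_i P(e).
for (e, i), p in p_d.items():
    variance += (w[i] * p_e[e]) ** 2 * p * (1 - p) / n_ei[(e, i)]

se = math.sqrt(variance)
print(f"contrast = {contrast:.3f}, SE = {se:.5f}")
```

Using the full three-category multinomial covariance for each block is equivalent to working with two free parameters per block, which is how the count of 15 independent quantities arises (9 + 2 + 2 + 2).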

#### Standard Adjustment

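The contrast of interest is:

$$P(\text{dead} \mid \text{SET}[E = L]) - P(\text{dead} \mid \text{SET}[E = H]) = \sum_{i} \left[ P(\text{dead} \mid E = L, i) - P(\text{dead} \mid E = H, i) \right] P(i),$$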

which expands to an expression in 8 independent variables. Again, we use the “delta method” to obtain an estimate of the variance of this difference. Using the numbers in the example from Appendix 2, we are interested in the variance of the difference (0.204 − 0.175) = 0.030. The variance for this contrast, calculated using the expression above, equals 0.0000536, or a standard error of 0.00732.

The large-sample 95% confidence interval for this difference is therefore:

$$0.030 \pm 1.96 \times 0.00732 \approx (0.016,\ 0.044).$$
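For the standard adjustment the delta-method variance has a compact closed form, sketched below under the same assumptions of multinomial sampling and uncorrelated blocks. The six binomial terms and the multinomial term for P(i) correspond to the 8 independent quantities mentioned above; the variable names are illustrative.

```python
import math

LEVELS = ("L", "M", "H")
# (alive, dead) counts by (education, income), men aged 55-64 (Appendix 2).
COUNTS = {
    ("L", "L"): (2350, 1024), ("L", "M"): (2931, 790), ("L", "H"): (2379, 445),
    ("M", "L"): (918, 322), ("M", "M"): (2366, 524), ("M", "H"): (3666, 634),
    ("H", "L"): (315, 116), ("H", "M"): (1038, 260), ("H", "H"): (4417, 594),
}

n_ei = {k: a + d for k, (a, d) in COUNTS.items()}
N = sum(n_ei.values())
p_i = {i: sum(n_ei[(e, i)] for e in LEVELS) / N for i in LEVELS}
p_d = {k: COUNTS[k][1] / n_ei[k] for k in COUNTS}

# Income-specific risk differences, low versus high education.
d = {i: p_d[("L", i)] - p_d[("H", i)] for i in LEVELS}
contrast = sum(d[i] * p_i[i] for i in LEVELS)

# Six binomial terms: the derivative with respect to P(dead | e, i) is +/- P(i).
variance = sum(p_i[i] ** 2 * p_d[(e, i)] * (1 - p_d[(e, i)]) / n_ei[(e, i)]
               for e in ("L", "H") for i in LEVELS)
# Multinomial term for P(i): the derivative with respect to P(i) is d_i.
variance += (sum(p_i[i] * d[i] ** 2 for i in LEVELS) - contrast ** 2) / N

se = math.sqrt(variance)
ci = (contrast - 1.96 * se, contrast + 1.96 * se)
print(f"contrast = {contrast:.3f}, variance = {variance:.7f}, SE = {se:.5f}")
```

The binomial terms dominate: the contribution from uncertainty in P(i) is several orders of magnitude smaller in this example.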

Point-wise confidence intervals are shown for this contrast in Scenarios 1 and 2 for all ages in Figure A3.1. The standard adjustment estimate is omitted in this Figure for the sake of clarity.