In recent decades, there has been an explosion of methods for causal analysis. In this issue of Epidemiology, Gatto et al^{1} provide a valuable schema for coping with the numerous distinctions that have arisen. Although the schema is already quite comprehensive, we will take it as a starting point for further discussion and development.

Among the crucial dimensions highlighted by Gatto and colleagues^{1} is the target population in which effects are defined. As the authors point out, this dimension is often left vague or implicit, and yet at least some (and usually all) effect measures depend on the population or person. In theory, both the effect measure to be estimated (the causal estimand or target parameter) and the population to which it applies should be determined from stated research hypotheses or policy goals. In practice, however, analysis methods and selection restrictions may result in estimates of effects for a different population, necessitating adjustment to yield estimates for the population of interest or target population for intervention.^{2–4} Furthermore, identifying effect variation across subpopulations or individuals (who can be viewed as subpopulations of size 1) is critical to efforts to tailor clinical practice to provide “patient-centered” care and to inform personal treatment choices.

In this commentary, we explore further classification dimensions arising from the population aspect of epidemiologic measures: (1) population effects versus subgroup effects versus average individual effects and (2) collapsible versus noncollapsible effects. We also discuss method-induced subsetting and generalization problems, the latter being crucial to the translation of epidemiologic science into public policy and healthcare recommendations. We end by pointing out the importance of considering potential interference between units. As in the article by Gatto et al,^{1} our discussion will concern causal parameters (actual effects) in populations and subpopulations rather than estimates of effects.

#### CONCEPTS AND NOTATION ASSUMING NO INTERFERENCE

Suppose *Y* is an observed outcome of interest, which may be an indicator (eg, 1 = dead, 0 = alive at end of study), a rate, or a quantity (eg, blood pressure); *X* is a treatment or exposure variable whose effects on *Y* are under study; and *Y*_{x} is the potential outcome when *X* = *x* is received, possibly counter-to-fact. For the moment, suppose that *Y* and *X* are individual variables, so that in a population of *N* individuals indexed by *i* = 1, …, *N* we have a *Y*_{i} and *X*_{i} for each individual *i*. *Y* and *X* then denote the variables for an unspecified individual.

Gatto et al^{1} use a now-standard model for the effect of a simple binary treatment *X* (eg, *X* = 1 for tetanus immunization, 0 for none) in which *Y* is supplemented by two potential outcomes, which we label *Y*_{1} = outcome if *X* = 1 is received and *Y*_{0} = outcome if *X* = 0 is received. *Y* is then the actual outcome^{5–7}

$$Y = XY_{1} + (1 - X)Y_{0}.$$

The average absolute treatment effect in a population *p* is defined as

$$E[Y_{1} - Y_{0}],$$

where the expectation *E*[·] is a shorthand for averaging over individuals in *p*:

$$E[Y_{1} - Y_{0}] = \frac{1}{N}\sum_{i=1}^{N}\left(Y_{1i} - Y_{0i}\right).$$

This average individual effect is both the average change in *Y*, *E*[*Y*_{1} − *Y*_{0}], and the change in average *Y*, *E*[*Y*_{1}] − *E*[*Y*_{0}], produced by treating everyone (all *X* = 1) versus treating no one (all *X* = 0).^{8} This has been labeled the average treatment effect,^{5} the marginal treatment effect,^{9} and the population-average treatment effect.^{10},^{11} We emphasize that (as in the article by Gatto et al^{1}) the above notation and terms assume that neither the treatment nor the outcome of any given person affects the treatment or outcome of any other person, an assumption we will refer to as “no interference” among individuals, discussed further below.
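As a minimal sketch of these definitions, the following Python fragment uses a hypothetical table of potential outcomes for four individuals. Both potential outcomes are listed only for illustration, since in practice at most one is observed per person. The names `population`, `observed_outcome`, and `mean` are assumptions of this sketch.

```python
from fractions import Fraction

# Hypothetical potential outcomes for N = 4 individuals: (Y1, Y0, X).
# In real data only one of Y1, Y0 is observed for each person.
population = [
    (1, 0, 1),
    (1, 1, 0),
    (0, 0, 1),
    (1, 0, 0),
]

def observed_outcome(y1, y0, x):
    """Actual outcome under a binary treatment: Y = X*Y1 + (1 - X)*Y0."""
    return x * y1 + (1 - x) * y0

def mean(values):
    """Exact average over individuals."""
    values = list(values)
    return Fraction(sum(values), len(values))

observed = [observed_outcome(y1, y0, x) for y1, y0, x in population]

# Average of individual differences, E[Y1 - Y0].
avg_of_differences = mean(y1 - y0 for y1, y0, _ in population)

# Difference of averages, E[Y1] - E[Y0].
difference_of_avgs = (
    mean(y1 for y1, _, _ in population) - mean(y0 for _, y0, _ in population)
)

print(observed, avg_of_differences, difference_of_avgs)
```

In this toy population the two quantities coincide, as they must for any difference measure under no interference.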

When *Y*_{1} and *Y*_{0} are binary outcome indicators, the average *E*[*Y*_{1}] equals the proportion Pr(*Y*_{1} = 1) with *Y*_{1} = 1, the average risk (incidence proportion) if everyone receives *X* = 1; similarly *E*[*Y*_{0}] = Pr(*Y*_{0} = 1) is the average risk if everyone receives *X* = 0. Thus, *E*[*Y*_{1} − *Y*_{0}] = Pr(*Y*_{1} = 1) − Pr(*Y*_{0} = 1) is the average-risk difference. If *Y*_{1} and *Y*_{0} are quantities, *E*[*Y*_{1} − *Y*_{0}] = *E*[*Y*_{1}] − *E*[*Y*_{0}] is simply the mean difference in outcome when *X* = 1 versus *X* = 0. For example, if *Y* is time to death, then *E*[*Y*_{1} − *Y*_{0}] is the expected years of life lost (if negative) or saved (if positive) from having *X* = 1 versus *X* = 0. Given a set of covariates sufficient for confounding control and a sufficiently large and unbiased sample, these absolute population effects can be estimated by standardization to the total population covariate distribution (G-computation) and by inverse probability of treatment weighting^{2},^{12},^{13}; when the expected outcome is always positive, parallel formulas yield relative effect measures such as *E*[*Y*_{1}]/*E*[*Y*_{0}]. It has been argued that other measures of effect, such as years of life lost, are important in studying disease etiology, as well as for policy and legal purposes.^{14–17} We thus recommend that dimension 2 proposed by Gatto and colleagues^{1} be expanded to include measures beyond those comparing risks or rates. This expansion is easily accommodated using the standard framework just described, by letting *Y* represent any outcome measure. We now turn to more intricate issues.
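The two estimation routes mentioned above, standardization to the total population and inverse probability of treatment weighting, can be sketched on a small hypothetical cohort. The counts, the single binary covariate `z`, and all function names below are assumptions made for illustration; the sketch also assumes that `z` suffices for confounding control and that the stratum-specific risks are estimated without error.

```python
from fractions import Fraction

# Hypothetical cohort: (z, x) -> (number of persons, number of cases).
counts = {
    (0, 1): (100, 20),
    (0, 0): (300, 30),
    (1, 1): (300, 150),
    (1, 0): (100, 30),
}
strata = sorted({z for z, _ in counts})
total_n = sum(n for n, _ in counts.values())

def stratum_size(z):
    """Number of persons with covariate value z, treated or not."""
    return counts[(z, 0)][0] + counts[(z, 1)][0]

def risk(z, x):
    """Observed risk among persons with covariate z and treatment x."""
    n, cases = counts[(z, x)]
    return Fraction(cases, n)

def standardized_risk(x):
    """Average the stratum-specific risks over the total-population distribution of z."""
    return sum(Fraction(stratum_size(z), total_n) * risk(z, x) for z in strata)

def iptw_risk(x):
    """Weight each person by 1 / Pr(X = x | Z = z), then take the weighted risk."""
    weighted_cases = weighted_n = Fraction(0)
    for z in strata:
        n, cases = counts[(z, x)]
        weight = Fraction(stratum_size(z), n)  # inverse of Pr(X = x | Z = z)
        weighted_cases += weight * cases
        weighted_n += weight * n
    return weighted_cases / weighted_n

rd_standardized = standardized_risk(1) - standardized_risk(0)
rd_iptw = iptw_risk(1) - iptw_risk(0)
print(rd_standardized, rd_iptw)
```

With saturated (fully stratified) risks and treatment probabilities, as here, the two approaches give the same risk difference; with parametric models they can differ.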

#### INDIVIDUAL VERSUS POPULATION MEASURES

One dimension we would add to a classification scheme is that of population measures versus individual measures. This dimension is distinct from item (6) in Gatto et al,^{1} insofar as certain population measures have no individual analog and other measures may exhibit discrepancies between averages over individuals and other aggregate (population) measures.

Suppose that the outcome of interest is annual per-capita healthcare expenditure. While at first glance this variable might seem individualized by the “per capita,” it is not derived by averaging individual expenditures. Instead, to adjust for different sizes of compared populations, total population expenditures are divided by *N*; nonetheless, the result is an aggregate macroeconomic (system-wide) measure of complex capital outlays not only by individual persons but also by hospitals, insurance carriers, health plans, and governments (which are funded by fees and taxes on corporations, sales, and property, as well as on individual persons). We could average direct individual payments for care and insurance, but that would not capture the same concept.

Now suppose there is an unambiguous individual-level outcome variable *Y*. For some measures of this outcome, the average value may not equal the aggregate population or subgroup measure estimated by our statistical procedures. For example, if (as in most potential-outcome expositions) *Y* is deterministic, the odds of *Y* = 1 versus *Y* = 0 for an individual *i* can be only 0/1 = 0 if *Y*_{i} = 0 or 1/0 = ∞ if *Y*_{i} = 1. Hence, the average individual odds will be zero if no one has *Y* = 1 and infinite otherwise. If, as is usually the case, the individual-level outcome *Y* varies across persons in the population, the population (marginal) outcome odds *E*(*Y*)/*E*(1 − *Y*) = Pr(*Y* = 1)/Pr(*Y* = 0) will be a positive finite number and hence not equal to the average individual odds. To eliminate zeros and infinities, consider next a stochastic (probabilistic) individual model, in which each individual *i* is like a coin about to be tossed that has probability *P*_{i} of *Y*_{i} = 1 and 1 − *P*_{i} for *Y*_{i} = 0. Such Bernoulli “random-effects” models arise when allowing for unmeasured individual risk factors so complex (eg, the entire genome and its expression) that no two individuals are alike. Individual odds are then *ω*_{i} = *P*_{i}/(1 − *P*_{i}). If *Y* is uncertain (0 < *P*_{i} < 1) for everyone, these *ω*_{i} and hence their average *E*(*ω*) will be finite and positive, but if the risks *P*_{i} vary, *E*(*ω*) will exceed the population odds *E*(*Y*)/*E*(1 − *Y*) = *E*(*P*)/*E*(1 − *P*), although their difference will be small if the odds *ω*_{i} are very small for everyone.^{8}
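A small numerical illustration of the gap between the average individual odds and the population odds, using three hypothetical individual risks (the values and the helper names `odds` and `mean` are assumptions of this sketch):

```python
from fractions import Fraction

# Hypothetical individual risks P_i for three equally weighted persons.
risks = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]

def odds(p):
    """Convert a risk strictly between 0 and 1 to odds."""
    return p / (1 - p)

def mean(values):
    """Exact average over individuals."""
    values = list(values)
    return sum(values) / len(values)

average_individual_odds = mean(odds(p) for p in risks)  # E(omega)
population_odds = odds(mean(risks))                      # E(P) / E(1 - P)

print(average_individual_odds, population_odds)
```

Here the average individual odds (91/27, about 3.4) are well above the population odds of 1, because the odds are a convex function of the risk and the risks vary widely.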

These inequalities extend to causal effect measures. Suppose each person has a well-defined risk *P*_{1i} when given *X* = 1 and a smaller risk *P*_{0i} when given *X* = 0, with corresponding odds *ω*_{1i} and *ω*_{0i}. The average causal odds ratio will then be further from the null than the population causal odds ratio:

that is, the average effect of *X* on the individual odds exceeds the effect of *X* on the population odds, illustrating how an odds ratio can be noncollapsible even when it is unconfounded.^{18},^{19} The same noncollapsibility problem arises when comparing a total population odds ratio to average subgroup odds ratios or when using person-time rate ratios or differences instead of odds to measure effects.^{19},^{20} It does not arise, however, with risk differences or risk ratios.^{18},^{19} We are thus led to another dimension for classifying measures of effect: whether the effect measure itself is collapsible (as with risk ratios and risk differences) or noncollapsible (as with rate ratios and odds ratios). Note that here we are speaking of the effect measures themselves, not estimates or substituted associations. By definition, these measures cannot be biased, and thus, noncollapsibility is not confounding but is rather a change in an actual effect measure as stratification increases.^{19} This phenomenon becomes important when the outcome under study is common in at least one comparison group so that rate and odds ratios no longer approximate risk ratios. When this happens, we prefer to use collapsible measures to ease causal interpretation.^{19}
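To make the contrast between noncollapsible and collapsible measures concrete, the sketch below uses two hypothetical, equally common types of person, each with an individual causal odds ratio of 4. All values and names are illustrative assumptions. Because "average causal odds ratio" could be read either as the average of individual odds ratios or as the ratio of average odds, the sketch computes both.

```python
from fractions import Fraction

# Hypothetical risks (P1, P0) for two equally common types of person.
risks = [
    (Fraction(1, 2), Fraction(1, 5)),
    (Fraction(4, 5), Fraction(1, 2)),
]

def odds(p):
    """Convert a risk strictly between 0 and 1 to odds."""
    return p / (1 - p)

def mean(values):
    """Exact average over individuals."""
    values = list(values)
    return sum(values) / len(values)

# Odds ratios: individual-level summaries versus the population measure.
average_individual_or = mean(odds(p1) / odds(p0) for p1, p0 in risks)
ratio_of_average_odds = (
    mean(odds(p1) for p1, _ in risks) / mean(odds(p0) for _, p0 in risks)
)
population_or = (
    odds(mean(p1 for p1, _ in risks)) / odds(mean(p0 for _, p0 in risks))
)

# Risk differences: the average individual and population measures agree.
average_individual_rd = mean(p1 - p0 for p1, p0 in risks)
population_rd = mean(p1 for p1, _ in risks) - mean(p0 for _, p0 in risks)

print(average_individual_or, ratio_of_average_odds, population_or)
print(average_individual_rd, population_rd)
```

In this example both individual-level odds ratio summaries equal 4, whereas the population odds ratio is 169/49 (about 3.4), even though no confounding is present by construction. The risk difference is 3/10 at both levels.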

The individual versus population dimension can be subsumed under the more common statistical differentiation between conditional and marginal effects,^{9},^{21} by allowing for conditioning so fine that individuals are distinct. At that fine level of conditioning, however, statistical methods must impose parametric constraints to identify the conditional effects of interest. Typically, these constraints imply that individuals with identical covariate values will have identical risks when given the same treatment (conditional exchangeability). Although such models are essential for statistical analysis, molecular epidemiology demonstrates that these models may omit many important risk factors and effect modifiers. Thus, conceptually, we prefer formulations (as given above) that allow individual risks to vary within levels of measured covariates.

#### METHOD-INDUCED SELECTION

To describe issues concerning subgroups of the initial population (Gatto et al^{1} characteristic 6), suppose *S* indicates selection from the population into a subgroup, with *S* = 1 for those in the selected subgroup and *S* = 0 otherwise. Then, the average absolute treatment effect in the selected subpopulation is

$$E[Y_{1} - Y_{0} \mid S = 1],$$

where *E*[·|*S* = 1] is the average over individuals in the selected subpopulation. When *S* = *X*, the subpopulation is the treated or exposed (those with *S* = *X* = 1); the effect

$$E[Y_{1} - Y_{0} \mid X = 1]$$

has been labeled the “effect of treatment on the treated”^{22} and the “average effect of treatment on the treated.”^{4},^{5} This measure is estimated by classical exposed population (“indirect” or standardized morbidity ratio-weighted) standardization and by odds-of-treatment–weighted analyses.^{2},^{12},^{23}

The composition of the selected subpopulation relative to the intended target population can be crucial in determining the relevance of the resulting estimate of *E*[*Y*_{1} − *Y*_{0}|*S* = 1]. In some studies (eg, of occupational hazards), the selected subpopulation may itself be the target, and thus, the effect in this subpopulation is the target parameter. If the selected subpopulation can be expected to have the same distribution as the target for relevant factors (as for a simple random sample from the target), then we may also expect the subpopulation effect to equal the target effect, that is,

$$E[Y_{1} - Y_{0} \mid S = 1] = E[Y_{1} - Y_{0}].$$

Nonetheless, we would often expect this equality to fail under typical selection strategies and analysis methods. As an example, if the subpopulation is selected from the target with balanced matching of untreated subjects to treated subjects or by propensity score matching, the subpopulation will have a distribution of matching factors that follows that of the treated subpopulation (*X* = 1), not the original population. Consequently, if these factors include important effect modifiers, the average effect in the matched subpopulation will usually not equal the average effect in the original population. In particular, if there are no uncontrolled modifiers, it will equal the effect of treatment on the treated.^{3}

In a similar fashion, odds-of-treatment weighting estimates the effect of treatment on the treated by reweighting subjects in the unmatched population to have a weighted distribution equal to the unweighted distribution of the exposed.^{2},^{23} Method-induced selection is thus a benefit of the matching (and reweighting a benefit of the odds-of-treatment weighting) if the effect of treatment on the treated is indeed the target,^{24} but it is a potential source of bias if the average effect in the total unmatched population is the target. In contrast, adjustment using outcome regression or inverse-probability-of-treatment weighting or both on the original unmatched population will tend to estimate the effect of treatment on that original population.^{3},^{24},^{25} This can be seen as an advantage if the total unmatched population is the target, but a potential source of bias otherwise.
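The difference in targets can be sketched with a hypothetical stratified cohort in which the covariate `z` modifies the risk difference. The counts and function names are assumptions for illustration, and `z` is assumed sufficient for confounding control. The effect of treatment on the treated is computed by standardizing untreated risks to the treated covariate distribution and, equivalently, by odds-of-treatment weighting of the untreated; the total-population effect is shown for comparison.

```python
from fractions import Fraction

# Hypothetical cohort: (z, x) -> (number of persons, number of cases).
counts = {
    (0, 1): (100, 20),
    (0, 0): (300, 30),
    (1, 1): (300, 150),
    (1, 0): (100, 30),
}
strata = sorted({z for z, _ in counts})
total_n = sum(n for n, _ in counts.values())
treated_n = sum(counts[(z, 1)][0] for z in strata)

def risk(z, x):
    """Observed risk among persons with covariate z and treatment x."""
    n, cases = counts[(z, x)]
    return Fraction(cases, n)

def smr_standardized_untreated_risk():
    """Untreated risks averaged over the covariate distribution of the treated."""
    return sum(Fraction(counts[(z, 1)][0], treated_n) * risk(z, 0) for z in strata)

def odds_weighted_untreated_risk():
    """Weight untreated persons by the odds of treatment within their stratum."""
    weighted_cases = weighted_n = Fraction(0)
    for z in strata:
        n_treated = counts[(z, 1)][0]
        n_untreated, cases_untreated = counts[(z, 0)]
        weight = Fraction(n_treated, n_untreated)  # Pr(X=1|z) / Pr(X=0|z)
        weighted_cases += weight * cases_untreated
        weighted_n += weight * n_untreated
    return weighted_cases / weighted_n

treated_risk = Fraction(sum(counts[(z, 1)][1] for z in strata), treated_n)
att_standardized = treated_risk - smr_standardized_untreated_risk()
att_odds_weighted = treated_risk - odds_weighted_untreated_risk()

# Total-population effect: stratum effects averaged over everyone's z distribution.
ate_standardized = sum(
    Fraction(counts[(z, 0)][0] + counts[(z, 1)][0], total_n) * (risk(z, 1) - risk(z, 0))
    for z in strata
)
print(att_standardized, att_odds_weighted, ate_standardized)
```

In this example the effect on the treated (7/40) exceeds the total-population effect (3/20) because treated persons are concentrated in the stratum with the larger risk difference.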

Some methods may estimate parameters with no interpretation as an average causal effect in any specified population of interest. Consider propensity score matching in which some treated persons have no untreated match. This is a type of “positivity failure,” with a consequence that the targeted effect is *E*[*Y*_{1} − *Y*_{0}|*S* = 1] where *S* = 1 means “*X* = 1 and match available.” This quantity may not equal the intended target *E*[*Y*_{1} − *Y*_{0}|*X* = 1], although the discrepancy may be slight if matching failure is rare.^{4} A related problem arises in inverse-probability-of-treatment weighting estimation when extreme weights are trimmed or truncated.^{26} The population corresponding to the weights then does not correspond to the original population, and the estimated effect need not equal the average treatment effect in that target or any natural population. Alternative weight estimation methods do not appear to require weight truncation^{27},^{28} and thus can be recommended on interpretational and statistical grounds.
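One way to see how truncation changes the implied population is to look at the covariate distribution of the weighted pseudo-population. In the hypothetical counts below, treatment is rare in stratum `z = 0`, so the treated there receive a large weight. The counts, the cap of 10, and the function names are assumptions of this sketch.

```python
from fractions import Fraction

# Hypothetical counts of persons by stratum z and treatment x.
n = {(0, 1): 10, (0, 0): 390, (1, 1): 200, (1, 0): 200}
strata = [0, 1]
CAP = 10  # hypothetical truncation threshold for the weights

def ipt_weight(z, x):
    """Inverse probability of the treatment actually received, given z."""
    return Fraction(n[(z, 0)] + n[(z, 1)], n[(z, x)])

def weighted_share_z0(x, cap=None):
    """Share of stratum z = 0 in the weighted pseudo-population of arm x."""
    sizes = {}
    for z in strata:
        weight = ipt_weight(z, x)
        if cap is not None:
            weight = min(weight, Fraction(cap))  # truncate extreme weights
        sizes[z] = weight * n[(z, x)]
    return sizes[0] / sum(sizes.values())

population_share_z0 = Fraction(n[(0, 0)] + n[(0, 1)], sum(n.values()))

print(population_share_z0)
print(weighted_share_z0(1), weighted_share_z0(1, cap=CAP))
print(weighted_share_z0(0), weighted_share_z0(0, cap=CAP))
```

Without truncation, both weighted arms reproduce the population share of one half in stratum 0. With the cap, the treated arm represents a pseudo-population in which only one fifth are in stratum 0, while the untreated arm is unchanged, so the contrast no longer refers to the original population.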

Simple instrumental variable methods estimate the average treatment effect among compliers, *E*[*Y*_{1} − *Y*_{0}|*C* = 1], where *C* is the compliance indicator that takes a value of 1 for persons who follow the treatment to which they are assigned, regardless of what they are assigned.^{29} *E*[*Y*_{1} − *Y*_{0}|*C* = 1] has been called the “local average treatment effect”^{29} or the “complier average causal effect.”^{30},^{31} Whether the complier average causal effect is in itself an effect of substantive interest is debatable, especially if such compliers cannot be identified in advance. Whether the complier average causal effect equals another average effect of interest depends heavily on whether the predictors of compliance are also modifiers, as would be expected if side effects predict both compliance and treatment effects.^{30–32} Similar if not stronger reservations apply to principal stratum effects.^{30},^{31},^{33},^{34}
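The following sketch illustrates why a simple instrumental variable (Wald) ratio targets the complier effect rather than the population effect. It assumes a randomized assignment `z`, no defiers, and no effect of assignment on outcome except through treatment received; the compliance types, their shares, and their mean potential outcomes are all hypothetical.

```python
from fractions import Fraction

# Hypothetical compliance types: population share, mean potential outcomes,
# and the treatment received when assigned z = 1 or z = 0.
strata = {
    "complier": {
        "share": Fraction(1, 2), "y1": Fraction(1, 5), "y0": Fraction(2, 5),
        "takes_if": {1: 1, 0: 0},
    },
    "always_taker": {
        "share": Fraction(1, 4), "y1": Fraction(3, 10), "y0": Fraction(3, 5),
        "takes_if": {1: 1, 0: 1},
    },
    "never_taker": {
        "share": Fraction(1, 4), "y1": Fraction(1, 10), "y0": Fraction(1, 10),
        "takes_if": {1: 0, 0: 0},
    },
}

def mean_treatment(z):
    """E[X | Z = z]: proportion receiving treatment under assignment z."""
    return sum(s["share"] * s["takes_if"][z] for s in strata.values())

def mean_outcome(z):
    """E[Y | Z = z], with assignment acting only through treatment received."""
    return sum(
        s["share"] * (s["y1"] if s["takes_if"][z] == 1 else s["y0"])
        for s in strata.values()
    )

# Wald ratio: effect of assignment on outcome over effect of assignment on treatment.
wald_ratio = (mean_outcome(1) - mean_outcome(0)) / (mean_treatment(1) - mean_treatment(0))

complier_effect = strata["complier"]["y1"] - strata["complier"]["y0"]
population_effect = sum(s["share"] * (s["y1"] - s["y0"]) for s in strata.values())

print(wald_ratio, complier_effect, population_effect)
```

Here the Wald ratio reproduces the complier effect (−1/5) but not the population effect (−7/40), because the effect among always-takers differs from that among compliers.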

#### GENERALIZATION (PROJECTION) OF EFFECTS

If we wish to project estimates from the subgroup selected for analysis to a different target population, we will need the modifier-specific effects and the distribution of modifiers in the new target. Given large enough numbers, this may be a relatively simple requirement when projecting to the original parent population of our analyzed subgroup, because given the data it can be carried out with regression standardization or equivalent methods.^{12} However, owing to lack of data, projection can be daunting and hazardous when the goal is to generalize or transport treatment effect estimates to external targets.
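As a minimal sketch of such a projection, suppose a single modifier with two levels, known modifier-specific risk differences, and known modifier distributions in the study subgroup and in the new target. All numbers and names below are hypothetical, and the sketch assumes the modifier-specific effects carry over unchanged to the target.

```python
from fractions import Fraction

# Hypothetical modifier-specific risk differences E[Y1 - Y0 | M = m].
effect_by_modifier = {"low": Fraction(1, 20), "high": Fraction(1, 5)}

# Hypothetical distributions of the modifier M.
study_distribution = {"low": Fraction(4, 5), "high": Fraction(1, 5)}
target_distribution = {"low": Fraction(3, 10), "high": Fraction(7, 10)}

def projected_effect(effects, distribution):
    """Average the modifier-specific effects over a given modifier distribution."""
    if sum(distribution.values()) != 1:
        raise ValueError("modifier distribution must sum to 1")
    return sum(distribution[m] * effects[m] for m in effects)

study_effect = projected_effect(effect_by_modifier, study_distribution)
target_effect = projected_effect(effect_by_modifier, target_distribution)

print(study_effect, target_effect)
```

The same modifier-specific effects yield an average effect of 2/25 in the study subgroup but 31/200 in the target, nearly double, solely because the modifier distribution differs.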

Randomized clinical trials (RCTs) offer cautionary examples. Development and approval trials are often conducted on patients selected on characteristics one should expect to be modifiers, such as outcome prognosis, expected responsiveness to treatment, unresponsiveness to standard treatments, and low risk of untoward side effects. Yet, RCTs typically enroll study groups that are too small to allow detection of every important modifier, let alone accurate estimation of modification. As a result, it is unsurprising when effects observed in clinical practice and in postmarketing surveillance fail to match those seen in earlier trials.

We believe that when the goal of our research is to inform public-health and medical practice, external targets are the main targets of interest, for the only interventions or practices that can be informed by our research results are in the future and thus are unobserved. Furthermore, as some RCT examples show, it can be dangerous to view available observations as random samples from a population that includes our future targets. While there are now various methodological tools for projecting from observed populations to external targets,^{13},^{35–38} as with all tools these methods require certain basic information that is often absent or must be imputed from past data. We thus would extend any schema to include a category for measures applying to populations yet unseen and hence unsampled. We leave open the question of how best to incorporate this category, although perhaps expanding item (6) of Gatto et al^{1} to more categories would suffice.

#### MEASURES UNDER INTERFERENCE

As do Gatto et al^{1} and the vast majority of other writers, we have so far assumed no interference. Nonetheless, violations abound. Infectious disease and social research provide many important settings in which the treatment, outcome, and effect of treatment on outcome depend on the treatment, outcome, and effects for other population members.^{39–42} For example, sufficiently high population prevalence of vaccination can lead to steep risk reduction for those remaining unvaccinated (herd immunity), limiting the effect of vaccination on those so protected, and possibly affecting their choice to be vaccinated as well as their outcome if they choose not to. Further details on the vaccination distribution will affect the risk of unvaccinated persons, including whether family members or neighbors are vaccinated.

As a consequence, it can be ambiguous to discuss effects of an individual vaccine indicator *X* on an individual in the population. Furthermore, given that some people may avoid or miss vaccination and yet large numbers will be vaccinated, it may be of no policy use to consider effects of everyone versus no one vaccinated. Thus, any useful outcome and effect measure (whether for the population, a subgroup, or an individual person) will instead be a function of the distribution of vaccination across the entire population. Note that “distribution” here means much more than a generic probability distribution applying to everyone in the population; it also includes details of exactly who gets the vaccine and who does not (including their age, school enrollment, family members), as well as considerations of vaccine effectiveness. Potential outcomes are then as numerous as such distributions, and the expression of population effects as averages across individual effects appears useless.
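A toy illustration of this proliferation: with *N* persons there are 2^*N* possible vaccination allocations, and under interference each person's risk may depend on the whole allocation. The risk model below is purely an assumption made for this sketch (it is not taken from the literature cited here); it only shows that the effect of a person's own vaccination can change with who else is vaccinated.

```python
from fractions import Fraction
from itertools import product

N = 3  # hypothetical population size

# Every possible vaccination allocation (x_0, ..., x_{N-1}).
allocations = list(product([0, 1], repeat=N))

def risk(i, allocation):
    """Toy risk for person i given the whole allocation (an assumed model).

    Own vaccination multiplies risk by 1/5; coverage among the other
    persons scales risk by (1 - coverage / 2).
    """
    others = [x for j, x in enumerate(allocation) if j != i]
    coverage_among_others = Fraction(sum(others), len(others))
    baseline = Fraction(1, 2)
    own_factor = Fraction(1, 5) if allocation[i] == 1 else Fraction(1)
    herd_factor = 1 - coverage_among_others / 2
    return baseline * own_factor * herd_factor

def own_vaccine_effect(i, others):
    """Risk difference for person i, vaccinated versus not, with others fixed."""
    vaccinated = others[:i] + (1,) + others[i:]
    unvaccinated = others[:i] + (0,) + others[i:]
    return risk(i, vaccinated) - risk(i, unvaccinated)

effect_if_no_one_else = own_vaccine_effect(0, (0, 0))
effect_if_everyone_else = own_vaccine_effect(0, (1, 1))

print(len(allocations), effect_if_no_one_else, effect_if_everyone_else)
```

Even in this three-person example there are eight allocations, and the risk difference for person 0 is −2/5 when no one else is vaccinated but −1/5 when everyone else is, so no single individual effect of *X* is defined without specifying the allocation of others.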

It is beyond the scope of this commentary to address the subtleties and complexities that follow from these observations. We do suggest, however, that it is important to extend effect definition and measurement to situations with interference, and that efforts to do so^{39–42} will raise classification dimensions beyond those discussed above and by Gatto and colleagues.^{1}

#### CONCLUSION

We have not attempted to raise, let alone address, all the issues that could arise from a classification scheme for effects and their measures. As with terminology, any classification scheme will have arbitrary elements, but a good scheme can be invaluable for teaching and communication. We therefore thank Gatto and colleagues^{1} for providing a well-developed foundation. We hope this discussion and expansion might encourage further contributions to this topic.

#### ABOUT THE AUTHORS

KATHERINE J. HOGGATT is a Research Health Scientist at the VA Greater Los Angeles and an Adjunct Assistant Professor of Epidemiology at the UCLA Fielding School of Public Health, Los Angeles, California. SANDER GREENLAND is Professor Emeritus of Epidemiology at the UCLA Fielding School of Public Health and Professor Emeritus of Statistics at the UCLA College of Letters and Science, Los Angeles, California.