Sibling comparison designs, such as the co-twin control design, have been considered simple, yet ingenious, extensions of the matched case-control design. Instead of explicitly matching on a set of measured variables, the use of siblings as controls will automatically match on many unmeasured factors, including cultural background, parental characteristics and child-rearing practices, and (particularly for monozygotic twins) genetics. This promise of adjustment for a wide range of unmeasured, even unknown, confounders has made co-twin control methods popular, especially in etiologic research on controversial topics. Recent applications include the associations of body mass index with mortality,^{1} sexual orientation with mental disorder,^{2} and birth weight with cardiovascular disease.^{3} Twins are relatively rare, and the degree to which associations among twins are generalizable to the entire population has been questioned. Recent commentaries have therefore advocated matching of other relatives, primarily full- and half-siblings.^{4},^{5} Indeed, sibling comparisons have already been used in economics research for at least 30 years,^{6} and recent applications in epidemiology include the association of mother's smoking during pregnancy with offspring psychologic traits,^{7–9} and the association of birth weight with intelligence.^{10}

Methodologic aspects of sibling-comparison designs have been relatively unexplored. Common analytic strategies for paired data include the flexible between-within model^{11–13} and conditional logistic regression. In the absence of nonshared confounding and other biases, these methods produce estimates of the causal effect of exposure free from confounding by factors shared by the siblings.^{13} The absence of bias is rather unlikely in real applications, but what impact these biases might have on sibling comparisons has, to our knowledge, not been discussed in the epidemiologic literature. In contrast, economists have long been aware that random measurement error and omitted variables can have specific influence on within-pair estimates under a linear model.^{6},^{14}

We explore the impact of 2 common sources of bias: confounding by factors not perfectly shared by siblings, and random measurement error of exposure. We will show, analytically and with simulations, how sibling comparison methods in these situations may actually lead to increased bias. Specifically, we give a brief summary of how sibling comparisons are analyzed and explain why the designs may lead to increased bias. Next, we show the impact analytically under a linear model, and we illustrate the impact through simulations under a logistic model where the outcome and exposure are dichotomous. We end with a discussion of how results from sibling comparisons may be interpreted in light of the caveats raised here.

#### STATISTICAL ANALYSES OF SIBLING COMPARISON DESIGNS

For binary outcomes, the sibling comparison is often considered an extension of the matched case-control design. Sibling-pairs discordant for the studied outcome are selected, and the association of exposure with outcome in the paired data is analyzed using McNemar's test or conditional logistic regression. Because the analysis is essentially stratified by sibling pair, only pairs that also differ in exposure level will contribute to the estimated association. Although conditional logistic regression is the method of choice in cross-sectional case-control settings, most sibling comparison studies are nested in a larger cohort or population register and may thus be analyzed using the more general between-within model.

To describe this model, let Y_{ij} and X_{ij} be the outcome and exposure, respectively, of individual *j* in pair *i*. The pairs are symmetric, so the ordering of *j* is exchangeable. First consider an ordinary generalized linear regression model, ignoring the fact that the data consist of pairs:
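Written out in the notation just introduced, and assuming the standard generalized linear model form (the intercept symbol α is an assumption of this sketch, as is the equation number, which is inferred from later cross-references), the unpaired model can be expressed as:

$$
g\{E(Y_{ij} \mid X_{ij})\} = \alpha + \beta X_{ij} \tag{1}
$$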

The between-within model is a simple extension of this, where the individual's exposure and the siblings' mean exposure are included as independent variables:
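Under the same assumed notation (intercept α, link function g), a form consistent with this description is:

$$
g\{E(Y_{ij} \mid X_{i1}, X_{i2})\} = \alpha + \beta_{W} X_{ij} + \beta_{B} \bar{X}_{i} \tag{2}
$$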

where X̄_{i} is the average exposure in pair *i*. The link function, *g*{·}, puts the sibling comparison in a generalized linear model framework. Different link functions enable, for instance, linear, logistic, and log-linear regression. The exposure-outcome association is split into 2 parts: a within-effect β_{W} and a between-effect β_{B}. When the exposure is dichotomous, the between-within model will give the same result as analyzing only exposure-discordant twin pairs.^{15}

Recently, we showed that in the absence of nonshared confounders, the within effect may be interpreted as a causal exposure effect in the subpopulation consisting of exposure discordant pairs.^{13} The between-effect contains information on how strongly factors shared by the family affect the exposure-outcome association. However, this information is generally considered difficult to interpret and is ignored in most applications.

Although some sibling comparison studies estimate only within effects, most aim to evaluate the presence and degree of confounding due to shared factors by comparing the unpaired estimate, β, to β_{W}. The customary interpretation has been that if β = β_{W} there is no confounding by shared factors; if β is different from β_{W}, there is at least some confounding by shared factors; and if β ≠ β_{W} = 0, the entire association is caused by shared confounders. As we will show in the following sections, none of these conclusions is necessarily true.

#### HOW SIBLING COMPARISON DESIGNS CAN INCREASE BIAS

To understand how sibling comparison designs could increase bias, it may help to consider a hypothetical situation in which the exposure, confounder, and outcome are binary, there is no causal effect of exposure on outcome, and the exposure and outcome are both influenced by sibling-shared risk factors, but the confounder is not. As noted earlier in the text, the only pairs that contribute to the estimated “within-pair” association are those discordant in exposure. Although such discordant pairs may share many causes of the exposure, the fact that these pairs are differently exposed implies a selection of pairs that also differ in nonshared causes of the exposure, including common causes of both exposure and outcome.

In other words, with regard to nonshared confounders, the members of exposure-discordant pairs (although they are siblings) are likely to differ more from each other than 2 randomly selected persons from the same population having the same exposure levels. If the effect of the confounder is to increase both the probability of being exposed and the probability of developing the outcome, unexposed persons are less likely to have the confounder, and consequently less likely to develop the outcome. When picked as part of a discordant pair, an unexposed sibling is also selected to be different from the exposed relative, further reducing the unexposed sibling's likelihood of having the confounder. As a result, the confounder-exposure association will be strengthened, thus increasing any spurious association due to nonshared confounding bias.

This problem can be illustrated using causal diagrams.^{16} Suppose that the data-generating mechanism is given by the causal diagram in the Figure. X is the exposure of interest, Y is the outcome, and C denotes the full set of unmeasured confounders, both shared and nonshared. C_{ij} represents C for the *j*th individual in the *i*th sibling pair. F_{C} and F_{X} denote the familial factors influencing C and X, respectively. The hypothetical situation discussed in the preceding paragraph is a special case of this scenario, where F_{C} is completely absent. Suppose that all variables in the Figure are binary, and that all causal influences are “positive,” so that F_{C} = 1 increases the likelihood for C_{ij} = 1, which in turn increases the likelihood for X_{ij} = 1, etc. A crude (ie, nonpaired) analysis compares the exposed (X_{ij} = 1) with the unexposed (X_{ij} = 0), in terms of the outcome Y_{ij}. Due to the path C_{ij} → X_{ij}, those with X_{ij} = 1 are more likely to have C_{ij} = 1 than those with X_{ij} = 0, which induces an imbalance in C_{ij} across these groups. A sibling analysis makes a similar comparison, but restricted to the pairs with X_{i1}≠X_{i2}. In other words, a comparison is made between those with (X_{ij} = 1, X_{ij'} = 0) and those with (X_{ij} = 0, X_{ij'} = 1), in terms of the outcome Y_{ij}. What does the extra information about the sibling's exposure, X_{ij'}, tell us about the confounder C_{ij} for the person under study? Due to the path C_{ij}←F_{C}→C_{ij'}→X_{ij'}, we have that X_{ij'} = 0 decreases the likelihood for C_{ij} = 1 and that X_{ij'} = 1 increases the likelihood for C_{ij} = 1, that is, the path induces a positive correlation between C_{ij} and X_{ij'}. Thus, ignoring other paths, a restriction to pairs with X_{i1}≠X_{i2} counteracts the imbalance in C_{ij} across the comparison groups with X_{ij} = 1 and X_{ij} = 0. 
In other words, because the confounder is influenced by familial factors, confounding will be decreased by this restriction. However, we must also take the path C_{ij}→X_{ij}←F_{X}→X_{ij'} into account. By conditioning on the collider X_{ij}, we induce a spurious (ie, noncausal) association between C_{ij} and X_{ij'}.^{17} Suppose now that X_{ij'} = 0. It is then likely that F_{X} = 0, which makes it more likely that X_{ij} = 0. If, despite this, X_{ij} = 1 (ie, the pair is discordant), then this is likely due to C_{ij} = 1, and so this path induces a negative correlation between C_{ij} and X_{ij'}. Thus, ignoring other paths, a restriction to pairs with X_{i1}≠X_{i2} induces an even larger imbalance in C_{ij} across the comparison groups with X_{ij} = 1 and X_{ij} = 0. In other words, because the exposure is influenced by familial factors, confounding will be increased by this restriction. In reality, both these paths are likely to be present, and the magnitude of the resulting bias depends on the relative strength of the arrows along both paths, that is, on whether the exposure or the confounder is most strongly influenced by familial factors. As shown below, this makes the sibling correlation in exposure and confounders important for the interpretation of sibling comparisons.
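The selection effect along the path through F_{X} can be illustrated numerically for the special case in which F_{C} is absent. The following is a minimal Python sketch, assuming NumPy is available; the logistic form for the exposure, the normally distributed familial factor, and all parameter values are hypothetical choices made only for illustration and are not the settings used for the Table.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pairs = 300_000

# Hypothetical data-generating values, chosen only for illustration
f_x = rng.standard_normal((n_pairs, 1))        # familial factor F_X, shared within a pair
c = rng.random((n_pairs, 2)) < 0.5             # binary confounder, not shared (no F_C)
logit_x = -2.0 + np.log(3.0) * c + 1.5 * f_x   # both C and F_X raise the odds of exposure
x = rng.random((n_pairs, 2)) < 1.0 / (1.0 + np.exp(-logit_x))

# Pair-level discordance flag, broadcast to both members of the pair
discordant = (x[:, 0] != x[:, 1])[:, None]

prev_c_unexposed = c[~x].mean()
prev_c_unexposed_discordant = c[~x & discordant].mean()
prev_c_exposed = c[x].mean()
prev_c_exposed_discordant = c[x & discordant].mean()

print(f"P(C=1 | unexposed):                    {prev_c_unexposed:.3f}")
print(f"P(C=1 | unexposed, discordant pair):   {prev_c_unexposed_discordant:.3f}")
print(f"P(C=1 | exposed):                      {prev_c_exposed:.3f}")
print(f"P(C=1 | exposed, discordant pair):     {prev_c_exposed_discordant:.3f}")
```

In this setup, restricting to discordant pairs lowers the confounder prevalence among the unexposed and raises it among the exposed, so the imbalance in C between the comparison groups widens.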

If there is random measurement error (misclassification) in the observed exposure, then the association of the observed exposure and the outcome will be weaker than the association of the true exposure and the outcome. Among the discordant pairs, some are in truth concordant but appear to be discordant due to misclassification. As argued in the beginning of this section, analyzing discordant pairs implies a selection for pairs that differ on nonshared causes of the exposure. Because random measurement error is not shared by siblings, we are selecting for pairs that differ in the direction of such error. In other words, among individuals in discordant pairs, fewer will be correctly classified on exposure than in the general population. This will lead to an increased attenuation of the observed exposure-outcome association.

The reasoning above is valid because only sibling pairs that differ in exposure will be informative on the within-pair exposure-outcome association. It applies to case-control and cohort settings, and to continuous, binary, or count measures of exposure or outcome, and is not dependent on the statistical analysis procedures used. Although sibling comparison methods have been used under a variety of names (eg, “sibling discordance study,”^{4} “sibship design,”^{5} “discordant-twin design,”^{18} “cotwin control,”^{19},^{20} “cotwin-matched case-control,”^{21}), this reasoning applies to all.

#### IMPACT UNDER A LINEAR BETWEEN-WITHIN MODEL

If the outcome is continuous and the true causal effects are linear, it is possible to derive exact mathematical expressions for the crude and within-pair regression coefficients, showing explicitly how sibling comparison models may increase bias in parameter estimates. We show only the results in this section; for full derivations, see the Appendix.

Consider the case in which Y and X are continuous and centered (so we may ignore intercepts), and all causal effects in the Figure are linear. The true causal model can then be written as:
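Assuming the arrows in the Figure translate directly into linear structural equations, the model can be sketched as follows. Here β_{YC} is an assumed label for the effect of C on Y, ε_{Y} and ε_{X} are the deviation terms defined in the next paragraph, and the equation numbers are inferred from later cross-references:

$$
Y_{ij} = \beta_{YX} X_{ij} + \beta_{YC} C_{ij} + \varepsilon_{Y_{ij}} \tag{3}
$$

$$
X_{ij} = \beta_{XC} C_{ij} + \varepsilon_{X_{ij}} \tag{4}
$$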

We define σ_{C}^{2} = *Var*(C), σ_{εY}^{2} = *Var*(ε_{Y}), and σ_{εX}^{2} = *Var*(ε_{X}), and note that (due to F_{C}, F_{X}, and F_{Y}) C, X, and Y may all be correlated within pairs: *Cov*(C_{i1}; C_{i2}) = ρ_{C} σ_{C}^{2}, *Cov*(ε_{Yi1}; ε_{Yi2}) = ρ_{εY} σ_{εY}^{2}, and *Cov*(ε_{Xi1}; ε_{Xi2}) = ρ_{εX} σ_{εX}^{2}. According to the Figure, all confounding between X and Y is captured by C, and C is associated with X and Y only through its causal influence. Thus, person *ij*'s deviation terms (εs) are independent of X, Y, C and each other, and there are no within-family cross-variable correlations (*Cov*(ε_{Yi1}; ε_{Xi2}) = 0, etc).

In this situation, the parameter of interest, the causal effect of a one-unit increase in X on Y, is β_{YX}. A linear regression of Y on X and C would produce an unbiased estimate, although correct inference requires robust standard errors to account for the correlated structure of the data. The regression of Y on X alone would suffer from confounding bias.

##### Impact of Confounding

The ordinary, or crude, estimate is simply the coefficient from a linear regression of Y on X. Expressed in terms defined above, the crude estimate can be shown (see Appendix) to be:
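A form that follows from the linear model described above, assuming the deviation terms are uncorrelated with C and writing β_{YC} for the effect of C on Y (an assumed label), is:

$$
\beta = \beta_{YX} + \frac{\beta_{YC}\,\beta_{XC}}{\beta_{XC}^{2} + \sigma_{\varepsilon X}^{2}/\sigma_{C}^{2}} \tag{5}
$$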

We see that β is a sum of the causal effect, β_{YX}, and a term due to confounding bias. The degree of confounding depends on the strength of C's effect on Y and X but does not depend on the strength of the familial clustering of either Y, X, or C. The expression for the within-effect is similar, but not identical:
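Under the same assumptions (linear effects, deviation terms uncorrelated with C, and β_{YC} as an assumed label for the effect of C on Y), the within-pair coefficient can be sketched as:

$$
\beta_{W} = \beta_{YX} + \frac{\beta_{YC}\,\beta_{XC}}{\beta_{XC}^{2} + \dfrac{\sigma_{\varepsilon X}^{2}}{\sigma_{C}^{2}} \cdot \dfrac{1 - \rho_{\varepsilon X}}{1 - \rho_{C}}} \tag{6}
$$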

The only difference from β is the factor (1 − ρ_{εX})/(1 − ρ_{C}) in the denominator of the bias term. If all confounders are perfectly shared by siblings, then ρ_{C} = 1, the denominator of the confounding term is infinite, the confounding term is zero, and β_{W} equals the causal effect. However, if ρ_{C} = ρ_{εX}, then β = β_{W}, even though there may be confounders shared by siblings. We also see that if ρ_{C} > ρ_{εX}, the confounding bias in β_{W} will be less than in β, and if ρ_{C} < ρ_{εX}, then β_{W} is more biased than β. Because Cor(X_{i1}; X_{i2}) is a linear combination of ρ_{C} and ρ_{εX} (weighted by the proportion of variance in X due to C and to other causes, respectively), it follows that these conclusions also hold when substituting ρ_{εX} with Cor(X_{i1}; X_{i2}). Thus, we have shown that under the linear model, the within estimate will be less biased than the crude estimate only when the set of all confounders is more strongly shared by siblings than the exposure. Below we will show that the same conclusion holds under the logistic model.
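These conclusions can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available, normally distributed variables, unit variances for C, ε_{X}, and ε_{Y}, and hypothetical coefficient values (causal effect 0.5, both confounder effects 1). The function names are ours, and the large-sample value in `within_theory` is derived from the linear model under those assumptions.

```python
import numpy as np


def correlated_pairs(rng, n_pairs, rho):
    """Unit-variance normal draws for both siblings, within-pair correlation rho (0 <= rho <= 1)."""
    shared = rng.standard_normal((n_pairs, 1))
    own = rng.standard_normal((n_pairs, 2))
    return np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own


def crude_and_within(rho_c, rho_ex, b_yx=0.5, b_yc=1.0, b_xc=1.0, n_pairs=200_000, seed=0):
    """Simulate sibling pairs; return (crude slope, within-pair slope) of Y on X."""
    rng = np.random.default_rng(seed)
    c = correlated_pairs(rng, n_pairs, rho_c)                       # confounder
    x = b_xc * c + correlated_pairs(rng, n_pairs, rho_ex)           # exposure
    y = b_yx * x + b_yc * c + correlated_pairs(rng, n_pairs, 0.0)   # outcome
    crude = np.cov(x.ravel(), y.ravel())[0, 1] / np.var(x.ravel(), ddof=1)
    dx, dy = x[:, 0] - x[:, 1], y[:, 0] - y[:, 1]                   # within-pair differences
    within = np.sum(dx * dy) / np.sum(dx * dx)
    return crude, within


def within_theory(rho_c, rho_ex, b_yx=0.5, b_yc=1.0, b_xc=1.0):
    """Large-sample within-pair slope when C and eps_X both have unit variance."""
    return b_yx + b_yc * b_xc * (1 - rho_c) / (b_xc**2 * (1 - rho_c) + (1 - rho_ex))


# Case A: confounder less shared than the exposure residual
crude_a, within_a = crude_and_within(rho_c=0.2, rho_ex=0.6)
# Case B: confounder more shared than the exposure residual
crude_b, within_b = crude_and_within(rho_c=0.6, rho_ex=0.2)

print(f"causal effect 0.5 | A: crude {crude_a:.2f}, within {within_a:.2f} "
      f"(theory {within_theory(0.2, 0.6):.2f})")
print(f"causal effect 0.5 | B: crude {crude_b:.2f}, within {within_b:.2f} "
      f"(theory {within_theory(0.6, 0.2):.2f})")
```

With these values the crude slope is about 1.0 in both cases, while the within-pair slope moves further from 0.5 in case A and closer to it in case B.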

##### Impact of Measurement Error

In general, measurements of X are likely to be somewhat inaccurate. In the presence of random measurement error, the association between the observed X (X*) and Y is weaker than the association between the true X and Y, and this attenuation is increased in the within-pair association. Let X* equal the true X plus an error term:
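In symbols, and assuming a classical error model in which the error is independent of all other variables and of the sibling's error (the symbol ε_{X*} and its variance σ_{εX*}^{2} are our notation, and the equation number is inferred from later cross-references):

$$
X^{*}_{ij} = X_{ij} + \varepsilon_{X^{*}_{ij}} \tag{7}
$$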

Then, the attenuated association can be shown (see Appendix) to be
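Under a classical error model (error independent of X, C, Y, and the sibling's error, with variance written here as σ_{εX*}^{2}, our notation), the crude coefficient based on the observed exposure, denoted β* in this sketch, takes the form:

$$
\beta^{*} = \gamma\,\beta, \qquad \gamma = \frac{\mathrm{Var}(X)}{\mathrm{Var}(X^{*})} = \frac{\beta_{XC}^{2}\sigma_{C}^{2} + \sigma_{\varepsilon X}^{2}}{\beta_{XC}^{2}\sigma_{C}^{2} + \sigma_{\varepsilon X}^{2} + \sigma_{\varepsilon X^{*}}^{2}} \tag{8}
$$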

where γ is the reliability of our measure of X. Thus, a reliability of 90% will attenuate the regression coefficient by 10%. For the within estimate, the solution is
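One algebraically convenient arrangement, derived here under the assumption that the measurement error is independent of all other variables and not shared between siblings, is shown below. Here β_{W} denotes the within-pair coefficient that would be obtained with the true exposure, β_{W}* (our notation) the coefficient obtained with the observed exposure, and the published display may be arranged differently:

$$
\beta_{W}^{*} = \frac{\gamma\,\beta_{W}}{\gamma + \dfrac{1 - \gamma}{1 - \mathrm{Cor}(X_{i1}, X_{i2})}} \tag{9}
$$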

Like β, the within-estimate is attenuated by a multiple of the reliability. However, β_{W} is further attenuated by the γ in the denominators. Consider a situation where the within-estimate is not confounded (eg, when β_{XC} = 0).
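In that case the within-pair coefficient for the true exposure equals β_{YX}, and, assuming measurement error that is independent between siblings and of all other variables, the within-pair coefficient for the observed exposure (written β_{W}* in this sketch) can be expressed in terms of the observable sibling correlation:

$$
\beta_{W}^{*} = \beta_{YX}\left\{1 - \frac{1 - \gamma}{1 - \mathrm{Cor}(X^{*}_{i1}, X^{*}_{i2})}\right\} \tag{10}
$$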

Although β is attenuated by 1 − γ, β_{W} is attenuated by (1 − γ)/(1 − Cor(X*_{i1}, X*_{i2})). This factor has been reported previously,^{6},^{18} but we point out that it holds only under no confounding. Otherwise, the attenuation of the within estimate will be even stronger, as described by Eq. 9.
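The no-confounding case can be illustrated with a short simulation. This is a minimal Python sketch, assuming NumPy is available, normally distributed variables, and hypothetical values (causal effect 1, sibling correlation in true exposure 0.5, reliability 0.8); under these values the theory gives a crude slope of about 0.8 and a within-pair slope of about 0.67.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pairs = 200_000
beta, rho_x, gamma = 1.0, 0.5, 0.8   # hypothetical values for illustration

# True exposure: unit variance, sibling correlation rho_x, no confounder
shared = rng.standard_normal((n_pairs, 1))
x = np.sqrt(rho_x) * shared + np.sqrt(1 - rho_x) * rng.standard_normal((n_pairs, 2))
y = beta * x + rng.standard_normal((n_pairs, 2))

# Observed exposure: add independent error so that Var(X) / Var(X*) = gamma
error_sd = np.sqrt((1 - gamma) / gamma)
x_obs = x + error_sd * rng.standard_normal((n_pairs, 2))

crude = np.cov(x_obs.ravel(), y.ravel())[0, 1] / np.var(x_obs.ravel(), ddof=1)
dx, dy = x_obs[:, 0] - x_obs[:, 1], y[:, 0] - y[:, 1]
within = np.sum(dx * dy) / np.sum(dx * dx)
rho_obs = np.corrcoef(x_obs[:, 0], x_obs[:, 1])[0, 1]

# Attenuation factor from the text, using the observed sibling correlation
within_predicted = beta * (1 - (1 - gamma) / (1 - rho_obs))

print(f"crude slope  {crude:.3f}")
print(f"within slope {within:.3f} (predicted {within_predicted:.3f})")
```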

#### IMPACT UNDER A LOGISTIC BETWEEN-WITHIN MODEL

If the causal effects follow a logistic model, a model commonly used for dichotomous outcomes, we do not have a closed-form solution of the regression coefficients. To show that the same qualitative conclusions still hold, we simulated paired data following the causal structure in the Figure, under a logistic model with binary exposure, confounder, and outcome. For details on the simulation set-up, please see the eAppendix (http://links.lww.com/EDE/A596). The influence of the within-pair correlation of confounders and measurement error in the exposure is shown in the Table. Note that under the logistic model, due to the noncollapsibility of the odds ratio, the “within effect” is a causal exposure effect in the subpopulation consisting of exposure discordant pairs, rather than a pair-specific effect.^{13} This means that in the Table, the OR_{W} = 5 in rows where Cor(C1, C2) = 1 (eg, rows 1 and 5) is a causal, nonconfounded effect of X on Y, but due to noncollapsibility, it would not generally equal the causal effect in the whole population. In the absence of confounding, though, the within effect will equal the causal effect in the whole population.^{15}

With this simulation, we can also illustrate a point made above. Let the prevalence of exposure, confounder, and outcome be 20%, 50%, and 10%; the Pearson correlation of X1-X2, 0.3; the correlation of C1-C2, 0.0; and the nonconfounded odds ratio (OR) of X on Y be 5, the OR of C on X be 3, and the OR of C on Y be 26. The prevalence of the confounder among unexposed individuals in the population is then 45%. But the prevalence of the confounder among unexposed individuals who are part of an exposure discordant pair is 37%. Further, let the exposure be measured with a sensitivity and specificity of 0.8. In the entire population, 80% will be correctly classified on exposure, but among individuals in discordant pairs, only 67%. For readers interested in trying out other parameter values, we provide R-code used in our simulations in eAppendix 1 (http://links.lww.com/EDE/A596).
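The misclassification figures quoted above can be reproduced by direct calculation from the stated prevalence (20%), sibling correlation in exposure (0.3), and sensitivity and specificity (both 0.8). The following is a minimal Python sketch, assuming classification errors are independent between siblings and of the true exposure pattern of the pair; the function names are ours.

```python
def pair_distribution(prev, rho):
    """Joint distribution of two exchangeable binary exposures with a given Pearson correlation."""
    cov = rho * prev * (1 - prev)
    return {
        (1, 1): prev**2 + cov,
        (1, 0): prev * (1 - prev) - cov,
        (0, 1): prev * (1 - prev) - cov,
        (0, 0): (1 - prev) ** 2 + cov,
    }


def p_observed(obs, true, sens, spec):
    """Probability of observing `obs` given the true exposure status."""
    if true == 1:
        return sens if obs == 1 else 1 - sens
    return spec if obs == 0 else 1 - spec


def correct_in_observed_discordant(prev, rho, sens, spec):
    """Fraction of individuals correctly classified among pairs that appear exposure discordant."""
    p_discordant = 0.0
    expected_correct = 0.0
    for (t1, t2), p_true in pair_distribution(prev, rho).items():
        for o1, o2 in ((1, 0), (0, 1)):          # observed-discordant patterns only
            p = p_true * p_observed(o1, t1, sens, spec) * p_observed(o2, t2, sens, spec)
            p_discordant += p
            expected_correct += p * ((o1 == t1) + (o2 == t2))
    return expected_correct / (2 * p_discordant)


overall_correct = 0.2 * 0.8 + (1 - 0.2) * 0.8
discordant_correct = correct_in_observed_discordant(prev=0.2, rho=0.3, sens=0.8, spec=0.8)
print(f"correctly classified overall:             {overall_correct:.2f}")
print(f"correctly classified in discordant pairs: {discordant_correct:.2f}")
```

The second figure comes out at about 0.67, in line with the 67% stated in the text. The confounder prevalences (45% and 37%) depend on the full simulation set-up in the eAppendix and are not reproduced by this calculation.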

Table 1-a. Impact of...
Table 1-b. Impact of...
Table 1-c. Impact of...

##### Impact of Confounding

The exact magnitude of confounding is a function of the prevalence of exposure, confounder, and outcome, the strength of the causal effect of confounder on exposure and outcome and the causal effect of exposure on outcome. In the Table simulations, these values were set to produce a confounded crude OR of 8 (confounding) and 2 (inverse confounding). The extent that sibling comparisons remove or increase confounding bias is decided by the relative correlation of exposure and confounder within families. From the Table, we see that the same qualitative conclusion holds for within-pair OR as for β_{W} in the linear setting. If all confounders are perfectly shared by siblings, that is, Cor(C1, C2) = 1, then OR_{W} equals a causal OR. However, if Cor(C1, C2) = Cor(X1, X2), then OR = OR_{W}, although there are confounders shared by siblings. Further, if Cor(C1, C2) > Cor(X1, X2), the bias in OR_{W} will be less than the bias in OR; if Cor(C1, C2) < Cor(X1, X2), then OR_{W} is more biased than OR.

##### Impact of Measurement Error

Measurement error, or misclassification, of a dichotomous variable is often described in terms of the measurement's sensitivity and specificity.^{22} Whether low specificity or low sensitivity is most influential in attenuating the exposure-outcome association will depend on the prevalence of the variables. Regardless, as illustrated in the Table, both will lead to an attenuation of the OR, and further attenuation of the OR_{W}.

#### DISCUSSION

In line with previous econometric research,^{6},^{7} we have shown that within-pair estimates from sibling comparisons, β_{W}, can be more biased than crude unpaired estimates, β. In particular, we have shown that the usual interpretation of results from sibling comparison studies will be valid only in the absence of confounding not perfectly shared by siblings and measurement error in the exposure. Since many real associations are likely to suffer from one or both of these biases, this should be considered an important caveat for sibling comparison applications and may call for re-evaluation of previous work using these designs.

##### Implications for Interpretation

To summarize, β_{W} is unbiased only if all confounders are perfectly shared by the members of the pair, and there is no random measurement error of the exposure. If there is measurement error, β_{W} is expected to be lower (closer to the null) than β, even without confounding. If there is confounding, β_{W} may be either less or more confounded than β. When siblings are less similar in exposure than in confounders, confounding bias will be lower in β_{W}. When siblings are more similar in exposure than in confounders, confounding bias will be higher in β_{W}. When siblings are similarly correlated in exposure and confounders, β_{W} ≈ β, although confounders are to some extent shared by siblings. This makes interpretation of sibling comparisons more complicated than usually assumed, particularly for co-twin control designs. Comparisons of within-pair estimates from dizygotic (DZ) and monozygotic (MZ) twins are often used to assess whether confounding comes from family environment or genetic sources. Because MZ twins have a stronger correlation of both the exposure and confounders whenever these are heritable, this comparison will be sensitive to the influence of nonshared confounding and measurement error. For instance, if there is a positive causal effect of the exposure and measurement error in the exposure, but no confounding, we would find β_{W(MZ)} < β_{W(DZ)} < β, which would typically (and in this scenario erroneously) be interpreted as evidence for genetic confounding.^{23},^{24}

Sibling comparisons cannot in themselves prove causality or establish whether there is sibling-shared confounding. However, adding previous subject matter knowledge or complementary analyses, we may still be able to use sibling comparisons to support such reasoning. If we perform an adequately powered sibling comparison and observe β stronger than β_{W}, this is compatible with several causal scenarios. First, the association may be partly or wholly caused by confounders that are more shared by siblings than the exposure is. Second, there may be random measurement error in the exposure. Third, there may be a combination of these 2. Fourth, the association may be partly or wholly caused by confounders less shared by siblings than the exposure is. Although this would have made β_{W} stronger than β, the increased attenuation due to measurement error counteracts this. Fifth, there may be confounding causing an inverse (with respect to β) association, masking part of the causal effect, less shared by siblings than the exposure is, with or without measurement error. Sixth, there may be confounding causing an inverse association, more shared by siblings than the exposure is; this would have made β_{W} stronger than β, but the increased attenuation due to measurement error counteracts this.

In given settings, one may be able to convincingly argue that overall confounding is unlikely to cause an inverse association, disqualifying the fifth and sixth explanations. Further, if one could show that the measurement error is not strong enough to explain the lowered β_{W} under the scenario with no confounding, then only the first and the third explanations are possible. The association must be, at least partly, caused by confounders shared by siblings. Further, unless β_{W} = 0, it is not possible that the association is completely caused by confounders that are perfectly shared by siblings.

The relative strength of sibling correlation in exposure and confounder turns out to be a central concept in these designs. Thinking in terms of the degree of correlation among variables may be more useful than the usual distinction between shared and nonshared factors. If shared factors are those that have a sibling correlation of one, and the nonshared factors have a correlation of zero, it would seem that most variables of interest are somewhere in between.

As shown earlier in the text, the sibling-correlation in exposure leads to increased bias due to collider stratification. Sibling-correlation in the confounders is what allows a reduction in bias. These correlations set sibling comparisons apart from ordinary matched designs in which the index person and reference are selected to be perfectly correlated on the matching variables, but are not correlated in other causes of exposure or confounders. Although the correlation in exposure can be estimated, the correlation in the set of all confounders can only be hypothesized, and requires a discussion of what they may be, their relative strength, and how heritable or influenced by shared environment they are. If there are 2 confounders—one weak and highly heritable or influenced by shared environment, and the other strong but only very weakly familial—then the set of confounders will have an intermediate correlation.

As a rule of thumb, sibling comparison may be used in situations where we believe that the confounding is more shared than the exposure, and avoided in the opposite scenario. An example of the first situation may be twin studies of prenatal exposures and adult outcomes. Although these exposures may be strongly correlated in twins (eg, birth weight), most hypothetical confounders will be perfectly shared by the twins (eg, gestational length, maternal age, parity, maternal substance use). A situation that warrants more caution may be studies of adult lifestyle factors and disease (eg, body mass index and mortality). In these, it is easy to think of potential confounders that, though heritable, may be less correlated than the exposure (eg, personality factors, diet, exercise, health-seeking behavior). Sibling comparisons of, for example, maternal smoking during pregnancy may also warrant some caution. Although mothers share many characteristics from one birth to the other, several will also differ by default (parity, maternal age, and paternal age). It would then be informative to estimate the sibling correlation in maternal smoking during pregnancy and contrast it to the sibling correlation in some of the putative confounders.

In general, it may be difficult to make assumptions of the extent to which unmeasured confounders are shared. However, we may feel more comfortable arguing whether they should, on the whole, cause a positive or negative association. As for measurement error, we may be able to directly measure the reliability of our instruments, or at least put some upper or lower limit on how well our measure captures the true exposure. Together, assumptions like these may aid us in deciding what causal scenario is best supported by our data.

#### FUTURE DIRECTIONS

All results presented in this paper assume that, with the exceptions of nonshared confounding and random measurement error, estimates from sibling comparisons are unbiased. There are of course many other potential sources of bias, such as reverse causation or systematic measurement error in exposure or outcome. The relatives may further influence each other in some way, or the association may not be linear on the scale we are modeling. Studies of whether such errors need special consideration in the sibling comparison setting would make a welcome contribution to the literature.

Finally, do the caveats pointed out in this article render sibling comparison designs useless? We would say no; sibling comparison designs remain a unique tool to adjust associations for unmeasured confounding by factors shared by siblings. However, their interpretation rests on several assumptions that should be made explicit, and that may be tested in some situations, particularly concerning measurement error. In any sibling comparison study, it would be prudent to include a discussion of the (hypothesized) extent to which the exposure and the set of all confounders are shared by family members; whether confounding is likely to create a positive or negative association; and how well the observed exposure measures the causal exposure. Although it may not be possible to retrieve an unbiased estimate of the association adjusted for family-level confounders, it may often be possible to argue that the association could not be completely due to shared confounders, or that the association to some extent must be caused by such factors. Although perhaps not the definite test researchers were hoping for, there are many applications where sibling studies may prove helpful, particularly in combination with evidence from other study designs, in drawing causal conclusions about specific associations.

##### APPENDIX

Proof of Eq. 5:
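A sketch of the derivation, assuming the linear structural equations for Y and X described in the main text (with β_{YC} an assumed label for the effect of C on Y) and deviation terms uncorrelated with C and with each other:

$$
\begin{aligned}
\mathrm{Var}(X) &= \beta_{XC}^{2}\sigma_{C}^{2} + \sigma_{\varepsilon X}^{2}, \\
\mathrm{Cov}(X, Y) &= \beta_{YX}\,\mathrm{Var}(X) + \beta_{YC}\,\mathrm{Cov}(X, C) = \beta_{YX}\,\mathrm{Var}(X) + \beta_{YC}\,\beta_{XC}\,\sigma_{C}^{2}, \\
\beta &= \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \beta_{YX} + \frac{\beta_{YC}\,\beta_{XC}\,\sigma_{C}^{2}}{\beta_{XC}^{2}\sigma_{C}^{2} + \sigma_{\varepsilon X}^{2}} = \beta_{YX} + \frac{\beta_{YC}\,\beta_{XC}}{\beta_{XC}^{2} + \sigma_{\varepsilon X}^{2}/\sigma_{C}^{2}}.
\end{aligned}
$$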

Proof of Eq. 6:

In the linear between-within model, we may exclude β_{B} from the model by reparameterizing Eq 2:
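One reparameterization with this property, assumed here to be the intended one, takes within-pair differences under the identity link so that the intercept and the pair-mean term cancel (e_{ij} denotes the residual, our notation):

$$
Y_{i1} - Y_{i2} = \beta_{W}\,(X_{i1} - X_{i2}) + (e_{i1} - e_{i2}) \tag{11}
$$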

where β_{W} from Eq 11 and Eq 2 will be identical.^{11} Thus:
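Writing ΔX_{i} = X_{i1} − X_{i2}, and likewise for C, Y, and the deviation terms (the Δ notation is ours), the remaining steps can be sketched as follows, assuming the linear structural equations of the main text and ρ_{C} < 1 for the final rearrangement:

$$
\begin{aligned}
\mathrm{Var}(\Delta C) &= 2\sigma_{C}^{2}(1 - \rho_{C}), \qquad \mathrm{Var}(\Delta \varepsilon_{X}) = 2\sigma_{\varepsilon X}^{2}(1 - \rho_{\varepsilon X}), \\
\mathrm{Var}(\Delta X) &= \beta_{XC}^{2}\,\mathrm{Var}(\Delta C) + \mathrm{Var}(\Delta \varepsilon_{X}), \\
\mathrm{Cov}(\Delta X, \Delta Y) &= \beta_{YX}\,\mathrm{Var}(\Delta X) + \beta_{YC}\,\beta_{XC}\,\mathrm{Var}(\Delta C), \\
\beta_{W} &= \frac{\mathrm{Cov}(\Delta X, \Delta Y)}{\mathrm{Var}(\Delta X)} = \beta_{YX} + \frac{\beta_{YC}\,\beta_{XC}\,\sigma_{C}^{2}(1 - \rho_{C})}{\beta_{XC}^{2}\sigma_{C}^{2}(1 - \rho_{C}) + \sigma_{\varepsilon X}^{2}(1 - \rho_{\varepsilon X})} \\
&= \beta_{YX} + \frac{\beta_{YC}\,\beta_{XC}}{\beta_{XC}^{2} + \dfrac{\sigma_{\varepsilon X}^{2}}{\sigma_{C}^{2}} \cdot \dfrac{1 - \rho_{\varepsilon X}}{1 - \rho_{C}}}.
\end{aligned}
$$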

Proof of Eq. 8:
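A sketch of the argument, assuming the measurement error is independent of X, C, and Y, with variance written as σ_{εX*}^{2} and the coefficient for the observed exposure written as β* (both our notation):

$$
\begin{aligned}
\mathrm{Cov}(X^{*}, Y) &= \mathrm{Cov}(X, Y), \qquad \mathrm{Var}(X^{*}) = \mathrm{Var}(X) + \sigma_{\varepsilon X^{*}}^{2}, \\
\beta^{*} &= \frac{\mathrm{Cov}(X^{*}, Y)}{\mathrm{Var}(X^{*})} = \frac{\mathrm{Var}(X)}{\mathrm{Var}(X^{*})} \cdot \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \gamma\,\beta.
\end{aligned}
$$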

Proof of Eq. 9:
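A sketch of one route to the within-pair result, assuming the measurement error is independent of all other variables and of the sibling's error. As before, Δ denotes a within-pair difference, σ_{εX*}^{2} the error variance, and β_{W}* the within-pair coefficient for the observed exposure (all our notation); the published derivation may be arranged differently:

$$
\begin{aligned}
\mathrm{Cov}(\Delta X^{*}, \Delta Y) &= \mathrm{Cov}(\Delta X, \Delta Y), \qquad \mathrm{Var}(\Delta X^{*}) = \mathrm{Var}(\Delta X) + 2\sigma_{\varepsilon X^{*}}^{2}, \\
\beta_{W}^{*} &= \frac{\mathrm{Cov}(\Delta X, \Delta Y)}{\mathrm{Var}(\Delta X) + 2\sigma_{\varepsilon X^{*}}^{2}} = \beta_{W} \cdot \frac{\mathrm{Var}(\Delta X)}{\mathrm{Var}(\Delta X) + 2\sigma_{\varepsilon X^{*}}^{2}}, \\
\mathrm{Var}(\Delta X) &= 2\,\mathrm{Var}(X)\,\{1 - \mathrm{Cor}(X_{i1}, X_{i2})\}, \qquad \sigma_{\varepsilon X^{*}}^{2} = \mathrm{Var}(X)\,\frac{1 - \gamma}{\gamma}, \\
\beta_{W}^{*} &= \frac{\gamma\,\beta_{W}}{\gamma + \dfrac{1 - \gamma}{1 - \mathrm{Cor}(X_{i1}, X_{i2})}}.
\end{aligned}
$$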