From the ^{a}MRC Centre for Causal Analyses in Translational Epidemiology, School of Social & Community Medicine, University of Bristol, Bristol, UK; and ^{b}School of Social & Community Medicine, University of Bristol, Bristol, UK.

Submitted 3 November 2011; accepted 22 May 2012.

Supported by the UK Economic and Social Research Council (RES-060-23-0011, which provided the salary for L.D.H. at the start of this work), and the UK Medical Research Council (L.D.H. was funded by a Population Health Scientist Fellowship G1002375 at the end of this work; K.T. received funding for a grant titled “Developing and disseminating robust methods for handling missing data in epidemiological studies,” G0900724). B.G. is funded by an Intermediate Clinical Wellcome Trust Fellowship (089979). The UK Medical Research Council, the Wellcome Trust, and the University of Bristol provide core support for ALSPAC. The UK Medical Research Council and the University of Bristol provide core funding for the MRC Centre for Causal Analyses in Translational Epidemiology.

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com). This content is not peer-reviewed or copy-edited; it is the sole responsibility of the author.

Editors’ note: A commentary on this article appears on page 10.

Correspondence: Laura D. Howe, MRC Centre for Causal Analyses in Translational Epidemiology, School of Social & Community Medicine, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK. E-mail: [email protected].

Background:

Although cohort members tend to be healthy and affluent compared with the whole population, some studies indicate this does not bias certain exposure-outcome associations. It is less clear whether this holds when socioeconomic position (SEP) is the exposure of interest.

Methods:

As an illustrative example, we use data from the Avon Longitudinal Study of Parents and Children. We calculate estimates of maternal education inequalities in outcomes for which data are available on almost the whole cohort (birth weight and length, breastfeeding, preterm birth, maternal obesity, smoking during pregnancy, educational attainment). These are calculated for the full cohort (n∼12,000) and in restricted subsamples defined by continued participation at age 10 years (n∼7,000) and age 15 years (n∼5,000).

Results:

Loss to follow-up was related both to SEP and outcomes. For each outcome, loss to follow-up was associated with underestimation of inequality, which increased as participation rates decreased (eg, mean birth-weight difference between highest and lowest SEP was 116 g [95% confidence interval = 78 to 153] in the full sample and 93 g [45 to 141] and 62 g [5 to 119] in those attending at ages 10 and 15 years, respectively).

Conclusions:

Considerable attrition from cohort studies may result in biased estimates of socioeconomic inequalities, and the degree of bias may worsen as participation rates decrease. However, even with considerable attrition (>50%), qualitative conclusions about the direction and approximate magnitude of inequalities did not change among most of our examples. The appropriate analysis approaches to alleviate bias depend on the missingness mechanism.

Unfortunately for epidemiologists, not everyone agrees to participate in cohort studies. Among those initially willing to take part, many will be lost to follow-up over time. In general, nonparticipation and loss to follow-up tend to be more pronounced among the less advantaged and less healthy, leading to cohort studies often being a relatively healthy, wealthy subpopulation.^{1–8} Reassuringly, several studies have demonstrated that this causes only minimal bias in some exposure-outcome associations.^{7}^{,}^{9–15} Notably, however, given the strong relationship of socioeconomic position (SEP) to nonparticipation and loss to follow-up, few studies have examined possible bias in estimates of socioeconomic inequalities in health outcomes.

Of studies examining the effect of loss to follow-up on SEP associations, most have concluded that there is minimal bias.^{13}^{,}^{16–20} Most of these studies, however, have focused on a single time point after baseline,^{13}^{,}^{16}^{,}^{18}^{,}^{19} whereas over time a lower proportion of the original cohort remains engaged, and any selection bias may worsen. Furthermore, none of these existing studies has explored the possible mechanisms through which selection bias could operate.

ANALYSIS APPROACH

We extend the work of existing studies, looking at the extent to which biases in estimates of inequalities worsen as the proportion of the original cohort still contributing data is reduced. We analyze potential mechanisms driving these biases and possible solutions, using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) as an illustrative example. Using SEP as our exposure (indexed, in our example, by maternal education), we take a set of pregnancy and birth outcomes for which data are available for (almost) the full cohort. Because missing-data theory states that estimates may be biased if participation is associated with the outcome,^{21} we describe the associations of loss to follow-up with SEP and each outcome. We examine the socioeconomic inequality in these outcomes in the full cohort and then again in subsets of the cohort restricted to those who are not lost to follow-up at older ages to quantify the bias in the estimated inequalities and to evaluate whether the degree of bias worsens as loss to follow-up increases. Most of the outcomes we examine were measured before the loss to follow-up occurred; although in general our interest lies in knowing whether loss to follow-up biases associations with outcomes measured at or after the time of loss to follow-up, by definition these data are not usually available to us for study enrollers who are no longer participating. Thus, we are using these outcomes for which we have data on (almost) the full cohort for illustrative purposes only, to describe the possible effects of loss to follow-up more generally and to explore some possible mechanisms underlying this problem and analysis approaches that could remove these biases.

Selection bias may result through multiple mechanisms, depending on how loss to follow-up is associated with other variables of interest. Figure 1 shows five directed acyclic graphs illustrating possible mechanisms associated with loss to follow-up in inequalities research. These are not intended to be an exhaustive set of possible mechanisms; rather, they indicate a range of ways through which loss to follow-up may arise, each of which has different consequences for the results and different analysis solutions. In each of the mechanisms presented in Figure 1, SEP is assumed to be completely observed, but loss to follow-up (R, a binary indicator of response where R = 1 if the individual contributed outcome data and R = 0 if they did not contribute outcome data) leads to missing data in the outcome Y. In some instances, loss to follow-up may have no impact on estimates of inequalities; this is depicted in Figure 1A, in which SEP causes R, but there is no direct causality or any other pathway between Y and R. In this situation, complete-case analysis would be unbiased, and thus we would expect to see similar estimates of socioeconomic inequalities in the full cohort and in our analyses restricted to participants who are not lost to follow-up.^{21}^{,}^{22}

Figure 1. Directed acyclic graphs illustrating possible mechanisms through which loss to follow-up may operate in studies of socioeconomic inequalities. These directed acyclic graphs represent a series of possible ways through which loss to follow-up could be associated with different variables of interest when studying socioeconomic inequalities. The mechanisms have different consequences for whether estimates of inequalities are biased, and different possible analysis solutions to address bias when it is present. SEP indicates socioeconomic position, which is assumed to be completely measured; Y, the outcome of interest, which has some missing data; C, other variables that may be measured or unmeasured; R, response, that is, continued participation in the cohort study such that R = 1 if the individual contributed data on the outcome Y and R = 0 if they did not. The box around R indicates that analysis is restricted to participants not lost to follow-up, that is, R = 1.

The simplest mechanism through which loss to follow-up could lead to biased estimates of inequalities is shown in Figure 1B, where both SEP and Y are causes of R. This could occur when the outcome alters an individual’s likelihood of being lost to follow-up, for example, if illness makes participation more difficult, or conversely confers interest in their health and increases participation (conditions that are represented by C in Figure 1B). Unfortunately, if C is not measured, techniques for dealing with missing data, such as multiple imputation or inverse probability weighting, cannot remove the bias in this situation because the outcome is missing not-at-random and because measured variables cannot adequately predict the outcome or the missingness mechanism.^{21} If, however, data on C in Figure 1B (eg, illness making participation more difficult due to being bed-bound) were available, this could be included in multivariate multiple imputation models, which should remove the bias. Sensitivity analyses can be performed, for example, by simulating possible missingness mechanisms and exploring the extent to which different mechanisms/assumptions affect conclusions. Although this mechanism (ie, where both SEP and outcome are related to loss to follow-up) may be important in some cases, we will not conduct further analysis to explore this; instead, we focus on other situations where analysis techniques can address the bias.
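The contrast between Figure 1A and Figure 1B can be illustrated with a small simulation. The sketch below is not the analysis used in this paper: the effect size, the response probabilities, and the threshold rule for outcome-dependent dropout are all hypothetical values chosen for illustration, and the outcome is in arbitrary standardized units.

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

rng = random.Random(1)          # fixed seed so the sketch is reproducible
sep, y, r_a, r_b = [], [], [], []
for _ in range(20000):
    s = rng.random()                      # SEP rank on a 0-1 scale
    yi = 1.0 * s + rng.gauss(0, 1)        # hypothetical true inequality of 1.0
    # Figure 1A: response depends on SEP only
    p_a = 0.2 + 0.6 * s
    # Figure 1B: response depends on SEP and on the outcome itself
    p_b = 0.15 + 0.3 * s + (0.5 if yi > 0.5 else 0.0)
    sep.append(s)
    y.append(yi)
    r_a.append(rng.random() < p_a)
    r_b.append(rng.random() < p_b)

def restricted_slope(response):
    """Complete-case slope, using only participants with R = 1."""
    xs = [s for s, r in zip(sep, response) if r]
    ys = [v for v, r in zip(y, response) if r]
    return ols_slope(xs, ys)

slope_full = ols_slope(sep, y)
slope_a = restricted_slope(r_a)   # expected to stay close to slope_full
slope_b = restricted_slope(r_b)   # expected to be attenuated toward zero
print(f"full={slope_full:.2f}  1A={slope_a:.2f}  1B={slope_b:.2f}")
```

Under these assumed parameters, the complete-case estimate stays close to the full-cohort estimate when only SEP drives response, and is pulled toward the null when the outcome also drives response. Varying the response rule in `p_b` is one simple form of the sensitivity analysis described above.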

In Figure 1C, C is a mediator of the SEP–Y association and is also associated with R. For example, C could represent measured or unmeasured behavioral factors, which are causally linked to SEP. In this situation, we would not want to control for C because we are interested in the total effect of SEP on Y, including the effect mediated by C. All SEP–outcome associations have mediators, but the potential bias comes when those mediators are also related to R. Restricting to R = 1, that is, participants not lost to follow-up, affects the path from SEP to C and on to Y; if C is not included in the model, R and Y are associated, resulting in biased estimates of the SEP–Y association. Multiple imputation could be useful in addressing the bias because C (if measured) could be included in the imputation equations but not in the model of interest. Inverse probability weighting is another approach that could be used in this situation, but we focus on multiple imputation because it is more widely used. We explore this situation by taking birth weight (one of our set of outcomes) as an example outcome. One mediator of the SEP–birth weight association is maternal smoking during pregnancy.^{23}^{,}^{24} We assess the extent to which this mechanism is plausible and whether including maternal smoking during pregnancy and other possible mediators in imputation models alleviate any bias in the estimates of birth-weight inequalities.
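The logic of Figure 1C can be sketched with simulated data. This is a simplified illustration, not the Stata imputation procedure used in this paper: all parameter values (smoking prevalence, birth-weight effects, attendance probabilities) are hypothetical, and the imputation shown is repeated stochastic regression imputation, which ignores the parameter uncertainty that a full multiple imputation procedure would propagate.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

rng = random.Random(2)
rows = []
for _ in range(30000):
    sep = rng.random()                                   # SEP rank, 0-1
    smoke = int(rng.random() < 0.5 - 0.4 * sep)          # mediator C
    bw = 3400 + 40 * sep - 200 * smoke + rng.gauss(0, 200)
    attend = rng.random() < 0.3 + 0.4 * sep - 0.25 * smoke   # R depends on SEP and C
    rows.append((sep, smoke, bw, attend))

# Total effect of SEP in the full cohort (smoking is NOT in the analysis model)
full_slope = ols([r[0] for r in rows], [r[2] for r in rows])[1]

# Complete-case analysis: restrict to R = 1
cc_slope = ols([r[0] for r in rows if r[3]], [r[2] for r in rows if r[3]])[1]

# Imputation model: birth weight on SEP, fitted separately by smoking status,
# using attenders only
models = {}
for c in (0, 1):
    xs = [s for s, k, _, a in rows if a and k == c]
    ys = [b for s, k, b, a in rows if a and k == c]
    a0, b0 = ols(xs, ys)
    rss = sum((v - (a0 + b0 * u)) ** 2 for u, v in zip(xs, ys))
    models[c] = (a0, b0, (rss / (len(xs) - 2)) ** 0.5)

def imputed_slope():
    """Impute birth weight for non-attenders, then fit the SEP-only model."""
    xs, ys = [], []
    for sep, smoke, bw, attend in rows:
        if not attend:
            a0, b0, sd = models[smoke]
            bw = a0 + b0 * sep + rng.gauss(0, sd)
        xs.append(sep)
        ys.append(bw)
    return ols(xs, ys)[1]

mi_slope = sum(imputed_slope() for _ in range(5)) / 5   # average over 5 imputations
print(f"full={full_slope:.0f}  complete-case={cc_slope:.0f}  imputed={mi_slope:.0f}")
```

The key design point is that smoking enters only the imputation step. The analysis model regresses birth weight on SEP alone, so the estimate still represents the total effect, including the part mediated by smoking.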

In Figure 1D, C is a confounder of the SEP–Y association and also a cause of R. Arguably, the main potential confounders of SEP–Y associations in children are race/ethnicity and sex, but other confounders could be identified for studies in adulthood (eg, childhood SEP) and in more general cases. Here, the SEP–Y association is already confounded by C, and so analysis of the full cohort will be biased (unless C is included in the model). However, restricting to R = 1 affects the path from SEP to C to Y, and so could worsen bias. To remedy this, we would want to include C (if measured/measurable) in the analysis model or in imputation equations. In our analyses, we are analyzing outcomes in children, and SEP is based on parental measures; the child’s sex cannot be a confounder in this case because it could not influence mother’s SEP (her educational attainment), as that is defined and measured before the child’s birth. The vast majority of the ALSPAC participants (95%) are white; we, therefore, do not have sufficient power to explore confounding by ethnicity in ALSPAC. Thus, we do not explore this mechanism further.

SEP could be considered a “latent” (unobservable) construct, which can be assessed by numerous indicators—educational attainment, income, social class, housing tenure, and so on. Each of these indicators captures part of the latent construct, but it is not possible to measure SEP in its entirety. Epidemiologists often report inequalities using a single SEP indicator (because of convention, the lack of availability of multiple measures, or a desire to explore whether several individual indicators of SEP have different associations with an outcome). Figure 1E represents this way of conceptualizing SEP; here “true” or latent SEP is approximated by the observed SEP variable (ie, maternal education). True SEP is likely to be related to R. Conditional on true SEP, observed SEP is related to neither R nor Y, but because true SEP is unobserved, observed SEP is related to both R and Y. This would bias the association of observed SEP (maternal education) with Y.
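A simplified linear analogue may help show why this mechanism attenuates the estimate. The derivation below is only a sketch under assumptions that are not part of the analysis in this paper: true SEP (T) is treated as continuous, the observed indicator (X) equals T plus independent error u, the outcome depends linearly on T, and response R depends on T alone. All symbols are introduced here for illustration.

```latex
% Assumed linear analogue of Figure 1E (all symbols are illustrative):
%   T = true (latent) SEP, X = observed indicator, Y = outcome, R = response
\begin{align*}
Y &= \alpha + \beta T + \varepsilon, \qquad X = T + u, \qquad
     u,\ \varepsilon \perp (T, R), \quad u \perp \varepsilon \\[4pt]
\beta_{\text{full}} &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
     = \beta \, \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \sigma_u^2} \\[4pt]
\beta_{R=1} &= \frac{\operatorname{Cov}(X, Y \mid R = 1)}{\operatorname{Var}(X \mid R = 1)}
     = \beta \, \frac{\operatorname{Var}(T \mid R = 1)}{\operatorname{Var}(T \mid R = 1) + \sigma_u^2}
\end{align*}
```

Because the ratio v / (v + σ_u²) increases with v, any selection on T that narrows the spread of true SEP among continuing participants, so that Var(T | R = 1) < Var(T), moves the estimate based on the observed indicator closer to the null than the full-cohort estimate.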

Solutions to the situation depicted in Figure 1E depend on the goals of the analysis. Adjusting for other manifestations of true SEP would reduce the bias, but the results would then be difficult to interpret because they would represent the association of the SEP indicator with the outcome, conditional on all other SEP indicators adjusted for. If the aim is to quantify the association between a specific SEP indicator and the outcome (because different SEP indicators often show differing associations with outcomes),^{25}^{,}^{26} other SEP indicators could be included in multiple imputation models. Alternatively, if the main interest lies in quantifying the association between true SEP and the outcome, additional manifestations of true SEP would need to be identified (if possible).

To explore the situation depicted in Figure 1E, we take maternal education as the SEP indicator (because this is very often used in studies of inequalities in child outcomes). We assume it is being used as a proxy for true SEP and explore whether this mechanism may result in selection bias. We then examine whether using a multidimensional construct of SEP, which may be a better proxy of true SEP than a single indicator alone, alleviates any selection bias.

METHODS

ALSPAC is a prospective birth cohort in southwest England (full details in the eAppendix, https://links.lww.com/EDE/A623).^{27} The first stage in our analyses was to calculate the inequality in a range of outcomes for which we have data on all, or almost all, of the cohort. Methods used to assess these outcomes (birth length and weight, preterm birth, breastfeeding, maternal obesity, maternal smoking during pregnancy, child’s educational attainment at age 11 and 14 years) and the measure of SEP (maternal education) are detailed in the eAppendix (https://links.lww.com/EDE/A623).

Inequalities in each outcome were quantified using the slope index of inequality (SII; for continuous outcomes) or relative index of inequality (RII; for binary outcomes). A variable is created assigning each category of maternal education a value according to the proportion of mothers with a lower education, assuming that mothers follow an underlying continuous SEP distribution. For example, if the highest maternal education category contained 10% of mothers, those in this category are assumed to have SEP ranks between 0.9 and 1, giving a mean of 0.95 (ie, 95% of individuals have a lower maternal education than the mid-point of this category). This variable is then treated as a continuous variable in regression models, such that, for continuous outcomes, linear regression analysis estimates the SII, that is, the mean difference between the highest maternal education (score of 1 on this new variable) and the lowest maternal education (score of 0); for binary outcomes, logistic regression analysis estimates the RII, that is, the odds ratio comparing highest to lowest maternal education.^{28}^{,}^{29} We defined the variable separately for the full cohort at baseline and for participants not lost to follow-up at age 10 and 15 years. Our main analysis calculates the SIIs and RIIs using these separately defined variables; sensitivity analysis was conducted to assess whether the results were the same if the variable based on the proportions of participants in each maternal education category for the full cohort at baseline was used for all analyses. In further sensitivity analyses, we repeated the analysis with maternal education coded as 1 to 4, with 1 representing the lowest maternal educational category and 4 the highest.
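The construction of the rank variable and the SII can be sketched as follows. The category labels, the category sizes, and the birth weights are hypothetical toy values, and the helper names are introduced here for illustration only; the birth weights are generated with a known slope of 100 g so the result can be checked.

```python
from collections import Counter

def rank_scores(categories, order):
    """Assign each ordered category the midpoint of its cumulative-proportion range."""
    counts = Counter(categories)
    n = len(categories)
    scores, below = {}, 0.0
    for cat in order:                      # order runs from lowest to highest education
        share = counts[cat] / n
        scores[cat] = below + share / 2    # proportion below plus half of own share
        below += share
    return scores

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

# Hypothetical cohort: four education categories, the highest holding 10% of mothers
order = ["lowest", "lower-middle", "upper-middle", "highest"]
education = (["lowest"] * 30 + ["lower-middle"] * 40
             + ["upper-middle"] * 20 + ["highest"] * 10)
scores = rank_scores(education, order)    # "highest" maps to 0.95, as in the text

# Toy outcome built with a known slope of 100 g across the 0-1 rank scale
rank = [scores[c] for c in education]
birth_weight = [3300 + 100 * s for s in rank]

sii = ols_slope(rank, birth_weight)       # slope index of inequality, in grams
print(scores, round(sii, 1))
```

For a binary outcome, the same rank variable would enter a logistic regression, and the exponentiated coefficient would give the RII. Recomputing `scores` within a restricted subsample corresponds to defining the variable separately for continuing participants.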

We describe the associations between participation at the 10-year and 15-year clinics (ie, loss to follow-up) and maternal education for each outcome. To assess whether loss to follow-up results in bias in the estimates of inequality in these outcomes, we recalculated the SIIs and RIIs restricting analysis to children who attended the research clinic at age 10 years and those who attended the research clinic at age 15 years.

Mediators of the SEP–Birth Weight Association

We used path analysis (model details in the eAppendix, https://links.lww.com/EDE/A623) in Mplus^{30} to describe associations between maternal education, maternal smoking during pregnancy (as 1 example mediator of the SEP–birth weight association), birth weight, and participation at the 15-year clinic (as in Figure 1C). The main interest of this path analysis lies in estimating whether there is an association between maternal smoking during pregnancy and participation at the 15-year clinic. We conducted multiple imputation in Stata^{31} (details in the eAppendix, https://links.lww.com/EDE/A623) to impute birth weight for those who did not attend the 15-year clinics, including multiple potential mediators of the SEP–birth weight association in the imputation equation (maternal smoking during pregnancy, preterm birth, maternal age, natural log of maternal prepregnancy body mass index [BMI]) as well as maternal education. We then analyzed the imputed data to assess whether this reduced bias in the estimates of the SEP–birth weight association.

SEP as a Latent Construct

To assess whether loss to follow-up among strata of maternal education was differential with respect to other SEP measures, we explored the association between loss to follow-up (nonparticipation at the 10- and 15-year research clinics) and family income and occupational social class (measurement detailed in the eAppendix, https://links.lww.com/EDE/A623) within strata of maternal education (as displayed in Figure 1E).

To investigate whether bias in the estimation of inequalities is aggravated by the latent construct of SEP being associated with R, we then constructed a multidimensional SEP indicator using factor analysis of 15 SEP indicators (details of indicators and methods for generating the multidimensional indicator are presented in the eAppendix, https://links.lww.com/EDE/A623). We assessed the association between this multidimensional SEP construct and loss to follow-up. Subsequently, for each of the (almost) completely observed outcomes utilized in the previous analyses, we calculated the SII or RII using the multidimensional SEP indicator for the full cohort and restricting analyses to those participants who attended the 10- and 15-year clinics.

RESULTS

Of the 12,493 children with data on maternal education, 7,045 (56%) and 5,075 (41%) attended the clinics at 10 and 15 years, respectively. The continuing participants tend to be of higher SEP than those lost to follow-up (eTable 2, https://links.lww.com/EDE/A623). Loss to follow-up was associated with all of our example outcomes (eTable 3, https://links.lww.com/EDE/A623).

Higher maternal education is associated with all eight outcomes (Table 1). For five of these eight outcomes (birth weight, birth length, educational attainment at 11 and 14 years, and preterm delivery), there is a clear pattern by which the estimated socioeconomic inequality attenuates toward the null as the sample becomes more restricted; that is, the estimate of the RII or SII is closer to the null value when the analysis is restricted to participants who continued to attend at age 10 compared with the estimate from the full cohort, and even closer to the null when analysis is restricted to those who continued to attend at age 15 (Table 1). Attenuation of the observed socioeconomic inequality is additionally observed for maternal smoking during pregnancy and for never breastfeeding, when comparing results from the full cohort and results restricting to participants who continued to attend at age 10; however, for these outcomes, the estimates of inequalities did not attenuate further in participants who continued to attend at age 15. Attenuation of the inequality in maternal obesity is less evident. When the binary outcomes were analyzed on the risk difference scale, attenuation of the estimated inequality was more apparent for maternal smoking during pregnancy, maternal obesity, and never breastfeeding (Table 2). The differences between odds ratios and risk differences arose because the absolute risks were lower for continuing participants across all strata of maternal education (eTable 4, https://links.lww.com/EDE/A623).
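The divergence between the two scales follows from simple arithmetic: a fixed odds ratio implies a smaller absolute risk difference when the baseline risk is lower. The sketch below uses hypothetical risks and a hypothetical odds ratio, not values from Table 1, Table 2, or eTable 4.

```python
def risk_difference(baseline_risk, odds_ratio):
    """Absolute risk difference implied by an odds ratio at a given baseline risk."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    exposed_odds = baseline_odds * odds_ratio
    exposed_risk = exposed_odds / (1 + exposed_odds)
    return exposed_risk - baseline_risk

# Hypothetical example: risk of smoking in the lowest education group is 40% in
# the full cohort but 30% among continuing participants; the odds ratio
# comparing highest with lowest education is held at 0.25 in both samples.
odds_ratio = 0.25
rd_full = risk_difference(0.40, odds_ratio)         # about -0.257
rd_continuing = risk_difference(0.30, odds_ratio)   # about -0.203

print(f"full cohort RD = {rd_full:.3f}, continuing participants RD = {rd_continuing:.3f}")
```

In this example the relative inequality is identical in the two samples, yet the absolute inequality is smaller among continuing participants, which is the pattern one would expect when absolute risks are lower across all strata.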

Table 1. Estimates of Socioeconomic Inequalities in Outcomes with (Almost) Complete Data Among the Full Cohort, Participants Who Continue to Participate at Age 10 Years, and Participants Who Continue to Participate at Age 15 Years^{a}

Table 2. Estimates of Socioeconomic Inequalities in Outcomes with (Almost) Complete Data Among the Full Cohort, Participants Who Continue to Participate at Age 10 Years, and Participants Who Continue to Participate at Age 15 Years. Analysis of Binary Outcomes on the Absolute Risk Difference Scale^{a}

For the majority of the outcomes, interaction tests demonstrated little or no statistical evidence that the association between SEP and the outcome differs between those who continued to participate and those who were lost to follow-up (Table 3). However, there is an indication that the degree of difference between the inequality among participants and nonparticipants increases as the proportion of the original cohort who remain engaged reduces; the number of outcomes for which there is evidence of a difference between participants and nonparticipants is greater when analysis is restricted to continuing participants at age 15 compared with when it is restricted to participants at age 10 (three outcomes for participants at age 15 compared with one at age 10 have P < 0.05, and five outcomes compared with one have P < 0.15; Table 3). In almost all cases, the qualitative conclusions that would be reached from the analysis (ie, direction and approximate magnitude of inequality) do not differ between the full cohort and the analysis based only on participants not lost to follow-up.

Table 3. Differences in Socioeconomic Inequalities in Outcomes with (Almost) Complete Data Comparing Participants Who Are Lost to Follow-up and Those Who Continue to Participate at Ages 10 and 15 Years

Sensitivity analysis confirmed that this pattern of attenuating estimates of socioeconomic inequalities was observed regardless of how maternal education was defined and treated in the analysis (eTable 5, https://links.lww.com/EDE/A623).

Situation C: Mediators of the SEP–Birth Weight Association

Figure 2 demonstrates that smoking during pregnancy is a mediator of the maternal education–birth weight association, and it is also associated with nonparticipation in the 15-year clinic.

Figure 2. Mediation of the association between maternal education and birth weight: one possible mechanism through which loss to follow-up could affect estimates of inequalities. Coefficients are regression coefficients (robust standard errors) from path analysis. Maternal education is a rank variable, that is, the proportion of individuals in the sample with a lower level of maternal education, and is treated as a continuous variable such that 0 is the lowest maternal education (on the latent continuous scale that this method assumes is underlying the categorical variable) and 1 is the highest maternal education. Maternal smoking in pregnancy is coded as 0 for none and 1 for any. Attendance at 15-year clinic is coded 0 for did not attend and 1 for did attend. Birth weight is standardized to have a mean of zero and variance of 1. Each arrow in Figure 2 represents a linear regression analysis; we mapped the binary indicators (maternal smoking in pregnancy and attendance at the 15-year clinic) to a standardized normal distribution, and as such the coefficients for these variables represent mean differences between the category coded 1 and the category coded 0. For example, for the association between maternal smoking during pregnancy and attendance at the 15-year clinic, a coefficient of –0.133 is interpreted as follows: the proportion of participants who attended the 15-year clinic was 13.3% lower among those whose mothers smoked during pregnancy compared with those whose mothers did not smoke during pregnancy. All coefficients had P values ≤ 0.01.

Using multiple imputation with multiple potential mediators of the SEP–birth weight association (maternal smoking during pregnancy, parity, preterm birth, maternal age, and log of maternal prepregnancy BMI) included in the imputation model generated an estimate of the SEP–birth weight association much closer to the estimate in the full sample compared with the estimate in the restricted sample defined by continued participation at age 15; in the full sample, SII = 116 (95% confidence interval = 80–153); in participants attending the 15-year clinic, 58 (44–112); and in the full cohort, using multivariate multiple imputation to impute birth weight for participants not attending the 15-year clinic, 107 (57–155).

Situation E: SEP as a Latent Construct

Table 4 shows that within each category of maternal education, those who drop out from the cohort tend to have a lower income and are more likely to have a manual occupation. Thus, if we are losing the lower SEP individuals within each category of maternal education, this would contribute to the underestimation of inequalities we observed as loss to follow-up increases (Figure 1E).

Table 4. Family Income and Occupational Social Class Across Strata of Maternal Education,^{a} Comparing Participants Who Are Lost to Follow-up with Those Who Continue to Participate at Ages 10 and 15 Years

When we repeated the analysis above looking at inequalities in the (almost) fully observed outcomes using a multidimensional SEP indicator (eAppendix and eTables 6 and 7, https://links.lww.com/EDE/A623) instead of maternal education, there was no longer a trend of the estimates of inequality attenuating as the proportion of the cohort lost to follow-up increased (Table 5).

Table 5. Socioeconomic Inequalities in (Almost) Completely Observed Outcomes Using a Multidimensional SEP Construct and Comparing Participants Lost to Follow-up with Those Who Remain Engaged in the Cohort

DISCUSSION

By comparing estimates of socioeconomic inequalities in (almost) completely observed outcomes between the full cohort and subsamples of the cohort defined by continued participation, we have shown that loss to follow-up from cohort studies can result in underestimation of inequalities for a large number of outcomes. The differences between estimates of inequalities comparing the full and restricted cohort tended to be small, and the qualitative conclusions did not change even when more than half of the cohort was lost to follow-up. Consistent with previous studies,^{13}^{,}^{16–20} there was only weak statistical evidence that these differences between analysis of the full cohort and analysis restricted to participants not lost to follow-up differed from the null (although tests for interaction have low power). However, there was some indication that the degree of underestimation in the inequalities may increase as the proportion of original participants who remain engaged reduces, which most previous studies have been unable to explore because they focused on a single time-point after recruitment. This worsening bias has implications for maintaining reasonable levels of participation in cohorts. Although we cannot necessarily extrapolate our findings to other cohorts, similar socioeconomic patterning of loss to follow-up has been seen in many other studies.^{1–8} Furthermore, one aim of our study was to provide some guidance in ways researchers might deal with and interpret loss to follow-up in their own data.

Most of the outcomes we have included in our analysis were measured at or around the time of birth, before loss to follow-up occurred. The more usual concern is whether there is bias in exposure-outcome associations when the outcome was measured at or after the time of loss to follow-up, but by definition, these data are not available for all study participants. Our approach was intended to be explorative and illustrative, and can hopefully be useful for more general inferences about possible selection bias in outcomes measured after loss to follow-up has occurred. Linkage to additional routine data sources such as healthcare records may facilitate the validation of estimates of inequalities in later outcomes if data are available for continuing participants and also those lost to follow-up. Such data linkage could potentially also allow examination of biases due to nonparticipation at the original point of recruitment.

We have considered a set of possible mechanisms through which loss to follow-up could result in biased estimates of socioeconomic inequalities. These mechanisms were not intended to be an exhaustive list. Associations between loss to follow-up and mediators of SEP–outcome associations appear to be one mechanism driving the bias in estimates of inequality in our example. Our results demonstrate that multiple imputation can be useful in correcting this bias when these mediating factors can be included in the imputation models (though they should not be included in the subsequent model estimating the association of SEP with outcome).

Our results also indicate that the use of a single SEP indicator, which is one manifestation of a complex latent construct, may have contributed to the underestimation of inequalities in our data. Within strata of maternal education, those who were lost to follow-up had lower income and occupational social class. When we used a multidimensional SEP construct (a closer approximation to latent SEP), we did not see attenuation of estimated inequalities as the proportion of the cohort lost to follow-up increased. These findings illustrate the importance of measuring a wide range of aspects of SEP in cohort studies to fully capture the construct, both in situations where the research interest is primarily in inequalities and also where confounding by SEP is likely to be important. While a multidimensional SEP construct may result in less selection bias, there are disadvantages to such a measure. Single SEP indicators may have a clearer interpretation and perhaps policy implications. Different measures may show differing associations with outcomes, which can be useful for untangling the causal processes underlying inequalities^{25}^{,}^{26}; these differences would be masked by a multidimensional measure. If this is a concern, multiple measures of SEP could be included in multiple imputation models to permit an unbiased estimate of the associations between a single SEP indicator and the outcome.

The most likely explanation for the bias related to loss to follow-up in our example is perhaps a combination of the mechanisms illustrated in Figure 1. In our example, we were able to use analysis strategies to cope with bias due to both mediators of the SEP–outcome association and the complex multidimensional construct of SEP. In many studies, a lack of measured variables will limit the ability to adjust for these biases. In the absence of measured variables, simulation studies to explore the sensitivity of estimates of inequality to assumptions about missing-data mechanisms may be the only option. Depending on the study design and missingness mechanism, other approaches to deal with selection bias may also be appropriate.^{8}^{,}^{32}^{,}^{33} When the interest lies in estimating trajectories of an outcome over time, multilevel models or inverse probability weighting can be used to include persons with one or more measures under a missing-at-random assumption, including in situations where the observed value of an outcome at one time point is predictive of loss to follow-up at later time points.^{33}^{,}^{34} These methods will therefore give unbiased estimates of longitudinal trajectories of an outcome provided that the value of the outcome at the later time point itself is not the reason for loss to follow-up.

Where there is a relationship between the outcome and loss to follow-up (Figure 1B), sensitivity analysis may shed light on the extent of the bias.
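One simple form such a sensitivity analysis could take is a delta adjustment, sketched below. It assumes that, within each SEP group, the mean outcome among those lost to follow-up differs from the mean among those retained by a fixed amount `delta`. The function name and all input values are hypothetical and are not taken from our data.

```python
def adjusted_inequality(obs_mean_low, obs_mean_high,
                        p_lost_low, p_lost_high, delta):
    """Mean difference (high minus low SEP) after delta adjustment.

    Assumes the lost participants in each group have a mean outcome equal
    to the observed (retained) mean plus delta, so each group mean becomes
    observed mean + proportion lost * delta.
    """
    mean_low = obs_mean_low + p_lost_low * delta
    mean_high = obs_mean_high + p_lost_high * delta
    return mean_high - mean_low


if __name__ == "__main__":
    # Hypothetical inputs: observed group means of a standardized outcome,
    # with greater loss to follow-up in the low-SEP group.
    for delta in (-0.5, -0.25, 0.0, 0.25, 0.5):
        diff = adjusted_inequality(0.0, 1.0, 0.6, 0.3, delta)
        print(f"delta={delta:+.2f}  adjusted difference={diff:.3f}")
```

Under this assumption the adjusted difference equals the observed difference plus `delta` multiplied by the difference in the proportions lost, so the estimate is sensitive to `delta` only to the extent that loss to follow-up differs between SEP groups.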

Our analyses of educational attainment data are limited by the fact that these data are available only for state-school educated pupils; the national tests are not compulsory for the 11% of ALSPAC participants who attended private schools.

In summary, we have shown that loss to follow-up in cohort studies may lead to underestimation of socioeconomic inequalities in a wide range of outcomes. Although the extent of this bias was small (consistent with previous studies),^{13}^{,}^{16–20} there is some evidence that the bias increases as the extent of loss to follow-up increases. However, even when more than half of the cohort were lost to follow-up, qualitative conclusions about the direction and approximate magnitude of inequalities did not, in most of our examples, alter dramatically. We have also explored possible mechanisms underlying the bias and analysis approaches to remove this bias.

ACKNOWLEDGMENTS

We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses.

REFERENCES

1. Strandhagen E, Berg C, Lissner L, et al. Selection bias in a population survey with registry linkage: potential effect on socioeconomic gradient in cardiovascular risk. Eur J Epidemiol. 2010;25:163–172

2. Goldberg M, Chastang JF, Leclerc A, et al. Socioeconomic, demographic, occupational, and health factors associated with participation in a long-term epidemiologic survey: a prospective study of the French GAZEL cohort and its target population. Am J Epidemiol. 2001;154:373–384

3. Wilhelmsen L, Ljungberg S, Wedel H, Werkö L. A comparison between participants and non-participants in a primary preventive trial. J Chronic Dis. 1976;29:331–339

5. Strandberg TE, Salomaa VV, Vanhanen HT, Naukkarinen VA, Sarna SJ, Miettinen TA. Mortality in participants and non-participants of a multifactorial prevention study of cardiovascular diseases: a 28 year follow up of the Helsinki Businessmen Study. Br Heart J. 1995;74:449–454

6. Barchielli A, Balzi D. Nine-year follow-up of a survey on smoking habits in Florence (Italy): higher mortality among non-responders. Int J Epidemiol. 2002;31:1038–1042

7. Knudsen AK, Hotopf M, Skogen JC, Overland S, Mykletun A. The health status of nonparticipants in a population-based health study: the Hordaland Health Study. Am J Epidemiol. 2010;172:1306–1314

8. Reilly JJ, Kelly J. Long-term impact of overweight and obesity in childhood and adolescence on morbidity and premature mortality in adulthood: systematic review. Int J Obes (Lond). 2011;35:891–898

9. Pizzi C, De Stavola B, Merletti F, et al. Sample selection and validity of exposure-disease association estimates in cohort studies. J Epidemiol Community Health. 2011;65:407–411

10. Heilbrun LK, Nomura A, Stemmermann GN. The effects of non-response in a prospective study of cancer: 15-year follow-up. Int J Epidemiol. 1991;20:328–338

12. Wolke D, Waylen A, Samara M, et al. Selective drop-out in longitudinal studies and non-biased prediction of behaviour disorders. Br J Psychiatry. 2009;195:249–256

13. Van Loon AJ, Tijhuis M, Picavet HS, Surtees PG, Ormel J. Survey non-response in the Netherlands: effects on prevalence estimates and associations. Ann Epidemiol. 2003;13:105–110

14. Bjertness E, Sagatun A, Green K, Lien L, Søgaard AJ, Selmer R. Response rates and selection problems, with emphasis on mental health variables and DNA sampling, in large population-based, cross-sectional and longitudinal studies of adolescents in Norway. BMC Public Health. 2010;10:602

16. Søgaard AJ, Selmer R, Bjertness E, Thelle D. The Oslo Health Study: The impact of self-selection in a large, population-based survey. Int J Equity Health. 2004;3:3

17. Carter K, Gunasekara FI, McKenzie S, Blakely T. Differential loss of participants does not necessarily cause selection bias. J Epidemiol Community Health. 2011;65:A180

18. Harald K, Salomaa V, Jousilahti P, Koskinen S, Vartiainen E. Non-participation and mortality in different socioeconomic groups: the FINRISK population surveys in 1972-92. J Epidemiol Community Health. 2007;61:449–454

19. Martikainen P, Laaksonen M, Piha K, Lallukka T. Does survey non-response bias the association between occupational social class and health? Scand J Public Health. 2007;35:212–215

20. Ferrie JE, Kivimäki M, Singh-Manoux A, et al. Non-response to baseline, non-response to follow-up and mortality in the Whitehall II cohort. Int J Epidemiol. 2009;38:831–837

21. Daniel RM, Kenward MG, Cousens SN, De Stavola BL. Using causal diagrams to guide analysis in missing data problems. Stat Methods Med Res. 2012;21:243–256

22. White IR, Carlin JB. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Stat Med. 2010;29:2920–2931

24. Leary SD, Smith GD, Rogers IS, Reilly JJ, Wells JC, Ness AR. Smoking during pregnancy and offspring fat and lean mass in childhood. Obesity (Silver Spring). 2006;14:2284–2293

25. Galobardes B, Shaw M, Lawlor DA, Lynch JW, Davey Smith G. Indicators of socioeconomic position (part 1). J Epidemiol Community Health. 2006;60:7–12

26. Galobardes B, Shaw M, Lawlor DA, Lynch JW, Davey Smith G. Indicators of socioeconomic position (part 2). J Epidemiol Community Health. 2006;60:95–101

27. Golding J, Pembrey M, Jones R; ALSPAC Study Team. ALSPAC–the Avon Longitudinal Study of Parents and Children. I. Study methodology. Paediatr Perinat Epidemiol. 2001;15:74–87

32. Reilly JJ, Bonataki M, Leary SD, et al. Progression from childhood overweight to adolescent obesity in a large contemporary cohort. Int J Pediatr Obes. 2011;6:e138–e143