In epidemiologic and clinical cohorts and in randomized trials, study participants frequently exit the study before the maximum follow-up time without having had the event of interest; such participants contribute right-censored observations. A major contributor to right-censoring is loss-to-follow-up (LTFU). One beneficial property of common time-to-event methods (e.g., Cox proportional hazards models, parametric survival models, nonparametric survival functions)1–4 is their ability to account for right-censoring under the assumption that censoring is not informative, perhaps conditional on covariates.
While common survival methods allow analysis of right-censored observations, defining the time at which persons who are LTFU should be right-censored has not, to our knowledge, been adequately explored. Perhaps the fundamental challenge with assigning a specific censoring date is that LTFU is essentially a nonevent. For example, if a study participant is expected to return for regular, scheduled visits and she misses several consecutive visits, at what point should we classify her as LTFU because we are missing too much information about her time-varying covariates and, importantly, her outcome? When there is no standard visit schedule and participants can be lost without actively missing a scheduled visit, defining a censoring date is even more challenging.5 Once we have settled on a definition of LTFU (for example, missing two consecutive study visits or going 12 months without a clinical encounter), we face a second fundamental problem in assigning LTFU a specific censoring date: should the censoring date be the date of the last attended study visit or the date when the definition of LTFU was met (e.g., the anniversary of her second missed study visit)? The answer to the first question (how long to allow participants to be unobserved before we classify them as LTFU) will depend on the study question.
In a previous study, we determined that the answer to the second question (how to link LTFU date with censoring date) depends on the type of outcome that is being ascertained.6 We defined outcomes as measured or captured. Measured events are only ascertained or ascertainable at study visits under the purview of the parent study (e.g., the level of a biomarker, a survey response, or a self-reported diagnosis). Captured events are ascertainable through some mechanism not dependent on an encounter between the participant and the parent study (e.g., death, cancer diagnosis reported to a registry, or hospital admission). Whether we call an event measured or captured depends on the setting and data collection procedures of the parent study. For example, AIDS diagnosis may be a measured or captured event depending on the context. An AIDS diagnosis is likely to prompt a clinic visit; however, the ability of investigators to observe that event depends on whether the individual returns to a clinic that contributes data to the cohort (in which case we would call it a measured event), or whether there is some sort of AIDS registry that enumerates all AIDS diagnoses in a sufficiently large geographical area and could be linked to the study data (in which case we would call it a captured event). Despite potential difficulties in defining an outcome as measured or captured, our previous study showed that this distinction is important: if the outcome was measured, follow-up time should be censored at the last study encounter, whereas if the outcome was captured, follow-up time should be censored when the definition of LTFU is met.6
Given these results, how should we censor person-time in the presence of LTFU when the outcome is a mix of both captured and measured outcomes, specifically, either a composite event (e.g., AIDS or death) or a pair of competing events (e.g., time to initiation of antiretroviral therapy [ART] where death is a competing event)? Such composite events are common in both observational studies and randomized clinical trials; in trials, composite events may be used to increase the number of expected events and thus increase the power to detect a difference in the probability of the outcome across treatment arms. Building on our previous study, we examine the amount of bias introduced if we ignore the need for two different censoring strategies in such circumstances. We explore bias under varying rates of the two different event types and of LTFU. Furthermore, we propose a method to minimize bias when estimating the risk of a composite event composed of different event types.
Figure 1 illustrates some of the points in this section and some of the problems that LTFU can create in the analysis of time-to-event data; relevant calculations appear in Table 1. Throughout, we assume that censoring is not informative.
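A survival function for a composite event can be generated using the Kaplan–Meier estimator:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

or Nelson–Aalen estimator:

$$\hat{S}(t) = \exp\{-\hat{H}(t)\}, \qquad \hat{H}(t) = \sum_{i:\, t_i \le t} \frac{d_i}{n_i},$$

where $\hat{H}(t)$ is the cumulative hazard at time $t$; $t_1 < t_2 < \dots < t_k$ are the ordered event times; $d_i$ is the number of events at ordered event time $t_i$; and $n_i$ is the number at risk at ordered event time $t_i$.4,7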
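To make the estimators concrete, the Kaplan–Meier product of $(1 - d_i/n_i)$ and the Nelson–Aalen survival $\exp\{-\sum d_i/n_i\}$ can be computed from a list of follow-up times and event indicators. The pure-Python sketch below is for illustration only. The function name `km_and_na` is hypothetical, and the code assumes the usual convention that persons censored at an event time are still counted in that time's risk set.

```python
import math
from collections import Counter

def km_and_na(times, events):
    """Kaplan-Meier and Nelson-Aalen survival at each ordered event time t_i.

    times  : follow-up time per person (event time or censoring time)
    events : 1 if the event occurred at that time, 0 if right-censored
    Returns {t_i: (S_KM(t_i), S_NA(t_i))}.
    """
    d = Counter(t for t, e in zip(times, events) if e == 1)  # d_i per event time
    s_km, h_na, out = 1.0, 0.0, {}
    for t in sorted(d):
        # n_i: people whose follow-up has not ended before t_i
        # (those censored exactly at t_i are still counted)
        n = sum(1 for u in times if u >= t)
        s_km *= 1 - d[t] / n          # Kaplan-Meier product term
        h_na += d[t] / n              # Nelson-Aalen cumulative hazard increment
        out[t] = (s_km, math.exp(-h_na))
    return out

# Toy data: 5 people, 3 events, 2 censored
print(km_and_na([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```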
In a study of only measured events, individuals should not contribute to $n_i$ after their last visit (last-encounter censoring), because an event that occurs in the interval between the last visit and when the definition of LTFU is met will be unobserved and not counted in $d_i$. See individual 2 in Figure 1 as an example.

In a study of only captured events, individuals should contribute to $n_i$ after their last visit until the definition of LTFU is met (LTFU-definition censoring). An event that occurs in the period from the last visit until the definition of LTFU is met would be counted in $d_i$; thus, individuals should also be counted as at risk in $n_i$. See individual 4 in Figure 1 as an example.
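The two censoring rules differ only in when a person who is LTFU leaves the risk set. The sketch below shows this with a hypothetical helper, `censoring_time`. It assumes time is measured in months and that the LTFU definition is a fixed gap since the last attended visit.

```python
def censoring_time(last_visit, ltfu_gap, scheme):
    """Hypothetical helper: censoring time for a person classified as LTFU.

    last_visit : time of the last attended study visit
    ltfu_gap   : time without a visit that defines LTFU (e.g., 12 months)
    scheme     : "last_encounter" (measured events) or
                 "ltfu_definition" (captured events)
    """
    if scheme == "last_encounter":
        return last_visit              # leaves the risk set at the last visit
    if scheme == "ltfu_definition":
        return last_visit + ltfu_gap   # stays at risk until the LTFU definition is met
    raise ValueError(f"unknown scheme: {scheme}")

# Last visit at month 30; LTFU defined as 12 months without a visit.
print(censoring_time(30, 12, "last_encounter"))   # 30
print(censoring_time(30, 12, "ltfu_definition"))  # 42
```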
In a study of a composite event made up of measured and captured events, last-encounter censoring will yield the best denominator for the measured events but will underestimate the risk sets for captured events. Thus the overall risk of the composite event will be biased upward in expectation, relative to the case where there is no LTFU; the magnitude of this upward bias with last-encounter censoring should increase with the proportion of the composite event made up of captured events.

In contrast, LTFU-definition censoring will yield the best denominator for the captured events (e.g., individual 4 in Figure 1), but the contribution of measured events to the numerator will be insufficient (e.g., individual 2 in Figure 1). Thus the overall risk of the composite event will be biased downward in expectation, relative to the case where there is no LTFU; the magnitude of this downward bias with LTFU-definition censoring should increase with the proportion of the composite event made up of measured events.
There is one additional possible source of bias: analytically, measured and captured events are semicompeting events (a captured event precludes “counting” of a measured event, but a measured event does not always preclude “counting” of a captured event). Consider individuals 5 and 6 in Figure 1: both have a measured event first, followed by a captured event in the next interval. Individual 5 attends all study visits, so his measured event is counted and the subsequent captured event is not. Individual 6 is not classified as LTFU because having a captured event precludes her meeting the LTFU definition. However, because her missed study visits preclude observation of the measured event when it occurred, her event time is shifted to the later time of the captured event. This may have two effects on the risk function: (1) it will be shifted to the right; and (2) if the population is closed on the left (there are no late entries that might replenish risk sets), an event occurring later in time could have a bigger influence on risk estimates (because the denominator will be smaller), such that risk will be biased upward. We anticipate that, in most cases, the magnitude of this bias will be small because it requires persons to have a measured event and then a captured event in the interval between their last visit and when they would have met the definition of LTFU. However, if the measured event is a strong predictor of the captured event (e.g., AIDS diagnosis and AIDS-related death), researchers might consider studying the captured event only, rather than including the measured event as a surrogate outcome in the composite event.
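In the presence of competing events, the cumulative incidence of event $j$ can be calculated by the Aalen–Johansen estimator as

$$\hat{F}_j(t) = \sum_{i:\, t_i \le t} \hat{S}(t_{i-1}) \, \frac{d_{ji}}{n_i},$$

where $\hat{S}(t_{i-1})$ (with $\hat{S}(t_0) = 1$) can be the Kaplan–Meier or the Nelson–Aalen estimator of the overall survival function at time $t_{i-1}$; $d_{ji}$ is the number of events of type $j$ at ordered event time $t_i$; and $n_i$ is the number at risk at ordered event time $t_i$.4,7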
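The Aalen–Johansen calculation works as follows. At each ordered event time $t_i$, it adds the overall survival just before $t_i$ multiplied by the type-$j$ discrete-time hazard $d_{ji}/n_i$. The pure-Python sketch below illustrates this. The function name `aalen_johansen` is hypothetical, Kaplan–Meier is used for the overall survival, and event types are coded as integers with 0 meaning censored.

```python
def aalen_johansen(times, causes, cause):
    """Cumulative incidence of one event type via the Aalen-Johansen estimator.

    times  : follow-up time per person
    causes : 0 = right-censored; 1, 2, ... = type of event observed at that time
    cause  : event type j whose cumulative incidence F_j(t_i) is returned
    Returns {t_i: F_j(t_i)} at every ordered event time (any type).
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    s_prev, cif, out = 1.0, 0.0, {}   # s_prev = overall KM survival S(t_{i-1})
    for t in event_times:
        n = sum(1 for u in times if u >= t)                                 # n_i
        d_all = sum(1 for u, c in zip(times, causes) if u == t and c != 0)  # all types
        d_j = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        cif += s_prev * d_j / n       # S(t_{i-1}) * d_ji / n_i
        s_prev *= 1 - d_all / n       # update overall survival to S(t_i)
        out[t] = cif
    return out

times, causes = [1, 2, 3, 4, 5], [1, 2, 0, 1, 0]
print(aalen_johansen(times, causes, 1), aalen_johansen(times, causes, 2))
```

With these conventions, the type-specific cumulative incidences and the overall Kaplan–Meier survival sum to 1 at every event time. This is a useful sanity check when implementing the estimator.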
In a study of a measured event where there is a competing event that is captured, the most logical censoring scheme is last-encounter censoring. However, there are two potential sources of bias. First, we may miss measured events that occur between a last encounter and a captured event that precludes the person being considered LTFU (and thus we include this person-time without the event); we might expect this to bias risk downward. Second, the overall survival $\hat{S}(t_{i-1})$ is likely to be overestimated (biased upward) by the logic described above. We might therefore expect that the cumulative incidence of the measured event will also be biased upward, because we end up assuming that a larger proportion of the study sample is susceptible to the discrete-time hazard of the measured event, $d_{ji}/n_i$, than the proportion truly at risk. The degree of overestimation increases with the proportion of overall events that are captured/competing events.
By similar logic, in a study of a captured event where there is a competing event that is measured, LTFU-definition censoring is the most logical censoring scheme. However, there remain two potential sources of bias. First, we may overestimate $n_i$ if we allow persons to remain in the risk set after a measured event that was not observed because they did not attend a study visit; we might expect this to bias risk downward. Second, the overall survival $\hat{S}(t_{i-1})$ is likely to be underestimated (biased downward) by LTFU-definition censoring. Thus the cumulative incidence of the captured event may be biased downward, because we assume that a smaller proportion of the study sample is susceptible to the discrete-time hazard of the captured event.
A New, Hybrid Censoring Scheme
For estimating the risk of a composite event made up of two event types, we propose a new, hybrid method that allows for estimation of the survival function with the appropriate censoring strategy for each event type. We leverage the fact that overall survival from a composite event is a function of the cause-specific hazards. More formally,

$$S(t) = \exp\left\{-\int_0^t \sum_{j} \lambda_j(u)\, du\right\} = \exp\left\{-\sum_{j} \Lambda_j(t)\right\},$$
where $\lambda_j(t)$ and $\Lambda_j(t)$ are the cause-specific hazard and cumulative hazard for the $j$th event of the composite event. To estimate $\Lambda_j(t)$, we can simply estimate the cause-specific survival function, $\hat{S}_j(t)$, for each event type separately by a Kaplan–Meier or Nelson–Aalen estimator, treating the other event type as a censoring event rather than a competing event. For persons who are LTFU, we use the censoring strategy most appropriate for event type $j$. This estimate can then be transformed into the cause-specific cumulative hazard, $\hat{\Lambda}_j(t) = -\log \hat{S}_j(t)$.4,7 Then we sum the cumulative hazard functions at time $t$ to obtain the composite cumulative hazard function, $\hat{\Lambda}(t) = \sum_j \hat{\Lambda}_j(t)$, and back-transform it into the survival function for the composite event:

$$\hat{S}(t) = \exp\{-\hat{\Lambda}(t)\}.$$
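The hybrid scheme can be sketched as follows. The sketch assumes each person appears twice. In the measured-event analysis, LTFU is censored at the last encounter and captured events are treated as censoring. In the captured-event analysis, LTFU is censored when the LTFU definition is met and measured events are treated as censoring. The function names and toy data are hypothetical. Kaplan–Meier is used for each cause-specific survival function, and the Nelson–Aalen estimator could be swapped in.

```python
import math

def km_survival(times, events, t):
    """Kaplan-Meier survival at time t (events: 1 = event, 0 = censored)."""
    s = 1.0
    for u in sorted({x for x, e in zip(times, events) if e == 1 and x <= t}):
        n = sum(1 for x in times if x >= u)                            # n_i
        d = sum(1 for x, e in zip(times, events) if x == u and e == 1)  # d_i
        s *= 1 - d / n
    return s

def hybrid_survival(t, measured, captured):
    """Composite-event survival from separately censored cause-specific analyses.

    measured : (times, events) for the measured event; LTFU censored at the
               last encounter, captured events treated as censoring
    captured : (times, events) for the captured event; LTFU censored when the
               LTFU definition is met, measured events treated as censoring
    """
    total_cum_haz = 0.0
    for times, events in (measured, captured):
        s_j = km_survival(times, events, t)
        total_cum_haz += -math.log(s_j)   # Lambda_j(t) = -log S_j(t); needs S_j(t) > 0
    return math.exp(-total_cum_haz)       # S(t) = exp(-sum_j Lambda_j(t))

# Person 1: measured event at month 2. Person 2: captured event at month 3.
# Person 3: LTFU, last visit at month 5, LTFU definition met at month 17.
measured = ([2, 3, 5], [1, 0, 0])   # person 2 censored at captured event; person 3 at last visit
captured = ([2, 3, 17], [0, 1, 0])  # person 1 censored at measured event; person 3 at LTFU definition
print(hybrid_survival(5, measured, captured))
```

Because $\exp\{-\sum_j(-\log \hat{S}_j)\} = \prod_j \hat{S}_j$, the Kaplan–Meier version of this sketch equals the product of the cause-specific survival functions. The cumulative-hazard form is kept here to mirror the steps in the text.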
We simulated 1,000 datasets of 1,000 people each. For each person, we simulated a visit structure such that the number of months between visits followed a Weibull distribution with shape = 2.7 and scale = 6.75; for draws <1 or >11, we set the number of months to the next visit to 1 or 11, respectively (this resulted in a median of 6 months between visits, interquartile range = 4–8). The visit structure in the original simulated data ensured that there was no LTFU, which we defined as 12 months without a visit. We followed each simulated individual for 120 months. We assigned a random subset of individuals (with the proportion in the subset corresponding to the risk of the composite event) a latent event time from a uniform(0,120) distribution and an event type according to a Bernoulli distribution. If the event was measured, we recorded the event time as the month of the next visit. If the event was captured, we recorded the event time as the month in which the event occurred.
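The visit structure described above can be sketched with NumPy as follows. This is a hypothetical Python reimplementation, not the authors' SAS code. Two details are our assumptions: follow-up starts with a visit at month 0, and each Weibull draw is rounded to the nearest whole month before clamping. NumPy's `Generator.weibull` samples a Weibull distribution with scale 1, so each draw is multiplied by the scale.

```python
import numpy as np

def simulate_visits(rng, max_month=120, shape=2.7, scale=6.75):
    """Return visit months for one simulated person over max_month months.

    Gaps between visits ~ Weibull(shape, scale), rounded to whole months
    (assumption) and clamped to [1, 11], so no gap reaches the 12-month
    LTFU definition.
    """
    visits, month = [0], 0          # assumption: baseline visit at month 0
    while True:
        gap = scale * rng.weibull(shape)               # rescale the unit-scale draw
        gap = int(np.clip(np.rint(gap), 1, 11))        # whole months, clamped to [1, 11]
        month += gap
        if month > max_month:
            return visits
        visits.append(month)

rng = np.random.default_rng(2021)
gaps = np.concatenate([np.diff(simulate_visits(rng)) for _ in range(1000)])
print(np.median(gaps), np.percentile(gaps, [25, 75]))
```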
We were interested in three estimands: (1) the overall risk of the composite event; (2) the cumulative incidence of the measured event, treating the captured event as competing; and (3) the cumulative incidence of the captured event, treating the measured event as competing. We were not interested in contrasts of risk (e.g., risk differences, hazard ratios) in this simulation, because the estimand most affected by LTFU is the risk function; contrasting two risk functions would obscure and complicate the problem.
To minimize differences in bias due to changes in the risk of the estimand of interest, we varied three parameters across simulation scenarios: (1) we fixed the risk of the composite event and varied the proportion of all events that were captured from 5% to 95%; (2) we fixed the risk of the measured event and varied the risk of the captured event; and (3) we fixed the risk of the captured event and varied the risk of the measured event. We repeated the simulation experiments assuming several different values for the overall risk of the fixed (composite, measured, or captured) event and several different values for the risk of LTFU.
We estimated “truth” from the full data with no LTFU, then imposed LTFU on the data and attempted to recover the truth using different censoring strategies. We imposed LTFU by generating a latent LTFU time from a uniform(0,120) distribution, and we set the proportion of the sample LTFU by assigning a Bernoulli indicator for LTFU. We assumed that individuals assigned to LTFU did not return for visits after their latent LTFU time, and we retrieved the month of their last visit from the visit data. Individuals whose recorded event time fell at or before their last visit were not LTFU. Individuals with a captured event within 12 months after their last visit were not LTFU, because the occurrence of the event would preclude them from meeting the definition of LTFU. Individuals were LTFU if they met the definition of LTFU before being administratively censored at 120 months and either had a measured event after their last visit or had no event before meeting the definition of LTFU. Simulations were completed in SAS version 9.4 (Cary, NC); simulation code is available as eContent (http://links.lww.com/EDE/B570).
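One reading of the LTFU rules above can be written as a small classification function. This is a hypothetical sketch. The name `classify_ltfu`, the argument layout, and the strict inequality at the administrative cutoff are assumptions, not details taken from the authors' SAS code.

```python
def classify_ltfu(visits, ltfu_time, event_time, event_type, gap=12, max_month=120):
    """Hypothetical sketch: is a person assigned to LTFU actually classified as LTFU?

    visits     : sorted visit months from the full (no-LTFU) data, starting at 0
    ltfu_time  : latent LTFU month; no visits are attended after it
    event_time : recorded event month (visit month for measured events), or None
    event_type : "measured", "captured", or None
    Returns (is_ltfu, last_visit).
    """
    last = max(v for v in visits if v <= ltfu_time)   # last attended visit
    if event_time is not None and event_time <= last:
        return False, last   # event observed before dropping out
    if event_type == "captured" and event_time <= last + gap:
        return False, last   # captured event precludes meeting the LTFU definition
    # Remaining people: no event, a measured event after the last visit, or a
    # captured event after the LTFU definition would have been met.
    if last + gap < max_month:
        return True, last    # met the LTFU definition before administrative censoring
    return False, last

print(classify_ltfu([0, 6, 12, 18], 14, None, None))        # (True, 12)
print(classify_ltfu([0, 6, 12, 18], 14, 20, "captured"))    # (False, 12)
```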
The Johns Hopkins Human Immunodeficiency Virus (HIV) Clinical Cohort includes information abstracted from the medical record for adults engaged in continuity HIV care who consented to share their data.8 The collection and analysis of these data were approved by the Johns Hopkins School of Medicine Institutional Review Board. To show the impact of different censoring strategies on real-world risk estimates, we applied each censoring strategy for patients LTFU to estimate: (1) time to AIDS (a measured event) or death (a captured event) among patients who were AIDS free at baseline; (2) time to ART initiation (initiation of three or more antiretroviral medications on the same day) among patients who were ART-naïve at baseline, treating death as a competing event; and (3) time to death before ART initiation among patients who were ART-naïve at baseline, treating ART initiation as a competing event. We report 95% confidence intervals for the risk estimates based on the 2.5th and 97.5th percentiles of 500 estimates from bootstrap resamples of the data. We restricted our analysis to patients who enrolled in the cohort from January 2000 to August 2016.
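The percentile bootstrap described above works in three steps. It resamples patients with replacement, re-estimates the risk in each resample, and takes the 2.5th and 97.5th percentiles of the resulting estimates. The minimal NumPy sketch below assumes a hypothetical `estimator` callable that applies the chosen censoring strategy to a list of patient records and returns a single risk estimate.

```python
import numpy as np

def percentile_bootstrap_ci(rows, estimator, n_boot=500, seed=0):
    """95% percentile bootstrap interval: resample rows (patients) with replacement."""
    rng = np.random.default_rng(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                    # one bootstrap resample
        estimates.append(estimator([rows[i] for i in idx]))
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    return float(lo), float(hi)

# Illustration only: the "estimator" here is a simple proportion.
print(percentile_bootstrap_ci([0, 1] * 50, lambda r: sum(r) / len(r)))
```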
Simulation: Risk of Composite Event
For estimating the risk of a composite event, as the proportion of events that were captured went to 1, bias under LTFU-definition censoring went to zero (Figure 2, upper row). As the proportion of events that were measured went to 1 (the proportion that was captured went to 0), bias was minimized under last-encounter censoring. As described in our previous article, bias under last-encounter censoring does not go to zero. Rather, risk is slightly overestimated because of the visit structure and the impossibility of measuring time continuously. When individuals are censored at their last encounter, their person-mass is redistributed to future events, starting with events in the following month. Had they not been LTFU, however, a measured event occurring in the next month would not have been recorded until the next visit, on average 6 months later. More details are available in our previous article. The upshot is that the tipping point between the two strategies did not fall at 50%/50% measured versus captured events. Rather, last-encounter censoring was less biased than LTFU-definition censoring for estimating a composite outcome as long as more than about 40%–45% of all events were measured; below that, LTFU-definition censoring was less biased. Our new hybrid approach was the least biased at all proportions of measured and captured events.
Simulation: Measured Event Is of Interest; Captured Event Is Competing
Last-encounter censoring was always the least biased censoring strategy when estimating the cumulative incidence of a measured event in the presence of a captured competing event. As the risk of the captured event increased, the estimated risk of the measured event decreased. This actually decreased the overall bias (because, as described above, with no competing events, last-encounter censoring slightly overestimates risk of a measured event) until the proportion of persons who have either the measured event or the competing, captured event approached 100%; at this point, the absolute bias increased as the risk of the measured event was underestimated (Figure 2, middle row).
Simulation: Captured Event Is of Interest; Measured Event Is Competing
LTFU-definition censoring was always the least biased censoring strategy when estimating the cumulative incidence of a captured event in the presence of a measured competing event. When the risk of the captured event was higher (40%) and the risk of the measured competing event increased, the risk of the captured event started to be overestimated; the magnitude of this overestimation was trivial. Last-encounter censoring always overestimated the risk of the captured event.
There were 3,618 patients enrolled in the cohort from 2000 to 2016. When estimating time to AIDS or death, we restricted analyses to 2,559 patients who were AIDS-free when they enrolled. Over 5 years of follow-up, 276 (11%) were diagnosed with an AIDS-defining condition, 132 (5%) died, and 1,420 patients (55%) were LTFU. The estimated 5-year risk of AIDS or death was 25% when we used last-encounter censoring, 21% when we used LTFU-definition censoring, and 24% when we applied the new hybrid method (Table 2). When estimating time to ART or time to death before ART, we restricted analyses to 1,921 individuals who were ART-naïve at baseline. Over 5 years of follow-up, 1,306 (68%) initiated ART, 72 (4%) died before initiating ART, and 459 (24%) were LTFU. When estimating the 5-year risk of ART initiation, last-encounter censoring resulted in an estimate of 85%, while the estimate using LTFU-definition censoring was 80%. Estimates of the 5-year risk of death before ART initiation were 5.2% and 4.7% with last-encounter and LTFU-definition censoring, respectively (Table 2).
When estimating the risk of a composite event composed of two different types of events, both last-encounter and LTFU-definition censoring strategies were biased, with the least biased censoring strategy and magnitude of the bias dependent on the proportion of events that were measured versus captured. Bias under either censoring strategy was small, particularly when risk of the outcome or the proportion of the sample LTFU was low. When absolute risk is of interest, particularly in the presence of high risk of the outcome and high LTFU, we have described a method for reconstructing the total risk function from conditional cause-specific risks that minimized the bias due to misallocation of person-time for persons who are LTFU. As is evident in our example, this correction can make a meaningful difference.
Last-encounter censoring was the least biased censoring strategy for estimating the risk of a measured event with a captured competing event; LTFU-definition censoring was the least biased censoring strategy for estimating the risk of a captured event with a measured competing event. Each was relatively unbiased except when the sum of the risk of the event of interest and the risk of the competing event neared 100%, or when both the risk of the event of interest and the proportion LTFU were high.
We simplified our simulation in two key ways. First, we assumed the event time and event type are independent. We believe this does not affect our conclusions; even with a more complicated simulation, the underlying issues related to the inclusion or exclusion of person-time based on whether or not it is truly methodologically “at risk” would remain. Second, we estimated single-sample risk functions. When estimating contrasts of risk functions (e.g., risk differences or hazard ratios), the impact of the bias on the final estimands of interest should be even smaller, because some of it will cancel. A test of the null hypothesis that risk functions are the same in two groups would not be affected by these biases, assuming censoring is independent of group membership.
Herein, we have focused on LTFU as a source of right-censoring; the other source of right-censoring is the administrative end of follow-up. In general, one should apply the same censoring scheme used for LTFU at the administrative end of follow-up. We say “in general” because of an exception when estimating time to a measured event with an administrative cap on follow-up time (e.g., 10 years). Censoring individuals at their last encounter before the maximum time then makes risk sets shrink artificially rapidly just before that maximum time. Events that occur at the end of follow-up will therefore have greater influence on the risk function because of the smaller risk sets. In this instance, it would be useful to allow individuals known to remain at risk (e.g., because they have an event-free visit at 10.2 years) to be censored at the maximum follow-up time (10 years).
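This exception for measured events can be expressed as a small rule for a person who remained event-free at every visit. The sketch below is hypothetical; the helper name and the use of years as the time unit are assumptions.

```python
def last_encounter_censor_time(visit_times, max_time):
    """Hypothetical helper: censoring time for an event-free person under
    last-encounter censoring, with administrative end of follow-up at max_time.

    visit_times : sorted, event-free visit times (first visit at or before max_time)
    """
    if any(v > max_time for v in visit_times):
        return max_time   # an event-free visit after max_time shows they were still at risk
    return max(v for v in visit_times if v <= max_time)   # otherwise, last encounter

print(last_encounter_censor_time([0, 4.8, 9.7, 10.2], 10))  # 10
print(last_encounter_censor_time([0, 4.8, 9.7], 10))        # 9.7
```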
Assigning a censoring date in the presence of LTFU requires making two decisions: (1) how long can persons go “unobserved” before we are uncomfortable leaving them in our analysis? and (2) when should we assign the censoring time for individuals who are LTFU? Herein we focused on the second decision and demonstrated that the magnitude of potential bias was a function of the proportion LTFU. The proportion LTFU is, in part, a function of the first decision. A stricter definition will correspond to a higher proportion lost; indeed, the choice of LTFU definition resulted in anywhere between 22% and 84% LTFU 2 years after treatment initiation in an HIV cohort in Mozambique.9
LTFU is ubiquitous in cohort studies and the proportion of patients LTFU is especially high in clinical cohorts that were not designed for research purposes. Furthermore, our results are applicable to randomized trials that often have composite events. For example, the recent trial of aspirin use among healthy elderly used a composite event of death, dementia, and physical disability.10 Time-to-event methods that accommodate censoring typically assume that censoring is not informative. We have assumed the same here. The potential for bias due to informative censoring certainly exceeds the bias we describe here due to inappropriate inclusion or exclusion of person-time. However, as we demonstrated in the previous study and in this example, the difference in risk estimates under different censoring strategies can be nontrivial. We have provided practical guidance on choosing a censoring strategy, intuition behind the most appropriate choice in a given scenario, and some sense of the expected direction and magnitude of the bias that can result from an inappropriate choice of censoring strategy. Analyses would benefit from using the most appropriate censoring strategy given the event type under study.