Prediction of conditional survival in esophageal cancer in a population-based cohort study

Background: The authors aimed to develop a prediction model for survival at any given date after surgery for esophageal cancer (conditional survival), which has not been done previously.

Materials and Methods: Using joint density functions, the authors developed and validated a prediction model for all-cause and disease-specific mortality after esophagectomy for esophageal cancer, conditional on postsurgery survival time. Model performance was assessed by the area under the receiver operating characteristic curve (AUC) and risk calibration, with internal cross-validation. The derivation cohort was a nationwide Swedish population-based cohort of 1027 patients treated in 1987–2010, with follow-up throughout 2016. The validation cohort was another Swedish population-based cohort of 558 patients treated in 2011–2013, with follow-up throughout 2018.

Results: The model predictors were age, sex, education, tumor histology, chemo(radio)therapy, tumor stage, resection margin status, and reoperation. The medians of AUC after internal cross-validation in the derivation cohort were 0.74 (95% CI: 0.69–0.78) for 3-year all-cause mortality, 0.76 (95% CI: 0.72–0.79) for 5-year all-cause mortality, 0.74 (95% CI: 0.70–0.78) for 3-year disease-specific mortality, and 0.75 (95% CI: 0.72–0.79) for 5-year disease-specific mortality. The corresponding AUC values in the validation cohort ranged from 0.71 to 0.73. The model showed good agreement between observed and predicted risks. Complete results for conditional survival at any given date between 1 and 5 years after surgery are available from an interactive web-tool: https://sites.google.com/view/pcsec/home.

Conclusion: This novel prediction model provided accurate estimates of conditional survival at any time after esophageal cancer surgery. The web-tool may help guide postoperative treatment and follow-up.

This study was prompted by a patient asking whether and how his chances of survival had changed after he had survived a certain period after surgery, given his specific prognostic factors. No answer was available in the literature. Yet, prediction of long-term survival in cancer patients may help plan postoperative treatment and care, and also provide patients and their families with information concerning life expectancy [11]. Predicting conditional survival, that is, the probability of surviving an additional specified period given that the patient has already survived a certain period after surgery, is a valuable extension of conventional baseline prediction models for survival [12]. Conditional survival reflects how the prognosis evolves over time after treatment, and thus provides more accurate estimates of the probability of long-term survival at given time points. A few baseline prediction models have been developed for long-term postoperative survival in esophageal cancer patients, but none has estimated conditional survival [13][14][15][16][17]. We aimed to: develop and evaluate a prediction model to estimate conditional survival after surgery for esophageal cancer; externally validate the accuracy of this model; and construct a web-based interactive calculator for predicting the remaining survival, based on the model variables, at any given date between 1 and 5 years after surgery.

Design
We developed and validated a prediction model of conditional survival in patients who had undergone curative surgery for esophageal cancer and survived at least 1 year after surgery, using two nationwide Swedish population-based cohorts.
The derivation cohort included 1027 patients who had undergone surgery for esophageal cancer in 1987–2010, with follow-up throughout 2016.
The external validation cohort included esophageal cancer patients who had undergone surgery in 2011–2013. Patients were followed up until death or the end of the study (31 December 2018), whichever occurred first, again allowing 5 years of follow-up for all patients.
In both cohorts, comprehensive data were retrieved from medical records and Swedish national health data registries, namely the registries for cancer, patients, death, prescribed drugs, and education. The work has been reported in line with the STROCSS criteria [22], Supplemental Digital Content 1, http://links.lww.com/JS9/A224.

Outcomes
The main outcomes were all-cause mortality and disease-specific mortality any time within 5 years of surgery. Mortality data were retrieved from the National Swedish Cause of Death Registry, which has 100% completeness in the assessment of the date of death (all-cause mortality) and greater than 96% completeness for cause of death (disease-specific mortality).

Model development
We developed a novel model to estimate the probability of survival. The survival probability depends on the predictors and on the postoperative day on which it is calculated. For any given patient, the conditional survival probability increases over time, approaching one as the time after surgery increases. Conditional survival was modeled by maximizing a likelihood function with unknown parameters α and β to be estimated. In the likelihood function, x was the vector of the patient's predictors, t was the time at which the patient experienced an event, and d denoted the type of event, where censoring was defined by d = 0, death due to esophageal cancer by d = 1, and death due to any other cause by d = 2.
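For orientation, a generic way to write a likelihood built from joint density functions with this coding of d is shown below. This is a sketch under stated assumptions (a mixture formulation of the two competing risks and simple right censoring), written to be consistent with the functions $f_1$, $f_2$, and $f_3$ described in the following paragraph; it is not the authors' exact equation, which additionally handles interval censoring and the conditioning on survival beyond 1 year. Here $S_j$ denotes the survival function of the time to event for event type $j$.

```latex
L(\alpha,\beta) = \prod_{i=1}^{n} f_1(\mathbf{x}_i)\,
  \Big[ f_2(d_i \mid \mathbf{x}_i)\, f_3(t_i \mid d_i, \mathbf{x}_i) \Big]^{\mathbb{1}(d_i > 0)}
  \Big[ \sum_{j=1}^{2} f_2(j \mid \mathbf{x}_i)\, S_j(t_i \mid \mathbf{x}_i) \Big]^{\mathbb{1}(d_i = 0)}
```

Because $f_1(\mathbf{x}_i)$ multiplies every contribution and contains neither α nor β, it does not affect the location of the maximum.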
Function $f_1(\mathbf{x}_i^T)$ was removed from the maximization of the likelihood function because it did not depend on the unknown parameters α and β. Function $f_2$ represented the conditional probability of dying from a given competing event. Because there were two competing events (death due to esophageal cancer and death due to other causes), $f_2$ was set equal to a logistic function for the competing risk of death due to esophageal cancer. Function $f_3$ represented the conditional density function of the time to event, defined as the product of the conditional probability of dying between time $t_i$ and a subsequent time point, expressed through the parametric survival function $S_4$, and the conditional probability of being alive at time $t_i$, with $j$ indicating the type of competing event. Three distributions were tested for the functional form of $S_4$: log-logistic, Weibull, and Gompertz. The final survival function followed a log-logistic distribution, which was selected because it has a closed-form solution and was therefore more stable than the Weibull and Gompertz distributions, while having an Akaike information criterion (AIC) value similar to theirs. Goodness of fit of the model without candidate predictors was tested by plotting the Kaplan-Meier curve against the survival function $f_5$.
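The ingredients described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes continuous (not interval-censored) event times, a single logistic linear predictor per patient for the role of $f_2$, and textbook parameterizations of the three candidate distributions. All function names, parameter layouts, and numbers are hypothetical; in the study, the parameters were themselves functions of the predictors.

```python
import math

# Candidate survival functions S(t) for the time to event (t in years, t > 0).
# Parameterizations are common textbook forms, assumed for illustration only.

def loglogistic_surv(t, alpha, eta):
    """Log-logistic: S(t) = 1 / (1 + (t/eta)**alpha); alpha = shape, eta = scale."""
    return 1.0 / (1.0 + (t / eta) ** alpha)


def loglogistic_pdf(t, alpha, eta):
    """Log-logistic density f(t) = -dS/dt, available in closed form."""
    z = (t / eta) ** alpha
    return (alpha / t) * z / (1.0 + z) ** 2


def weibull_surv(t, shape, scale):
    """Weibull: S(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


def gompertz_surv(t, shape, rate):
    """Gompertz with hazard rate * exp(shape * t)."""
    return math.exp(-(rate / shape) * (math.exp(shape * t) - 1.0))


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def mixture_loglik(data, lin_pred, surv_params):
    """Log-likelihood of a mixture competing-risks model with log-logistic times.

    data: list of (t, d); d = 0 censored, 1 esophageal cancer death, 2 other death.
    lin_pred: one logistic linear predictor per patient, giving P(d = 1 | x).
    surv_params: {1: (alpha, eta), 2: (alpha, eta)}, one log-logistic per event type.
    """
    ll = 0.0
    for (t, d), z in zip(data, lin_pred):
        p = {1: logistic(z), 2: 1.0 - logistic(z)}  # plays the role of f2
        if d in (1, 2):
            # Observed death of type d: P(type d | x) * density of time given type d.
            ll += math.log(p[d]) + math.log(loglogistic_pdf(t, *surv_params[d]))
        else:
            # Censored: still alive at t, whichever event type eventually occurs.
            ll += math.log(sum(p[j] * loglogistic_surv(t, *surv_params[j]) for j in (1, 2)))
    return ll


def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik


# Toy usage with hypothetical data and parameter values.
data = [(1.5, 1), (3.2, 2), (5.0, 0)]
ll = mixture_loglik(data, [0.4, -0.2, 0.1], {1: (2.0, 2.5), 2: (1.5, 6.0)})
print(ll, aic(ll, n_params=5))
```

Comparing the three distributions by AIC would mean refitting the model once per candidate (swapping in the Weibull or Gompertz density and survival function) and comparing the resulting `aic` values; the log-logistic form is convenient because both its survival function and density are simple closed-form expressions.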
In function $v$, modeling the term $\log(t_i)$ with splines did not improve the model. Parameters α, η, ϕ, γ, and ρ of the final likelihood function were functions of the candidate predictors. Predictors remaining in the final model were selected using the following strategy. Sex and age were selected a priori and were kept only in parameters η and α. Each of the other variables was tested separately in each parameter, in the order η, α, ϕ, γ, and lastly ρ, and was retained if its P value was less than 0.05. For each parameter, all possible interactions between the selected predictors were tested and retained if the P value was less than 0.05. No interaction term was tested for parameter ρ because it already represented a three-way interaction term. This procedure was repeated on 100 bootstrap samples of size 1027. A variable or interaction term was kept in the final model if it was selected in more than half of the 100 bootstrap samples.
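The bootstrap stability rule can be sketched as below. The per-sample screening (testing each variable in η, α, ϕ, γ, and ρ at P < 0.05) is abstracted into a hypothetical callable `select_fn`; only the resampling and the majority-vote logic described above are shown, and the toy selector is invented purely for illustration.

```python
import random
from collections import Counter


def bootstrap_stability_selection(data, select_fn, n_boot=100, seed=1):
    """Keep each term selected in more than half of n_boot bootstrap samples.

    data: list of patient records (any objects).
    select_fn: hypothetical callable that runs the P < 0.05 screening on one
        resampled data set and returns the set of selected term names.
    """
    rng = random.Random(seed)
    n = len(data)
    counts = Counter()
    for _ in range(n_boot):
        # Resample n records with replacement (n = 1027 in the derivation cohort).
        sample = [data[rng.randrange(n)] for _ in range(n)]
        counts.update(select_fn(sample))
    kept = {term for term, c in counts.items() if c > n_boot / 2}
    return kept, counts


# Toy selector: always picks age and stage, and picks "charlson" on every other call.
calls = {"n": 0}


def toy_select(sample):
    calls["n"] += 1
    extra = {"charlson"} if calls["n"] % 2 == 0 else set()
    return {"age", "stage"} | extra


kept, counts = bootstrap_stability_selection(list(range(1027)), toy_select)
print(sorted(kept), counts["charlson"])  # ['age', 'stage'] 50
```

A term selected in exactly 50 of 100 samples is dropped, because the rule requires strictly more than half.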

Model performance
Model performance was assessed by discriminative accuracy and risk calibration in both the derivation cohort and the validation cohort. Discriminative accuracy was examined by calculating the area under the receiver operating characteristic curve (AUC). In the derivation cohort, a bootstrap cross-validation procedure was applied: the AUC was calculated from predictions made on a random sample of size 1027 drawn with replacement, and this process was repeated 1000 times. Model performance was summarized by the AUC and its 95% CI at 3 and 5 years after surgery and by plotting AUC changes over time. We also calculated the difference between the AUC values derived from the internal validation sample and the external validation sample. To do so, we drew 200 bootstrap samples with replacement from the derivation cohort and 200 from the external validation cohort. In each bootstrap sample, we calculated the AUC at 3 and 5 years after surgery for both all-cause and disease-specific mortality. The 95% CI of the AUC difference was then computed as the interval between the 2.5th and 97.5th percentiles of the 200 bootstrapped AUC differences. To assess risk calibration in internal and external validation, Hosmer-Lemeshow calibration plots are reported, showing the agreement between predicted and observed proportions of deaths across tenths of predicted risk. Goodness of fit was examined by the Hosmer-Lemeshow test, with the null hypothesis that the observed and expected proportions were equal.
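A simplified sketch of these performance statistics is given below. It assumes a fixed horizon (for example 3 years) at which every patient's vital status is known, so censoring and competing risks are ignored; the time-dependent AUC used in the study is more involved. It shows a rank-based AUC, the 2.5th–97.5th percentile interval of 200 bootstrapped AUC differences, and the Hosmer-Lemeshow statistic over tenths of predicted risk. All function names are hypothetical.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: P(score of a death > score of a survivor), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))  # O(n^2), kept simple for clarity


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    k = (len(xs) - 1) * q
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def bootstrap_auc_difference(internal, external, n_boot=200, seed=1):
    """95% CI of AUC(internal) - AUC(external) as 2.5th-97.5th percentiles.

    internal, external: lists of (predicted_risk, died_by_horizon) pairs.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        bi = [rng.choice(internal) for _ in internal]
        be = [rng.choice(external) for _ in external]
        diffs.append(auc(*zip(*bi)) - auc(*zip(*be)))
    return percentile(diffs, 0.025), percentile(diffs, 0.975)


def hosmer_lemeshow(pred, obs, groups=10):
    """Hosmer-Lemeshow chi-square over groups (tenths) of predicted risk.

    Compare with a chi-square distribution on groups - 2 degrees of freedom.
    """
    pairs = sorted(zip(pred, obs), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        m = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        pbar = expected / m
        if 0.0 < pbar < 1.0:
            stat += (observed - expected) ** 2 / (m * pbar * (1.0 - pbar))
    return stat
```

Sorting only on predicted risk (a stable sort) avoids splitting tied predictions by outcome, which would otherwise inflate the statistic artificially.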

Interactive calculator of survival probability
An interactive calculator was created to compute the probability of all-cause and disease-specific mortality at time t, given that the patient has survived until time t0 after surgery (where t0 is at least 1 year after surgery), depending on the individual patient's set of predictors. The calculator is available at: https://sites.google.com/view/pcsec/home.
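The cumulative incidence function (CIF) for the competing risk of death from esophageal cancer is $\mathrm{CIF}_{\text{cancer}}(t)$, the CIF for the competing risk of death from another cause is $\mathrm{CIF}_{\text{other}}(t)$, and the total CIF is $\mathrm{CIF}(t) = \mathrm{CIF}_{\text{cancer}}(t) + \mathrm{CIF}_{\text{other}}(t)$. For a patient who has survived until $t_0$, where $t_0 \ge 1$ year, the conditional CIF at time $t > t_0$ for the competing risk of esophageal cancer is:

$$\mathrm{CIF}_{\text{cancer}}(t \mid t_0) = \frac{\mathrm{CIF}_{\text{cancer}}(t) - \mathrm{CIF}_{\text{cancer}}(t_0)}{1 - \mathrm{CIF}(t_0)}$$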
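The conditional CIF can be computed from any pair of unconditional CIFs, as in the sketch below. The CIFs are assumed to be available as Python callables; the toy mixture form $\mathrm{CIF}_j(t) = \pi_j (1 - S_j(t))$ with log-logistic $S_j$, and all numbers, are hypothetical and are not the fitted model behind the web-tool.

```python
def conditional_cif(cif_cause, cif_total, t, t0):
    """P(death from a given cause by time t | alive at time t0), for t > t0.

    cif_cause: unconditional cumulative incidence for the cause of interest.
    cif_total: unconditional cumulative incidence for death from any cause.
    """
    if t <= t0:
        raise ValueError("t must be later than t0")
    alive_at_t0 = 1.0 - cif_total(t0)
    return (cif_cause(t) - cif_cause(t0)) / alive_at_t0


# Toy mixture model with hypothetical values: CIF_j(t) = pi_j * (1 - S_j(t)).
PI = {"cancer": 0.6, "other": 0.4}
SCALE_SHAPE = {"cancer": (2.0, 2.0), "other": (8.0, 1.5)}


def cif(cause, t):
    eta, alpha = SCALE_SHAPE[cause]
    surv = 1.0 / (1.0 + (t / eta) ** alpha)  # log-logistic S_j(t)
    return PI[cause] * (1.0 - surv)


def cif_any(t):
    return cif("cancer", t) + cif("other", t)


# Alive 1 year after surgery: risk of death within the next 2 years.
p_cancer = conditional_cif(lambda t: cif("cancer", t), cif_any, t=3.0, t0=1.0)
p_any = conditional_cif(cif_any, cif_any, t=3.0, t0=1.0)
print(round(p_cancer, 3), round(p_any, 3))
```

Passing the total CIF as both arguments gives conditional all-cause mortality, which equals $1 - S(t)/S(t_0)$ for the overall survival function $S = 1 - \mathrm{CIF}$.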

Patients
Characteristics of the study participants are shown in Table 1.

Final model
For each parameter, the frequency with which each candidate predictor was kept in the model (number and percentage of bootstrap samples) is presented in Table 2. The final model included the predictors that satisfied the condition described in section 'Model development' above, that is, five predictors for function $f_2$: age, tumor histology, chemo(radio)therapy, pathological tumor stage, and resection margin status; and seven predictors for parameter η in function $S_4$: age, sex, education, tumor histology, pathological tumor stage, resection margin status, and reoperation. Charlson comorbidity was excluded from the final model because its P value was greater than 0.05 in more than 50 of the 100 bootstrap samples. Figure 1 shows the goodness of fit of the final model by comparing the nonparametric Kaplan-Meier estimate of the survival function with the estimates of the parametric survival function $f_5$. The two curves overlapped, indicating that the model fitted the data well.
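For reference, the nonparametric Kaplan-Meier curve used in this comparison can be computed with the product-limit formula, sketched below. The function name is hypothetical; for all-cause mortality, an event would be coded 1 whenever d > 0 and 0 when censored, and the parametric $f_5$ curve would be evaluated at the same times and overlaid.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct death time.

    times: follow-up times; events: 1 = death (for all-cause mortality,
    any d > 0), 0 = censored. Returns a list of (t, S(t)).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all records sharing the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(curve)
```

Patients censored at a given time are counted as at risk for deaths occurring at that same time, which is the usual Kaplan-Meier convention.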

Model performance
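Time-dependent AUC curves for all-cause and disease-specific mortality were obtained in internal cross-validation and external validation. In internal cross-validation in the derivation cohort, the medians of AUC were 0.74 (95% CI: 0.69–0.78) for 3-year all-cause mortality, 0.76 (95% CI: 0.72–0.79) for 5-year all-cause mortality, 0.74 (95% CI: 0.70–0.78) for 3-year disease-specific mortality, and 0.75 (95% CI: 0.72–0.79) for 5-year disease-specific mortality (Table 3). In the external validation cohort, the corresponding medians of AUC ranged from 0.71 to 0.73 (Table 3). There was no statistically significant difference between the AUC values derived for the external and internal validation samples (Table 3). The predicted and observed risks of mortality showed good agreement in both internal and external validation (Figs 3 and 4; P>0.05 in Hosmer-Lemeshow tests).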

Web-based calculator
To provide complete results of the prediction model for each patient, we developed an interactive web-tool that calculates all-cause and disease-specific mortality at any date t (t < 5 years) in patients who have survived until any date t0 (1 year ≤ t0 < t) after esophageal cancer surgery. The web-tool is available at the address given above. As an example of its use, consider a 65-year-old male patient with 9 years of education, pathological tumor stage I, squamous cell carcinoma, no chemo(radio)therapy, tumor-free resection margins (R0), and no reoperation. Supplementary Figure 1a, Supplemental Digital Content 2, http://links.lww.com/JS9/A225, shows that if he has survived 1 year after surgery, his probability of death within the following 2 years (i.e. within 3 years of surgery) is 30.4% from any cause and 21.7% from esophageal cancer. For the same patient having survived 2.6 years after surgery, Supplementary Figure 1b, Supplemental Digital Content 2, http://links.lww.com/JS9/A225, shows a 5.8% probability of death from any cause and a 3.7% probability of death from esophageal cancer within the following 6 months.

DISCUSSION
This study used two population-based cohorts to develop and validate a prediction model for projecting patients' conditional survival after esophageal cancer surgery. The final predictors were age, sex, education, tumor histology, chemo(radio)therapy, tumor stage, resection margin status, and reoperation. The model showed good performance.
Methodological strengths include the population-based cohort design with nearly complete inclusion, complete follow-up, and accurate data on exposures, outcomes, and predictors. We developed a novel biostatistical approach by deriving a probability equation specifically for this study. The performance of the prediction model was assessed with both internal cross-validation and external validation in an independent cohort, which helps guard against overfitting. However, because patient characteristics, healthcare systems, and treatments may differ across populations, the model remains to be validated in other countries. Another limitation was the lack of data on some potential predictors, such as anthropometric measures and lifestyle factors.
We addressed the research question posed by a patient ('What is my specific and individual probability of surviving, given that I underwent surgery some time ago?') in mathematical terms, expressed by the likelihood function. Several functional forms for the survival function were tested to determine which best fitted the data. The predictive capability of the patient's characteristics was tested for all parameters of the likelihood function, while competing risks and interval censoring were included in the model. Although the likelihood function was complex, the program required only a few lines of code and was computationally fast. The explicit formulation of the likelihood function facilitated postprocessing, particularly the derivation of the cumulative incidence function and the AUC curves. However, writing the likelihood function required a deep understanding of both the clinical setting and the statistical methods. Because the number of predictor variables was limited, the likelihood function did not need regularization [24]. This may limit its direct use in settings with many predictors (big data). A handful of models have been developed for projecting conditional postoperative survival in other cancers, including gastric cancer and penile cancer [25,26]. Because prognostic factors differ considerably between cancer types, these models are not applicable to patients with esophageal cancer. Some models have been developed for predicting long-term survival in esophageal cancer patients at baseline [13][14][15][16][17], including one from our group [17]. By combining information on various prognostic factors, these models have shown moderate to good performance, with AUC values ranging from 0.6 to 0.8. Where conditional survival was estimated, it was limited to the probability of surviving a certain number of additional years (e.g. 3 years) given an integer number of years of accumulated survival. In the present study, we instead created a probability equation that provides a truly 'dynamic' estimation of postoperative survival.

Table 3. Area under the receiver operating characteristic curve of the developed prediction models and their difference for 3-year and 5-year all-cause and disease-specific mortality, presented as median (95% CI). Columns: outcomes; internal validation; external validation; difference.
To our knowledge, the present study is the first to estimate conditional survival in patients with esophageal cancer, that is, how the probability of survival changes (improves) continuously for each day after surgery. This knowledge should be of considerable relevance for patients and healthcare. An individual patient's prognosis at any given date after surgery is better provided by a valid conditional prediction model that takes the postsurgery survival time into account. This may improve healthcare by making clinical follow-up more tailored. The online model is easy to use and may thus be a valuable tool for clinicians when they follow up their patients after surgery. Patients often ask about their chance of survival, and the online tool should provide more accurate estimates for each individual patient than baseline data. Because the chance of survival increases over time after surgery, this information should not be alarming for most patients.

In conclusion, this study using two independent population-based cohorts provides a new model for individualized estimation of survival in patients who have undergone curative surgery for esophageal cancer, conditional on the time they have survived thus far. The developed model showed good performance in both internal and external validation, and may help make postoperative healthcare and follow-up more individualized.

Ethical approval
The study was approved by the Regional Ethical Review Board in Stockholm (2107/141-31/2).

Conflicts of interest disclosure
None.
Research registration unique identifying number (UIN)
1. Name of the registry: Clinicaltrials.gov
2. Unique identifying number or registration ID: NCT05540119
3. Hyperlink to your specific registration (must be publicly accessible and will be checked): www.clinicaltrials.gov/ct2/show/NCT05540119

Data statement
Data may be shared on request to the corresponding author, but sharing will require permission from the Ethical Review Board and from the governmental authorities that contributed data used in this article, that is, the Swedish National Board of Health and Welfare and Statistics Sweden.

Provenance and peer review
Not commissioned, externally peer-reviewed.