The topic of prognostic model development in liver transplantation has gained prominence in recent years. Prognostic models can help to identify patients with end-stage liver disease who may benefit from liver transplantation, to provide patients with information on their prognosis that facilitates informed decision making about their treatment, and to produce risk-adjusted comparisons of outcome between transplant centers (1).
Existing models have used mortality both before (2) and after (3) transplantation as the outcome measure and have used transplant candidate/recipient information only (4) or a combination of recipient and donor information (5–8) as prognostic factors. The choice of outcome measure and prognostic factors should reflect the intended use of the finalized model. For example, the Model for End-Stage Liver Disease (MELD) score used waiting list mortality and only transplant candidate prognostic information, as its purpose is to predict the mortality of potential recipients without transplantation (which it does well). For mortality after transplantation, in contrast, both recipient and donor prognostic factors could be used to determine how the interplay of these two sets of factors is associated with outcome. Understanding how donor organs could be used optimally is especially important in an era of a progressive shortfall of donors (9). In this paper, a prognostic model to predict early posttransplant mortality based on pretransplant recipient factors only will be developed. Although such a model will not predict outcome as accurately as one that also includes donor prognostic information, it can provide transplant candidates with predictions of their early posttransplantation prospects before any prognostic information from a donor is known. This would be useful information for patients with end-stage liver disease for whom liver transplantation is considered as a treatment option.
We published a systematic review of the literature to assess the quality of studies that developed and validated prognostic models for mortality after transplantation (1). We found the quality of the reported models to be suboptimal according to our assessment tool, and none of the models had good predictive ability. The objective of this paper is to develop a prognostic model for 90-day mortality after liver transplantation based on pretransplant recipient factors, employing a rigorous model development method.
PATIENTS AND METHODS
We used patient data that were prospectively collected for the UK & Ireland Liver Transplant Audit. Details of this audit have been described elsewhere (10). All patients who received a transplant between March 1, 1994 and September 30, 2004 in both countries were considered. We included patients if they were 16 years or older, did not receive a “super-urgent” liver transplantation (see www.uktransplant.org.uk for definition), underwent a single-organ transplantation, and were receiving their first liver transplantation.
Prognostic Model Factors and Outcome Measure
The following factors were considered for prognostic model development (it should be noted that these factors were measured immediately prior to transplantation): cause of liver disease (see Appendix A for details), age, sex, ethnicity, previous upper abdominal surgery (defined as history of laparotomy with clinical evidence of a supraumbilical incision, either related to liver failure [oesophageal transection, devascularization] or for an unrelated problem [e.g. cholecystectomy, duodenal ulcer surgery] when the graft became available), requirement for maintenance diuretic therapy, lifestyle activity score (clinician-reported measure on a 5-point score of the impact of disease on the ability to carry out activities of daily living, as described in detail elsewhere (11)), inpatient and ventilation status (patient not in hospital, patient in hospital but not requiring mechanical ventilation, or patient mechanically ventilated), presence of pyrexia, presence of microbiologically proven sepsis, encephalopathy grade (as defined by West Haven criteria [12, 13]), clinically detectable ascites, body mass index (BMI), varices (no varices present, varices present but not previously bled, previous variceal bleed), International Normalized Ratio of prothrombin time (INR), serum concentrations of bilirubin, albumin, creatinine, sodium, and potassium, hemoglobin (Hb), white cell count (WBC), pH, arterial partial pressure of oxygen (pO2), platelets, and serological markers of hepatitis B (HBV DNA, HBsAg, HBeAg, anti HBs, anti HBe, anti HBcIgM), hepatitis C (HCV RNA, anti HCV), cytomegalovirus (anti CMV), hepatitis D (anti HDV), herpes simplex virus (anti HSV), and human immunodeficiency virus (anti HIV). We chose mortality at 90 days after transplantation as the outcome measure.
Collinearity between factors is not a problem for prognostic model development if all factors that are collinear with each other are retained in the model. However, when using an algorithmic variable selection procedure (e.g. step-wise regression, see below), collinearity can bias which factors are (or are not) selected. Collinearity can be detected when the standard errors of model coefficients are extremely large. More formally, a variance inflation factor (VIF) (14) can be calculated for each factor; if any factor has a VIF greater than a certain threshold, this indicates that collinearity is a problem. There is disagreement in the literature about what this threshold should be; values of four (15), five (16), and 10 (17) have been suggested.
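As an illustrative sketch (not the software used in the study), the VIF for the j-th factor can be computed as 1/(1 − R²_j), where R²_j comes from regressing that factor on all the remaining factors:

```python
import numpy as np

def variance_inflation_factors(X):
    """Compute a VIF for each column of the design matrix X
    (rows = patients, columns = candidate prognostic factors).
    VIF_j = 1 / (1 - R^2_j), where R^2_j is obtained by regressing
    column j on all remaining columns (with an intercept)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return vifs
```

A factor that is nearly a linear combination of the others produces an R² close to one and hence a very large VIF, which is what the threshold check flags.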
Checking Linearity of Associations Between Continuous Factors and Outcome Measure
We checked whether the factors that were measured on a continuous scale had an approximately linear association with the outcome measure by plotting the observed log-odds of mortality at 90 days against the medians of ten equally sized rank-ordered groups of continuous factor values. If this check identified curvilinear associations, polynomial terms were added.
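The linearity check just described can be sketched as follows (illustrative Python, not the study's code): the factor is split into ten equally sized rank-ordered groups, and each group contributes one (median, observed log-odds) point to the plot.

```python
import numpy as np

def logit_by_decile(x, died):
    """Split a continuous factor x into ten equally sized rank-ordered
    groups and return (group median, observed log-odds of 90-day
    mortality) for each group. Plotting these pairs shows whether the
    factor's association with the log-odds is approximately linear."""
    order = np.argsort(x)
    points = []
    for grp in np.array_split(order, 10):
        p = died[grp].mean()
        p = min(max(p, 1e-6), 1 - 1e-6)  # guard against log(0)
        points.append((np.median(x[grp]), np.log(p / (1 - p))))
    return points
```

A roughly straight scatter of the ten points supports entering the factor linearly; a clear bend suggests adding a polynomial term, as was done for platelets.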
Prognostic Model Factors with Missing Values
Factors with more than 20% missing values were discarded. Missing values were imputed using the method of switching regression (18). With this approach, missing values of a given factor are predicted from the other factors in a regression model, and this step is repeated for all factors with missing values. This step was embedded within the variable selection procedure (see below).
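A minimal sketch of the chained-equation idea behind switching regression is given below. It is deliberately simplified: proper switching-regression imputation (18) is a multiple-imputation method that adds random draws to the predictions, whereas this deterministic version only cycles least-squares predictions.

```python
import numpy as np

def chained_imputation(X, n_cycles=5):
    """Simplified sketch of switching-regression (chained-equation)
    imputation for a numeric matrix X containing NaNs: start from
    column means, then repeatedly re-predict each column's missing
    entries from all other columns via least squares."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    n, p = X.shape
    for _ in range(n_cycles):
        for j in range(p):
            if not missing[:, j].any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(n), others])
            obs = ~missing[:, j]
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            X[missing[:, j], j] = A[missing[:, j]] @ beta
    return X
```

Cycling until the imputations stabilise is what distinguishes this from a single regression-based fill-in.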
Selecting Factors and Identifying Interactions
The following steps were followed:
- Step 1: Generate a non-parametric bootstrap sample from the data set (with sample size equal to that of original data set).
- Step 2: Impute values for factors with missing values.
- Step 3: Use backward step-wise logistic regression to determine a prognostic model from the full list of factors. For continuous factors with a curvilinear relationship with outcome, polynomial and linear terms were treated as one item in the step-wise procedure. A significance level of 0.2 was used in the step-wise selection procedure. Robust standard errors were used to account for departures from model assumptions (19).
- Step 4: Repeat steps one to three for a total of 200 bootstrap samples.
- Step 5: Include those factors that were selected in at least two-thirds of the prognostic models developed in the bootstrap samples.
- Step 6: Repeat step one and two.
- Step 7: Fit a prognostic model with those factors identified in step five and a clinically plausible interaction involving any two factors (not just those identified in step 5). Perform a deviance test to see if the model with the interaction term improves model fit. An interaction occurs when the impact of a factor on mortality is influenced by the value of another factor.
- Step 8: Repeat step six and seven for a total of 200 bootstrap samples.
- Step 9: Include interaction in final model if P value from deviance test was less than 0.2 in at least two-thirds of the bootstrap samples.
- Step 10: Repeat steps six to nine for any other clinically plausible interactions.
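Steps one to five above can be sketched in Python. This is an illustrative reconstruction, not the study's code: it assumes complete data (so the imputation of step two is omitted), uses ordinary Wald rather than robust standard errors, and implements backward elimination on Wald P values at the 0.2 level. Steps six to ten for interactions follow the same bootstrap pattern with a deviance test in place of the step-wise search.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Logistic regression by Newton-Raphson; returns coefficients and
    two-sided Wald p-values (intercept assumed in column 0 of X)."""
    from math import erf
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])   # Fisher information
        beta += np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    pvals = np.array([1 - erf(abs(z) / np.sqrt(2)) for z in beta / se])
    return beta, pvals

def backward_select(X, y, names, alpha=0.2):
    """Backward step-wise elimination: repeatedly drop the least
    significant factor until every remaining factor has p < alpha."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        _, pv = fit_logit(X[:, keep], y)
        worst = int(np.argmax(pv[1:])) + 1       # never drop the intercept
        if pv[worst] < alpha:
            break
        del keep[worst]
    return {names[k] for k in keep if k != 0}

def bootstrap_inclusion(X, y, names, n_boot=200, seed=1):
    """Steps 1-5: rerun the step-wise selection on bootstrap samples
    and report how often each factor survives; factors selected in at
    least two-thirds of samples enter the final model."""
    rng = np.random.default_rng(seed)
    counts = {nm: 0 for nm in names[1:]}
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)         # bootstrap sample, step 1
        for nm in backward_select(X[idx], y[idx], names):
            counts[nm] += 1
    return {nm: c / n_boot for nm, c in counts.items()}
```

Basing inclusion on selection frequency across bootstrap samples, rather than on a single step-wise run, is what makes the selection less sensitive to sampling variation.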
Determining the Final Model
Final model regression coefficients were obtained by repeating steps one to four in the approach above but in step three the final model is fitted (no step-wise procedure). The final model regression coefficients were estimated by the mean of the coefficients from the 200 bootstrap samples. The 2.5 and 97.5 percentiles from the relevant data in the bootstrap samples were used to obtain 95% coverage intervals. These can be interpreted in the same way as 95% confidence intervals (CIs).
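The coefficient estimation just described can be sketched as follows (illustrative Python assuming complete data; `fit_logit` is a minimal stand-in for the study's logistic regression, not its actual implementation):

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Minimal Newton-Raphson logistic regression (intercept assumed
    to be the first column of X)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

def bootstrap_final_model(X, y, n_boot=200, seed=2):
    """Refit the final model on bootstrap samples (no step-wise search)
    and summarise each coefficient by its bootstrap mean and the
    2.5th/97.5th percentiles (a 95% coverage interval)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = fit_logit(X[idx], y[idx])
    mean = draws.mean(axis=0)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
    return mean, lo, hi
```

The percentile interval requires no normality assumption for the coefficients, which is why it can be read like an ordinary 95% CI.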
As the selection of the final model factors and the estimation of their regression coefficients were carried out using bootstrap samples, it was appropriate to internally validate the prognostic model using the original data set (with imputed values for missing values of factors). A prognostic score was calculated for each patient in the original data set by adding the intercept of the final model to the sum of the products of the regression coefficients and the relevant values of the final model factors. Model discrimination was tested using the c-statistic (20). Model calibration was tested by comparing the observed with the predicted percentage of mortality at 90 days and carrying out the Hosmer-Lemeshow goodness-of-fit test (21). The prognostic scores were ranked and then split into 10 equally sized groups. Within each group, the observed percentage of mortality (and 95% CI) was calculated. The predicted percentage of mortality (anti-log-odds of the prognostic score) was obtained by taking the mean prediction in each group.
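Both internal-validation measures can be sketched directly (illustrative Python, not the study's code):

```python
import numpy as np

def c_statistic(pred, died):
    """Probability that a randomly chosen patient who died has a higher
    predicted risk than a randomly chosen survivor (ties count half)."""
    pos, neg = pred[died == 1], pred[died == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def hosmer_lemeshow(pred, died, groups=10):
    """Hosmer-Lemeshow statistic over equally sized groups of ranked
    predicted risk; compare against chi-square with (groups - 2)
    degrees of freedom to obtain the P value."""
    order = np.argsort(pred)
    stat = 0.0
    for grp in np.array_split(order, groups):
        obs, exp, n = died[grp].sum(), pred[grp].sum(), len(grp)
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n))
    return stat
```

A well-calibrated model yields a small Hosmer-Lemeshow statistic (large P value) even when, as here, its discrimination is only moderate; the two measures answer different questions.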
RESULTS
A total of 4,829 patients met the inclusion criteria. In all, 452 (9.4%) patients died within 90 days of their transplantation. Collinearity did not seem to be a problem in these data: no factor coefficients in the logistic regression analyses had extremely large standard errors, and the largest VIF was 4.6. All factors measured on a continuous scale had an approximately linear relationship with the log-odds of mortality at 90 days apart from platelets (Fig. 1). A squared term for this factor was added to provide a curvilinear representation of its association with outcome. The laboratory results of pH, pO2, and all serological markers of viral infection (apart from HBsAg, anti HCV, and anti CMV) had greater than 20% missing values and therefore were no longer considered for model development.
Table 1 shows the distribution of all factors and their association with mortality at 90 days. For illustration purposes only, the continuous factors have been split into four groups using quartile values. The strongest univariate associations between mortality and categorical factors are seen for lifestyle activity score, cause of liver disease, encephalopathy grade and ventilation/in-patient status. It should be noted, however, that the high levels of mortality seen for encephalopathy grades three and four and in-patients who are ventilated refer to a small percentage of patients. The strongest univariate associations between mortality and continuous factors are seen for creatinine, potassium and International Normalized Ratio.
The results from steps one to four of the variable selection strategy are shown in Table 2. Those factors that were selected in at least two-thirds of the prognostic models developed in the bootstrap samples (step 5) are highlighted in bold. Interactions between age and bilirubin, age and creatinine, and creatinine and cause of liver disease were considered to be clinically plausible. Following steps 6 to 10, the percentage of bootstrap samples in which the P value from the deviance test was less than 0.2 was 65%, 61.5%, and 83.5% for the three interactions, respectively. Strictly adhering to the two-thirds selection criterion, only the creatinine and cause of liver disease interaction is added to the factors highlighted in bold in Table 2 to create the final model. The estimates of the logistic regression coefficients of the final model are detailed in Table 3. The distribution of the prognostic scores derived from the final model for the original data set is shown in Figure 2.
Internal validation using the original data set finds the c-statistic for the final model to be 0.65 (95% CI 0.63, 0.68). This means that the probability of a randomly selected patient who died within 90 days of their transplant having a higher prognostic score than a randomly selected patient who did not die within 90 days of their transplant is 0.65. Figure 3 shows observed and predicted percentage of mortality at 90 days plotted against ten equally sized and ranked prognostic score groups. The P value from the Hosmer-Lemeshow goodness-of-fit test is 0.6 indicating good agreement between what is observed and what the model predicts, and that the model is well calibrated. The moderate c-statistic value obtained for our model is reflected in the fact that the observed percentage of mortality is very similar in groups two through to eight (range 5–10%). However, it can be seen from Figure 3 that patients in the lowest prognostic score group (representing 10% of this patient population) have considerably lower mortality at 90 days than the rest of the patients. Moreover, patients in the two highest prognostic score groups (representing 20% of this patient population) have considerably higher mortality at 90 days than the rest of the patients. In essence, this classifies patients into three groups (group 1, groups 2–8, and groups 9–10) based on their early posttransplantation mortality risk.
DISCUSSION
Our study shows that a model developed on a UK & Ireland data set using only prognostic information from the recipient immediately prior to liver transplantation discriminates moderately between patients who did and who did not die within 90 days.
An important limitation is that only mortality at 90 days has been studied and obviously longer term mortality outcomes are also of interest. Even so, in this population a substantial proportion of mortality occurs within this relatively short period after transplantation. In the UK & Ireland data set, two-thirds of the one-year mortality occurs within 90 days. Furthermore, longer term outcomes are more difficult to predict and it is likely that the association between transplant candidate factors and outcome does not remain constant over time.
A further limitation is that some of the factors considered for the prognostic model are prone to interobserver variation. For our final model, these factors are encephalopathy grade, lifestyle activity score, presence of clinically detectable ascites, and history of previous upper abdominal surgery. If the interobserver variation is substantial, this will affect the external validity of our model. Furthermore, if these factors and others that did not make the final model were less prone to interobserver variation, the discriminatory ability of the developed model might have been improved.
Although the percentage of missing values has fallen in recent years in the UK & Ireland Liver Transplant Audit data set, it still remains a significant problem for prognostic model development. However, the switching regression approach for imputation is an improvement on complete case analysis which is associated with reduced statistical power and potential bias. It is also superior to the approach which imputes means for continuous factors and most frequent values for categorical factors because this leads to inappropriately reduced variance in the prognostic factors and bias.
As with any variable selection approach, we had to make some arbitrary choices—for example, the choice of significance level (0.2) and cut-off for selection of factors in the bootstrap samples (two-thirds). A sensitivity analysis was performed to see how the results would change if the significance level was kept the same but the cut-off was changed from two-thirds to 50%. The final model would then include all factors as before plus presence of pyrexia, platelets and the interactions age × bilirubin and age × creatinine. The c-statistic for this model would have been the same as that for the original model (0.65; 95% CI 0.62, 0.68).
In our recent systematic review of prognostic models in liver transplantation (1), only one paper used pretransplantation recipient prognostic factors exclusively (4). The prognostic factors in the final model of this paper were age, BMI, urgency of transplantation, diagnosis, bilirubin and creatinine. All these factors were in our final model apart from urgency of transplantation, as we excluded candidates requiring “super-urgent” transplantation. This model did not use mortality at 90 days as an outcome measure but used 30 days and one year. Even so, it was possible to apply this model using the 90-day outcome measure on our data set. This external validation produced a c-statistic of 0.65 (95% CI 0.62, 0.67), the same result as that obtained by the internal validation of our final model. Including urgency of transplantation would obviously boost the c-statistic, as “super-urgent” and elective patients have very different posttransplant mortality outcomes. Therefore, even allowing for the expected drop in discriminatory performance if our model were to be externally validated, it could be argued that in a like-for-like comparison the performance of our model would be at least as good, if not superior.
Our model for 90-day mortality also compares favorably to published models that include donor and transplantation factors as well as recipient factors. The model which had best discrimination following external validation in our systematic review had a c-statistic of 0.63 (95% CI 0.61, 0.66). An updated model (22) developed on the same data source had a c-statistic of 0.69 (CI not reported) following internal validation. For the same reasons as above, it could be argued that there is little real difference in discriminatory ability between this model and ours, because it too accounts for differences in mortality between “super-urgent” and elective patients.
We feel there are five possible explanations for why the discriminatory ability of our prognostic model is only moderate. Firstly, it could be that important transplant candidate prognostic factors are missing. Even though we considered all factors that the papers in the systematic review (1) considered, we did not include factors such as diabetes and other comorbidities as these were not available in the UK & Ireland data set. Another factor we did not consider is center experience. Although not a transplant candidate factor, it is a factor that is known pretransplantation. Adding a center experience factor to our model could have improved model discrimination overall. However, such a factor would not improve predictions for patients already assigned to a transplant center. Furthermore, such a factor should not be used if the purpose of the model is to facilitate risk-adjusted comparison between transplant centers.
Secondly, it is possible that the methodology we have used is inappropriate. A “gold standard” statistical approach to prognostic factor selection does not exist, especially when the prognostic factors have missing values. We think combining missing value imputation using the switching regression method with bootstrapping for variable selection is systematic and appropriate, but further research is needed to compare its performance with other approaches. Furthermore, the relationship between the prognostic factors and outcome could be more complex than can be captured by a linear predictor of regression coefficients and prognostic factors. Artificial neural networks (ANNs) offer complex, non-linear alternative solutions and are beginning to be the focus of research into prognosis following liver transplantation (23, 24). In some medical research settings, no differences in predictive ability have been found between ANNs and regression models (25–29), whereas in other medical research settings it has been purported that ANNs are superior (23, 30). ANNs are prone to model overfitting and are considerably more difficult to interpret than regression models. These issues will need to be addressed for ANNs to play a more prominent role in prognostic modeling.
A third explanation is that the population of patients with end-stage liver disease selected for liver transplantation are a more homogeneous group than the entire population of patients with end-stage liver disease. They all share the characteristic that they are considered well enough to receive a liver transplant. This inherent homogeneity may make it impossible for a prognostic model to discriminate above a certain threshold.
Fourthly, the fact that our study found an interaction between cause of liver disease and creatinine on association with mortality may indicate that effort would be better spent developing disease-specific prognostic models. In addition, our sensitivity analysis showed weak evidence that age modifies the association between bilirubin and mortality, and between creatinine and mortality. A disadvantage of disease-specific and/or age-specific models is that it is cumbersome to have separate models for separate causes of liver disease and/or different age groups. Furthermore, model development would be hampered by reduced statistical power.
Although the discrimination of our model was not excellent, the results show that patients with a “low” chance of dying within 90 days of their transplantation (<5%) and those with a “high” chance (>10%) can be differentiated from patients with an “intermediate” chance.
Our model can be used in the following three ways. Firstly, it can provide transplant candidates with predictions of their early posttransplantation prospects before any prognostic information from a donor is known. Appendix B describes explicitly how our model can provide predictions for transplant candidates. Although this Appendix may appear too unwieldy for practical application, it can be set up as a simple-to-use spreadsheet. Secondly, if used alongside the MELD score, a prediction of early mortality with and without transplantation can be made for each transplant candidate. This provides clinicians with a tool to help them identify patients who may benefit most from transplantation. Thirdly, our model can be used to produce risk-adjusted comparisons of early mortality outcome between transplant centers without taking donor prognostic information into account. Such an approach is appropriate if one does not wish to adjust for factors relating to donor organ retrieval and the matching of donor to recipient in comparisons of transplant units.
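As an illustration of the Appendix B calculation, the prediction is simply the anti-log-odds of the prognostic score. The intercept and coefficients below are made-up placeholders for exposition, not the Table 3 estimates:

```python
import math

def predicted_mortality(intercept, coefficients, values):
    """Prognostic score = intercept + sum of coefficient x factor-value
    products; predicted 90-day mortality is its anti-log-odds."""
    score = intercept + sum(c * v for c, v in zip(coefficients, values))
    return 1 / (1 + math.exp(-score))

# Hypothetical patient with two factors (placeholder coefficients):
risk = predicted_mortality(-2.2, [0.5, 0.01], [1.0, 20.0])
```

In a spreadsheet, the same calculation is one row of coefficient-by-value products, a sum, and the transformation 1/(1 + exp(-score)).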
ACKNOWLEDGMENTS
The authors thank the following members of the UK & Ireland Liver Transplant Audit and their departments:
Mr Derek Manas and Liesl Smith (Freeman Hospital, Newcastle, UK), Mr Steve Pollard and Olive McGowan (St James’ Hospital, Leeds, UK), Mr Neville Jamieson and Claire Jenkins (Addenbrooke’s Hospital, Cambridge, UK), Mr Keith Rolles and Dr Nancy Rolando (Royal Free Hospital, London, UK), Professor Nigel Heaton and Susan Landymore (King’s College Hospital, London, UK), Mr David Mayer, Professor James Neuberger and Bridget Gunson (Queen Elizabeth Hospital, Birmingham, UK), Professor Oscar Traynor and Mr. Emir Hoti (St.Vincent’s Hospital, Dublin, Republic of Ireland), Mr John Forsythe and Karen Tuck (The Royal Infirmary at Edinburgh, Edinburgh, UK), and Kerri Barber (UK Transplant, Bristol, UK).
REFERENCES
1. Jacob M, Lewsey JD, Sharpin C, et al. Systematic review and validation of prognostic models in liver transplantation. Liver Transpl 2005; 11: 814.
2. Kamath PS, Wiesner RH, Malinchoc M, et al. A model to predict survival in patients with end-stage liver disease. Hepatology 2001; 33: 464.
3. Onaca NN, Levy MF, Sanchez EQ, et al. A correlation between the pretransplant MELD score and mortality in the first two years after liver transplantation. Liver Transpl 2003; 9: 117.
4. Thuluvath PJ, Yoo HY, Thompson RE. A model to predict survival at one month, one year, and five years after liver transplantation based on pretransplant clinical characteristics. Liver Transpl 2003; 9: 527.
5. Adam R, Cailliez V, Majno P, et al. Normalised intrinsic mortality risk in liver transplantation: European liver transplantation registry study. Lancet 2000; 356: 621.
6. Ghobrial RM, Gornbein J, Steadman R, et al. Pretransplant model to predict posttransplant survival in liver transplant patients. Ann Surg 2002; 236: 315.
7. Bilbao I, Armadans L, Lazaro JL, et al. Predictive factors for early mortality following liver transplantation. Clin Transplant 2003; 17: 401.
8. Desai NM, Mange KC, Crawford MD, et al. Predicting outcome after liver transplantation: utility of the model for end-stage liver disease and a newly derived discrimination function. Transplantation 2004; 77: 99.
9. Feng S, Goodrich NP, Bragg-Gresham JL, et al. Characteristics associated with liver graft failure: the concept of a donor risk index. Am J Transplant 2006; 6: 783.
10. Jacob M, Copley LP, Lewsey JD, et al. Pretransplant MELD score and post liver transplantation survival in the UK and Ireland. Liver Transpl 2004; 10: 903.
11. Jacob M, Copley LP, Lewsey JD, et al. Functional status of patients before liver transplantation as a predictor of posttransplant mortality. Transplantation 2005; 80: 52.
12. Atterbury CE, Maddrey WC, Conn HO. Neomycin-sorbitol and lactulose in the treatment of acute portal-systemic encephalopathy. A controlled, double-blind clinical trial. Am J Dig Dis 1978; 23: 398.
13. Ferenci P, Lockwood A, Mullen K, et al. Hepatic encephalopathy – definition, nomenclature, diagnosis, and quantification: final report of the working party at the 11th World Congress of Gastroenterology, Vienna, 1998. Hepatology 2002; 35: 716.
14. Dobson AJ. An introduction to generalized linear models. 2nd ed. London: Chapman & Hall; 2002: 94.
15. Glantz SA, Slinker BK. Primer of applied regression and analysis of variance. New York: McGraw-Hill; 1990: 181.
16. Montgomery DC, Peck EA. Introduction to linear regression analysis. 2nd ed. New York: Wiley; 1992.
17. Chatterjee S, Hadi AS, Price B. Regression analysis by example. 3rd ed. New York: Wiley; 2000.
18. van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 1999; 18: 681.
19. White H. Maximum likelihood estimation of misspecified models. Econometrica 1982; 50: 1.
20. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143: 29.
21. Hosmer DW, Lemeshow S. Applied logistic regression. 2nd ed. New York: Wiley; 2000: 147.
22. Burroughs AK, Sabin CA, Rolles K, et al. 3-month and 12-month mortality after first liver transplant in adults in Europe: predictive models for outcome. Lancet 2006; 367: 225.
23. Banerjee R, Das A, Ghosal UC, Sinha M. Predicting mortality in patients with cirrhosis of liver with application of neural network technology. J Gastroenterol Hepatol 2003; 18: 1054.
24. Haydon GH, Hiltunen Y, Lucey MR, et al. Self-organizing maps can determine outcome and match recipients and donors at orthotopic liver transplantation. Transplantation 2005; 79: 213.
25. Nguyen T, Malley R, Inkelis SH, Kuppermann N. Comparison of prediction models for adverse outcome in pediatric meningococcal disease using artificial neural network and logistic regression analyses. J Clin Epidemiol 2002; 55: 687.
26. Eng J. Predicting the presence of acute pulmonary embolism: a comparative analysis of the artificial neural network, logistic regression, and threshold models. Am J Roentgenol 2002; 179: 869.
27. Ottenbacher KJ, Linn RT, Smith PM, et al. Comparison of logistic regression and neural network analysis applied to predicting living setting after hip fracture. Ann Epidemiol 2004; 14: 551.
28. Ergun U, Serhatioglu S, Hardalac F, Guler I. Classification of carotid artery stenosis of patients with diabetes by neural network and logistic regression. Comput Biol Med 2004; 34: 389.
29. Song JH, Venkatesh SS, Conant EA, et al. Comparative analysis of logistic regression and artificial neural network for computer-aided diagnosis of breast masses. Acad Radiol 2005; 12: 487.
30. Catto JWF, Linkens DA, Abbod MF, et al. Artificial intelligence in predicting bladder cancer outcome: a comparison of neuro-fuzzy modeling and artificial neural networks. Clin Cancer Res 2003; 9: 4172.