Kidney transplantation offers improved survival and quality of life for many patients with kidney failure compared with remaining on dialysis. Despite the huge gap between the demand for and supply of donor kidneys, discard rates remain high in the United States (approximately 18%–20%, roughly double those of Australia and the United Kingdom), thus forgoing the transplant opportunity for many patients on dialysis. The shortage of donor organs remains the critical barrier to timely transplantation.
An important aspect of an efficient allocation system is to increase acceptance of marginal donor kidneys for the appropriate patients and thereby increase kidney utilization. To this end, Barah and Mehrotra,1 in this issue of Transplantation, discussed the concept of a “fast-track policy” for the US allocation system. Using novel machine learning (ML) strategies, the authors accurately predicted the characteristics of donor kidneys at risk of being discarded and devised a fast-track policy to reallocate these kidneys, offering them to candidates with a clear survival benefit. A similar program developed in the United Kingdom2 achieved effective utilization of previously declined kidneys, with patient and graft outcomes similar to those of the standard allocation scheme.
The authors examined several risk-modeling algorithms, including logistic regression (LR), a regularized variant of regression known as the least absolute shrinkage and selection operator (Lasso), and modern ML models such as random forest (RF), neural networks, and support vector machines. The authors found that higher Kidney Donor Profile Index and donor terminal serum creatinine were the key predictors of discard in both RF- and regression-based models. The authors also concluded that among donors with a Kidney Donor Profile Index exceeding 85%, the RF model achieved higher balanced accuracy than LR (0.73 versus 0.66).
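Balanced accuracy matters here because discarded kidneys are the minority class, so raw accuracy can look deceptively good for a model that simply predicts "not discarded" for every organ. A minimal sketch, with entirely illustrative toy data (not the study's data), makes the distinction concrete:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on the discard class) and specificity.

    Unlike raw accuracy, this is not inflated when one class
    (e.g., non-discarded kidneys) dominates the dataset.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Toy example: 2 of 10 kidneys discarded; the model predicts no discards.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10

raw = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"raw accuracy {raw:.2f}, balanced accuracy "
      f"{balanced_accuracy(y_true, y_pred):.2f}")
# raw accuracy 0.80, balanced accuracy 0.50
```

The degenerate model scores 0.80 raw accuracy yet only 0.50 balanced accuracy (chance level), which is why balanced accuracy is the more informative headline number for discard prediction.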
A key strength of the article is the model comparison process that accompanied the model development for kidney discards. Such model comparison processes should be encouraged when assessing and conducting prediction modeling in all aspects of scientific research. To assess the validity of the modeling, the authors used repeated cross-validation (CV) to estimate accuracy and to measure the variability associated with that estimate. This repeated CV procedure provided not only point estimates of the model performance metrics (balanced accuracy, area under the curve) but also the variability associated with these metrics, ensuring a better grasp of the differences and/or improvements among the different models. The authors also highlighted that evaluation of ML models should not be limited to a single metric such as area under the curve or C-statistics. Instead, investigators should examine a battery of performance metrics (F1 score, balanced accuracy, precision, and recall). This, together with the repeated CV strategy, provides a better overview of a model’s performance.
While ML holds promise for a personalized approach to transplantation medicine, the lack of a standardized approach to model evaluation in medical science has raised many issues, including the concept of reproducibility.3 We suggest that multiple-level performance assessments should be considered for an independent evaluation of all ML models. In the case of predicting kidney discard, 3 levels of assessment should be performed: (1) in silico assessment within a single dataset, (2) generalizability assessment with independent and/or multiple datasets, and (3) practical assessment in deployment. The authors conducted an extensive in silico assessment on the US data and concluded that RF outperformed the other models. Using a rolling forecasting procedure, the authors then evaluated the generalizability of the modeling by segregating data into multiple donor time periods and using previously observed values to predict future events. Notably, under this second level of assessment, the performance advantage of RF over LR was not consistent between the 2016 and 2018 data, with the RF model showing similar or lower accuracy than LR on the earlier data.
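The rolling forecasting idea (train only on previously observed donor periods, then predict the next one) can be sketched as a simple split generator. The year labels below are illustrative and do not reproduce the authors' exact windows:

```python
def rolling_splits(years):
    """Yield (train_years, test_year) pairs in temporal order.

    Each period is predicted using only the periods observed before it,
    mimicking how a discard-prediction model would be used in deployment,
    and exposing any drift in performance across donor eras.
    """
    years = sorted(set(years))
    for i in range(1, len(years)):
        yield years[:i], years[i]

for train, test in rolling_splits([2016, 2017, 2018, 2019]):
    print(f"train on {train} -> evaluate on {test}")
# train on [2016] -> evaluate on 2017
# train on [2016, 2017] -> evaluate on 2018
# train on [2016, 2017, 2018] -> evaluate on 2019
```

Unlike random CV folds, these splits never let the model see the future, which is precisely why they can reveal era-dependent differences such as the LR-versus-RF inconsistency noted above.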
Finally, to ensure the applicability and pertinence of the study findings in real-life settings, future research should focus on developing an evaluation process within a simulated allocation framework that allows ongoing assessment of posttransplant outcomes, including the expected number of life-years saved and improvements in the overall quality of life of our patients. With the advancement of data science, the development of such a realistic simulator is now feasible, allowing us to embrace new research opportunities in big data analytics within the discipline of clinical transplantation.
1. Barah M, Mehrotra S. Predicting kidney discard using machine learning. Transplantation. [Epub ahead of print. January 8, 2021]. doi: 10.1097/TP.0000000000003620.
2. White AD, Roberts H, Ecuyer C, et al. Impact of the new fast track kidney allocation scheme for declined kidneys in the United Kingdom. Clin Transplant. 2015;29:872–881.
3. Stupple A, Singerman D, Celi LA. The reproducibility crisis in the age of digital medicine. NPJ Digit Med. 2019;2:2.