A recent article describing significant differences in how individual laboratories preprocess and analyze functional MRI data reiterates that, for complex datasets and techniques, there is no consensus or single “best” approach to data analysis and subsequent interpretation.1 In the age of “big data” built on genome sequences, complex imaging techniques, and high-throughput data analysis, standardization and precise phenotyping of clinical samples are critical. In organ transplantation, phenotyping of the allograft by histology remains the “gold standard” for diagnosing rejection and often dictates the subsequent course of clinical care. Yet the transplant biopsy is an imperfect gold standard because of inherent operational and interpretive limitations.2 It has limited utility for serial monitoring of patients, its scoring and grading show substantial interobserver variation, and it is both invasive and a lagging indicator of the rejection process. Nonetheless, the transplant biopsy remains the standard of care for diagnosis and has been continually refined by the Banff working group. Because the Banff classification of transplant biopsies represents an expert consensus of histology by key opinion leaders, a molecular diagnostics component was added to improve the utility and objectivity of the biopsy.3 Thus, it is worthwhile to rethink the role of the biopsy as the primary diagnostic and instead include it as one component of a larger diagnostic toolbox.
The “for cause” biopsy is a reaction to other clinical evidence of the rejection process; however, by the time histological rejection is confirmed, damage to the organ is already set in motion. One logical solution to this issue has been the use of surveillance/protocol biopsies. Unfortunately, surveillance biopsies are performed in less than half of all US transplant centers (34% of transplant recipients), and even high-volume centers (>200 transplants per y) do not routinely perform them.4 Additionally, surveillance biopsies are usually performed only twice a year because of their invasive nature and associated costs and risks. This has provided a window of opportunity for non- or semiinvasive diagnostic tests that are amenable to routine, frequent monitoring.
The advent of molecular techniques for rejection diagnosis has changed the landscape of kidney transplantation diagnostics. The growing volume of high-throughput data from multiple sources (tissue, blood, and urine) in public databases, together with powerful bioinformatic algorithms, computational infrastructure, and the expertise to handle large datasets, has created great potential to leverage these data to identify posttransplant dysfunction. However, noninvasive or semiinvasive techniques using urine and, specifically, whole blood, although convenient, must be viewed with a healthy dose of skepticism. Several limitations hamper their successful development, validation, and implementation as diagnostic tools, and these inherent problems make them far from ready to replace a biopsy.
Microarray or RNA-sequencing data collected from whole blood or peripheral blood mononuclear cells show high variability even between patients with the same graft status; and, even with the latest machine learning and deep learning tools for classification, predictive accuracy is underwhelming. Additional confounders include patient demographics, treatment regimens, and nonstandardized sample collection and data generation methods, all of which inhibit clear elucidation of the biologic mechanisms orchestrating the rejection process. For example, transcriptional differences in whole blood, a heterogeneous mixture of numerous cell subsets, may reflect changes in the proportions of the individual cell types rather than mirroring the true biology driving rejection; and despite several cellular deconvolution methods in the literature, results of this approach have been mixed.5 The multifactorial nature of the variability requires consideration of clinical (ie, patient characteristics, treatment regimen), biologic (ie, cell composition, level of inflammation), and technical (ie, RNA quality, batch effects) sources, each of which is handled with a different bioinformatic solution (ie, model covariates, CIBERSORT,7 ComBat6). The true test of the efficacy of these approaches will be the measured accuracy of predicted classes in independently collected patient samples.
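To make the deconvolution concern concrete, the logic behind signature-based methods can be sketched as regressing a bulk expression profile against a matrix of cell-type signatures to estimate mixing proportions. The sketch below is purely synthetic: the signature matrix, true cell fractions, and noise level are invented for illustration, and ordinary least squares with a simplex projection stands in for the constrained regression that real tools such as CIBERSORT use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference signature matrix (genes x cell types).
# In practice this would come from sorted-cell expression profiles;
# here the values are synthetic.
n_genes, n_cell_types = 200, 4
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_cell_types))

# Simulate a whole-blood sample as a known mixture of the cell types,
# plus measurement noise.
true_fractions = np.array([0.55, 0.25, 0.15, 0.05])
bulk = signatures @ true_fractions + rng.normal(0.0, 0.05, size=n_genes)

# Estimate the fractions by least squares, then project onto the simplex
# (clip negatives, renormalize to sum to 1) -- a crude stand-in for the
# non-negativity constraints used by real deconvolution methods.
est, *_ = np.linalg.lstsq(signatures, bulk, rcond=None)
est = np.clip(est, 0.0, None)
est = est / est.sum()

print(np.round(est, 2))
```

On clean synthetic data like this, the estimated fractions land close to the true ones; with real whole-blood data, collinear signatures, batch effects, and unmodeled cell states make the problem far harder, which is why published results have been mixed.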
Mirroring the use of machine learning tools in diagnostic development, digital pathology uses deep learning algorithms, which are well suited to analyzing and classifying digital images of the biopsies themselves. In contrast to the Banff grading system, which was developed from a compilation of clinicians’ observations and findings, current deep learning methods have no a priori knowledge of the characteristics of each phenotype and rely on emergent properties of the training data for subsequent classification.8 Thus, the principal limitation of these methods is how comprehensively the training data represent the actual phenotypes and their prevalence in the general population. With enough confidence in the performance of learning-based methods from both digital pathology and molecular diagnostics, there is great potential to augment the pathologist’s reading with so-called “informed reads.” This may vastly reduce diagnostic and prognostic errors that are known but currently unavoidable in clinical care.
The recent addition to the diagnostic armamentarium from the Banff Molecular Diagnostics Working Group, the Banff Human Organ Transplant panel, is a noteworthy first step toward combining histological and molecular diagnosis.9 However, the fact remains that this gene panel was derived against the benchmark of the same flawed gold standard, whose specificity is impossible to gauge because no “true” standard is available against which to make an informed decision. Therefore, to make such efforts more meaningful and less biased, a strategy to reduce the fundamental biases associated with collecting, analyzing, and interpreting big data is needed to provide a standardized platform for designing improved diagnostics.
One way to develop and implement these improved diagnostics is the use of pipeline generation tools, with the objective of maximizing reproducibility in future datasets. Another prerequisite is extensive and rigorous validation of diagnostic markers using independently collected, “all-inclusive” datasets to demonstrate their generalizability across a diverse set of samples.10 There should also be consideration for diagnostic workflows that are flexible and accommodating of hybrid methodologies, because studies are sometimes too varied to fit rigidly into a given pipeline.
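As a minimal illustration of the independent-validation principle, the sketch below trains a simple nearest-centroid classifier on one simulated “discovery” cohort and then measures accuracy on a separately simulated “validation” cohort that carries a different batch offset. Everything here is a synthetic assumption (the gene count, effect sizes, cohort sizes, and batch shift are invented); a real workflow would lock down the pipeline and test it on prospectively collected patient samples.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_cohort(n, shift, rng):
    """Simulate n expression profiles over 50 genes for two classes
    (0 = stable graft, 1 = rejection), with a cohort-wide batch shift
    to mimic independently collected data. Values are synthetic."""
    labels = rng.integers(0, 2, size=n)
    effect = np.zeros(50)
    effect[:10] = 2.0  # 10 hypothetical rejection-associated genes
    X = rng.normal(0.0, 1.0, size=(n, 50)) + labels[:, None] * effect + shift
    return X, labels

# Discovery cohort: fit the classifier (one centroid per class).
X_train, y_train = simulate_cohort(120, shift=0.0, rng=rng)
centroids = np.stack([X_train[y_train == k].mean(axis=0) for k in (0, 1)])

# Independent validation cohort with a different batch offset:
# accuracy here, not training fit, is the meaningful benchmark.
X_val, y_val = simulate_cohort(80, shift=0.5, rng=rng)
dists = np.linalg.norm(X_val[:, None, :] - centroids[None, :, :], axis=2)
y_pred = dists.argmin(axis=1)

accuracy = (y_pred == y_val).mean()
print(f"independent validation accuracy: {accuracy:.2f}")
```

The design choice worth noting is that the batch shift is applied only to the validation cohort: a marker panel whose accuracy survives that shift is behaving generalizably, whereas one that only performs well on resplits of the discovery data is not.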
Thus, self-assessment within the transplant community regarding concerns about data reproducibility will be necessary to achieve vastly improved transplant diagnostics.
1. Botvinik-Nezer R, Holzmeister F, Camerer CF, et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature. 2020;582:84–88.
2. Broecker V, Mengel M. The significance of histological diagnosis in renal allograft biopsies in 2014. Transpl Int. 2015;28:136–143.
3. Haas M, Sis B, Racusen LC, et al. Banff 2013 meeting report: inclusion of c4d-negative antibody-mediated rejection and antibody-associated arterial lesions. Am J Transplant. 2014;14:272–283.
4. Lee DM, Abecassis MM, Friedewald JJ, et al. Kidney graft surveillance biopsy utilization and trends: results from a survey of high-volume transplant centers. Transplant Proc. [Epub ahead of print. June 21, 2020]. doi: 10.1016/j.transproceed.2020.04.1816.
5. Zaitsev K, Bambouskova M, Swain A, et al. Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures. Nat Commun. 2019;10:2209.
6. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–127.
7. Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12:453–457.
8. Rashidi HH, Tran NK, Betts EV, et al. Artificial intelligence and machine learning in pathology: the present landscape of supervised methods. Acad Pathol. 2019;6.
9. Mengel M, Loupy A, Haas M, et al. Banff 2019 meeting report: molecular diagnostics in solid organ transplantation-consensus for the Banff Human Organ Transplant (B-HOT) gene panel and open source multicenter validation. Am J Transplant. 2020;20:2305–2317.
10. Kurian SM, Whisenant T, Mas V, et al. Biomarker guidelines for high-dimensional genomic studies in transplantation: adding method to the madness. Transplantation. 2017;101:457–463.