Tang, Hongying*; Hurdle, John F.†; Poynton, Mollie†; Hunter, Cheri‡; Tu, Ming‡; Baird, Bradley C.§; Krikov, Sergey†; Goldfarb-Rumyantzev, Alexander S.*
Kidney transplantation is the preferred renal replacement modality for patients with end-stage renal disease.1,2 Because of advances in surgical technology and immunosuppressive therapy, short-term graft survival has improved consistently. Long-term graft survival, however, remains a problem.3 In addition, the severe shortage of donor organs makes it necessary to optimize donor and recipient match variables to maximize graft survival time.
Prognosis of kidney transplantation is important clinically in designing individual strategies for patient management. Accurate identification of patients most likely to benefit from transplantation should improve outcomes by optimizing the selection of donor-recipient pairs, optimizing modifiable factors, and choosing individualized renal replacement therapy and immunosuppressive therapy. However, prediction of kidney transplantation outcomes presents a challenge. Kidney transplantation outcome depends on multiple factors, including the recipient's age, comorbidities, height, weight, and panel reactive antibody (PRA) level. Although individual risk factors related to graft outcome have been studied in the past,4–8 our understanding of how these risk factors affect transplant outcome, individually and jointly, is poor. Existing prediction models use predictors available in the posttransplant period.9–12 There is little experience in predicting graft survival using pretransplant variables.
The literature demonstrates that 3-year graft survival can be accurately predicted from information available during the pretransplant period.13 In the past, our group has developed a set of prediction models based on the US Renal Data System (USRDS) data. These prediction models are among the few that have been built from variables available before transplantation. Continuing that work, our group has built tree-based classification models to predict 1, 3, 5, 7, and 10-year graft survival. We have shown that tree-based classifiers generally perform as well as or better than other modeling strategies.13,14 The prediction models were evaluated on the USRDS data using the area under the receiver operating characteristic curve (AUROC) and showed promising results.
In this project, our goal was to evaluate the applicability of our previously built prediction models to a local dataset. The national registry data, used for many epidemiologic studies, came from multiple sources with information sometimes poorly validated or missing. As local clinical data are associated with direct patient care, the data are usually considered to be of relatively higher quality and accuracy. More importantly, local data might statistically differ from the national data due to demographic or cultural differences. As implementation of the prediction models in clinical practice will directly affect the wellbeing of patients, evaluation of prediction models using local clinical data is critical.
We used two data sources: the clinical data stored in our Enterprise Data Warehouse and transplant-related data submitted to the United Network for Organ Sharing (UNOS) by the Solid Organ Transplant program at the University of Utah Health Science Center (UUHSC). There were 960 unique kidney transplantation recipients at the UUHSC from January 1, 1990, to December 31, 2004. This study included the 942 unique kidney transplant recipients who could be linked between the two data sources by the patient's first name, last name, date of birth, gender, and date of transplantation. For each patient, only the most recent (but not necessarily the first) transplantation was included in the study. Death-censored graft survival was used as the outcome.
The prediction models were built on the USRDS data using S-plus professional version 6.2. Five classification tree prediction models were built to predict graft outcome at 1, 3, 5, 7, and 10 years posttransplant. The predictors were selected from the list of pretransplant variables based on a literature review15–17 and on previous studies by our group.13,18–20 Different subsets of predictors were included in the prediction models to examine the predictive power of each variable. The following predictors were used in the final model evaluation:
1. Recipient variables: age; gender; race; height; weight; history of hypertension; diabetes; unstable angina; cardiovascular or peripheral vascular diseases; predominant dialysis modality (hemodialysis, peritoneal dialysis, transplantation, or none); total time on waiting list; dialysis modality used before transplantation for at least 60 days; primary source of payment for treatment; whether the recipient has US citizenship; a comorbidity score (as described later); peak PRA level; and most recent PRA level.
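The linkage and "most recent transplantation" rules described above can be sketched as follows. This is a minimal illustration only, assuming exact matching on normalized fields; the field names (`first_name`, `transplant_date`, and so on) and the sample records are hypothetical and do not come from the study data, and the source does not describe how near-matches were handled.

```python
from datetime import date


def link_key(record):
    """Build a normalized key from the five linkage fields named in the text."""
    return (
        record["first_name"].strip().lower(),
        record["last_name"].strip().lower(),
        record["date_of_birth"],
        record["gender"].strip().lower(),
        record["transplant_date"],
    )


def link_records(warehouse_rows, unos_rows):
    """Return (warehouse_row, unos_row) pairs whose keys match exactly."""
    unos_by_key = {link_key(row): row for row in unos_rows}
    linked = []
    for row in warehouse_rows:
        match = unos_by_key.get(link_key(row))
        if match is not None:
            linked.append((row, match))
    return linked


def most_recent_per_patient(rows):
    """Keep only the latest transplantation for each patient."""
    latest = {}
    for row in rows:
        patient = link_key(row)[:4]  # name, date of birth, gender
        current = latest.get(patient)
        if current is None or row["transplant_date"] > current["transplant_date"]:
            latest[patient] = row
    return list(latest.values())


def _row(first, last, dob, gender, tx_date):
    return {"first_name": first, "last_name": last, "date_of_birth": dob,
            "gender": gender, "transplant_date": tx_date}


# Hypothetical records for illustration only.
warehouse = [
    _row("Ann ", "Lee", date(1960, 5, 1), "F", date(1995, 3, 2)),
    _row("Ann", "Lee", date(1960, 5, 1), "F", date(2001, 7, 9)),
    _row("Bob", "Ray", date(1950, 1, 1), "M", date(1999, 1, 1)),
]
unos = [
    _row("ann", "LEE", date(1960, 5, 1), "f", date(1995, 3, 2)),
    _row("ann", "LEE", date(1960, 5, 1), "f", date(2001, 7, 9)),
]

linked = link_records(warehouse, unos)
latest = most_recent_per_patient([w for w, _ in linked])
```

In this sketch the unmatched record is simply dropped, which mirrors the exclusion of recipients who could not be linked between the two sources.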
2. Donor variables: donor type; age; gender; race; height; and weight.
3. Transplantation parameters: the degree of human leukocyte antigen (HLA) match; donor cold storage time; history of previous transplantation; total number of transplantations; and whether the recipient used mycophenolate mofetil in immunosuppressive therapy.
To adjust for patient comorbidities, we formed a comorbidity coefficient similar to the Charlson comorbidity index.21 Each of the comorbid conditions available in the dataset (from the CMS form 2728) contributed one point toward the composite index, with additional points given for older age. We previously used a similar approach to summarize comorbidities with abbreviated versions of the comorbidity indices of Davies et al.22 and Charlson et al.21 These abbreviated indices were validated by their strong association with clinical outcomes.18,23 A comorbidity score of 0 means that a recipient had none of the diseases used to calculate the comorbidity score (i.e., peripheral vascular disease, unstable angina or other cardiovascular disease, diabetes, and hypertension).
Some of the local data had one or more missing attributes. Before using the local data, we used imputation to fill gaps in the data. A missing height or weight of a recipient or a donor was replaced with the value suggested for that age by the Centers for Disease Control and Prevention (CDC) growth chart. Missing values of the other continuous predictors (peak PRA, most recent PRA, and HLA match) were replaced with the mean value of the respective predictor. If a recipient received a living donor kidney and had no recorded waiting time, the waiting time was set to 0.
We built a suite of applications to evaluate the prediction models. An extract-transform-load tool was developed to gather the data from multiple local data sources into a final table. A Prediction Rule Parser was built to transform the S-plus prediction rules from free-text format to a table format, so that they could be used in a decision support application. A prediction engine was developed to apply the parsed prediction rules to the local data.
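The additive structure of the comorbidity score can be illustrated with a short sketch. The text does not state the age bands or how many points each band adds, so the age rule is left as a caller-supplied function rather than guessed; the condition keys and the example patient are hypothetical.

```python
# The four disease groups listed in the text; key names are assumptions.
CONDITIONS = (
    "peripheral_vascular_disease",
    "cardiovascular_disease",  # unstable angina or other cardiovascular disease
    "diabetes",
    "hypertension",
)


def comorbidity_score(patient, age_points):
    """One point per recorded condition, plus points returned by age_points(age).

    age_points is supplied by the caller because the source does not give
    the age thresholds that were used.
    """
    disease_points = sum(1 for name in CONDITIONS if patient.get(name, False))
    return disease_points + age_points(patient["age"])


def no_age_points(age):
    """Placeholder rule that adds nothing for age."""
    return 0


# Hypothetical patient for illustration only.
example_patient = {"age": 40, "diabetes": True, "hypertension": True}
example_score = comorbidity_score(example_patient, no_age_points)
```

With the placeholder age rule, a patient with none of the four conditions scores 0, which matches the meaning of a zero score given in the text.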
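A minimal sketch of two of the imputation rules described above (mean imputation for PRA and HLA match, and a zero waiting time for living donor recipients). The field names and records are hypothetical, missing values are assumed to be stored as `None`, and the growth-chart lookup for height and weight is omitted because it requires the CDC reference tables.

```python
from statistics import mean

# Continuous predictors that the text says were mean-imputed (names assumed).
MEAN_IMPUTED = ("peak_pra", "recent_pra", "hla_match")


def impute(records):
    """Return copies of the records with gaps filled by the rules in the text."""
    means = {}
    for field in MEAN_IMPUTED:
        observed = [r[field] for r in records if r[field] is not None]
        means[field] = mean(observed)

    filled = []
    for original in records:
        record = dict(original)  # leave the input untouched
        for field in MEAN_IMPUTED:
            if record[field] is None:
                record[field] = means[field]
        # Living donor recipients with no recorded waiting time get 0.
        if record["donor_type"] == "living" and record["waiting_days"] is None:
            record["waiting_days"] = 0
        filled.append(record)
    return filled


# Hypothetical records for illustration only.
records = [
    {"peak_pra": 10, "recent_pra": 5, "hla_match": 2,
     "donor_type": "deceased", "waiting_days": 400},
    {"peak_pra": None, "recent_pra": 5, "hla_match": 4,
     "donor_type": "living", "waiting_days": None},
    {"peak_pra": 20, "recent_pra": None, "hla_match": None,
     "donor_type": "deceased", "waiting_days": 250},
]
imputed = impute(records)
```

Computing the means from observed values only, before any value is filled in, keeps the imputed values from influencing one another.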
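To make the idea of a table-driven prediction engine concrete, the sketch below applies a small rule table to one patient record. The table layout, the variable names, and the probabilities are all hypothetical; the source does not describe the parsed format, and no attempt is made here to parse actual S-plus output.

```python
import operator

OPS = {"<": operator.lt, ">=": operator.ge, "==": operator.eq}

# Hypothetical rule table: each row is one leaf of a classification tree,
# given as the conditions on the path to that leaf and the predicted
# probability of graft failure. All values are illustrative only.
RULES = [
    {"conditions": [("donor_type", "==", "living")],
     "p_failure": 0.10},
    {"conditions": [("donor_type", "==", "deceased"), ("recipient_age", "<", 50)],
     "p_failure": 0.20},
    {"conditions": [("donor_type", "==", "deceased"), ("recipient_age", ">=", 50)],
     "p_failure": 0.35},
]


def predict(patient, rules=RULES):
    """Return the predicted probability from the first rule the patient satisfies."""
    for rule in rules:
        if all(OPS[op](patient[field], value)
               for field, op, value in rule["conditions"]):
            return rule["p_failure"]
    raise ValueError("no rule matched this patient")


prediction = predict({"donor_type": "deceased", "recipient_age": 62})
```

Because the leaves of a tree partition the predictor space, exactly one rule should match each complete record; the error branch only guards against an incomplete rule table.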
Data Analysis and Prediction Modeling
The data characteristics of the study population at the UUHSC and the USRDS were analyzed and compared. The χ2 test was used for categorical variables, and the t-test was used for continuous variables.
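For readers who want to see the two tests in miniature, the sketch below computes a Pearson chi-square test for a 2x2 table and a two-sample t statistic using only the Python standard library. Two assumptions are made that the source does not state: the t statistic uses Welch's unequal-variance form, and its p-value uses a normal approximation, which is reasonable only for large samples. The counts and values are illustrative, not study data.

```python
from math import erfc, sqrt
from statistics import mean, variance


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def welch_t(sample_a, sample_b):
    """Welch t statistic with a large-sample (normal) two-sided p-value."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    t = (mean(sample_a) - mean(sample_b)) / se
    return t, erfc(abs(t) / sqrt(2))


# Illustrative inputs only.
chi2_stat, chi2_p = chi_square_2x2([[20, 10], [10, 20]])
t_stat, t_p = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
```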
The prediction model used in this project was previously developed by our group and is described elsewhere.24 The model was based on the regression tree approach. Tree-based model analysis has been extensively described elsewhere25 and was previously used by our group in the prediction of renal function of diabetics14 and in the prediction of kidney allograft survival.13 Briefly, tree-based modeling, also called classification and regression trees, is a form of binary recursive partitioning that systematically separates data into two groups using regression of a single factor on the outcome. Unlike many traditional methods (e.g., Support Vector Machines, Naïve Bayes), tree-building techniques are ideally suited for the development of a reliable, self-explaining clinical decision support system and can be used to classify new patients into categories according to predicted allograft outcomes.25
Tree-based modeling requires relatively little input from the analyst, and the result is presented as a binary tree that is easy for a nonstatistician to interpret. However, the model is limited in that the partitioning method produces predicted values in a discrete format, which may not make full use of the information that continuous variables can provide.25
We used a randomly selected two thirds of the USRDS data to train the models. To test the performance of the models, the prediction algorithms were applied to the testing dataset (the remaining one third of the USRDS data), as well as to the data from the UUHSC Transplant Center. The predicted probabilities of graft failure were generated and compared with the actual graft outcomes.
Discrimination of the prediction models was assessed by calculating the AUROC, obtained by plotting the “sensitivity” of the model against “1-specificity.” Sensitivity and specificity were calculated after the model output was converted from the probability of graft survival to a binary variable describing predicted survival (yes/no) at different probability cut points. Each cut point yielded one set of predictions with an associated sensitivity and specificity, and these values were then plotted as a ROC curve. The AUROC is a numeric value representing the discriminatory ability of the model: a value of 1 indicates perfect discrimination, and 0.5 is equivalent to random chance. A software tool named “nonparametric comparison of areas under correlated ROC curves,” provided by the SAS support website, was used for AUROC comparison.
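One step of binary recursive partitioning can be sketched as a search for the single threshold that best separates the outcomes. The sketch uses Gini impurity as the split criterion, which is an assumption made for illustration; the source does not state which criterion the S-plus models used. The variable values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best split 'value < threshold'.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(list(labels)))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical cold storage times (hours) and graft failure labels.
hours = [2, 4, 6, 20, 25, 30]
failed = [0, 0, 0, 1, 1, 1]
split = best_split(hours, failed)
```

A full tree would repeat this search over every predictor, apply the best split, and then recurse on each of the two resulting groups until a stopping rule is met.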
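A random two-thirds/one-third partition of this kind can be sketched in a few lines. The fixed seed and the use of integer record identifiers are assumptions made so that the example is reproducible; the source does not describe how the random selection was implemented.

```python
import random


def split_two_thirds(records, seed=0):
    """Shuffle a copy of the records and return (training, testing) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


# Hypothetical record identifiers.
training, testing = split_two_thirds(range(9))
```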
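The cut-point procedure described above can be sketched directly: each distinct predicted probability serves as a cut point, each cut point gives one (1-specificity, sensitivity) pair, and the area under the resulting curve is obtained with the trapezoidal rule. The scores and outcomes below are hypothetical, the function names are not from the source, and this sketch does not reproduce the SAS or STATA procedures used in the study. It requires at least one case with each outcome.

```python
def roc_points(scores, outcomes):
    """Return ROC points as (1 - specificity, sensitivity) pairs.

    scores: predicted probability of the event for each case.
    outcomes: 1 if the event occurred, 0 otherwise.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        # Cases with score >= cut are predicted to have the event.
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= cut and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auroc(scores, outcomes):
    """Area under the ROC curve by the trapezoidal rule."""
    points = roc_points(scores, outcomes)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical predictions for six cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5]
outcomes = [1, 1, 0, 1, 0, 0]
area = auroc(scores, outcomes)
```

For these six cases, eight of the nine possible event/non-event pairs are ranked correctly, so the area is 8/9, which illustrates the interpretation of the AUROC as the probability that a randomly chosen event case receives a higher score than a randomly chosen non-event case.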
Data analysis and calculations were performed with SAS version 9.1 (SAS Institute, Cary, NC). STATA version 9.0 (Stata Corporation, College Station, TX) was used to calculate and compare the AUROCs.
Comparison of Baseline Characteristics
Before the prediction models were tested on local data, the USRDS dataset and the UUHSC dataset were compared by means of descriptive statistics to determine whether the two datasets differed in their baseline characteristics. All 30 data attributes (including the outcome variable, graft survival) used in the prediction models were compared between the UUHSC and the USRDS. Fourteen data attributes were found to be significantly different (Table 1).
After censored patient data were excluded, the numbers of patients with a long enough follow-up to reach the graft outcomes (“failure” or “survive”) at 1, 3, 5, 7, and 10 years posttransplant in the USRDS dataset were 92,844, 73,672, 58,005, 46,791, and 35,279, respectively. The numbers of patients whose graft outcomes were known at 1, 3, 5, 7, and 10 years in the local UUHSC dataset were 854, 635, 462, 325, and 213, respectively.
Both data sources showed graft survival rates consistently decreasing with time. The study population at the UUHSC showed significantly higher graft survival rates than the population in the USRDS (p < 0.001 at all five time points). Other baseline data also differed between the datasets.
The proportion of living donors at the UUHSC was nearly twice as high as in the USRDS at every time point (p < 0.001 at all five time points). In agreement with that, the study population at the UUHSC showed significantly higher HLA match levels (p < 0.001 at all five time points). The UUHSC dataset showed a significantly higher proportion of patients with lower comorbidity (score = 0) and smaller fractions of patients with diabetes mellitus compared with the USRDS dataset (p < 0.01 and p < 0.005, respectively, at all five time points). The proportion of white recipients in the USRDS dataset was approximately 0.70; in contrast, in the UUHSC population, it was >0.93 at every time point (p < 0.001 at all five time points). Similarly, the proportion of white donors in the USRDS dataset remained approximately 0.84, whereas in the UUHSC dataset, it was 0.96–0.98 at different time points (p < 0.001). The fraction of male donors in the USRDS was significantly higher than in the UUHSC dataset. The proportion of recipients treated with mycophenolate mofetil in the UUHSC was significantly greater (p < 0.001) than that in the USRDS by 13%–20% at all five time points.
The proportion of patients who had never had a previous kidney transplant was higher in the UUHSC dataset. The differences were significant at 1, 3, 5, and 7 years posttransplant (p < 0.01) and showed a trend toward significance at 10 years posttransplant (p = 0.06).
The proportion of recipients who had a donor cold storage time <6 hours was significantly lower in the USRDS data; the differences were as large as approximately 30% at some time points (p < 0.001 at all five time points). Patients in the UUHSC dataset were generally younger by 5–10 years (p < 0.001) at any given time point. The recipients' PRA levels in the UUHSC dataset were higher by 3%–6% than in the USRDS dataset (p < 0.01) at every time point. A significantly greater proportion of patients in the UUHSC data were US citizens compared with the USRDS population (p < 0.05 at all five time points).
Comparison of Discrimination of the Prediction Models
Table 2 summarizes the AUROC values of the prediction models evaluated on the USRDS and the UUHSC datasets for the five time points. The AUROC values for the USRDS were 0.59, 0.64, 0.76, 0.91, and 0.97, respectively, for the five time points studied, and the values for the UUHSC were 0.54, 0.58, 0.58, 0.61, and 0.70, respectively. All AUROC values for the predictions generated in the UUHSC dataset were lower than the corresponding values for the USRDS data, especially at 7 and 10 years posttransplant. The differences were significant for all five time points (p < 0.001).
This study is perhaps the first to demonstrate that a local kidney transplant recipient population can have different, clinically important characteristics from those in the national kidney transplant population. The graft survival rate at the UUHSC is significantly higher (p < 0.001) than that at the USRDS at each of the five time points (1, 3, 5, 7, and 10 years posttransplant). The difference is as high as 25%. It is intriguing to explore what is behind such a difference. In general, graft survival rate depends on the quality of the donor kidney, the recipient's health condition before and after transplant, and transplant parameters. Most of the data attributes that are significantly different between the local (UUHSC) and the national (USRDS) datasets are expected to contribute to better graft survival outcome. Previous studies have observed a significantly higher rate of acute rejection among the recipients with lower donor-recipient HLA match.26 The apparently higher HLA match (p < 0.001 at all time points) at the UUHSC must contribute to the higher survival rate. In addition, improvement in survival has been found to be associated with living donors.27–29 The higher proportion of living donors at the UUHSC should thus be one of the causative factors for the higher survival rate. Long storage time of donor kidneys would compromise the quality of the transplanted donor kidneys and, thus, the transplant outcome.30 At the UUHSC, a higher proportion of recipients had donor cold storage time <6 hours; the quality of transplanted kidneys is expected to be better compared with that in the USRDS. 
The literature reports that recipients having no previous kidney transplant history would be expected to have longer graft survival than those having previous kidney transplant(s).19,31 The proportion of the recipients having had no previous kidney transplant history at the UUHSC is much higher than the USRDS, which might be associated with the longer graft survival at the UUHSC. Some other data characteristics indicate that the recipients at the UUHSC were generally healthier. For instance, local recipients had a lower average age, a lower proportion of recipients with a history of diabetes (p < 0.005 at all time points), a lower proportion of recipients with a history of hypertension (p < 0.05 at 1 and 3 years), and a higher proportion of recipients with comorbidity score equal to 0.
The percentages of both white donors and white recipients at the UUHSC are much higher than in the USRDS. The distribution of donors and recipients of other races, such as Asian and Native American, is comparable between the USRDS and the UUHSC. Thus, the ratio of African Americans to whites is the main racial difference between the USRDS and the UUHSC. Previous research has demonstrated that white recipients have better graft survival than African American recipients.32–34 Arguably, the higher survival at the UUHSC is very likely associated with the higher proportion of white donors and recipients.
Data analysis shows that the proportion of male donors in the USRDS is significantly higher than that at the UUHSC. The literature reports that donor gender has not shown a significant effect on graft survival.35 It is uncertain whether the donor gender difference contributes to the higher survival rate at the UUHSC.
The mean values of peak PRA in both datasets were <20%. A PRA <20% is usually considered to be associated with better graft outcome.36 Thus, PRA levels should not contribute to the differences in survival.
Perhaps the most interesting results of this project concern the application of the prediction algorithm developed on USRDS data to the dataset from the UUHSC. The prediction models were trained on two thirds of the national USRDS dataset and then tested on the remaining one third of the USRDS dataset and on local data. The models performed better when tested on the one third of the national USRDS data than on the local UUHSC data. That result cautions against excessive optimism when prediction models developed on a large collection of data are recommended for practical use in local healthcare institutions. We suggest that this discrepancy in performance is very likely due to the differences in the data characteristics between the two datasets. As described previously, 14 of the 30 predictors in the local data were significantly different from the national data. However, the fact that the populations studied are different should not diminish the concern about applying prediction models to other local populations. As in Utah, the population might be different in Idaho, in Minnesota, or in Georgia. To the extent that such a pattern of difference exists at other transplant centers, national models must be applied locally with considerable scrutiny. Nevertheless, in recent literature, it is common to advocate local implementation of prediction models derived from large national datasets or even from other local institutions.
In general, this article is a warning against the assumption that models that work on a large dataset should by default work for local data as well. At the same time, prediction modeling is a valid and important approach to data analysis in medicine. We believe that the key is thorough external validation of the models, including a test run on local data before implementation in a particular local setting. Good performance should facilitate implementation, whereas inadequate performance should be addressed by adjusting the model. In the recent article by Kasiske et al.,37 the authors validated their prediction model on several local datasets extracted from the USRDS. That might be one approach before implementation. However, it does not protect against differences in data quality, data format, and errors, as local data collection might be substantially different from that of the national datasets.
There are several limitations to this study. It is a retrospective study; therefore, we had to use data that had already been collected. Although we tried to find clean and complete data from all available data sources, we still had to deal with missing information. Imputation for the USRDS database was performed using a multiple imputation procedure for continuous variables. For the local UUHSC dataset, we used mean values in the dataset for the PRA levels and number of HLA matches; also, we used CDC estimates for height and weight by age. Not more than 20% of entries were missing for any one of these four continuous variables. As a sensitivity analysis, and to examine whether imputing the missing data would introduce bias, we also tested the prediction models only on recipient records containing no missing data (439 distinct patients). The nationally derived prediction models performed similarly on local data with and without imputed values, supporting the use of imputation in this project.
In this work, the UUHSC data were studied as the exclusive local data source. There might be patients who had specific diagnoses established outside the UUHSC or who might have been on dialysis outside the University of Utah Dialysis Program. These patients could not be treated as missing because it was uncertain whether there were other records outside the UUHSC. These patients were assumed to have no history of particular diseases or dialysis if no such records were found at the UUHSC or in the UNOS report. Although this assumption may introduce bias, it was considered to be the most reasonable course of action. Another possible source of bias is the difference in the fraction of records with known outcome used for validation. Specifically, at year 10, 38% of cases in the USRDS group had a known outcome, compared with only 25% of records in the local data. In other words, a higher proportion of patients in the UUHSC dataset were either lost to follow-up or died compared with the USRDS dataset. However, even assuming that there is informative censoring and that the population we used for validation is skewed, it should not change the conclusion. We would still expect the model to be robust enough to perform well even in a fraction of the population, as ultimately the prediction models are to be used for individual predictions.
Renal transplantation outcome prediction models derived from, and validated on, national data may perform differently on local data. This suggests that adopting wholesale a prediction model developed on a large national dataset for local clinical decision support purposes should be done with caution.
The data reported in this study have been supplied by the USRDS as well as the United Network for Organ Sharing as the contractor for the Organ Procurement and Transplantation Network (OPTN). The interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as an official policy or interpretation of the OPTN or US government.
1. Ojo A, Wolfe RA, Agodoa LY, et al: Prognosis after primary renal transplant failure and the beneficial effects of repeat transplantation: Multivariate analyses from the United States Renal Data System. Transplantation 66: 1651–1659, 1998.
2. Ojo AO, Hanson JA, Meier-Kriesche H, et al: Survival in recipients of marginal cadaveric donor kidneys compared with other recipients and wait-listed transplant candidates. J Am Soc Nephrol 12: 589–597, 2001.
3. Remuzzi G, Grinyo J, Ruggenenti P, et al: Early experience with dual kidney transplantation in adults using expanded donor criteria. Double Kidney Transplant Group (DKG). J Am Soc Nephrol 10: 2591–2598, 1999.
4. Naumovic R, Djukanovic L, Marinkovic J, Lezaic V: Effect of donor age on the outcome of living-related kidney transplantation. Transpl Int 18: 1266–1274, 2005.
5. Pieringer H, Biesenbach G: Risk factors for delayed kidney function and impact of delayed function on patient and graft survival in adult graft recipients. Clin Transplant 19: 391–398, 2005.
6. Hwang AH, Cho YW, Cicciarelli J, et al: Risk factors for short- and long-term survival of primary cadaveric renal allografts in pediatric recipients: A UNOS analysis. Transplantation 80: 466–470, 2005.
7. Humar A, Ramcharan T, Kandaswamy R, et al: Risk factors for slow graft function after kidney transplants: A multivariate analysis. Clin Transplant 16: 425–429, 2002.
8. Parzanese I, Maccarone D, Caniglia L, et al: Risk factors that can influence kidney transplant outcome. Transplant Proc 38: 1022–1023, 2006.
9. de Bruijne MH, Sijpkens YW, Paul LC, et al: Predicting kidney graft failure using time-dependent renal function covariates. J Clin Epidemiol 56: 448–455, 2003.
10. Fritsche L, Hoerstrup J, Budde K, et al: Accurate prediction of kidney allograft outcome based on creatinine course in the first 6 months posttransplant. Transplant Proc 37: 731–733, 2005.
11. Hariharan S, McBride MA, Cherikh WS, et al: Post-transplant renal function in the first year predicts long-term kidney transplant survival. Kidney Int 62: 311–318, 2002.
12. Russell CD, Yang H, Gaston RS, et al: Prediction of renal transplant survival from early postoperative radioisotope studies. J Nucl Med 41: 1332–1336, 2000.
13. Goldfarb-Rumyantzev AS, Scandling JD, Pappas L, et al: Prediction of 3-yr cadaveric graft survival based on pre-transplant variables in a large national dataset. Clin Transplant 17: 485–497, 2003.
14. Goldfarb-Rumyantzev AS, Pappas L: Prediction of renal insufficiency in Pima Indians with nephropathy of type 2 diabetes mellitus. Am J Kidney Dis 40: 252–264, 2002.
15. Van Manen JG, Korevaar JC, Dekker FW, et al: Adjustment for comorbidity in studies on health status in ESRD patients: Which comorbidity index to use? J Am Soc Nephrol 14: 478–485, 2003.
16. Kasiske BL, Snyder JJ, Matas AJ, et al: Preemptive kidney transplantation: The advantage and the advantaged. J Am Soc Nephrol 13: 1358–1364, 2002.
17. Matas AJ, Gillingham K, Payne WD, et al: Should I accept this kidney? Clin Transplant 14: 90–95, 2000.
18. Goldfarb-Rumyantzev A, Hurdle JF, Scandling J, et al: Duration of end-stage renal disease and kidney transplant outcome. Nephrol Dial Transplant 20: 167–175, 2005.
19. Goldfarb-Rumyantzev AS, Hurdle JF, Baird BC, et al: The role of pre-emptive re-transplant in graft and recipient outcome. Nephrol Dial Transplant 21: 1355–1364, 2006.
20. Goldfarb-Rumyantzev AS, Hurdle JF, Scandling JD, et al: The role of pretransplantation renal replacement therapy modality in kidney allograft and recipient survival. Am J Kidney Dis 46: 537–549, 2005.
21. Charlson ME, Pompei P, Ales KL, MacKenzie CR: A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. J Chronic Dis 40: 373–383, 1987.
22. Davies SJ, Russell L, Bryan J, et al: Comorbidity, urea kinetics, and appetite in continuous ambulatory peritoneal dialysis patients: Their interrelationship and prediction of survival. Am J Kidney Dis 26: 353–361, 1995.
23. Tang H, Chelamcharla M, Baird BC, et al: Factors affecting kidney-transplant outcome in recipients with lupus nephritis. Clin Transplant 22: 263–272, 2008.
24. Krikov S, Khan A, Baird BC, et al: Predicting kidney transplant survival using tree-based modeling. ASAIO J 53: 592–600, 2007.
25. Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks/Cole, 1984.
26. Berney T, Malaise J, Morel P, et al: Impact of HLA matching on the outcome of simultaneous pancreas-kidney transplantation. Nephrol Dial Transplant 20(suppl 2): ii48–ii53, ii62, 2005.
27. Gjertson DW, Cecka JM: Living unrelated donor kidney transplantation. Kidney Int 58: 491–499, 2000.
28. Pape L, Ehrich JH, Zivicnjak M, Offner G: Living related kidney donation as an advantage for growth of children independent of glomerular filtration rate. Transplant Proc 38: 685–687, 2006.
29. Kandaswamy R, Kasiske B, Ibrahim H, Matas AJ: Living or deceased donor kidney transplants for candidates with significant extrarenal morbidity. Clin Transplant 20: 346–350, 2006.
30. Salahudeen AK, Haider N, May W: Cold ischemia and the reduced long-term survival of cadaveric renal allografts. Kidney Int 65: 713–718, 2004.
31. Mange KC, Weir MR: Preemptive renal transplantation: Why not? Am J Transplant 3: 1336–1340, 2003.
32. Weng FL, Israni AK, Joffe MM, et al: Race and electronically measured adherence to immunosuppressive medications after deceased donor renal transplantation. J Am Soc Nephrol 16: 1839–1848, 2005.
33. Butkus DE, Meydrech EF, Raju SS: Racial differences in the survival of cadaveric renal allografts. Overriding effects of HLA matching and socioeconomic factors. N Engl J Med 327: 840–845, 1992.
34. Smith SR, Butterly DW: Declining influence of race on the outcome of living-donor renal transplantation. Am J Transplant 2: 282–286, 2002.
35. Zeier M, Dohler B, Opelz G, Ritz E: The effect of donor gender on graft survival. J Am Soc Nephrol 13: 2570–2576, 2002.
36. Bryan CF, Luger AM, Martinez J, et al: Cold ischemia time: An independent predictor of increased HLA class I antibody production after rejection of a primary cadaveric renal allograft. Transplantation 71: 875–879, 2001.
37. Kasiske BL, Israni AK, Snyder JJ, et al: A simple tool to predict outcomes after kidney transplant. Am J Kidney Dis 56: 947–960, 2010.