INTRODUCTION
Since the inception of solid organ transplantation as a life-saving treatment for organ failure, there has been a significant mismatch between organ supply and demand. Given the scarcity of donor organs, fundamental debates regarding the equity and utility of the transplant waitlist prioritization system have persisted. The Model for End-Stage Liver Disease (MELD) score was adopted for waitlist prioritization in the United States in 2002 and is regarded as an objective, laboratory-based measure of liver disease severity because it predicts 90-day mortality without a transplant. The MELD score initially incorporated serum creatinine, bilirubin, and international normalized ratio (INR), with higher scores reflecting higher mortality risk. Implementation of the MELD-based transplant allocation system led to decreased pretransplant mortality and improved transplant rates without a decline in posttransplant survival,1 a significant improvement over the previous, largely time-based system.
While the MELD system continues to evolve to further optimize the predictive power of the assigned score, the system is plagued by significant geographic variation in the median MELD scores at transplant across the United States. As a result, geographic location has an unintended impact on the level of illness (reflected by MELD) at which patients can receive a life-saving transplant. Median MELD at transplant differed between regions by as much as 12 points (35 versus 23) in 2015, the equivalent of a 60% difference in the estimated risk of 3-month mortality.2 Attempts at reducing this disparity in MELD at transplant require changes in organ distribution that have been the subject of intense debate in the liver transplant community. However, using MELD parity as a goal of allocation system changes requires that the MELD score itself be measured both accurately and reproducibly across regions and transplant centers.
Unfortunately, variations in the creatinine and INR values obtained by using different laboratory methodologies may significantly impact the calculated MELD score for a given blood sample.3-6 Since these initial reports, nationwide standardization of creatinine measurements to isotope dilution mass spectrometry (IDMS) has been implemented to reduce interlaboratory variation.7 However, national standardization and national proficiency testing are limited to measurements in the absence of substances that interfere with the assay. Because bilirubin interferes with the measurement of creatinine, and different laboratory platforms are affected by this interference to different degrees, the benefit of IDMS standards is reduced in samples with high bilirubin concentrations. In addition, the MELD score calculation became more complex with the addition of sodium to the MELD (MELD-Na) score in 2016, after demonstration that MELD-Na better predicts waitlist mortality.8 How these changes have affected variability in measured MELD-Na, and how center-based variability in measured MELD-Na affects the probability of transplant for patients at centers directly competing for organs within a single United Network for Organ Sharing (UNOS) region, are not known.
We therefore determined the extent of variability in MELD-Na score among the liver transplant centers within a single UNOS region (region 9) with multiple transplant programs. Furthermore, we sought to determine whether there is bias placing specific patients or centers at a disadvantage for organ allocation resulting inadvertently from an assay/platform choice of the hospital laboratory.
MATERIALS AND METHODS
Study Design and Procedures
Institutional review board approval was obtained from Columbia University to allow for deidentification and testing of residual patient blood specimens.
Thirty deidentified samples were selected to span the analytical measurement range of total bilirubin (1–40 mg/dL), and each was sent to 8 laboratories at 7 participating liver transplant centers within UNOS region 9. Samples were pooled from multiple patient encounters to obtain a sufficient volume. Aliquots (0.5 mL) were frozen and shipped on dry ice to participating sites. Serum creatinine, total bilirubin, and sodium were measured immediately upon thawing at each site. Because INR cannot be measured in residual serum samples, a fixed INR of 1.5 was assigned to every sample for the MELD-Na calculation.
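The calculation described above can be sketched as follows. This is a minimal illustration, not the study's code: the coefficients and bounds are assumed to follow the OPTN MELD-Na formula adopted in 2016 (lower bounds of 1.0 for each laboratory value, creatinine capped at 4.0 mg/dL, sodium bounded to 125–137 mmol/L, score capped at 40), and the function name `meld_na` is hypothetical. The text does not state which implementation or rounding convention was actually used.

```python
import math

FIXED_INR = 1.5  # INR assigned to every sample in this study


def meld_na(creatinine, bilirubin, inr, sodium):
    """Illustrative MELD-Na calculation (assumed OPTN 2016 formula)."""
    # Assumed bounds: values below 1.0 are set to 1.0; creatinine capped at 4.0.
    cr = min(max(creatinine, 1.0), 4.0)
    bili = max(bilirubin, 1.0)
    inr = max(inr, 1.0)
    na = min(max(sodium, 125.0), 137.0)

    raw = (0.957 * math.log(cr)
           + 0.378 * math.log(bili)
           + 1.120 * math.log(inr)
           + 0.643)
    meld_i = round(10 * raw)  # initial MELD as an integer

    # The sodium adjustment is applied only when the initial MELD exceeds 11.
    if meld_i > 11:
        score = meld_i + 1.32 * (137 - na) - 0.033 * meld_i * (137 - na)
    else:
        score = meld_i

    # Note: Python's round() rounds exact halves to the nearest even integer.
    return min(40, round(score))


# Hypothetical example: one sample, two sites reporting different creatinine.
site_x = meld_na(2.0, 20.0, FIXED_INR, 135)
site_y = meld_na(2.4, 20.0, FIXED_INR, 135)
print(site_x, site_y)
```

Holding INR constant means that any between-site difference in the calculated score comes only from creatinine, bilirubin, and sodium.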
Sets of 2 frozen serum samples, with creatinine standardized to values of 0.847 ± 0.018 and 3.877 ± 0.082 mg/dL by IDMS, were purchased from the National Institute of Standards and Technology (NIST). To assess reproducibility, these NIST samples were tested multiple times at each site until the sample was exhausted.
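A minimal sketch of how replicate results might be checked against the certified values is shown below. It assumes that the "allowable" range is the certified value plus or minus the stated uncertainty, which the text does not define explicitly; the function names and the replicate values are hypothetical.

```python
import statistics

# Certified creatinine values (mg/dL) and uncertainties as stated above.
NIST_LOW = (0.847, 0.018)
NIST_HIGH = (3.877, 0.082)


def within_allowable(results, certified, uncertainty):
    """True if every replicate lies within certified +/- uncertainty (assumed rule)."""
    low, high = certified - uncertainty, certified + uncertainty
    return all(low <= r <= high for r in results)


def coefficient_of_variation(results):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(results) / statistics.mean(results)


# Hypothetical replicate measurements of the low standard at two sites.
site_a = [0.84, 0.85, 0.86]
site_b = [0.88, 0.89, 0.90]

print(within_allowable(site_a, *NIST_LOW))  # all replicates inside 0.829-0.865
print(within_allowable(site_b, *NIST_LOW))  # all replicates above 0.865
print(coefficient_of_variation(site_a))
```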
Participating Centers and Laboratory Equipment
Eight center-based laboratory sites at 7 liver transplant centers in UNOS region 9 participated. Among the testing sites, the following analyzer platforms were represented: AU680 Chemistry System (Beckman Coulter Diagnostics, Brea, CA), DxC 700 AU (Beckman Coulter Diagnostics), Abbott Architect (Abbott Diagnostics, Abbott Park, IL), Roche Cobas 6000 (Roche Diagnostics, Indianapolis, IN), Roche Cobas 8000 (Roche Diagnostics), and Vitros 5600 (Ortho Clinical Diagnostics, Raritan, NJ). Depending on the center, the kinetic alkaline picrate (Jaffe) or the enzymatic method was used to measure creatinine. No 2 sites had the identical combination of analyzer platform and creatinine methodology. Both ion selective electrode and potentiometric methods for obtaining serum sodium were represented. All sites used a diazo method for measuring serum total bilirubin. For the purposes of this report, the platform and assay are deidentified to preserve the anonymity of the participating centers (sites are labeled numbers 1–8, laboratory platform manufacturers are lettered A–D, and assay methodology is indicated in subscripts 1–2). The creatinine assay with the least interference by bilirubin was used as the gold-standard test for the correlation plots.
Statistical Analyses
Descriptive statistics were performed using Prism (GraphPad v5.0) and Stata 12.0 (College Station, TX). Intraclass correlation coefficients (ICCs) were calculated in Stata 12.0 using a 2-way mixed-effects model to assess the agreement in the measurement of creatinine across the 8 participating sites that assessed each of the 30 samples. In addition to assessing the ICC for sodium, bilirubin, and creatinine, the ICC for creatinine was also assessed using a stepwise increase in the bilirubin threshold at which samples were included in the analysis.
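For readers unfamiliar with the ICC, the sketch below computes a single-measurement ICC from a two-way ANOVA decomposition of a samples-by-sites table. It is an illustration only, not a reproduction of the Stata analysis: the text does not state whether a consistency or absolute-agreement definition was used, so both are provided, and the function name and data are hypothetical.

```python
def icc_two_way(ratings, absolute=False):
    """Single-measurement ICC for a table of n samples (rows) by k sites (columns).

    absolute=False gives the consistency definition; absolute=True gives the
    absolute-agreement definition, which also penalizes systematic site offsets.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    if absolute:
        return (ms_rows - ms_error) / (
            ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
        )
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical data: site 2 reads exactly 1 unit higher than site 1.
table = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(icc_two_way(table))                 # consistency ignores the constant offset
print(icc_two_way(table, absolute=True))  # absolute agreement is reduced by it
```

The distinction matters here because a site that reads consistently high can leave the consistency ICC near 1 while still shifting every patient's calculated score.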
A linear mixed-effects model was then utilized to evaluate the impact of center, interfering substances (bilirubin), and the interaction between center and bilirubin on measured creatinine using SAS v9.04 (SAS Institute Inc., Cary, NC).
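One way to write such a model is shown below. The random-effects structure is not specified in the text; a random intercept for each sample is assumed here purely for illustration, and all symbols are introduced for this sketch.

```latex
\mathrm{Cr}_{ij} = \beta_0 + \alpha_j + \beta_1 B_i + \gamma_j B_i + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $\mathrm{Cr}_{ij}$ is the creatinine measured for sample $i$ at center $j$, $B_i$ is the bilirubin concentration of sample $i$, $\alpha_j$ is the fixed effect of center $j$, $\beta_1$ is the bilirubin slope at the reference center, $\gamma_j$ is the center-by-bilirubin interaction (the change in slope at center $j$), and $u_i$ is the assumed random intercept for sample $i$.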
Heatmaps and hierarchical clustering were performed using R (v3.1.1; heatmap.2), using the default Euclidean measure to obtain the distance matrix and the complete agglomeration method for clustering, to assess center-based bias in measured creatinine and MELD-Na score.9
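The clustering step can be illustrated with a small pure-Python sketch of complete-linkage agglomeration on Euclidean distances. It is not the heatmap.2 implementation, it clusters sites only (heatmap.2 can also cluster rows), and the site labels and creatinine values below are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length measurement profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    profiles maps a site label to its list of measurements (same samples,
    same order). Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical creatinine results (mg/dL) for three samples at four sites.
sites = {
    "3D1": [1.0, 2.0, 3.0],
    "4D1": [1.0, 2.1, 3.0],
    "5D2": [0.9, 1.7, 2.4],
    "8A": [1.2, 2.4, 3.5],
}
for a, b, d in complete_linkage(sites):
    print(a, b, round(d, 3))
```

Sites whose results track each other closely across samples merge first, which is how platform- and method-specific groupings become visible.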
Finally, to evaluate the impact of changes in MELD-Na score on access to transplantation, data from the Scientific Registry of Transplant Recipients of adult liver transplant candidates listed between June 2013 (when the Share 35 rule was implemented in organ allocation) and February 2017 were analyzed. The Share 35 policy was implemented to further prioritize liver transplant candidates with the highest risk of death without transplant (those with MELD-Na scores ≥35) to receive offers from within the entire UNOS region rather than only the local donor service area. The proportion of patients transplanted within 30 days of listing at each MELD-Na, nationally and in region 9, was tabulated. Unadjusted logistic regression was performed to estimate the odds ratio for transplant at each MELD-Na score.
RESULTS
Reproducibility of Creatinine Measurements With NIST Standards
The results of creatinine measurements using the NIST standards are summarized in Figure S1 (SDC, https://links.lww.com/TP/B838). For sample 1, mean results at the 8 sites ranged from 0.795 to 0.896 mg/dL (coefficient of variability = 1.8%). At 3 out of the 8 sites, all results were outside of the allowable deviation. For sample 2, average results ranged from 3.717 to 4.02 mg/dL (coefficient of variability = 1.0%). For this sample, all results at 2 sites were out of the range considered allowable by the NIST. Only 1 site (site 7) was within the allowable range for both NIST samples on all tests.
Individual Assay and MELD-Na Variability Between Sites
The overall ICC was excellent for creatinine (0.95; 95% confidence interval, 0.90-0.98) and good for bilirubin (0.89; 0.83-0.94) and sodium (0.88; 0.81-0.94).
The overall mean creatinine for the samples was 2.1 mg/dL (range: 0.7–4.6 mg/dL), with the per-sample mean measured creatinine ranging from 0.99 to 3.95 mg/dL (Table 1). The overall mean bilirubin for the samples was 13.9 mg/dL (range: 0.4–47 mg/dL), with the per-sample mean measured bilirubin ranging from 1 to 40 mg/dL.
TABLE 1.: Summary of mean (range) assay results from the 8 laboratory sites
The mean calculated MELD-Na per sample ranged from 14 to 39 (Figure 1, Table 1). The range in calculated MELD-Na per sample (the difference between the maximum and minimum values across sites) was 1–6 points, with an average of 3 MELD-Na points (Table 2). Overall, 30% of samples had a range of ≥4 MELD-Na points.
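The per-sample range statistic used here is simple to compute; a minimal sketch with hypothetical scores follows (the function names and values are illustrative, not study data).

```python
def score_ranges(scores_by_sample):
    """Range (max minus min) of the scores reported across sites for each sample."""
    return [max(scores) - min(scores) for scores in scores_by_sample]


def share_at_least(ranges, threshold):
    """Fraction of samples whose range is at least `threshold` points."""
    return sum(r >= threshold for r in ranges) / len(ranges)


# Hypothetical MELD-Na scores: one row per sample, one value per site.
scores = [
    [20, 21, 22, 21],
    [30, 33, 34, 31],
    [15, 15, 16, 15],
]
ranges = score_ranges(scores)
print(ranges)
print(sum(ranges) / len(ranges))   # mean range across samples
print(share_at_least(ranges, 4))   # share of samples with a range of 4 or more
```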
TABLE 2.: Range in MELD-Na score per sample
FIGURE 1.: Distribution of MELD-Na (Model for End-Stage Liver Disease with sodium) scores by individual sample.
Impact of Center and Bilirubin on Creatinine Measurements
Because creatinine is heavily weighted in the MELD-Na score, and interfering substances including bilirubin are known to affect the measured creatinine level, additional analysis was performed specifically examining the impact of center and bilirubin level on creatinine results. Using results from a gold-standard test with minimal bilirubin interference on creatinine levels, bilirubin was plotted against the percent difference in creatinine level between the test and gold-standard result for each sample to illustrate the bilirubin interference on creatinine results (Figure 2). As bilirubin values increased, creatinine values were biased lower for some assay platforms.
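The quantities behind these plots can be sketched as follows. The sign convention (test minus gold standard, divided by gold standard) is an assumption, and the data are hypothetical values chosen to mimic a negative bilirubin interference.

```python
import math


def percent_difference(test, reference):
    """Percent difference of a test result relative to the gold-standard result."""
    return 100.0 * (test - reference) / reference


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical samples: bilirubin (mg/dL), gold-standard and test creatinine (mg/dL).
bilirubin = [2, 10, 20, 30, 40]
gold = [1.0, 1.5, 2.0, 2.5, 3.0]
test = [1.0, 1.45, 1.85, 2.25, 2.6]

diffs = [percent_difference(t, g) for t, g in zip(test, gold)]
r = pearson_r(bilirubin, diffs)
print(diffs)
print(r, r ** 2)  # a negative r indicates creatinine biased lower as bilirubin rises
```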
FIGURE 2.: Using results from a gold-standard test with minimal bilirubin interference on creatinine levels, correlation plots of bilirubin vs the percent difference in creatinine level between the gold-standard and test result for each sample are plotted to illustrate the effect of total bilirubin interference on creatinine. The number coding for each site in the legend represents the anonymized performing laboratory (1–8), the letter (A–E) represents the instrument platform the laboratory utilized to obtain the results, and the subscript (1–2) represents the methodology used to perform creatinine. The Pearson R² and P value are provided. A, Two sites using different analyzer platforms (1C and 2B) for performing creatinine with different relationships between bilirubin and percent difference in creatinine. B, Three sites all using the same platform (D) with site 5D2 using a different assay and exhibiting a more substantial effect of bilirubin on creatinine measurements.
The impact of rising bilirubin concentrations on estimates of creatinine concentration was also demonstrated by the decreasing ICC for creatinine across the sites as samples with lower bilirubin concentrations were progressively excluded (Figure 3A), such that for samples with a bilirubin above 33 mg/dL, the ICC for creatinine had dropped to 0.86.
FIGURE 3.: Impact of bilirubin on creatinine variability. A, Intraclass correlation coefficients (ICCs) for creatinine by increasing bilirubin level. B, Boxplots of range in calculated MELD-Na (Model for End-Stage Liver Disease with sodium) scores per sample by bilirubin quartile displaying the increased variability in MELD-Na as bilirubin rises.
Finally, the impact of center and bilirubin concentration on measured creatinine level was evaluated using a multivariable linear mixed-effects model (Table 3). In this model, center, bilirubin, and the interaction between center and bilirubin were significantly predictive of creatinine level. The magnitude of this effect depended on the center’s choice of assay methodology, as shown by the parameter estimates for the interaction term (bilirubin × center), with centers using assay platform D (centers 3, 4, and 5) showing the largest effect of bilirubin on creatinine measurements (Table 3 and Figure 2B).
TABLE 3.: Mixed-effects model to predict measured creatinine level
Impact of Center on MELD-Na Score
Because the relationship between bilirubin level and measured creatinine differs between sites, there is greater variability in MELD-Na score as bilirubin increases (Figure 3B). The average range of MELD-Na scores was 2 for samples in the lowest bilirubin quartile (<7 mg/dL) but 4 for samples in the highest quartile (>20 mg/dL).
The impact of center on MELD-Na score is also displayed by unbiased hierarchical clustering of MELD-Na scores and creatinine (Figure 4). Sites with identical platforms and methods for obtaining creatinine levels, such as 3D1 and 4D1, clustered together. Site 5D2 used the same platform but an alternative method, resulting in declustering from sites 3 and 4. In addition, it is clear from this analysis that sites using platform A have the highest creatinine values in almost all samples, with site 8 having the highest MELD-Na score in 50% of cases.
FIGURE 4.: Center-based bias in MELD-Na, creatinine, and total bilirubin. Heatmaps illustrating unbiased hierarchical clustering of (A) MELD-Na (Model for End-Stage Liver Disease with sodium) scores, (B) creatinine at the participating sites, and (C) total bilirubin level. Rows are ordered from lowest to highest mean bilirubin value. Green cells represent measured values with positive deviation from the mean for the sample, while red cells represent a negative deviation. The number at the base of each column represents an anonymized performing laboratory (1–8) and the letter (A–E) represents the instrument platform the laboratory utilized to obtain the results. The subscript number represents the testing methodology used for creatinine on a given platform. The blue line running down each column is a representation of the z score; deviations of this line represent movement away from the mean.
Impact of MELD-Na Score on Access to Transplant
The differences in MELD-Na scores observed for individual patient samples between centers may have a substantial impact on access to transplant. A change in MELD-Na of 3–6 points for an individual patient leads to significant differences in the 30-day probability of transplant, both nationally and within region 9 (Table S1, SDC, https://links.lww.com/TP/B838). For samples with extreme ranges of up to 6 MELD-Na points, the impact may be profound. For example, for the sample with a range in calculated MELD-Na scores from 26 to 32, this amounts to an approximately 50% relative difference in the probability of transplant in the subsequent 30 days, both nationally (25.5% versus 48.3%) and in region 9 specifically (41.2% versus 83.3%). In addition, even small 1–2 point changes in MELD-Na may have a significant impact, especially around an allocation threshold such as MELD-Na of 35. For example, because the Share 35 allocation policy mandates regional sharing for all patients with MELD-Na of ≥35, a difference between MELD-Na of 33 and 35 results in an increase of approximately 10 percentage points in the probability of transplant within 30 days, again both regionally (49.0% versus 59.9%) and nationally (75.0% versus 85.7%).
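Because "difference in probability" can be read in several ways, the sketch below computes the absolute difference, the relative difference, and the odds ratio for one pair of probabilities quoted above (25.5% versus 48.3%). The function names are hypothetical, and the odds ratio computed here from two proportions is only an illustration; it is not the per-point estimate from the logistic regression described in the Methods.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def compare_probabilities(p_low, p_high):
    """Summarize the gap between two transplant probabilities."""
    return {
        "absolute_points": 100.0 * (p_high - p_low),
        "relative_reduction_pct": 100.0 * (1.0 - p_low / p_high),
        "odds_ratio": odds(p_high) / odds(p_low),
    }


# 30-day transplant probabilities quoted in the text for MELD-Na 26 versus 32.
summary = compare_probabilities(0.255, 0.483)
for name, value in summary.items():
    print(name, round(value, 2))
```

For this pair, the gap is about 23 percentage points, which corresponds to the lower score carrying roughly half the probability of the higher one.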
DISCUSSION
In this study, we identified highly significant differences in MELD-Na allocation ranking due to the variation in laboratory methodology between centers. This effect was most pronounced at the highest scores, resulting in highly significant differences in expected organ access between individuals with the same actual severity of liver disease in a given zone of distribution. This finding is concerning because MELD-Na–based liver transplant allocation was instituted to eliminate subjective clinical criteria from the rank order on the waitlist.
In recent years, the literature has been replete with studies concerning disparities in access to transplantation driven by factors other than liver disease severity. Most of these studies have focused on the effects of geography on access to transplantation and the resulting bias in distribution. In March 2000, the US Department of Health and Human Services promulgated the Final Rule, which instructs that allocation policies shall not be based on the candidate’s place of residence or place of listing except to the extent needed to satisfy other regulatory requirements. Our work suggests that without standardization of MELD-Na determination, the second pillar of the Final Rule, distribution in decreasing order of medical urgency, is not being met either. While eliminating geographic disparities in MELD-Na at transplant has been the focus of current efforts to improve fairness, center-based variability in laboratory platform and assay may create additional variability and bias within and between regions that should be considered in the current allocation debate.
Among the 8 laboratories in UNOS region 9, the mean range in calculated MELD-Na within single patient samples was 3 MELD-Na points, and 30% of samples had a range of 4 or more points. This variability spanned the range in MELD-Na scores from 14 to 39 but was most pronounced in samples with high total bilirubin levels. A 4–6 point difference in MELD-Na score translates into an up to 50% change in the probability of being transplanted in the subsequent 30 days. Scores that straddle thresholds for changes in sharing of organs, at both the high (MELD-Na ≥35) and low (MELD-Na <15) ranges, could lead to dramatically different access to transplant based upon laboratory technique rather than upon the inherent risk of poor outcomes.
While some variability in assay performance has been previously reported and is perhaps expected, this is the first report in which significant bias is demonstrated between centers within a UNOS region. The center where the serum was assayed was significantly predictive of measured creatinine level and calculated MELD-Na. Thus, individual patients may be given additional priority on the transplant waiting list based upon their center or laboratory facility rather than upon their inherent risk of death without transplant. Because each of the MELD-Na components is also accounted for in the Scientific Registry of Transplant Recipients risk adjustment model that is utilized to assess center performance, it is also possible that centers may be inappropriately advantaged or disadvantaged on a regulatory basis when the laboratory measurements do not correctly reflect severity of disease.
Creatinine measurement is clearly impacted by laboratory platform and assay methodology as well as the presence of interfering substances, including bilirubin.10 Bilirubin is a chromogen that causes a negative interference with creatinine measurements, usually resulting in lower creatinine values. Several methods have been developed to overcome this interference, leading to different results depending upon the assay utilized.11,12 The impact of creatinine assay variability on calculated MELD has also been explored in European studies, where significant variation in measured creatinine across assay techniques was documented, with increasing variability in patients with high bilirubin levels and MELD scores.13,14
The national initiative to standardize creatinine measurements to NIST standards should have reduced the variability of MELD-Na scores. However, despite compliance with this standardization protocol, many of the sites measured values outside expected values when testing NIST standards. In addition, these standards were not created with interfering substances such as bilirubin present in the sample. There is clearly a complex relationship between the creatinine assay, the platform manufacturer, and the interference of bilirubin on creatinine measurement such that simply requiring a specific creatinine assay may not completely eliminate this center-based effect.
Previous studies have also revealed that INR is a source of variability in calculated MELD score.3,4 INR was not tested in the present study because it cannot be run on residual serum samples. This is a limitation of the current analysis, as variation in INR between methodologies is significant, again with the highest variation in patients with high MELD scores. Given the differing methodologies between the laboratories in region 9, including INR would likely have further increased the variability in calculated MELD-Na. In addition, we acknowledge that many outpatients on the waiting list have laboratory testing performed at multiple facilities, often outside of their transplant center. We did not send samples to these additional facilities, which may introduce further variability.
Despite standardization efforts for creatinine and INR reporting, these assays continue to perform poorly among patients with cirrhosis, and UNOS has never performed a widespread quality control study to understand the impact of this variability on allocation. How to best address this center-based disparity is not straightforward. Any proposal for standardization must consider the logistical complexity of thousands of patients having testing performed at hundreds or thousands of transplant centers and local laboratories on a regular basis. Mandating uniform methodology for all 4 measures in the MELD-Na score or centralized testing in a UNOS region or donor service area is also likely not feasible.
In this era of greater accountability and federal reporting of laboratory-based measures, there is precedent for adjusting for variability due to laboratory methodology. Reporting of serum albumin for patients on dialysis includes information on the method used15,16 to allow for conversion of results to a common standard.17 However, to accurately determine the level of adjustment needed for each measure will require further studies. Finally, it has also been proposed that standards with interfering substances also be used to calibrate laboratory equipment, perhaps at least at transplant centers and private laboratories seeking certification for MELD-Na testing. It is clear that UNOS and the transplant community would benefit from a more comprehensive investigation into the underrecognized impact of laboratory test variability on organ allocation and clinical outcomes on both the center and individual level. The extent to which providers and programs are already directing patients to facilities known to produce higher MELD scores is unknown. It is also possible that for patients with significant cholestasis, such as those with severe alcohol-related hepatitis, there will be more significant variability between laboratories.
In the United States, access to donated organs is affected by 2 independent aspects: distribution, the geographic zone defining the recipient population within which the organ is primarily offered, and allocation, the rank order in which offers are made, which is based on disease severity as defined by MELD-Na. We have demonstrated a clinically relevant interlaboratory variation in calculated MELD-Na score among the transplant centers directly competing for organs in UNOS region 9. This variability was in part driven by creatinine, which was independently predicted by the center where it was assayed. These differences led to consistently higher or lower creatinine and MELD-Na values at individual centers, impacting organ access in a nonrandom fashion. This suggests a troubling scenario in which a patient could change their position on the list by selecting a laboratory with assays that consistently report higher creatinine values, undermining the principles of fair allocation based upon an objective measure of liver function. This bias should be considered in current efforts to eliminate disparities in liver transplant access.
REFERENCES
1. Freeman RB, Wiesner RH, Edwards E, et al.; United Network for Organ Sharing Organ Procurement and Transplantation Network Liver and Transplantation Committee. Results of the first year of the new liver allocation plan. Liver Transpl. 2004; 10:7–15
2. Organ Procurement and Transplantation Network/United Network for Organ Sharing Liver and Intestinal Organ Transplantation Committee. Redesigning liver distribution. 2016. Available at https://optn.transplant.hrsa.gov/governance/public-comment/redesigning-liver-distribution. Accessed July 25, 2017
3. Trotter JF, Olson J, Lefkowitz J, et al. Changes in international normalized ratio (INR) and model for endstage liver disease (MELD) based on selection of clinical laboratory. Am J Transplant. 2007; 7:1624–1628
4. Trotter JF, Brimhall B, Arjal R, et al. Specific laboratory methodologies achieve higher model for endstage liver disease (MELD) scores for patients listed for liver transplantation. Liver Transpl. 2004; 10:995–1000
5. Lisman T, van Leeuwen Y, Adelmeijer J, et al. Interlaboratory variability in assessment of the model of end-stage liver disease score. Liver Int. 2008; 28:1344–1351
6. Cholongitas E, Marelli L, Kerry A, et al. Different methods of creatinine measurement significantly affect MELD scores. Liver Transpl. 2007; 13:523–529
8. Kim WR, Biggins SW, Kremers WK, et al. Hyponatremia and mortality among patients on the liver-transplant waiting list. N Engl J Med. 2008; 359:1018–1026
10. Nah H, Lee SG, Lee KS, et al. Evaluation of bilirubin interference and accuracy of six creatinine assays compared with isotope dilution-liquid chromatography mass spectrometry. Clin Biochem. 2016; 49:274–281
11. Badiou S, Dupuy AM, Descomps B, et al. Comparison between the enzymatic vitros assay for creatinine determination and three other methods adapted on the Olympus analyzer. J Clin Lab Anal. 2003; 17:235–240
12. Lolekha PH, Jaruthunyaluck S, Srisawasdi P. Deproteinization of serum: another best approach to eliminate all forms of bilirubin interference on serum creatinine by the kinetic Jaffe reaction. J Clin Lab Anal. 2001; 15:116–121
13. Goulding C, Cholongitas E, Nair D, et al. Assessment of reproducibility of creatinine measurement and MELD scoring in four liver transplant units in the UK. Nephrol Dial Transplant. 2010; 25:960–966
14. Kaiser T, Kinny-Köster B, Bartels M, et al. Impact of different creatinine measurement methods on liver transplant allocation. PLoS One. 2014; 9:e90015
15. Parikh C, Yalavarthy R, Gurevich A, et al. Discrepancies in serum albumin measurements vary by dialysis modality. Ren Fail. 2003; 25:787–796
16. Department of Health and Human Services. End Stage Renal Disease Medical Evidence Report. Available at https://www.cms.gov/Medicare/CMS-Forms/CMS-Forms/downloads/cms2728.pdf. Accessed March 1, 2019
17. Clase CM, St Pierre MW, Churchill DN. Conversion between bromcresol green- and bromcresol purple-measured albumin in renal disease. Nephrol Dial Transplant. 2001; 16:1925–1929