Several natural history models have been derived from clinical data in adult populations with primary sclerosing cholangitis (PSC) (1–10). No consensus exists regarding the optimal model (11), and none have been validated for use in children. Important clinical differences exist between pediatric and adult-onset PSC patients. At PSC diagnosis, dominant strictures are present in 4% of children (12,13), compared with 45% of adults (14). Similarly, cholangiocarcinoma is rare in pediatric-onset PSC, occurring in 1% of children by 10 years (12,13), compared with at least 7% to 13% of adults (15–17). A small duct phenotype is present in 20% of children (12,13), but only 10% of adults (18,19). Features of autoimmune hepatitis overlap with PSC are present in over 33% of children (12,13), but only 7% of adults with PSC (20,21). With these clinical differences, it is unclear how well risk models derived from adult patient data are generalizable to children.
The most widely used model to estimate transplant-free patient survival is the Revised Natural History Model for PSC, from a group at the Mayo Clinic (the “Mayo model”) (5). It estimates survival with native liver for up to 4 years, and is available as an online calculator tool (22). A subsequent risk model from 5 European centers was created by Boberg et al (6) to more accurately estimate 1-year survival to inform immediate transplant listing decisions (the “Boberg model”). The most recent Amsterdam-Oxford model (the “A-O model”) included the largest model creation and validation cohorts to date, and had an added strength of originating from population-based data (10). It estimates survival with native liver out to 15 years, and is also available online (23). Characteristics of these models and their creation and validation cohorts are described and are compared with the Pediatric PSC Consortium in Table 1. We aimed to test the predictive utility of the Mayo, Boberg, and A-O prognostic models for PSC using data from the Pediatric PSC Consortium, a large, multicenter cohort of children with PSC (12).
We previously reviewed medical records on all known PSC patients at 36 institutions throughout Europe, North America, the Middle East, and Asia (12). The PSC diagnosis was based on a cholestatic laboratory profile together with cholangiography showing multifocal stricturing and segmental dilations of the biliary tree, liver biopsy showing periductal, concentric fibrosis, fibro-obliterative cholangitis, or primary ductular involvement, or both (11). Patients with abnormal cholangiograms were labeled as large duct PSC. Patients with normal cholangiograms but abnormal liver biopsy were labeled as small duct PSC. Autoimmune hepatitis (AIH) was diagnosed in patients who met a “probable” or “definite” score on the simplified AIH criteria that have been validated in children (24). We collected demographics, laboratory, histopathology, cholangiography and endoscopy data at liver disease diagnosis, as well as any history of esophageal variceal bleeding. Alkaline phosphatase values were normalized for age. Complete data were present in 670/781 patients (86%). To account for missing data, we performed multivariate imputation using iteratively chained equations, combining the results of 10 imputed data sets. We validated the models using this imputed data set.
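The text does not state how results from the 10 imputed data sets were combined. One common approach is Rubin's rules, sketched below as an illustration only; the function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least 2 imputations")
    pooled = mean(estimates)        # average of the per-imputation estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical estimates and variances from 10 imputed data sets.
est = [0.82, 0.83, 0.84, 0.83, 0.82, 0.85, 0.83, 0.84, 0.82, 0.83]
var = [0.0004] * 10
pooled_estimate, pooled_se = pool_rubin(est, var)
```

The pooled standard error is larger than any single-imputation standard error whenever the estimates differ between imputations, which is how the uncertainty caused by missing data is carried into the final result.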
We calculated survival probabilities for each child using the equations derived from the Mayo (5), Boberg (6) and A-O (10) risk models (Appendix, Supplemental Digital Content, https://links.lww.com/MPG/B735). We did not validate other models because they necessitated access to original histopathology (1,4), full images from cholangiography studies (7,9), or included subjective assessments of organomegaly (2,3,8) that were not included in our dataset, and none are widely used. To generate observed survival probabilities, we created a retrospective cohort of all patients and followed them from time of PSC diagnosis to endpoints of liver transplantation or death from liver disease. Person-time was censored at the date of the last known clinical encounter. We used the Kaplan-Meier method to calculate rates of survival each year after diagnosis. The endpoints of each model were somewhat different, with the Mayo model derived to predict only a risk of dying with a horizon of 4 years, and with liver transplant treated as if the patient would die within 1 year. The Boberg model was designed to predict 1-year transplant-free survival, and the A-O model offered predictions for 10+ years. For uniformity in assessing multiple models, and to extrapolate longer term prediction capability, we kept patients in their initial risk strata and observed survival out to 10 years regardless of each model's original intent.
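The observed survival estimate described above can be illustrated with a minimal Kaplan-Meier sketch. It assumes follow-up time in years and an event flag of 1 for liver transplantation or liver-related death and 0 for censoring at the last clinical encounter; the function names and the example data are hypothetical, and the study itself used Stata.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Patients censored at t still count as at risk at t.
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


def survival_at(curve, year):
    """Read the step function: survival probability at the given year."""
    s = 1.0
    for t, value in curve:
        if t > year:
            break
        s = value
    return s


# Hypothetical follow-up for 6 patients (years, event flag).
curve = kaplan_meier([0.8, 2.5, 3.0, 4.2, 6.0, 9.5], [1, 0, 1, 0, 0, 1])
yearly = [survival_at(curve, y) for y in range(1, 11)]
```

Censored patients lower the number at risk without lowering the survival estimate, which is why the curve remains usable out to 10 years even though follow-up length differs between patients.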
We evaluated the ability of each model to yield accurate survival probabilities for a given patient graphically, by comparing overlaid plots of observed and calculated survival probabilities. We plotted the Kaplan-Meier curve of observed outcomes alongside the annual predicted probabilities of survival for each risk group. For the plots of predicted survival, we calculated the median of the annual survival probabilities of each patient within each risk group, and connected these with straight lines (25,26). The utility of risk score cutoffs specified by the adult models to stratify patients into distinct groups (eg, “low” and “high” risk) with distinct observed survival probabilities was assessed using the logrank test. The logrank test is used to test the null hypothesis that there is no difference between the risk groups in the probability of an event (transplant or death) at any time point (27). Discriminatory ability of the models was assessed with the concordance statistic (c-statistic). The c-statistic was calculated by comparing observed and expected survival between every possible pairing of 2 of the 781 patients in the cohort (1 vs 2, 1 vs 3, …, 780 vs 781). The c-statistic is the proportion of all 304 590 of these possible pairings (781 × 780 / 2) that the model “guessed” correctly (assigned a worse predicted survival to the patient with the worse observed survival) (28). The c-statistic ranges from 0.5 (no discrimination, eg, random risk stratification using a coin toss) to 1.0 (perfect discrimination), with values of 0.8 or higher generally regarded as “good discrimination” (29). We created time-truncated datasets at each of 1 through 10 years of follow-up and assessed the c-statistic at each time point to follow the accuracy of each model over progressively longer prediction windows.
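The pairwise logic behind the c-statistic, and the time truncation used to assess it at each year, can be sketched as follows. This is a simplified version of Harrell's concordance index and makes several assumptions of its own: a higher risk score means worse predicted survival, pairs with tied follow-up times are skipped, tied risk scores count as half, and a pair is only usable when the patient with shorter follow-up had an event. The function names and data are hypothetical.

```python
from itertools import combinations
from math import comb


def truncate(times, events, horizon):
    """Censor all follow-up at `horizon` years."""
    t = [min(x, horizon) for x in times]
    e = [ev if x <= horizon else 0 for x, ev in zip(times, events)]
    return t, e


def harrell_c(times, events, risk_scores):
    """Proportion of usable patient pairs that the score orders correctly."""
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied follow-up times are skipped in this sketch
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored: true ordering unknown
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical data: follow-up (years), event flag, model risk score.
times = [0.8, 2.5, 3.0, 4.2, 6.0, 9.5]
events = [1, 0, 1, 0, 0, 1]
scores = [2.4, 0.3, 1.1, -0.2, 0.1, 0.9]
c_by_year = {}
for year in range(1, 11):
    t, e = truncate(times, events, year)
    c_by_year[year] = harrell_c(t, e, scores)

n_unordered_pairs = comb(781, 2)  # pairings among 781 patients
```

For 781 patients, `comb(781, 2)` counts each pairing once; counting every pair in both orders doubles that figure. With censored follow-up, only some of these pairings are usable, so the denominator in practice is smaller than the full count.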
We broke down the median risk score in each risk group for the Mayo and A-O models and calculated the proportion of the risk score attributable to each individual predictor. We compared 3 or more groups of continuous variables using the Kruskal-Wallis test. All calculations were done using Stata version 13.0 (StataCorp, College Station, TX). The protocol of the study was approved by the institutional review and/or research ethics board of each collaborating institution.
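The text does not specify how the share of a risk score attributable to each predictor was computed. One plausible reading, sketched here under that assumption, treats the score as a linear sum of coefficient × value terms and reports each term's absolute size as a fraction of the total absolute size. The coefficients and patient values below are placeholders for illustration and are not the published model coefficients.

```python
def contribution_shares(coefficients, values):
    """Share of a linear risk score attributable to each predictor.

    coefficients: {predictor name: model coefficient}
    values: {predictor name: (transformed) predictor value for one patient}
    Shares are based on absolute term sizes so that they sum to 1 even
    when some terms are negative.
    """
    terms = {name: coefficients[name] * values[name] for name in coefficients}
    total = sum(abs(term) for term in terms.values())
    if total == 0:
        raise ValueError("all terms are zero; shares are undefined")
    return {name: abs(term) / total for name, term in terms.items()}


# Placeholder coefficients and values, for illustration only.
coef = {"predictor_a": 0.5, "predictor_b": -0.8, "predictor_c": 0.03}
vals = {"predictor_a": 4.0, "predictor_b": 4.0, "predictor_c": 12.0}
shares = contribution_shares(coef, vals)
```

Using absolute values is a design choice: a protective predictor with a negative term still counts toward the total, so a large share indicates a predictor that moves the score strongly in either direction.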
The Revised Mayo Clinic Model
The Mayo model was designed to report 4-year outcomes. Overall, the Mayo model offered good discrimination of 4-year outcomes with a c-statistic of 0.83. Predicted versus observed survival with native liver (SNL) was similar in low-, medium-, and high-risk groups at 1 year (99% vs 99%, 97% vs 98%, and 80% vs 79%, respectively), but more disparate at 4 years (98% vs 96%, 89% vs 79%, and 33% vs 47%, respectively). The low-, medium- and high-risk cutoffs created 3 distinct populations of patients with progressively worse outcomes, logrank P < 0.001 between all groups as shown in Figure 1. Most children were correctly stratified into the low-risk group.
Serum albumin and aspartate aminotransferase levels made up the majority of the risk score for each patient, whereas total bilirubin, patient age, and variceal hemorrhage history contributed very little to the risk score, as shown in Supplemental Figure 1 (Supplemental Digital Content, https://links.lww.com/MPG/B735). Each of the predictor variables varied significantly between groups as shown in Supplemental Table 1 (Supplemental Digital Content, https://links.lww.com/MPG/B735). Inflammatory bowel disease was most prevalent in low- versus medium- and high-risk groups: 80% versus 73% versus 52%, whereas autoimmune hepatitis was least prevalent in low- versus medium- and high-risk groups: 29% versus 39% versus 52%, respectively (both P < 0.001). Large duct disease was distributed evenly among risk groups.
The Amsterdam-Oxford Model
The A-O model was designed to report 15-year outcomes, but we had inadequate pediatric follow-up data to this time point and so followed it to a maximum of 10 years. Overall, the A-O model offered fair discrimination of 10-year outcomes with a c-statistic of 0.69. Predicted versus observed SNL was similar in low-, low-intermediate-, and medium-risk groups, but disparate in the high-risk group at 1 year (100% vs 99%, 100% vs 98%, 100% vs 97%, 96% vs 90%, respectively), 5 years (97% vs 97%, 96% vs 94%, 94% vs 89%, 83% vs 66%, respectively), and 10 years (88% vs 93%, 84% vs 84%, 76% vs 74%, 61% vs 34%, respectively). The low-, low-intermediate-, medium-, and high-risk cutoffs created 4 distinct populations of patients with progressively worse outcomes, logrank P < 0.001 between all groups as shown in Figure 2. The original model stratified 16%, 34%, 34%, and 16% of adult patients as low, low-intermediate, medium, and high risk, respectively. In contrast, 19%, 9%, 14%, and 57% of children fell into these respective groups, so the model over-classified most children as high risk.
Serum aspartate aminotransferase levels and platelet count made up the majority of the risk score for each patient, whereas total bilirubin, alkaline phosphatase, and albumin contributed little to the risk score, as shown in Supplemental Figure 2 (Supplemental Digital Content, https://links.lww.com/MPG/B735). Age and large duct phenotype were similar in all risk groups, whereas all of the laboratory-based predictors were significantly different as shown in Supplemental Table 2 (Supplemental Digital Content, https://links.lww.com/MPG/B735). Inflammatory bowel disease was similarly prevalent across the low-, low-intermediate-, medium-, and high-risk groups: 84% versus 77% versus 80% versus 74%, respectively, whereas autoimmune hepatitis was more prevalent in higher risk groups: 17% versus 31% versus 37% versus 38%, respectively.
The Boberg Model
The Boberg model was designed to report 1-year outcomes. It provided excellent discrimination of 1-year outcomes, with a c-statistic of 0.87, making it generally accurate at predicting whether an individual patient would require liver transplantation on the basis of his or her laboratory studies. The patient's bilirubin (median 0.6 [IQR 0.4–1.2]) made up the majority of the prognostic score, accounting for 80%. Serum albumin (median 4 [IQR 3.6–4.4]) and patient age (median 12 years [IQR 8–15]) accounted for 10% each. The model was, however, overly pessimistic in predicting SNL for the group as a whole. We observed 24 deaths or liver transplants in the first year after diagnosis, whereas the Boberg model predicted that over 170 would have occurred. The observed versus predicted SNL at 1 year was 98% versus 78%, respectively.
We assessed the performance of each model to discriminate outcomes at each of 1 to 10 years after diagnosis, even though this was beyond the intended window for the Mayo and Boberg models. This is shown in Figure 3. The Mayo model was excellent at predicting need for transplant at 1 year, outperforming the other models (c-statistic 0.93 vs 0.87 vs 0.82 for the Mayo, Boberg, and A-O models, respectively). Despite the Mayo score being designed for outcomes up to 4 years, and the Boberg model being designed for outcomes at 1 year, use of either score as a predictor outperformed the A-O model at every time point cutoff through 10 years. AST, platelet count, bilirubin, and albumin, which were most associated with outcomes and accounted for the bulk of each risk score, were used in each model. Overall, the Mayo model provided the best discrimination at all points in follow-up.
We used a large dataset of pediatric-onset PSC cases to assess the validity of prognostic and risk stratification tools created for adult PSC patients. We showed that the Mayo model offered the best discrimination of outcomes up to 10 years. The Mayo and A-O models accurately estimated SNL in patients for 4–5 years after diagnosis. The Mayo model provided the best stratification to low-, medium- and high-risk groups. A large source of inaccuracy of the models appeared to be weighting of AST that did not take into account the high prevalence of autoimmune hepatitis in children.
AST level explained the largest share of variance in the risk scores of the Mayo and A-O models, and in stratifying patients into higher risk groups. AST rises with extensive fibrosis and cirrhosis. Indeed, the AST to Platelet ratio index (APRI) is a useful surrogate marker of hepatic fibrosis in many liver diseases (30–32), including PSC (33,34). Although AST is an important predictor of disease progression, the Mayo and A-O models do not take into account the high prevalence of features of AIH overlap in children. At least one-third of children with PSC are affected with AIH (12) compared with 0% to 5% of the adult cohorts (5,6,10) used to create these models. The median AST at diagnosis in children with PSC-AIH overlap was 290 U/L, yet most of these children had an uncomplicated clinical course, with a 5-year SNL of 90% (12). The large number of children with marked elevations of AST that are unrelated to fibrosis, and which do not imply a negative prognosis, is the largest source of inaccuracy in prediction and risk stratification in these models.
It may seem remarkable that the models provide reasonable discrimination of outcomes at all, given that the derivation and validation cohorts range in median age from 36 to 45 years, and the median age in our cohort is only 12 years. Despite differing prevalence of complications at diagnosis of PSC, disease progression to new adverse liver events is similar between children and adults, occurring consistently in approximately 4% of patients each year. Cholangiocarcinoma is more common in adult patients, however, who may have decades of disease duration and potential for hepatobiliary inflammation to progress to dysplasia and cancer. The higher rate of cholangiocarcinoma in adults (and its associated high mortality) is likely a large source of inaccuracy when pediatric data are entered into these models. There are no known differences in the underlying pathogenesis of PSC in children as compared with adults. Other than patient age, the laboratory markers and phenotypic features included in each of the adult models have generally been shown to be useful predictors in children (12,13). It is likely that an optimized pediatric-specific model will include many of the same predictors, but will apply different weights to each. Bilirubin, platelet count, and serum albumin are strong candidates for a pediatric model.
The strength of this study was the large size of the validation cohort we utilized. The Pediatric PSC Consortium is the largest cohort of pediatric-onset PSC patients, and includes a diverse mix of secondary and tertiary referral centers. The weakness of the study is the retrospective nature of the Pediatric PSC Consortium data. This prevented a standardized diagnostic and therapeutic algorithm for each patient, and misclassification bias may be present. Although we were able to evaluate the most popular and user-friendly risk stratification models, we were unable to evaluate all existing prognostic models because of lack of original histopathology and cholangiography data, and lack of subjective assessments of organomegaly in all patients.
In conclusion, we used the Pediatric PSC Consortium dataset to evaluate the validity of adult-derived prognostic models to predict clinical outcomes in children. The best discrimination, prediction, and risk stratification were provided by the Mayo model. None of the models accounted for the high prevalence of features of autoimmune hepatitis overlap in children and the associated elevations of aminotransferase levels that are unrelated to cirrhosis. Total bilirubin, albumin, and platelet count are strong candidates for inclusion in a future pediatric-specific model. Weighting of predictors to account for the unique biochemical profile of children is likely to yield more useful and accurate predictions and risk stratification for pediatric-onset PSC.
The authors thank Drs Jason Yap and Reham Abdou for assistance with data collection.
1. Wiesner RH, Grambsch PM, Dickson ER, et al. Primary sclerosing cholangitis: natural history, prognostic factors and survival analysis. Hepatology
2. Farrant JM, Hayllar KM, Wilkinson ML, et al. Natural history and prognostic variables in primary sclerosing cholangitis. Gastroenterology
3. Dickson ER, Murtaugh PA, Wiesner RH, et al. Primary sclerosing cholangitis: refinement and validation of survival models. Gastroenterology
4. Broome U, Olsson R, Loof L, et al. Natural history and prognostic factors in 305 Swedish patients with primary sclerosing cholangitis. Gut
5. Kim WR, Therneau TM, Wiesner RH, et al. A revised natural history model for primary sclerosing cholangitis. Mayo Clin Proc
6. Boberg KM, Rocca G, Egeland T, et al. Time-dependent Cox regression model is superior in prediction of prognosis in primary sclerosing cholangitis. Hepatology
7. Ponsioen CY, Vrouenraets SM, Prawirodirdjo W, et al. Natural history of primary sclerosing cholangitis and prognostic value of cholangiography in a Dutch population. Gut
8. Tischendorf JJ, Hecker H, Kruger M, et al. Characterization, outcome, and prognosis in 273 patients with primary sclerosing cholangitis: a single center study. Am J Gastroenterol
9. Ponsioen CY, Reitsma JB, Boberg KM, et al. Validation of a cholangiographic prognostic model in primary sclerosing cholangitis. Endoscopy
10. de Vries EM, Wang J, Williamson KD, et al. A novel prognostic model for transplant-free survival in primary sclerosing cholangitis. Gut
11. Chapman R, Fevery J, Kalloo A, et al. Diagnosis and management of primary sclerosing cholangitis. Hepatology
12. Deneau MR, El-Matary W, Valentino PL, et al. The natural history of primary sclerosing cholangitis in 781 children: a multicenter, international collaboration. Hepatology
13. Valentino PL, Wiggins S, Harney S, et al. The natural history of primary sclerosing cholangitis in children: a large single-center longitudinal cohort study. J Pediatr Gastroenterol Nutr
14. Bjornsson E, Lindqvist-Ottosson J, Asztely M, et al. Dominant strictures in patients with primary sclerosing cholangitis. Am J Gastroenterol
15. Burak K, Angulo P, Pasha TM, et al. Incidence and risk factors for cholangiocarcinoma in primary sclerosing cholangitis. Am J Gastroenterol
16. Kornfeld D, Ekbom A, Ihre T. Survival and risk of cholangiocarcinoma in patients with primary sclerosing cholangitis. A population-based study. Scand J Gastroenterol
17. Bergquist A, Ekbom A, Olsson R, et al. Hepatic and extrahepatic malignancies in primary sclerosing cholangitis. J Hepatol
18. Angulo P, Maor-Kendler Y, Lindor KD. Small-duct primary sclerosing cholangitis: a long-term follow-up study. Hepatology
19. Bjornsson E, Boberg KM, Cullen S, et al. Patients with small duct primary sclerosing cholangitis have a favourable long term prognosis. Gut
20. van Buuren HR, van Hoogstraten HJE, Terkivatan T, et al. High prevalence of autoimmune hepatitis among patients with primary sclerosing cholangitis. J Hepatol
21. Kaya M, Angulo P, Lindor KD. Overlap of autoimmune hepatitis and primary sclerosing cholangitis: an evaluation of a modified scoring system. J Hepatol
22. Mayo Foundation for Medical Education and Research. The revised natural history model for primary sclerosing cholangitis. https://www.mayoclinic.org/medical-professionals/model-end-stage-liver-disease/revised-natural-history-model-for-primary-sclerosing-chonalgitis. 2000. Accessed 23 May 2019
23. PSC Expertise Centrum - Academic Medical Centre. Amsterdam-Oxford PSC Score calculator. https://www.amc.nl/web/leren/research-62/research/amsterdam-oxford-psc-score-calculator.htm. Amsterdam, NL; 2017.
24. Mileti E, Rosenthal P, Peters MG. Validation and modification of simplified diagnostic criteria for autoimmune hepatitis in children. Clin Gastroenterol Hepatol
25. Arjas E. A graphical method for assessing goodness of fit in Cox's Proportional Hazards Model. J Am Stat Assoc
26. Bland JM, Altman DG. Survival probabilities (the Kaplan-Meier method). BMJ
27. Bland JM, Altman DG. The logrank test. BMJ
28. Harrell FE Jr, Califf RM, Pryor DB, et al. Evaluating the yield of medical tests. JAMA
29. Caetano SJ, Sonpavde G, Pond GR. C-statistic: a brief explanation of its construction, interpretation and limitations. Eur J Cancer
30. D'Souza RS, Neves Souza L, Isted A, et al. AST-to-platelet ratio index in non-invasive assessment of long-term graft fibrosis following pediatric liver transplantation. Pediatr Transplant
31. Joshita S, Umemura T, Ota M, et al. AST/platelet ratio index associates with progression to hepatic failure and correlates with histological fibrosis stage in Japanese patients with primary biliary cirrhosis. J Hepatol
32. McGoogan KE, Smith PB, Choi SS, et al. Performance of the AST-to-platelet ratio index as a noninvasive marker of fibrosis in pediatric patients with chronic viral hepatitis. J Pediatr Gastroenterol Nutr
33. Vesterhus M, Hov JR, Holm A, et al. Enhanced liver fibrosis score predicts transplant-free survival in primary sclerosing cholangitis. Hepatology
34. de Vries EMG, Farkkila M, Milkiewicz P, et al. Enhanced liver fibrosis test predicts transplant-free survival in primary sclerosing cholangitis, a multi-centre study. Liver Int