ARTICLE: LIVER

Comparative Performance of 14 HCC Prediction Models in CHB: A Dynamic Validation at Serial On-Treatment Timepoints

Wu, Shanshan PhD1; Zhou, Jialing MS2; Wu, Xiaoning PhD2; Sun, Yameng PhD2; Wang, Bingqiong PhD2; Kong, Yuanyuan PhD1; Zhan, Siyan PhD3; Jia, Jidong PhD1,2; Yang, Hwai-I PhD4,5,6,7; You, Hong PhD1,2

The American Journal of Gastroenterology: September 2022 - Volume 117 - Issue 9 - p 1444-1453
doi: 10.14309/ajg.0000000000001865

Abstract

INTRODUCTION

Chronic hepatitis B (CHB) virus infection remains a leading cause of hepatocellular carcinoma (HCC) worldwide (1). To date, several HCC risk prediction models exist for risk stratification, including earlier models developed in untreated patients, recent models developed in treated patients, and other models based on mixed treatment status (2–13). However, given the heterogeneous case-mix of these models, it is often difficult for clinicians to decide which model is optimal for an individual patient. We previously evaluated the comparative performance of these models based on baseline values before antiviral treatment initiation in treatment-naive patients (14).

Nevertheless, in the era of antiviral treatment, nearly all patients have received antiviral therapy for a while. In this context, it is important and urgent to generalize risk scores from using baseline (before treatment) to on-treatment values in routine clinical practice. Because most models were developed using data before treatment initiation, whether these risk scores could achieve comparable predictability with reassessment during therapy remained unclear.

So far, few studies have examined this topic, given the scarcity of regularly followed cohorts of treatment-naive patients. Only a few earlier models built in untreated/mixed treatment patients have been validated (15–17), whereas most recent models built in treated patients have not yet been examined (18). Even within those validation studies, insufficient sample size and limited follow-up timepoints during therapy could lead to conflicting evidence and relatively low statistical power (15–17). Furthermore, there is a lack of evidence comparing performance across all models with head-to-head validation in the same cohort using on-treatment data at different timepoints.

Therefore, we aimed to assess comparative performance of 14 HCC prediction models in CHB patients using on-treatment values at different timepoints, based on a prospective regular follow-up cohort with treatment-naive patients undergoing antiviral therapy.

METHODS

Identification of published HCC prediction models

HCC prediction models in CHB were systematically searched in our previous study using keywords such as “HCC,” “hepatitis B,” “prognostic,” and “prediction model” (14). Model development studies presenting a formal model that provides individualized HCC prediction in CHB patients were included. Overall, 14 models were identified, including 4 developed in untreated patients (GAG-HCC, NGM1-HCC, NGM2-HCC, and REACH-B), 7 in treated patients (mREACH-BI, mREACH-BII, PAGE-B, mPAGE-B, AASL-HCC, CAMD, and REAL-B), and 3 in patients with mixed treatment status (CU-HCC, LSM-HCC, and RWS-HCC) (2–13). Of these, PAGE-B (8) was developed in Caucasians, and the other 13 models (2–7,9–13) were developed in Asians. Regarding cirrhosis, REACH-B (7) was based on a noncirrhotic population, whereas the other 13 models (2,4–13) were developed in populations with different proportions of cirrhosis (15%–47%). Specific predictors and categories, derived scores, risk formulas, and cutoffs of each model are shown in Appendix 1, Supplementary Digital Content 1, https://links.lww.com/AJG/C553.

Study population

The study was based on a multicenter prospective cohort conducted in 22 hospitals in China (Beijing Friendship Hospital [BFH] cohort), comprising 986 treatment-naive CHB patients aged 18–65 years enrolled between 2013 and 2015 (19). All patients received entecavir-based treatment (0.5 mg per day) and were assessed for demographic characteristics, liver function tests, HBV-DNA, hepatitis B e antigen (HBeAg), α-fetoprotein (AFP), liver stiffness measurement (LSM), liver ultrasonography, contrast-enhanced computed tomography (CT), and/or magnetic resonance imaging (MRI) at baseline and every 26 weeks during follow-up. Liver biopsies were performed in eligible patients at baseline and week 78 to evaluate fibrosis stage. Diagnosis of cirrhosis at baseline has been detailed elsewhere (19).

HCC ascertainment

Regarding HCC diagnosis, dynamic imaging including CT/MRI was performed for characterization at detection of newly developed hepatic nodule, based on recommendations of American Association for the Study of Liver Diseases (20). We set end of follow-up for HCC occurrence as January 1, 2020.

Variables for model validation

All predictors of eligible models were available in the BFH cohort (see Appendix 1, Supplementary Digital Content 1, https://links.lww.com/AJG/C553). On-treatment values of each predictor at week 26, 52, 78, and 104 were used separately to calculate risk scores, including laboratory variables (HBeAg, HBV-DNA, PLT, ALB, ALT, total bilirubin [TBIL], AFP, and LSM), demographic variables (age and sex), medical history (family history of HCC and diabetes), lifestyle factor (alcohol drinking), and cirrhosis status.

Regarding cirrhosis at weeks 26, 52, 78, and 104 after treatment initiation, patients whose liver biopsy showed the same fibrosis stage at baseline and week 78 were considered to have that same stage at weeks 26 and 52 as well. Among these patients, a Metavir fibrosis score of 4 on liver biopsy was defined as cirrhosis. For patients in whom biopsy could not be performed, cirrhosis was defined as meeting 2 of the following 4 criteria at the corresponding timepoint (week 26, 52, 78, or 104): (i) ultrasonography or CT/MRI indicated imaging changes in liver morphology, including nodules in hepatic parenchyma, serrated change on liver surface, spleen thickness > 4.0 cm, or width of portal vein > 1.2 cm; (ii) Aspartate Aminotransferase-to-Platelet Ratio Index (APRI) > 2; (iii) Fibrosis 4 index (FIB-4) > 3.25; and (iv) LSM > 8 kPa.
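The nonbiopsy rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function and parameter names are hypothetical, "2 of 4" is read here as "at least 2 of 4" (an assumption), and APRI and FIB-4 use their standard published formulas, which the text does not restate.

```python
import math

def apri(ast_u_l, ast_uln_u_l, plt_1e9_l):
    """APRI (standard formula): (AST / upper limit of normal) / platelets x 100."""
    return (ast_u_l / ast_uln_u_l) / plt_1e9_l * 100

def fib4(age_years, ast_u_l, alt_u_l, plt_1e9_l):
    """FIB-4 (standard formula): age x AST / (platelets x sqrt(ALT))."""
    return age_years * ast_u_l / (plt_1e9_l * math.sqrt(alt_u_l))

def on_treatment_cirrhosis(imaging_changes, apri_value, fib4_value, lsm_kpa):
    """Return True when at least 2 of the 4 criteria are met (assumed reading)."""
    criteria = [
        bool(imaging_changes),  # (i) morphology changes on ultrasonography or CT/MRI
        apri_value > 2,         # (ii) APRI > 2
        fib4_value > 3.25,      # (iii) FIB-4 > 3.25
        lsm_kpa > 8,            # (iv) LSM > 8 kPa
    ]
    return sum(criteria) >= 2

# Hypothetical patient: age 50, AST 80 U/L (ULN 40), ALT 64 U/L, PLT 100 x 10^9/L
a = apri(80, 40, 100)
f = fib4(50, 80, 64, 100)
print(on_treatment_cirrhosis(False, a, f, lsm_kpa=9.0))
```

In this hypothetical case only the FIB-4 and LSM criteria are met, which is enough to classify the patient as cirrhotic under the assumed reading.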

Statistical analysis

The Kaplan-Meier method was used to calculate the cumulative incidence of HCC. Univariable Cox regression was performed to assess the impact of each score at different on-treatment timepoints on HCC development, with hazard ratios (HRs) expressed per 10% change in each score to ensure comparability across scores. Based on the current follow-up of the BFH cohort, the prediction time horizon for all models was restricted to 3 years; that is, weeks 26, 52, 78, and 104 were each treated as a new baseline, so the evaluated periods ran from week 26 to 182, week 52 to 208, week 78 to 234, and week 104 to 260, respectively.
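The evaluation windows follow from adding a 3-year horizon (156 weeks) to each on-treatment timepoint. The sketch below reproduces that arithmetic and shows one plausible way to build a per-timepoint data set. The record layout (`followup_week`, `hcc`) and the decision to drop patients whose follow-up ended at or before the timepoint are assumptions made for illustration, not details reported by the study.

```python
WEEKS_PER_YEAR = 52
HORIZON_YEARS = 3
LANDMARK_WEEKS = [26, 52, 78, 104]

def evaluation_window(landmark_week, horizon_years=HORIZON_YEARS):
    """Start and end week of the period evaluated from a given timepoint."""
    return landmark_week, landmark_week + horizon_years * WEEKS_PER_YEAR

def landmark_dataset(patients, landmark_week, horizon_years=HORIZON_YEARS):
    """Re-zero follow-up at the timepoint and censor at the horizon (assumed handling)."""
    start, end = evaluation_window(landmark_week, horizon_years)
    rows = []
    for p in patients:
        if p["followup_week"] <= start:
            continue  # HCC or censoring before the timepoint: no longer at risk
        if p["followup_week"] > end:
            # Still followed beyond the horizon: administratively censored
            rows.append({"time": end - start, "hcc": False})
        else:
            rows.append({"time": p["followup_week"] - start, "hcc": p["hcc"]})
    return rows

for week in LANDMARK_WEEKS:
    print(evaluation_window(week))
```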

Model discrimination was assessed using the time-dependent area under the receiver operating characteristic curve (AUC) with 95% confidence interval (CI), estimated with inverse probability of censoring weighting, followed by head-to-head comparison among all models. The Benjamini-Hochberg method, a common false discovery rate procedure, was used to adjust P values for multiple testing. Calibration plots and Brier scores (BSs) at each timepoint were evaluated. However, calibration could be assessed for only 4 models (REACH-B, REAL-B, mPAGE-B, and CAMD), for which the disease-free probability at a specific timepoint was available.
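For readers unfamiliar with these quantities, the following minimal Python sketch shows the Benjamini-Hochberg adjustment and a simplified Brier score. It is not the study's implementation (the analyses were run in SAS and R), and the Brier score here ignores censoring, which is a simplifying assumption; the P values in the usage lines are made up.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def brier_score(predicted_risk, observed_event):
    """Mean squared difference between predicted risk and outcome (no censoring handled)."""
    pairs = list(zip(predicted_risk, observed_event))
    return sum((p - int(y)) ** 2 for p, y in pairs) / len(pairs)

# Made-up P values from four pairwise AUC comparisons
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
print(brier_score([0.1, 0.9], [False, True]))
```

A lower Brier score indicates predictions closer to the observed outcomes.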

To further assess each model's performance at established cutoffs using on-treatment values, we validated the cutoffs by calculating common diagnostic accuracy measures. In addition, efficiency and failure rate were calculated from a clinical practice point of view. Efficiency was defined as the proportion of patients in the whole cohort stratified to the low-risk group. Failure rate was defined as the proportion of patients in the low-risk group who subsequently developed HCC, calculated using the Kaplan-Meier method. The difference in failure rate with 95% CI was calculated by bootstrap with 1,000 samples.
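The two clinical measures defined above can be illustrated with a small, self-contained Python sketch. The function names, the `"low"` group label, and the toy data are hypothetical, and the Kaplan-Meier estimator is written out by hand for clarity; it is not the study's code.

```python
def km_cumulative_incidence(times, events, horizon):
    """1 minus the Kaplan-Meier survival estimate at `horizon`."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1 - n_events / at_risk
    return 1 - survival

def efficiency(risk_groups):
    """Proportion of the whole cohort stratified to the low-risk group."""
    return sum(g == "low" for g in risk_groups) / len(risk_groups)

def failure_rate(times, events, risk_groups, horizon):
    """Kaplan-Meier cumulative HCC incidence within the low-risk group."""
    low = [(t, e) for t, e, g in zip(times, events, risk_groups) if g == "low"]
    return km_cumulative_incidence([t for t, _ in low], [e for _, e in low], horizon)

# Toy data: follow-up in months, HCC indicator, and risk group per patient
times = [10, 20, 30, 40, 50]
events = [False, True, False, False, True]
groups = ["low", "low", "low", "low", "high"]
print(efficiency(groups), failure_rate(times, events, groups, horizon=36))
```

A model is attractive for ruling out surveillance when efficiency is high and failure rate is low at the same time.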

A few laboratory variables (HBeAg, LSM, AFP, HBV-DNA, ALB, PLT, TBIL, and ALT) were missing in ≥5% of patients. To rule out the impact of sample size on comparative performance, multiple imputation was conducted for all 8 variables, generating 5 imputed data sets, and model performance was finally estimated using Rubin's rules.
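Rubin's rules combine the estimates from the imputed data sets into one estimate and one variance. A minimal sketch in Python is given below; the five AUC estimates and their variances are invented for illustration, and the scale on which the study pooled its estimates is not stated in the text.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m imputed-data estimates: returns (pooled estimate, total variance)."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Invented AUCs and variances from 5 imputed data sets
aucs = [0.80, 0.82, 0.78, 0.81, 0.79]
auc_variances = [0.001] * 5
print(pool_rubin(aucs, auc_variances))
```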

All analyses were conducted using SAS version 9.4 and R version 4.0.3 (timeROC, ipred, ggplot2, and forestplot packages).

RESULTS

Baseline characteristics

Among 986 treatment-naive patients undergoing entecavir therapy, mean age at enrollment was 43.7 years (SD: 11.1), and 74.5% were men. Overall, 67.2% had cirrhosis, 48.5% were HBeAg-positive, and 13.0% had a family history of HCC. Mean HBV DNA level before treatment was 5.4 log IU/mL (SD: 1.9). Median ALT and PLT values were 58.9 U/L and 116 × 10⁹/L, respectively.

During a median follow-up of 4.7 years, 56 patients developed HCC. The 5-year cumulative incidence of HCC was 7.5%. In total, 51, 40, 37, and 33 HCC cases occurred among patients followed from weeks 26, 52, 78, and 104 after treatment initiation, respectively. The corresponding 3-year cumulative incidence was 4.22%, 4.02%, 4.86%, and 5.53%, respectively.

Generally, HBV DNA, biochemical indexes, AFP, and LSM (15.9 kPa at baseline to 11.8 kPa at week 26) improved dramatically during the first 26 weeks of treatment and improved only slightly thereafter. The proportions of HBeAg-positive and cirrhotic patients decreased, with a rapid decline in APRI (1.26 at baseline to 0.53 at week 26) and FIB-4 (2.42 at baseline to 1.85 at week 26) during the first 26 weeks of treatment. Accordingly, most scores declined dramatically during the first 26 weeks of treatment and decreased slightly thereafter, except for PAGE-B (Table 1, see Appendix 2, Supplementary Digital Content 1, https://links.lww.com/AJG/C553).

Table 1.
Baseline characteristics of different on-treatment timepoints in the BFH validation cohort

Model discrimination

Most risk scores, whether calculated at 26, 52, 78, or 104 weeks on treatment, were significantly associated with HCC development (P values < 0.05), except for the NGM2-HCC model (Table 2). Moreover, HRs for most scores were highest using 26-week on-treatment values and decreased slightly when using 52-, 78-, or 104-week on-treatment values. Among all 14 models, REAL-B [1.90 (95% CI: 1.56–2.32), 1.80 (1.42–2.26), 1.80 (1.42–2.26), and 1.65 (1.31–2.07)] had the highest HRs at all 4 on-treatment timepoints, followed by mPAGE-B, CAMD, AASL-HCC, mREACH-BII, and GAG-HCC.

Table 2.
Hazard ratios of 14 HCC risk prediction model scores based on different on-treatment values in BFH validation cohort

Generally, discrimination using on-treatment values within first 2 years was acceptable for most models (3-year AUCs ranging from 0.68 to 0.81), except for REACH-B, NGM1-HCC, NGM2-HCC, and PAGE-B, although AUCs slightly decreased when using on-treatment values from week 26 to 104 (Figure 1, see Appendix 3, Supplementary Digital Content 1, https://links.lww.com/AJG/C553). Of these, REAL-B exhibited highest discrimination (26-week AUC: 0.81 [95% CI: 0.73–0.88], 52-week: 0.73 [0.64–0.83], 78-week: 0.76 [0.67–0.85], and 104-week: 0.74 [0.66–0.82]), followed by CAMD (26-week: 0.80 [0.73–0.86], 52-week: 0.73 [0.65–0.82], 78-week: 0.73 [0.65–0.82], and 104-week: 0.72 [0.63–0.80]) and GAG-HCC (26-week: 0.80 [0.71–0.89], 52-week: 0.73 [0.63–0.82], 78-week: 0.73 [0.63–0.83], and 104-week: 0.71 [0.62–0.80]). AASL-HCC, LSM-HCC, mPAGE-B, and mREACH-BII also showed high discrimination with AUCs ranging from 0.76 to 0.79, 0.72 to 0.76, 0.70 to 0.72, and 0.71 to 0.72 using 26-, 52-, 78-, and 104-week on-treatment scores, respectively. Particularly, models developed in treated patients achieved higher discrimination ranging from 0.75 (PAGE-B) to 0.81 (REAL-B), 0.67 (PAGE-B) to 0.73 (REAL-B), 0.65 (PAGE-B) to 0.76 (REAL-B), and 0.64 (PAGE-B) to 0.74 (REAL-B) at 4 timepoints, respectively, whereas models developed in untreated patients showed lower AUCs ranging from 0.62 to 0.72, 0.64 to 0.71, 0.58 to 0.67, and 0.55 to 0.64 at 4 timepoints, except for GAG-HCC achieving good discrimination (0.71–0.80). Similar results were obtained either in cirrhotic or noncirrhotic patients, as well as patients without diabetes, although CIs were wide for noncirrhotic subgroup (see Appendix 4, Supplementary Digital Content 1, https://links.lww.com/AJG/C553).

Figure 1.
Three-year area under the receiver operating characteristic curve of 14 HCC risk prediction models using on-treatment values at different timepoints. Discrimination was generally acceptable for all models except for REACH-B and NGM-HCC. Specifically, REAL-B, AASL-HCC, CAMD, mREACH-BII, mPAGE-B, and GAG-HCC model showed higher discrimination. HCC, hepatocellular carcinoma.

Head-to-head comparison (see Appendix 5, Supplementary Digital Content 1, https://links.lww.com/AJG/C553) indicated AUCs of REACH-B, NGM1-HCC, NGM2-HCC, CU-HCC, and mREACH-BI were significantly lower than other models (REAL-B, CAMD, mREACH-BII, AASL-HCC, LSM-HCC, and GAG-HCC) when using on-treatment values either at week 26, 52, 78, or 104 (P values < 0.05); however, after adjustments of multiple testing, only AUC of NGM2-HCC was significantly lower than other models.

Model calibration

REAL-B and CAMD calibrated well, with Brier scores ranging from 0.037 to 0.052, whereas REACH-B underestimated HCC risk (BS ranging from 0.040 to 0.055), and mPAGE-B seemed to overestimate HCC risk (BS ranging from 0.041 to 0.055). Overall, the BS of each model increased slightly when using on-treatment values from week 26 to 104, indicating poorer calibration with longer duration of antiviral treatment (Table 3, see Appendix 6, Supplementary Digital Content 1, https://links.lww.com/AJG/C553). Results were similar in patients with cirrhosis, without cirrhosis, or without diabetes (see Appendix 7, Supplementary Digital Content 1, https://links.lww.com/AJG/C553).

Table 3.
Three-year calibration of REACH-B, mPAGE-B, CAMD, and REAL-B models using on-treatment values at different timepoints from baseline

Predictive accuracy and cumulative HCC incidence in each risk category based on model cutoffs

Overall, 9 models provided score cutoffs for clinical application, stratifying population to low, intermediate, or high risk. Irrespective of on-treatment values at any timepoint, most models discriminated evidently among different risk categories, particularly REAL-B (3-year HCC incidence 0.40%–1.48%, 3.76%–7.62%, and 14.97%–21.55% in the low-, intermediate-, and high-risk group), AASL-HCC (0.85%–1.43%, 3.33%–8.11%, and 15.04%–16.52%), CAMD (0.89%–2.26%, 3.27%–6.40%, and 10.29%–13.89%), and mPAGE-B (0.76%–1.56%, 3.60%–7.77%, and 8.70%–10.44%). However, CU-HCC and PAGE-B could not differentiate clearly with relatively low incidence in the high-risk and intermediate-risk group (Figure 2, see Appendix 8, Supplementary Digital Content 1, https://links.lww.com/AJG/C553).

Figure 2.
K-M cumulative incidence of low/intermediate/high-risk patients according to REAL-B, AASL-HCC, CAMD, and mPAGE-B models using on-treatment values at different timepoints. Irrespective of on-treatment values at any timepoint, most models discriminated evidently among risk categories based on established cutoffs, particularly REAL-B, AASL-HCC, CAMD, and mPAGE-B. HCC, hepatocellular carcinoma.

Whether using on-treatment values at week 26, 52, 78, or 104, most models achieved high sensitivity (around 80%–90%), except for GAG-HCC (12.1%–42.0%), RWS-HCC, and CU-HCC (around 40%–60%), whereas nearly all models showed 30%–50% specificity, with the exception of GAG-HCC (93.1%–97.3%) and RWS-HCC (77.2%–92.4%) (Table 4). More importantly, among all prediction models, REAL-B achieved the highest NPV during the first 2 years (99.2%, 98.6%, 99.1%, and 99.6%, respectively), followed by AASL-HCC (99.2%, 99.0%, 99.0%, and 98.8%, respectively) and mPAGE-B (98.7%, 98.7%, 99.2%, and 98.5%, respectively).

Table 4.
Predictive accuracy measures with 95% CI of 9 prediction models validated in BFH validation cohort based on different on-treatment values

Regarding model efficiency, REAL-B, AASL-HCC, and mPAGE-B could identify 30%–40% of patients as low risk with minimal failure rate (0.40% [REAL-B]–1.56% [mPAGE-B]) during the study period (Table 4, Appendix 9, Supplementary Digital Content 1, https://links.lww.com/AJG/C553). By contrast, GAG-HCC exhibited the highest failure rate (2.71%–5.01%), although it achieved high efficiency (89.7%–94.0%). PAGE-B seemed to afford the lowest efficiency (16.0%–21.0%) with a comparable failure rate (1.29%–1.86%), whereas CAMD could identify a large proportion of low-risk patients (40.8%–47.7%) with a slightly higher failure rate than REAL-B (0.89%–2.26%).

DISCUSSION

We examined 14 published HCC models in CHB and compared predictive performance in a representative cohort with reassessment of scores through on-treatment values during first 2 years. Overall, most models achieved acceptable discrimination, except for REACH-B and NGM-HCC. Specifically, REAL-B, AASL-HCC, CAMD, mREACH-BII, mPAGE-B, and GAG-HCC showed higher discrimination with evident differentiation among 3 risk categories defined by established cutoffs. Moreover, REAL-B, AASL-HCC, and mPAGE-B could identify 30%–40% of patients as low risk with minimal failure rate, whereas CAMD could identify ≥40% of low-risk patients with slightly higher failure rate compared with REAL-B. In addition, both REAL-B and CAMD calibrated well using on-treatment values.

To the best of our knowledge, this is the first study to comprehensively evaluate all HCC prediction models' performance using on-treatment values at different timepoints, with head-to-head comparison in a totally treatment-naive cohort with long-term regular follow-up, which is helpful for clinicians to counsel patients undergoing antiviral treatment of HCC risk and further determine candidacy for surveillance. Not only discrimination and calibration but also efficiency and failure rate according to suggested cutoffs were compared, which reflected the effect on daily clinical practice and were useful for guiding clinicians to choose best models.

Generally, most models reassessed after 0.5–2 years of antiviral therapy offered acceptable discrimination, particularly those developed in patients with treated/mixed treatment status. Consistent with our results, studies of treated patients in Hong Kong and Japan indicated that CU-HCC and GAG-HCC achieved similar AUCs when estimated at baseline and after 2 years of therapy (15,16). Another study from South Korea demonstrated acceptable dynamic performance of CU-HCC, REACH-B, LSM-HCC, and mREACH-B, with scores determined either at baseline or 14 months later, of which mREACH-B performed better (17). PAGE-B and mPAGE-B offered similarly good AUCs when assessed at baseline and at year 2 on treatment (18). Although antiviral therapy greatly improved/altered several parameters (21–23), including hepatitis activity and biochemical indexes, significant differences in PLT, LSM, and AFP between the HCC and non-HCC groups persisted during the first 2 years (see Appendix 2, Supplementary Digital Content 1, https://links.lww.com/AJG/C553). Because cirrhosis is the strongest HCC predictor identified so far (24,25), it is unsurprising that models incorporating parameters related to the severity of fibrosis/cirrhosis (PLT, LSM, and cirrhosis), including REAL-B, AASL-HCC, CAMD, mPAGE-B, mREACH-BII, and GAG-HCC, achieved better performance with reassessment during therapy.

Nevertheless, another important aspect for clinical application of HCC risk scores might be the ability to identify patients who do not require HCC surveillance, considering limited healthcare resources (25,26). Therefore, the absolute size/proportion of the low-risk group and the NPV/failure rate of each model are of considerable value. Currently, according to several guidelines (27–30), HCC surveillance is considered cost-effective if annual HCC incidence is ≥0.2% in noncirrhotic patients and ≥1.5% in cirrhotic patients. This means that, for a low-risk group to fall below these thresholds, the 3-year NPV should be ≥99.4% for noncirrhotic and ≥95.5% for cirrhotic patients, corresponding to a failure rate of ≤0.6% and ≤4.5%, respectively. Our results suggested that REAL-B, AASL-HCC, and mPAGE-B could identify 30%–40% of patients as low risk with an acceptable failure rate, particularly REAL-B. Therefore, future guidelines for HCC surveillance programs should incorporate these models in daily clinical practice irrespective of treatment duration to rule out candidacy for surveillance, thereby achieving optimal cost-effectiveness.
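The 3-year thresholds quoted above follow from multiplying the annual incidence cutoffs by 3. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and the linear accumulation mirrors the figures in the text rather than a compounding calculation, which would give almost identical values at these low rates.

```python
def three_year_thresholds(annual_incidence, years=3):
    """Return (maximum failure rate, minimum NPV) over the horizon, linear accumulation."""
    max_failure = annual_incidence * years
    return max_failure, 1 - max_failure

for label, annual in [("noncirrhotic", 0.002), ("cirrhotic", 0.015)]:
    failure, npv = three_year_thresholds(annual)
    print(f"{label}: failure rate <= {failure:.1%}, NPV >= {npv:.1%}")
```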

Meanwhile, HCC risk prediction models could also be useful for identifying high-risk patients because these patients would benefit from rigorous monitoring (25). Therefore, PPV and HCC incidence in the high-risk group are critically important as well. According to the EASL guideline, patients with annual HCC risk exceeding 3%–5% are considered high risk and are recommended rigorous surveillance (30). Our study suggested that REAL-B, AASL-HCC, mPAGE-B, and CAMD using on-treatment values still achieved an annual HCC incidence of approximately ≥3% in high-risk patients identified by the corresponding cutoffs. Thus, these 4 models could be of considerable value for clinical application, thereby improving the long-term prognosis of high-risk patients receiving antiviral therapy.

In addition to identified proportions of high-risk and low-risk patients, availability of each model's predictors and corresponding cost as well as cost utility should be taken into account when considering model application in routine clinical practice. Only models containing available, affordable, and accurate predictors might be appropriate for daily clinical practice. Further model impact studies incorporating cost-utility analysis and HCC screening interval with patients' preferences are needed to assess potential clinical utility for these models.

In addition, generalization of our findings to non-Asians (e.g., white and African patients) should be made with caution because all but one of the HCC risk scores were developed in Asians. Owing to multiple confounding factors, including differences in time and mode of HBV acquisition, HBV genotype, concurrent NAFLD or alcoholic liver disease, and concurrent metabolic disease risk factors, disease progression may vary and may ultimately affect model performance (31–33). The GALAD score (calculated from sex, age, AFP, AFP-L3, and DCP) was developed as a diagnostic biomarker for HCC/early stage HCC rather than as a predictive score of HCC development, whereas the 14 risk scores evaluated here are used to predict HCC occurrence over the following 3/5 years (34–38). Hence, these 14 risk scores are well suited to risk stratification and to identifying patients who might truly benefit from regular HCC surveillance. Accordingly, the identified high-risk patients could receive tests with high sensitivity (e.g., GALAD) as regular surveillance. Therefore, approaches that combine patient risk stratification (by these 14 risk scores) with multiple candidate biomarkers (e.g., GALAD) may achieve optimal performance with cost benefit for HCC screening and surveillance. Nevertheless, the promising GALAD score still requires validation of both diagnostic and predictive performance in large, well-powered phase III studies before routine clinical use.

However, several limitations should be mentioned. First, on-treatment cirrhosis was defined by ourselves, which might lead to misclassification. However, the definition of on-treatment cirrhosis currently varies across studies, and the cutoffs selected in this study were based on previous high-quality research. Second, owing to the small proportion of noncirrhotic patients (32.8%) and relatively low HCC incidence in the BFH cohort, we could not obtain sufficient statistical power in the noncirrhotic subgroup during the limited follow-up timeframe. Thus, CIs in noncirrhotic patients were wide. Third, entecavir-based therapy in this cohort may limit the generalizability of our findings to tenofovir-treated populations. Finally, we only assessed model performance with on-treatment values for predicting 3-year HCC occurrence because follow-up data were limited to 5 years in total. Thus, results should be interpreted with caution. Further external validation studies with longer follow-up are needed to confirm these findings.

In summary, in this cohort of CHB patients undergoing antiviral treatment, most HCC prediction models performed well even when using on-treatment values during the first 2 years, particularly REAL-B, AASL-HCC, CAMD, and mPAGE-B. This could provide useful information on risk model application for clinical decision-making in the era of widespread antiviral therapy. Future guidelines may incorporate these findings to assess potential clinical utility for risk stratification and HCC surveillance programs.

CONFLICTS OF INTEREST

Guarantor of the article: Hong You, MD, PhD.

Specific author contributions: H.Y., H.I.Y., and S.S.W. designed the study and drafted the manuscript. J.L.Z., X.N.W., Y.M.S., and B.Q.W. followed up participants in the BFH validation cohort, J.L.Z. and X.N.W. verified the data, H.I.Y. and S.S.W. analyzed the data. J.J.D., H.Y., and H.I.Y. revised the manuscript. S.Y.Z. and Y.Y.K. interpreted the results, incorporated comments for the coauthors, and finalized the manuscript. All authors approved the final version of the paper.

Financial support: Funded by Beijing Natural Science Foundation Program (7194255), National Science and Technologies Major Project (2018ZX10302204 and 2017ZX10203202-003).

Potential competing interests: None to report.

Ethical approval: The BFH validation cohort was approved by the Institutional Research Board of the Beijing Friendship Hospital, Capital Medical University (approval number BJFH-EC/2013-029), and all patients provided written informed consent.

Data availability statement: Additional data for the eligible studies are available on request from the corresponding author at [email protected].

Study Highlights

WHAT IS KNOWN

  • Identifying chronic hepatitis B patients at increased risk of developing hepatocellular carcinoma (HCC) would provide substantial benefit for clinical practice. However, whether these risk scores could achieve comparable predictability during therapy, particularly at different on-treatment timepoints, remains unclear.
  • To the best of our knowledge, this is the first study to comprehensively evaluate all HCC risk prediction models' performance using on-treatment values at different timepoints in a totally treatment-naive cohort with every 26-week follow-up.

WHAT IS NEW HERE

  • A total of 14 HCC risk prediction models were validated using on-treatment values at weeks 26, 52, 78, and 104, respectively.
  • Most HCC prediction models performed well even when using on-treatment values during the first 2 years, particularly the REAL-B, AASL-HCC, CAMD, and mPAGE-B models, with good discrimination and calibration as well as high efficiency.
  • Future guidelines may incorporate the REAL-B, AASL-HCC, CAMD, and mPAGE-B models for risk stratification and HCC surveillance in the era of widespread antiviral therapy.

REFERENCES

1. World Health Organization. Global Hepatitis Report 2017, 2017 (https://www.who.int/hepatitis/publications/global-hepatitis-report2017/en/). Accessed August 10, 2021.
2. Yuen MF, Tanaka Y, Fong DY, et al. Independent risk factors and predictive score for the development of hepatocellular carcinoma in chronic hepatitis B. J Hepatol 2009;50(1):80–8.
3. Yang HI, Sherman M, Su J, et al. Nomograms for risk of hepatocellular carcinoma in patients with chronic hepatitis B virus infection. J Clin Oncol 2010;28(14):2437–44.
4. Wong VW, Chan SL, Mo F, et al. Clinical scoring system to predict hepatocellular carcinoma in chronic hepatitis B carriers. J Clin Oncol 2010;28(10):1660–5.
5. Yang HI, Yuen MF, Chan HL, et al. Risk estimation for hepatocellular carcinoma in chronic hepatitis B (REACH-B): Development and validation of a predictive score. Lancet Oncol 2011;12(6):568–74.
6. Wong GL, Chan HL, Wong CK, et al. Liver stiffness-based optimization of hepatocellular carcinoma risk score in patients with chronic hepatitis B. J Hepatol 2014;60(2):339–45.
7. Lee HW, Yoo EJ, Kim BK, et al. Prediction of development of liver-related events by transient elastography in hepatitis B patients with complete virological response on antiviral therapy. Am J Gastroenterol 2014;109(8):1241–9.
8. Papatheodoridis G, Dalekos G, Sypsa V, et al. PAGE-B: A risk score for hepatocellular carcinoma in Caucasians with chronic hepatitis B under a 5-year entecavir or tenofovir therapy. J Hepatol 2016;64(4):800–6.
9. Poh Z, Shen L, Yang HI, et al. Real-world risk score for hepatocellular carcinoma (RWS-HCC): A clinically practical risk predictor for HCC in chronic hepatitis B. Gut 2016;65(5):887–8.
10. Kim JH, Kim YD, Lee M, et al. Modified PAGE-B score predicts the risk of hepatocellular carcinoma in Asians with chronic hepatitis B on antiviral therapy. J Hepatol 2018;69(5):1066–73.
11. Hsu YC, Yip TC, Ho HJ, et al. Development of a scoring system to predict hepatocellular carcinoma in Asians on antivirals for chronic hepatitis B. J Hepatol 2018;69(2):278–85.
12. Yu JH, Suh YJ, Jin YJ, et al. Prediction model for hepatocellular carcinoma risk in treatment-naive chronic hepatitis B patients receiving entecavir/tenofovir. Eur J Gastroenterol Hepatol 2019;31(7):865–72.
13. Yang HI, Yeh ML, Wong GL, et al. Real-world effectiveness from the Asia Pacific Rim Liver Consortium for HBV risk score for the prediction of hepatocellular carcinoma in chronic hepatitis B patients treated with oral antiviral therapy. J Infect Dis 2020;221(3):389–99.
14. Wu SS, Zeng N, Sun F, et al. Hepatocellular carcinoma prediction models in chronic hepatitis B: A systematic review of 14 models and external validation. Clin Gastroenterol Hepatol 2021;19(12):2499–513.
15. Wong GL, Chan HL, Chan HY, et al. Accuracy of risk scores for patients with chronic hepatitis B receiving entecavir treatment. Gastroenterology 2013;144(5):933–44.
16. Tawada A, Chiba T, Saito T, et al. Utility of prediction scores for hepatocellular carcinoma in patients with chronic hepatitis B treated with nucleos(t)ide analogues. Oncology 2016;90(4):199–208.
17. Jeon MY, Lee HW, Kim SU, et al. Feasibility of dynamic risk prediction for hepatocellular carcinoma development in patients with chronic hepatitis B. Liver Int 2018;38(4):676–86.
18. Yip TC, Wong GL, Wong VW, et al. Reassessing the accuracy of PAGE-B-related scores to predict hepatocellular carcinoma development in patients with chronic hepatitis B. J Hepatol 2020;72(5):847–54.
19. Wu S, Kong Y, Piao H, et al. On-treatment changes of liver stiffness at week 26 could predict 2-year clinical outcomes in HBV-related compensated cirrhosis. Liver Int 2018;38(6):1045–54.
20. Bruix J, Sherman M, American Association for the Study of Liver Diseases. Management of hepatocellular carcinoma: An update. Hepatology 2011;53(3):1020–2.
21. Ono A, Suzuki F, Kawamura Y, et al. Long-term continuous entecavir therapy in nucleos(t)ide-naïve chronic hepatitis B patients. J Hepatol 2012;57(3):508–14.
22. Kaneko S, Kurosaki M, Tamaki N, et al. Tenofovir alafenamide for hepatitis B virus infection including switching therapy from tenofovir disoproxil fumarate. J Gastroenterol Hepatol 2019;34(11):2004–10.
23. Koike K, Suyama K, Ito H, et al. Randomized prospective study showing the non-inferiority of tenofovir to entecavir in treatment-naïve chronic hepatitis B patients. Hepatol Res 2018;48(1):59–68.
24. Abu-Amara M, Cerocchi O, Malhi G, et al. The applicability of hepatocellular carcinoma risk prediction scores in a North American patient population with chronic hepatitis B infection. Gut 2016;65(8):1347–58.
25. Voulgaris T, Papatheodoridi M, Lampertico P, et al. Clinical utility of hepatocellular carcinoma risk scores in chronic hepatitis B. Liver Int 2020;40(3):484–95.
26. Papatheodoridis GV, Voulgaris T, Papatheodoridi M, et al. Risk scores for hepatocellular carcinoma in chronic hepatitis B: A promise for precision medicine. Hepatology 2020;72(6):2197–205.
27. Terrault NA, Lok AS, McMahon BJ, et al. Update on prevention, diagnosis, and treatment of chronic hepatitis B: AASLD 2018 hepatitis B guidance. Hepatology 2018;67:1560–99.
28. Sarin SK, Kumar M, Lau GK, et al. Asian-Pacific clinical practice guidelines on the management of hepatitis B: A 2015 update. Hepatol Int 2016;10:1–98.
29. European Association for the Study of the Liver. EASL 2017 Clinical Practice Guidelines on the management of hepatitis B virus infection. J Hepatol 2017;67:370–98.
30. European Association for the Study of the Liver. EASL clinical practice guidelines: Management of hepatocellular carcinoma. J Hepatol 2018;69:182–236.
31. Mittal S, Kramer JR, Omino R, et al. Role of age and race in the risk of hepatocellular carcinoma in veterans with hepatitis B virus infection. Clin Gastroenterol Hepatol 2018;16:252–9.
32. Liu CJ, Kao JH. Global perspective on the natural history of chronic hepatitis B: Role of hepatitis B virus genotypes A to J. Semin Liver Dis 2013;33:97–102.
33. Kao JH, Chen PJ, Lai MY, et al. Hepatitis B genotypes correlate with clinical outcomes in patients with chronic hepatitis B. Gastroenterology 2000;118:554–9.
34. Johnson PJ, Pirrie SJ, Cox TF, et al. The detection of hepatocellular carcinoma using a prospectively developed and validated model based on serological biomarkers. Cancer Epidemiol Biomarkers Prev 2014;23(1):144–53.
35. Singal AG, Tayob N, Mehta A, et al. GALAD demonstrates high sensitivity for HCC surveillance in a cohort of patients with cirrhosis. Hepatology 2022;75(3):541–9.
36. Parikh ND, Mehta A, Singal AG, et al. Biomarkers for the early detection of hepatocellular carcinoma. Cancer Epidemiol Biomarkers Prev 2020;29(12):2495–503.
37. Best J, Bechmann LP, Sowa JP, et al. GALAD score detects early hepatocellular carcinoma in an international cohort of patients with nonalcoholic steatohepatitis. Clin Gastroenterol Hepatol 2020;18(3):728–35.e4.
38. Berhane S, Toyoda H, Tada T, et al. Role of the GALAD and BALAD-2 serologic models in diagnosis of hepatocellular carcinoma and prediction of survival in patients. Clin Gastroenterol Hepatol 2016;14(6):875–86.e6.

© 2022 by The American College of Gastroenterology