Real-World Practice of Gastric Cancer Prevention and Screening Calls for Practical Prediction Models : Clinical and Translational Gastroenterology


Real-World Practice of Gastric Cancer Prevention and Screening Calls for Practical Prediction Models

He, Siyi MS1,*; Sun, Dianqin MS1,*; Li, He MS1; Cao, Maomao MS1; Yu, Xinyang MS1; Lei, Lin MS2; Peng, Ji PhD2; Li, Jiang PhD1; Li, Ni PhD1; Chen, Wanqing PhD1

Clinical and Translational Gastroenterology 14(2):p e00546, February 2023. | DOI: 10.14309/ctg.0000000000000546



Gastric cancer ranks fifth in the global cancer spectrum, with an incidence rate of 14.0 per 100,000, and fourth in mortality, with a rate of 9.9 per 100,000 (1). The prognosis of gastric cancer is poor, but it can be improved significantly when the disease is detected early (2,3). Mass screening for gastric cancer has been conducted in countries with a high incidence, such as Japan and South Korea (4–8). However, now that more health resources are being directed to the control of coronavirus disease 2019, the limited supply of gastroscopies needs to be allocated more efficiently, which accurate risk prediction can support. Risk prediction models may also inform individuals of their risk and thereby improve attendance at and compliance with cancer screening among high-risk groups (9). Meanwhile, people assessed as not being at high risk can avoid nosocomial infection, mental burden, and other physical harm. In addition, risk stratification could facilitate primary prevention of gastric cancer, including Helicobacter pylori eradication and the adoption of early interventions, which is also a valid way to reduce the gastric cancer burden (10,11).

To date, several risk prediction models for gastric cancer have been developed to support a risk-stratified strategy; they differ in study design, statistical methods, and performance (12–15). It is unclear which of these models are of high quality, perform well, and are easy to use. Systematic reviews of prediction models are available for colorectal cancer (16,17), breast cancer (18), and lung cancer (19). To our knowledge, however, no corresponding review exists for gastric cancer. In this study, we aimed to systematically summarize the published risk prediction models for gastric cancer in the general population, map their characteristics, and assess the risk of bias (ROB) and applicability of the included models, so as to inform the selection of candidate models for gastric cancer prevention and screening in practice.


This systematic review was prospectively registered at the International Prospective Register of Systematic Reviews (registration number: CRD42021203804) and was conducted following the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) (20). The key items that guided the framing of this review are presented in Supplementary Table S1 (see Supplementary Digital Content 1).

Search strategy

A systematic search for relevant publications was conducted in 2 electronic bibliographic databases (PubMed and EMBASE) from inception to August 1, 2021, without language restriction. Search strategies consisted of both free-text words and MeSH/Emtree terms, and the details are provided in the Supplementary Material (see Supplementary Digital Content 1). In addition, the reference lists and citing articles of all eligible articles were screened to ensure the comprehensiveness of the search.

Eligibility criteria

We included studies that met the following criteria: (i) published as an original article in a peer-reviewed journal; (ii) developed or validated a tool, score, or algorithm that could calculate individual relative or absolute risk, so as to perform risk stratification; (iii) included only incident gastric cancer as the outcome; (iv) reported the area under the receiver-operating characteristic curve (AUC); (v) applicable to asymptomatic individuals or a population at average risk of gastric cancer; and (vi) published in English. For articles that reported more than 1 prediction model for gastric cancer, we selected only the model regarded as the primary result of the study (e.g., the enhanced model rather than the conventional model) or the one with the best performance (e.g., the highest c-statistic).

Studies were excluded if they (i) were not population-based, such as those developed based on natural history, meta-analysis, or literature review; (ii) collected data from patients with a definite diagnosis of gastric disease; or (iii) reported models with fewer than 2 predictors. Because we intended to review models that could be used to select high-risk individuals for endoscopy, we excluded diagnostic models that included predictors derived from endoscopy, fluoroscopy, or gastric tissue. Prognostic models with endoscopy-derived variables, however, were included.

Data extraction and quality assessments

Following the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies, 2 reviewers independently screened the articles and extracted the data. Any disagreement was resolved by consensus. For each eligible article, we collected information on study design; participants; and the development, validation, and evaluation of the prediction models. To assess the quality of the included studies, we used the Prediction model Risk of Bias Assessment Tool (PROBAST) to evaluate the ROB and applicability of each prediction model through signaling questions in 4 domains: participants, predictors, outcome, and statistical analysis (the applicability assessment covers only the first 3 domains) (21).


In this review, 4,223 articles were identified and full texts of 127 articles were screened. Of these, 104 articles were removed because of irrelevant topic, lack of required data, unmatched participants, and language. A total of 28 articles met all the inclusion criteria, reporting 18 diagnostic models (12,13,22–37) and 10 prognostic models for risk prediction of gastric cancer (14,15,38–45). Figure 1 shows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of study selection.

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of study selection.

Basic characteristics

In general, the prognostic models were developed from prospective studies, whereas the diagnostic models were based on case-control studies or medical records. Of the 18 diagnostic models, 13 (72.2%) were developed in Asian populations (Table 1). Half reported model development only, and 1 model underwent both internal and external validation. Sample sizes varied widely, ranging from 58 to 9,838. Five models (27.8%) focused on gastric adenocarcinoma, 1 of which addressed noncardia adenocarcinoma specifically. In addition, the sex distribution differed notably in several studies, especially between the case and control groups, yet sex was not considered in the development of the corresponding prediction models (13,15,22).

Table 1. General characteristics of included studies
Study | Type of study (a) | Country/region | Study period of baseline (b) | Data source | Sample size (events) | Missing values (method) | Sex (male, %) | Age (mean ± SD), years | Primary outcome
Diagnostic models
Lee et al. (23) | 2a | South Korea | 2005 | Questionnaire | 382 (183) | None | P: 65.0; C: 47.7 | NI | Gastric cancer
Kaise et al. (29) | 1a | Japan | P: 2007–2009; C: 2008–2009 | Laboratory test | 748 (187) | None | NI | P: 64.3 ± 9.7; C: 52.3 ± 12.4 | Gastric cancer
Ahn et al. (13) | 2a | South Korea | P: 2002–2003/2006–2007; C: 2004 | Laboratory test | D: 240 (120); IV: 146 (95) | None | P: 59; C: 28 | P: 59.4 ± 11.1; C: 52.1 ± 6.6 | Gastric adenocarcinomas
Cho et al. (27) | 1a | South Korea | P: 2006–2008; C: 2007–2010 | Medical record | 948 (474) | None | P: 65.0; C: 65.0 | P: 52.6 ± 9.1; C: 52.9 ± 9.6 | Gastric adenocarcinomas
Yang et al. (36) | 1a | China | 2011–2013 | Laboratory test | 426 (106) | None | P: 72.64; C: 62.5 | P: 59.7 ± 13.4 | Gastric cancer
Zhu et al. (37) | 2a | China | 2007–2011 | Laboratory test | D: 80 (40); IV: 150 (48) | None | D: P: 72.5; C: 72.5. IV: P: 72.9; C: 70.6 | D: P: 53.83 ± 10.34; C: 53.55 ± 10.11. IV: P: 56.63 ± 10.37; C: 54.03 ± 10.45 | Gastric noncardia adenocarcinoma
Kucera et al. (24) | 1a | Czech | 2013–2015 | Laboratory test | 105 (36) | None | NI | P: 65.2; C: 63.6 | Gastric cancer
Tong et al. (35) | 2a | China | 2008–2010 | Laboratory test | D: 418 (228); IV: 95 (48) | None | P: 71.93; C: 63.16 | P: 59.82 ± 11.32; C: 59.15 ± 9.27 | Primary gastric adenocarcinoma
In et al. (22) | 1a | United States | NI | Questionnaire | 140 (90) | NI | P: 50.0; C: 24.0 | NI | Gastric cancer
Wang et al. (25) | 3a | China | 2013–2015 | Laboratory test | D: 558 (279); EV: 327 (186) | None | D: 74.9; EV: 73.7 | D: 58.7 ± 12.0; EV: 58.8 ± 11.6 | Gastric cancer
Cai et al. (12) | 3a | China | 2015–2017 | Questionnaire + laboratory test | D: 9,838 (267); EV: 5,091 (138) | Yes (delete) | D: 49.63; EV: 49.77 | D: 56.2 ± 9.6; EV: 56.3 ± 9.7 | Gastric cancer
Dong et al. (28) | 1a | China | 2016–2017 | Laboratory test | 150 (119) | None | P: 74.79 | P: range (23–82) | Gastric cancer
In et al. (26) | 1a | United States | NI | Questionnaire | 140 (40) | Yes (subgroup) | P: 50.0; C: 24.0 | NI | Gastric cancer
Kong et al. (31) | 1a | China | 2016–2017 | Questionnaire + laboratory test | 1,017 (474) | None | P: 43.88; C: 47.88 | P: 58.00 ± 6.98; C: 57.41 ± 5.50 | Gastric cancer
Liu et al. (33) | 1a | United States | 2017 | Public domain | 407 (375) | None | P: 37.33 | P: 64.92 ± 10.65 | Stomach adenocarcinoma
Kim et al. (30) | 2a | South Korea | NI | Laboratory test | D: 484 (69); IV: 207 (30) | None | NI | P: 61 ± 11.0 (27–88); C: 57 ± 8.8 (38–79) | Stomach cancer
Lee et al. (32) | 4a | South Korea | 2012–2015 | Laboratory test | D: 85 (54); EV: 58 (35) | None | D: 61.18; EV: 81.03 | D: P: 55; C: 48. EV: P: 59; C: 54 | Gastric cancer
Song et al. (34) | 2a | Poland | 1994–1996 | Laboratory test | 200 (100) | None | 61 | 65 | Stomach cancer (ICD-O 151 or ICD-O-2 C16)
Prognostic models
Shikata et al. (40) | 1a | Japan | 1988 | Questionnaire + medical record | 2,446 (69) | Yes (delete) | 41.5 | 57.3 ± 11.4 | Gastric cancer
Eom et al. (41) | 3a | South Korea | 1996–1997 | Questionnaire + medical record |  | Yes (imputation) | Model for males and females, respectively | D: P: 45.08 ± 10.47; C: 48.7 ± 11.0. EV: P: 46.83 ± 12.80; C: 51.08 ± 12.05 | Gastric cancer (C16)
Charvat et al. (15) | 2a | Japan | 1993–1994 | Questionnaire + laboratory test | 19,028 (412) | Yes (delete) | P: 61.9; C: 35.7 | P: 63.3; C: 59.3 | Gastric cancer (C160-C169)
Ikeda et al. (38) | 1a | Japan | 1988 | Questionnaire + medical record + laboratory test | 2,446 (123) | Yes (delete) | 41.5 | 58.3 ± 11.4 | Gastric cancer
Iida et al. (14) | 3a | Japan | 1988–2002 | Questionnaire + laboratory test | D: 2,444 (90); EV: 3,204 (35) | Yes (delete) | D: 41.6; EV: 42.1 | D: 58 ± 11; EV: 62 ± 13 | Gastric cancer
Taninaga et al. (44) | 2a | Japan | 2006–2017 | Medical record | D: 1,144 (74); IV: 287 (15) | None | P: 84.2; C: 77.6 | P: 56.7 ± 8.8; C: 46.2 ± 1.0 | Gastric cancer
Charvat et al. (42) | 5a | Japan | 1990–1993 | Questionnaire + laboratory test | 1,292 (27) | None | 34.1 | 56.52 ± 5.78 | Gastric cancer (C160-C169)
Jang et al. (39) | 1a | South Korea | 1993–2004 | Questionnaire + laboratory test | 476 (238) | Yes (delete) | 41.01 | 53.50 ± 10.23 | Gastric cancer
Sarkar et al. (43) | 1a | United States | 2015–2016 | Questionnaire | 140 (40) | None | 31.4 | NI | Gastric cancer
Trivanovic et al. (45) | 1a | Croatia | NI | Laboratory test | 116 (25) | None | 60.3 | 68.34 ± 13.93 | Gastric cancer
C, control group; D, derivation; EV, external validation; IV, internal validation; NI, no information; P, patient group; V, validation.
(a) Type of study: 1a, development only; 2a, development + internal validation; 3a, development + external validation; 4a, development + internal validation + external validation; 5a, external validation only.

Regarding the prognostic models, most (60.0%) were developed in Japan. Five studies reported model development only, 1 study externally validated an existing model, and 4 studies reported both development and validation. Over half (60.0%) had missing values, possibly because of the need for long-term follow-up, and 1 study used imputation. The prognostic models did not restrict the subtype of gastric cancer. Although an uneven ratio of men to women was also seen in the prognostic models, half of the studies included sex as a predictor, and Eom et al. developed separate prediction models for each sex (41).

From the perspective of real-world practice, the diagnostic models selected in this review mainly aimed to provide a reliable tool for prescreening before large-scale endoscopic screening (12,22,23), surveillance after intervention (27,36), early diagnosis of gastric cancer, or preliminary diagnosis of symptomatic patients based on nongastroscopic predictors (35,37). The prognostic models were used to select appropriate high-risk individuals for further endoscopic examination (14,38,40) or, as a risk reminder, to promote cancer prevention (including health education, behavior change, and encouragement of screening) (15,41). However, most studies stated only a general purpose and did not clearly describe the models' target application scenarios.

Development and performance

Traditional methods, including logistic regression and Cox proportional hazards regression, were most commonly used to develop prediction models for gastric cancer (Table 2). Machine learning was also adopted in some of the included studies. The discrimination of the diagnostic models was acceptable, with AUCs ranging from 0.73 to 0.99. Different methods were used for internal validation, such as bootstrapping, random splitting, and leave-one-out cross-validation. Except for 1 study (37), the AUCs did not differ significantly between development and validation, which suggests that the selected diagnostic models might not have overfit the training data (Figure 2). In addition, performance did not decrease significantly in external validation, so the models in this review might not suffer from underfitting. However, only 3 of the 18 models reported calibration. The events per variable (EPV) values of 10 diagnostic models were over 20, but 4 models had values below 10, and their results should be interpreted with caution.
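The EPV values reported in Table 2 appear to equal the number of outcome events in the development data (Table 1) divided by the number of predictors in the final model (the "No." column of Table 3); for example, 183 events and 11 predictors for Lee et al. (23) give 16.64. The following minimal sketch reproduces that arithmetic and the EPV bands of 10 and 20 used in this review. The function names are illustrative, and dividing by the number of final predictors rather than candidate predictors is an assumption inferred from the tables.

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """EPV = outcome events / number of predictors (assumed: final predictors)."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def epv_band(epv: float) -> str:
    """Classify an EPV against the thresholds of 10 and 20 discussed in the text."""
    if epv < 10:
        return "below 10"
    if epv < 20:
        return "10 to <20"
    return "20 or more"


# (events, predictors) taken from Tables 1 and 3 of this review
examples = {
    "Lee et al. (23)": (183, 11),
    "Kaise et al. (29)": (187, 2),
    "Zhu et al. (37)": (40, 5),
}

for study, (events, predictors) in examples.items():
    epv = events_per_variable(events, predictors)
    print(f"{study}: EPV = {epv:.2f} ({epv_band(epv)})")
```

An EPV computed this way says nothing about how many candidate predictors were screened before the final model was chosen, which is one reason a high EPV alone does not rule out overfitting.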

Table 2. Key information on the development and validation of included models
Column groups: model development (EPV, type of predictor, modeling method); model evaluation (discrimination, calibration); model validation (internal validation, external validation).
Study | EPV | Type of predictor | Modeling method | Discrimination | Calibration | Internal validation | External validation
Diagnostic models
Lee et al. (23) | 16.64 | Demographic characteristics + medical history + lifestyle-related factors | Logistic regression | 0.888 | H-L test: P = 0.1747 | Bootstrap resampling technique: 0.904 (0.876–0.932) | None
Kaise et al. (29) | 93.5 | Blood measurements | Logistic regression | 0.883 (0.856–0.909) | NR | None | None
Ahn et al. (13) | 10.91 | Blood measurements | Support vector machine | 0.955 | NR | None | None
Cho et al. (27) | 79 | Demographic characteristics + disease stages | Logistic regression | 0.783 | NR | None | None
Yang et al. (36) | 26.5 | Blood measurements | Logistic regression | 0.959 (0–1) | NR | None | None
Zhu et al. (37) | 8 | Blood measurements | Logistic regression | 0.989 | NR | Random split sampling: 0.812 | None
Kucera et al. (24) | 7.2 | Blood measurements | Logistic regression | 0.9553 | NR | None | None
Tong et al. (35) | 45.6 | Blood measurements | Random forest | 0.8788 (0.8127–0.9449) | NR | Random split sampling (NR) | None
In et al. (22) | 11.25 | Demographic characteristics + lifestyle-related factors + family history + immigration/acculturation | Logistic regression | 0.941 (0.901–0.982) | H-L test: P = 0.8562 | None | None
Wang et al. (25) | 46.5 | Blood measurements | Logistic regression | 0.841 (0.808–0.871) | NR | None | Wang et al.: 0.856 (0.812–0.893)
Cai et al. (12) | 38.14 | Demographic characteristics + lifestyle-related factors + blood measurements | Logistic regression | 0.76 (0.73–0.79) | H-L test: P = 0.605; calibration in the large: P < 0.001 | Bootstrap resampling technique: 0.76 (0.71–0.80) | Cai et al.: 0.73 (0.68–0.77)
Dong et al. (28) | 59.5 | Blood measurements | Logistic regression | 0.821 (0.750–0.878) | NR | None | None
In et al. (26) | 5 | Demographic characteristics + lifestyle-related factors + family history + immigration/acculturation | Logistic regression | 0.95 (0.92–0.98) | NR | None | None
Kong et al. (31) | 118.5 | Lifestyle-related factors + results from genomics | Logistic regression | 0.745 | NR | None | None
Liu et al. (33) | 37.5 | Results from genomics | Lasso logistic regression | 0.986 | NR | None | None
Kim et al. (30) | 5.75 | Results from proteomics | Generalized linear models + random forest | 0.9098 | NR | Random split sampling: 0.9706 | None
Lee et al. (32) | 10.8 | Results from transcriptomics | Logistic regression | 0.924 (0.845–0.970) | NR | Bootstrap resampling technique: 0.896 (0.894–0.898) | Lee et al.: 0.988 (0.916–1.000); Bootstrap: 0.947 (0.946–0.949)
Song et al. (34) | 25 | Results from immunoproteomics | Lasso logistic regression | 0.73 | NR | Leave-one-out cross validation (NR) | None
Prognostic models
Shikata et al. (40) | 5.75 | Demographic characteristics + medical history + lifestyle-related factors + health examination results | Cox proportional hazards model | 0.809 (0.761–0.856) | NR | None | None
Eom et al. (41) | Men: 2433.13; women: 929.83 | Demographic characteristics + family history + lifestyle-related factors | Cox proportional hazards model | Men: 0.764 (0.760–0.768); women: 0.706 (0.698–0.715) | Calibration plot and slope: men: 1.000 (0.983–1.017); women: 1.000 (0.962–1.038) | None | Eom et al.: men: 0.782 (0.777–0.787); women: 0.705 (0.696–0.714)
Charvat et al. (15) | 68.67 | Demographic characteristics + family history + lifestyle-related factors + blood measurements | Cox proportional hazards model | 0.777 | H-L test: P = 0.06; and calibration plot | Bootstrap resampling technique: 0.768 | Charvat et al.: 0.798 (0.725–0.861)
Ikeda et al. (38) | 12.3 | Demographic characteristics + lifestyle-related factors + health examination results + blood measurements | Cox proportional hazards model | 0.773 | NR | None | None
Iida et al. (14) | 18 | Demographic characteristics + lifestyle-related factors + blood measurements | Cox proportional hazards model | 0.79 (0.74–0.83) | H-L test: P = 0.31 | None | Iida et al.: 0.76 (0.69–0.83)
Taninaga et al. (44) | 9.25 | Health examination results | XGBoost | 0.899 | NR | Cross validation: 0.874 | None
Charvat et al. (42) | 4.6 | Demographic characteristics + family history + lifestyle-related factors + blood measurements | Parametric survival regression model | 0.798 (0.725–0.861) | Nam-D'Agostino χ2 test: χ2 = 5.57, P = 0.23 | / | /
Jang et al. (39) | 47.6 | Demographic characteristics + lifestyle-related factors + blood measurements | Logistic regression | 0.71 (0.64–0.78) | NR | None | None
Sarkar et al. (43) | 10 | Demographic characteristics | Logistic regression | 0.859 (0.796–0.922) | NR | None | None
Trivanovic et al. (45) | 12.5 | Blood measurements | Logistic regression | 0.700 (0.57–0.83) | NR | None | None
EPV, events per variable; H-L test, Hosmer-Lemeshow test; NR, not reported.

Figure 2. The c-statistics reported by the included models. The models are grouped into prognostic and diagnostic models: the upper part shows the values reported for prognostic models, and the lower part shows the values for diagnostic models. The type of evaluation (development, internal validation, or external validation) and the internal validation method are indicated in the figure.
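For readers less familiar with the c-statistic, the following minimal sketch shows how it can be computed for a binary outcome and how a percentile bootstrap interval of the kind shown in parentheses in Table 2 can be obtained. It uses only the Python standard library; the toy scores and labels are hypothetical, and the included studies may have used other interval methods.

```python
import random


def c_statistic(scores, labels):
    """Share of case/non-case pairs in which the case scores higher (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the c-statistic."""
    if len(set(labels)) < 2:
        raise ValueError("need at least one case and one non-case")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(c_statistic([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical predicted risks and outcomes (1 = gastric cancer, 0 = no cancer)
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_labels = [0, 0, 1, 0, 1, 0, 1, 1]

auc = c_statistic(toy_scores, toy_labels)  # 13 of 16 pairs are concordant: 0.8125
low, high = bootstrap_ci(toy_scores, toy_labels)
print(f"c-statistic = {auc:.4f}, 95% bootstrap interval = ({low:.3f}, {high:.3f})")
```

This interval describes sampling uncertainty only. It is not an internal validation, which would require refitting the model in each resample and estimating the optimism of the apparent c-statistic.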

In general, the performance of the prognostic models was inferior to that of the diagnostic models, with AUCs ranging from 0.66 to 0.86. The model based on machine learning showed better discrimination. For the externally validated models, the validation AUCs were close to those of the original models and the EPVs were high, suggesting that these models were more reliable. However, half of the prognostic models had EPVs below 20 and were neither internally nor externally validated, so they need further verification and optimization. In addition, follow-up time varied among the included models, ranging from 3 to 20 years.

Considered variables of the prediction models

Laboratory indicators were most commonly considered in the diagnostic models, mainly routine tests for H. pylori infection and pepsinogen as well as molecular-level measurements of proteins, genes, microRNAs, and hormones (Table 3). Specifically, a variety of proteins were used to predict the risk of gastric cancer, including typical oncofetal proteins (CEA, CA125, and CA19-9), antibodies, and proteins involved in physiological processes (metabolism, blood coagulation, chemotaxis, and other cytokine-mediated functions). Personal characteristics were also frequently used in diagnosing suspected individuals, mainly sociodemographic variables (age and sex), lifestyle-related factors (dietary habits, alcohol intake, and smoking), and health conditions.

Table 3. - Predictors included in the risk prediction models for gastric cancer
Study No. Demographic characteristics Health situation Lifestyle-related factors Laboratory measurement
Age Sex Others Family history Disease history BMI Others Smoking Alcohol drinking Eating habit Others H. pylori infection PG testing Others
Diagnostic models
 Lee et al. (23) 11 Financial status History of gastroscopy or UGI series; health status Occupational hazards
 Kaise et al. (29) 2 Gene: TFF3
 Ahn et al. (13) 11 Protein: EGFR, proApoA1, ApoA1, TTR, DD, A2M, CRP, RANTES, IL-6, VN, and PAI-1
 Cho et al. (27) 6 OLGIM stage
 Yang et al. (36) 4 Oncofetal protein: CA72-4, CA125, CA19-9, and CEA
 Zhu et al. (37) 5 miRNA: miR-16, miR-25, miR-92a, miR-451, miR-486-5p
 Kucera et al. (24) 5 Oncofetal protein: CA72-4 and CEA; enzyme: MMP7
 Tong et al. (35) 5 Protein: ADAM8 (CD156); VEGF
 In et al. (22) 8 Race, education, US generation, acculturation Cultural food consumption frequency
 Wang et al. (25) 6 Autoantibody against TAAs: p62, c-Myc, NPM1, 14-3-3ξ, MDM2, and p16
 Cai et al. (12) 7 Pickled food and fried food Hormone: G-17
 Dong et al. (28) 2 Oncofetal protein: CEA; mRNA: MT1-MMP mRNA
 In et al. (26) 8 Race, education, and US generation Cultural food consumption frequency and salt intakes
 Kong et al. (31) 4 Pickled food, tea drinking SNP: MEG3 gene polymorphism (rs7158663)
 Liu et al. (33) 10 m6A gene: METTL14, METTL16, WTAP, KIAA1429, ZC3H13, RBM15, ALKBH5, YTHDF1, YTHDF2, and YTHDC1
 Kim et al. (30) 12 Oncofetal protein: CA125, CA19-9, AFP, and tPSA; CEA; other protein: ApoA1, ApoA2, β2M, CRP, TTR, CYFRA21-1, and HE4
 Lee et al. (32) 5 Gene: HBB, KRT7, UBD, PLA2G2A, and ISG15
 Song et al. (34) 4 Autoantibody: anti-Ggt, anti-HslU, anti-NapA, and anti-CagA
Prognostic models
 Shikata et al. (40) 12 Diabetes Physical activity Total cholesterol
 Eom et al. (41) Men 8 Regularity of eating and salt intakes Physical activity
 Eom et al. (41) Women 6 Salt intakes
 Charvat et al. (15) 6 Salt intakes •△ •△
 Ikeda et al. (38) 10 Salt intakes, total energy HbA1C and total cholesterol
 Iida et al. (14) 5 •△ •△ HbA1C
 Taninaga et al. (44) 8 Postgastrectomy HbA1c, MCV, and lymphocyte ratio
 Jang et al. (39) 5 Gene: CagA; CagA-related GRS; factor: HGF
 Sarkar et al. (43) 4 Race and education
 Trivanovic et al. (45) 2
A2M, α-2 macroglobulin; ADAM8, A disintegrin and metalloproteinase domain-containing protein 8; Apo, apolipoprotein; CA, cancer antigen; CEA, carcinoembryonic antigen; CRP, C-reactive protein; DD, D-dimer; EGFR, epidermal growth factor receptor; HbA1c, hemoglobin (Hb)A1c; HGF, hepatocyte growth factor; IL, interleukin; MCV, mean corpuscular volume; MDM2, mouse double minute 2; NPM1, nucleophosmin 1; PAI-1, plasminogen activator inhibitor-1; PG, pepsinogen; ProApo, pro-apolipoprotein; RANTES, regulated upon activation, normally T-expressed and presumably secreted; TAA, tumor-associated antigen; TFF3, trefoil factor 3; TTR, transthyretin; VEGF, vascular endothelial growth factor; VN, vitronectin.
• represents a single variable; •△ represents a combined predictor.

By contrast, each prognostic model tended to be developed from multiple predictors. The most frequently considered variable was age, followed by smoking, BMI, and family history of gastric cancer. Salt intake was included in 3 models, and alcohol consumption and physical activity were also adopted because of their possible associations with gastric cancer. Moreover, the laboratory indicators used were more convenient to test, such as the levels of glycosylated hemoglobin and total cholesterol. In 2 models, the results of H. pylori infection and pepsinogen testing were integrated into 1 predictor.

ROB and applicability

According to the PROBAST evaluation, all the included models were assessed as having high ROB, which was attributable mainly to the statistical analysis domain (Figure 3). Specifically, most diagnostic models did not enroll sufficient samples, selected candidate variables by univariate analysis, or did not report calibration. For example, age and sex are predictors of gastric cancer that can achieve nontrivial risk discrimination when applied alone, whereas some studies (12,28,31–34,43) that took a data-driven approach to candidate variable selection were likely to drop such essential predictors. For the prognostic models, limited sample sizes, failure to analyze all recruited participants, and incomplete model evaluation were the main contributors to bias.

Figure 3. Assessment of risk of bias and applicability for the 28 prediction models of gastric cancer based on the Prediction model Risk of Bias Assessment Tool. (a) Proportion of diagnostic models rated as high, low, or unclear for risk of bias. (b) Proportion of diagnostic models rated as high, low, or unclear for applicability. (c) Proportion of prognostic models rated as high, low, or unclear for risk of bias. (d) Proportion of prognostic models rated as high, low, or unclear for applicability.

Among all the included models, 75% did not assess calibration, so their reliability remained unclear. Of the models that did assess calibration, over 70% relied on the Hosmer-Lemeshow test, which does not indicate the existence or magnitude of miscalibration. Although three-quarters of the models met the traditional rule of thumb of EPV ≥10, fewer than half reached an EPV of 20. Limitations in study design were another main source of bias. Several models in this review were developed from case-control studies (43–45) in which the sampled population was not reweighted to the source population, raising concerns about bias in the baseline risk estimate, or from routine care registries with variable data quality. In addition, neither type of model adequately handled complexities in the data, for instance competing risks and censoring.
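To make the point about the Hosmer-Lemeshow test concrete, the following sketch contrasts its chi-square statistic with two simple summaries that do convey direction and size: the observed-to-expected (O/E) ratio and the difference between the mean observed and mean predicted risk. The latter is only a crude stand-in for calibration-in-the-large, which is formally the intercept of a recalibration model. The data are hypothetical, and the function name is illustrative.

```python
def calibration_summary(predicted, observed, n_groups=10):
    """Return (O/E ratio, mean observed minus mean predicted, Hosmer-Lemeshow chi-square)."""
    n = len(predicted)
    if n == 0 or n != len(observed):
        raise ValueError("predicted and observed must be non-empty and of equal length")
    expected_total = sum(predicted)
    observed_total = sum(observed)
    oe_ratio = observed_total / expected_total
    mean_difference = observed_total / n - expected_total / n

    # Hosmer-Lemeshow: group by predicted risk, then sum (O - E)^2 / (E * (1 - E / m))
    order = sorted(range(n), key=lambda i: predicted[i])
    hl_chi_square = 0.0
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not members:
            continue
        m = len(members)
        e = sum(predicted[i] for i in members)
        o = sum(observed[i] for i in members)
        if 0 < e < m:
            hl_chi_square += (o - e) ** 2 / (e * (1 - e / m))
    return oe_ratio, mean_difference, hl_chi_square


# Hypothetical model that predicts twice the risk actually observed
pred = [0.2] * 50 + [0.4] * 50
obs = [1] * 5 + [0] * 45 + [1] * 10 + [0] * 40

oe, diff, hl = calibration_summary(pred, obs)
print(f"O/E = {oe:.2f}, mean observed - mean predicted = {diff:.2f}, H-L chi-square = {hl:.2f}")
```

In this toy example the O/E ratio of 0.5 shows directly that risk is overestimated by a factor of 2, whereas the chi-square value alone would only signal that some misfit exists.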

Regarding applicability, 11 of the 18 diagnostic models were judged to be applicable. The low applicability of the remaining models may be explained by (i) selection bias introduced by a poor match between the intended and actual participants (such as enrollment of college students or exclusion of individuals with inflammation); (ii) poor accessibility of predictor measurements; and (iii) a restricted primary outcome (e.g., focusing only on stage I gastric cancer). Only 1 prognostic model was assessed as having low applicability; the others were favored by convenient predictor measurement, lower sample requirements, and better representativeness of the data. In addition, nearly one-third of the models did not clearly state their inclusion criteria, which limits the evaluation of applicability and further verification of the models.


To the best of our knowledge, this is the first review to systematically summarize and evaluate prediction models for gastric cancer, with the aims of assisting model users in selecting a model and calling for improvements in future model development. The background, design, structure, and performance of 18 diagnostic and 10 prognostic models were mapped in detail. We also summarized the current state of gastric cancer prediction tools and their common problems. Furthermore, we underlined the importance of specifically reporting potential application scenarios, which was frequently ignored, and classified the models by application scenario.

Common pitfalls and perspectives

Although 28 prediction models for gastric cancer were identified, few can be expected to translate into real-world practice because of high ROB, the difficulty of balancing accuracy and applicability, and unclear reporting of application scenarios (Figure 4). Future studies should pay more attention to these 3 aspects.

Figure 4. Common limitations of the included models and suggestions for application from code to bedside.

All included gastric cancer prediction models were identified as having high ROB. Similar findings have often been reported for prediction models of other diseases (46,47). The low quality of risk prediction models is associated with the post hoc nature of the analysis in most studies. We have to acknowledge that a model was frequently created simply from the available data and statistical tools to satisfy researchers' aim of publishing articles rather than to affect practice (48). Researchers should take responsibility for the quality and clinical implications of their models. In our study, the leading source of bias in the included models was data analysis, including limited sample size, inappropriate predictor selection, and lack of validation or calibration assessment. Therefore, we recommend that researchers use relevant standard tools, such as PROBAST, during model development. Moreover, the selected predictors overlapped considerably across models, yet external validation was lacking. Rather than mass-producing new prediction models, verifying or optimizing existing models is likely to better accelerate the translation from code to public health practice (49,50). Transregional cooperation would help to control ROB by enlarging sample sizes and validating models in large independent populations. However, data homogeneity and comparability between groups must also be ensured.


Prediction models are expected to perform better when they combine laboratory-related and questionnaire-based variables, given that cancer is caused by both intrinsic characteristics and the external environment (51). At this stage, researchers have tended to explore predictors of gastric cancer at the molecular level, which may increase costs and inevitably limit data accessibility when models are applied in clinical practice. In such circumstances, well-designed questionnaires are expected to describe individuals' behavioral patterns accurately, rather than simply classifying them into 2 categories (yes or no). In addition, given that immunological predictors have been widely adopted because of their good acceptability and convenient measurement, combined indicators are expected to have better application value for mass screening or in regions with limited health resources. For example, the combination of H. pylori and PG was recommended as the ABC method in Japan (52) and was included in subsequent models (14,15). By contrast, other indicators with high predictive performance but lower accessibility, such as oncofetal proteins, genetic predictors, and multiomics data, should be considered for early diagnosis or individualized screening for gastric cancer.
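The combined H. pylori and PG predictor mentioned above can be illustrated with a short sketch of the ABC grouping as it is commonly described: group A (both tests negative), group B (H. pylori positive only), group C (both positive), and group D (PG positive only). The sketch assumes that each test has already been dichotomized by the caller; the cutoffs that define a positive result are not given in this review and are deliberately left out, and the function name is illustrative.

```python
def abc_group(hp_positive: bool, pg_positive: bool) -> str:
    """Map dichotomized H. pylori and pepsinogen results to an ABC group."""
    if not hp_positive and not pg_positive:
        return "A"  # neither test positive
    if hp_positive and not pg_positive:
        return "B"  # H. pylori positive only
    if hp_positive and pg_positive:
        return "C"  # both tests positive
    return "D"      # pepsinogen positive only


# Enumerate all four combinations of the two test results
for hp in (False, True):
    for pg in (False, True):
        print(f"H. pylori positive={hp}, PG positive={pg} -> group {abc_group(hp, pg)}")
```

Encoding two tests as one four-level predictor, as in the 2 prognostic models that integrated them, keeps measurement simple while retaining the interaction between infection status and PG status.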

Machine learning brings new ideas and challenges to prediction modeling. Compared with traditional modeling methods, machine learning has been reported to better capture the complex and unpredictable nature of human physiology and has therefore been gradually applied in medicine (53–55). However, the limited transparency of such models is also a concern (56). When such models are adopted for clinical screening, model developers must carefully consider how to present the risk calculation process, so as to ensure individuals' right to know and to allow possible errors to be corrected in time.

Application scenario

The application-oriented nature of prediction models should be further emphasized as the number of gastric cancer prediction models grows rapidly. Cardia and noncardia cancers are subtypes of gastric cancer that differ in epidemic trends, risk factors, and prognosis (57,58). Almost all prediction models in this review defined both types of gastric cancer as the target outcome. Nevertheless, some studies suggest that developing separate models for the 2 types may be more effective. For example, one study on noncardia cancer showed superior diagnostic performance, with an AUC of 0.989 (37). Considering the worse prognosis and increasing incidence of cardia cancer (59–61), developing risk prediction models that focus on cardia cancer is a practical direction to explore in the future. At the same time, the corresponding workload, the necessity of screening, and practical applicability also matter in real-world, large-scale screening practice.

The models in this review had different follow-up periods and targeted various application scenarios, which should also be considered before translation to clinical practice. For example, In et al. developed a questionnaire-based diagnostic model for both community and healthcare settings to identify high-risk individuals immediately before screening endoscopy (22), whereas Charvat et al. provided an algorithm for predicting the 10-year probability of gastric cancer occurrence (15), aiming to assist clinicians in health education on cancer prevention. However, most studies stated only a general purpose (29,36,42) and failed to clearly describe the models' target application scenarios, which creates barriers to real-world practice. Unfortunately, the academic community has been unduly tolerant of unclear reporting of prediction models' real-world application scenarios. Neither the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement nor PROBAST includes the application scenario as a requirement (21,62). Moreover, only a few studies in this review fully presented the key information needed for risk prediction, including parameter calculation and stratification criteria (31–33,37,43–45). Complete reporting of predictive algorithms is fundamental to promoting the translation of prediction models from code to public health practice and deserves more attention (63).
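
To illustrate why complete reporting matters, the sketch below shows the information a reader needs to reproduce an absolute risk from a proportional hazards-type model: the coefficients, the baseline survival at the prediction horizon, and the stratification cutoffs. Every number and variable name here is hypothetical and chosen only for illustration; none is taken from the models included in this review, and the included models may use other model forms.

```python
import math

# Hypothetical values for illustration only; not from any published model.
BASELINE_SURVIVAL_10Y = 0.98  # S0(10): 10-year survival at reference covariates
COEFFICIENTS = {"age_per_10y": 0.50, "male": 0.70, "hp_positive": 1.10}
RISK_CUTOFFS = [(0.02, "low"), (0.05, "intermediate")]  # upper bounds; else "high"


def ten_year_risk(covariates: dict) -> float:
    """Absolute 10-year risk: 1 - S0(10) ** exp(linear predictor)."""
    linear_predictor = sum(
        COEFFICIENTS[name] * value for name, value in covariates.items()
    )
    return 1.0 - BASELINE_SURVIVAL_10Y ** math.exp(linear_predictor)


def risk_stratum(risk: float) -> str:
    """Map an absolute risk to a stratum using the reported cutoffs."""
    for upper_bound, label in RISK_CUTOFFS:
        if risk < upper_bound:
            return label
    return "high"


person = {"age_per_10y": 0.0, "male": 1.0, "hp_positive": 1.0}
risk = ten_year_risk(person)
print(f"{risk:.3f}", risk_stratum(risk))
```

If any of the three ingredients is missing from a publication (for example, the baseline survival), the linear predictor can still rank individuals, but the absolute risk and the risk strata cannot be reproduced by other users.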

Strengths and limitations

This review systematically searched and summarized risk prediction models for gastric cancer, and the supplementary search of references and citations helped ensure comprehensive retrieval. Researchers seeking a suitable model can balance model performance, data accessibility, and their own purpose to make a final decision. Although no single best model for gastric cancer could be identified, researchers can also validate an existing model in populations with temporal or regional differences or optimize it with new predictors, which requires less data and yields more robust predictions than developing new models. In addition, we listed the predictors considered for each prediction model and summarized the commonly used predictors, which could provide references for identifying high-risk individuals. We also summarized the most common methodological pitfalls across the whole process of model development, aiming to alert readers to potential bias and provide suggestions for future steps.

This work also has some limitations. This study did not include related models for predicting precancerous lesions of gastric cancer, which also contribute to cancer control. However, the number of such studies is still relatively limited. Besides, studies using other classification measures (e.g., sensitivity, predictive values) were excluded, given that the preset probability thresholds might not be clinically relevant and that such measures do not use the entire range of model-predicted probabilities.
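
The difference between threshold-based measures and a measure that uses the whole range of predicted probabilities can be shown with a small sketch. The predicted risks and outcomes below are made-up illustration data, and the function names are hypothetical. The AUC is computed as the proportion of case and noncase pairs in which the case receives the higher predicted risk, with ties counted as one half.

```python
def auc(scores, labels):
    """Concordance (AUC): share of case/noncase pairs ranked correctly."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    concordant = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in noncases
    )
    return concordant / (len(cases) * len(noncases))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when risk >= threshold is test-positive."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    sensitivity = sum(s >= threshold for s in cases) / len(cases)
    specificity = sum(s < threshold for s in noncases) / len(noncases)
    return sensitivity, specificity


# Made-up predicted risks and outcomes (1 = gastric cancer, 0 = no cancer).
predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
observed = [0, 0, 1, 0, 1, 1]

print(auc(predicted, observed))
for cutoff in (0.25, 0.5):
    print(cutoff, sensitivity_specificity(predicted, observed, cutoff))
```

In this toy data set the AUC is a single value (8/9), whereas sensitivity and specificity change from (1, 2/3) at a cutoff of 0.25 to (2/3, 1) at a cutoff of 0.5, so results reported at different preset thresholds are not directly comparable across studies.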


All gastric cancer prediction models included in this review were assessed as having high ROB, mainly because of inappropriate statistical analysis and incomplete model evaluation. Most models had acceptable applicability; the leading limitations were inconvenient measurement, high sample requirements, and limited data representativeness. In addition, future prediction models for gastric cancer urgently need to state their application scenarios specifically to provide references for interest groups.


Guarantor of the article: Wanqing Chen, PhD.

Specific author contributions: S.H. and D.S.: conceptualization, data curation, methodology, formal analysis, roles/writing—original draft, and writing—review and editing. H.L., M.C., and X.Y.: supervision, writing—review and editing. L.L. and J.P.: funding acquisition and supervision. J.L. and N.L.: methodology and writing—review and editing. W.C.: conceptualization, funding acquisition, methodology, supervision, and writing—review and editing. All authors have agreed on the journal to which the article will be submitted, given final approval of the version to be published, and want to declare that the work has not been published previously elsewhere.

Financial support: This work was supported by the National Key R&D Program of China (grant number: 2018YFC1313100); the Cooperation Project in Beijing, Tianjin, and Hebei of China (grant number: J200017); the Special Fund for Health Research in the Public Interest (grant number: 201502001); the Major State Basic Innovation Program of the Chinese Academy of Medical Sciences (grant numbers: 2016-I2 M-2-004 and 2019-I2 M-2-004); and the Sanming Project of Medicine in Shenzhen (grant number: SZSM201911015).

Potential competing interests: None to report.


The authors thank Professor Jinhui Tian from Lanzhou University for his assistance in guiding the search.


1. Ferlay J, Ervik M, Lam F, et al. Global Cancer Observatory: Cancer Today. International Agency for Research on Cancer: Lyon, France, 2020.
2. Sun F, Sun H, Mo X, et al. Increased survival rates in gastric cancer, with a narrowing gender gap and widening socioeconomic status gap: A period analysis from 1984 to 2013. J Gastroenterol Hepatol 2018;33(4):837–46.
3. Everett SM, Axon ATR. Early gastric cancer in Europe. Gut 1997;41(2):142–50.
4. Mizoue T, Yoshimura T, Tokui N, et al. Prospective study of screening for stomach cancer in Japan. Int J Cancer 2003;106(1):103–7.
5. Choi KS, Jun JK, Lee HY, et al. Performance of gastric cancer screening by endoscopy testing through the National Cancer Screening Program of Korea. Cancer Sci 2011;102(8):1559–64.
6. Huang HL, Leung CY, Saito E, et al. Effect and cost-effectiveness of national gastric cancer screening in Japan: A microsimulation modeling study. BMC Med 2020;18(1):257.
7. Suh YS, Lee J, Woo H, et al. National cancer screening program for gastric cancer in Korea: Nationwide treatment benefit and cost. Cancer 2020;126(9):1929–39.
8. Fan X, Qin X, Zhang Y, et al. Screening for gastric cancer in China: Advances, challenges and visions. Chin J Cancer Res 2021;33(2):168–80.
9. Hull MA, Rees CJ, Sharp L, et al. A risk-stratified approach to colorectal cancer prevention and diagnosis. Nat Rev Gastroenterol Hepatol 2020;17(12):773–80.
10. Ford AC, Yuan Y, Moayyedi P. Helicobacter pylori eradication therapy to prevent gastric cancer: Systematic review and meta-analysis. Gut 2020;69(12):2113–21.
11. O'Connor A, O'Morain CA, Ford AC. Population screening and treatment of Helicobacter pylori infection. Nat Rev Gastroenterol Hepatol 2017;14(4):230–40.
12. Cai Q, Zhu C, Yuan Y, et al. Development and validation of a prediction rule for estimating gastric cancer risk in the Chinese high-risk population: A nationwide multicentre study. Gut 2019;68(9):1576–87.
13. Ahn HS, Shin YS, Park PJ, et al. Serum biomarker panels for the diagnosis of gastric adenocarcinoma. Br J Cancer 2012;106(4):733–9.
14. Iida M, Ikeda F, Hata J, et al. Development and validation of a risk assessment tool for gastric cancer in a general Japanese population. Gastric Cancer 2018;21(3):383–90.
15. Charvat H, Sasazuki S, Inoue M, et al. Prediction of the 10-year probability of gastric cancer occurrence in the Japanese population: The JPHC study cohort II. Int J Cancer 2016;138(2):320–31.
16. Peng L, Weigl K, Boakye D, et al. Risk scores for predicting advanced colorectal neoplasia in the average-risk population: A systematic review and meta-analysis. Am J Gastroenterol 2018;113(12):1788–800.
17. McGeoch L, Saunders CL, Griffin SJ, et al. Risk prediction models for colorectal cancer incorporating common genetic variants: A systematic review. Cancer Epidemiol Biomarkers Prev 2019;28(10):1580–93.
18. Louro J, Posso M, Hilton Boon M, et al. A systematic review and quality assessment of individualised breast cancer risk prediction models. Br J Cancer 2019;121(1):76–85.
19. Toumazis I, Bastani M, Han SS, et al. Risk-based lung cancer screening: A systematic review. Lung Cancer 2020;147:154–86.
20. Moons KGM, de Groot JAH, Bouwmeester W, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: The CHARMS checklist. PLoS Med 2014;11(10):e1001744.
21. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: A tool to assess risk of bias and applicability of prediction model studies: Explanation and elaboration. Ann Intern Med 2019;170(1):W1–W33.
22. In H, Langdon-Embry M, Gordon L, et al. Can a gastric cancer risk survey identify high-risk patients for endoscopic screening? A pilot study. J Surg Res 2018;227:246–56.
23. Lee DS, Yang HK, Kim JW, et al. Identifying the risk factors through the development of a predictive model for gastric cancer in South Korea. Cancer Nurs 2009;32(2):135–42.
24. Kucera R, Smid D, Topolcan O, et al. Searching for new biomarkers and the use of multivariate analysis in gastric cancer diagnostics. Anticancer Res 2016;36(4):1967–71.
25. Wang S, Qin J, Ye H, et al. Using a panel of multiple tumor-associated antigens to enhance autoantibody detection for immunodiagnosis of gastric cancer. Oncoimmunology 2018;7(8):e1452582.
26. In H, Solsky I, Castle PE, et al. Utilizing cultural and ethnic variables in screening models to identify individuals at high risk for gastric cancer: A pilot study. Cancer Prev Res (Phila) 2020;13(8):687–98.
27. Cho SJ, Choi IJ, Kook MC, et al. Staging of intestinal- and diffuse-type gastric cancers with the OLGA and OLGIM staging systems. Aliment Pharmacol Ther 2013;38(10):1292–302.
28. Dong Z, Sun X, Xu J, et al. Serum membrane type 1-matrix metalloproteinase (MT1-MMP) mRNA protected by exosomes as a potential biomarker for gastric cancer. Med Sci Monit 2019;25:7770–83.
29. Kaise M, Miwa J, Tashiro J, et al. The combination of serum trefoil factor 3 and pepsinogen testing is a valid non-endoscopic biomarker for predicting the presence of gastric cancer: A new marker for gastric cancer risk. J Gastroenterol 2011;46(6):736–45.
30. Kim YS, Kang KN, Shin YS, et al. Diagnostic value of combining tumor and inflammatory biomarkers in detecting common cancers in Korea. Clin Chim Acta 2021;516:169–78.
31. Kong X, Yang S, Liu C, et al. Relationship between MEG3 gene polymorphism and risk of gastric cancer in Chinese population with high incidence of gastric cancer. Biosci Rep 2020;40(11):BSR20200305.
32. Lee IS, Ahn J, Kim K, et al. A blood-based transcriptomic signature for noninvasive diagnosis of gastric cancer. Br J Cancer 2021;125(6):846–53.
33. Liu T, Yang S, Cheng YP, et al. The N6-methyladenosine (m6A) methylation gene YTHDF1 reveals a potential diagnostic role for gastric cancer. Cancer Manag Res 2020;12:11953–64.
34. Song L, Song M, Rabkin CS, et al. Helicobacter pylori immunoproteomic profiles in gastric cancer. J Proteome Res 2021;20(1):409–19.
35. He L, Ye F, Tong W, et al. Serum biomarker panels for diagnosis of gastric cancer. Onco Targets Ther 2016;9:2455–63.
36. Yang AP, Liu J, Lei HY, et al. CA72-4 combined with CEA, CA125 and CAl9-9 improves the sensitivity for the early diagnosis of gastric cancer. Clin Chim Acta 2014;437:183–6.
37. Zhu C, Ren C, Han J, et al. A five-microRNA panel in plasma was identified as potential biomarker for early detection of gastric cancer. Br J Cancer 2014;110(9):2291–9.
38. Ikeda F, Shikata K, Hata J, et al. Combination of Helicobacter pylori antibody and serum pepsinogen as a good predictive tool of gastric cancer incidence: 20-year prospective data from the Hisayama Study. J Epidemiol 2016;26(12):629–36.
39. Jang J, Ma SH, Ko KP, et al. Hepatocyte growth factor in blood and gastric cancer risk: A nested case-control study. Cancer Epidemiol Biomarkers Prev 2020;29(2):470–6.
40. Shikata K, Ninomiya T, Yonemoto K, et al. Optimal cutoff value of the serum pepsinogen level for prediction of gastric cancer incidence: The Hisayama Study. Scand J Gastroenterol 2012;47(6):669–75.
41. Eom BW, Joo J, Kim S, et al. Prediction model for gastric cancer incidence in Korean population. PLoS One 2015;10(7):e0132613.
42. Charvat H, Shimazu T, Inoue M, et al. Estimation of the performance of a risk prediction model for gastric cancer occurrence in Japan: Evidence from a small external population. Cancer Epidemiol 2020;67:101766.
43. Sarkar S, Dauer MJ, In H. Socioeconomic disparities in gastric cancer and identification of a single SES variable for predicting risk. J Gastrointest Cancer 2021;53(1):170–8.
44. Taninaga J, Nishiyama Y, Fujibayashi K, et al. Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study. Sci Rep 2019;9(1):12384.
45. Trivanovic D, Plestina S, Honovic L, et al. Gastric cancer detection using the serum pepsinogen test method. Tumori 2021;108(4):386–91.
46. Bellou V, Belbasis L, Konstantinidis AK, et al. Prognostic models for outcome prediction in patients with chronic obstructive pulmonary disease: Systematic review and critical appraisal. BMJ 2019;367:l5358.
47. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of COVID-19: Systematic review and critical appraisal. BMJ 2020;369:m1328.
48. Vickers AJ, Cronin AM. Everything you always wanted to know about evaluating prediction models (but were too afraid to ask). Urology 2010;76(6):1298–301.
49. Adibi A, Sadatsafavi M, Ioannidis JPA. Validation and utility testing of clinical prediction models: Time to change the approach. JAMA 2020;324(3):235–6.
50. Janssen KJ, Moons KG, Kalkman CJ, et al. Updating methods improved the performance of a clinical prediction model in new patients. J Clin Epidemiol 2008;61(1):76–86.
51. Kachuri L, Graff RE, Smith-Byrne K, et al. Pan-cancer analysis demonstrates that integrating polygenic risk scores with modifiable risk factors improves risk prediction. Nat Commun 2020;11(1):6084.
52. Inoue K, Fujisawa T, Haruma K. Assessment of degree of health of the stomach by concomitant measurement of serum pepsinogen and serum Helicobacter pylori antibodies. Int J Biol Markers 2010;25(4):207–12.
53. Xie X, Niu J, Liu X, et al. A survey on incorporating domain knowledge into deep learning for medical image analysis. Med Image Anal 2021;69:101985.
54. Budd S, Robinson EC, Kainz B. A survey on active learning and human-in-the-loop deep learning for medical image analysis. Med Image Anal 2021;71:102062.
55. Heo J, Yoon JG, Park H, et al. Machine learning-based model for prediction of outcomes in acute stroke. Stroke 2019;50(5):1263–5.
56. Murdoch WJ, Singh C, Kumbier K, et al. Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci U S A 2019;116(44):22071–80.
57. Islami F, DeSantis CE, Jemal A. Incidence trends of esophageal and gastric cancer subtypes by race, ethnicity, and age in the United States, 1997-2014. Clin Gastroenterol Hepatol 2019;17(3):429–39.
58. Colquhoun A, Hannah H, Corriveau A, et al. Gastric cancer in Northern Canadian populations: A focus on cardia and non-cardia subsites. Cancers (Basel) 2019;11(4):534.
59. Petrelli F, Ghidini M, Barni S, et al. Prognostic role of primary tumor location in non-metastatic gastric cancer: A systematic review and meta-analysis of 50 studies. Ann Surg Oncol 2017;24(9):2655–68.
60. Wang Z, Graham DY, Khan A, et al. Incidence of gastric cancer in the USA during 1999 to 2013: A 50-state analysis. Int J Epidemiol 2018;47(3):966–75.
61. Wang X, Liu F, Li Y, et al. Comparison on clinicopathological features, treatments and prognosis between proximal gastric cancer and distal gastric cancer: A national cancer data base analysis. J Cancer 2019;10(14):3145–53.
62. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMJ 2015;350(4):g7594.
63. Waters EA, Taber JM, McQueen A, et al. Translating cancer risk prediction models into personalized cancer risk assessment tools: Stumbling blocks and strategies for success. Cancer Epidemiol Biomarkers Prev 2020;29(12):2389–94.

© 2022 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of The American College of Gastroenterology