Journal Logo

Research Article: Clinical Trial/Experimental Study

An immune-related gene signature predicts prognosis of gastric cancer

Jiang, Bitao MDa; Sun, Qingsen MDb; Tong, Yao BSc; Wang, Yuzhuo MDd; Ma, Haifen MDe; Xia, Xuefei MDe; Zhou, Yu MDe; Zhang, Xingguo MDe,∗; Gao, Feng MDf,g,∗; Shu, Peng MDe,∗

Editor(s): Shibaguchi., Hirotomo

Author Information
doi: 10.1097/MD.0000000000016273


1 Introduction

Gastric cancer (GC) is the 4th most common malignancy worldwide.[1] In Eastern Asia, GC remains a leading cause of cancer mortality and severe public health problems.[2] Most of the patients with GC are diagnosed at advanced stages and have a high risk of death due to tumor recurrence and respond poorly to subsequent chemotherapy.[3] Nowadays, invasive procedures, like enhanced computed tomography (CT) and endoscopy, remain the gold standard for confirming and staging GC. However, these procedures are expensive and difficult to expand to daily screening.[4] Noninvasive markers, like carbohydrate antigen (CA) 19–9 and carcinoembryonic antigen (CEA), are widely applied to runtime practice but have restricted efficiency.[5] Traditional approaches are not appropriate for precise risk stratification and treatment. Meanwhile, patients with equal clinical or pathologic conditions have different clinical outcomes. The patients’ genetic heterogeneity contributes most to the inherent clinical and molecular diversities of GC.[6]

Recently, researchers have established numerous gene expression signatures for patients’ stratification and have constructed prognostic multigene-expression signatures that can divide GC into different risk groups.[7–10] Unfortunately, none have been applied to routine clinical practice owing to issues such as overfitting on small discovery data sets and lack of sufficient validation cohorts, and this might reduce the power and robustness of statistical conclusions.[11,12] Increasing evidence shows that the immune system plays a critical role during cancer initiation and progression.[13,14] Several studies have depicted a possible association between programmed death-1 (PD-1) or cytotoxic T-lymphocyte associated antigen 4 (CTLA-4) polymorphisms and the development of GC.[15,16] Considering their prognostic potential in GC,[17,18] the molecular characteristics of immune interaction should be extensively studied.

In this study, we analyzed immune-related genes from large amounts of GC transcriptional data. Combination of multiple immune biomarkers was developed to construct a prognostic immune-related gene signature (IRGS). Furthermore, the prognostic prediction value of the IRGS was systematically validated. This would help to make the therapeutic strategy for patients with GC.

2 Materials and methods

2.1 Ethical approval

The researchers were granted approval to conduct the research by their Departmental Research Ethics Committee at the Beilun People's Hospital, Ningbo, China. All the procedures were performed in accordance with the Declaration of Helsinki and relevant policies in China.

2.2 Public datasets

This study used public data to do a comprehensive analysis. The complete lists of selected all gene expression profiles (GEP), related accession numbers and corresponding publications are given in Supplemental Table 1, In total, 3 independent datasets from different technical platforms were used in this study, including Cristescu (GSE62254, n = 300),[6] Lee (GSE26253, n = 277),[19] and Yong (GSE84437, n = 433). Gene expression data together with clinical profiles were downloaded from Gene Expression Omnibus (GEO In total, 1010 cases were analyzed in this study. For each data set, the expression profiles were collapsed from probe-level to the corresponding gene symbols based on the annotation platform of each set.

2.3 Identification of a prognostic IRGS

The identification of prognostic IRGPs was performed as described previously.[20] We constructed a prognostic IRGS by focusing on 1811 immune-related genes (IRGs) downloaded from the ImmPort database ( [21] and selected IRGs that were measured by all platforms involved in this study. The IRGs with relatively high variation (determined by MAD > 0.5) in the training cohort were selected. Prognostic IRGs were further selected using the Cox proportional hazards regression against 1000 randomization (80% of all patients each time) test to assess the association between each IRGs and patients’ overall survival (OS) in the training cohort. The selected prognostic IRGs showing as a robust predictor (>95% times significant) were candidates to construct the IRGS. In order to minimize the risk of over-fitting and generate a risk model, we applied a Cox proportional hazards regression model on GC samples combined with the least absolute shrinkage and selection operator (LASSO). The penalty parameter was estimated by 10-fold cross-validation in the training data set at the minimum partial likelihood deviance. To stratify patients into low or high-risk groups, the optimal cut off of the IRGS was determined using time-dependent receiver operating characteristic (ROC) curve analysis (survivalROC, version 1.0.3) [22] at 10 years OS in the training cohort. The IRGS corresponding to the shortest distance between the ROC curve and point representing the 100% true positive rate and 0% false-positive rates was used as the cutoff value.

2.4 Validation of the IRGS

The IRGS prognostic value was evaluated in all stages and stage I&II patients with GC in the training, meta-validation (combination of the Yong and Lee cohorts) and independent validation cohorts by the log-rank test, respectively. Then we integrated IRGS with available clinical and pathologic variables in multivariate analyses to further assess whether it is an independent risk factor in the Yong cohort (only this data set contains clinical information such as age and tumor stage).

2.5 Functional annotation and analysis

Gene ontology (GO) analysis was conducted using gProfiler ( for prognostic immune signature. Gene set enrichment analysis (GSEA) [23] was performed between high- and low- risk groups using the Bioconductor package ‘fgsea’ with 10,000 permutations. Gene sets of cancer hallmarks from MSigDB [24] were examined. The FDR-adjusted P < .05 was used to select statistically significant gene sets. To dissect immune cell infiltration in different risk groups, we employed CIBERSORT,[25] a popular algorithm for characterizing cell composition from bulk-tumor gene expression profiles.

2.6 Statistical analysis

Statistical analyses were performed using R (version 3.4.3, Continuous variables were compared using Wilcoxon rank sum tests. Survival analyses were performed using the Kaplan–Meier method and compared using a log-rank test by ‘survival’ package (version: 2.41.3). The LASSO regression was implemented using ‘glmnet’ R package (version: 2.0-16). Univariate and multivariate analysis of the association of IRGS and other clinical pathologic factors was evaluated using the log-rank test. Time-dependent ROC curver analysis was performed using R package ‘survivalROC’ (version: 1.0.3). For all tests, a P-value < .05 was considered to be significant. Statistical significance is shown as ∗P < .05, ∗∗P < .01, ∗∗∗P < .001.

3 Results

3.1 Construction of IRGS and its prognostic value

Cristescu (GSE62254) cohort was used as the training cohort. In total, 1811 immune-related genes (IRGs) from the ImmPort database, including 17 categories. A 360 IRGs remained after filtering median absolute deviation (MAD) > 0.5 and excluding the genes expressed less median expression level. By 1000 randomization of Cox univariate regression, 59 IRGs were found robustly related with patient's OS. To establish a risk model for patients with GC, 16 prognostic IRGs (HSPA1A, HSPA1B, HSPA5, MICB, PSMC3, TAP2, KIAA0368, RBP1, APOD, VDR, PPP3R1, IL11RA, LGR4, NRP1, PLCG1, GZMB) were selected and combined for the construction of the IRGS using LASSO Cox regression in the training cohort (Supplemental Fig. 1, 2 and Table 1, The coefficient of each gene in the IRGS was listed in Table 1. Risk scores were calculated using the formula devised from this Cox regression model (Table 1). Time-dependent ROC curve analysis showed satisfactory prognostic significance at 10 years, the optimal cutoff to stratify immune high- and low-risk group was determined at 0.073 (Supplemental Fig. 3, The IRGS significantly divided patients into low- and high- risk groups in terms of OS (Fig. 1A, B and Supplemental Table 2,, HR, 3.9 [2.78–5.47]; P < 1.0 × 10−22) in the training cohort. When considering patients with stage I&II GC only, the IRGS remained highly prognostic in terms of OS (Fig. 2A, B, HR, 5.85 [2.98–11.48]; P = 6.21 × 10−9) for the training cohort.

Table 1
Table 1:
Model information.
Figure 1
Figure 1:
The outcome of low and high immune risk in patients with GC. Patients with GC were ranked by immune risk scores in the training cohort (A) and the meta-validation cohorts (C). Kaplan–Meier curves comparing patients with low or high immune risk in training cohort (B) and meta-validation cohort (D). P-values were calculated using log-rank tests and HR is short for hazard ratio. GC = gastric cancer.
Figure 2
Figure 2:
The association of the IRGS with OS in stage I&II patients with GC. Patients with GC of Stage I&II were ranked by immune risk scores in the training cohort (A) and the meta-validation cohorts (C). Kaplan–Meier curves showed OS of stage I&II patients in immune score low and high subgroups in training (B) and meta-validation cohort (D) datasets, respectively. P values comparing risk groups were calculated with the log-rank test. GC = gastric cancer, IRGS = immune-related gene signature, OS = overall survival.

3.2 Validation of the IRGS as a prognostic factor of patients with GC

To determine whether the IRGS had similar prognostic value in different populations, we applied the same formula to two different cohorts from the Yong and Lee cohorts as external validation sets. As expected, the IRGS significantly stratified patients in terms of OS (Supplemental Table 2 and Fig. 1C, D,, HR, 1.84 [1.47–2.30]; P = 6.59 × 10−8) in the meta-validation dataset. On the Yong and Lee cohorts, the high-risk group had significantly worse OS than the low-risk patients (Fig. 3, Lee cohort, HR, 1.69 [1.15–2.49], P = 7.16 × 10−3; Yong cohort, HR, 1.88 [1.42–2.48], P = 6.1 × 10−6). Considering stage I&II GC, the IRGS remained highly prognostic for the meta-validation cohort (Fig. 2C, D, HR, 1.96 [1.34–2.89]; P = 4.73 × 10−4). When considering stage I&II patients in independent validation cohorts, the high-risk group had significantly poor OS than patients in the low-risk group (Supplemental Fig. 4,, Lee cohort, HR, 1.70 [1.08–2.68], P = 1.97 × 10−2; Yong cohort, HR, 2.88 [1.36–6.10], P = 3.93 × 10−3). To further investigate whether the IRGS could serve as an independent predictor of prognosis, uni- and multivariate Cox regression analyses were applied to the Yong cohort. After adjusting for the clinical and pathological factors such as sex and tumor stage, the IRGS remained an independent prognostic factor (Table 2).

Figure 3
Figure 3:
Kaplan–Meier curves for validations of the IRGS. (A) and (C) OS among patients in the validation 1 cohort. (C) and (D) OS among patients in the validation 2 cohort. Hazard ratios (HRs) and 95% CIs are for high vs low immune risk. P values comparing risk groups were calculated with the log-rank test. IRGS = immune-related gene signature, OS = overall survival.
Table 2
Table 2:
Univariate and multivariate analyses of prognostic factors in validation cohort.

3.3 In silico functional analysis of the IRGS

In order to gain new insights into the biological role of the obtained risk groups, we carried out immune infiltration analysis, GO analysis and GSEA. For immune infiltration, we found that the percentages of T cells CD4 memory resting and Macrophage M2 were significantly higher in the high-risk risk group compared with the low-risk group. Among low-risk group, NK cells activated, CD4+ T cells memory activated and CD8+ T cells infiltration showed significantly higher than the high-risk group (Fig. 4A). Previous studies have reported the association of macrophage M2 [26] with poor prognosis, while NK cells activated [27] and CD8+ T cells [28] as the indicator of better prognosis. Furthermore, the risk groups specific immune cell subsets infiltration was also validated in meta-validation cohorts (Fig. 4B). The GO analysis revealed that the genes within the IRGS were mostly involved in the immune response (Supplemental Fig. 5,; GO terms, such as immune response, immune system process, and regulation of immune system process). The GSEA was carried out between high- and low- risk groups to investigate the pathways that were significantly altered. Multiple mesenchymal phenotype-related pathways, including the epithelial-mesenchymal transition (EMT), adhesion junction, TGF-beta signaling pathway, and extracellular matrix signaling pathway, were highly enriched for the high-risk groups (false discovery rate (FDR) < 0.01) (Supplemental Fig. 6, and Supplemental Table 3, The enrichment of mesenchymal phenotype-related pathways suggests that the IRGS reflects cancer status and thus predicts the prognosis of GC.

Figure 4
Figure 4:
Immune infiltration status of IRGS risk groups. 22 immune cells’ abundance for different immune risk groups (A). Specifically enriched immune cells within different immune risk groups in the training and meta-validation cohorts (B). Macrophage M2 and T cells CD4 memory resting cells were enriched in the high-risk group. The NK cells activated, CD4+ T cells memory activated and CD8+ T cells were enriched in the low-risk group. In all boxplots, P-values are based on the Wilcoxon Test (∗ P < .05, ∗∗ P < .01, ∗∗∗ P < .001). IRGS = immune-related gene signature.

4 Discussion

Gastric cancer (GC) is the 4th most common malignancy diagnosed worldwide [1] and the 2nd leading cause of cancer-related death with a poor 5-year survival rate.[29] Accordingly, it is essential to evaluate the prognosis of patients with GC. Prognostic-related biomarkers are the key point in the risk stratification of individual patients with GC and the decision regarding treatment. Reliable prognostic biomarkers are urgently needed to screen patients with the highest risk of recurrence or who may require additional systematic treatment. This need is particularly pronounced in the treatment of patients with GC, where effective prognostic markers can be more selective in the application of adjuvant chemotherapy. Currently, tumor stage and grade are still the most common ways to evaluate the risk of patients with GC, and besides, a lot of multigene prognostic signatures [7–10] have been developed for GC, but the accuracy of their prognosis predictions remains uncertain. Prognostic biomarkers related to the tumor microenvironment may hold great promise for identifying novel molecular targets for improving treatment strategy in patients with GC.[18,30,31] In this study, we built a prognostic model contained 16 IRGs based on the ranking of gene values.

To the best of our knowledge, this is a novel study incorporating different IRGs to develop a prognostic predicted immune-related signature for patients with GC. Most genes involved in our immune signature were antigen processing and presentation, antimicrobials, and cytokine receptors, which play key roles in immune response, response to bacterium, and inflammatory processes. Our prognostic immune-related signature can stratify patients with GC into subgroups with different survival outcomes in multiple independent data sets, including stage I&II patients with GC. In addition, Macrophage M2 has been shown to be related to poor prognosis in a variety of cancer types.[26] We found significantly increased infiltration level of Macrophage M2 in the immune high-risk group. The NK cells activated [27] and CD8+ T cells [28] were significantly correlated with patients’ longer survival. On the basis of the aforementioned findings, we found significantly increased infiltration of NK cells activated and CD8+ T cells in the immune low-risk group. Based on current findings, immune cell subsets might play a role in the prognosis differences observed between risk groups as defined by our immune signature. This result was consisted with the functional annotation by GSEA. Most valuable overrepresented biological processes we found were mesenchymal phenotype-related pathways, which were associated with tumor metastasis.

We should also consider the limitations of our research. First, our prognostic signature is based on the gene expression profiles produced by microarray platforms, which is difficult to popularize in routine clinical applications due to its high price, long conversion cycle and requirements of bioinformatics expertise. Some alternatives might be worth exploring, such as IHC based assays that derive optimized feature genes filtered from the prognostic signature. Secondly, the training cohort used to construct the immune signature came from retrospective studies and included fresh frozen samples. Therefore, there still remain doubts concerning the robustness and efficiency of FFPE samples. More data sets with different sample attributes need to be integrated for extensive validation.

Taken together, the IRGS identified by us may provide a legitimate approach in GC management. Meanwhile, we also revealed that the signature positively correlated with the infiltration of immune cell subsets and inflammatory response (e.g., Macrophage M2, CD8+ T cells, and immune response). Our immune gene signature can effectively predict patients’ survival of patients with GC. Moreover, this signature provides a panoramic view of the tumor immune microenvironment and will be a useful predictive tool to identify patients who might benefit from immunotherapy.

Author contributions

Conceptualization: peng shu, Xingguo Zhang, Feng Gao.

Data curation: peng shu, Bitao Jiang, Qingsen Sun, Feng Gao.

Formal analysis: peng shu, Bitao Jiang, Qingsen Sun, Yao Tong.

Funding acquisition: Xingguo Zhang.

Investigation: peng shu, Qingsen Sun, Feng Gao.

Methodology: Bitao Jiang, Qingsen Sun, Yao Tong, Yu Zhou.

Project administration: peng shu, Bitao Jiang.

Resources: Qingsen Sun, Yao Tong, Yuzhuo Wang, Xuefei Xia.

Software: Qingsen Sun, Yuzhuo Wang.

Supervision: peng shu.

Validation: Bitao Jiang.

Visualization: Bitao Jiang, Haifen Ma, Yu Zhou.

Writing – original draft: peng shu, Bitao Jiang, Xingguo Zhang, Feng Gao.

Writing – review & editing: peng shu, Xingguo Zhang, Feng Gao.


[1]. Torre LA, Bray F, Siegel RL, et al. Global cancer statistics, 2012. CA Cancer J Clin 2015;65:87–108.
[2]. Shen L, Shan Y-S, Hu H-M, et al. Management of gastric cancer in Asia: resource-stratified guidelines. Lancet Oncol 2013;14:e535–47.
[3]. Van Cutsem E, Sagaert X, Topal B, et al. Gastric cancer. Lancet 2016;388:2654–64.
[4]. Okines A, Verheij M, Allum W, et al. ESMO Guidelines Working Group. Gastric Cancer. Ann Oncol 2010;21 Suppl 5:v50–4.
[5]. Wang W, Chen X-L, Zhao S-Y, et al. Prognostic significance of preoperative serum CA125, CA19-9 and CEA in gastric carcinoma. Oncotarget 2016;7:35423–36.
[6]. Cristescu R, Lee J, Nebozhyn M, et al. Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. Nat Med 2015;21:449–56.
[7]. Wang P, Wang Y, Hang B, et al. A novel gene expression-based prognostic scoring system to predict survival in gastric cancer. Oncotarget 2016;7:55343–51.
[8]. Xu Z-Y, Shu Y-Q. Gene expression profile towards the prediction of patient survival of gastric cancer. Biomed Pharmacother 2009;63:324.
[9]. Takeno A, Takemasa I, Seno S, et al. Gene expression profile prospectively predicts peritoneal relapse after curative surgery of gastric cancer. Ann Surg Oncol 2010;17:1033–42.
[10]. Deng X, Xiao Q, Liu F, et al. A gene expression-based risk model reveals prognosis of gastric cancer. Peer J 2018;6:e4204.
[11]. Kim M, Rha SY. Prognostic index reflecting genetic alteration related to disease-free time for gastric cancer patient. Oncol Rep 2009;22:421–31.
[12]. Yamaguchi U, Nakayama R, Honda K, et al. Distinct gene expression-defined classes of gastrointestinal stromal tumor. J Clin Oncol 2008;26:4100–8.
[13]. Angell H, Galon J. From the immune contexture to the Immunoscore: the role of prognostic and predictive immune markers in cancer. Curr Opin Immunol 2013;25:261–7.
[14]. Gentles AJ, Newman AM, Liu CL, et al. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat Med 2015;21:938–45.
[15]. Savabkar S, Azimzadeh P, Chaleshi V, et al. Programmed death-1 gene polymorphism (PD-1.5 C/T) is associated with gastric cancer. Gastroenterol Hepatol Bed Bench 2013;6:178–82.
[16]. Yan Q, Chen P, Lu A, et al. Association between CTLA-4 60G/A and -1661A/G polymorphisms and the risk of cancers: a meta-analysis. PLoS One 2013;8:e83710.
[17]. Qing Y, Li Q, Ren T, et al. Upregulation of PD-L1 and APE1 is associated with tumorigenesis and poor prognosis of gastric cancer. Drug Des Devel Ther 2015;9:901–9.
[18]. Liu K, Yang K, Wu B, et al. Tumor-infiltrating immune cells are associated with prognosis of gastric cancer. Medicine 2015;94:e1631.
[19]. Lee J, Sohn I, Do I-G, et al. Nanostring-based multigene assay to predict recurrence for gastric cancer patients after surgery. PLoS One 2014;9:e90133.
[20]. Gentles AJ, Bratman SV, Lee LJ, et al. Integrating tumor and stromal gene expression signatures with clinical indices for survival stratification of early-stage non-small cell lung cancer. J Natl Cancer Inst 2015;107:djv211doi:10.1093/jnci/djv211.
[21]. Bhattacharya S, Andorf S, Gomes L, et al. ImmPort: disseminating data to the public for the future of immunology. Immunol Res 2014;58:234–9.
[22]. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 2000;56:337–44.
[23]. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.
[24]. Liberzon A, Birger C, Thorvaldsdóttir H, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 2015;1:417–25.
[25]. Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015;12:453–7.
[26]. Niino D, Komohara Y, Murayama T, et al. Ratio of M2 macrophage expression is closely associated with poor prognosis for Angioimmunoblastic T-cell lymphoma (AITL). Pathol Int 2010;60:278–83.
[27]. Ishigami S, Natsugoe S, Tokuda K, et al. Prognostic value of intratumoral natural killer cells in gastric carcinoma. Cancer 2000;88:577–83.;2-v">10.1002/(sici)1097-0142 (20000201)88:3<577::aid-cncr13>;2-v.
[28]. Zhuang Y, Peng L, Zhao Y, et al. CD8 T cells that produce interleukin-17 regulate myeloid-derived suppressor cells and are associated with survival time of patients with gastric cancer. Gastroenterol 2012;143:951–62e8. doi:10.1053/j.gastro.2012.06.010.
[29]. Ferlay J, Shin H-R, Bray F, et al. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer 2010;127:2893–917.
[30]. Kim JW, Nam KH, Ahn S-H, et al. Prognostic implications of immunosuppressive protein expression in tumors as well as immune cell infiltration within the tumor microenvironment in gastric cancer. Gastric Cancer 2016;19:42–52.
[31]. Dai C, Geng R, Wang C, et al. Concordance of immune checkpoints within tumor immune contexture and their prognostic significance in gastric cancer. Mol Oncol 2016;10:1551–8.

gastric cancer; immune-related gene signature; prediction; prognosis

Supplemental Digital Content

Copyright © 2019 the Author(s). Published by Wolters Kluwer Health, Inc.