ARTICLE: ENDOSCOPY

A Novel Deep Learning System for Diagnosing Early Esophageal Squamous Cell Carcinoma: A Multicenter Diagnostic Study

Tang, Dehua PhD1; Wang, Lei MD, PhD1; Jiang, Jingwei MSc1; Liu, Yuting MSc2; Ni, Muhan MSc1; Fu, Yiwei MSc3; Guo, Huimin MD, PhD1; Wang, Zhengwen MSc2; An, Fangmei MD, PhD4; Zhang, Kaihua PhD2; Hu, Yanxing PhD5; Zhan, Qiang MD, PhD4; Xu, Guifang MD, PhD1; Zou, Xiaoping MD1

Clinical and Translational Gastroenterology: August 2021 - Volume 12 - Issue 8 - p e00393
doi: 10.14309/ctg.0000000000000393

INTRODUCTION

Esophageal cancer (EC) is one of the most common malignant diseases and the sixth leading cause of cancer deaths worldwide (1). Esophageal squamous cell carcinoma (ESCC) is the predominant type of EC in China, whereas esophageal adenocarcinoma (EAC) is the primary pathological type in the West (1–3). Because symptoms are minimal at an early stage, most patients with ESCC are diagnosed at an advanced stage, when they are no longer amenable to curative surgical resection (4). Therefore, the prognosis for this malignancy has been unsatisfactory, with a 5-year survival rate of approximately 15% in China (5). If ESCC can be detected and resected at an early stage, the 5-year survival rate can be increased to greater than 85% (5). However, early ESCC is challenging to diagnose, and efficient tools for its accurate diagnosis are urgently needed.

Lugol chromoendoscopy with targeted biopsy is regarded as a standard strategy to diagnose early ESCC, yet the specificity (65%) of this method remains unsatisfactory (6). Virtual chromoendoscopy, such as narrow band imaging (NBI) (Olympus Medical Systems, Tokyo, Japan), is emerging as a reliable methodology to facilitate the diagnosis of early ESCC (7). A recent systematic review compared the diagnostic performance of NBI and Lugol chromoendoscopy in diagnosing early ESCC and found that NBI performed better (8). However, NBI is not always available in endoscopy centers in rural or undeveloped areas. Moreover, interpretation of the NBI results requires appropriate training and cognitive competency, which leads to a markedly lower diagnostic sensitivity for inexperienced endoscopists (9). Even experts vary in several aspects when using NBI, including competency, diagnostic duration, and interpretation confidence (10,11). Therefore, it is essential to develop a simpler, more efficient, and more reproducible methodology for diagnosing early ESCC.

In recent years, artificial intelligence (AI) based on deep convolutional neural networks (DCNNs) has made remarkable progress in recognizing colorectal polyps and upper gastrointestinal lesions (12,13). Our previous work established a real-time DCNN system for detecting early gastric cancer with white light imaging endoscopy (WLI) (14). Several studies have reported an acceptable performance for AI-based diagnostic techniques in ESCC diagnosis with NBI (15–18). However, only 2 previous studies reported applying a deep learning system in ESCC diagnosis with WLI (19,20). Although the reported diagnostic performance was excellent, with a sensitivity of 97.8%, the test data set was relatively small and the controls were all from normal esophageal mucosa. These limited data restrict the usage scenarios of AI. As a result, a more generalized DCNN model is required for ESCC diagnosis with WLI.

This study aimed to construct a DCNN system to diagnose early ESCC with WLI by using a novel algorithm architecture and validating the system comprehensively.

METHODS

Study design and patients

This study was conducted at 3 hospitals in China: Nanjing University Medical School Affiliated Drum Tower Hospital (NJDTH), Wuxi People's Hospital (WXPH), and Taizhou People's Hospital (TZPH). The study design was approved by the Medical Ethics Committee of Nanjing University Medical School Affiliated Drum Tower Hospital (approval no. 2020-026-01). Written consent was waived because anonymized images were collected retrospectively from the medical databases at each hospital.

According to the inclusion and exclusion criteria, 1,405 patients were included from January 2018 to February 2020. Among these patients, 1,243 patients were from NJDTH, 78 patients were from WXPH, and 84 patients were from TZPH. The inclusion criteria and exclusion criteria are described in detail in supplementary materials (see Supplementary Data, Supplementary Digital Content 6, http://links.lww.com/CTG/A660).

Data sets and annotations

Three data sets were used to train and validate the DCNN model: the training data set, the internal validation data set, and the external validation data set. Their distribution is shown in Table S1 (see Supplementary Data, Supplementary Digital Content 6, http://links.lww.com/CTG/A660).

The training data set.

We used 4,002 images from 1,078 patients to train the DCNN model. The DCNN model was used to distinguish malignant lesions (early ESCC) from benign (reflux esophagitis) and normal mucosa. All the images were from patients in NJDTH between January 2018 and May 2018. Among these images, 1,000 images were from 337 patients with early ESCC, 1,000 images were from 413 patients with reflux esophagitis, and 2,002 images were from 328 patients with normal esophageal mucosa.

The internal validation data set.

We used 333 images from 81 patients for internal validation. All the images were from patients in NJDTH between January 2018 and May 2018. Among these images, 96 were from 33 patients with early ESCC, 44 were from 23 patients with reflux esophagitis, and 193 were from 25 patients with normal esophageal mucosa.

The external validation data set.

We used 700 images from 162 patients for external validation. All the images were from patients in WXPH or TZPH between January 2018 and May 2018. For images from WXPH, 124 were from 27 patients with early ESCC, 100 were from 26 patients with reflux esophagitis, and 158 were from 25 patients with normal esophageal mucosa. For images from TZPH, 83 were from 31 patients with early ESCC, 100 were from 27 patients with reflux esophagitis, and 135 were from 26 patients with normal esophageal mucosa.

Other data sets.

A low-quality image data set was used for sensitivity analysis, with 388 images from 70 patients captured by Q260 between January 2018 and May 2018 from NJDTH. Among these images, 145 were from 16 patients with early ESCC, 126 were from 30 patients with reflux esophagitis, and 117 were from 24 patients with normal esophageal mucosa (see Supplementary Table S2, Supplementary Digital Content 6, http://links.lww.com/CTG/A660). To evaluate the real-time performance of the DCNN system, we included 14 videos from 14 patients in another time interval. All the patients were from NJDTH between December 2019 and February 2020.
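The image and patient counts reported for the data sets above can be cross-checked with a short tally. The following is a minimal sketch: the numbers are copied from the text, whereas the dictionary layout and the function names are illustrative assumptions, not part of the study.

```python
# Image and patient counts per class, copied from the text above.
# Each entry is (images, patients) for, in order:
# early ESCC, reflux esophagitis, normal esophageal mucosa.
DATASETS = {
    "training": [(1000, 337), (1000, 413), (2002, 328)],
    "internal_validation": [(96, 33), (44, 23), (193, 25)],
    "external_WXPH": [(124, 27), (100, 26), (158, 25)],
    "external_TZPH": [(83, 31), (100, 27), (135, 26)],
    "low_quality": [(145, 16), (126, 30), (117, 24)],
}
VIDEO_PATIENTS = 14  # 14 videos from 14 patients


def totals(name):
    """Return (total images, total patients) for one data set."""
    images = sum(i for i, _ in DATASETS[name])
    patients = sum(p for _, p in DATASETS[name])
    return images, patients


def all_patients():
    """Total patients across every image data set plus the video cohort."""
    return sum(totals(name)[1] for name in DATASETS) + VIDEO_PATIENTS


if __name__ == "__main__":
    for name in DATASETS:
        print(name, totals(name))
    print("all patients:", all_patients())
```

The per-class counts sum to the stated totals (4,002, 333, 700, and 388 images), and the patient counts sum to the 1,405 patients reported under Study design and patients.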

Two graduate students (D.H.T. and J.W.J.) first independently reviewed all the images and excluded irrelevant images (images of the stomach or duodenum and images in NBI mode) and poor-quality images (halation, defocus, blur, bubbles, or fuzziness). All the excluded images were then reviewed and confirmed by an expert endoscopist (H.M.G.) with 8 years' experience and 8,000 oesophagogastroduodenoscopy (OGD) examinations. Next, all the included high-quality esophageal images were annotated by 2 expert endoscopists (L.W. and G.F.X.), each with 10 years' experience and 10,000 OGD examinations. For images of reflux esophagitis, the 2 experts jointly discussed and drew the delineating lines. For images of early ESCC, the 2 experts jointly discussed and drew the delineating lines according to the actual pathological results.

All the images and videos were recorded using Olympus endoscopes (GIF-Q260, GIF-H260Z, GIF-HQ290, GIF-H290Z; Olympus Medical Systems) with the electronic endoscopic systems (EVIS LUCERA CV260/CLV260SL, EVIS LUCERA ELITE CV290/CLV290SL; Olympus Medical Systems).

Construction of the DCNN models

The images of malignant lesions, benign lesions, and the normal mucosa were all from the esophagus. Thus, the 3 categories of images shared some common features. To improve the classifier's performance, the feature-extraction model must locate the most discriminative parts of each image. The DCNN model must find the most discriminative granularities and then fuse the information across granularities. A progressive multigranularity DCNN architecture was used to achieve good performance in this study (see Supplementary Figure S1, Supplementary Digital Content 1, http://links.lww.com/CTG/A655) (21). The training process is described in detail in the supplementary materials.
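The title of the cited method (21) refers to jigsaw patches: an image is cut into a grid of patches that are shuffled, so that the network is pushed to learn features at the granularity of a single patch. The following is a minimal, pure-Python sketch of that shuffling step only. It is an illustration under stated assumptions, not the authors' implementation: the function name `jigsaw_shuffle`, the nested-list image representation, and the grid size in the usage note are all hypothetical, and the actual training details are in the supplementary materials.

```python
import random


def jigsaw_shuffle(image, n, seed=None):
    """Cut a square 2-D image (list of rows) into an n x n grid and shuffle the patches.

    Pixels inside a patch keep their relative positions; only whole
    patches change place.
    """
    size = len(image)
    if size % n != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square with a side divisible by n")
    block = size // n
    # Source patch positions in row-major order, then shuffled.
    order = [(r, c) for r in range(n) for c in range(n)]
    random.Random(seed).shuffle(order)
    out = [[None] * size for _ in range(size)]
    for dest, (src_r, src_c) in enumerate(order):
        dest_r, dest_c = divmod(dest, n)
        for y in range(block):
            for x in range(block):
                out[dest_r * block + y][dest_c * block + x] = (
                    image[src_r * block + y][src_c * block + x]
                )
    return out
```

For example, `jigsaw_shuffle(image, 2, seed=0)` rearranges the four quadrants of `image`, whereas `n = 1` returns the image unchanged, which corresponds to the coarsest granularity.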

Validation and test of the DCNN model and comparison with endoscopists

We used a 10-fold cross-validation method to validate the DCNN model. Next, we tested the performance of the DCNN model with the internal data set and the external data sets. Finally, using the internal data set, we compared the performance of the DCNN model with that of 5 expert endoscopists (10 years' experience with 10,000 OGD examinations) and 5 novice endoscopists (1 year's experience with 1,000 OGD examinations) in diagnosing early ESCC, with or without the assistance of the DCNN model. These endoscopists were not engaged in the annotation of the data sets and were masked to the endoscopic findings and pathological results of the data sets. The flowchart of this study is shown in Figure 1.
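The 10-fold cross-validation described above can be illustrated with a generic index splitter. This is a minimal sketch: the function name `k_fold_indices` and the fixed seed are assumptions, and the text does not state whether the folds were formed at the image level or the patient level, so the sketch simply splits a list of item indices.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Every item appears in exactly one validation fold.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx
```

With the 4,002 training images, each of the 10 rounds would hold out 400 or 401 items for validation and train on the rest. If several images come from the same patient, splitting by patient rather than by image avoids placing one patient's images in both the training and validation parts.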

Figure 1. A flowchart for the development and validation of the DCNN system for diagnosing early ESCC. DCNN, deep convolutional neural network; ESCC, esophageal squamous cell carcinoma; RE, reflux esophagitis.

Outcomes

The primary outcome was the diagnostic performance of the DCNN model, including accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The formulas used to calculate the results are presented in the supplementary materials (see Supplementary Data, Supplementary Digital Content 6, http://links.lww.com/CTG/A660).
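The five outcome measures follow from the standard confusion-matrix definitions, sketched below. The authors' exact formulas are in the supplementary materials; the function name and the example counts are assumptions. In particular, the counts in the usage note (94 true positives, 2 false negatives, 210 true negatives, 27 false positives) are not stated in the article. They were back-calculated from the 96 early ESCC images and 237 control images of the internal validation data set so that they are consistent with the reported values.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts.

    tp/fn: early ESCC images called positive/negative.
    tn/fp: control images called negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts inferred from the reported internal validation results.
    for name, value in diagnostic_metrics(tp=94, fp=27, tn=210, fn=2).items():
        print(f"{name}: {value:.3f}")
```

With these inferred counts, the function returns an accuracy of 0.913, a sensitivity of 0.979, a specificity of 0.886, a PPV of 0.777, and an NPV of 0.991 after rounding to 3 decimals.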

Statistical analyses

The receiver operating characteristic curve was plotted, and the area under the receiver operating characteristic curve (AUC) was calculated. A 2-sided McNemar test was used to compare the accuracy, sensitivity, and specificity of the DCNN model with those of the endoscopists, and those of the endoscopists with and without the assistance of the DCNN model. A generalized score statistic was used for the corresponding comparisons of PPV and NPV (22). Interobserver agreement of the experts and the DCNN model was assessed with the Cohen kappa coefficient. A P value <0.05 was considered statistically significant. All the statistical analyses were conducted using R software (version 4.0.3; https://www.r-project.org) with RStudio (version 1.4.1106; https://www.rstudio.com).
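The McNemar test compares 2 readers on the same images and uses only the discordant pairs: images that one reader classified correctly and the other did not. The following is a minimal sketch of the exact (binomial) 2-sided form in Python. The analyses in this study were run in R, and the text does not state whether the exact or the chi-square form was used, so this is an illustration rather than the authors' procedure; the function name and the counts in the usage note are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value.

    b: pairs where only the first reader was correct.
    c: pairs where only the second reader was correct.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, with hypothetical counts of 10 images that only the model classified correctly and none that only the endoscopist classified correctly, `mcnemar_exact(10, 0)` gives P = 2/1024, which is approximately 0.002.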

RESULTS

Performance of the DCNN model in diagnosing early ESCC

The DCNN model was trained with the early stopping method (see Supplementary Data Figure S2, Supplementary Digital Content 2, http://links.lww.com/CTG/A656). In the NJDTH internal validation data set, the model showed a sensitivity of 0.979, a specificity of 0.886, a PPV of 0.777, an NPV of 0.991, and an AUC of 0.954 (Table 1 and Figure 2a). The model exhibited comparable performance in the external validation data sets. In the WXPH data set, the sensitivity was 0.968, the specificity was 0.868, and the AUC was 0.925. In the TZPH data set, the sensitivity was 1.000, the specificity was 0.889, and the AUC was 0.934 (Table 1 and Figure 2a). We then performed a subgroup analysis by dividing the internal validation data set into groups by lesion size, location, and invasion depth. The AUC of the model was 0.956–0.975 across sizes, 0.900–0.964 across locations, and 0.889–0.957 across invasion depths (Figure 2b–d; see Supplementary Tables S3–S5, Supplementary Digital Content 6, http://links.lww.com/CTG/A660). We also evaluated the performance of the DCNN model with low-quality images captured by Q260. The sensitivity, specificity, and AUC of the model were 1.000, 0.860, and 0.921, respectively (Figure 2e; see Supplementary Table S6, Supplementary Digital Content 6, http://links.lww.com/CTG/A660). Moreover, the model performed well in delineating and highlighting areas of early ESCC with Grad-CAM (Figure 3).
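Early stopping, mentioned at the start of this subsection, halts training once the validation loss stops improving, which limits overfitting. The following is a generic, patience-based sketch. The stopping criterion, the patience value, and the function name are assumptions for illustration; the criterion actually used in this study is shown in Supplementary Figure S2.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch that is `patience` epochs past the
    best one without an improvement larger than `min_delta`. If that never
    happens, training runs through the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

In practice, the model weights saved at `best_epoch` are the ones kept for validation and testing.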

Table 1. Performance of the DCNN model in validation data sets

| Metric | Internal validation (NJDTH) | External validation (WXPH) | External validation (TZPH) |
|---|---|---|---|
| Accuracy (95% CI) | 0.913 (0.878–0.939) | 0.901 (0.866–0.927) | 0.918 (0.883–0.944) |
| Sensitivity (95% CI) | 0.979 (0.927–0.996) | 0.968 (0.920–0.987) | 1.000 (0.956–1.000) |
| Specificity (95% CI) | 0.886 (0.839–0.921) | 0.868 (0.821–0.904) | 0.889 (0.843–0.923) |
| PPV (95% CI) | 0.777 (0.695–0.842) | 0.779 (0.707–0.837) | 0.761 (0.673–0.832) |
| NPV (95% CI) | 0.991 (0.966–0.998) | 0.982 (0.956–0.993) | 1.000 (0.982–1.000) |
| AUC | 0.954 (0.934–0.974) | 0.925 (0.899–0.951) | 0.934 (0.907–0.961) |

AUC, area under the receiver operating characteristic curve; CI, confidence interval; NJDTH, Nanjing University Medical School Affiliated Drum Tower Hospital; NPV, negative predictive value; PPV, positive predictive value; TZPH, Taizhou People's Hospital; WXPH, Wuxi People's Hospital.

Figure 2. Receiver operating characteristic (ROC) curves illustrating the diagnostic ability of the DCNN model. (a) ROC curves for diagnosing ESCC in the NJDTH, WXPH, and TZPH validation data sets. (b) ROC curves for diagnosing ESCC in lesions of different sizes in the NJDTH validation data set. (c) ROC curves for diagnosing ESCC in lesions at different locations in the NJDTH validation data set. (d) ROC curves for diagnosing ESCC in lesions with different invasion depths in the NJDTH validation data set. (e) ROC curves for diagnosing ESCC in images captured with different endoscopes. DCNN, deep convolutional neural network; ESCC, esophageal squamous cell carcinoma; NJDTH, Nanjing University Medical School Affiliated Drum Tower Hospital; TZPH, Taizhou People's Hospital; WXPH, Wuxi People's Hospital.

Figure 3. Representative images predicted by the DCNN system for the diagnosis of ESCC. DCNN, deep convolutional neural network; ESCC, esophageal squamous cell carcinoma.

Comparison between the DCNN model and endoscopists

We compared the performance of the DCNN model with that of endoscopists using the NJDTH validation data set. The model exhibited better overall performance than the endoscopists, including experts and novices. In detail, the accuracy (0.913 [95% confidence interval (CI) 0.878–0.939] vs 0.871 [95% CI 0.854–0.887], P < 0.001), the sensitivity (0.979 [95% CI 0.927–0.996] vs 0.850 [95% CI 0.815–0.879], P < 0.001), and the NPV (0.991 [95% CI 0.966–0.998] vs 0.935 [95% CI 0.919–0.948], P < 0.001) of the DCNN model were superior to those of expert endoscopists (Table 2; see Supplementary Data Table S7, Supplementary Digital Content 6, http://links.lww.com/CTG/A660). Meanwhile, the specificity (0.886 [95% CI 0.839–0.921] vs 0.881 [95% CI 0.860–0.898], P = 0.704) and the PPV (0.777 [95% CI 0.695–0.842] vs 0.743 [95% CI 0.705–0.778], P = 0.113) of the DCNN model were comparable with those of the experts (Table 2; see Supplementary Data Table S7, Supplementary Digital Content 6, http://links.lww.com/CTG/A660). Similarly, the accuracy (0.913 [95% CI 0.878–0.939] vs 0.839 [95% CI 0.821–0.856], P < 0.001), the sensitivity (0.979 [95% CI 0.927–0.996] vs 0.742 [95% CI 0.701–0.779], P < 0.001), the PPV (0.777 [95% CI 0.695–0.842] vs 0.712 [95% CI 0.671–0.750], P = 0.004), and the NPV (0.991 [95% CI 0.966–0.998] vs 0.894 [95% CI 0.875–0.910], P < 0.001) of the model were also higher than those of novice endoscopists (Table 2; see Supplementary Data Table S7, Supplementary Digital Content 6, http://links.lww.com/CTG/A660). Subsequently, we evaluated the assistance of the DCNN model by comparing the performance of endoscopists with and without reference to the DCNN model. The sensitivity (0.850 [95% CI 0.815–0.879] vs 0.998 [95% CI 0.988–1.000], P < 0.001) and specificity (0.881 [95% CI 0.860–0.898] vs 0.897 [95% CI 0.878–0.913], P < 0.05) of the experts were significantly elevated after referring to the results of the DCNN model (Table 2; see Supplementary Data Table S7, Supplementary Digital Content 6, http://links.lww.com/CTG/A660). In novices, the sensitivity (0.742 [95% CI 0.701–0.779] vs 0.819 [95% CI 0.782–0.851], P < 0.001) was markedly increased and the specificity (0.878 [95% CI 0.859–0.896] vs 0.890 [95% CI 0.871–0.907], P = 0.052) was marginally elevated (Table 2; see Supplementary Data Table S7, Supplementary Digital Content 6, http://links.lww.com/CTG/A660). Next, we analyzed the interobserver agreement of the DCNN model and endoscopists. The interobserver agreement among expert endoscopists (κ: 0.452–0.615) was higher than that among novice endoscopists (κ: 0.314–0.537) (Table 3). The interobserver agreement between the model and expert endoscopists (κ: 0.515–0.615) was also higher than that between the model and novice endoscopists (κ: 0.378–0.610; Table 3).

Table 2. Comparison between the DCNN model and endoscopists in the Nanjing University Medical School Affiliated Drum Tower Hospital internal validation data set

| Metric | Experts, without DCNN | Experts, with DCNN | Novices, without DCNN | Novices, with DCNN |
|---|---|---|---|---|
| Accuracy (95% CI) | 0.871 (0.854–0.887)a | 0.926 (0.913–0.938)b | 0.839 (0.821–0.856)a | 0.870 (0.853–0.885)b |
| Sensitivity (95% CI) | 0.850 (0.815–0.879)a | 0.998 (0.988–1.000)b | 0.742 (0.701–0.779)a | 0.819 (0.782–0.851)b |
| Specificity (95% CI) | 0.881 (0.860–0.898) | 0.897 (0.878–0.913)c | 0.878 (0.859–0.896) | 0.890 (0.871–0.907) |
| PPV (95% CI) | 0.743 (0.705–0.778) | 0.797 (0.763–0.827)b | 0.712 (0.671–0.750)d | 0.751 (0.713–0.787)b |
| NPV (95% CI) | 0.935 (0.919–0.948)a | 0.999 (0.995–1.000)b | 0.894 (0.875–0.910)a | 0.924 (0.907–0.938)b |

CI, confidence interval; DCNN, deep convolutional neural network; NPV, negative predictive value; PPV, positive predictive value.
a P < 0.001, vs DCNN model.
b P < 0.001, vs without DCNN model.
c P < 0.05, vs without DCNN model.
d P < 0.01, vs DCNN model.

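Table 3. Interobserver agreement of endoscopists and the deep convolutional neural network model

| | AI | E1 | E2 | E3 | E4 | E5 | N1 | N2 | N3 | N4 | N5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AI | | | | | | | | | | | |
| E1 | 0.551 | | | | | | | | | | |
| E2 | 0.615 | 0.528 | | | | | | | | | |
| E3 | 0.515 | 0.481 | 0.474 | | | | | | | | |
| E4 | 0.577 | 0.502 | 0.452 | 0.583 | | | | | | | |
| E5 | 0.556 | 0.615 | 0.476 | 0.509 | 0.544 | | | | | | |
| N1 | 0.477 | | | | | | | | | | |
| N2 | 0.454 | | | | | | 0.321 | | | | |
| N3 | 0.378 | | | | | | 0.314 | 0.411 | | | |
| N4 | 0.610 | | | | | | 0.495 | 0.537 | 0.525 | | |
| N5 | 0.492 | | | | | | 0.469 | 0.383 | 0.330 | 0.513 | |

Values are Cohen kappa coefficients. AI, the DCNN model; E1–E5, expert endoscopists; N1–N5, novice endoscopists.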
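Each entry in Table 3 is a Cohen kappa coefficient, which corrects the raw agreement between 2 readers for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the chance agreement. The following is a minimal sketch; the function name and the label lists in the usage note are hypothetical and are not study data.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen kappa for two readers' labels on the same images."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement: fraction of images given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the readers' marginal label frequencies.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, 2 readers who agree on 8 of 10 images, each calling 3 of the 10 positive, have p_o = 0.80 and p_e = 0.58, which gives κ of about 0.52.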

OGD videos with early ESCC detection and an open-access website

The DCNN system needs only 15 milliseconds per image to diagnose esophageal lesions. We tested the performance of the DCNN system with real-time videos (see Supplementary Videos 1 and 2, Supplementary Digital Contents 4 and 5, http://links.lww.com/CTG/A658, http://links.lww.com/CTG/A659). The DCNN system correctly diagnosed the early ESCC lesion in 13 of the 14 videos (92.9%). Based on this performance, we created an open-access website for our DCNN system (see Supplementary Data, Figure S3, Supplementary Digital Content 3, http://links.lww.com/CTG/A657; http://112.74.182.39/esophagus). Researchers can upload their images to our website to obtain remote diagnoses from our DCNN system.

DISCUSSION

In this study, we developed a novel DCNN model with 4,002 white light images for distinguishing early ESCC from reflux esophagitis and normal esophageal mucosa. The DCNN model showed a good performance for diagnosing early ESCC, with a sensitivity of 0.979, a specificity of 0.886, a PPV of 0.777, an NPV of 0.991, and an AUC of 0.954 in the internal validation data set. The model also generalized well to the 2 external data sets. We then compared the performance of the model with that of the endoscopists and found that the model was superior to the endoscopists. Notably, the performance of the endoscopists improved significantly after referring to the results of the DCNN model.

Lugol chromoendoscopy with targeted biopsy is regarded as a standard strategy to diagnose early ESCC with a high sensitivity of >0.95 (6). Yet, the specificity of Lugol chromoendoscopy is relatively low (0.65), and Lugol chromoendoscopy often leads to chest pain (6). The sensitivity of our DCNN model was 0.979, and the specificity of the model was >0.85, which is much higher when compared with Lugol chromoendoscopy. NBI has been reported to have better performance than Lugol chromoendoscopy with an accuracy of >0.95 (7). However, NBI is not available in undeveloped areas, and the interpretation of the NBI results requires appropriate training and cognitive competency. Here, we established the DCNN system based on WLI, which avoided the potential barrier of rural or undeveloped areas not being equipped with NBI-related facilities. The DCNN model could distinguish early ESCC from esophagitis and normal mucosa with a satisfactory diagnostic performance.

Several previous studies have used DCNN technology to identify early ESCC using NBI images (15–18). Guo et al. developed a DCNN system for diagnosing precancerous lesions and early ESCC using NBI images, with a sensitivity of 0.980 and a specificity of 0.950 (16). Ohmori et al. developed an AI system to detect ESCC with NBI images. They found that the diagnostic sensitivity and specificity of nonmagnified endoscopy were 1.00 and 0.63, respectively, and the sensitivity and specificity of magnified endoscopy were 0.98 and 0.56, respectively (23). However, only 2 previous studies have investigated applying a DCNN system in ESCC diagnosis with WLI. Horie et al. first established a DCNN model to diagnose both superficial and advanced EC and achieved a sensitivity of 98% with the combination of WLI and NBI images (19). Cai et al. developed a deep neural network to localize and identify early ESCC with WLI images and achieved a sensitivity of 97.8% and a specificity of 85.4% (20). Yet, the test data sets of the 2 studies were relatively small, and the controls were all normal esophageal mucosa. Of note, some esophagitis lesions have a reddish surface appearance similar to that of early ESCC. In this study, we used control images including esophagitis and normal esophageal mucosa. Our DCNN model showed a sensitivity of 0.979, a specificity of 0.886, a PPV of 0.777, an NPV of 0.991, and an AUC of 0.954 in the internal validation data set, which was comparable with the results of the previous studies. To evaluate the generalizability of the model, we assessed its performance in 2 external validation data sets. The model showed comparable performance in both, with a sensitivity of 0.968, a specificity of 0.868, and an AUC of 0.925 in the WXPH data set and a sensitivity of 1.000, a specificity of 0.889, and an AUC of 0.934 in the TZPH data set.

Another advantage of our study was that all the pathologic results were based on endoscopic resection specimens rather than endoscopic forceps biopsy specimens. It has been reported that histopathologic discrepancies can be observed between endoscopic forceps biopsy and endoscopic resection specimens in patients with early ESCC because the forceps biopsy specimens sometimes cannot correctly reflect the true histology (24). Moreover, all the images were reviewed and annotated by 2 expert endoscopists (L.W. and G.F.X.), who each had 10 years of experience with 10,000 OGD examinations. This ensured that our results were much more reliable in predicting early ESCC.

Furthermore, we compared the performance of our DCNN model with that of endoscopists in identifying early ESCC. Our results showed that the model performed better than expert and novice endoscopists. Notably, the DCNN system markedly increased the diagnostic performance of both expert and novice endoscopists. This indicates that the DCNN system might be adopted to assist endoscopists in detecting more suspicious lesions. In addition, we assessed the interobserver agreement for the model and endoscopists using Cohen kappa coefficients. Expert endoscopists (κ: 0.452–0.615) achieved higher interobserver agreement than novice endoscopists (κ: 0.314–0.537). However, this agreement was much lower than that of the model (κ: 1.000). This suggests that the model may reduce the diagnostic variability among different endoscopists.

Nonetheless, this study has several limitations. First, the study focused only on detecting ESCC, whereas the detection of esophageal adenocarcinoma was not examined. This was mainly because ESCC is the most prevalent type of esophageal malignancy in China (25). We are currently collecting enough images of early esophageal adenocarcinoma to further train our model. Second, we used the DCNN system only to detect early ESCC, and the invasion depth, size, and histologic type of the lesion were not thoroughly studied. Third, this is a retrospective study, meaning that the excellent performance of the DCNN system may not reflect actual clinical application in the real world. Considering this, we have designed a prospective randomized controlled trial to investigate the clinical value of this DCNN system in routine OGD.

In conclusion, we developed a DCNN system to diagnose early ESCC from esophagitis and normal esophageal mucosa. The DCNN system showed good performance in validation data sets and was superior to endoscopists. The DCNN system markedly increased the performance of the endoscopists. These results indicate that the DCNN system might assist endoscopists in diagnosing early ESCC during OGD. However, more prospective validation is needed to understand its true clinical significance in the real world.

CONFLICTS OF INTEREST

Guarantor of the article: Xiaoping Zou, MD.

Specific author contributions: Dehua Tang, PhD, Lei Wang, MD, PhD, Jingwei Jiang, MSc, and Yuting Liu, MSc, contributed equally to this study. X.P.Z., G.F.X., and Q.Z.: conceived and supervised this study. D.H.T.: designed the experiments. D.H.T., J.W.J., Y.T.L., and M.H.N.: conducted the experiments. G.F.X., L.W., Y.W.F., and F.M.A.: collected the images. H.M.G., G.F.X., and L.W.: labeled the images. Y.T.L., W.Z.W., K.H.Z., and Y.X.H.: established the deep learning model. D.H.T. and Y.T.L.: analyzed the data. D.H.T., M.H.N., and Y.X.H.: wrote the manuscript. G.F.X.: revised the language of the manuscript. All authors reviewed and approved the final version of the manuscript.

Financial support: This study was supported by grants from the National Natural Science Foundation of China (grant nos. 81672935, 81871947), the Jiangsu Clinical Medical Center of Digestive Diseases (grant no. BL2012001), and the high-end medical team of Wuxi (Taihu talent project). The funders had no role in study design, data collection, data analyses, interpretation, or writing of the report.

Potential competing interests: Y.X.H. is an employee of Xiamen Innovision. All the other authors have no conflicts of interest to declare.

Ethics statement: The institutional review board of Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, approved this retrospective study and waived the requirement to obtain informed consent from the patients. All images were anonymized before inclusion to prevent misuse of personal information.

Consent for publication: Written informed consent for publication was provided from all participants.

Availability of data and material: Due to patients' privacy, all other data sets generated or analyzed in this study are available from the corresponding author on reasonable request.

Study Highlights

WHAT IS KNOWN

  • ✓ The prognosis of esophageal squamous cell carcinoma (ESCC) is unsatisfactory because it has a 5-year survival rate of approximately 15%.
  • ✓ If ESCC can be detected and resected early, the 5-year survival rate can be increased to greater than 85%.
  • ✓ However, it is challenging to diagnose ESCC at an early stage.

WHAT IS NEW HERE

  • ✓ A real-time deep convolutional neural networks system was constructed to diagnose early ESCC.
  • ✓ The deep convolutional neural networks system showed good performance in internal and external validation data sets and exhibited superior performance compared with endoscopists.

ACKNOWLEDGMENT

We thank Shahzeb Hassan for the language editing of this manuscript.

REFERENCES

1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2020. CA Cancer J Clin 2020;70(1):7–30.
2. Chen W, Zheng R, Baade PD, et al. Cancer statistics in China, 2015. CA Cancer J Clin 2016;66(2):115–32.
3. Davydov MC, Delektorskaya VV, Kuvshinov YP, et al. Superficial and early cancers of the esophagus. Ann N Y Acad Sci 2014;1325:159–69.
4. Lagergren J, Smyth E, Cunningham D, et al. Oesophageal cancer. Lancet 2017;390(10110):2383–96.
5. Abnet CC, Arnold M, Wei WQ. Epidemiology of esophageal squamous cell carcinoma. Gastroenterology 2018;154(2):360–73.
6. Dawsey SM, Fleischer DE, Wang GQ, et al. Mucosal iodine staining improves endoscopic visualization of squamous dysplasia and squamous cell carcinoma of the esophagus in Linxian, China. Cancer 1998;83(2):220–31.
7. Takenaka R, Kawahara Y, Okada H, et al. Narrow-band imaging provides reliable screening for esophageal malignancy in patients with head and neck cancers. Am J Gastroenterol 2009;104(12):2942–8.
8. Morita FH, Bernardo WM, Ide E, et al. Narrow band imaging versus lugol chromoendoscopy to diagnose squamous cell carcinoma of the esophagus: A systematic review and meta-analysis. BMC Cancer 2017;17(1):54.
9. Nakayoshi T, Tajiri H, Matsuda K, et al. Magnifying endoscopy combined with narrow band imaging system for early gastric cancer: Correlation of vascular pattern with histopathology (including video). Endoscopy 2004;36(12):1080–4.
10. Feng F, Liu J, Wang F, et al. Prognostic value of differentiation status in gastric cancer. BMC Cancer 2018;18(1):865.
11. Shibagaki K, Amano Y, Ishimura N, et al. Diagnostic accuracy of magnification endoscopy with acetic acid enhancement and narrow-band imaging in gastric mucosal neoplasms. Endoscopy 2016;48(1):16–25.
12. Luo H, Xu G, Li C, et al. Real-time artificial intelligence for detection of upper gastrointestinal cancer by endoscopy: A multicentre, case-control, diagnostic study. Lancet Oncol 2019;20(12):1645–54.
13. Wang P, Xiao X, Glissen Brown JR, et al. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nat Biomed Eng 2018;2(10):741–8.
14. Tang D, Wang L, Ling T, et al. Development and validation of a real-time artificial intelligence-assisted system for detecting early gastric cancer: A multicentre retrospective diagnostic study. EBioMedicine 2020;62:103146.
15. Everson M, Herrera L, Li W, et al. Artificial intelligence for the real-time classification of intrapapillary capillary loop patterns in the endoscopic diagnosis of early oesophageal squamous cell carcinoma: A proof-of-concept study. United European Gastroenterol J 2019;7(2):297–306.
16. Guo L, Xiao X, Wu C, et al. Real-time automated diagnosis of precancerous lesions and early esophageal squamous cell carcinoma using a deep learning model (with videos). Gastrointest Endosc 2020;91(1):41–51.
17. Fukuda H, Ishihara R, Kato Y, et al. Comparison of performances of artificial intelligence versus expert endoscopists for real-time assisted diagnosis of esophageal squamous cell carcinoma (with video). Gastrointest Endosc 2020;92(4):848–55.
18. Shimamoto Y, Ishihara R, Kato Y, et al. Real-time assessment of video images for esophageal squamous cell carcinoma invasion depth using artificial intelligence. J Gastroenterol 2020;55(11):1037–45.
19. Horie Y, Yoshio T, Aoyama K, et al. Diagnostic outcomes of esophageal cancer by artificial intelligence using convolutional neural networks. Gastrointest Endosc 2019;89(1):25–32.
20. Cai SL, Li B, Tan WM, et al. Using a deep learning system in endoscopy for screening of early esophageal squamous cell carcinoma (with video). Gastrointest Endosc 2019;90(5):745–53.e2.
21. Du R, Chang D, Bhunia AK, et al (eds). Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches. Springer International Publishing: Cham, 2020.
22. Leisenring W, Alonzo T, Pepe MS. Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics 2000;56(2):345–51.
23. Ohmori M, Ishihara R, Aoyama K, et al. Endoscopic detection and differentiation of esophageal lesions using a deep neural network. Gastrointest Endosc 2020;91(2):301–9.e1.
24. Park YJ, Kim GH, Park DY, et al. Histopathologic discrepancies between endoscopic forceps biopsy and endoscopic resection specimens in superficial esophageal squamous neoplasms. J Gastroenterol Hepatol 2019;34(6):1058–65.
25. Bray F, Ferlay J, Soerjomataram I, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68(6):394–424.

© 2021 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of The American College of Gastroenterology