
Original Articles: Gastroenterology

Artificial Intelligence-based Analytics for Diagnosis of Small Bowel Enteropathies and Black Box Feature Detection

Syed, Sana; Ehsan, Lubaina; Shrivastava, Aman; Sengupta, Saurav; Khan, Marium; Kowsari, Kamran; Guleria, Shan; Sali, Rasoul; Kant, Karan; Kang, Sung-Jun; Sadiq, Kamran; Iqbal, Najeeha T.; Cheng, Lin; Moskaluk, Christopher A.; Kelly, Paul; Amadi, Beatrice C.; Asad Ali, Syed; Moore, Sean R.; Brown, Donald E.

Journal of Pediatric Gastroenterology and Nutrition 72(6):833–841, June 2021. DOI: 10.1097/MPG.0000000000003057



Striking histopathological overlap between distinct but related conditions poses a disease diagnostic challenge. There is a major clinical need to develop computational methods enabling clinicians to translate heterogeneous biomedical images into accurate and quantitative diagnostics. This need is particularly salient for small bowel enteropathies: environmental enteropathy (EE) and celiac disease (CD). We built upon our preliminary analysis by developing an artificial intelligence (AI)-based image analysis platform utilizing deep learning convolutional neural networks (CNNs) for these enteropathies.


Data for the secondary analysis were obtained from three primary studies at different sites. The image analysis platform for EE and CD was developed using CNNs, including one with a multizoom architecture. Gradient-weighted class activation mappings (Grad-CAMs) were used to visualize the models’ decision-making process for classifying each disease. A team of medical experts reviewed the stain color-normalized images (normalization was performed for bias reduction) and the Grad-CAMs to confirm structural preservation and biomedical relevance, respectively.


Four hundred and sixty-one high-resolution biopsy images from 150 children were acquired. Median age (interquartile range) was 37.5 (19.0–121.5) months, with a roughly equal sex distribution: 77 males (51.3%). ResNet50 and the shallow CNN demonstrated 98% and 96% case-detection accuracy, respectively, which increased to 98.3% with an ensemble. Grad-CAMs demonstrated the models’ ability to learn different microscopic morphological features for EE, CD, and controls.


Our AI-based image analysis platform demonstrated high classification accuracy for small bowel enteropathies and was capable of identifying biologically relevant microscopic features, emulating the human pathologist decision-making process. Grad-CAMs illuminated the otherwise “black box” of deep learning in medicine, allowing for increased physician confidence in adopting these new technologies in clinical practice.

See “Automated Enteropathy: Discovering the Potential of Machine Learning in Environmental Enteropathy” by Wallach on page 785.

What Is Known/What Is New

What Is Known

  • Striking histopathological overlap exists between distinct but related conditions, posing a diagnostic challenge; examples include small bowel enteropathies such as environmental enteropathy (EE) and celiac disease (CD).
  • There is a major clinical need to develop computational [including artificial intelligence (AI)] methods enabling clinicians to translate heterogeneous biomedical images into accurate and quantitative diagnostics.
  • A major issue plaguing the use of AI in medicine is the so-called “black box,” an analogy which describes the lack of insight that humans have into how the models arrive at their decision-making.

What Is New

  • An AI-based image analysis platform demonstrated high classification accuracy for small bowel enteropathies (EE vs CD vs histologically normal controls).
  • Gradient-weighted class activation mappings illuminated the otherwise “black box” of deep learning in medicine, allowing for increased physician confidence in adopting these new technologies in clinical practice.

A major challenge of interpreting clinical biopsy images to diagnose disease is the striking overlap in histopathology between distinct but related conditions (1–3). To address this significant clinical need, computational methods can pave the way for accurate and quantitative diagnostics (4). Computational modeling enhancements in medicine, particularly for image analysis, have shown the potential benefit of artificial intelligence (AI) for disease characterization (5). There is increasing interest in the utilization of deep learning (6,7), a subset of AI involving iterative optimization strategies based on pixel-by-pixel image evaluation (8). These AI-based deep learning models, in particular convolutional neural networks (CNNs), have been shown to be effective for medical image analysis (5). CNNs have demonstrated potential for image feature extraction in diseases relying on radiological and histopathological diagnosis (9–13). The deep residual network (ResNet) is a CNN that has repeatedly shown success in image classification, as it is optimized to gather fine-grain attributes from regions of interest within the image (14–16). It has outperformed early deep learning models such as AlexNet (17) and VGG (18) and achieved superior performance on the ImageNet (19,20) and COCO (21) image recognition benchmarks. Although these deep learning models are noteworthy in their own right, a major issue plaguing the use of AI in medicine is the so-called “black box” of deep learning, an analogy describing the lack of insight that humans have into how the models arrive at their decisions (22).

An accurate AI-based biopsy image analysis platform may enable efficient detection and differentiation of damaged tissue features of small bowel enteropathies, which are challenging to distinguish because of histopathological overlap (1,2,23,24). This would not only enable pathologists to filter and prepopulate scans, improving turnaround time, but also provide insight into previously uncharacterized tissue features unique to complex small bowel enteropathies such as environmental enteropathy (EE) and celiac disease (CD).

EE has been linked to poor sanitation and hygiene, with dire consequences such as cognitive decline, reduced oral vaccine response, and growth failure (3). EE assessment and its histopathological differentiation from similar diseases (ie, CD) is an essential task performed by pathologists (25). Even though an EE histological index for scoring the disease has been proposed, it would benefit from approaches narrowing down the specific cellular parameters important for an accurate diagnosis (25). CD is an immune-mediated condition in which gluten sensitivity leads to small bowel injury. The modified Marsh scoring system (26,27) has been used to classify the severity of CD, but it does not account for complex disease features, such as the role of goblet or enteroendocrine cells, that could potentially improve diagnostic accuracy.

We have previously published a histopathological analysis model demonstrating 93.4% classification accuracy for duodenal biopsies from children with EE and CD (28). We also added a layer of explainability by using deconvolutions, but these did not specifically highlight biopsy regions of interest and thus did not fully explain the model decision-making process (22). Despite the increasing utilization of deep learning architectures in medicine, they remain underutilized for improving diagnostic accuracy and differentiation between histologically similar small bowel enteropathies. We now build upon our prior work to address knowledge gaps in the approaches previously reported. We aimed to: (1) optimize datasets by removing unavoidable bias acquired through archival data sourcing from multiple sites, using a hematoxylin and eosin (H&E) color normalization method with structure-preserving capabilities; (2) deploy an enteropathy-focused deep learning CNN model with multiple layers modified to gather fine-grain attributes from image regions of interest; (3) mimic the interpretation methodology of human pathologists by using a combination of multizoom and reduced-parameter (shallow deep learning) approaches; and (4) improve visualization of deep learning classification decision making.


Study Design and Archival Biopsy Image Dataset Acquisition

Archival Specimen Sources

Archival data for secondary analyses were sourced from (1) primary EE studies from Pakistan (PK; Aga Khan University) and Zambia (ZA; University of Zambia School of Medicine, University Teaching Hospital), conducted 2013–2015 (29,30), and (2) the University of Virginia (UVA), United States (US). Although further biopsy collection and organization as part of the PK and ZA primary studies are underway, our current secondary analyses used biopsies collected during the initial phase, from 2013 to 2015. The timeline for our secondary data analysis was November 2017 to December 2019. UVA biopsies were archival specimens with clinical diagnoses of CD and histologically normal controls (referred to as controls) from participants who had undergone esophagogastroduodenoscopy in the past 25 years (data accessed: 1992–2017) as part of an archival controls substudy, with methods reported elsewhere (31). Biopsies available from each site for this project were processed per local institutional H&E slide staining, tissue sectioning, and tissue paraffin embedding protocols.

Biopsy Digitization

Biopsy slides for our secondary analysis had been previously digitized at high resolution (average 20,000 × 20,000 pixels) at 40× (PK, US) and 20× (ZA) magnification. Digitization was done using an Olympus VS 120 (Olympus Corporation Inc, Center Valley, Pennsylvania), an Aperio Scanscope CS scanner (Leica Biosystems Division of Leica Microsystems Inc, Buffalo Grove, IL, USA), and a Leica SCN400 brightfield scanner (Leica Microsystems CMS GmbH, Germany) at PK, ZA, and US, respectively. Work from these datasets has been published elsewhere (2,28,29).

Archival Biopsy Image Data Acquisition

For the biopsy images acquired, diagnosis of EE was as previously defined in primary studies. At each site, clinical pathologists made diagnoses based on histological and clinical findings (29,30). For the US archival images, duodenal biopsy slides were obtained from the Biorepository and Tissue Research Facility at UVA with disease diagnoses as per clinical pathology reports. Biopsies from participants reported as controls were only included if there was no disease in any other part of the gastrointestinal tract (eg, eosinophilic esophagitis, inflammatory bowel disease, Helicobacter pylori gastritis, posttransplant liver disease, and so on) or overall (eg, patients with solid organ transplant, leukemia, and so on).

Biopsy Image Analysis Model Design

Dataset Preprocessing

Biopsy Image Patch Creation

High-resolution whole slide images were patched for deep learning models to account for computational limitations (32). Images were split into patches of 1000 × 1000 and 2000 × 2000 pixels with horizontal and vertical overlaps of 750 and 1000 pixels, respectively (details noted in Fig. 1, Supplemental Digital Content). Each biopsy whole slide image generated an average of two hundred fifty 1000 × 1000 and forty 2000 × 2000 patches.
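The stated overlaps imply strides of 250 and 1000 pixels for the two patch sizes. As a rough illustration (function names are ours, not from the authors' code), the patch grid can be generated as:

```python
def patch_starts(image_size, patch_size, overlap):
    """Top-left offsets of overlapping patches along one axis."""
    stride = patch_size - overlap  # e.g., 1000 - 750 = 250 px
    starts = list(range(0, max(image_size - patch_size, 0) + 1, stride))
    # Add a final patch flush with the image edge if not already covered.
    if starts and starts[-1] + patch_size < image_size:
        starts.append(image_size - patch_size)
    return starts

def patch_coords(width, height, patch_size, overlap):
    """(x, y) top-left coordinates of every patch in a whole slide image."""
    return [(x, y)
            for y in patch_starts(height, patch_size, overlap)
            for x in patch_starts(width, patch_size, overlap)]
```

For a 2000 × 2000 region, 1000-pixel patches with 750-pixel overlap yield a 5 × 5 grid of 25 patches; actual per-slide counts also depend on slide dimensions and any tissue filtering.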

Sample Size Augmentation, Balancing, and Justification

As the number of biopsy images differed across the CD, EE, and control classes, images were upsampled to balance the datasets. Extensive data augmentations were performed for better generalization. As biopsy images exhibit both rotational and axial symmetry, a random combination of rotation (90°, 180°, or 270°), mirroring, and zoom (between 1× and 1.1×) was applied (Fig. 2, Supplemental Digital Content). Deep learning studies have focused on obtaining as much image data as possible, without standard sample size recommendations (32). Few studies have focused on deep learning models for EE, although the sample size for this model is larger than in our preliminary analysis (28).
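A minimal numpy sketch of the rotation and mirroring augmentations described above (the 1×–1.1× zoom jitter is omitted for brevity, and square patches are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(patch):
    """Random rotation/mirror combination, exploiting the rotational and
    axial symmetry of biopsy tissue; pixel values are only rearranged."""
    k = int(rng.integers(0, 4))     # 0 = none, 1-3 = 90/180/270 degrees
    out = np.rot90(patch, k=k)
    if rng.random() < 0.5:          # axial symmetry: horizontal mirror
        out = np.fliplr(out)
    return out
```

Because each operation only permutes pixels, the augmented patch keeps exactly the original intensity values, which is why such transforms are safe for histology images.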

Stain Color Normalization

The structure-preserving stain color normalization method described by Vahadane et al (33,34) was applied to eliminate bias due to color differences in biopsies from different sites (Fig. 1). Three independent pathologists (LC, ZA, RI) completed a blinded review of the color-normalized biopsy images from different sites to assess the structure-preserving ability of the method.

Stain color normalization—images in the top row highlight the differences in stain colors before normalization while the bottom row shows images post color normalization, which was applied using the specified target image.

Image Analysis Model: Deep Learning Computer-Aided Biopsy Disease Classification System

Several deep learning CNN architectures were used to address specific questions as outlined below:

Need for the Identification of Microscopic Features Visible at High Magnification (Zoom) Level: ResNet50 is a widely used deep CNN architecture with 50 layers, suited to image classification tasks requiring the identification of microscopic patterns (14–16). Final decision layers were modified to improve accuracy (34). To combat data sparsity we used transfer learning, an established method for improving training on limited datasets, by pretraining the model on the ImageNet dataset (19,35). Additional details of our modifications to the ResNet50 architecture are described in Appendix 1, Supplemental Digital Content.

Emulate Human Pathologist Decision-Making Process for Visualizing Biopsies at Multiple Levels of Magnification (Zoom): We developed a framework incorporating multiple zoom levels (Fig. 2). Biopsy images were segmented into 2000 × 2000 pixel patches and then further into 1000 × 1000 pixel patches with an overlap of 750 pixels (horizontal and vertical axes). Two ResNet50 models were trained independently; corresponding 1000 × 1000 and 2000 × 2000 patches were paired, passed through the respective trained models, and features from the last fully connected layer of each model were extracted and concatenated into an overall representation of specific regions of the biopsy images. This concatenated vector was then passed through a set of trainable linear layers for the final classification (additional details in Appendix 1, Supplemental Digital Content).
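The concatenation step can be sketched as follows (a simplified numpy illustration of the description above; the real trained parameters live inside the two ResNet50 models and their linear head):

```python
import numpy as np

def multizoom_features(feat_1000, feat_2000):
    """Concatenate last fully connected layer features from the two
    zoom levels into one representation of a biopsy region."""
    return np.concatenate([feat_1000, feat_2000])

def linear_head(features, weights, bias):
    """One trainable linear layer plus softmax over the three classes
    (EE, CD, control); weights and bias would be learned in training."""
    logits = weights @ features + bias
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()
```

With untrained zero weights the head outputs a uniform distribution over the three classes; training shapes the weights so the concatenated two-zoom vector becomes discriminative.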

Combined figure explaining the methods for the development of an image analysis model for diagnosis of small bowel enteropathies and black box feature detection: (A) digitized duodenal biopsy slides were obtained for secondary analysis; (B and C) dataset was preprocessed by the creation of biopsy image patches, sample size augmentation, and stain color normalization [shown in Fig. 1]; (D) development of the image analysis model with (D1) ResNet50, (D2) ResNet50 multizoom architecture, and (D3) custom shallow convolutional neural network (CNN); and, (E) visualization of model decision-making (explained in detail in Fig. 4) frameworks.

Reduce Computational Model Complexity Used for Disease Classification: A custom shallow CNN architecture was designed to reduce the number of parameters the model optimized for disease classification. This shallow CNN consisted of three convolutional layers (Fig. 2; detailed methods in Appendix 2, Supplemental Digital Content).

Explore Methods of Improving Disease Classification by Using a Combined Model: Ensemble models have been shown to generally improve the accuracy and robustness of classification (36). We combined the ResNet50 and shallow CNN models to improve classification accuracy (detailed methods in Appendix 3, Supplemental Digital Content).
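One common ensembling scheme is to average the per-class probabilities of the two models; the authors' exact combination rule is in their Appendix 3, so treat this sketch as an assumption:

```python
import numpy as np

def ensemble_predict(probs_resnet, probs_shallow):
    """Average the class probabilities of the two models and return the
    winning class index plus the averaged distribution. The class
    ordering (e.g., 0 = EE, 1 = CD, 2 = control) is illustrative."""
    mean = (np.asarray(probs_resnet) + np.asarray(probs_shallow)) / 2.0
    return int(mean.argmax()), mean
```

Averaging lets a confident model override an uncertain one, which is one reason ensembles of diverse architectures often beat either member alone.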

Visualization of Model Decision-Making

Grad-CAMs (37) were used to visualize the regions of interest utilized by the model for decision making (detailed methods outlined in Appendix 4, Supplemental Digital Content). These were reviewed by a team of medical professionals [a pathologist specialized in gastroenterology (CM) and a pediatric gastroenterologist (SS)], enabling corroboration of the model results with the incumbent classification. Because human intuition is required to assess whether the highlighted features can be biologically explained, we used the Environmental Enteropathic Dysfunction Biopsy Initiative histological index for EE (25) and the modified Marsh score classification for CD (26,27) to inform that assessment (Fig. 3 and Table 1, Supplemental Digital Content).
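The core Grad-CAM computation (37) weights each feature map of the last convolutional layer by its globally average-pooled gradient and keeps only positive evidence; a minimal numpy sketch, given precomputed activations and gradients:

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (channels, H, W) arrays for one class
    score from the last conv layer. Returns an (H, W) relevance map."""
    # alpha_k: global average pool of the gradients per channel
    alphas = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps, ReLU to keep positive influence only.
    cam = np.maximum((alphas[:, None, None] * activations).sum(axis=0), 0)
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1] for heatmap overlay
    return cam
```

In practice the normalized map is upsampled to the patch size and overlaid on the H&E image, producing the heatmaps reviewed by the pathologists.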

Base Case for Comparison Using Existing Computer Vision Approach

An alternative method using cellular feature extraction, CellProfiler (38), was also explored for model explainability (Appendix 5 and Fig. 4, Supplemental Digital Content). CellProfiler isolated nucleated cells from the biopsy images and classified EE vs CD vs controls based on cellular feature differences, achieving 65% accuracy.

Ethical Considerations

The secondary analyses as part of this study were approved by the University of Virginia institutional review board. Ethical approval for prior original primary studies was obtained from (1) the ethical review committee of Aga Khan University, Karachi, PK (informed consent obtained from parents and/or guardians for EE cases), (2) the biomedical research ethics committee of the University of Zambia School of Medicine, University Teaching Hospital, Lusaka, ZA (informed consent obtained from caregivers for EE cases), and (3) University of Virginia Institutional Review Board (waiver of consent granted).

This manuscript has been prepared in accordance with STARD guidelines (39).


Biopsy Image Dataset

We obtained 461 digitized biopsy images (from 171 H&E-stained duodenal biopsy glass slides) from 150 participants (US: 124; PK: 10; ZA: 16). Our EE data consisted of 29 and 19 biopsy images from 10 PK and 16 ZA participants, respectively. Data from 124 UVA participants (63 CD and 61 controls) were available for these analyses.

Population Clinical Characteristics

Of the 150 participants, 77 (51.3%) were male. Median (interquartile range: IQR) age and LAZ/HAZ (length/height-for-age z score) of the PK EE participants were 22.2 (IQR: 20.8–23.4) months and −2.8 (IQR: −3.6 to −2.3), respectively (Table 1). Participants with EE were overall younger than the US CD participants and controls (median age of US controls: 25.0; IQR: 16.5–41.0 months).

TABLE 1 - Population characteristics of the children from which biopsy images were obtained

                            Total population     PK EE                Zambia EE            US Celiac             US Normal
                            (n = 150)            (n = 10)             (n = 16)             (n = 63)              (n = 61)
Biopsy images               461                  29                   19                   239                   174
Age, median (IQR), months   37.5 (19.0–121.5)    22.2 (20.8–23.4)     16.5 (9.5–21.0)      130.0 (85.0–176.0)    25.0 (16.5–41.0)
Sex; male, n (%)            77 (51.3)            5 (50.0)             10 (62.5)            29 (46.0)             33 (54.0)
LAZ/HAZ, median (IQR)       −0.6 (−1.9 to 0.4)   −2.8 (−3.6 to −2.3)  −3.1 (−4.1 to −2.2)  −0.3 (−0.8 to 0.7)    −0.2 (−1.3 to 0.5)
EE = environmental enteropathy; IQR = interquartile range (written as first quartile to third quartile); LAZ/HAZ = length/height-for-age z score; PK = Pakistan, Aga Khan University; Zambia = University of Zambia Medical Center; US = United States, University of Virginia.
For some patients, two to three biopsy images were available.
Height of three patients was not available for children with celiac disease (z scores calculated for 60).

Stain Color Normalization Assessment

Our panel of blinded independent pathologists confirmed that medically relevant cell types (lymphocytes, polymorphonuclear neutrophils, epithelial cells, eosinophils, goblet cells, Paneth cells, and neuroendocrine cells) were preserved after color normalization. The panel also commented that features used for EE assessment via the published EE histologic index (25) remained visible. Although the granularity of eosinophilic cytoplasm and the sharpness of Paneth cell globules were less apparent in color-normalized images, these features were consistent throughout the biopsy images and were therefore hypothesized not to result in classification bias.

Disease Classification Accuracy and Performance

Patches were used for training the models, and patch predictions were then aggregated to classify their parent biopsy images. The modified ResNet50 and shallow CNN exhibited overall biopsy-level accuracies of 98% (sensitivity: 93%, specificity: 94%) and 96% (sensitivity: 80%, specificity: 88%), respectively. Accuracy increased to 98.3% (sensitivity: 95%, specificity: 96%) with the ensemble. Accuracy of the multizoom ResNet50 architecture was 98% (sensitivity: 96%, specificity: 97%). Confusion matrices of the models’ classification for each disease are shown in Figure 4. Receiver operating characteristic (ROC) curves with the area under the curve (AUC) were used to assess the certainty of the model classifications. Biopsy-level AUCs for the modified ResNet50, ResNet50 with multizoom architecture, and shallow CNN were 0.99, 0.99, and 0.96, respectively (Fig. 3). Error analysis and performance statistics for the models are included in Table 2, Supplemental Digital Content.
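The patch-to-biopsy aggregation can be illustrated with a simple majority vote over patch predictions (one plausible aggregation rule; the paper does not specify the exact scheme here):

```python
from collections import Counter

def biopsy_label(patch_labels):
    """Collapse patch-level class predictions into a single biopsy-level
    call by majority vote over all patches of one parent image."""
    return Counter(patch_labels).most_common(1)[0][0]
```

Voting over many patches also explains why biopsy-level accuracy can exceed patch-level accuracy: occasional misclassified patches are outvoted by the majority.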

Accuracies of the deep learning models. A1, B1, and C1 show patch-level whereas A2, B2, and C2 show biopsy level accuracies of ResNet50, ResNet50 multizoom architecture, and shallow CNN, respectively.
Confusion matrices for patch-level disease classification models (top left: ResNet50; top right: ResNet50 multizoom architecture; bottom left: shallow convolutional neural network (CNN); bottom right: ensemble). The numbers within the matrices represent normalized data in the form of percentages, and darker colors indicate higher percentages.

Grad-CAM Interpretations

Grad-CAMs were obtained for all the models (Fig. 5). The modified ResNet50 and ResNet50 multizoom mainly focused on similar areas for classification of EE versus CD versus controls, whereas the shallow CNN focused on distinct, yet medically relevant, cellular features (Table 3, Supplemental Digital Content, for detailed Grad-CAM findings; Appendix 3, Supplemental Digital Content, for detailed methods, including the use of guided Grad-CAMs).

Gradient-weighted class activation mappings for patches trained via modified ResNet50, ResNet50 multizoom, and shallow CNN are shown. Areas being highlighted are: (A) superficial epithelium with high number of intraepithelial lymphocytes (IELs) and mononuclear cells in the lamina propria (LP); (B) areas of white slit-like spaces representing artefactual separation of tissue within the LP and ignoring areas of crypt cross-sections; (C) crypt cross-sections; (D) areas with cells consistent with mononuclear cellular infiltrate; (E) telescoping crypt; (F) lymphocytes stacked in a row simulating a linear epithelial architecture; (G) superficial epithelium with tall columnar cells and a brush border; (H) base of crypts resembling superficial epithelium; (I) superficial epithelium with high number of IELs; (J) several crypt cross-sections rather than one to two crypts when compared with ResNet50 only; (K) superficial epithelium with abundant cytoplasm; (L) surface epithelium with IELs and goblet cells; (M) inner lumen of crypt cross-sections; (N) mononuclear cells within the LP; (O) surface epithelium with epithelial cells containing abundant cytoplasm and goblet cells. CD = celiac disease; CNN = convolutional neural network; EE = environmental enteropathy.


Diagnosis of diseases with overlapping histology, such as small bowel enteropathies, lacks computational approaches for histopathological assessment (28). Although scoring systems such as the histological index for EE and the modified Marsh score for CD have the potential to diagnose and classify the severity of these diseases, there is a need to more accurately define the cellular parameters playing an important diagnostic role (25). To the best of our knowledge, no comparable small bowel enteropathy image analysis platform that includes EE exists. Our modified ResNet50, ResNet50 with multizoom, shallow CNN, and an ensemble model (modified ResNet50 and shallow CNN) achieved overall classification accuracies above 90%. We were able to uncover the features utilized by the models for classification, which were reviewed by pathologists. These AI-based analytics and black box feature detection findings pave the way for standardizing and improving diagnoses for small bowel enteropathies.

Before training our deep learning models, dataset preparation included stain color normalization because, when studies obtain data from multiple sites for further analysis, an important consideration is to optimize datasets to remove site-specific bias (33,40). The normalization method also successfully preserved small bowel architecture, as assessed by our team of pathologists.

Our modified ResNet50 deep learning model, trained to classify biopsy images of EE versus CD versus healthy controls, was able to identify microscopic features of each disease. This approach has been shown to be successful in other medical fields, with ResNet providing superior performance for biopsy images (14–16). It expands upon our preliminary model by providing more powerful image analytics through ResNet50. Further, the patch-level input, as used previously (15,32), represents approximately 2% of the original biopsy image. To ensure patch-level inputs take into account feature visualization at multiple magnifications, emulating the process of human pathologists (41), the multizoom ResNet50 was developed. The multizoom model's Grad-CAMs focused on biologically similar areas as the model without multizoom, suggesting that the multizoom architecture does not induce a bias. Further, the architecture of the shallow CNN was less complex, a useful quality in a suboptimal computational environment. Although the shallow CNN Grad-CAMs focused on far too many features for EE to be clinically useful, they focused on distinct and biologically informative features for CD. These Grad-CAM results led us to hypothesize that the shallow CNN may have better utility when combined with a more complex model such as ResNet50, which we pursued via an ensemble with an overall accuracy of 98.3%, higher than either model alone (ResNet50-only accuracy: 98%; shallow-CNN-only accuracy: 96%).

While these results are noteworthy in their own right, we further extracted Grad-CAMs to help illuminate the “black box” (22) of these deep learning models, improving insight into how the models arrive at their decisions and providing the much needed “explainability” for clinicians and researchers. Grad-CAMs supported the models’ use of biologically relevant features for defining each class: EE, CD, or controls. Grad-CAMs for EE focused on, among other features, intraepithelial lymphocytes and goblet cells. These findings are supported by the histological index for EE evaluation, which entails recognizing intraepithelial lymphocytosis and decreased goblet cell density as EE-specific features (25). We were, however, unable to assess goblet cell density because the Grad-CAMs utilized biopsy patches rather than a whole slide image (WSI) for entire-biopsy estimation. Grad-CAMs for CD focused on different areas than those for EE, which highlights the potential of these models for distinguishing between overlapping diseases. Notably, crypt cross-sections were highlighted by the Grad-CAMs, and crypt hyperplasia is a known CD histopathological feature (42). Further, recognition of superficial epithelium with tall columnar cells by Grad-CAMs for controls is a depiction of normal duodenal biopsies with such epithelial architecture (43,44). These features can be affected by inflammatory injury, leading to a loss of structure whereby the epithelial cells may appear more flattened (42). The image features visualized via Grad-CAMs, if confirmed via molecular methods such as immunohistochemistry or RNA in situ hybridization, may pave the way for optimizing the EE and modified Marsh (for CD) scoring systems by drawing attention to the parameters pertinent to disease diagnoses.

Major strengths of our secondary analysis study include the addition of data from 48 new patients (beyond the data reported in our preliminary results), which increased the data volume for patch-level analysis. We also assessed the validity of our models with a wide variety of performance statistics (accuracy, sensitivity/specificity, positive predictive value/negative predictive value, precision, recall, and F1-score), many of which are often missing from work on deep learning in medicine. In our preliminary work, we utilized γ correction for stain color normalization without assessing the method's structure-preserving ability. Our current stain color normalization method is more robust for eliminating bias and preserved tissue structure, an important quality for any successful normalization process. We computationally balanced the datasets and further augmented them to combat our prior issues with imbalanced datasets. We have also built upon the single model that we reported earlier to create a complex AI-based image analysis platform. Our multizoom architecture, emulating the human pathologist biopsy assessment process, brings us one step closer to automated detection of biopsy disease features. Lastly, our prior deconvolution-based approach to explaining the decision-making process of the preliminary model left much to be desired. Our current approach using Grad-CAMs to explain the models’ decision-making is an improvement, one more easily understood by the average medical professional, which may pave the way for both increasing confidence in AI-based decision making and improving biopsy-based small bowel enteropathy diagnosis.

Despite these strengths, we experienced several limitations. Because of ethical considerations and the lack of clinical indications to perform endoscopy on adequately growing local controls, the parent study was unable to include biopsies from this population (30,31). We therefore supplemented the EE biopsies with US archival controls identified as normal via retrospective chart review. Since the data were obtained for secondary analysis, we were limited in our ability to improve data quality, given differences in the scanners used for digitization and in staining methods, and could not control for age. For this reason, we were unable to further modify the EE dataset to account for the high classification accuracies. We did, however, utilize an approach to potentially eliminate stain color differences. We were also limited in our ability to benchmark our findings, owing to the limited literature on deep learning models for small bowel architecture, especially on ResNet versus custom deep learning approaches. To improve on our current efforts, more robust color normalization methods are underway. Finally, our techniques remain subject to the limitations inherent to tissue-based diagnostic and management approaches in clinical medicine, for example, patchiness of findings and site-specific workflows for tissue acquisition and processing.


AI-based analytics provide an exciting opportunity for improving diagnostics of small bowel enteropathies with histological overlap. Our work suggests that the models incorporated in our image analysis platform are capable of identifying biologically relevant microscopic features and emulating the human pathologist decision-making process. This work was improved by structure-preserving stain color normalization of the image inputs along with visualizations of the model outputs, which are imperative for determining the tissue features used for decision making. Further work will advance the clinical use of deep learning models for enteropathies and other histopathology-based diseases, improving the effectiveness and efficiency of clinical care in gastroenterology and beyond.


Additional Contributions: We acknowledge the following individuals from Aga Khan University, Karachi, Pakistan, for contributing to this manuscript: Drs Romana Idrees, MBBS, FCPS, and Zubair Ahmad, MBBS, FCPS, FRCPath, for assessment of stain color normalized biopsy images, Saad Mallick, medical student, for the organization of image activation mappings, Dr Saman Siddiqui, MD, for coordinating with the pathologists at Aga Khan University, and field workers (community health workers, led by Sadaf Jakhro, MSc, and coordinated by Dr Tauseef Akhund, MBBS), data management unit (Najeeb Rahman, BS), and laboratory staff (Aneeta Hotwani, BS) for contributing to the data collection in this work. Patcharin Pramoonjago, PhD, Biorepository and Tissue Research Facility, University of Virginia, Charlottesville, assisted in data collection for this work. All individuals were compensated for their time.


1. Sullivan PB, Marsh MN, Mirakian R, et al. Chronic diarrhea and malnutrition—histology of the small intestinal lesion. J Pediatr Gastroenterol Nutr 1991; 12:195–203.
2. Syed S, Yeruva S, Herrmann J, et al. Environmental enteropathy in undernourished Pakistani children: clinical and histomorphometric analyses. Am J Trop Med Hyg 2018; 98:1577–1584.
3. Syed S, Ali A, Duggan C. Environmental enteric dysfunction in children: a review. J Pediatr Gastroenterol Nutr 2016; 63:6.
4. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019; 25:44. DOI: 10.1038/s41591-018-0300-7.
5. Sharma G, Carter A. Artificial intelligence and the pathologist: future frenemies? Arch Pathol Lab Med 2017; 141:622–623.
6. Patel V, Khan MN, Shrivastava A, et al. Artificial intelligence applied to gastrointestinal diagnostics: a review. J Pediatr Gastroenterol Nutr 2020; 70:4–11.
7. Yang YJ, Bang CS. Application of artificial intelligence in gastroenterology. World J Gastroenterol 2019; 25:1666.
8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015; 521:436. DOI: 10.1038/nature14539.
9. Medeiros FA, Jammal AA, Thompson AC. From machine to machine: an OCT-trained deep learning algorithm for objective quantification of glaucomatous damage in fundus photographs. Ophthalmology 2019; 126:513–521.
10. Li Z, He Y, Keel S, et al. Efficacy of a deep learning system for detecting glaucomatous optic neuropathy based on color fundus photographs. Ophthalmology 2018; 125:1199–1206.
11. Zhang Z, Xie Y, Xing F, et al. Mdnet: A semantically and visually interpretable medical image diagnosis network. In Proceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 6428–6436).
12. Coudray N, Ocampo PS, Sakellaropoulos T, et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat Med 2018; 24:1559. DOI: 10.1038/s41591-018-0177-5.
13. Beck AH, Sangoi AR, Leung S, et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci Transl Med 2011; 3:108ra113.
14. Bejnordi BE, Veta M, Van Diest PJ, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 2017; 318:2199–2210.
15. Wei JW, Wei JW, Jackson CR, et al. Automated detection of celiac disease on duodenal biopsy slides: a deep learning approach. J Pathol Inform 2019; 10:7. DOI: 10.4103/jpi.jpi_87_18.
16. Jiang Y, Chen L, Zhang H, et al. Breast cancer histopathological image classification using convolutional neural networks with small SE-ResNet module. PLoS One 2019; 14:e0214587. DOI: 10.1371/journal.pone.0214587.
17. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inform Process Syst; 2012.
18. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint 2014; arXiv:1409.1556.
19. Deng J, Dong W, Socher R, et al., eds. Imagenet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009: IEEE.
20. Russakovsky O, Deng J, Su H, et al. Imagenet large scale visual recognition challenge. Int J Comput Vis 2015; 115:211–252.
21. Lin TY, Maire M, Belongie S, et al. Microsoft COCO: common objects in context. In: European Conference on Computer Vision; 2014 Sep 6 (pp. 740–755). Springer, Cham.
22. Castelvecchi D. Can we open the black box of AI? Nat News 2016; 538:20. DOI: 10.1038/538020a.
23. Ramakrishna BS, Venkataraman S, Mukhopadhya A. Tropical malabsorption. Postgrad Med J 2006; 82:779–787.
24. Pai RK. A practical approach to small bowel biopsy interpretation: celiac disease and its mimics. Seminars in Diagnostic Pathology. Elsevier; 2014.
25. Liu T-C, VanBuskirk K, Ali SA, et al. A novel histological index for evaluation of environmental enteric dysfunction identifies geographic-specific features of enteropathy among children with suboptimal growth. PLoS Negl Trop Dis 2020; 14:e0007975.
26. Modified Marsh Classification of histologic findings in celiac disease (Oberhuber) Stanford Medicine. Available at:
27. Oberhuber G. Histopathology of celiac disease. Biomed Pharmacother 2000; 54:368–372.
28. Syed S, Al-Boni M, Khan MN, et al. Assessment of machine learning detection of environmental enteropathy and celiac disease in children. JAMA Network Open 2019; 2: e195822.
29. Iqbal NT, Sadiq K, Syed S, et al. Promising biomarkers of environmental enteric dysfunction: a prospective cohort study in Pakistani children. Sci Rep 2018; 8:2966. DOI: 10.1038/s41598-018-21319-8.
30. Amadi B, Besa E, Zyambo K, et al. Impaired barrier function and autoantibody generation in malnutrition enteropathy in Zambia. EBioMedicine 2017; 22:191–199.
31. Iqbal NT, Syed S, Sadiq K, et al. Study of Environmental Enteropathy and Malnutrition (SEEM) in Pakistan: protocols for biopsy based biomarker discovery and validation. BMC Pediatr 2019; 19:247. DOI: 10.1186/s12887-019-1564-x.
32. Iizuka O, Kanavati F, Kato K, et al. Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Sci Rep 2020; 10:1–11.
33. Vahadane A, Peng T, Sethi A, et al. Structure-preserving color normalization and sparse stain separation for histological images. IEEE Trans Med Imaging 2016; 35:1962–1971.
34. Shrivastava A, Kant K, Sengupta S, et al., eds. Deep learning for visual recognition of environmental enteropathy and celiac disease. 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI); 2019. IEEE.
35. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2009; 22:1345–1359.
36. Liu B, Cui Q, Jiang T, et al. A combinational feature selection and ensemble neural network method for classification of gene expression data. BMC Bioinformatics 2004; 5:136. DOI: 10.1186/1471-2105-5-136.
37. Selvaraju RR, Cogswell M, Das A, et al., eds. Grad-cam: visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision; 2017.
38. Yu K, Zhang C, Berry G, et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun 2016; 7: [Epub ahead of print]. PMID: 27527408.
39. STARD 2015: An updated list of essential items for reporting diagnostic accuracy studies: enhancing the QUAlity and transparency of health research. Available at:
40. Khan AM, Rajpoot N, Treanor D, et al. A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Trans Biomed Eng 2014; 61:1729–1738.
41. Brunyé TT, Mercan E, Weaver DL, et al. Accuracy is in the eyes of the pathologist: the visual interpretive process and diagnostic accuracy with digital whole slide images. J Biomed Inform 2017; 66:171–179.
42. Fasano A, Catassi C. Current approaches to diagnosis and treatment of celiac disease: an evolving spectrum. Gastroenterology 2001; 120:636–651.
43. Serra S, Jani PA. An approach to duodenal biopsies. J Clin Pathol 2006; 59:1133–1150.
44. Normal Duodenum Human Protein Atlas. Available at:

biopsy image analysis; convolutional neural networks; environmental enteropathy; global health; intestinal structure; AI; artificial intelligence; AUC; area under the curve; CD; celiac disease; CNN; convolutional neural network; EE; environmental enteropathy; EEDBI; environmental enteropathic dysfunction biopsy initiative; Grad-CAMs; gradient-weighted class activation mappings; H&E; hematoxylin and eosin; IQR; interquartile range; LAZ/HAZ; length/height-for-age z score; PK; Pakistan; ResNet; deep residual network; ROC; receiver operating characteristic curves; US; United States; UVA; University of Virginia; ZA; Zambia

Supplemental Digital Content

Copyright © 2021 by European Society for Pediatric Gastroenterology, Hepatology, and Nutrition and North American Society for Pediatric Gastroenterology, Hepatology, and Nutrition