

Review Articles

Review of Visualization Approaches in Deep Learning Models of Glaucoma

Gu, Byoungyoung MD*,†; Sidhu, Sophia MS*,†; Weinreb, Robert N. MD*; Christopher, Mark PhD*; Zangwill, Linda M. PhD*; Baxter, Sally L. MD, MSc*,†

Asia-Pacific Journal of Ophthalmology 12(4):p 392-401, July/August 2023. | DOI: 10.1097/APO.0000000000000619


Glaucoma is a major cause of irreversible blindness worldwide. As glaucoma often presents without symptoms, early detection and intervention are important in delaying progression. Deep learning (DL) has emerged as a rapidly advancing tool to help achieve these objectives. In this narrative review, data types and visualization approaches for presenting model predictions, including models based on tabular data, functional data, and/or structural data, are summarized, and the importance of data source diversity for improving the utility and generalizability of DL models is explored. Examples of innovative approaches to understanding predictions of artificial intelligence (AI) models and alignment with clinicians are provided. In addition, methods to enhance the interpretability of clinical features from tabular data used to train AI models are investigated. Examples of published DL models that include interfaces to facilitate end-user engagement and minimize cognitive and time burdens are highlighted. The stages of integrating AI models into existing clinical workflows are reviewed, and challenges are discussed. Reviewing these approaches may help inform the generation of user-friendly interfaces that are successfully integrated into clinical information systems. This review details key principles regarding visualization approaches in DL models of glaucoma. The articles reviewed here focused on usability, explainability, and promotion of clinician trust to encourage wider adoption for clinical use. These studies demonstrate important progress in addressing visualization and explainability issues required for successful real-world implementation of DL models in glaucoma.


Early intervention in glaucoma, a major cause of irreversible blindness, may prevent vision loss.1 A multitude of novel ideas for screening, diagnosing, and detecting changes over time in glaucoma have been proposed to facilitate its early detection and ameliorate its progression. Artificial intelligence (AI) has emerged as a rapidly advancing tool to achieve these objectives.2

Ophthalmology, in particular, has become a prime area for applications of AI and deep learning (DL),3 with many studies using DL models to address glaucoma.2,4–13 Although these models have demonstrated encouraging results, several challenges remain.7 One key challenge is that models often appear as “black boxes” to clinicians or other users. This means that while a model may provide accurate predictions, it may not necessarily provide an explanation or an intuitive understanding to users about how those predictions were made.14,15 This can be an obstacle that causes a cognitive burden or hesitation in use among clinicians.16

Recently, visualization approaches have been developed to better understand DL models and enhance their interpretability.17 These visualization techniques may help users process data more effectively and efficiently.18 Furthermore, model interpretability allows users to understand the system and, therefore, engenders trust in the model output.19 User-facing dashboards are another strategy developed to help clinicians quickly review large amounts of data. These are data-driven clinical decision support (CDS) tools that can execute queries across multiple databases and allow visual review of many clinically relevant indicators in a single report.20 The utility of dashboards comes from their ability to provide a compact overview of important information.20

Although many AI and DL models for glaucoma have been developed, advancing these models to real-world clinical translation requires investigation of visualization approaches and user interface (UI) considerations for model outputs. In this narrative review, we summarize current publications regarding visualization approaches and UI considerations in relation to DL models of glaucoma, with a focus on approaches aimed toward enhancing interpretability, trust, and usability.


Eligibility Criteria and Information Sources

Peer-reviewed journal articles published in English between August 1, 2013 and August 1, 2022, with available full text were reviewed. The rationale for this timeframe was to review literature from the most recent 10 years preceding the time the review was initiated, particularly in light of the increasing integration of DL in the medical field since the early 2010s.21

An initial search of PubMed, Cochrane, and Google Scholar was conducted in July 2022, and the search was repeated in August 2022 to evaluate any additional articles before collating the observations for this review. Details of the search strategy and study selection are provided in Supplementary Digital Content 1.


Using the above search strategy, we identified 18 primary research articles that described DL models for glaucoma and included an element of visualization of model results and/or a discussion of approaches to enhance the interpretability of the models. These visualization approaches may help inform the design of future UIs for real-world glaucoma management. Table 1 provides a summary of findings from the articles.

TABLE 1 - Summary of Studies Using Visualization Approaches in DL Models of Glaucoma
| References | Network | Country of Data Sets / Single or Multicenter | Input Data Type | Input Image | Output | Built Dashboard or Interface | Visualization Method | Purpose |
|---|---|---|---|---|---|---|---|---|
| Li et al22 | AG-CNN | China / Multicenter | FP | ONH and surrounding RNFL | Glaucoma vs nonglaucoma (binary) | No | Heatmap | Detection |
| Christopher et al23 | CNN | US / Multicenter | FP | ONH-focused | Softmax probability of GON | No | Heatmap | Detection |
| Kucur et al24 | CNN | Hungary / Single-center | VF | Voronoi image converted from VF | Values in the maps range from 0 to 1* | No | Voronoi image | Detection |
| Liu et al25 | GD-CNN (based on ResNet) | China, US / Multicenter | FP | ONH-focused | Glaucoma vs nonglaucoma (binary) | No | Heatmap | Detection |
| Ran et al26 | ResNet | China, Hong Kong / Multicenter | OCT | OCT (3D volume, 2D enface) | Glaucoma vs nonglaucoma (binary) | No | Heatmap | Detection |
| Ajitha S et al27 | CNN | India / Multicenter | FP | Whole FP image | Glaucoma vs nonglaucoma (binary) | No | Feature maps | Detection |
| Oh et al28 | CNN, Shapley value | Korea / Single-center | VF, RNFL, OCT, IOP, FP | Whole FP image | Glaucoma vs nonglaucoma (binary), estimated global VFI and its degree of certainty | Yes | Gauge chart, radar chart, and SHAP chart | Detection |
| Dixit et al29 | Convolutional LSTM | US / Single-center | VF | VF represented as an 8×9 grid | Progression or not | No | Heatmap | Progression |
| Christopher et al15 | ResNet | US, Tokyo / Multicenter | FP | ONH-focused | Glaucoma vs nonglaucoma (binary) | No | Heatmap | Detection |
| Yu et al30 | Adam RMSprop with Nesterov momentum | US / Single-center | OCT | OCT (MAC, ONH) | Estimated "global VFI"† | No | Heatmap | Progression |
| Li et al31 | EfficientNet-B0 | China / Multicenter | FP | Whole FP image | Glaucoma vs nonglaucoma (binary) + progression | No | Saliency maps | Detection and progression |
| Huang et al32 | ResNet | US, China / Multicenter | OCT, VF, FP | Voronoi image converted from VF | Grading (5 grades) | Yes | Developed interface (FGGDL) | Detection and grading (progression) |
| Maetschke et al33 | CNN | US / Single-center | OCT | OCT (enface, side) | Glaucoma vs nonglaucoma (binary) | No | Class activation maps | Detection |
| van den Brandt et al17 | ResNet50 | US / Multicenter | OCT | OCT (B-scan, RNFL, and GCIPL thickness maps) | Predicted mean deviation | Yes | Heatmap | Detection |
| Baxter et al34 | Random forests | US / Multicenter | Clinical EHR data | N/A | Need for glaucoma surgery as a proxy for progression | No | Gini variables of importance | Progression |
| Christopher et al35 | ResNet50 | US / Multicenter | OCT, VF | OCT (RNFL thickness map, RNFL enface image, CSLO image) | GVFD vs non-GVFD | No | Heatmap | Detection |
| Kamal et al36 | SNN, ANFIS | Kaggle data set (Google) / Public | FP, clinical records | Whole FP image | Glaucoma vs nonglaucoma (binary) by pixel density analysis | No | SP-LIME | Detection |
| Chayan et al37 | CNN, fully connected neural network | US / Single-center | FP | Whole FP image | Glaucoma vs nonglaucoma (binary) | No | LIME explanation | Detection |

AG-CNN indicates attention-based convolutional neural network for glaucoma detection; ANFIS, adaptive neuro-fuzzy inference system; CNN, convolutional neural network; CSLO, confocal scanning laser ophthalmoscopy; DL, deep learning; EHR, electronic health record; FGGDL, fine-grained grading deep learning system; FP, fundus photography; GCIPL, ganglion cell-inner plexiform layer; GD-CNN, glaucoma diagnosis with CNN; GON, glaucomatous optic neuropathy; GVFD, glaucomatous visual field damage; IOP, intraocular pressure; LSTM, long short-term memory; MAC, macula; N/A, not applicable; OCT, optical coherence tomography; ONH, optic nerve head; RNFL, retinal nerve fiber layer; SHAP, SHapley Additive exPlanations; SNN, spike neural network; SP-LIME, submodular pick local interpretable model-agnostic explanation; VF, visual field test; VFI, visual field indices.
*Values in the maps range from 0 to 1, where 0 indicates the pixel or region has no impact on the CNN decision whereas 1 indicates a region with maximal importance.
†Technically, they investigated global visual field indices, visual field index, and mean deviation.

Data Types and Visualization Approaches for Presenting Model Predictions

The included articles represent a wide range of data types used for training DL models of glaucoma. Evaluation of imaging data plays a prominent role in the clinical management of glaucoma, so unsurprisingly, many of the DL models developed for glaucoma entail some component of automated image analysis. Images are amenable to saliency methods for visualization. Saliency methods are tools that highlight the features in an input that are relevant for generating a prediction.38–40 For example, a “heatmap” (saliency map or activation map) can depict regions of an image that strongly influence a model’s decision-making process.7,17 Nearly half (8/18, 44%) of the articles reviewed in this study introduced a visualization approach that could be superimposed on a single fundus photograph.15,22,23,25,27,31 These results are not surprising given the widespread use of fundus photography in ophthalmology due to its availability, affordability, and ease of use.41,42 Approaches included photography encompassing the optic nerve head and surrounding fundus (Fig. 1A)15,22,23 and whole fundus photography to detect retinal nerve fiber layer (RNFL) defects (Fig. 1B).25,27,31 Some studies used optical coherence tomography (OCT), which enables 3-dimensional visualization, to build heatmaps illustrating the basis of DL model predictions (Fig. 1C).17,26,30,33,35
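One simple, model-agnostic way to produce such a heatmap is occlusion sensitivity: mask one region of the input at a time and record how much the prediction drops. The sketch below is a minimal NumPy illustration of this idea; `toy_score` is a hypothetical stand-in for a trained DL model, not any of the published classifiers.

```python
import numpy as np

def occlusion_heatmap(image, score_fn, patch=8, stride=8):
    """Occlusion sensitivity: slide an occluding patch over the image and
    record how much the model's score drops at each location."""
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = image.mean()  # occlude region
            heat[i, j] = base - score_fn(occluded)  # large drop = important region
    return heat

# Hypothetical "model": the score is the mean intensity of the central region,
# so occlusions touching the center should dominate the heatmap.
def toy_score(img):
    return img[24:40, 24:40].mean()

img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0  # bright "lesion" in the center
heat = occlusion_heatmap(img, toy_score)
peak = np.unravel_index(heat.argmax(), heat.shape)
```

In a real pipeline, `score_fn` would be the DL model's output probability for the glaucoma class, and the heatmap would be upsampled and overlaid on the fundus photograph.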

Heatmaps for several examples of DL models. These heatmaps highlight the areas that are most relevant to the DL analysis. A, Bright pink regions indicate a large impact on model predictions. The heatmaps are shown overlaid on a representative healthy (left) and GON (right) eye (adapted from Christopher et al23). B, In the same way, a heatmap from a DL model can be superimposed on the input image to highlight the areas important to the model’s diagnosis. In this example, note that the heatmap strongly highlights the RNFL defect (adapted from Liu et al25). C, Representative examples of heatmaps in enface view, highlighting the retinal regions of the macular and ONH scans that contribute strongly to estimates of VFI. Upper photos show a healthy eye (VFI = 99), whereas lower photos show an eye with primary open angle glaucoma (VFI = 33) (adapted from Yu et al30). DL indicates deep learning; GON, glaucomatous optic neuropathy; ONH, optic nerve head; RNFL, retinal nerve fiber layer; VFI, visual field index. Adaptations are themselves works protected by copyright. So in order to publish this adaptation, authorization must be obtained both from the owner of the copyright in the original work and from the owner of copyright in the translation or adaptation.

Researchers have also explored DL models that incorporate functional modalities, such as visual field (VF) tests.28,32 To distinguish healthy from early glaucomatous VFs for DL analysis, 2 studies introduced the concept of “Voronoi images,” which transforms VFs into 2-dimensional images.24,32 In addition, some studies provided heatmaps highlighting which VF regions contributed to the DL analysis.24,29 Consideration of both structural and functional aspects can enhance the discriminatory power of DL models and lead to a more comprehensive evaluation of glaucoma.24 In addition, supplementing images with clinical data is associated with improved performance of DL models.43,44 Dixit et al29 generated a model that integrated both VF and clinical data and used a heatmap to show the extent to which various points in the VF contributed to the DL model’s prediction. This model demonstrated a superior ability to detect glaucoma progression compared with a model based exclusively on VF data.
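The core of the Voronoi transformation can be sketched as a nearest-neighbor rasterization: every pixel of the output image inherits the value of the closest VF test location. The NumPy sketch below uses hypothetical VF points and sensitivities normalized to [0, 1]; the published pipelines differ in grid resolution, point layout, and coordinate handling.

```python
import numpy as np

def vf_to_voronoi(points, values, size=64):
    """Rasterize scattered VF test locations into a 2D image: each pixel
    takes the value of its nearest VF point (a discrete Voronoi tessellation)."""
    ys, xs = np.mgrid[0:size, 0:size]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    # squared distance from every pixel to every VF point
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return values[nearest].reshape(size, size)

# Hypothetical example: 4 VF points (x, y) with sensitivities in [0, 1]
points = np.array([[16.0, 16.0], [48.0, 16.0], [16.0, 48.0], [48.0, 48.0]])
values = np.array([0.9, 0.2, 0.7, 0.1])
img = vf_to_voronoi(points, values)
```

The resulting image is a dense 2-dimensional input suitable for a standard convolutional network, even though the underlying VF test samples only a few dozen scattered locations.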

Importance of Data Source Diversity and Data Standardization to Improve the Utility and Generalizability of Deep Learning Models

In addition to the supplementary use of structural and clinical data, multicenter evaluation and inclusion of patients with diverse demographic characteristics are important in improving the utility and generalizability of DL models.5,10,45 As shown in Table 1, a majority of the studies (11/18, 61%) obtained their data from multicenter sources. Single-center studies often have limited external validity, with decreased model performance when applied to populations with different characteristics than the training data.46,47 Studies that use multicenter input data tend to have less bias, improved generalizability, and thus enhanced potential for clinical implementation.47 Futoma et al48 argued, however, that an emphasis on generalizability results in machine learning systems that have poor performance at several sites at the expense of systems that have robust performance at a single site. They suggest shifting the focus from broad applicability to gaining a better understanding of how and why certain machine learning systems work.

A lack of standardization is a significant barrier to the development of robust AI models that generalize across different patient populations. Although these models have shown powerful diagnostic performance,49,50 challenges in applying them to clinical practice include (1) a lack of large image data sets from multiple devices, (2) nonstandardized imaging and/or postprocessing protocols between devices, (3) limited graphics processing unit capabilities, and (4) inconsistency in reported metrics.51 Proposed solutions include standardizing images from various imaging platforms and training models with large data sets consisting of annotated real-world data, a wide range of image quality, and different types of imaging data.45,51,52 The call for data standardization across multiple data modalities has become more prominent in the ophthalmology community in recent years,53–55 and broad-based efforts, such as those through DICOM Working Group 9, the American Academy of Ophthalmology, the National Eye Institute, and the Observational Health Data Sciences and Informatics (OHDSI) organization, are ongoing to advance standardization and enable improved interoperability and data harmonization across different data sources.

Design Features to Enhance Artificial Intelligence Explainability and Trustworthiness

A key challenge to the implementation of AI models is engendering trust among clinicians. Clinicians will often trust models to a greater extent if the decision-making underlying the model predictions mimics human clinical decision-making processes.16,28 The examples below illustrate some innovative approaches to understanding predictions of AI models and alignment with clinicians, beyond heatmaps alone.

Li et al22 proposed a human attention-based DL model (convolutional neural network) for glaucoma detection and pathologic area localization. This stemmed from the idea that glaucoma is correctly detected when heatmaps correspond with the attention maps used by ophthalmologists in glaucoma detection.22 The model was trained on fundus images with blurred regions that ophthalmologists manually cleared to diagnose glaucoma.22 Their model implemented an attention prediction subnet that located the salient areas of fundus images and filtered out unessential features that did not play a role in DL recognition (Fig. 2). Similarly, clinician eye-tracking data and the heatmaps generated from them have contributed to explainable AI models with applications in both ophthalmology and radiology.56,57 The incorporation of a human-attention mechanism may not only enhance the performance of models but may also provide insight into the value of attention-based methods for identifying pathologic areas, increasing reliability, and building trustworthiness among users.

Attention maps used by ophthalmologists to detect glaucoma and heatmaps used for a human attention-based DL model. Li et al22 reported that glaucoma was detected correctly by an attention-based convolutional neural network when the highlighted areas visualized by the heatmaps were consistent with the pathologic areas used by ophthalmologists (adapted from Li et al22). DL indicates deep learning.

In a study conducted by Christopher et al,23 DL models identifying glaucomatous optic neuropathy were evaluated. They determined that neuroretinal rim areas, specifically the inferior and superior rim, were more important to the models’ decisions than other peripheral regions (Fig. 1A).23 Given that these regions correspond to those used by clinicians to diagnose glaucomatous optic neuropathy,23 clinicians may have increased trust in the ability of DL to model their clinical reasoning and, as a result, enhanced confidence in these models’ capacity for decision support and adoption in clinical practice.16

Recently, a new data-efficient image transformer algorithm has been proposed as an alternative approach, generating AI models with greater generalizability than ResNet and superior explainability compared with saliency maps.58 Efforts to explain models’ decision-making processes will likely continue to evolve and provide developers and validators with areas to focus on for improving performance.

The integration of DL into clinical practice can facilitate more individualized glaucoma care.59 Specifically, DL can help clinicians rely less on VF tests, which are often highly variable, and instead individualize the frequency of VF testing based on the VF loss predicted from OCT scans.35

Explainable Artificial Intelligence for Tabular Data

In addition to imaging-based approaches, there are also methods to enhance the interpretability of clinical features from tabular data used to train AI models.60 Guidotti et al61 described the black box issue, in which the internal logic of AI models is concealed from users; this is both a practical and an ethical issue. They detailed issues with model explanation, model inspection, and outcome explanation. An innovative model-agnostic technique, local interpretable model-agnostic explanations (LIME), was introduced to provide medical professionals with a visual representation of the key features considered in a model’s classification of glaucoma.19 Chayan et al37 proposed that providing explainability through LIME would give medical professionals comprehensive information for decision-making and thus build their trust in the DL model. Kamal et al36 proposed another model, submodular pick LIME (SP-LIME), that explained the predictive results and associated risk factors for the determination of glaucoma class. They claimed their model allowed clinicians to better understand the decision-making process and obtain convincing and consistent decisions.36
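At its core, LIME samples perturbations around a single instance, weights them by proximity to that instance, and fits a simple weighted linear surrogate whose coefficients serve as local feature attributions. A minimal NumPy sketch of this idea follows; the linear "glaucoma score" model (`predict`) is a hypothetical stand-in, not any of the published classifiers.

```python
import numpy as np

def lime_tabular(predict, x, n_samples=2000, width=0.75, seed=0):
    """Minimal LIME-style explanation for one tabular instance:
    sample perturbations around x, weight them by proximity, and fit a
    weighted linear surrogate whose coefficients act as feature attributions."""
    rng = np.random.default_rng(seed)
    X = x + rng.normal(0, 1, size=(n_samples, x.size))
    y = predict(X)
    w = np.exp(-((X - x) ** 2).sum(axis=1) / width ** 2)  # proximity kernel
    A = np.hstack([X, np.ones((n_samples, 1))])           # add intercept column
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef[:-1]  # per-feature local weights (intercept dropped)

# Hypothetical "model": only feature 0 (say, IOP) drives the score,
# so the surrogate should attribute everything to feature 0.
predict = lambda X: 2.0 * X[:, 0] + 0.0 * X[:, 1]
attribution = lime_tabular(predict, np.array([22.0, 95.0]))
```

Real LIME implementations add feature discretization and sparse (lasso) surrogate fitting, but the perturb-weight-fit structure is the same.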

Another form of explainable AI, the Shapley value, is a solution concept derived from cooperative game theory.62 It accounts for the contributions all other players make when interacting with a player and is considered the standard for quantifying an instance’s contribution.63 Oh et al28 built their model and charts using the XGBoost algorithm and SHapley Additive exPlanations (SHAP), a variant of the Shapley value. These statistical charts provide insight into why the DL output produced a certain result. Decision trees are an additional technique that can help users interpret model explanations.64 Tree ensembles, such as random forests, are combinations of decision trees that achieve excellent predictive performance compared with a single decision tree.63 This approach was used by Baxter et al34 to assess the variables of importance driving the predictions of a random forest model for predicting glaucoma progression using systemic data from electronic health records (EHRs). Given the limited explainable AI studies conducted in the glaucoma field, this area warrants further exploration.
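Concretely, the Shapley value of a player (here, a feature) is its average marginal contribution across all coalitions of the other players. The sketch below computes it exactly by enumerating coalitions, which is feasible only for a handful of features (SHAP approximates this for real models); the additive "game" is a hypothetical example chosen so each feature's Shapley value equals its own contribution.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n):
    """Exact Shapley values for n players by enumerating all coalitions.
    value_fn(frozenset) returns the payoff of a coalition."""
    phi = np.zeros(n)
    players = range(n)
    for i in players:
        others = [p for p in players if p != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                S = frozenset(S)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (value_fn(S | {i}) - value_fn(S))
    return phi

# Hypothetical additive game: each feature contributes a fixed amount
contrib = {0: 3.0, 1: -1.0, 2: 0.5}
value_fn = lambda S: sum(contrib[p] for p in S)
phi = shapley_values(value_fn, 3)
```

A key property visible here is efficiency: the Shapley values sum exactly to the payoff of the full coalition, which is what lets SHAP charts decompose a single prediction into per-feature contributions.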

Dashboards/Interfaces Focused on Explaining Deep Learning Measurements

Here, several examples of published dashboards or interfaces with a focus on DL predictions to facilitate end-user engagement are provided. These illustrate various approaches to interface development to enhance the use and adoption of DL models for glaucoma.

Example 1: “Understandable prediction model”

Oh et al28 proposed a DL interface for glaucoma prediction that provides explanations for each individual prediction. Their aim was to build an “understandable prediction model” rather than a “highly accurate model.” Several features, including VF (with pattern standard deviation), RNFL OCT (with superior, inferior, and temporal quadrants), and intraocular pressure, were analyzed for glaucoma prediction.28 When users input these features into the prediction model, they obtain a binary output of “glaucoma” or “healthy.” The interface includes a gauge chart, which shows where an input value, such as the superior RNFL quadrant value, falls within the distribution of the training data; a radar chart, which visualizes multivariate data as a 2-dimensional chart of 3 or more quantitative variables; and a SHapley Additive exPlanations chart, which displays the role of each value in the decision28 (Fig. 3). Their model achieved an accuracy of 0.95 and an area under the curve of 0.95, allowing users to obtain the basis for a glaucoma prediction and providing clinical insight.28

DL interface implemented by Oh et al28. Oh and colleagues implemented their prediction model and 3 explanation charts [gauge (A) and radar and SHAP charts (B)] in their interface for glaucoma prediction (adapted from Oh et al28). DL indicates deep learning; SHAP, SHapley Additive exPlanation.

Example 2: “Intuitive AI for glaucoma”

Huang et al32 developed an interactive glaucoma grading interface with DL-based recommendations. They emphasized that the relationship between structure and function is a priority for proper patient evaluation and developed a fine-grained grading DL system (FGGDL), which integrates these 2 facets for a more unified patient evaluation.32 The system converts VF data into Voronoi images and derives a grading system from the classification of VF defects and saliency maps. The vision loss in the VFs corresponds to maps of structural damage in the optic nerves of the fundus photographs.32 They reported that their model achieved an accuracy of 0.85 and an area under the curve of 0.90, showing superior results compared with medical students (accuracy = 0.56–0.73; P < 0.01) and comparable results to ophthalmologists (accuracy = 0.87; P = 0.61). They also found that clinicians’ grading performance improved when using the FGGDL compared with assessment without its help.32 The study concluded that the FGGDL could provide productive guidance in the clinical setting, demonstrating a tool that corresponds with users’ intuitive methods for detecting glaucoma progression.

Example 3: “Clinical dashboard for visualizing AI predictions”

In a prior study conducted by our group,17 we proposed a visualization approach that incorporated a VF prediction model into a multifaceted UI to provide CDS in managing glaucoma progression. Interface development emphasized the interpretability of explanations and the reliability of predictions.17

This interface (GLANCE, a visualization tool designed to help clinicians make DL-based glaucoma progression management decisions efficiently, ie, at a “glance”) is designed to allow clinicians to select patients, examine demographics, and display OCT and derived data (such as thickness maps) all in one view, similar to existing CDS systems for OCT-based assessment in glaucoma. New features in this interface include a DL-generated mean deviation (MD) prediction that informs the clinician of expected VF loss, along with visual descriptions corresponding to these predictions.17 Historical VF MD values displayed alongside the DL-generated predictions also allow clinicians to assess model reliability. In most cases (54%), clinicians based their management decisions on the predicted MD rather than the actual VF MD.17 Furthermore, clinicians changed their initial management choice and their confidence in the prediction in 31% of cases after reviewing a visual explanation (heatmap) of the DL model’s results, compared with only 11% of cases without a heatmap.17 This provides evidence that models that reinforce explanations of automated decisions can augment clinicians’ knowledge and calibrate their trust in DL-based measurements during clinical decision-making.17

Decision Support to Minimize Time and Cognitive Burden on Users

Even if a DL-based interface has good performance, detailed explanations, and high clinical trust, users may be hesitant to use it in clinical practice if it imposes a cognitive burden or a substantial time burden. Read-Brown et al65 described concerns regarding the time burden and the negative impact on productivity associated with using EHR systems. Studies demonstrate that the mean time clinicians spend in EHR systems per appointment is around 10 minutes, with clinical data review comprising only 1 minute.65,66 In one survey study, all ophthalmologist respondents considered time and documentation a burden.66 In time-sensitive clinical settings, an easy-to-use interface was the most important CDS characteristic to clinicians.67 The System Usability Scale (SUS) and the Post-Study System Usability Questionnaire are useful tools for assessing the usability of software interfaces.68 The Post-Study System Usability Questionnaire is a scenario-based usability evaluation survey developed by IBM.69 It consists of 19 items focused on 5 usability characteristics of a system: rapid completion of the task, ease of learning, high-quality documentation and online information, functional adequacy, and rapid acquisition of productivity.70 These instruments are a reliable way to evaluate user satisfaction with DL models and to assess the cognitive and time burden a model imposes.70 Chen et al71 recently applied one of these tools, the SUS (a commonly used and validated scoring system ranging from 0 to 100), to evaluate the user-friendliness of the GLANCE interface. They found that while the interface showed only mediocre usability (SUS score in the 43rd percentile; mean ± SD SUS score = 66.1 ± 16.0), earlier work has shown that clinicians commonly have unfavorable perceptions of EHR usability when estimated using SUS scores (mean scores < 10th percentile). This highlights the challenges of developing usable CDS instruments in the EHR and the need for continued work in this area.71
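The SUS score itself is straightforward to compute from the 10 Likert items: positively worded (odd-numbered) items are scored as the response minus 1, negatively worded (even-numbered) items as 5 minus the response, and the sum is scaled by 2.5 to yield a 0 to 100 range. A minimal sketch:

```python
def sus_score(responses):
    """Compute the System Usability Scale score (0-100) from the 10 item
    responses, each on a 1-5 Likert scale. Odd-numbered items are positively
    worded (score = response - 1); even-numbered items are negatively worded
    (score = 5 - response). The summed scores are scaled by 2.5."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    # i is 0-based, so even i corresponds to odd-numbered (positive) items
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses))
    return total * 2.5

# "Agree" on all positive items and "disagree" on all negative items
# yields the maximum score of 100.0
best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
```

Note that SUS scores are not percentages: interpreting a mean score such as 66.1 requires comparison against normative percentile benchmarks, as Chen et al71 did.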

Integration into clinical workflows is also essential to ensure that AI models are seamlessly incorporated into existing clinical information systems (such as existing EHR and picture archiving and communication systems), rather than residing in external systems that take additional time and effort for clinicians to access. This requires the adoption and implementation of data standards to enable interoperability. For example, Fast Healthcare Interoperability Resources (FHIR) is a data exchange standard that can be used stand-alone or in combination with other existing standards.53 The need for innovation in AI from the EHR data perspective has led to the development of “common data models” for big data storage and analytics.53 One such model, the Observational Medical Outcomes Partnership (OMOP) Common Data Model, provides a standard for merging and unifying data from different EHR systems.72,73 In the future, developing and implementing standards for various types of data will be essential for accelerating AI techniques in ophthalmology.55


In summary, we have discussed key principles regarding visualization approaches in DL models of glaucoma: (1) using both imaging and clinical data to develop DL models of glaucoma; (2) promoting model interpretability and explainability to help engender clinician trust, particularly when important features for prediction align with traditional clinical decision-making processes; and (3) designing interfaces that minimize cognitive burden and can be successfully integrated with existing clinical information systems.

To contextualize this with a framework developed by the broader biomedical informatics community, the American Medical Informatics Association has recommended that the safe and effective use of AI in medicine includes the following stages: (1) achieving technical performance (stage 1); (2) evaluating usability and integration into clinical workflows (stage 2); and (3) assessing health impact (stage 3).74 Although ophthalmology has seen a wealth of AI models published in recent years, most of these studies fall into the stage 1 category, as they are focused on technical performance and optimizing the performance metrics of the models themselves. However, to advance the field further to clinical translation, there is a substantial need to expand into stages 2 and 3 and place a greater emphasis on implementation science. AI-enabled diabetic retinopathy screening is one area where ophthalmology has advanced into clinical practice75,76 and is demonstrating impact (stage 3) from the perspective of enhancing screening rates. AI implementation in glaucoma, however, has not yet reached that stage. The articles reviewed here represent cutting-edge innovation in stage 2 studies focused on usability. These studies demonstrate that important progress has been made in advancing toward real-world clinical implementation of AI models in glaucoma, but that much more work is required before these models can be implemented into clinical practice.


1. Weinreb RN, Aung T, Medeiros FA. The pathophysiology and treatment of glaucoma: a review. JAMA. 2014;311:1901–1911.
2. Thompson AC, Jammal AA, Medeiros FA. A review of deep learning for screening, diagnosis, and detection of glaucoma progression. Transl Vis Sci Technol. 2020;9:42.
3. Ahmad BU, Kim JE, Rahimy E. Fundamentals of artificial intelligence for ophthalmologists. Curr Opin Ophthalmol. 2020;31:303–311.
4. Chaurasia AK, Greatbatch CJ, Hewitt AW. Diagnostic accuracy of artificial intelligence in glaucoma screening and clinical practice. J Glaucoma. 2022;31:285–299.
5. Schuman JS, Cadena MDLAR, McGee R, et al. A case for the use of artificial intelligence in glaucoma assessment. Ophthalmol Glaucoma. 2022;5:3–13.
6. Campbell CG, Ting DSW, Keane PA, et al. The potential application of artificial intelligence for diagnosis and management of glaucoma in adults. Br Med Bull. 2020;134:21–33.
7. Devalla SK, Liang Z, Pham TH, et al. Glaucoma management in the era of artificial intelligence. Br J Ophthalmol. 2020;104:301.
8. Ahn JM, Kim S, Ahn K-S, et al. A deep learning model for the detection of both advanced and early glaucoma using fundus photography. PLoS One. 2018;13:e0207982.
9. Phene S, Dunn RC, Hammel N, et al. Deep learning and glaucoma specialists: the relative importance of optic disc features to predict glaucoma referral in fundus photographs. Ophthalmology. 2019;126:1627–1639.
10. Shibata N, Tanito M, Mitsuhashi K, et al. Development of a deep residual learning algorithm to screen for glaucoma from fundus photography. Sci Rep. 2018;8:1–9.
11. Devalla SK, Chin KS, Mari J-M, et al. A deep learning approach to digitally stain optical coherence tomography images of the optic nerve head. Invest Ophthalmol Vis Sci. 2018;59:63–74.
12. Wen JC, Lee CS, Keane PA, et al. Forecasting future Humphrey visual fields using deep learning. PLoS One. 2019;14:e0214875.
13. Berchuck SI, Mukherjee S, Medeiros FA. Estimating rates of progression and predicting future visual fields in glaucoma using a deep variational autoencoder. Sci Rep. 2019;9:1–12.
14. Thompson AC, Jammal AA, Medeiros FA. A review of deep learning for screening, diagnosis, and detection of glaucoma progression. Transl Vis Sci Technol. 2020;9:42.
15. Christopher M, Nakahara K, Bowd C, et al. Effects of study population, labeling and training on glaucoma detection using deep learning algorithms. Transl Vis Sci Technol. 2020;9:27.
16. Miotto R, Wang F, Wang S, et al. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. 2017;19:1236–1246.
17. van den Brandt A, Christopher M, Zangwill LM, et al. GLANCE: visual analytics for monitoring glaucoma progression. Eurographics Workshop on Visual Computing for Biology and Medicine (VCBM). 2020:85–96.
18. Engelbrecht L, Botha A, Alberts R. Designing the visualization of information. Int J Image Graph. 2015;15:1540005.
19. Ribeiro MT, Singh S, Guestrin C. "Why should I trust you?" Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016:1135–1144.
20. Wilbanks BA, Langford PA. A review of dashboards for data analytics in nursing. Comput Inform Nurs. 2014;32:545–549.
21. Tong Y, Lu W, Yu Y, et al. Application of machine learning in ophthalmic imaging modalities. Eye Vis. 2020;7:22.
22. Li L, Xu M, Liu H, et al. A large-scale database and a CNN model for attention-based glaucoma detection. IEEE Trans Med Imaging. 2020;39:413–424.
23. Christopher M, Belghith A, Bowd C, et al. Performance of deep learning architectures and transfer learning for detecting glaucomatous optic neuropathy in fundus photographs. Sci Rep. 2018;8:16685.
24. Kucur SS, Hollo G, Sznitman R. A deep learning approach to automatic detection of early glaucoma from visual fields. PLoS One. 2018;13:e0206081.
25. Liu H, Li L, Wormstone IM, et al. Development and validation of a deep learning system to detect glaucomatous optic neuropathy using fundus photographs. JAMA Ophthalmol. 2019;137:1353–1360.
26. Ran AR, Cheung CY, Wang X, et al. Detection of glaucomatous optic neuropathy with spectral-domain optical coherence tomography: a retrospective training and validation deep-learning analysis. Lancet Digit Health. 2019;1:e172–e182.
27. Ajitha S, Akkara JD, Judy MV. Identification of glaucoma from fundus images using deep learning techniques. Indian J Ophthalmol. 2021;69:2702–2709.
28. Oh S, Park Y, Cho KJ, et al. Explainable machine learning model for glaucoma diagnosis and its interpretation. Diagnostics (Basel). 2021;11:510.
29. Dixit A, Yohannan J, Boland MV. Assessing glaucoma progression using machine learning trained on longitudinal visual field and clinical data. Ophthalmology. 2021;128:1016–1026.
30. Yu HH, Maetschke SR, Antony BJ, et al. Estimating global visual field indices in glaucoma by combining macula and optic disc OCT scans using 3-dimensional convolutional neural networks. Ophthalmol Glaucoma. 2021;4:102–112.
31. Li F, Su Y, Lin F, et al. A deep-learning system predicts glaucoma incidence and progression using retinal photographs. J Clin Invest. 2022;132:e157968.
32. Huang X, Jin K, Zhu J, et al. A structure-related fine-grained deep learning system with diversity data for universal glaucoma visual field grading. Front Med (Lausanne). 2022;9:832920.
33. Maetschke S, Antony B, Ishikawa H, et al. A feature agnostic approach for glaucoma detection in OCT volumes. PLoS One. 2019;14:e0219126.
34. Baxter SL, Saseendrakumar BR, Paul P, et al. Predictive analytics for glaucoma using data from the All of Us research program. Am J Ophthalmol. 2021;227:74–86.
35. Christopher M, Bowd C, Belghith A, et al. Deep learning approaches predict glaucomatous visual field damage from OCT optic nerve head en face images and retinal nerve fiber layer thickness maps. Ophthalmology. 2020;127:346–356.
36. Kamal MS, Dey N, Chowdhury L, et al. Explainable AI for glaucoma prediction analysis to understand risk factors in treatment planning. IEEE Transactions on Instrumentation and Measurement. 2022;71:1–9.
37. Chayan TI, Islam A, Rahman E, et al. Explainable AI based glaucoma detection using transfer learning and LIME. IEEE Asia-Pacific Conference on Computer Science and Data Engineering. 2022:1–6.
38. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. Eur Conf Comput Vis. 2014;8689:818–833.
39. Zhou B, Khosla A, Lapedriza A, et al. Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:2921–2929.
40. Zintgraf LM, Cohen TS, Adel T, et al. Visualizing deep neural network decisions: prediction difference analysis. arXiv preprint arXiv:1702.04595. 2017.
41. Owsley C, McGwin G, Lee DJ, et al. Diabetes eye screening in urban settings serving minority populations: detection of diabetic retinopathy and other ocular findings using telemedicine. JAMA Ophthalmol. 2015;133:174–181.
42. Miller SE, Thapa S, Robin AL, et al. Glaucoma screening in Nepal: cup-to-disc estimate with standard mydriatic fundus camera compared to portable nonmydriatic camera. Am J Ophthalmol. 2017;182:99–106.
43. Kazemian P, Lavieri MS, Van Oyen MP, et al. Personalized prediction of glaucoma progression under different target intraocular pressure levels using filtered forecasting methods. Ophthalmology. 2018;125:569–577.
44. Garway-Heath DF, Zhu H, Cheng Q, et al. Combining optical coherence tomography with visual field data to rapidly detect disease progression in glaucoma: a diagnostic accuracy study. Health Technol Assess. 2018;22:1–106.
45. Lee EB, Wang SY, Chang RT. Interpreting deep learning studies in glaucoma: unresolved challenges. Asia Pac J Ophthalmol (Phila). 2021;10:261–267.
46. Date RC, Jesudasen SJ, Weng CY. Applications of deep learning and artificial intelligence in retina. Int Ophthalmol Clin. 2019;59:39–57.
47. Adlung L, Cohen Y, Mor U, et al. Machine learning in clinical decision making. Med. 2021;2:642–665.
48. Futoma J, Simons M, Panch T, et al. The myth of generalisability in clinical research and machine learning in health care. Lancet Digit Health. 2020;2:e489–e492.
49. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24:1342–1350.
50. Lee CS, Tyring AJ, Deruyter NP, et al. Deep-learning based, automated segmentation of macular edema in optical coherence tomography. Biomed Opt Express. 2017;8:3440–3448.
51. Yanagihara RT, Lee CS, Ting DSW, et al. Methodological challenges of deep learning in optical coherence tomography for retinal diseases: a review. Transl Vis Sci Technol. 2020;9:11.
52. Chen D, Ran EA, Tan TF, et al. Applications of artificial intelligence and deep learning in glaucoma. Asia Pac J Ophthalmol (Phila). 2023;12:80–93.
53. Baxter SL, Lee AY. Gaps in standards for integrating artificial intelligence technologies into ophthalmic practice. Curr Opin Ophthalmol. 2021;32:431–438.
54. Baxter SL, Reed AA, Maa A, et al. Ocular health and national data standards: a case for including visual acuity in the United States Core Data for Interoperability (USCDI). Ophthalmol Sci. 2022;2:100210.
55. Halfpenny W, Baxter SL. Towards effective data sharing in ophthalmology: data standardization and data privacy. Curr Opin Ophthalmol. 2022;33:418–424.
56. Muddamsetty SM, Jahromi MNS, Moeslund TB. Expert level evaluations for explainable AI (XAI) methods in the medical domain. In: Del Bimbo A, Cucchiara R, Sclaroff S, et al., eds. Pattern Recognition. ICPR International Workshops and Challenges. Cham: Springer International Publishing; 2021;12663:35–46.
57. Karargyris A, Kashyap S, Lourentzou I, et al. Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development. Sci Data. 2021;8:92.
58. Fan R, Alipour K, Bowd C, et al. Detecting glaucoma from fundus photographs using deep learning without convolutions: transformer for improved generalization. Ophthalmol Sci. 2023;3:100233.
59. Christopher M, Bowd C, Proudfoot JA, et al. Deep learning estimation of 10-2 and 24-2 visual field metrics based on thickness maps from macula OCT. Ophthalmology. 2021;128:1534–1548.
60. Molnar C. Interpretable Machine Learning: A Guide for Making Black-Box Models Explainable. 2023. Accessed June 20, 2023.
61. Guidotti R, Monreale A, Ruggieri S, et al. A survey of methods for explaining black box models. ACM Comput Surv. 2018;51:1–42.
62. Shapley L. Quota solutions of n-person games. In: Kuhn HW, Tucker AW, eds. Contributions to the Theory of Games II. Annals of Mathematics Studies (series edited by Emil Artin and Marston Morse). Princeton University Press; 1953:343.
63. Sahakyan M, Aung Z, Rahwan T. Explainable artificial intelligence for tabular data: a survey. IEEE Access. 2021;9:135392–135422.
64. Craven M, Shavlik J. Extracting tree-structured representations of trained networks. Adv Neural Inf Process Syst. 1995;8:24–30.
65. Read-Brown S, Hribar MR, Reznick LG, et al. Time requirements for electronic health record use in an academic ophthalmology center. JAMA Ophthalmol. 2017;135:1250–1257.
66. Baxter SL, Gali HE, Mehta MC, et al. Multicenter analysis of electronic health record use among ophthalmologists. Ophthalmology. 2021;128:165–166.
67. Stagg B, Stein JD, Medeiros FA, et al. Interests and needs of eye care providers in clinical decision support for glaucoma. BMJ Open Ophthalmol. 2021;6:e000639.
68. Bai E, Song SL, Fraser HSF, et al. A graphical toolkit for longitudinal dataset maintenance and predictive model training in health care. Appl Clin Inform. 2022;13:56–66.
69. Martins AI, Rosa AF, Queirós A, et al. European Portuguese validation of the system usability scale (SUS). Procedia Comput Sci. 2015;67:293–300.
70. Lewis JR. Psychometric evaluation of the PSSUQ using data from five years of usability studies. Int J Hum Comput Interact. 2002;14:463–488.
71. Chen JS, Baxter SL, Brandt AVD, et al. Usability and clinician acceptance of a deep learning-based clinical decision support tool for predicting glaucomatous visual field progression. J Glaucoma. 2023;32:151–158.
72. Denny JC, Rutter JL, Goldstein DB, et al. The “All of Us” research program. N Engl J Med. 2019;381:668–676.
73. Maier C, Kapsner LA, Mate S, et al. Patient cohort identification on time series data using the OMOP common data model. Appl Clin Inform. 2021;12:57–64.
74. American Medical Informatics Association. AMIA 2023 Artificial Intelligence Evaluation Showcase. Accessed June 20, 2023.
75. Grzybowski A, Brona P, Lim G, et al. Artificial intelligence for diabetic retinopathy screening: a review. Eye. 2020;34:451–460.
76. Padhy SK, Takkar B, Chawla R, et al. Artificial intelligence in diabetic retinopathy: a natural step to the future. Indian J Ophthalmol. 2019;67:1004–1009.

Keywords: deep learning; explainability; glaucoma; interface; visualization


Copyright © 2023 Asia-Pacific Academy of Ophthalmology. Published by Wolters Kluwer Health, Inc. on behalf of the Asia-Pacific Academy of Ophthalmology.