Featured Article, Artificial Intelligence, Original Article

Utilizing human intelligence in artificial intelligence for detecting glaucomatous fundus images using human-in-the-loop machine learning

Ramesh, Prasanna Venkatesh; Subramaniam, Tamilselvan1; Ray, Prajnya2; Devadas, Aji Kunnath2; Ramesh, Shruthy Vaishali3; Ansar, Sheik Mohamed4; Ramesh, Meena Kumari5; Rajasekaran, Ramesh6; Parthasarathi, Sathyan7

Indian Journal of Ophthalmology: April 2022 - Volume 70 - Issue 4 - p 1131-1138
doi: 10.4103/ijo.IJO_2583_21

Introduction

Artificial intelligence (AI) in the field of glaucoma diagnosis is becoming increasingly popular, with basic convolutional neural networks (CNNs) being used to enhance and upgrade patient care.[1-6] A well-trained CNN can identify various pathologies of the fundus.[7] However, previously reported CNN-based AI models were criticized by the ophthalmological community because of the black box dilemma: the CNN-based systems analyzed the data based on their own self-generated rules, and the real rationale behind why and how the predictions were made in the first place was not clearly understood.[8] Any ophthalmologist would expect AI results not only to predict the diagnosis but also to predict and locate detailed signs in the fundus images, such as splinter hemorrhages, glaucomatous optic atrophy, vertical glaucomatous cupping, peripapillary atrophy, and retinal nerve fiber layer (RNFL) defects. For any successful AI algorithm, the base starts with data annotation.[9] Without annotated data in the first place, there is no machine learning algorithm to detect the image; this is where the human interface comes into play. Annotating data, also known as labeling data, is the first and most important step in creating a successful AI model.[9] One image annotation tool that can be utilized by all for comprehensive and customized data labeling is the Microsoft Visual Object Tagging Tool (VoTT).[10] Customized annotation of optic nerve head and RNFL images can prove useful not only in identifying glaucomatous discs but also in predicting various segmentations of the glaucomatous cup, disc, peripapillary atrophy, and RNFL defect from the background fundus.[11] This methodology of annotation, though time-consuming, can be utilized by all ophthalmologists to create their own human-in-the-loop (HITL) AI model.

In this paper, we have employed a novel CNN approach not only to diagnose glaucoma from TrueColor confocal fundus images but also to predict and locate detailed signs in glaucomatous fundus images. The idea was to overcome the black box dilemma by creating an explainable AI with HITL machine learning.

Methods

We used a well-curated private dataset of 1,400 high-quality confocal fundus images to build an efficient AI model to aid ophthalmologists in practice. The 1,400 images were split into 80% (1,120 images) for training and 20% (280 images) for testing. A team of humans (two glaucoma specialists and two optometrists) annotated the 1,120 training images with 26 predefined medical conditions pertaining to glaucoma using Microsoft VoTT [Fig. 1].
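A minimal sketch of such an 80/20 split is shown below, assuming the image paths are gathered from a local directory; the directory name, file pattern, and fixed seed are illustrative assumptions, not details from the study.

```python
# Minimal sketch of the 80/20 train-test split described above
# (a sketch only: directory name, glob pattern, and seed are assumptions).
import random
from pathlib import Path

images = sorted(Path("fundus_dataset").glob("*.jpg"))  # the 1,400 confocal images
random.seed(42)                      # fixed seed so the split is reproducible
random.shuffle(images)
n_train = int(0.8 * len(images))     # 80% -> 1,120 images
train_images = images[:n_train]      # sent for human annotation and training
test_images = images[n_train:]       # 20% -> 280 images held out for testing
```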

Figure 1:
(a) Sample fundus photograph of an eye with glaucomatous cupping and retinal nerve fiber layer defect utilized for annotating. (b) Customized labeling of the optic cup (green-dotted area). (c) Customized labeling of bayoneting signs (red-dotted area). (d) Customized labeling of superior notching (blue-dotted area). (e) Customized labeling of the optic disc (pink-dotted area). (f) Customized labeling of peripapillary atrophy (gray-dotted area). (g) Customized annotation of the retinal nerve fiber layer (RNFL) defect (green-dotted area). (h) Complete annotation of a fundus image with glaucomatous changes in the optic nerve head and RNFL region

Process of annotations

From June 2021 to July 2021, annotation of the existing glaucoma fundus image dataset was performed on a daily basis by a team of two glaucoma specialists and two optometrists. Only high-resolution TrueColor confocal images were utilized for annotation. We shortlisted the Sansten AI toolbox, which supports human-in-the-loop annotation and a human validation process [Fig. 2]. The annotation team created a predefined list of 26 glaucomatous fundus signs relevant to identifying the glaucomatous damage present in a fundus image.

Figure 2:
Image showing the methodology workflow of this study

Based on their clinical schedule, the ophthalmology AI team of Mahathma Eye Hospital Private Limited, Trichy, conducted the annotation process at approximately 40-50 image annotations per day. The annotation time varied from image to image, according to the shapes of the annotations (multiple bounding boxes, circular, and freestyle shapes) and the number of glaucomatous findings. A few sample source images with multiple glaucomatous fundus signs, together with their respective annotated images, are shown in Fig. 1. After human annotation, the dataset was sent for training. The primary expectation during training was to equip the AI model to detect glaucomatous damage from the fundus images by using computer-aided object detection algorithms.[12]
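To illustrate the handoff from annotation to training, the following is a hedged sketch converting one VoTT-exported per-asset JSON file into YOLO-format label lines. The JSON field names follow VoTT's export format; the tag list is an illustrative subset of the 26 predefined signs, not the study's actual label map.

```python
# Hedged sketch: VoTT per-asset JSON -> YOLO label lines
# ("class_id x_center y_center width height", all normalized to [0, 1]).
import json

TAGS = ["optic disc", "optic cup", "notching", "bayoneting sign", "RNFL defect"]

def vott_to_yolo(vott_json_path):
    with open(vott_json_path) as f:
        data = json.load(f)
    img_w = data["asset"]["size"]["width"]
    img_h = data["asset"]["size"]["height"]
    lines = []
    for region in data["regions"]:        # one region per annotated sign
        box = region["boundingBox"]       # left/top/width/height in pixels
        x_center = (box["left"] + box["width"] / 2) / img_w
        y_center = (box["top"] + box["height"] / 2) / img_h
        class_id = TAGS.index(region["tags"][0])
        lines.append(f"{class_id} {x_center:.6f} {y_center:.6f} "
                     f"{box['width'] / img_w:.6f} {box['height'] / img_h:.6f}")
    return "\n".join(lines)
```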

This study involved human participants; however, there was no direct interaction with them, as only their fundus images were used for the study. Ethics approval was obtained from an Independent Ethics Committee (Institutional Review Board), and the study adhered to the tenets of the Declaration of Helsinki. Informed written consent was obtained from all the study participants.

Algorithm employed for training - You only look once version 5 (YOLOv5)

The computer-aided object detection algorithm used here to precisely identify the underlying conditions is YOLOv5. Rather than classifying the fundus image as a whole, YOLOv5 divides it into numerous segments and makes predictions for each. This customized tool was able to identify and draw customized anchor boxes over multiple areas within the glaucomatous fundus images.

For the final detection step, we used the model head network (as shown in Table 1), which generates the final output vectors of objectness scores, class probabilities, and bounding boxes by applying the anchor boxes to glaucomatous fundus features [Fig. 3].
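As an illustration of these output vectors, a minimal sketch of running inference with a hub-loaded YOLOv5 model is given below; the weights file and image name are hypothetical placeholders, not the study's artifacts.

```python
# Hedged sketch: inference with the ultralytics/yolov5 hub API.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="glaucoma_yolov5.pt")
results = model("fundus_image.jpg")    # forward pass through backbone, neck, and head
detections = results.pandas().xyxy[0]  # one row per predicted box with columns:
# xmin, ymin, xmax, ymax, confidence (objectness x class probability), class, name
print(detections)
```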

Table 1:
The model head used to perform the final detection part
Figure 3:
A sample training batch of eight image predictions, consisting of class probabilities, objectness scores, and bounding boxes

Statistics and Results of the Testing Images

The 1,400 images were split into 80% and 20% for training and testing, respectively [Table 2]. Descriptive statistics of the 280 testing images [Table 3] were calculated in the form of frequencies and percentages. The MS Excel-coded data were analyzed using SPSS (Statistical Package for the Social Sciences, version 20; IBM, USA). The 2D and 3D distributions of the annotation dataset are depicted in Fig. 4. The AI tool was evaluated with mean average precision (mAP), calculated by averaging the average precision (AP) over all classes and/or over all intersection over union (IoU) thresholds. IoU was calculated by dividing the area of overlap between a predicted and a ground-truth box by the area of their union.
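A minimal sketch of that IoU computation, assuming boxes are given as (xmin, ymin, xmax, ymax) tuples in pixel coordinates:

```python
# IoU = area of overlap / area of union, as defined above.
def iou(box_a, box_b):
    # Intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    overlap = max(0, x2 - x1) * max(0, y2 - y1)             # area of overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap                       # area of union
    return overlap / union if union else 0.0

# A prediction counts as correct at threshold t when iou(pred, truth) >= t,
# e.g. t = 0.5 for mAP@0.5.
```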

Table 2:
The private dataset of high-resolution fundus images, split into 80% and 20% for training and testing, respectively
Table 3:
The split of the 280 testing images into three testing groups (90 + 100 + 90). Test 1 predictions were performed after the first fifteen days of annotation, Test 2 predictions after the next fifteen days, and Test 3 predictions after the final one month of annotation. Over time, the number of correct predictions rises as the machine learns from more data
Figure 4:
(a) 2D distribution graph showing the repeatedly used annotations as spikes. (b and c) 3D distribution graphs showing repeated annotations as warmer colors

The initial training [Fig. 5a] showed an mAP@0.5 of 25% or below and an mAP@0.5:0.95 of 10% or below, and the final training [Fig. 5b] achieved better accuracy, with an mAP@0.5 of up to 60% and an mAP@0.5:0.95 of up to 27%. The 280 testing images were split into 90, 100, and 90 images for three test runs performed once every 15 days: Test 1 predictions were performed after the first 15 days of annotation, Test 2 after the next 15 days, and Test 3 after a further 15 days. These tests showed a consistent increase in accuracy, from 94.44% to 98.89%, in predicting the diagnosis, severity, and detailed findings [Table 3]. From Test 1 to Test 3, sensitivity increased from 90.24% to 100% and specificity from 97.96% to 98.14% (as shown in Tables 4, 5, and 6, respectively).
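For clarity, the sensitivity and specificity reported here follow the standard confusion-matrix definitions, sketched below. The TP/FN/TN/FP counts in the example are illustrative values chosen to reproduce Test 1's reported figures; the actual per-test counts are given in Tables 4-6.

```python
# Sensitivity and specificity from confusion-matrix counts (TP, FN, TN, FP).
def sensitivity(tp, fn):
    return tp / (tp + fn)   # true positive rate

def specificity(tn, fp):
    return tn / (tn + fp)   # true negative rate

# Illustrative counts only (90 images total in Test 1):
print(f"{sensitivity(37, 4):.2%}")   # 90.24% with TP=37, FN=4
print(f"{specificity(48, 1):.2%}")   # 97.96% with TN=48, FP=1
```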

Figure 5:
(a) Image showing the quality of training at the beginning of training with respect to GIoU, objectness, classification, precision, and recall. (b) Image showing the quality of improvement at the end of training with respect to GIoU, objectness, classification, precision, and recall
Table 4:
This table shows the distribution of specificity and sensitivity of the 90 images in Test 1. TP - True Positive; FN - False Negative; TN - True Negative; FP - False Positive
Table 5:
This table shows the distribution of specificity and sensitivity of the 100 images in Test 2. TP - True Positive; FN - False Negative; TN - True Negative; FP - False Positive
Table 6:
This table shows the distribution of specificity and sensitivity of the 90 images in Test 3. TP - True Positive; FN - False Negative; TN - True Negative; FP - False Positive

Discussion

Glaucoma, also called the silent thief of sight, is one of the leading causes of irreversible blindness in developing countries like India. It is typically associated with increased intraocular pressure, which damages the optic nerve and can result in blindness. In such settings, the diagnosis of glaucoma remains a challenge; this is where a good explainable AI model with reliable sensitivity and specificity can aid rapid screening and detection.

Review of Literature - What was previously known or reported

Akkara et al.[13] reviewed several studies and their importance pertaining to AI and machine learning (ML) in glaucoma detection from color fundus photographs. Li et al.[1] reported high sensitivity (95.6%) and specificity (92%) in detecting referable glaucomatous optic neuropathy from fundus photographs using deep learning (DL) algorithms. They also mentioned confounding factors that influenced their results: high myopia led to false negatives, and physiological cupping led to false positives. Al-Aswad et al. assessed Pegasus (Visulytix Ltd., London, UK), a DL technology for identifying glaucomatous optic neuropathy from color fundus photographs, and showed that it outmatched 5 of the 6 ophthalmologists who participated in that study.[2] Another AI that identifies glaucomatous fundus photographs is Netra.AI (Leben Care Technologies Pte. Ltd.).[13]

Several other studies have described techniques to identify glaucomatous optic neuropathy from optic disc fundus photographs. Cerentini et al. used GoogLeNet to develop an automatic classification method for detecting glaucomatous changes in fundus images.[4] Haleem et al. used a novel technique for automatic boundary detection of the optic disc and cup to aid automatic glaucoma diagnosis from fundus photographs.[5] Thompson et al. used deep learning to measure neuroretinal rim (NRR) loss from optic disc photographs.[6] The Indian start-up Kalpah Innovations (Visakhapatnam, India) released the Retinal Image Analysis - Glaucoma (RIA-G) cloud-based software in 2016 to analyze fundus images for the likelihood of glaucoma.[14]

The new facts reported in this study

The major limitation of all the above models is that they did not overcome the black box dilemma. Most performed well but were less explainable in terms of how the final glaucoma diagnosis was reached. We have created an explainable AI model with good interpretability, which helps overcome the black box dilemma. It learns continuously, with regular improvement in prediction accuracy, not only identifying the condition but also predicting all detailed signs in the glaucomatous fundus with bounding boxes. In addition, this AI toolbox continuously monitors prediction accuracy and utilizes human feedback [Fig. 3] to calibrate the models, which helps identify the conditions and reduce the error rate over time.[8] Studies have noted that model performance is strongly affected by data quality.[15] Thus, TrueColor confocal fundus photographs, which yield good-quality, high-resolution images, were utilized for this research, thereby improving interpretability.

Human annotations - HITL

In this study, a few annotations were used repeatedly, such as notching, cupping, laminar dot sign, and bayoneting sign, along with RNFL defects. In the 2D graph, these annotations appear as spikes [Fig. 4a], and in the 3D graphs, the repeated annotations appear as warmer colors [Fig. 4b and c]. The distribution of the annotation bounding boxes reveals that smaller bounding boxes were used more frequently (depicted by the warmer colors). The main reason is that we used multiple human annotations, and many small, detailed signs, such as the bayoneting sign and baring of the circumlinear vessels, were annotated separately with small bounding boxes.

With this novel VoTT annotation tool, human glaucoma specialists and optometrists performed the complex labeling of the glaucomatous fundus, with every aspect of the glaucomatous disc and background fundus labeled by them. Though time-consuming, this pairs humans with the machine rather than asserting the supremacy of one over the other, thus catering to a team effort involving machines and humans. A customized, human-led data annotation process for labeling datasets can pave the path for AI training and prediction, where the pairing of humans and machines using HITL machine learning can yield good results. Thus, there is definitely a big role for HITL machine learning, especially in medical science, where it is unwise to have a black box problem. This will not only speed up machine learning but also make it more accurate, reliable, and trustworthy.[16]

You only look once version 5 - Object detection methodology

We used a YOLOv5-based object detection methodology to precisely train on the underlying glaucomatous fundus images. The YOLOv5 algorithm works accurately by drawing a bounding box around each fundus sign using its bounding box regressor.[17] This methodology can identify a glaucomatous fundus in less than a second. In addition, with continuous training, there was an increase in the number of correct predictions, resulting in incremental machine learning. The system automatically identifies glaucomatous damage along with subtle details in the predicted images [Fig. 6a-d], such as notching, cupping, laminar dot sign, and bayoneting sign, along with RNFL defects.
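A hedged sketch of such sub-second, single-image prediction and box rendering, reusing the hub-loaded model from the earlier snippet (file names remain illustrative placeholders):

```python
# Timing a single prediction and rendering the predicted bounding boxes.
import time
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="glaucoma_yolov5.pt")
start = time.time()
results = model("new_fundus_image.jpg")               # single forward pass
print(f"Inference took {time.time() - start:.3f} s")  # typically <1 s on a GPU
results.render()   # draw predicted boxes and labels onto the image array
results.save()     # write the annotated image to disk
```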

Figure 6:
(a-d) Images depicting predictions made by the trained AI module when fed new fundus images not previously seen by the tool. These images were predicted with diagnosis, severity, and all the detailed findings of glaucomatous damage

Incremental improvements from Test 1 to Test 3

The objectness loss (i.e., objects marked wrongly by the AI) was almost the same in Test 1 and Test 3 (3%). Similarly, the classification loss (i.e., classifications marked wrongly by the AI) was the same in Test 1 and Test 3 (8.5%). The success of this study was that precision increased from 22% in Test 1 to 45% in Test 3, and recall increased from 35% to 70%. The initial training showed an mAP@0.5 of 25% or below and an mAP@0.5:0.95 of 10% or below, and the final training module achieved better accuracy, with an mAP@0.5 of up to 60% and an mAP@0.5:0.95 of up to 27%. Among the 280 testing images, from Test 1 to Test 3, the proportion of correct glaucomatous predictions that included all detailed signs increased from 94.44% to 98.89%, while the proportion of images in which some of the detailed signs were wrongly predicted decreased from 5.56% to 1.11%. Although the objectness and classification losses remained the same across the tests, all the other parameters, such as precision, recall, mAP, and the glaucomatous predictions, were in favor of a good machine learning AI model.
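For reference, the precision and recall figures above follow the standard per-detection definitions, sketched here; no study-specific counts are assumed.

```python
# Standard per-detection definitions of precision and recall.
def precision(tp, fp):
    return tp / (tp + fp)   # fraction of predicted boxes that are correct

def recall(tp, fn):
    return tp / (tp + fn)   # fraction of ground-truth signs that were detected
```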

In this study, the test data and training data were different and never integrated.

The same testing dataset could have been reused for the subsequent tests, but we chose to run each of the three tests on a different, independent dataset to create a realistic practical scenario. Like the testing dataset, the training dataset was also split into three groups of 360, 400, and 360 images. In total, 360 images were trained on before Test 1 was carried out. By the time the second test was carried out, the training data (i.e., annotated images) had increased from 360 to 760 images; by the third test, it had increased to 1,120 images. A similar number of epochs was run in all the tests. Over the course of the three tests, wrong predictions decreased and correct predictions, including the detailed findings (i.e., clinical glaucomatous fundus signs), increased.

Currently, we provide this AI glaucoma detection tool free of cost along with the confocal fundus scanner; moreover, its incorporation into the fundus scanner is underway. Thus, the only prerequisite for using this tool is possession of a confocal fundus scanner, and it can be used by any ophthalmologist in both private and institutional practice. The advantage of this glaucoma AI tool is that it can be used for screening even in the absence of a glaucoma specialist. Scaling this novel system across the world would be most beneficial for developing countries and for Pacific Island countries, such as the Cook Islands, Micronesia, Nauru, Niue, and Tuvalu, where no ophthalmologists are present.[18]

Limitation and suggestions

Users of lower-resolution fundus cameras might find it challenging to achieve similar results with this AI model, owing to the quality of the images obtained. Furthermore, multimodal clinical images, such as optical coherence tomography, visual fields, and non-invasive angiography, along with fundus images, should be integrated to build a generalized and more reliable AI diagnostic system.

Conclusion

Utilizing human intelligence in AI for detecting glaucomatous fundus images using HITL machine learning has never been reported in the literature before. By employing HITL machine learning, we have created an explainable AI that overcomes the black box dilemma. This study also shows that, with constant human training, prediction accuracy can be increased via a feedback mechanism.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

Acknowledgements

We are grateful to Mr. Pragash Michael Raj, Consultant, Mahathma Centre of Moving Images Private Limited, Trichy, Tamil Nadu, India, for the compilation of the images used in this manuscript. We sincerely thank Dr. Balamurugan Ramananthan, Director, Kovai Diabetic Speciality Centre, Coimbatore, Tamil Nadu, India, for his constant support and guidance throughout this study.

References

1. Li Z, He Y, Keel S, Meng W, Chang RT, He M. Efficacy of a deep learning system for detecting glaucomatous optic neuropathy based on color fundus photographs. Ophthalmology 2018;125:1199-206.
2. Al-Aswad LA, Kapoor R, Chu CK, Walters S, Gong D, Garg A, et al. Evaluation of a deep learning system for identifying glaucomatous optic neuropathy based on color fundus photographs. J Glaucoma 2019;28:1029-34.
3. Devi S. It is time to embrace artificial intelligence. TNOA J Ophthalmic Sci Res 2021;59:231-2.
4. Cerentini A, Welfer D, Cordeiro d'Ornellas M, Pereira Haygert CJ, Dotto GN. Automatic identification of glaucoma using deep learning methods. Stud Health Technol Inform 2017;245:318-21.
5. Haleem MS, Han L, Hemert JV, Li B, Fleming A, Pasquale LR, et al. A novel adaptive deformable model for automated optic disc and cup segmentation to aid glaucoma diagnosis. J Med Syst 2017;42:20.
6. Thompson AC, Jammal AA, Medeiros FA. A deep learning algorithm to quantify neuroretinal rim loss from optic disc photographs. Am J Ophthalmol 2019;201:9-18.
7. Diabetic Retinopathy Detection. Available from: https://kaggle.com/c/diabetic-retinopathy-detection. [Last accessed on 2021 Mar 12].
8. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17:195.
9. What is Data Annotation and Why Does It Matter? Available from: https://www.telusinternational.com/articles/what-is-data-annotation. [Last accessed on 2021 Jul 01].
10. Image Annotation Tools: Which One to Pick in 2020? Available from: https://bohemian.ai/blog/image-annotation-tools-which-one-pick-2020/. [Last accessed on 2022 Feb 26].
11. Ramesh PV, Ramesh SV, Aji K, Ray P, Tamilselvan S, Parthasarathi S, et al. Modeling and mitigating human annotations to design processing systems with human-in-the-loop machine learning for glaucomatous defects: The future in artificial intelligence. Indian J Ophthalmol 2021;69:2892-4.
12. Mohammed MA, Abd Ghani MK, Arunkumar N, Hamed RI, Abdullah MK, Burhanuddin MA. A real time computer aided object detection of nasopharyngeal carcinoma using genetic algorithm and artificial neural network based on Haar feature fear. Future Gener Comput Syst 2018;89:539-47.
13. Akkara JD, Kuriakose A. Role of artificial intelligence and machine learning in ophthalmology. Kerala J Ophthalmol 2019;31:150-60.
14. Retinal Image Analysis – Glaucoma. Available from: http://www.kalpah.com/RIAG_brochure.pdf. [Last accessed on 2020 May 06].
15. Ting DSW, Pasquale LR, Peng L, Campbell JP, Lee AY, Raman R, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol 2019;103:167-75.
16. Akkara JD, Kuriakose A. Commentary: Artificial intelligence for everything: Can we trust it? Indian J Ophthalmol 2020;68:1346-7.
17. Nishad G. You Only Look Once (YOLO): Implementing YOLO in less than 30 lines of Python code. Medium 2019. Available from: https://towardsdatascience.com/you-only-look-once-yolo-implementing-yolo-in-less-than-30-lines-of-python-code-97fb9835bfd2. [Last accessed on 2020 Dec 04].
18. Resnikoff S, Lansingh VC, Washburn L, Felch W, Gauthier T-M, Taylor HR, et al. Estimated number of ophthalmologists worldwide (International Council of Ophthalmology update): Will we meet the needs? Br J Ophthalmol 2020;104:588-92.
Keywords:

Artificial Intelligence; Confocal Fundus Images; Glaucomatous Cupping; HITL; Machine Learning

Copyright: © 2022 Indian Journal of Ophthalmology