Pegasus outperformed five of the six ophthalmologists in terms of diagnostic performance, and there was no statistically significant difference between the deep learning system and the “best case” consensus among the ophthalmologists. The agreement between Pegasus and the gold standard was 0.715, while the highest ophthalmologist agreement with the gold standard was 0.613. Furthermore, the high sensitivity of Pegasus makes it a valuable tool for screening patients with glaucomatous optic neuropathy.
To evaluate the performance of a deep learning system for the identification of glaucomatous optic neuropathy.
Six ophthalmologists and the deep learning system, Pegasus (Visulytix Ltd., London, UK), graded 110 color fundus photographs in this retrospective single-center study. Patient images were randomly sampled from the Singapore Malay Eye Study (SiMES). The ophthalmologists and Pegasus were compared to each other and to the original clinical diagnosis given in SiMES, which was defined as the gold standard. Pegasus’ performance was also compared to the “best case” consensus scenario, defined as the combination of ophthalmologists whose consensus opinion most closely matched the gold standard. The performance of the ophthalmologists and Pegasus in the binary classification of non-glaucoma versus glaucoma from fundus photographs was assessed in terms of sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC), and intra- and inter-observer agreements were determined.
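The performance metrics named above have standard definitions, which can be sketched in plain Python. The data in this example is purely illustrative and is not taken from the study; the function names are our own.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    y_true and y_pred are lists of 0/1 labels (1 = glaucoma).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """AUROC via its Mann-Whitney interpretation: the probability that a
    randomly chosen positive case receives a higher score than a randomly
    chosen negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels and scores only (not study data):
sens, spec = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0])
area = auroc([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.2])
```

Note that sensitivity and specificity are computed from hard classifications at a single operating point, whereas AUROC summarizes performance across all thresholds of a continuous score; this is why a grader's AUROC and its sensitivity/specificity pair can rank differently.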
Pegasus achieved an AUROC of 92.6%, compared to ophthalmologist AUROCs that ranged from 69.6% to 84.9% and a “best case” consensus scenario AUROC of 89.1%. Pegasus had a sensitivity of 83.7% and a specificity of 88.2%, whereas the ophthalmologists’ sensitivity ranged from 61.3% to 81.6% and specificity ranged from 80.0% to 94.1%. The agreement between Pegasus and the gold standard was 0.715, while the highest ophthalmologist agreement with the gold standard was 0.613. Intra-observer agreement ranged from 0.62 to 0.97 for the ophthalmologists and was perfect (1.00) for Pegasus. The deep learning system required approximately 10% of the time taken by the ophthalmologists to determine a classification.
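The abstract does not name the agreement statistic; assuming it is a chance-corrected coefficient such as Cohen's kappa (a common choice for grader-agreement studies), it can be sketched as follows. The labels here are illustrative only, not study data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance, (p_o - p_e) / (1 - p_e).

    rater_a and rater_b are equal-length lists of categorical labels.
    """
    n = len(rater_a)
    # Observed agreement: fraction of cases where the two raters match.
    p_o = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(rater_a) | set(rater_b)
    p_e = sum((rater_a.count(lab) / n) * (rater_b.count(lab) / n)
              for lab in labels)
    return (p_o - p_e) / (1 - p_e)


# Illustrative gradings only (1 = glaucoma, 0 = non-glaucoma):
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Under this reading, the perfect intra-observer agreement of 1.00 for Pegasus simply reflects that the system is deterministic: regrading the same photograph always yields the same label.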
Pegasus outperformed five of the six ophthalmologists in terms of diagnostic performance, and there was no statistically significant difference between the deep learning system and the “best case” consensus among the ophthalmologists. The high sensitivity of Pegasus makes it a valuable tool for screening patients with glaucomatous optic neuropathy. Future work will extend this study to a larger sample of patients.
*Columbia University Medical Center, Harkness Eye Institute, New York, NY, USA
†Visulytix Ltd, Screenworks 22 Highbury Grove, Highbury East, London, N5 2EF, United Kingdom
‡King’s College Hospital NHS Foundation Trust, Denmark Hill, London, SE5 9RS, United Kingdom
Financial Support: This research was supported partially by Save the Vision Foundation and Research to Prevent Blindness Foundation. It did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The software Pegasus (Visulytix Ltd., London UK) was provided, free of charge, for the purposes of this research.
Conflict of Interest: Rogers TW and Jaccard N are employees of Visulytix. Trikha S has stock options and gets speakers fees from Visulytix. The remaining authors declare that there are no conflicts of interest related to this article.
Reprints: Lama A. Al-Aswad, MD, MPH, Harkness Eye Institute, Columbia University, 635 W. 165th Street, New York, NY 1002 (e-mail: email@example.com).
Received March 27, 2019
Accepted June 15, 2019