Artificial intelligence (AI) technologies promise a revolution in the way we prevent, diagnose, and manage disease. The potential for AI to transform medicine is increasingly part of popular discourse, although the origins of these technologies can be traced back to Alan Turing's research in the 1930s.1 The term “artificial intelligence” was coined by John McCarthy in 1955 to describe the science of creating intelligent machines that replicate human behavior.2 The field has many subtypes; prominent modern examples include machine learning (ML), in which systems learn automatically from data sets in the absence of explicitly programmed rules. A subset of ML is deep learning (DL), which trains itself using multiple layers of neural networks, adaptable programming units inspired by the structure of human neurons.3
Most recent progress has been made with DL, which has attracted extraordinary attention since the publication of a seminal 2012 paper on image recognition.4 Deep learning passes data through successive layers of neural networks and excels at identifying correlations within input data. Convolutional neural networks, a class of DL architecture, are particularly well suited to image recognition in highly image-driven medical specialties such as ophthalmology, dermatology, and radiology. Trained on vast quantities of labelled imaging data, convolutional neural networks have been at the forefront of AI research. Significant recent progress has been achieved in ophthalmology, as fundus photography and optical coherence tomography (OCT) are routinely used clinically to make a diagnosis. Not surprisingly, the first US Food and Drug Administration (FDA)-approved AI diagnostic device was in the field of ophthalmology.5
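To make the idea of a convolutional layer concrete, the following sketch (a hypothetical illustration, not drawn from any of the systems cited here) shows the core operation: a small kernel slides across a 2D image to produce a feature map. Real diagnostic systems stack many such learned layers.

```python
# Minimal sketch of the operation at the heart of a convolutional layer:
# a small kernel slides over a 2D image, producing a feature map.
# Illustrative only; production networks learn the kernel weights.

def conv2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1) on nested lists."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A hand-set vertical-edge kernel applied to a tiny synthetic "image"
# whose right half is brighter than its left:
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)  # responds strongly along the edge
```

The feature map peaks exactly where the dark-to-bright boundary lies, which is the sense in which stacked convolutional layers build up detectors for edges, textures, and eventually lesion-like patterns.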
Ophthalmologists, as well as other medical professionals who monitor and manage patients at risk of developing ocular disease, such as general practitioners and endocrinologists, therefore have a unique opportunity to lead the way in the adoption of these technologies. The successful clinical adoption of AI will ultimately be achieved only if it is implemented in environments which are safe, evidence-based, and meaningful in improving patient outcomes, and our experience may be instructive for medicine as a whole. Significant progress in the development of AI algorithms for ophthalmology has already been achieved and is extensively detailed by Balyen and Peto in this issue of the Journal.6 However, key challenges remain. This editorial will focus on the accuracy of these technologies and the challenges which must be resolved prior to their wide adoption within clinical care.
Rapid advances have been achieved in the performance of ML and DL algorithms, the imaging modalities they can use, and the range of diseases they can detect. Recent publications have demonstrated AI's efficacy in the diagnosis of diabetic retinopathy (DR),5,7,8 age-related macular degeneration (AMD),9,10 glaucoma,11,12 and retinopathy of prematurity.13 Gulshan et al7 in 2016 trained an AI diagnostic algorithm using 128,175 de-identified retinal images from combined US and Indian databases, achieving an area under the curve (AUC) of 0.99 for the detection of referable DR. Similarly, Li et al8 trained and tested a DL algorithm for referable DR using over 100,000 images, achieving an AUC of 0.99 on internal validation and 0.955 on external validation using an independent multi-ethnic data set. Ting et al9 in 2017 collated 494,661 retinal images to train an AI system which was externally tested in 11 multi-ethnic cohorts, achieving clinically acceptable diagnostic performance in referable DR, as well as high accuracies in the diagnosis of glaucoma and AMD, which may be concurrently present in the same eye. Following the successful development of AI algorithms using retinal photographs, De Fauw et al14 at DeepMind in 2018 applied DL to OCT, developing a model which automatically segmented tissue layers and achieved diagnostic performance meeting or exceeding that of human expert graders for 50 common ophthalmic conditions.
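The AUC figures cited above summarize how well an algorithm ranks diseased eyes above healthy ones across all possible thresholds. A minimal sketch, using hypothetical labels and model scores rather than data from any of these studies, computes AUC via its probabilistic interpretation:

```python
# AUC as the probability that a randomly chosen positive case receives
# a higher score than a randomly chosen negative case (ties count half).
# Labels and scores below are hypothetical, for illustration only.

def auc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]              # 1 = referable DR (hypothetical)
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # model outputs (hypothetical)
result = auc(labels, scores)             # 8 of 9 positive/negative pairs ranked correctly
```

An AUC of 0.99, as reported by Gulshan et al7 and Li et al8, therefore means the algorithm ranks a diseased eye above a healthy one in roughly 99 of 100 such pairings.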
Despite the promise of these research advances, significant deficits remain in the validation of these algorithms. Of note, many AI algorithms have been validated only in silico, with testing typically completed on retrospective data. Validation of diagnostic algorithms in real-world settings—with differing camera systems, operators, and image quality—and in geographically and racially diverse populations remains an emerging area.15 Algorithms have so far typically performed less accurately in prospective real-world studies than in their in silico counterparts.3 The limitations of retrospective in silico validation are increasingly being recognized, and several research groups have commenced validation of these algorithms in real-world prospective trials.5,16-18 Notably, in a recent review of AI research across all areas of medicine, prospective real-world clinical trials had been completed in only 5 disease areas, 2 of which were in ophthalmology.3 Furthermore, it is common for these algorithms to be trained and validated using retinal images given a subjective clinical label based on the consensus of multiple clinicians, rather than the ground truth diagnosis. This kind of labelling may be sufficient for straightforward tasks; however, disease classification or pattern recognition may require a more accurate ground truth based on real-world clinical data. For example, the diagnosis of glaucoma is made on the basis of several factors, such as family history, intraocular pressure, anterior chamber angle, visual field results, and optic nerve appearance. Information such as a confirmed medical history, visual field damage, or evidence of progression is a far superior ground truth for diseases such as glaucoma, and this type of high-level information will likely lead to the highest levels of diagnostic accuracy.
Significant attention has recently been devoted to the development of a regulatory framework for AI systems. Artificial intelligence algorithms are typically classified as “Software as a Medical Device” and may be assessed through the FDA's Software Precertification Program, which has recently been piloted.19 Further, AI systems are designed to continuously adapt and improve over time as they receive and train on more input data. The performance of a DR screening system, for example, may improve over time as it adapts to the unique patient characteristics of a particular real-world site. Existing regulatory approvals have been for ‘locked box’ systems in which training has been completed and diagnostic thresholds have been set, which does not account for the adaptive nature of these systems. The first-ever regulatory-approved autonomous AI diagnostic system, the IDx-DR,5 was approved through this initial pathway.20 To provide a potential framework which accounts for the evolving nature of these systems, the FDA has recently published a discussion paper for public feedback.21 Overall, the regulatory framework for AI systems is crystallizing, although elements are yet to be finalized.
To achieve FDA approval, the team behind the IDx-DR system performed a prospective cohort study of 900 participants across 10 primary care sites in the US.5,20 The system detected more-than-mild DR with a sensitivity of 87.2% and a specificity of 90.7%,5 which ultimately led to it becoming the first-ever approved autonomous AI diagnostic system. Following this precedent, prospective real-world validation trials of AI systems will be essential for future regulatory approval. Further, the uptake of these systems will rely on the creation of commercial products that can be easily deployed and used by end-users. There is a plethora of published work on the performance of these AI systems, but very few groups have developed a commercial product. Researchers from the Centre for Eye Research Australia have developed a fully functioning EyeGrader clinician interface with integrated, automated diagnosis, which produces an immediate grading report for clinicians and can function both online and offline. Unlike the IDx-DR system, EyeGrader is able to grade for 4 common blinding eye diseases: referable DR,8 suspect glaucoma,11 late wet AMD, and cataract. The feasibility and acceptability of the EyeGrader system have been pilot-tested in real-world settings in Australia with great success: over 95% of screened patients reported that they were either satisfied or very satisfied with the automated screening model.17 Early analysis of in-depth interviews suggests that the system is well accepted by clinicians and has the potential to improve overall patient care (J Scheetz and MK He, unpublished).
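Sensitivity and specificity, the metrics reported for the IDx-DR trial, derive directly from a 2×2 confusion matrix. A minimal sketch, using hypothetical screening counts rather than the trial's actual data:

```python
# Sensitivity = TP / (TP + FN): the fraction of diseased patients detected.
# Specificity = TN / (TN + FP): the fraction of healthy patients correctly cleared.
# The counts below are hypothetical, chosen only to illustrate the arithmetic.

def sensitivity_specificity(tp, fn, tn, fp):
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sensitivity_specificity(tp=87, fn=13, tn=91, fp=9)
# 87 of 100 diseased detected -> sensitivity 0.87
# 91 of 100 healthy cleared   -> specificity 0.91
```

For a screening system, a false negative (a missed case of referable DR) and a false positive (an unnecessary referral) carry different costs, which is why both metrics are reported and thresholds are set deliberately.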
As discussed above, AI algorithms can carry out repeatable tasks with great accuracy, speed, and consistency, including the automated diagnosis of fundus images and OCT scans. However, thought must be given to which clinical settings are most suitable for the adoption of AI systems, given that close to 50% of eye diseases remain undiagnosed in countries such as Australia.22,23 AI systems are likely to be of most benefit in rural and remote centers where access to specialist care is limited, and in the opportunistic screening of at-risk individuals by non-eye health care professionals such as general practitioners and endocrinologists, since many common blinding eye diseases are asymptomatic in their early stages. Whilst retinal imaging is an effective screening tool for the detection of many ocular conditions, it is highly dependent on image interpretation by clinical experts. The introduction of automated retinal grading has the potential to remove the barriers associated with previous retinal screening models. Such systems will give non-eye health care professionals the capacity to detect disease in patients who have previously had no access to ophthalmic care or who are in the early, asymptomatic stages of disease. This in turn will lead to more timely treatment and potentially reduce the burden of visual impairment and blindness.
Whilst there are many potential benefits to the introduction of AI systems, significant ethical and legal considerations remain, particularly surrounding the possible harm which may arise from these systems. Iatrogenic harm resulting from a flawed algorithm deployed across entire populations could be immense. A key component of addressing these concerns is understanding how an AI algorithm reaches its decision. The immense complexity of AI algorithms means that the rationale for how these systems generate outputs may often be neither interpretable nor understood; this is known as the ‘black box’ problem. This opacity, particularly if an algorithm reaches a false conclusion, may limit the ability of end-users to identify root causes and prevent errors from recurring. Concern about the intelligibility of AI decision-making may thus constrain clinical acceptance and adoption. Significant research efforts are underway to visualize and interpret the features DL models evaluate to produce their outputs, in an effort to address these concerns around interpretability and clinician acceptance.12,24
As a discipline, ophthalmology is leading the way in the use of AI in medicine. The review by Balyen and Peto6 importantly highlights the ever-evolving use of AI and its potential clinical use in diagnosing ophthalmic disease. Significant challenges remain in translating recent research advances into changes to routine clinical care; however, clinician and patient acceptance for the potential of these technologies is growing.17 Ophthalmologists' experience and leadership may be instructive for the medical community as a whole in harnessing the extraordinary potential for AI to improve patient care.
1. Turing AM. On computable numbers, with an application to the Entscheidungsproblem. Proc London Math Soc
2. McCarthy J, Minsky ML, Rochester N, et al. A proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955. AI Magazine
3. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med
4. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst
5. Abràmoff MD, Lavin PT, Birch M, et al. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med
6. Balyen L, Peto T. Promising artificial intelligence-machine learning-deep learning algorithms in ophthalmology. Asia Pac J Ophthalmol (Phila)
7. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA
8. Li Z, Keel S, Liu C, et al. An automated grading system for detection of vision-threatening referable diabetic retinopathy on the basis of color fundus photographs. Diabetes Care
9. Ting DSW, Cheung CY, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA
10. Kermany DS, Goldbaum M, Cai W, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell
11. Li Z, He Y, Keel S, et al. Efficacy of a deep learning system for detecting glaucomatous optic neuropathy based on color fundus photographs. Ophthalmology
12. Keel S, Wu J, Lee PY, et al. Visualizing deep learning models for the detection of referable diabetic retinopathy and glaucoma. JAMA Ophthalmol. 2018 Dec 20. Epub ahead of print.
13. Brown JM, Campbell JP, Beers A, et al. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol
14. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med
15. Keane PA, Topol EJ. With an eye to AI and autonomous diagnosis. NPJ Digit Med
16. Kanagasingam Y, Xiao D, Vignarajan J, et al. Evaluation of artificial intelligence-based grading of diabetic retinopathy in primary care. JAMA Netw Open
17. Keel S, Lee P, Scheetz J, et al. Feasibility and patient acceptability of a novel artificial intelligence-based screening model for diabetic retinopathy at endocrinology outpatient services: a pilot study. Sci Rep
21. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD): Discussion Paper and Request for Feedback. FDA Web site. https://www.fda.gov/media/122535/download. Accessed May 22, 2019.
22. Chua BE, Xie J, Arnold AL, et al. Glaucoma prevalence in Indigenous Australians. Br J Ophthalmol
23. Tapp RJ, Shaw JE, Harper CA, et al. The prevalence of and factors associated with diabetic retinopathy in the Australian population. Diabetes Care
24. Grassmann F, Mengelkamp J, Brandl C, et al. A deep learning algorithm for prediction of age-related eye disease study severity scale for age-related macular degeneration from color fundus photography. Ophthalmology