‘Big Data’ is a term often informally used in health care to describe any dataset larger than a few thousand data points. However, big data is not defined by sample size cutoffs; rather, the defining challenge of ‘big data’ in the modern age is that datasets are now so large that storing or processing them strains the computational capacity of a single computer and requires more specialized computing solutions. In health care, big datasets may have many rows comprising the records of millions of patients, and/or many columns corresponding to the multiple features captured for each patient. Furthermore, besides sheer volume, big data is often characterized by velocity (speed of acquisition) and by variety, which in health care often implies multimodal data: a combination of images, text, and structured fields.
The volume, velocity, and variety of big data in health care combined with the recent leaps in data storage and processing technologies have created a fertile environment for the development of artificial intelligence models to perform a wide variety of classification and prediction tasks. Ophthalmology in particular has many quantitative structured data fields, such as visual acuity and intraocular pressure, meshed with structural imaging and functional measurements, leading to an era where ophthalmology is at the forefront of artificial intelligence in medicine . In the present review, we summarize how big data and associated technologies have evolved to this critical juncture, describe the current frontier of standards and technologies for big data in artificial intelligence, and discuss what is needed for big data to enable the next generation of artificial intelligence.
Current capabilities enabled by data standards and artificial intelligence hardware technologies
Previously, some of the biggest data in ophthalmology research came from national government-sponsored studies, insurance claims, large clinical trials [5,6], cohort studies [7,8], and individual institutional registries. However, scalable methods of analysis and data aggregation were not widespread. The increasing adoption of electronic health records (EHRs) since 2008, driven by the Meaningful Use program of the HITECH Act, and the simultaneous development and widespread adoption of standards for health data exchange have enabled the aggregation of health data from multiple sources, creating a rich new environment for big data and artificial intelligence.
Standards for health data exchange define formats and fields for storing health data from different health systems and sources, thereby enabling the syntactic interoperability required for machines to read data from different sources using common conventions. Digital Imaging and Communications in Medicine (DICOM), established in 1993, is the premier standard for radiology and ophthalmology imaging. Standards for EHR data have undergone significant evolution through successive versions of the Health Level-7 guidance and the development of the new Fast Healthcare Interoperability Resources (FHIR), the next-generation standards framework for EHR data, which supports the latest web service technologies. FHIR organizes data into modular components called resources, designed to be simultaneously flexible, interoperable, and easy to implement. These standard data exchange formats enable data to be aggregated from different sources into ever larger datasets, which, in turn, enable the development of powerful artificial intelligence predictive algorithms that can be validated across different settings, for example, predicting in-hospital mortality, length-of-stay, and readmissions across different healthcare systems using a common deep-learning approach [14▪▪].
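To make the idea of a FHIR resource concrete, the following minimal sketch represents a single intraocular pressure measurement as a FHIR-style Observation and reads the value back out. The field names follow the general shape of the FHIR Observation resource; the patient reference, the coding system URL, and the code `iop-example` are hypothetical placeholders, not real terminology codes.

```python
import json

# A FHIR-style Observation as JSON text, as it might arrive from another
# system. Identifiers and codes are hypothetical placeholders.
OBSERVATION_JSON = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [
      {"system": "http://example.org/codes", "code": "iop-example",
       "display": "Intraocular pressure, right eye"}
    ]
  },
  "subject": {"reference": "Patient/example-123"},
  "effectiveDateTime": "2020-03-16",
  "valueQuantity": {"value": 18, "unit": "mmHg"}
}
"""


def read_quantity(resource: dict) -> tuple:
    """Return (display, value, unit) from an Observation-shaped dict."""
    if resource.get("resourceType") != "Observation":
        raise ValueError("expected an Observation resource")
    display = resource["code"]["coding"][0]["display"]
    quantity = resource["valueQuantity"]
    return display, quantity["value"], quantity["unit"]


observation = json.loads(OBSERVATION_JSON)
print(read_quantity(observation))
```

Because every source that follows the same resource shape can be read by the same function, records from different systems can be pooled without source-specific parsing code.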
Training an artificial intelligence model on such large-scale data requires intensive computing resources. Options now extend beyond local on-premise supercomputing hardware to cloud-based computing and storage infrastructures, many of which include security measures enabling the storage and processing of protected health information in a manner compliant with the Health Insurance Portability and Accountability Act (HIPAA). These cloud-based solutions make it possible to house and analyze data on a large scale and thus to train powerful artificial intelligence models. Once a model is trained, the process of using that model to generate predictions is called inference. Inference hardware is relatively inexpensive and becoming ubiquitous in some places: smartphones’ voice-activated assistants use inference to recognize commands, as do image search, spam filtering, and product recommendation applications. In hospitals and healthcare settings, however, this type of infrastructure is not yet optimized, and it will be needed to realize the future benefits of deploying artificial intelligence models to support and enhance clinical treatment decisions.
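The distinction between training and inference can be illustrated with a deliberately tiny sketch: a one-feature logistic model fitted by gradient descent (training), then applied with fixed parameters to new inputs (inference). The data, learning rate, and number of epochs are invented for illustration and bear no relation to any clinical model.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, learning_rate=0.5, epochs=2000):
    """Training: fit the weight and bias of a one-feature logistic model."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x / n
            grad_b += error / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def infer(model, x):
    """Inference: apply the fixed, already-trained parameters to a new input."""
    w, b = model
    return sigmoid(w * x + b)


# Invented toy data: small inputs are labeled 0, large inputs are labeled 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]
model = train(xs, ys)               # the expensive, repeated-pass step
print(infer(model, 0.0), infer(model, 3.0))  # the cheap, single-pass step
```

Training loops over the whole dataset many times, whereas inference is a single evaluation per input, which is why inference can run on far more modest hardware.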
Requirements for reproducible science in big data for artificial intelligence
To maximize the potential of big data and artificial intelligence in ophthalmology, several critical needs must be addressed going forward. An important minimum standard for scientific research is producing reproducible results. In training artificial intelligence models on big data, reproducibility depends critically on the following: how the data used to train the artificial intelligence model are labeled, the underlying structure and characteristics of the data, and the details of the model architecture. Each of these components is a vital area to consider when developing and evaluating artificial intelligence models using big data.
Supervised artificial intelligence models are trained for prediction or classification tasks on data that are labeled with the ‘ground truth’ or ‘gold standard’ classification; thus, there is a critical need for ophthalmologists to develop standardized definitions and classification approaches. For example, when diagnosing glaucoma or determining glaucoma-related outcomes, many glaucoma specialists rely upon visual field testing, disc morphology, or optical coherence tomography (OCT) findings to varying degrees. Therefore, for research studies, adjudication of glaucoma outcomes is often carried out by consensus of a panel of glaucoma specialists, with work ongoing to develop standardized research definitions. Similarly, for grading retinal images, some artificial intelligence studies also use a panel of graders, as is done in clinical trials. However, using a consensus panel to grade one particular dataset does not harmonize potentially idiosyncratic definitions across different datasets.
While reference standards for diseases across different studies would be ideal, they are challenging to develop and deploy in any research setting, and especially challenging in big data for artificial intelligence. Ideally, reference standards are based on clinical outcome, which can be challenging in the case of chronic diseases [18▪▪]. Applying strict and complex grading criteria that require the consensus of ophthalmologic specialists is very expensive in time, effort, and dollars, even for the relatively small scale of clinical trials, and often infeasible on the scale of modern big data. Complex diagnostic criteria often require many disparate data components, such as history, examination, and imaging elements. Observational data sources inevitably include patients who are missing data for some of these elements, and those missing data are unlikely to be missing at random. For example, patients may be missing OCTs, visual fields, or lab measurements in a pattern which is systematically related to their suspected disease . EHRs also introduce unique challenges, since data fields may differ across EHR systems, both for data input and storage, and providers may use different terminology even within a single EHR system (e.g., corneal superficial punctate keratitis vs. corneal punctate epithelial erosions to represent the same clinical finding). These variations often necessitate custom disease definitions for each study based on the availability and content of the data fields. Thus, there is also a need to standardize how definitions are developed to label big data and the criteria for judging whether the definitions are acceptable for big data that are used to train artificial intelligence algorithms. These criteria are likely to differ from those used to judge clinical trials because they must balance feasibility with quality and consensus. 
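One simple way to probe whether data are plausibly missing at random is to compare missingness rates across clinically meaningful groups. The sketch below does this for a handful of invented records; the field names `suspected_glaucoma` and `oct_rnfl_um` and all values are hypothetical.

```python
from collections import defaultdict

# Invented toy records: whether glaucoma was suspected, and the OCT value if any.
records = [
    {"suspected_glaucoma": True, "oct_rnfl_um": 72.0},
    {"suspected_glaucoma": True, "oct_rnfl_um": 80.0},
    {"suspected_glaucoma": True, "oct_rnfl_um": 68.0},
    {"suspected_glaucoma": True, "oct_rnfl_um": None},
    {"suspected_glaucoma": False, "oct_rnfl_um": None},
    {"suspected_glaucoma": False, "oct_rnfl_um": None},
    {"suspected_glaucoma": False, "oct_rnfl_um": 95.0},
    {"suspected_glaucoma": False, "oct_rnfl_um": None},
]


def missing_rate_by_group(rows, group_key, value_key):
    """Fraction of rows with a missing value, computed separately per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in rows:
        tally = counts[row[group_key]]
        tally[0] += row[value_key] is None
        tally[1] += 1
    return {group: missing / total for group, (missing, total) in counts.items()}


rates = missing_rate_by_group(records, "suspected_glaucoma", "oct_rnfl_um")
print(rates)
```

In this toy example, OCT is missing far more often when glaucoma was not suspected, the kind of pattern that suggests testing was ordered because of the suspected disease and that the missing values should not be treated as random.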
Successful scaling of big data artificial intelligence will require leadership to standardize, where possible, the consensus definitions for disease diagnoses so they can be used to generate high quality, ‘ground truth’ labels, and standardize how consensus definitions are developed for big data studies.
To sidestep the issue of using complex ‘ground truth’ disease definitions as labels, some studies use different types of proxy or surrogate outcomes which are more readily available in the dataset. Examples include using one type of imaging modality to predict the parameters of another modality: predicting various retinal nerve fiber layer OCT parameters from fundus photos [20▪,21▪] and using OCT to predict other types of imaging outcomes [22,23], or predicting future results from past results, as when predicting future visual fields from prior visual fields, or other biomarkers validated against true outcome [24▪,25,26]. Prediction labels may also be a proxy based on cost or utilization, as when predicting high-cost or high-utilization patients, which is convenient when many sources of big data in health care include only such administrative data and not clinical outcomes. However, caution must be exercised before deploying such algorithms for decision support, as there may be underlying societal biases that affect healthcare access, utilization, and cost [27▪▪]. This speaks to the larger concern that predicting health outcomes requires a careful understanding of those outcomes in the context of the system being studied and recognition that predicting health outcomes in one system may not generalize to other systems.
Data sharing is an important mechanism to ensure that, after the expensive process of data curation and high-quality labeling is completed, results from artificial intelligence models can be reliably reproduced and the data can be used by other researchers, thereby achieving the widest possible impact by vastly multiplying the number of studies that can be performed. The FAIR guiding principles for data stewardship and management provide a framework for ensuring that data are Findable, Accessible, Interoperable, and Reusable.
Although there has been progress in development of data interoperability standards, many challenges remain in the actual process of data sharing and data reuse. Big data today is often observational, raw, and uncurated or minimally curated. Using data exchange standards does not guarantee semantic interoperability: data from different sources may be stored in the same format, but its meaning may differ depending on the context of how and from whom it was collected. Discrepancies may exist in data formatting and/or units of measure. In ophthalmology, for example, intraocular pressure may be collected by a wide variety of methods, by different types of providers, and under different conditions, all of which may affect the recorded data. Visual acuity is even more widely variable, with the possibility to measure using different methods, under different lighting conditions, with different types of correction, and with varying notation (‘20/400’ vs. ‘400’). For data to be reusable, methods for data cleaning, harmonization, and validation should be standardized and fully transparent. Careful annotation with metadata describing the provenance of different data fields remains an important component of data sharing. A reasonable balance must be struck between validating the underlying data, which is expected to contain some errors, and manually validating the entire dataset, which is usually infeasible and would defeat the advantages of data analysis at scale.
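As an illustration of transparent harmonization, the sketch below converts the two visual acuity notations mentioned above (‘20/400’ and ‘400’) to a common logMAR scale, using logMAR = log10(denominator/numerator). The function name and the rule that a bare number is the denominator of a 20-foot Snellen fraction are assumptions made for this example; such a rule would need to be checked against each data source.

```python
import math
import re

SNELLEN_FRACTION = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*$")
BARE_DENOMINATOR = re.compile(r"^\s*(\d+)\s*$")


def to_logmar(raw: str, assumed_numerator: int = 20) -> float:
    """Convert a Snellen entry such as '20/400' or '400' to logMAR.

    A bare number is ASSUMED to be the denominator of a 20-foot Snellen
    fraction; that assumption must be verified for each data source.
    """
    fraction = SNELLEN_FRACTION.match(raw)
    if fraction:
        numerator, denominator = int(fraction.group(1)), int(fraction.group(2))
    else:
        bare = BARE_DENOMINATOR.match(raw)
        if not bare:
            # Refuse to guess: unparsed entries should be reviewed, not dropped silently.
            raise ValueError(f"unrecognized visual acuity notation: {raw!r}")
        numerator, denominator = assumed_numerator, int(bare.group(1))
    if numerator == 0 or denominator == 0:
        raise ValueError(f"invalid visual acuity value: {raw!r}")
    return math.log10(denominator / numerator)


for entry in ["20/400", "400", "20/20"]:
    print(entry, round(to_logmar(entry), 2))
```

Raising an error on unrecognized notation, rather than returning a default, keeps the cleaning step auditable: every value that reaches the analysis has passed through a documented rule.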
Finally, data sharing is subject to privacy concerns. While imaging may be more easily deidentifiable and shareable, EHR data are relatively difficult to deidentify  and suffer from the risk of possible reidentification . Furthermore, storing data in HIPAA-compliant environments requires thoughtful management, including safeguards against data unmasking as investigators potentially use different platforms for data joining or extract and snapshot data from one system to another. Despite the challenges in data sharing, it is absolutely crucial to support and incentivize mechanisms for secure data sharing to enable continued innovation in big data and artificial intelligence for ophthalmology in the academic sector. If the largest and highest quality datasets become proprietary or exclusive, it will become exceedingly challenging to support continued innovation outside of those groups. In the ideal world, all those with access to clinical data could become data stewards, ensuring the safety of the data and the use of the data for the public good, enabling the development of knowledge and tools that benefit future patients as widely as possible [31▪].
Common code and application programming interfaces
When data cannot be shared due to privacy concerns, the code that specifies the artificial intelligence models and the innumerable underlying data preprocessing decisions should be made public to best enable reproducible research. There are several different levels of sharing of artificial intelligence models, enabling different degrees of reproduction. A bare minimum requires sharing the model architecture as code, without the trained weights or parameters. The next level includes sharing the model architecture and the training code itself, so that other researchers can train their own weights using different data. Sharing the training code would include sharing the many hidden parameters that can affect model performance [32▪], such as how the data are split into batches, initialization of terms, and rates of learning, thus enabling other researchers to more closely mimic the original research. The highest level of sharing, short of including the full original data, would include the model architecture, training code, and the trained weights with inference code, which would allow other researchers to plug in their own data to make predictions based on the original artificial intelligence model. While standards for how to share artificial intelligence models are nascent, Open Neural Network Exchange, a community project that specifies an open format to represent machine-learning and deep-learning models , provides one approach to model sharing. Beyond enabling reproducible research, code that is made publicly available can have additional uses in contributing to research endeavors beyond its original purpose, becoming a resource for further collaborative innovation.
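The ‘hidden parameters’ described above can be shared explicitly by saving them in a machine-readable file next to the training code. The following is a minimal sketch under stated assumptions: the class `TrainingConfig`, its fields, and the label `example-init` are hypothetical, and a real training pipeline would record many more settings.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Settings that are easy to omit from a paper but can affect results."""
    seed: int = 0
    batch_size: int = 4
    learning_rate: float = 0.001
    weight_init: str = "example-init"  # hypothetical label for the initialization scheme


def make_batches(n_examples: int, config: TrainingConfig) -> list:
    """Shuffle example indices with a seeded generator and split them into batches."""
    indices = list(range(n_examples))
    random.Random(config.seed).shuffle(indices)
    size = config.batch_size
    return [indices[i:i + size] for i in range(0, n_examples, size)]


config = TrainingConfig(seed=42, batch_size=4)
saved = json.dumps(asdict(config), sort_keys=True)  # shared alongside the code
restored = TrainingConfig(**json.loads(saved))

# The same configuration reproduces the same assignment of examples to batches.
print(make_batches(10, config) == make_batches(10, restored))
```

Another group loading the saved configuration obtains the same batch split, removing one source of unexplained variation when attempting to reproduce a result.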
Many complex artificial intelligence models also depend upon specific underlying hardware configurations to execute. Standards for application program interfaces (APIs) to access artificial intelligence models would be enormously helpful in enabling researchers at different sites to access common artificial intelligence models while retaining full control over their own data. Standardization of artificial intelligence APIs would enable validation of artificial intelligence models across different sites and allow the training of better models based on more diverse data. Realizing the promise of artificial intelligence models also requires addressing the challenges of implementing and integrating the use of artificial intelligence models into clinical workflow. Here, too, standards for interfacing with APIs are critically needed in order for locally generated data to be able to interface with artificial intelligence models. APIs for access to artificial intelligence models could and should take advantage of the standardized data formats now available (such as FHIR or DICOM) in specifying the format of data inputs for widest possible use. For example, the Substitutable Medical Applications, Reusable Technologies (SMART) on FHIR API is an open, free, and standards-based API that aims to enable the creation of an ecosystem of apps built on access to EHR data [34,35]. While most SMART on FHIR apps are not specifically in the artificial intelligence space, artificial intelligence-based applications can use a similar approach for access to their artificial intelligence models. As with data sharing, the challenge in model sharing is to continue to support academic innovation by promoting openness, interoperability, and standardized APIs, thereby accruing the widest possible benefits of artificial intelligence and preventing artificial intelligence models from becoming ‘locked down’ into separate inaccessible silos as we have seen with EHRs. 
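A standardized inference interface might accept a resource in a standard format and return a prediction labeled with the model name and version. The sketch below is purely illustrative: `handle_request`, `InferenceResponse`, and `placeholder_model` are hypothetical names, the payload only mimics the shape of a FHIR Observation, and nothing here reflects the actual SMART on FHIR API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InferenceResponse:
    model_name: str
    model_version: str
    prediction: float


def handle_request(payload: dict, model: Callable[[float], float],
                   model_name: str = "example-model",
                   model_version: str = "0.0.1") -> InferenceResponse:
    """Validate an Observation-shaped payload and return a versioned prediction."""
    if payload.get("resourceType") != "Observation":
        raise ValueError("payload must be an Observation-shaped resource")
    quantity = payload.get("valueQuantity")
    if not quantity or "value" not in quantity:
        raise ValueError("payload is missing valueQuantity.value")
    score = model(float(quantity["value"]))
    return InferenceResponse(model_name, model_version, score)


def placeholder_model(value: float) -> float:
    """Stand-in for a trained model; NOT a clinical rule."""
    return min(max(value / 100.0, 0.0), 1.0)


response = handle_request(
    {"resourceType": "Observation", "valueQuantity": {"value": 25, "unit": "mmHg"}},
    placeholder_model,
)
print(response)
```

Because the caller sends only the fields the model needs and receives a versioned result, each site keeps control of its own data while results from different sites remain comparable.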
In ophthalmology in particular, efficient real-time inference workflows are important because the time between image acquisition and the need for artificial intelligence inference results is short compared with other fields of medicine, such as radiology or pathology, where interpretation traditionally happens asynchronously.
Conclusion
The environment for big data and artificial intelligence is evolving rapidly, supported by the adoption of EHRs, standards for health data exchange, and modern technologies for storing and processing large datasets. Future requirements for big data and artificial intelligence include fostering reproducible science, continuing open innovation, and supporting the clinical use of artificial intelligence. Researchers, academic institutions, healthcare companies, and regulatory agencies should work together to achieve these goals by promoting data label standards, data sharing, standards for sharing artificial intelligence model architecture, and accessible code and APIs.
Acknowledgements
The authors would like to thank Dr Marian Blazes (University of Washington) for her help in editing this article.
Financial support and sponsorship
NIH/NEI K23EY029246, NIH P30EY10572; an unrestricted grant from Research to Prevent Blindness; T15 LM 007033 (S.Y.W.). The sponsors/funding organizations had no role in the design or conduct of this research.
Conflicts of interest
A.Y.L. is an employee of the US FDA, has received research funding from Novartis, Santen, and Carl Zeiss Meditec, and has acted as a consultant for Genentech, Topcon, and Verana Health. The opinions expressed in this article are the author's own and do not reflect the view of the FDA or the US government.
AAO Medical Information Technology Committee Members: A.Y.L., MD, MSCI (Co-Chair), Department of Ophthalmology, University of Washington, Seattle, Washington; Thomas S. Hwang, MD (Co-Chair), Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon; James D. Brandt, MD, Department of Ophthalmology & Vision Science, University of California, Davis; Kelly Chung, MD, Oregon Eye Specialists, Portland, Oregon; Nikolas London, MD, Retina Consultants San Diego, Poway, California; April Maa, MD, Department of Veteran Affairs, VISN 7 Regional Telehealth Services, Atlanta, Georgia; Department of Ophthalmology, Emory University School of Medicine, Atlanta, Georgia; S.P., MD, MS, Department of Ophthalmology, Byers Eye Institute, Stanford University; Jessica Peterson, MD, MPH, American Academy of Ophthalmology, San Francisco, California.
A.Y.L., US FDA (E), Genentech (C), Topcon (C), Verana Health (C), Santen (F), Novartis (F), Carl Zeiss Meditec (F). April Maa, Click Therapeutics (C), Warby Parker (C). S.P., Verana Health (C), Acumen LLC (C).
AAO Task Force on Artificial Intelligence Members: Michael F. Chiang, MD (Chair), Departments of Ophthalmology and Medical Informatics & Clinical Epidemiology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon; Michael D. Abràmoff, MD, PhD, Retina Service, Departments of Ophthalmology and Visual Sciences, Electrical and Computer Engineering, and Biomedical Engineering, University of Iowa, Iowa City, Iowa; IDx, Coralville, Iowa; J. Peter Campbell, MD, MPH, Department of Ophthalmology, Oregon Health & Science University, Portland, Oregon; Pearse A. Keane, MD, FRCOphth, Institute of Ophthalmology, University College London, London; Medical Retina Service, Moorfields Eye Hospital NHS Foundation Trust, London; A.Y.L., MD, MSCI, Department of Ophthalmology, University of Washington, Seattle, Washington; Flora C. Lum, MD, American Academy of Ophthalmology, San Francisco, California; Louis R. Pasquale, MD, Eye and Vision Research Institute, Icahn School of Medicine at Mount Sinai, New York, New York; Michael X. Repka, MD, MBA, Wilmer Eye Institute, Johns Hopkins University, Baltimore, Maryland; Rishi P. Singh, MD, Cole Eye Institute, Cleveland Clinic, Cleveland, Ohio; Daniel Ting, MD, PhD, Singapore National Eye Center, Duke-NUS Medical School, Singapore.
Michael D. Abràmoff, IDx (I,F,E,P,S), Alimera (F). J. Peter Campbell, Genentech (F). A.Y.L., US FDA (E), Genentech (C), Topcon (C), Verana Health (C), Santen (F), Novartis (F), Carl Zeiss Meditec (F). Louis R. Pasquale, Verily (C), Eyenovia (C), Nicox (C), Bausch + Lomb (C), and Emerald Bioscience (C). Pearse A. Keane, DeepMind Technologies (C), Roche (C), Novartis (C), Apellis (C), Bayer (F), Allergan (F), Topcon (F), Heidelberg Engineering (F). Daniel Ting, EyRIS (IP), Novartis (C), Ocutrx (I, C), Optomed (C). Rishi Singh, Alcon (C), Genentech (C), Novartis (C), Apellis (F), Bayer (C), Carl Zeiss Meditec (C), Aerie (F), Graybug (F), Regeneron (C).
REFERENCES AND RECOMMENDED READING
Papers of particular interest, published within the annual period of review, have been highlighted as:
▪ of special interest
▪▪ of outstanding interest
1. Mooney SJ, Westreich DJ, El-Sayed AM. Commentary: epidemiology in the era of big data. Epidemiology 2015; 26:390–394.
2. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019; 25:44–56.
3. Qiu M, Wang SY, Singh K, Lin SC. Racial disparities in uncorrected and undercorrected refractive error in the United States. Invest Ophthalmol Vis Sci 2014; 55:6996–7005.
4. Stein JD, Lum F, Lee PP, et al. Use of healthcare claims data to study patients with ophthalmologic conditions. Ophthalmology 2014; 121:1134–1141.
5. Chew EY, Clemons T, SanGiovanni JP, et al. AREDS2 Research Group. The Age-Related Eye Disease Study 2 (AREDS2): study design and baseline characteristics (AREDS2 report number 1). Ophthalmology 2012; 119:2282–2289.
6. Age-Related Eye Disease Study Research Group. The Age-Related Eye Disease Study (AREDS): design implications. AREDS report no 1. Control Clin Trials 1999; 20:573–600.
7. Kang JH, Wu J, Cho E, et al. Contribution of the nurses’ health study to the epidemiology of cataract, age-related macular degeneration, and glaucoma. Am J Public Health 2016; 106:1684–1689.
8. Mitchell P, Smith W, Attebo K, Wang JJ. Prevalence of age-related maculopathy in Australia. The Blue Mountains Eye Study. Ophthalmology 1995; 102:1450–1460.
9. Tan JCK, Ferdi AC, Gillies MC, Watson SL. Clinical registries in ophthalmology. Ophthalmology 2019; 126:655–662.
10. Office of the National Coordinator for Health Information Technology. Office-based physician electronic health record adoption. Health IT Quick-Stat #50, 2019. https://dashboard.healthit.gov/quickstats/pages/physician-ehr-adoption-trends.php [Accessed 16 March 2020].
11. Adler-Milstein J, Jha AK. HITECH act drove large gains in hospital electronic health record adoption. Health Aff 2017; 36:1416–1422.
12. Parisot C. The DICOM standard. Int J Card Imaging 1995; 11:171–177.
13. Bender D, Sartipi K. HL7 FHIR: an agile and RESTful approach to healthcare information exchange. In: Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems; 2013:326–331.
14▪▪. Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med 2018; 1:18.
15. Panth D, Mehta D, Shelgaonkar R. A survey on security mechanisms of leading cloud service providers. Int J Comput Appl Technol 2014; 98:34–37.
16. Iyer J, Vianna JR, Chauhan BC, Quigley HA. Toward a new definition of glaucomatous optic neuropathy for clinical research. Curr Opin Ophthalmol 2020; 31:85–90.
17. Keel S, Li Z, Scheetz J, et al. Development and validation of a deep-learning algorithm for the detection of neovascular age-related macular degeneration from colour fundus photographs. Clin Experiment Ophthalmol 2019; 47:1009–1018.
18▪▪. Abràmoff MD, Tobey D, Char DS. Lessons learned about autonomous AI: finding a safe, efficacious, and ethical path through the development process. Am J Ophthalmol 2020; 214:134–142.
19. Belin TR. Missing data: what a little can do, and what researchers can do in response. Am J Ophthalmol 2009; 148:820–822.
20▪. Thompson AC, Jammal AA, Medeiros FA. A deep learning algorithm to quantify neuroretinal rim loss from optic disc photographs. Am J Ophthalmol 2019; 201:9–18.
21▪. Medeiros FA, Jammal AA, Thompson AC. From machine to machine: an OCT-trained deep learning algorithm for objective quantification of glaucomatous damage in fundus photographs. Ophthalmology 2019; 126:513–521.
22. Lee CS, Tyring AJ, Wu Y, et al. Generating retinal flow maps from structural optical coherence tomography with artificial intelligence. Sci Rep 2019; 9:5694.
23. Kihara Y, Heeren TFC, Lee CS, et al. Estimating retinal sensitivity using optical coherence tomography with deep-learning algorithms in macular telangiectasia type 2. JAMA Netw Open 2019; 2:e188029.
24▪. Wen JC, Lee CS, Keane PA, et al. Forecasting future Humphrey Visual Fields using deep learning. PLoS One 2019; 14:e0214875.
25. Park K, Kim J, Lee J. Visual field prediction using recurrent neural network. Sci Rep 2019; 9:8385.
26. [No authors listed]. Fundus photographic risk factors for progression of diabetic retinopathy. ETDRS report number 12. Early Treatment Diabetic Retinopathy Study Research Group. Ophthalmology 1991; 98 (5 Suppl):823–833.
27▪▪. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019; 366:447–453.
28. Wilkinson MD, Dumontier M, Aalbersberg IJJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016; 3:160018.
29. Yogarajan V, Pfahringer B, Mayo M. A review of automatic end-to-end de-identification: is high accuracy the only metric? Appl Artif Intell 2020; 34:251–269.
30. Simon GE, Shortreed SM, Coley RY, et al. Assessing and minimizing re-identification risk in research data derived from healthcare records. EGEMS (Wash DC) 2019; 7:6.
31▪. Larson DB, Magnus DC, Lungren MP, et al. Ethics of using and sharing clinical imaging data for artificial intelligence: a proposed framework. Radiology 2020; 295:675–682.
32▪. Beam AL, Manrai AK, Ghassemi M. Challenges to the reproducibility of machine learning models in health care. JAMA 2020; doi:10.1001/jama.2019.20866.
33. ONNX. Home. https://onnx.ai/index.html [Accessed 30 March 2020].
34. SMART on FHIR. https://docs.smarthealthit.org/ [Accessed 1 April 2020].
35. Mandl KD, Kohane IS. No small change for the health information economy. N Engl J Med 2009; 360:1278–1281.