Review Article

Big Data in Ophthalmology

Cheng, Ching-Yu MD, PhD∗,†; Soh, Zhi Da MPH; Majithia, Shivani OD; Thakur, Sahil MS; Rim, Tyler Hyungtaek MD, PhD∗,†; Tham, Yih Chung PhD∗,†; Wong, Tien Yin MD, PhD∗,†

Asia-Pacific Journal of Ophthalmology: July-August 2020 - Volume 9 - Issue 4 - p 291-298
doi: 10.1097/APO.0000000000000304


The fourth industrial revolution has dawned upon us, driven largely by mankind's ability to harness an enormous number of interconnected mobile, wireless, and digital devices (referred to as the internet-of-things or IoT), and by the computational prowess to learn, think, and act like humans in real time (referred to as artificial intelligence or AI).1 Once thought of as a distant fantasy, this IoT and AI revolution is not only a present reality, but one that is already intertwined with every facet of our daily lives, and it will only play a larger role in the decades to come.2

The fuel of this revolution is data, or rather, “big data.”3 Although a universal definition of big data remains elusive, it generally represents the rapid aggregation of a large amount of diverse and constantly changing attributes that are too complex or “big” to be handled by traditional means.3 Thus, big data encompasses more than just a large “volume” of data.


“Big data” is often misunderstood. Beyond a large volume of data, big data is further characterized by its variety (how diverse), velocity (how fast), veracity (how accurate), and value (how useful). Volume is the sheer size of the database involved, which is often measured in petabytes or exabytes.4 Variety refers to the diversity of data collected, which has been greatly expanded by advancements in omics technologies and the emergence of electronic medical records in the last decade.4 This has created a plethora of data from different sources (eg, biobanks and administrative databases), in different formats (ie, unstructured vs structured), and for different purposes (eg, administration, clinical care, and research). Velocity is defined by the speed of incoming data, and the burgeoning use of wearable computing devices has brought data velocity to near or at real time.4 Veracity represents the accuracy and quality of the curated data, which should ideally be clean, complete, consistent, current, and compliant.3,4 Lastly, value indicates the usefulness of collected data in studying changes in clinical outcomes, behavioral modification, improvement in workflow, and monetization potential.3


There are significant benefits that big data can bring to biomedical research, clinical medicine, and health care. Big data could potentially transform the way health care is perceived, practiced, and delivered.5

First, big data has the potential to enhance research and redefine the boundaries of traditional research methodology. Big data can generate greater scientific insights into hypotheses that would otherwise be unanswered or answered inadequately by traditional “small data” studies such as randomized clinical trials and even observational studies.6 Big data also allows for real-time assessment of interventions in real-world settings, and provides greater statistical power to detect novel or subtle associations.7 In addition, big data allows for a more in-depth understanding of disease pathogenesis,5 such as in genetics and proteomics studies. An example can be seen in cancer research, where big data has provided the platform to integrate multiple large-scale omics datasets to form molecular meta-networks.8 This aggregation better facilitates the identification of key regulatory and dysregulatory elements in different cancers, thereby providing greater insights into tumor development and modifiers of treatment response.9,10 Furthermore, big data mitigates common issues in “small data” studies, such as selection bias and limited generalizability.7,11

Second, big data can improve clinical care through the development of sophisticated algorithms that provide physicians with a holistic representation of an individual's health status.3 Big data is the foundation of the practice of precision medicine,3 improves prediction of health outcomes and disease progression for timely interventions,5 and aids in standardizing care by providing diagnostic and/or therapeutic recommendations based on aggregated inputs from fellow physicians and other resources.6 For example, in cardiology, big data has been utilized to develop software for imaging, risk profiling, genetic assessment, detecting anomalies on electrocardiograms, and predicting cardiovascular events.12 In fact, cardiology has the second most Food and Drug Administration (FDA)-approved algorithms in medicine as of 2019, with automation ranging from early detection of atrial fibrillation to quantification of coronary artery calcification.13

Finally, big data may be utilized to evaluate the effectiveness of public health policies, which include the provision of health care services and resources.5,14 Furthermore, IoT and digital devices are now delivering health information directly to individuals, which empowers them to play a more active role not only in managing their health, but also in altering the way in which health care services are sought and delivered.6 Importantly, the cost of adopting big data is becoming increasingly affordable.15


Despite its potential, big data is not without limitations and implementation challenges (Fig. 1), especially if the appropriate infrastructures and management systems are not first established.15,16

Fig. 1. Limitations and implementation challenges of big data.

Big data is inherently messy, which raises concerns regarding data quality.15 For example, electronic medical records (EMRs) are not intended for research purposes, and incomplete records are common.17 This could come either in the form of incomplete recording by medical staff or loss to follow-up by patients. Data may also be recorded in an unstructured format (eg, free text), and could be missed during data extraction.17 Also, EMRs often rely on diagnostic or billing codes such as the International Classification of Diseases (ICD) for administrative purposes, which may give rise to misclassification, miscoding, and/or inadequate representation of conditions.18 Taken together, the large, diverse, and rapid influx of data will result in ambiguities that limit its application.15 In addition, the real-world implementation of big data is complicated by issues of privacy and security.16


The basic principles of data analysis are similar in traditional epidemiology and big data.19 For example, commonly used statistical techniques in epidemiological studies, such as the receiver-operating characteristics curve, remain a major metric in deep-learning research that uses big data.20
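
As a minimal sketch of how the area under the receiver-operating characteristic curve (AUC) can be computed, the Python snippet below uses the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen diseased case receives a higher score than a randomly chosen non-diseased case. The function name `roc_auc`, the labels, and the scores are illustrative assumptions and are not taken from any study cited here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: 1 for disease present (reference standard), 0 for absent.
    scores: algorithm outputs, higher meaning "more likely diseased".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # diseased case ranked above non-diseased case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 diseased and 4 non-diseased eyes.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; large datasets would normally use a sort-based implementation from a statistics library.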

Deep-learning is the unique subset of machine-learning that is germane to the fourth industrial revolution.21 In deep-learning, computers are developed with algorithms that utilize a cascade of multilayered artificial neural networks that automatically extract, transform, and recognize the intricate structures within inputted data (eg, fundus photographs).22 Each layer within an artificial neural network produces an output that is used as input for processing in the next layer, with the final layer producing a diagnostic output (eg, presence of disease in fundus photographs). This process is refined repeatedly (ie, back-propagation) until the diagnostic output matches a reference standard (ie, ground truth).22 Therefore, big data is required to develop and improve the accuracy of these algorithms.23 This consequently brings about new statistical challenges, and greater scrutiny on statistical analysis and interpretation is vital.24
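
The layered flow and the repeated refinement described above can be sketched with a deliberately tiny network. This is a toy illustration under stated assumptions, not the architecture of any algorithm cited here: it has one hidden layer, two made-up input features per example, hypothetical function names (`forward`, `train`, `mean_loss`), and hand-written gradient updates instead of a deep-learning library.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2):
    # Layer 1: each hidden unit transforms the raw inputs.
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w1, b1)]
    # Layer 2: the hidden outputs become inputs to the final "diagnostic" unit.
    out = sigmoid(sum(wi * hi for wi, hi in zip(w2, hidden)) + b2)
    return hidden, out


def mean_loss(data, w1, b1, w2, b2):
    # Cross-entropy between predictions and the reference standard.
    total = 0.0
    for x, y in data:
        _, p = forward(x, w1, b1, w2, b2)
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train(data, n_hidden=3, epochs=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(x, w1, b1, w2, b2)
            d_out = p - y                     # error at the output layer
            for j in range(n_hidden):
                # Back-propagate the error to hidden unit j before updating w2[j].
                d_hid = d_out * w2[j] * hidden[j] * (1.0 - hidden[j])
                w2[j] -= lr * d_out * hidden[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid * x[i]
                b1[j] -= lr * d_hid
            b2 -= lr * d_out
    return w1, b1, w2, b2


# Hypothetical, linearly separable toy examples: (features, ground truth).
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.3], 0),
        ([0.8, 0.9], 1), ([0.9, 0.7], 1), ([0.7, 0.8], 1)]

loss_before = mean_loss(data, *train(data, epochs=0))   # untrained weights
loss_after = mean_loss(data, *train(data))              # after refinement
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Real image-based algorithms use many more layers and parameters, which is one reason they depend on large training datasets.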

The sheer size of big data leads to increased statistical power and precision. This in turn leads to remarkably small P values in hypothesis tests, even when the observed differences may be clinically inconsequential.25 In addition, undesirable practices, such as performing multiple tests to arrive at a “reportable” outcome that is statistically significant, may be easier to carry out with big data. These practices, also known as “p-hacking,” increase the risk of false-positive results and of findings that are not reproducible.26 These highly precise but biased results are not only misleading, but may also lead to dangerous clinical practice and loss of trust in big data analysis.24 Thus, it remains imperative to understand the limitations of P values, the differences between statistical and clinical significance, and the need for robust validation through peer review.
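
The arithmetic behind both points can be sketched in a few lines of Python. All numbers are hypothetical: a 0.1 mm Hg difference in mean intraocular pressure with a standard deviation of 3 mm Hg is assumed purely for illustration, a simple two-sample z-test stands in for whatever test a real study would use, and the Bonferroni threshold is shown as one common correction rather than one prescribed by the sources cited here.

```python
import math


def two_sample_z(mean_diff, sd, n_per_group):
    """z statistic for a difference in means, equal group sizes, known SD."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return mean_diff / standard_error


def two_sided_p(z):
    """Two-sided P value from a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same clinically trivial difference, two very different sample sizes.
diff_mmhg, sd_mmhg = 0.1, 3.0
p_small = two_sided_p(two_sample_z(diff_mmhg, sd_mmhg, 100))
p_large = two_sided_p(two_sample_z(diff_mmhg, sd_mmhg, 1_000_000))
print(f"n = 100 per group:       P = {p_small:.2f}")
print(f"n = 1,000,000 per group: P = {p_large:.1e}")

# Multiple testing: chance of at least one false positive across
# 20 independent tests when every null hypothesis is true.
alpha, n_tests = 0.05, 20
family_wise_error = 1.0 - (1.0 - alpha) ** n_tests
bonferroni_alpha = alpha / n_tests   # stricter per-test threshold
print(f"family-wise error rate: {family_wise_error:.2f}")
print(f"Bonferroni threshold:   {bonferroni_alpha}")
```

The effect size is identical in both comparisons; only the sample size changes, which is why a small P value alone says nothing about clinical relevance.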

Likewise, issues such as confounding, biases, and reverse causation are not remedied by simply using big data, and must be similarly addressed with appropriate study design and/or statistical adjustments.24 For example, detection bias may arise in the interpretation of EMR and/or administrative data analysis. Detection bias occurs when an exposure is erroneously associated with an outcome through increased surveillance, screening, or testing.27 For instance, the association between diabetes mellitus and eye diseases, such as glaucoma and cataract, is likely overestimated due to increased referrals and routine eye screening for diabetic patients.28

The biases associated with EMR and/or administrative data analysis may be ascertained and mitigated statistically. First, selection bias can be controlled by using propensity score adjustment/matching to account for systematic differences between health system “exposure” and “non-exposure” groups.29–31 Propensity scores may be used to create inverse probability weights that balance observed differences between the 2 populations, with the goal of mimicking a scenario where individuals would be randomized to be included in versus excluded from the EMR/administrative data. The effectiveness of inverse probability weighting in achieving representativeness is still debated, but it nonetheless provides an avenue to address factors that are associated with inclusion in or exclusion from the dataset.32
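
A minimal sketch of inverse probability weighting follows, assuming a single binary covariate and a propensity of inclusion estimated as the observed inclusion fraction within each stratum. Real analyses typically estimate propensity scores with a regression model over many covariates; the field names (`stratum`, `included`, `value`), the function names, and all numbers here are hypothetical.

```python
from collections import defaultdict


def inclusion_propensity(records):
    """Estimate P(included | stratum) as the inclusion fraction per stratum."""
    counts = defaultdict(lambda: [0, 0])          # stratum -> [included, total]
    for r in records:
        counts[r["stratum"]][1] += 1
        if r["included"]:
            counts[r["stratum"]][0] += 1
    return {s: inc / tot for s, (inc, tot) in counts.items()}


def naive_mean(records):
    """Unweighted mean among included records only."""
    vals = [r["value"] for r in records if r["included"]]
    return sum(vals) / len(vals)


def ipw_mean(records, propensity):
    """Mean among included records, each weighted by 1 / P(included)."""
    num = den = 0.0
    for r in records:
        if r["included"]:
            w = 1.0 / propensity[r["stratum"]]
            num += w * r["value"]
            den += w
    return num / den


# Hypothetical population: people with diabetes are far more likely to
# appear in the EMR (80%) than people without diabetes (20%).
population = []
for stratum, n, n_included, value in [("no_diabetes", 100, 20, 10.0),
                                      ("diabetes", 100, 80, 20.0)]:
    for i in range(n):
        population.append({"stratum": stratum,
                           "included": i < n_included,
                           "value": value})

true_mean = sum(r["value"] for r in population) / len(population)
naive = naive_mean(population)
weighted = ipw_mean(population, inclusion_propensity(population))
print(f"true {true_mean:.1f}, naive {naive:.1f}, weighted {weighted:.1f}")
```

In this toy setting the weighted estimate recovers the population mean because inclusion depends only on the measured stratum; unmeasured drivers of inclusion would not be corrected.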

Secondly, post-stratification adjustment can be used to standardize crude estimates according to the variables implicated in the selection bias.33 In the context of EMR/administrative data, these variables might include demographic factors such as sex, race/ethnicity, and sociodemographic factors, and various systemic risk factors or comorbid conditions. Since inclusion is non-random, controlling for the number of health encounters also accounts for systematic differences between those who visit their health care provider regularly and those who do so irregularly.33 Therefore, including the number of health encounters as a propensity score matching variable makes it possible to measure the effect size more accurately.28,29,31
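
Post-stratification can be sketched as a reweighting of stratum-specific estimates to a reference population. The example below is a simplified illustration with two hypothetical age strata, made-up counts, and assumed reference shares; the function names are not from any cited method.

```python
def crude_prevalence(strata):
    """Prevalence ignoring how the sample is distributed across strata."""
    cases = sum(s["cases"] for s in strata.values())
    n = sum(s["n"] for s in strata.values())
    return cases / n


def post_stratified_prevalence(strata, reference_share):
    """Weight each stratum-specific prevalence by its reference population share."""
    if abs(sum(reference_share.values()) - 1.0) > 1e-9:
        raise ValueError("reference shares must sum to 1")
    return sum(reference_share[k] * s["cases"] / s["n"]
               for k, s in strata.items())


# Hypothetical EMR sample that over-represents older patients.
emr_sample = {
    "under_60": {"n": 200, "cases": 10},    # 5% prevalence
    "60_plus": {"n": 800, "cases": 160},    # 20% prevalence
}
# Assumed age distribution of the target population.
reference_share = {"under_60": 0.7, "60_plus": 0.3}

crude = crude_prevalence(emr_sample)
adjusted = post_stratified_prevalence(emr_sample, reference_share)
print(f"crude {crude:.3f}, post-stratified {adjusted:.3f}")
```

The crude estimate is pulled toward the over-sampled older stratum, whereas the standardized estimate reflects the assumed population mix.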


Big data is recognized as an integral component of modern medicine, including ophthalmology.34 In fact, ophthalmology is one of the most data-driven specialties in medicine, with data ranging from numerical values (eg, intraocular pressure and spherical equivalent), 2-dimensional images (eg, fundus photographs), and 3-dimensional scans (eg, optical coherence tomography) to clinical and surgical records (Fig. 2).35 These data are often unstructured, and could not be adequately curated until recently.36

Fig. 2. Characteristics of big data and its sources in ophthalmology. EMR indicates electronic medical records.

Electronic Medical Records and Data Registry

Big data in health care is driven largely by the advent of information technologies, which allow traditional medical records to be digitized into an electronic format (EMR) and combined with auxiliary test results.37 This results in a “one-stop” data portal that allows physicians to visualize and better understand the pattern of care provided, and administrators to identify gaps in service provision. In addition, advanced digital technologies such as the IoT further allow for fully automated, real-time data linkage between different EMR systems,38 which in turn enables the creation of sophisticated clinical data registries such as SMEYEDAT, IRIS, Fight Retinal Blindness! (FRB!), and SOURCE.

The Smart Eye Database (SMEYEDAT) is a web-based ophthalmologic data warehouse that aggregates EMR data and diagnostic images from a university eye hospital in near real time to facilitate easier and faster identification of patients with specific conditions.39

IRIS is a cloud-based ophthalmic data registry that was developed by the American Academy of Ophthalmology in 2014 to drive improvements in the provision of eye care services, to promote population health through adequate eye care coverage,40 and to pioneer evidence-based scientific knowledge derived from clinical data registries.41 IRIS aggregates clinical data automatically to provide real-time analysis of 15 quality-control measures and 22 outcome measures from >60 million patients. This allows physicians to compare the effectiveness of different treatment options in real-world settings, to assess the coverage of eye care provision and the impact of rare diseases and disease comorbidities, and to better detect subtle clinical associations.41 For example, Cantrell et al42 utilized the IRIS registry to observe the pattern of care rendered by fellow physicians in newly diagnosed cases of macular edema, and reported, among other findings, that the majority of cases did not receive anti-VEGF treatment within the first 28 days, and that bevacizumab was preferred in treated cases. This study highlights the potential of big data in informing physicians of real-world practice to monitor and standardize the quality of care. However, identification of cases was limited by the sole use of ICD coding, which is affected by misclassification, miscoding, and incompleteness.42

The “FRB!” is an ophthalmic data registry that contains longitudinal data on the effects of switching between different treatment modalities in neovascular age-related macular degeneration.43 Furthermore, FRB! tracks data across Europe, Middle East, and Asia, which further provides a platform for global comparison of treatment outcomes in the future.44

The “Sight Outcomes Research Collaborative” (SOURCE) is an ophthalmic data repository that improved the efficiency of identifying ocular diseases from EMR data.45 Instead of relying heavily on structured data (eg, ICD billing codes) to search for ocular diseases, SOURCE incorporates an algorithm that bases its search on both structured and unstructured (eg, free text from medical reports) EMR data. For example, Stein et al utilized SOURCE to search for exfoliation syndrome, and reported a positive predictive value (PPV) of 95% and a negative predictive value (NPV) of 100%. Furthermore, 60% of cases would have been missed if their repository had relied on billing codes alone.45 This study illustrates the importance of developing appropriate tools (eg, free text analysis) to harness the potential of big data.46
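
For readers less familiar with these metrics, the sketch below shows how PPV, NPV, and the fraction of missed cases follow from a 2×2 comparison of a search algorithm against chart review. The counts are hypothetical and are not the counts from the SOURCE study; the function name `predictive_values` is likewise an assumption.

```python
def predictive_values(tp, fp, fn, tn):
    """Summary metrics from a 2x2 table of algorithm result vs reference standard.

    tp: flagged by the search and confirmed on chart review
    fp: flagged by the search but not confirmed
    fn: true cases the search failed to flag
    tn: correctly not flagged
    """
    return {
        "ppv": tp / (tp + fp),           # flagged cases that are real
        "npv": tn / (tn + fn),           # unflagged records that are truly negative
        "sensitivity": tp / (tp + fn),   # real cases that were found
        "missed": fn / (tp + fn),        # real cases that were missed
    }


# Hypothetical validation sample of 200 records.
metrics = predictive_values(tp=45, fp=5, fn=15, tn=135)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that PPV and NPV depend on how common the condition is in the validation sample, so values from one repository do not transfer directly to another.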

Big data from EMRs has also been utilized in designing new models of care in ophthalmology. In the United Kingdom, EMR data has been utilized to assess the viability of a virtual eye clinic in identifying cases of unstable glaucoma that required closer observation.47 This study aggregated the mean deviation scores from 473,252 Humphrey Visual Field tests to establish a range of expected values, which was used as a reference for assessing Humphrey Visual Field measurements conducted in a virtual eye clinic and in normal clinic settings, and for evaluating its effectiveness in identifying unstable glaucoma cases. This initiative highlights the potential application of big data in designing innovative models of care that bridge the “supply-demand” gap in eye care services.48,49

Administrative and Health Insurance Database

In Asia, South Korea and Taiwan each have a national health insurance program that covers the majority of their citizens, and data from these databases have been actively utilized in research.50,51 In South Korea, a mandatory health insurance program was started in 1977, and extended to cover the entire nation in 1989.52 Currently, >97% of the population is covered by the Korean National Health Insurance Service.53 In 2015, a database was established to include 2% of Korean National Health Insurance Service data (∼1 million people) and other cohorts of elderly individuals, providing de-identified claims, health screening, and mortality data.53,54 In Taiwan, a similar database named the National Health Insurance Research Database was set up for research purposes.55

These insurance databases are crucial in providing the data needed to study research hypotheses that are difficult to address with traditional research methodologies. For example, insurance databases provide the sample size needed to study rare diseases and to develop deep-learning algorithms.56 In addition, these databases have been used to identify trends in surgery57 and the safety profiles of ophthalmic drugs.31,58

Epidemiology Consortia

Research consortia or networking are collaborative initiatives that represent a significant change in the way clinical and epidemiological research are conducted.59 These consortia bring together researchers across multiple domains and countries, and provide a shared platform for capacity building, research collaboration, results’ aggregation and validation, and technology transfer.59,60

In epidemiology research, consortia are valuable resources that further provide a “bird's-eye” view of the burden of diseases and its impact in a particular geographical region. This in turn enables researchers to come together to study pertinent issues that are answered inadequately by individual groups.61 For example, preventable vision impairment continues to be a major global health issue, and yet, its reasons remain inconclusive.62,63

In ophthalmology, epidemiology consortia include the European Eye Epidemiology (E3) consortium,61 the Asian Eye Epidemiology Consortium (AEEC),64 and the Visual Loss Expert Group (VLEG).65 The E3 comprises 29 study groups (eg, Rotterdam Study, Gutenberg Health Study) from 12 European countries, whereas the AEEC comprises 40 population-based studies (eg, Beijing Eye Study, Singapore Epidemiology of Eye Diseases Study) from 9 Asian countries. Using data from the AEEC, researchers were able to provide a more accurate and precise estimate of the prevalence of geographic atrophy, a relatively rare blinding disease in Asia, among Asians.66 The VLEG is made up of an international team of 78 ophthalmologists, optometrists, and epidemiologists, and was formed by the Global Burden of Disease in 2007 to measure comparable estimates of burden of disease, injuries, and risk factors due to vision impairment.65 Other consortia include the Meta-Analysis for Eye Disease study group and the International Eye Disease Consortium, which have reported on the global prevalence of diabetic retinopathy and retinal vein occlusion.67,68

These consortia provide a crucial source of big data for regional disease surveillance and the study of disease pathogenesis, and provide a “bigger voice” in advocating eye care, providing recommendations for public health policies, and advancing ophthalmic discoveries.69 Taken together, consortia are greater than the sum of their parts, and they epitomize the T.E.A.M acronym: Together, Everyone Achieves More.


The idea of collecting and storing human specimens has been around for over 100 years.70 The handling and storage of biospecimens has evolved from a few freezers to large repositories with computerized databases, robotic systems that process samples at a rapid pace, and the launch of virtual biobanks.71 Biobanks can include samples from different epidemiologic study areas such as population studies, clinical trials, and diagnostic studies.71 An example of a large population biobank is the UK Biobank, which is a community-based prospective cohort study.72

The UK Biobank was established with the objective of investigating the effects of genetic, lifestyle, and environmental risk factors associated with a wide range of major diseases in 500,000 participants aged between 40 and 69 years, enrolled from 2006 to 2010.73 These participants have also consented to long-term follow-up and extensive testing, including a large assortment of physical measurements, biospecimen (blood, urine, and saliva samples) collection, and genotyping of all participants. Additional data from an enhanced ophthalmic examination, including visual acuity, autorefraction, and retinal imaging, have also been collected on 100,000 participants.74,75 The UK Biobank is a rich resource because it can follow up on the overall health of these participants through linkage to their health records, yielding a more comprehensive patient profile.73 In addition, due to its open-access nature, the UK Biobank will allow researchers to study a wide range of complex diseases, which should eventually lead to improvements in the prevention, diagnosis, and treatment of an array of diseases.73

Crowd Source Data

The advent of digital health has given rise to a new source of crowd source data, which includes a wide range of interconnected data generated from social media, mobile applications, sensor devices, and wearable computing devices.76 Crowd source data may improve and simplify recruitment of research participants,77 and allow traditional methods of health assessment to be integrated with real-time patient data (eg, blood glucose level, physical activities) to gain further insights into the social determinants of health in relation to health outcomes.78

In addition, crowd source data can provide a crucial early source of surveillance information in a pandemic. For example, Sun et al79 utilized information from a combination of sources, ranging from health care-oriented social media to mainstream news media, to identify the health-seeking behavior of individuals with symptoms of COVID-19 in China. This information provided health care workers with nationwide patient-level data, and aided their understanding of the outbreak progression.

Crowd source data are also increasingly utilized in ophthalmic research. For example, Plano is a smartphone application that has been developed to aid in myopia prevention.80 In addition, sensor devices are currently adopted in myopia research to study the effect of outdoor light intensities on myopia progression,81 and tracking of physical activities through wearable devices has been suggested as a way to track activities of daily living after cataract surgery.82

Big data has also been utilized in dry eye research through the use of crowd source data. The DryEyeRhythm (Ohako Inc, Tokyo, Japan) is a smartphone application that was developed to assess the potential of crowd source data in identifying the characteristics and risk factors associated with diagnosed or undiagnosed dry eye.83 It does so through a mobile application platform that collects patient-specific information, such as demographic characteristics, medical history, lifestyle, subjective symptoms, and disease-specific symptoms measured on the Ocular Surface Disease Index and Zung Self-rating Depression Scale. Importantly, results from this study highlighted the potentials of crowd source data in managing ocular disease. Crowd source data will likely proliferate in the coming years with increased adoption of these smart devices.77

Ocular Image Database

A few large-scale image datasets have also been made public for ophthalmic research.84–86 For example, as mentioned above, the UK Biobank has a huge collection of retinal photographs and optical coherence tomography (OCT) scans made available for ophthalmic research.87 In addition, Kaggle is a major source of image datasets, having organized hundreds of machine learning competitions, including one on diabetic retinopathy,84 since its inception. These ocular datasets are crucial for developing deep learning algorithms for the screening and diagnosis of ocular diseases and the prediction of nonocular clinical outcomes (see more details below). Moreover, this form of data will likely proliferate in the years to come, and when properly curated, will be an important data source for performing retrospective clinical validation of AI medical devices.


Big data is particularly useful and widely adopted in the development of predictive algorithms through deep-learning.88 In ophthalmology, deep-learning has been applied predominantly on ocular images to identify and classify ocular diseases such as diabetic retinopathy and retinopathy of prematurity.88 Over the last few years, two successful algorithms have been approved for screening diabetic retinopathy.89,90 For example, SELENA+ is a deep-learning algorithm that was authorized to screen for diabetic retinopathy in Singapore.91 This algorithm was developed with data from the Singapore National Diabetic Retinopathy Screening Programme (SiDRP) and 10 multi-ethnic cohort studies, comprising 494,661 fundus images.90 Deep-learning has also been adopted in neuro-ophthalmology studies, where an algorithm was developed with 14,341 fundus images from 11 countries to differentiate optic discs with papilledema from normal optic discs and optic discs with nonpapilledema abnormalities.92 In addition, deep-learning has been successfully applied to 3-dimensional OCT scans. De Fauw et al93 developed a triage algorithm based on 14,884 OCT scans, and reported its performance in making referral recommendations (ie, urgent, semi-urgent, routine, observation) to be as good as or even better than that of eye care practitioners over a range of sight-threatening retinal diseases. Remarkably, this algorithm was developed without an excessively large data set, and was able to maintain its accuracy when analyzing images from different OCT machines.

In addition to screening for and detecting ocular disease, deep learning has been applied to retinal photography to predict cardiovascular risk factors and anemia. Poplin et al94 developed a deep-learning algorithm that was able to predict cardiovascular risk factors, the majority of which were not thought to be quantifiable in retinal images. These factors, ranging from age and smoking status to systolic blood pressure, are core components used in cardiovascular risk calculators, thereby suggesting that cardiovascular risk may be assessed directly from retinal photographs.94 Mitani et al95 further reported that retinal photographs could be used to predict hemoglobin concentration, which is the most reliable indicator of anemia. This represents a major advancement, as anemia is often treatable but continues to be a major source of poor health because of low detection rates arising from the invasiveness and cost of current screening modalities.96


In ophthalmology, big data was first utilized decades ago in the form of large cohorts in epidemiology studies.63 These cohorts were originally set up for eye disease surveillance, and were imperative in informing the burden of eye diseases, and the inefficiency and/or deficiency in eye care delivery.97 Over time, these cohorts evolved in tandem with advancements in diagnostic tools and omics technologies, and are now a crucial driver for “analytics-driven” medicine that strives for a preventive, predictive, personalized, precise, and participatory model of care.63,98 Over the last few years, research into big data has focused more on its real-world application than on proof of concept. This includes the development of management systems that efficiently handle its complexity,39,45 and predictive algorithms that translate its potential into clinical solutions.89,90

Big data is poised to grow at an exponential rate in the years to come.38 Therefore, it is imperative to build an incisive and enabling environment that builds upon existing infrastructures, and to promote a multidisciplinary collaborative effort to handle its evolving complexity. First, it remains crucial to develop the hardware and software required to adequately handle, share, analyze, and safeguard rapidly growing data. Secondly, it is necessary to scale up the training of technical specialists who are skilled in handling the challenges of big data, so as to arrive at meaningful and actionable information. Furthermore, it is vital that medical personnel, policy makers, and the general public are equipped with the knowledge needed to interpret findings from big data analysis, understand what it can and cannot address, and recognize poor analytic practices and fictitious claims. Lastly, as big data continues to evolve in size and complexity, an integrated environment that promotes interdisciplinary research collaboration must be fostered so as to translate its potential into real-world advantages. In conclusion, the transformative potential of big data in health care and ophthalmology is massive, and it will continue to grow and exert an even larger influence in the years ahead. Therefore, it is prudent not only to take stock of where we currently are in this big data revolution, but also to chart a path ahead so as to fully harness its potential.


1. World Economic Forum. The fourth industrial revolution: what it means, how to respond. Available at: Published 2016. Accessed March 26, 2020.
2. Pereira F, Machado P, Costa E, Cardoso A. Progress in Artificial Intelligence: 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, Coimbra, Portugal, September 8-11, 2015. Proceedings. Vol 9273. 1st ed. Cham: Springer International Publishing; 2015.
3. Panesar A. Machine Learning and AI for Healthcare: Big Data for Improved Health Outcomes. 1st ed. Berkeley, CA: Apress; 2019.
4. Baro E, Degoul S, Beuscart R, Chazard E. Toward a literature-driven definition of big data in healthcare. Biomed Res Int 2015; 2015:639021–639029.
5. Shilo S, Rossman H, Segal E. Axes of a revolution: challenges and promises of big data in healthcare. Nat Med 2020; 26:29–38.
6. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA 2013; 309:1351–1352.
7. Boland MV. Big data, big challenges. Ophthalmology 2016; 123:7–8.
8. Brown JAL, Ni Chonghaile T, Matchett KB, Lynam-Lennon N, Kiely PA. Big data-led cancer research, application, and insights. Cancer Res 2016; 76:6167–6170.
9. Aviner R, Shenoy A, Elroy-Stein O, Geiger T. Uncovering hidden layers of cell cycle regulation through integrative multi-omic analysis. PLoS Genet 2015; 11:e1005554.
10. Yugi K, Kubota H, Hatano A, Kuroda S. Trans-omics: how to reconstruct biochemical networks across multiple ‘omic’ layers. Trends Biotechnol 2016; 34:276–290.
11. Clark A, Ng JQ, Morlet N, Semmens JB. Big data and ophthalmic research. Surv Ophthalmol 2016; 61:443–465.
12. Cuocolo R, Perillo T, De Rosa E, Ugga L, Petretta M. Current applications of big data and machine learning in cardiology. J Geriatr Cardiol 2019; 16:601–607.
13. Meskó B. FDA approvals for smart algorithms in medicine in one giant infographic. The Medical Futurist; 2019. Available at: Accessed March 30, 2020.
14. Einav L, Finkelstein A, Mullainathan S, Obermeyer Z. Predictive modeling of U.S. health care spending in late life. Science 2018; 360:1462–1465.
15. Rough K, Thompson JT. When does size matter? Promises, pitfalls, and appropriate interpretation of “big” medical records data. Ophthalmology 2018; 125:1136–1138.
16. Househ M, Kushniruk AW, Borycki EM. Big Data, Big Challenges: A Healthcare Perspective: Background, Issues, Solutions and Research Directions. Cham, Switzerland: Springer International Publishing; 2019.
17. Bowman S. Impact of electronic health record systems on information integrity: quality and safety implications. Perspect Health Inf Manag 2013; 10:1c.
18. Coleman AL, Morgenstern H. Use of insurance claims databases to evaluate the outcomes of ophthalmic surgery. Surv Ophthalmol 1997; 42:271–278.
19. Yu M, Tham YC, Rim TH, Ting DS, Wong TY, Cheng CY. Reporting on deep learning algorithms in health care. Lancet Digit Health 2019; 1:e328–e329.
20. Gonçalves L, Subtil A, Oliveira MR, Bermudez P. ROC curve estimation: an overview. REVSTAT–Statistical Journal 2014; 12:1–20.
21. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015; 521:436–444.
22. Shen D, Wu G, Suk HI. Deep learning in medical image analysis. Annu Rev Biomed Eng 2017; 19:221–248.
23. Rahimy E. Deep learning applications in ophthalmology. Curr Opin Ophthalmol 2018; 29:254–260.
24. Ehrenstein V, Nielsen H, Pedersen AB, Johnsen SP, Pedersen L. Clinical epidemiology in the era of big data: new opportunities, familiar challenges. Clin Epidemiol 2017; 9:245–250.
25. Madondo SM. The American Statistical Association (ASA) Statement of 2016 on statistical significance and P-value: a critical thought. Science Journal of Applied Mathematics and Statistics 2017; 5:41.
26. Nuzzo R. Scientific method: statistical errors. Nature 2014; 506:150.
27. Haut ER, Pronovost PJ. Surveillance bias in outcomes reporting. JAMA 2011; 305:2462–2463.
28. Rim TH, Lee SY, Bae HW, Seong GJ, Kim SS, Kim CY. Increased risk of open-angle glaucoma among patients with diabetes mellitus: a 10-year follow-up nationwide cohort study. Acta Ophthalmol 2018; 96:e1025–e1030.
29. Rim TH, Kim HK, Kim JW, Lee JS, Kim DW, Kim SS. A nationwide cohort study on the association between past physical activity and neovascular age-related macular degeneration in an East Asian population. JAMA Ophthalmol 2018; 136:132–139.
30. Rim TH, Kim HS, Kwak J, Lee JS, Kim DW, Kim SS. Association of corticosteroid use with incidence of central serous chorioretinopathy in South Korea. JAMA Ophthalmol 2018; 136:1164–1169.
31. Rim TH, Yoo TK, Kwak J, et al. Long-term regular use of low-dose aspirin and neovascular age-related macular degeneration: national sample cohort 2010-2015. Ophthalmology 2019; 126:274–282.
32. Lonjon G, Porcher R, Ergina P, Fouet M, Boutron I. Potential pitfalls of reporting and bias in observational studies with propensity score analysis assessing a surgical procedure. Ann Surg 2017; 265:901–909.
33. Bower JK, Patel S, Rudy JE, Felix AS. Addressing bias in electronic health record-based surveillance of cardiovascular disease risk: finding the signal through the noise. Curr Epidemiol Rep 2017; 4:346–352.
34. Amirian P, Lang T, van Loggerenberg F. Big Data in Healthcare: Extracting Knowledge from Point-of-Care Machines. 1st ed. Cham, Switzerland:Springer International Publishing; 2017.
35. Matossian C. Big data analysis can benefit ophthalmic practice and bump up the bottom line. Available at: Published 2017. Accessed January 4, 2020.
36. Bote-Curiel L, Muñoz-Romero S, Guerrero-Curieses A, Rojo-Álvarez JL. Deep learning and big data in healthcare: a double review for critical beginners. Appl Sci 2019; 9:2331.
37. Elsevier, Sheikh A, Wright A, Bates D, Cresswell K. Key advances in clinical informatics: transforming health care through health information technology. 2017.
38. Dash S, Shakyawar SK, Sharma M, Kaushik S. Big data in healthcare: management, analysis and future prospects. J Big Data 2019; 6:1–25.
39. Kortüm KU, Müller M, Kern C, et al. Using electronic health records to build an ophthalmologic data warehouse and visualize patients’ data. Am J Ophthalmol 2017; 178:84–93.
40. Chiang MF, Sommer A, Rich WL, Lum F, Parke DW 2nd. The 2016 American Academy of Ophthalmology IRIS® Registry (Intelligent Research in Sight) Database: characteristics and methods. Ophthalmology 2018; 125:1143–1148.
41. Parke DW, Rich WL, Sommer A, Lum F. The American Academy of Ophthalmology's IRIS® Registry (Intelligent Research in Sight Clinical Data): a look back and a look to the future. Ophthalmology 2017; 124:1572–1574.
42. Cantrell RA, Lum F, Chia Y, et al. Treatment patterns for diabetic macular edema: an intelligent research in sight (IRIS) registry analysis. Ophthalmology 2020; 127:427–429.
43. Gillies MC, Campain A, Barthelmes D, et al. Long-term outcomes of treatment of neovascular age-related macular degeneration: data from an observational study. Ophthalmology 2015; 122:1837–1845.
44. Mantel I, Gillies MC, Souied EH. Switching between ranibizumab and aflibercept for the treatment of neovascular age-related macular degeneration. Surv Ophthalmol 2018; 63:638–645.
45. Stein JD, Rahman M, Andrews C, et al. Evaluation of an algorithm for identifying ocular conditions in electronic health record data. JAMA Ophthalmol 2019; 137:491–497.
46. Matheny ME, Whicher D, Thadaney Israni S. Artificial intelligence in health care: a report from the National Academy of Medicine. JAMA 2020; 323:509–510.
47. Jones L, Bryan SR, Miranda MA, Crabb DP, Kotecha A. Example of monitoring measurements in a virtual eye clinic using ‘big data’. Br J Ophthalmol 2018; 102:911–915.
48. Ministry of Health (MOH), Singapore. Transforming our healthcare system to meet evolving needs. 2020. Available at: Accessed March 4, 2020.
49. Bigus JP, Campbell M, Carmeli B, et al. Information technology for healthcare transformation. IBM J Res Dev 2011; 55:6:1–6:14.
50. Cheng TM. Taiwan's new national health insurance program: genesis and experience so far. Health Aff (Millwood) 2003; 22:61–76.
51. Song SO, Jung CH, Song YD, et al. Background and data configuration process of a nationwide population-based study using the Korean National Health Insurance System. Diabetes Metab J 2014; 38:395–403.
52. Song YJ. The South Korean health care system. JMAJ 2009; 52:206–209.
53. Lee J, Lee JS, Park SH, Shin SA, Kim K. Cohort profile: the national health insurance service–national sample cohort (NHIS-NSC), South Korea. Int J Epidemiol 2017; 46:e15.
54. Kim YI, Kim YY, Yoon JL, et al. Cohort Profile: National health insurance service-senior (NHIS-senior) cohort in Korea. BMJ Open 2019; 9:e024344.
55. Lin LY, Warren-Gash C, Smeeth L, Chen PC. Data resource profile: the National Health Insurance Research Database (NHIRD). Epidemiol Health 2018; 40:e2018062.
56. Wang HH, Wang YH, Liang CW, Li YC. Assessment of deep learning using nonimaging information and sequential medical records to develop a prediction model for nonmelanoma skin cancer. JAMA Dermatol 2019; 155:1277–1283.
57. Lee CS, Rim THT, Kwon HJ, Yi JH, Lee SC. Partial lamellar sclerouvectomy of ciliary body tumors in a Korean population. Am J Ophthalmol 2013; 156:36–42.
58. Rim TH, Lee CS, Lee SC, Kim SS. Intravitreal ranibizumab therapy for neovascular age-related macular degeneration and the risk of stroke. Retina 2016; 36:2166–2174.
59. Dockrell HM. Presidential address: the role of research networks in tackling major challenges in international health. Int Health 2010; 2:181–185.
60. Puljak L, Vari SG. Significance of research networking for enhancing collaboration and research productivity. Croat Med J 2014; 55:181–183.
61. Delcourt C, Korobelnik JF, Buitendijk GH, et al. Ophthalmic epidemiology in Europe: the “European Eye Epidemiology” (E3) consortium. Eur J Epidemiol 2016; 31:197–210.
62. Flaxman SR, Bourne RR, Resnikoff S, et al. Global causes of blindness and distance vision impairment 1990-2020: a systematic review and meta-analysis. Lancet Glob Health 2017; 5:e1221–e1234.
63. Wong TY, Hyman L. Population-Based Studies in Ophthalmology. Am J Ophthalmol 2008; 146:656–663.
64. Tham YC, Tao Y, Zhang L, et al. Is kidney function associated with primary open-angle glaucoma? Findings from the Asian Eye Epidemiology Consortium. Br J Ophthalmol 2020; bjophthalmol-2019-314890.
65. Bourne R, Price H, Taylor H, et al. New systematic review methodology for visual impairment and blindness for the 2010 Global Burden of Disease Study. Ophthalmic Epidemiol 2013; 20:33–39.
66. Rim TH, Kawasaki R, Tham YC, et al. Prevalence and pattern of geographic atrophy in Asia: the Asian Eye Epidemiology Consortium. Ophthalmology 2020; doi:10.1016/j.ophtha.2020.04.019.
67. Rogers S, McIntosh RL, Cheung N, et al. The prevalence of retinal vein occlusion: pooled data from population studies from the United States, Europe, Asia, and Australia. Ophthalmology 2010; 117:313–319.
68. Yau JW, Rogers SL, Kawasaki R, et al. Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care 2012; 35:556–564.
69. Bourne RRA, Stevens GA, White RA, et al. Causes of vision loss worldwide, 1990–2010: a systematic analysis. Lancet Glob Health 2013; 1:e339–e349.
70. Eiseman E, Haga SB. Handbook of Human Tissue Sources. Santa Monica, CA:Rand; 1999.
71. De Souza YG, Greenspan JS. Biobanking past, present and future: responsibilities and benefits. AIDS 2013; 27:303–312.
72. Ollier W, Sprosen T, Peakman T. UK Biobank: from concept to reality. Pharmacogenomics 2005; 6:639–646.
73. UK Biobank. About UK Biobank. Available at: Published 2019. Accessed March 30, 2020.
74. Allen N, Sudlow C, Downey P, Peakman T, Danesh J. UK Biobank: current status and what it means for epidemiology. Health Policy and Technology 2012; 1:123–126.
75. Collins R. What makes UK Biobank special? Lancet 2012; 379:1173–1174.
76. RTI International, Shapiro MJD, Wald J, Mon D. Patient-Generated Health Data: White paper. 2012.
77. Dimitrov DV. Medical internet of things and big data in healthcare. Healthc Inform Res 2016; 22:156–163.
78. Roski J, Bo-Linn GW, Andrews TA. Creating value in health care through big data: opportunities and policy implications. Health Aff (Millwood) 2014; 33:1115–1122.
79. Sun K, Chen J, Viboud C. Early epidemiological analysis of the coronavirus disease 2019 outbreak based on crowdsourced data: a population-level observational study. Lancet Digit Health 2020; 2:e201–e208.
80. Dirani M, Crowston JG, Wong TY. From reading books to increased smart device screen time. Br J Ophthalmol 2019; 103:1–2.
81. Wu PC, Chen CT, Lin KK, et al. Myopia prevention and outdoor light intensity in a school-based cluster randomized trial. Ophthalmology 2018; 125:1239–1250.
82. Coleman AL. How big data informs us about cataract surgery: The LXXII Edward Jackson Memorial Lecture. Am J Ophthalmol 2015; 160:1091–1103.
83. Inomata T, Iwagami M, Nakamura M, et al. Characteristics and risk factors associated with diagnosed and undiagnosed symptomatic dry eye using a smartphone application. JAMA Ophthalmol 2020; 138:58–68.
84. University of Warwick, Graham B. Kaggle Diabetic Retinopathy Detection Competition Report. 2015.
85. Kermany DS, Goldbaum M, Cai W, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 2018; 172:1122–1131.
86. Porwal P, Pachade S, Kamble R, et al. Indian diabetic retinopathy image dataset (idrid): a database for diabetic retinopathy screening research. Data 2018; 3:25.
87. UK Biobank. About UK Biobank. Available at: https://www.ukbiobank.ac.uk/about-biobank-uk. 2014.
88. Ting DSW, Pasquale LR, Peng L, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol 2019; 103:167–175.
89. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med 2018; 1:39.
90. Ting DSW, Cheung CY-L, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 2017; 318:2211–2223.
91. Goh T. An A.I. for the eye: New tech cuts time for spotting signs of diabetic eye disease. Available at: Published 2019. Accessed March 4, 2020.
92. Milea D, Najjar RP, Zhubo J, et al. Artificial intelligence to detect papilledema from ocular fundus photographs. N Engl J Med 2020; 382:1687–1695.
93. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med 2018; 24:1342–1350.
94. Poplin R, Varadarajan AV, Blumer K, et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng 2018; 2:158–164.
95. Mitani A, Huang A, Venugopalan S, et al. Detection of anaemia from retinal fundus images via deep learning. Nat Biomed Eng 2020; 4:18–27.
96. Milman N. Anemia—still a major health problem in many parts of the world! Ann Hematol 2011; 90:369–377.
97. Pearce N. Traditional epidemiology, modern epidemiology, and public health. Am J Public Health 1996; 86:678–683.
98. Alonso SG, de la Torre Díez I, Zapiraín BG. Predictive, personalized, preventive and participatory (4P) medicine applied to telemedicine and eHealth in the literature. J Med Syst 2019; 43:140.

Keywords: big data; artificial intelligence; internet of things; ophthalmology; data science

Copyright © 2020 Asia-Pacific Academy of Ophthalmology. Published by Wolters Kluwer Health, Inc. on behalf of the Asia-Pacific Academy of Ophthalmology.