Big data refers both to large-volume data sets that are analyzed for use in a variety of fields, including government, media, retail, and health care, and to the methods used to perform the analyses. There is no size threshold that defines big data; rather, it is characterized by being challenging to manage and process, a subjective and time-dependent definition (1). For the purposes of this review, we consider big data to be defined by large sample sizes and by collection of data in a nonspecialized manner (i.e., not targeted to study a specific disease). Big data in medical research is not new: epidemiology has its roots in analyses of large collections of data, and some medical cohort studies have been ongoing for decades (e.g., the Framingham Heart Study, founded in 1948) (2). Interest in big data–based research has accelerated recently with the rapid growth of data collection in minable formats, improved storage capabilities and statistical methods, and faster computing speeds. These advances have facilitated the use of bigger, broader data sets to investigate novel questions of disease risk and outcome (3).
The past decade has witnessed an explosion in the availability of health care–related databases and in the pursuit of “secondary analyses” of this information (4). There is considerable enthusiasm in the clinical research community for “real-world” data, collected outside the confines of rigidly structured clinical trials, to study risk factors, treatment effects, incidence, prevalence, outcomes of disease, and treatment strategies. However, the variable quality of the data, which are often collected for another purpose, and the complexity of the analytic techniques require increasing expertise from both researchers and stakeholders to interpret results appropriately (5). As a result, the “promise” that big data might change medical practice is still met with skepticism by some practitioners.
Germane to practicing neuro-ophthalmologists is big data–based clinical research, which will be our focus. In this review, we will highlight salient features of different types of big data sets, common pitfalls in analysis and interpretation, and new developments in analytics including machine learning. We also provide examples of recent, relevant studies.
Big data sets are heterogeneous in origin and use varied sampling strategies. A common result is inherent selection bias due to nonrandom selection of individuals for treatments and from different risk factor groups. The sample may not represent the population, limiting the generalizability of the studies (6). Examples of different types of big data sources are given below. For further details, the reader is referred to the Vision and Eye Health Surveillance System (VEHSS) (7), maintained by the US Centers for Disease Control and Prevention (CDC), which compiles and maintains excellent summaries of many data sources relevant to vision research.
National Health and Nutrition Examination Survey: A Survey Data Set
Surveys are collected prospectively as population-based samples. The National Health and Nutrition Examination Survey (NHANES) assesses the nutrition and health of the United States population by administering a survey to approximately 5,000 individuals every 2 years. The data have included vision questions, eye examinations, visual field testing, and retinal imaging. Although data are available from the 2015–2016 administration, eye-related data were most recently collected in 2008. Data collection procedures are rigorous (8), with ample literature and publicly available characterization of the data set (9). Although data validity and reliability are typically excellent, the use of survey data to study neuro-ophthalmic disease is limited by sample size: a population sample of 5,000 individuals is unlikely to contain more than a few cases of any given neuro-ophthalmic disease. An alternative is a survey with a cohort enriched in disease by sampling from a population seeking medical care. For example, the National Ambulatory Medical Care Survey and the National Hospital Ambulatory Medical Care Survey have been applied to quantify the volume and types of medical visits for diplopia (10). Survey data sets have demonstrated utility in the study of ophthalmic markers for common neurological diseases (e.g., retinal vessel measurements for risk stratification of cerebrovascular disease in the Atherosclerosis Risk in Communities Study) (11,12) and in the study of the relation between vision impairment and common neurological conditions (e.g., visual and cognitive impairment in the Salisbury Eye Study) (13).
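To make the sample-size limitation concrete, the sketch below computes the expected number of cases, and the probability of observing at least a given number of cases, in a simple random sample. It assumes simple random sampling (NHANES actually uses a complex multistage design) and an illustrative prevalence of 1 per 10,000 chosen only for demonstration; it is not an estimate for any specific disease.

```python
import math


def expected_cases(sample_size: int, prevalence: float) -> float:
    """Expected number of cases in a simple random sample."""
    return sample_size * prevalence


def prob_at_least(k: int, sample_size: int, prevalence: float) -> float:
    """P(X >= k) for X ~ Binomial(sample_size, prevalence).

    Computed as 1 - P(X < k); intended for small k only.
    """
    below = sum(
        math.comb(sample_size, i) * prevalence**i * (1 - prevalence) ** (sample_size - i)
        for i in range(k)
    )
    return 1.0 - below


n, prevalence = 5_000, 1e-4  # hypothetical prevalence of 1 per 10,000
print(f"expected cases:      {expected_cases(n, prevalence):.2f}")
print(f"P(at least 1 case):  {prob_at_least(1, n, prevalence):.3f}")
print(f"P(at least 5 cases): {prob_at_least(5, n, prevalence):.1e}")
```

Under this assumed prevalence, the expected number of cases is 0.5, and there is roughly a 61% chance that the sample contains no cases at all.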
There is no direct opportunity for physicians to contribute to the NHANES, which is administered by the US government. The data are publicly available for “purposes of health statistical reporting and analysis.” Some additional information and linkage to other government data sets are available for a fee through application to the National Center for Health Statistics Research Data Center (14). Access to other survey data sets, such as cohort studies, is at the discretion of study management.
National Inpatient Sample: An Administrative Data Set
These data are compiled by sampling medical administrative data. The National Inpatient Sample (NIS) is a 20% stratified sample of nonfederal US hospital discharges, begun in 1988 and currently available through 2016. It is compiled and maintained by the US Agency for Healthcare Research and Quality (15). It contains 5–8 million hospital stays per year, and weighting can be applied to obtain population estimates. The data are cross-sectional and not identifiable by individual, with data points including demographics, medical diagnoses, procedures, costs, and hospital variables extracted from hospital administrative databases maintained by states. Diagnoses and procedures are coded using the International Classification of Diseases (ICD) taxonomy. Both undercoding and overcoding are possible, leading to information (misclassification) bias.
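Because each sampled discharge stands for several discharges nationally, population estimates are obtained by summing discharge weights rather than counting records. The minimal sketch below uses a hypothetical, simplified record type and made-up weights; the actual weight and design variable names should be taken from the HCUP documentation for the data year in use. Valid standard errors additionally require the survey design variables (strata and hospital clusters), which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Discharge:
    # Hypothetical, simplified record; real NIS files contain many more fields.
    weight: float          # discharge-level sampling weight supplied with the data
    has_condition: bool    # e.g., flagged from ICD diagnosis codes


def weighted_count(records: List[Discharge]) -> float:
    """Estimated national number of discharges with the condition."""
    return sum(r.weight for r in records if r.has_condition)


def weighted_proportion(records: List[Discharge]) -> float:
    """Estimated national proportion of discharges with the condition."""
    return weighted_count(records) / sum(r.weight for r in records)


sample = [Discharge(5.0, True), Discharge(5.0, False), Discharge(4.8, True), Discharge(5.2, False)]
print(weighted_count(sample), weighted_proportion(sample))
```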
The utility of administrative data for the study of neuro-ophthalmic disease depends on the sample size of the disease of interest. Administrative samples are usually population samples, with the advantage of generalizability. They tend to be larger than survey data sets, likely because data collection is less intensive. Because of the larger database size and enrichment with neuro-ophthalmic diseases owing to derivation from health care records, this format has been applied effectively to the study of neuro-ophthalmic conditions, including perioperative ischemic optic neuropathy and retinal artery occlusion associated with spine and cardiac surgery (16–20). A major limitation of the NIS for neuro-ophthalmic conditions is its inability to capture events that do not result in hospital admission or medical coding.
There is no opportunity for physicians to contribute to the NIS, which is sampled from state-managed databases. Purchase of data costs between $160 and $500 per year. Databases such as the NIS that carry a risk of patient identification require data use agreements and may require training in the proper use and protection of data.
Clinformatics Datamart (Optum, Inc): A Commercial Insurance Claims Data Set
Data sets of this type are compiled by amalgamating insurance claims submitted by health care providers for medical care. Clinformatics contains longitudinal information for more than 50 million individuals covered by a large US national health insurer (21); thus, it is not a population-based sample. Data points for diagnoses and procedures rely on coding taxonomies (ICD and Current Procedural Terminology [CPT]), which may be a source of information bias. In addition to diagnosis and procedure information, Clinformatics includes demographics as well as inpatient and outpatient provider, facility, and pharmacy claims. Laboratory data are available for a subset of individuals. Linkage to zip codes, socioeconomic status, and death records is available.
Despite potential selection bias due to sampling and information bias from overcoding and undercoding, the longitudinal data and very large sample size facilitate the study of long-term outcomes and the identification of risk factors in neuro-ophthalmic diseases. Clinformatics has been applied to study the risk of thyroid-associated orbitopathy in Graves disease (22), to examine risk factors for branch retinal vein occlusion (23), and to determine risk factors for nonarteritic anterior ischemic optic neuropathy (NAION) (21). Other claims databases that have been applied to neuro-ophthalmic diseases include the Medicare 5% claims sample, used to investigate associations between diabetes mellitus and NAION (24); the LifeLink database (IMS Health, Inc), used to examine associations between medications and secondary pseudotumor cerebri (25) or NAION (26); and MarketScan (Truven Health Analytics, Inc, Ann Arbor, MI), used to study the association between uveitis and optic neuritis (27). These databases offer excellent opportunities for further study of neuro-ophthalmic disorders.
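Longitudinal claims data support person-time incidence calculations, in which each enrollee contributes follow-up from enrollment until the first qualifying event or disenrollment. The sketch below assumes a hypothetical record with enrollment start and end dates and an optional first-event date, and assumes that prevalent cases have already been excluded; it returns an incidence rate per 100,000 person-years together with the event count and total person-years.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Enrollee:
    # Hypothetical claims-style enrollment record.
    enroll_start: date
    enroll_end: date                  # disenrollment or end of available data
    event_date: Optional[date] = None  # first qualifying diagnosis, if any


def incidence_rate(enrollees: Iterable[Enrollee], per: float = 100_000) -> Tuple[float, int, float]:
    events = 0
    person_days = 0
    for e in enrollees:
        # Follow-up stops at the event if it occurs during enrollment; otherwise at disenrollment.
        if e.event_date is not None and e.enroll_start <= e.event_date <= e.enroll_end:
            end = e.event_date
            events += 1
        else:
            end = e.enroll_end
        person_days += (end - e.enroll_start).days
    person_years = person_days / 365.25
    return events / person_years * per, events, person_years


cohort = [
    Enrollee(date(2010, 1, 1), date(2011, 1, 1)),
    Enrollee(date(2010, 1, 1), date(2015, 1, 1), event_date=date(2010, 7, 1)),
    Enrollee(date(2012, 1, 1), date(2013, 1, 1)),
]
rate, n_events, py = incidence_rate(cohort)
print(f"{n_events} event(s) over {py:.2f} person-years: {rate:,.0f} per 100,000 person-years")
```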
There is no direct opportunity for physicians to contribute to Clinformatics or other claims databases. Purchase of commercial claims data sets (e.g., Clinformatics, MarketScan, and LifeLink) carries substantial cost, often in excess of $15,000. Medicare data fees are structured according to the number of subjects, the amount of data, and the period of time, with costs estimated at over $10,000 for basic claims information for up to 1 million beneficiaries for 1 year (28). Data use agreements are usually required.
IRIS (American Academy of Ophthalmology) and Axon (American Academy of Neurology) Electronic Medical Records Data Sets
These medical record data sets are compiled by amalgamating medical records from providers. Ambitious efforts have recently resulted in the American Academy of Ophthalmology (AAO) Intelligent Research in Sight (IRIS) registry (29,30) and the American Academy of Neurology (AAN) Axon Registry (31,32). In 2016, IRIS collected data from over 36 million clinical visits for over 17 million unique patients seen by over 10,000 providers in ophthalmology practices (42% of practicing ophthalmologists in the United States). The prevalence of conditions relevant to neuro-ophthalmologists includes 2.04% for optic nerve disorders (excluding glaucoma) and 1.96% for strabismus (33). Axon has captured more than 4 million visits for 1.3 million unique patients and has more than 1,000 participating neurology providers, 20 of whom are neuro-ophthalmology related (Personal communication, Katie Hentges, Program Manager, Axon Registry, American Academy of Neurology, June 29, 2018). Both registries are actively enrolling new providers. There is selection bias because the registries are not population-based samples. Rather, the data reflect which practices participate, with academic practices currently underrepresented in both registries because of the challenges of setting up data extraction from the electronic health records (EHR) used in academic health systems.
It is important to point out that the IRIS and Axon registries were established primarily for quality improvement, for benchmarking, and for compliance with insurance-based incentive provider payment systems, rather than for research. Both registries collect the entire medical record for visits with participating providers. Extracted data points, including performance measures (both Axon and IRIS) and some other data fields (IRIS), are based on mapping between the provider's EHR and the registries. Although the potentially available information is large, the analyzable information is limited by what has been mapped (34). There are future opportunities, through additional mapping to specific EHR fields and application of natural language processing, to create comprehensive data sets that capture clinical neurologic and ophthalmic care. A limitation is that both registries are restricted to the visit records of the enrolled providers and do not include external records, notes from providers in other specialties, operating room data, or images except as referenced in the enrolled provider's record.
Neither the Axon nor the IRIS registry has yet been used in published research related to neuro-ophthalmic disease. However, there is a rich literature based on regional and national registries, including studies of third nerve palsy and idiopathic intracranial hypertension (IIH) using the Rochester Epidemiology Project (35,36), quantification of the incidence of ocular symptoms after a diagnosis of giant cell arteritis using the Swedish Hospital Registry (37), and a study of cranial nerve palsies in diabetic patients using a Saudi diabetes registry (38).
Ophthalmology practices can participate in IRIS, and neurology practices can participate in Axon, through applications to the sponsoring societies. Setup time is required to establish practice-specific mapping of variables and data transmission. There is no fee for participation, but membership in the sponsoring organization and US practice are required. Participants are given access to their performance metrics. Access to IRIS for research is currently limited to a competitive grant process administered jointly by the AAO and Research to Prevent Blindness, Inc. A separate AAO fund is being established to support young investigators to perform IRIS-based research, but application details are not yet available. Subspecialty societies are invited to apply for AAO-sponsored IRIS-based research, and the American Glaucoma Society is sponsoring an award this year (39). Axon is not currently accessible for research use, although this is planned for the future and is expected to follow a process similar to that of IRIS.
Opportunities for Neuro-Ophthalmology
With regard to data sources, much research on neuro-ophthalmic diseases and on vision outcomes in neurological diseases can be pursued using existing data. Participation in professional organization–sponsored registries will enrich the data sets and will provide individual benefits with regard to performance measurement. There is an opportunity for neuro-ophthalmology professional organizations to sponsor awards that would enable their members to access restricted data sets such as the IRIS registry, and to sponsor neuro-ophthalmology–focused data sets.
Consultation with a biostatistician or other individual well versed in big data analysis is extremely important throughout the research process to ensure appropriate formulation of the research question, data management practices, modeling, and interpretation.
In the context of observational studies, big data sets offer the promise of large sample sizes and are particularly attractive for studying rare diseases such as those seen in neuro-ophthalmology. These large samples can be used in cross-sectional, retrospective cohort, case–control, and other study designs to define incidence and prevalence and to identify potential risk factors. This has application to risk stratification and prediction, which typically requires a large sample and population breadth to generate robust, clinically useful results (40). Another promise of big data, particularly those derived from clinical care records, is the opportunity to capture “real-world” experience in contrast to the idealized treatment and testing structure of gold standard randomized interventional trials. The data can be leveraged to investigate questions that may be practically difficult, financially prohibitive, or ethically challenging to address prospectively (e.g., drug-induced diseases). Studies can be exploratory and hypothesis generating, but also can test hypotheses directly.
Power analyses are essential to ensure adequate sample sizes for the research questions asked and the analytical techniques used. Although big data sets are attractive for their size, a rare condition may still be represented by relatively few cases in a big data set, which can limit both the analyses and the conclusions (16,19).
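As an illustration of a power analysis for comparing 2 proportions, the sketch below uses the standard normal-approximation formula for the sample size per group and its inverse for power. The risks in the example (0.1% vs 0.2%) are hypothetical, and real study designs, particularly matched, clustered, or time-to-event analyses, call for dedicated software or a biostatistician.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group to detect p1 vs p2 with a 2-sided test (normal approximation)."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(numerator**2 / (p1 - p2) ** 2)


def power_two_proportions(p1: float, p2: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power with n subjects per group (inverse of the formula above)."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    z = ((abs(p1 - p2) * math.sqrt(n) - z_a * math.sqrt(2 * p_bar * (1 - p_bar)))
         / math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return _Z.cdf(z)


print("n per group for 80% power:", n_per_group(0.001, 0.002))
print("power with 1,000 per group:", round(power_two_proportions(0.001, 0.002, 1_000), 3))
```

With these hypothetical risks, 1,000 subjects per group give power well below 20%, and the required sample size runs to tens of thousands per group, which shows how a rare outcome can remain underpowered even in a large data set.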
Data management decisions, including how raw data are used to define variables, are an important foundation for subsequent analyses. Too broad a definition for inclusion of data could bias results toward the null, whereas too narrow a definition can limit sample size and study power. When performed appropriately, these decisions can help to address problems with completeness and consistency in the raw data, but they can also skew the results. For example, requiring an observation period of 5 years rather than 1 year before a diagnosis to classify it as incident reduced the overestimation of glaucoma incidence from 135% to <30% (41). With regard to IIH, ICD codes from emergency department visits have a positive predictive value of 55% (42). One strategy is to require evidence of certain tests in the medical record in addition to a diagnosis code. However, this is not foolproof: fewer than 70% of patients with an ICD code for IIH and CPT codes for neuroimaging and lumbar puncture met criteria for IIH on medical record review (43). Because of the nuances of diagnosing neuro-ophthalmic conditions, neuro-ophthalmic experts have raised serious concerns regarding the accuracy of diagnostic definitions for optic neuritis and ischemic optic neuropathy used in recent studies led by non–neuro-ophthalmologists (44,45). One study found a false positive rate of 60% for optic neuritis diagnoses by non–neuro-ophthalmologists (46).
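The look-back and supporting-test strategies described above can be sketched as a case-definition function. Everything here is hypothetical: the record layout, the placeholder codes (DX_IIH, PROC_MRI, and PROC_LP stand in for real ICD and CPT codes), and the default windows of 365 and 90 days. Any real definition should be validated against medical record review.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Claim:
    # Hypothetical claim: service date plus the diagnosis/procedure codes billed on it.
    service_date: date
    codes: FrozenSet[str]


def incident_index_date(
    claims: Iterable[Claim],
    enroll_start: date,
    dx_codes: FrozenSet[str],
    required_proc_codes: FrozenSet[str] = frozenset(),
    lookback_days: int = 365,
    proc_window_days: int = 90,
) -> Optional[date]:
    """Return the index date if the hypothetical incident-case definition is met, else None."""
    dated = sorted(claims, key=lambda c: c.service_date)
    first_dx = next((c.service_date for c in dated if c.codes & dx_codes), None)
    if first_dx is None:
        return None
    # Require enough observable history before the first diagnosis so that
    # previously diagnosed (prevalent) patients are less likely to be counted as new.
    if (first_dx - enroll_start).days < lookback_days:
        return None
    # Require each supporting procedure within +/- proc_window_days of the diagnosis.
    window = timedelta(days=proc_window_days)
    for code in required_proc_codes:
        if not any(code in c.codes and abs(c.service_date - first_dx) <= window for c in dated):
            return None
    return first_dx


claims = [
    Claim(date(2017, 6, 1), frozenset({"DX_IIH"})),
    Claim(date(2017, 6, 3), frozenset({"PROC_MRI"})),
    Claim(date(2017, 6, 10), frozenset({"PROC_LP"})),
]
dx, procs = frozenset({"DX_IIH"}), frozenset({"PROC_MRI", "PROC_LP"})
print(incident_index_date(claims, date(2015, 1, 1), dx, procs))                            # meets definition
print(incident_index_date(claims, date(2015, 1, 1), dx, procs, lookback_days=5 * 365))     # look-back too short
```

Lengthening the look-back, as in the second call, trades sample size for fewer misclassified prevalent cases, which mirrors the trade-off described in the text.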
Selection of control subjects (i.e., those without the exposure or outcome of interest) also requires careful thought to ensure accurate classification of dependent and independent variables (47). For example, in a study of IIH using medical claims data, selecting controls from a population with a previous eye examination or without a diagnosis of headache may decrease misclassification bias. Similarly, if glaucoma is an independent variable of interest, then it is important that cases and controls have similar eye examination histories. Sparse data bias can occur when there are too few cases with some combinations of predictor variables; it tends to bias estimates away from the null, producing spuriously large effect sizes (48). One strategy to address this is to match controls to cases on strong risk factors (those for which the exposure–outcome relationship has a large effect size).
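Matching controls to cases on strong risk factors can be sketched as exact 1:k matching within strata, without replacement. The field names (sex, obese), the choice of obesity as the matching factor, and the toy data are illustrative assumptions; a matched design should also be analyzed with methods that account for the matching (e.g., conditional logistic regression).

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def match_controls(
    cases: List[dict],
    pool: List[dict],
    key: Callable[[dict], Hashable],
    k: int = 4,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Exact 1:k matching on key(person), sampling controls without replacement."""
    rng = random.Random(seed)
    available = defaultdict(list)
    for person in pool:
        available[key(person)].append(person["id"])
    for ids in available.values():
        rng.shuffle(ids)  # random choice of controls within each stratum
    matches = {}
    for case in cases:
        ids = available[key(case)]
        # Cases in sparse strata may receive fewer than k controls.
        matches[case["id"]] = [ids.pop() for _ in range(min(k, len(ids)))]
    return matches


def stratum(person: dict) -> tuple:
    # Hypothetical matching factors.
    return (person["sex"], person["obese"])


cases = [
    {"id": "c1", "sex": "F", "obese": True},
    {"id": "c2", "sex": "F", "obese": True},
    {"id": "c3", "sex": "M", "obese": False},
]
pool = [{"id": f"p{i}", "sex": "F" if i % 2 == 0 else "M", "obese": i % 3 == 0} for i in range(30)]
print(match_controls(cases, pool, stratum, k=4))
```

Cases are processed in order, so in strata with few available controls, earlier cases receive more matches; shuffling the case order is one way to avoid systematic imbalance.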
Missing data, which are unavoidable and common in epidemiologic research, in clinical trials, and in big data in particular, can lead to biased estimates and reduced precision that significantly affect conclusions. Therefore, missing data are among the most critical issues that must be acknowledged, described, and addressed in any study. There are 3 general types of missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) (49). Only under MCAR, in which missingness does not depend on other variables but only on random events, does complete case analysis yield unbiased estimates. Under MAR, missingness depends on other observed variables, whereas under MNAR, sometimes called “nonignorable nonresponse” or “informative missingness,” missingness depends on unobserved data, including the missing values themselves. As an example, income that is missing depending on an observed variable (e.g., sex) would be MAR, but income that is missing depending on whether the income itself is high or low, which is unobserved for subjects with missing income, would be MNAR.
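A small simulation makes the 3 mechanisms concrete using the income example above. The income distribution and missingness probabilities are invented for illustration; the code deletes incomes under each mechanism and compares the complete case mean with the true mean.

```python
import random
import statistics


def simulate_income_missingness(n: int = 100_000, seed: int = 1) -> dict:
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        sex = rng.choice("FM")
        # Hypothetical income model in which mean income differs by sex.
        income = rng.gauss(60_000 if sex == "M" else 50_000, 15_000)
        people.append((sex, income))

    def complete_case_mean(is_missing) -> float:
        # Mean income among people whose income was NOT deleted.
        return statistics.fmean(inc for sex, inc in people if not is_missing(sex, inc))

    return {
        "true": statistics.fmean(inc for _, inc in people),
        # MCAR: every income has the same 30% chance of being missing.
        "MCAR": complete_case_mean(lambda sex, inc: rng.random() < 0.30),
        # MAR: missingness depends only on the observed variable (sex).
        "MAR": complete_case_mean(lambda sex, inc: rng.random() < (0.50 if sex == "M" else 0.10)),
        # MNAR: missingness depends on the income value itself.
        "MNAR": complete_case_mean(lambda sex, inc: rng.random() < (0.50 if inc > 60_000 else 0.10)),
    }


for mechanism, mean in simulate_income_missingness().items():
    print(f"{mechanism:5s} {mean:10,.0f}")
```

Under MCAR the complete case mean stays close to the true mean. Under MAR it is biased because men are more often missing, but the bias can be removed by accounting for sex (e.g., by stratification, multiple imputation, or weighting). Under MNAR the missingness depends on the unobserved income itself, so adjustment for observed variables alone does not fully correct it.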
General analytic strategies to address missing data include complete case analysis, multiple imputation (49), likelihood-based methods (50), and inverse probability weighting. Complete case analysis is restricted to the subset of subjects without missing outcome or covariate data and will produce biased estimates unless 1) data are MCAR or 2) the overall rate of missing data is small (e.g., <5% of the total sample), in which case the bias is likely to be small. Complete case analysis also results in a loss of efficiency (e.g., larger standard errors). Recommendations therefore generally favor complete case analysis when the overall missing rate is this small, regardless of the type of missingness, because its impact is low.
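Before choosing complete case analysis, it helps to tabulate missingness. The sketch below assumes records stored as dictionaries with None (or an absent key) marking a missing value; it reports per-variable missing rates and the fraction of records with any missing value, which is the quantity that determines how much data complete case analysis discards. The 5% threshold follows the rule of thumb above.

```python
from typing import Dict, Iterable, List


def missingness_summary(rows: List[Dict], variables: Iterable[str], threshold: float = 0.05) -> Dict:
    variables = list(variables)
    n = len(rows)
    # r.get(v) is None covers both explicit None values and absent keys.
    per_variable = {v: sum(r.get(v) is None for r in rows) / n for v in variables}
    incomplete = sum(any(r.get(v) is None for v in variables) for r in rows) / n
    return {
        "per_variable": per_variable,
        "incomplete_fraction": incomplete,
        "complete_case_reasonable": incomplete < threshold,
    }


# Hypothetical data: 100 records, 3 of which have at least one missing value.
rows = [{"age": 50, "bmi": 30.0, "smoker": False} for _ in range(100)]
rows[0]["bmi"] = None
rows[1]["smoker"] = None
rows[2]["bmi"] = None
rows[2]["smoker"] = None
print(missingness_summary(rows, ["age", "bmi", "smoker"]))
```

Several variables can each have a low missing rate while the fraction of incomplete records is noticeably higher, so the record-level fraction is the one to compare against the threshold.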
Multiple imputation is a common approach for data that are MAR. Several imputed values are generated for each missing observation using carefully specified joint imputation models that reflect the uncertainty about the missing value. A statistical model is fit to each completed data set, and the results of these separate analyses (one per imputation) are combined to account for the uncertainty in the imputation (49). More naive single imputations (e.g., mean substitution and last observation carried forward) can be worse than complete case analysis. Multiple imputation can be particularly valuable when missing data are restricted to covariates rather than outcomes.
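After each imputed data set has been analyzed, the estimates are commonly combined with Rubin's rules: the pooled estimate is the mean across the m imputations, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch below implements only these pooling formulas (not the degrees-of-freedom adjustment); generating the imputations requires a proper imputation model, typically fit with established statistical software. The example numbers are hypothetical (e.g., log odds ratios and their squared standard errors).

```python
import statistics
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine per-imputation estimates and variances; returns (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need >= 2 imputations and one variance per estimate")
    pooled = statistics.fmean(estimates)         # average point estimate
    within = statistics.fmean(variances)         # average within-imputation variance
    between = statistics.variance(estimates)     # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


est, var = pool_rubin([1.0, 1.2, 1.1], [0.04, 0.05, 0.03])
print(f"pooled estimate {est:.3f}, standard error {var ** 0.5:.3f}")
```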
Likelihood-based approaches model subjects with complete and partial data together and exclude only observations with both covariates and outcomes missing, but they require sophisticated analytic techniques with assumptions about the nuances of the covariate distribution. Inverse probability weighting corrects the bias of estimates obtained with complete case analysis by weighting each complete case by the inverse of its estimated probability of being observed, so that complete cases resembling those with missing data count more (50). Inverse probability weighting may be most applicable when a large amount of outcome data is missing and cannot be modeled well from covariates through multiple imputation. Finally, when the same variables are used in the analysis, multiple imputation and likelihood-based methods tend to yield similar results.
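A minimal sketch of inverse probability weighting for a missing outcome, assuming missingness depends only on an observed grouping variable (i.e., MAR given group). The probability of being observed is estimated here as the observed fraction within each group; in practice it would usually be modeled, for example with logistic regression on several covariates. The data are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Tuple

Record = Tuple[Hashable, Optional[float]]  # (group, outcome or None if missing)


def complete_case_mean(records: List[Record]) -> float:
    observed = [y for _, y in records if y is not None]
    return sum(observed) / len(observed)


def ipw_mean(records: List[Record]) -> float:
    """Mean outcome, weighting each observed outcome by 1 / P(observed | group)."""
    n_total = defaultdict(int)
    n_observed = defaultdict(int)
    for group, y in records:
        n_total[group] += 1
        if y is not None:
            n_observed[group] += 1
    weighted_sum = total_weight = 0.0
    for group, y in records:
        if y is None:
            continue
        weight = n_total[group] / n_observed[group]  # inverse of the observed fraction
        weighted_sum += weight * y
        total_weight += weight
    return weighted_sum / total_weight


# Group B outcomes are missing half the time; group A outcomes are always observed.
records = [("A", 10.0)] * 50 + [("B", 20.0)] * 25 + [("B", None)] * 25
print(complete_case_mean(records), ipw_mean(records))
```

If the missing outcomes in group B resemble the observed ones (the MAR assumption), the full-sample mean is 15; complete case analysis gives about 13.3, whereas weighting recovers 15.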
With regard to evaluating relationships within the data, one approach is to use multivariable models with outcome as a function of exposure and covariates. These include logistic regression, Cox regression, mixed models, and generalized estimating equations. Included variables are typically selected on the basis of previous literature or the investigators' conceptual framework. Often, univariate comparisons of outcome with each predictor variable, and stratified comparisons of outcome with exposure at each level of a predictor, are used to inform the initial models. Other strategies include incrementally adding terms to, or removing them from, models to arrive at the final model. Techniques such as propensity scores and mediation analysis can be used to address confounding and questions of causality, respectively. The sample size determines the number of parameters that can be accommodated to identify associations between dependent and independent variables, as well as interactions between independent variables, while accounting for a wide range of potential confounders. For example, a general rule of thumb for logistic regression is a minimum of 10 events (patients or eyes with outcomes) per analysis variable (51).
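A stratified comparison can reveal confounding before any model is fit. The sketch below computes a crude odds ratio and a Mantel-Haenszel odds ratio pooled across strata of a potential confounder, plus a helper for the 10-events-per-variable rule of thumb. The counts are invented so that the exposure appears strongly associated with the outcome overall but not within either stratum.

```python
from typing import Iterable, Tuple

Table = Tuple[int, int, int, int]  # (a, b, c, d)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: Iterable[Table]) -> float:
    """Mantel-Haenszel odds ratio pooled across 2 x 2 tables."""
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def max_predictors(events: int, events_per_variable: int = 10) -> int:
    """Largest number of model terms allowed by an events-per-variable rule of thumb."""
    return events // events_per_variable


# Hypothetical strata of a confounder (e.g., high vs low body mass index).
strata = [(40, 10, 8, 2), (2, 8, 10, 40)]
crude = odds_ratio(*[sum(t[i] for t in strata) for i in range(4)])
print(f"crude OR = {crude:.2f}, Mantel-Haenszel OR = {mantel_haenszel_or(strata):.2f}")
print("max predictors for 87 events:", max_predictors(87))
```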
Another approach is to use data mining or machine learning techniques without previous identification of relevant variables. The techniques are typically supervised (as opposed to unsupervised) in that the outcomes of interest are defined by the investigator, and the goal of analysis is to identify patterns in the data associated with the outcome (52). From a research perspective, these are hypothesis-generating rather than hypothesis-testing analysis techniques. They have particular application to the development of predictive models. Machine learning also is increasingly popular for its ability to analyze image data, with clear relevance to neuro-ophthalmology, where fundus imaging and optical coherence tomography have been evaluated by this methodology (53–55).
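The supervised workflow (an investigator-defined outcome label, a model fit on training data, and evaluation on held-out data) can be sketched with a one-feature logistic regression trained by gradient descent in plain Python. The data are simulated, and the learning rate and number of epochs are arbitrary choices; real analyses would typically use an established library such as scikit-learn, many more features, and more careful validation.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr: float = 0.1, epochs: int = 500):
    """Batch gradient descent on the logistic log-loss for one feature plus an intercept."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss with respect to the linear score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def accuracy(w: float, b: float, xs, ys) -> float:
    return sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in zip(xs, ys)) / len(xs)


# Simulated data: the outcome label (1/0) is defined in advance, as in supervised learning.
rng = random.Random(42)
xs = [rng.uniform(-3, 3) for _ in range(1000)]
ys = [1 if x + rng.gauss(0, 0.5) > 0 else 0 for x in xs]
train_part, holdout = slice(0, 700), slice(700, 1000)
w, b = fit_logistic(xs[train_part], ys[train_part])
print(f"w = {w:.2f}, b = {b:.2f}, held-out accuracy = {accuracy(w, b, xs[holdout], ys[holdout]):.2f}")
```

Reporting performance on data not used for fitting is what distinguishes a usable predictive model from one that merely describes its training sample.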
Given a large data set and commonly available statistical software, it is relatively straightforward for a novice to run models and generate numerical statistical results. However, planning and selecting the appropriate analyses requires expertise. A practical note is that analyses of large quantities of data (e.g., from Medicare or the NIS) may require high-speed computing resources because personal computers have insufficient memory. Usually, this will result in increased costs and the need for programming expertise.
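One common way around memory limits is to stream records and keep only running aggregates, instead of loading an entire extract into memory. The sketch below tallies one column of a CSV file row by row with Python's csv module; the file layout, the column name dx1, and the placeholder codes are hypothetical.

```python
import csv
import os
import tempfile
from collections import Counter


def count_by_column(path: str, column: str) -> Counter:
    """Tally values of one column while holding only one row in memory at a time."""
    counts = Counter()
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):  # DictReader yields rows lazily
            counts[row[column]] += 1
    return counts


# Demo with a tiny temporary file standing in for a very large extract.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["record_id", "dx1"])
    writer.writerows([["1", "DX_A"], ["2", "DX_B"], ["3", "DX_A"]])
    demo_path = tmp.name
try:
    demo_counts = count_by_column(demo_path, "dx1")
    print(demo_counts.most_common())
finally:
    os.remove(demo_path)
```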
Opportunities for Neuro-Ophthalmology
Beyond using big data set research as a tool to investigate and advance understanding of neuro-ophthalmic disease, the neuro-ophthalmology community can make important contributions in development and validation of algorithms for accurate classification of neuro-ophthalmic disease from claims and administrative data.
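Validating a claims-based case definition usually means comparing it with a reference standard such as medical record review by a neuro-ophthalmologist. The sketch below computes sensitivity, specificity, and positive and negative predictive values, each with a Wilson score 95% confidence interval, from a 2 × 2 table; the counts in the example are hypothetical. If only code-positive patients undergo chart review, only the positive predictive value can be estimated.

```python
import math
from typing import Dict, Tuple


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def validation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Accuracy of a code-based case definition against a reference standard.

    tp: coded and confirmed; fp: coded but not confirmed;
    fn: confirmed but not coded; tn: neither coded nor confirmed.
    """
    def metric(k: int, n: int):
        return k / n, wilson_ci(k, n)

    return {
        "sensitivity": metric(tp, tp + fn),
        "specificity": metric(tn, tn + fp),
        "ppv": metric(tp, tp + fp),
        "npv": metric(tn, tn + fn),
    }


# Hypothetical chart-review counts.
for name, (est, (lo, hi)) in validation_metrics(tp=55, fp=45, fn=5, tn=895).items():
    print(f"{name:12s} {est:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```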
Research study results are only applicable to the extent that the underlying data are of high quality with acknowledged limitations, the analyses are appropriate, and the conclusions are reasonable. With large sample sizes, associations with small effect sizes are more likely to reach statistical significance, and the investigator must take care to frame the results as clinically relevant or not. Because of the observational nature of most big data sets, the analyses detect association and do not imply causation. Another risk is spurious correlations: statistically identified associations that are either coincidental or due to a common cause. As with any research study, a big data study does not stand alone but must be interpreted in the context of the broad literature on the disease of interest.
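To illustrate how sample size alone drives statistical significance, the sketch below applies a 2-sided pooled two-proportion z-test to the same hypothetical risks (1.0% vs 1.1%, a relative risk of 1.1) at 1,000 and at 1,000,000 subjects per group. The effect size is identical in both scenarios; only the P value changes.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """2-sided P value from a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical event counts giving 1.0% vs 1.1% in each scenario.
for x1, x2, n in [(10, 11, 1_000), (10_000, 11_000, 1_000_000)]:
    rr = (x2 / n) / (x1 / n)
    print(f"n per group = {n:>9,}  RR = {rr:.2f}  P = {two_proportion_p_value(x1, n, x2, n):.2g}")
```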
Efforts to improve study design and reporting include the STROBE and RECORD reporting standards (56–58). Concerns have also been raised about how data collected in longitudinal big data studies that inform drug therapy are reported and evaluated (59).
The increasing ease of data collection, storage, and analysis has increased enthusiasm for big data analysis in medical research and clinical care, as well as for many applications outside of medicine. The large sample sizes and real-world observations are promising as a basis for research questions that cannot practically be answered using clinical trials. However, big data remains simply a collection of data sources. Careful data selection, management, analysis, and interpretation are critical to generate meaningful conclusions. There are numerous and increasing opportunities for further research using these databases to study neuro-ophthalmic diseases.
STATEMENT OF AUTHORSHIP
Category 1: a. Conception and design: H. E. Moss, C. E. Joslin, D. S. Rubin, and S. Roth; b. Acquisition of data: H. E. Moss, C. E. Joslin, D. S. Rubin, and S. Roth; c. Analysis and interpretation of data: H. E. Moss, C. E. Joslin, D. S. Rubin, and S. Roth. Category 2: a. Drafting the manuscript: H. E. Moss, C. E. Joslin, D. S. Rubin, and S. Roth; b. Revising it for intellectual content: H. E. Moss, C. E. Joslin, D. S. Rubin, and S. Roth. Category 3: a. Final approval of the completed manuscript: H. E. Moss, C. E. Joslin, D. S. Rubin, and S. Roth.
1. Press G. 12 big data definitions: what's yours? 2014. Forbes [Internet]. Available at: https://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/#19f939ca13ae. Accessed October 3, 2018.
2. Mahmood SS, Levy D, Vasan RS, Wang TJ. The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. Lancet. 2014;383:999–1008.
3. Obermeyer Z, Emanuel EJ. Predicting the future—big data, machine learning, and clinical medicine. N Engl J Med. 2016;375:1216–1219.
4. Whiteley WN, Emberson J, Lees KR, Blackwell L, Albers G, Bluhmki E, Brott T, Cohen G, Davis S, Donnan G, Grotta J, Howard G, Kaste M, Koga M, von Kummer R, Lansberg MG, Lindley RI, Lyden P, Olivot JM, Parsons M, Toni D, Toyoda K, Wahlgren N, Wardlaw J, Del Zoppo GJ, Sandercock P, Hacke W, Baigent C. Risk of intracerebral haemorrhage with alteplase after acute ischaemic stroke: a secondary analysis of an individual patient data meta-analysis. Lancet Neurol. 2016;15:925–933.
5. Schneeweiss S. Learning from big health care data. N Engl J Med. 2014;370:2161–2163.
6. Sessler DI, Imrey PB. Clinical research methodology 2: observational clinical research. Anesth Analg. 2015;121:1043–1051.
7. Dupont WD, Plummer WD. Power and sample size calculations. A review and computer program. Control Clin Trials. 1990;11:116–128.
8. National Health and Nutrition Examination Survey (NHANES). Ophthalmology Procedures Manual [CDC NHANES Document]. Atlanta, GA: Centers for Disease Control and Prevention; 2005. Available at: https://wwwn.cdc.gov/nchs/data/nhanes/2005-2006/manuals/OP.pdf. Accessed June 6, 2018.
9. Patton N, Aslam T, MacGillivray T, Pattie A, Deary IJ, Dhillon B. Retinal vascular image analysis as a potential screening tool for cerebrovascular disease: a rationale based on homology between cerebral and retinal microvasculatures. J Anat. 2005;206:319–348.
10. De Lott LB, Kerber KA, Lee PP, Brown DL, Burke JF. Diplopia-related ambulatory and emergency department visits in the United States, 2003–2012. JAMA Ophthalmol. 2017;135:1339–1344.
11. Hubbard LD, Brothers RJ, King WN, Clegg LX, Klein R, Cooper LS, Sharrett AR, Davis MD, Cai J. Methods for evaluation of retinal microvascular abnormalities associated with hypertension/sclerosis in the Atherosclerosis Risk in Communities Study. Ophthalmology. 1999;106:2269–2280.
12. Seidelmann SB, Claggett B, Bravo PE, Gupta A, Farhad H, Klein BE, Klein R, Di Carli M, Solomon SD. Retinal vessel calibers in predicting long-term cardiovascular outcomes: the Atherosclerosis Risk in Communities Study. Circulation. 2016;124:1328–1338.
13. Zheng DD, Swenor BK, Christ SL, West SK, Lam BL, Lee DJ. Longitudinal associations between visual impairment and cognitive functioning: the Salisbury Eye Evaluation Study. JAMA Ophthalmol. 2018;136:989–995.
14. Sinclair AJ, Burdon MA, Nightingale PG, Ball AK, Good P, Matthews TD, Jacks A, Lawden M, Clarke CE, Stewart PM. Low energy diet and intracranial pressure in women with idiopathic intracranial hypertension: prospective cohort study. BMJ. 2010;341:c2701.
15. Sugerman HJ, Felton WL III, Sismanis A, Kellum JM, DeMaria EJ, Sugerman EL. Gastric surgery for pseudotumor cerebri associated with severe obesity. Ann Surg. 1999;229:634–640.
16. Rubin DS, Matsumoto MM, Moss HE, Joslin CE, Tung A, Roth S. Ischemic optic neuropathy in cardiac surgery: incidence and risk factors in the United States from the National Inpatient Sample 1998 to 2013. Anesthesiology. 2017;126:810–821.
17. Calway T, Rubin DS, Moss HE, Joslin CE, Beckmann K, Roth S. Perioperative retinal artery occlusion: risk factors in cardiac surgery from the United States National Inpatient Sample 1998-2013. Ophthalmology. 2017;124:189–196.
18. Calway T, Rubin DS, Moss HE, Joslin CE, Mehta AI, Roth S. Perioperative retinal artery occlusion: incidence and risk factors in spinal fusion surgery from the US National Inpatient Sample 1998-2013. J Neuroophthalmol. 2018;38:36–41.
19. Rubin DS, Parakati I, Lee LA, Moss HE, Joslin CE, Roth S. Perioperative visual loss in spine fusion surgery: ischemic optic neuropathy in the United States from 1998 to 2012 in the Nationwide Inpatient Sample. Anesthesiology. 2016;125:457–464.
20. Lee YC, Wang JH, Huang TL, Tsai RK. Increased risk of stroke in patients with nonarteritic anterior ischemic optic neuropathy: a nationwide retrospective cohort study. Am J Ophthalmol. 2016;170:183–189.
21. Cestari DM, Gaier ED, Bouzika P, Blachley TS, De Lott LB, Rizzo JF, Wiggs JL, Kang JH, Pasquale LR, Stein JD. Demographic, systemic, and ocular factors associated with nonarteritic anterior ischemic optic neuropathy. Ophthalmology. 2016;123:2446–2455.
22. Stein JD, Childers D, Gupta S, Talwar N, Nan B, Lee BJ, Smith TJ, Douglas R. Risk factors for developing thyroid-associated ophthalmopathy among individuals with Graves disease. JAMA Ophthalmol. 2015;133:290–296.
23. Newman-Casey PA, Stem M, Talwar N, Musch DC, Besirli CG, Stein JD. Risk factors associated with developing branch retinal vein occlusion among enrollees in a United States managed care plan. Ophthalmology. 2014;121:1939–1948.
24. Lee MS, Grossman D, Arnold AC, Sloan FA. Incidence of nonarteritic anterior ischemic optic neuropathy: increased risk among diabetic patients. Ophthalmology. 2011;118:959–963.
25. Sodhi M, Sheldon CA, Carleton B, Etminan M. Oral fluoroquinolones and risk of secondary pseudotumor cerebri syndrome: nested case-control study. Neurology. 2017;89:792–795.
26. Nathoo NA, Etminan M, Mikelberg FS. Association between phosphodiesterase-5 inhibitors and nonarteritic anterior ischemic optic neuropathy. J Neuroophthalmol. 2015;35:12–15.
27. Guo D, Liu J, Gao R, Tari S, Islam S. Prevalence and incidence of optic neuritis in patients with different types of uveitis. Ophthalmic Epidemiol. 2018;25:39–44.
28. Corbett JJ, Savino PJ, Thompson HS, Kansu T, Schatz NJ, Orr LS, Hopson D. Visual loss in pseudotumor cerebri. Follow-up of 57 patients from five to 41 years and a profile of 14 patients with permanent severe visual loss. Arch Neurol. 1982;39:461–474.
29. Parke DW II, Rich WL III, Sommer A, Lum F. The American Academy of Ophthalmology's IRIS® Registry (intelligent research in sight clinical data): a look back and a look to the future. Ophthalmology. 2017;124:1572–1574.
30. Chiang MF, Sommer A, Rich WL, Lum F, Parke DW II. The 2016 American Academy of Ophthalmology IRIS Registry (intelligent research in sight) database. Ophthalmology. 2018;125:1143–1148.
31. Sigsbee B, Goldenberg JN, Bever CT Jr, Schierman B, Jones LK Jr. Introducing the Axon Registry: an opportunity to improve quality of neurologic care. Neurology. 2016;87:2254–2258.
32. Busis NA, Franklin GM. The AAN's Axon Registry. Mastering how we are measured. Neurology. 2016;87:2180–2181.
33. Tso MO, Hayreh SS. Optic disc edema in raised intracranial pressure: IV. Axoplasmic transport in experimental papilledema. Arch Ophthalmol. 1977;95:1458.
34. Kesler A, Vakhapova V, Korczyn AD, Drory VE. Visual evoked potentials in idiopathic intracranial hypertension. Clin Neurol Neurosurg. 2009;111:433–436.
35. Fang C, Leavitt JA, Hodge DO, Holmes JM, Mohney BG, Chen JJ. Incidence and etiologies of acquired third nerve palsy using a population-based method. JAMA Ophthalmol. 2017;135:23–28.
36. Kilgore KP, Lee MS, Leavitt JA, Mokri B, Hodge DO, Frank RD, Chen JJ. Re-evaluating the incidence of idiopathic intracranial hypertension in an era of increasing obesity. Ophthalmology. 2017;124:697–700.
37. Ji J, Dimitrijevic I, Sundquist J, Sundquist K, Zoller B. Risk of ocular manifestations in patients with giant cell arteritis: a nationwide study in Sweden. Scand J Rheumatol. 2017;46:484–489.
38. Al Kahtani ES, Khandekar R, Al-Rubeaan K, Youssef AM, Ibrahim HM, Al-Sharqawi AH. Assessment of the prevalence and risk factors of ophthalmoplegia among diabetic patients in a large national diabetes registry cohort. BMC Ophthalmol. 2016;16:118.
39. Sorensen PS, Trojaborg W, Gjerris F, Krogsaa B. Visual evoked potentials in pseudotumor cerebri. Arch Neurol. 1985;42:150–153.
40. Lee YH, Bang H, Kim DJ. How to establish clinical prediction models. Endocrinol Metab (Seoul). 2016;31:38–44.
41. Stein JD, Blachley TS, Musch DC. Identification of persons with incident ocular diseases using health care claims databases. Am J Ophthalmol. 2013;156:1169–1175.e3.
42. Koerner JC, Friedman DI. Inpatient and emergency service utilization in patients with idiopathic intracranial hypertension. J Neuroophthalmol. 2014;34:229–232.
43. Mudopivic I, Shirazi Z, Moss H. Predictive value of international classification of disease code for idiopathic intracranial hypertension (IIH) in a university health system. North American Neuro-ophthalmology Society Annual Meeting; 2018; Kailua, HI.
44. Eggenberger E. Initiation of anti-TNF therapy and the risk of optic neuritis: from the Safety Assessment of Biologic ThERapy (SABER) Study. Am J Ophthalmol. 2013;156:407–408.
45. Hayreh SS. Increased risk of stroke in patients with nonarteritic anterior ischemic optic neuropathy: a nationwide retrospective cohort study. Am J Ophthalmol. 2017;175:213–214.
46. Stunkel L, Kung NH, Wilson B, McClelland CM, Van Stavern GP. Incidence and causes of overdiagnosis of optic neuritis. JAMA Ophthalmol. 2018;136:76–81.
47. Wacholder S, McLaughlin JK, Silverman DT, Mandel JS. Selection of controls in case-control studies. I. Principles. Am J Epidemiol. 1992;135:1019–1028.
48. Greenland S, Mansournia MA, Altman DG. Sparse data bias: a problem hiding in plain sight. BMJ. 2016;352:i1981.
49. Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York, NY: Wiley; 1987.
50. Seaman SR, White IR. Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res. 2013;22:278–295.
51. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49:1373–1379.
52. Lee CH, Yoon HJ. Medical big data: promise and challenges. Kidney Res Clin Pract. 2017;36:3–11.
53. Rohm M, Tresp V, Muller M, Kern C, Manakov I, Weiss M, Sim DA, Priglinger S, Keane PA, Kortuem K. Predicting visual acuity by using machine learning in patients treated for neovascular age-related macular degeneration. Ophthalmology. 2018;125:1028–1036.
54. Mazzaferri J, Larrivee B, Cakir B, Sapieha P, Costantino S. A machine learning approach for automated assessment of retinal vasculature in the oxygen induced retinopathy model. Sci Rep. 2018;8:3916.
55. Ting DSW, Cheung CY, Lim G, Tan GSW, Quang ND, Gan A, Hamzah H, Garcia-Franco R, San Yeo IY, Lee SY, Wong EYM, Sabanayagam C, Baskaran M, Ibrahim F, Tan NC, Finkelstein EA, Lamoureux EL, Wong IY, Bressler NM, Sivaprasad S, Varma R, Jonas JB, He MG, Cheng CY, Cheung GCM, Aung T, Hsu W, Lee ML, Wong TY. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318:2211–2223.
56. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Epidemiology. 2007;18:800–804.
57. Benchimol EI, Smeeth L, Guttmann A, Harron K, Moher D, Petersen I, Sorensen HT, von Elm E, Langan SM. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. PLoS Med. 2015;12:e1001885.
58. Fitchett EJA, Seale AC, Vergnano S, Sharland M, Heath PT, Saha SK, Agarwal R, Ayede AI, Bhutta ZA, Black R, Bojang K, Campbell H, Cousens S, Darmstadt GL, Madhi SA, Meulen AS, Modi N, Patterson J, Qazi S, Schrag SJ, Stoll BJ, Wall SN, Wammanda RD, Lawn JE. Strengthening the reporting of observational studies in epidemiology for newborn infection (STROBE-NI): an extension of the STROBE statement for neonatal infection research. Lancet Infect Dis. 2016;16:e202–e213.
59. Wang SV, Schneeweiss S, Berger ML, Brown J, de Vries F, Douglas I, Gagne JJ, Gini R, Klungel O, Mullins CD, Nguyen MD, Rassen JA, Smeeth L, Sturkenboom M. Reporting to improve reproducibility and facilitate validity assessment for healthcare database studies V1.0. Pharmacoepidemiol Drug Saf. 2017;26:1018–1032.