
Promoting Transparency and Standardization in Ophthalmologic Artificial Intelligence: A Call for Artificial Intelligence Model Card

Chen, Dinah K. MD; Modi, Yash MD; Al-Aswad, Lama A. MD, MPH∗,†

Author Information
Asia-Pacific Journal of Ophthalmology: May-June 2022 - Volume 11 - Issue 3 - p 215-218
doi: 10.1097/APO.0000000000000469

Over the last decade, artificial intelligence (AI) has become ubiquitous; it is in our smartphones and homes (Siri and Alexa), our email (spam filters), our Netflix recommendations, and our ride-share apps. On a societal level, it is used for predictive policing and credit lending, and in medicine it promises to transform the shape and scope of health care.

But for all its seemingly endless benefits, AI comes with ethical challenges and potential for harm.1 Most AI algorithms are supervised, meaning training inputs and outputs are defined by humans. This makes them susceptible to the same socioeconomic, racial, and gender biases that shape our world. Unintended bias can enter at any stage: training, algorithm design, and implementation. There are many examples of AI software, from facial recognition to natural language processing, displaying encoded racism that was noticed only after commercial deployment.2

The explosion in health care AI research, specifically in machine learning (ML) and deep learning (DL), has definite clinical relevance in ophthalmology.3,4 In 2018, IDx-DR received approval from the US Food and Drug Administration (FDA) for its autonomous AI system for diabetic retinopathy (DR) screening; EyeArt's approval followed shortly thereafter. Both set impressively high standards for the validation of AI clinical decision support tools. However, as AI in ophthalmology becomes increasingly commercially available, we will have to contend with the same issues of unintended bias as other industries. Identifying and guarding against these consequences begins with adequate transparency.

To highlight theoretical concepts related to bias in ophthalmologic AI, we examine 2 use cases: DR screening using fundus photos and AI utilizing optical coherence tomography (OCT), a burgeoning area of research. In doing so, we raise questions regarding bias and propose the adoption of a reporting tool, model cards, to promote standardization and transparency.


As with all AI, ML/DL algorithms in ophthalmology are prone to reflect biases intrinsic to the datasets they are developed with. Training data establishes a concept known as “ground truth”—the basis of knowledge from which the algorithm learns. In supervised learning, images are annotated by human graders for input characteristics. For example, in the case of DR screening, humans often label disease features like neovascularization or exudates on fundus photos. Bias can be unintentionally introduced through unbalanced training data (eg, a lack of racially diverse images) and bias in human labeling (eg, if features in lightly or darkly pigmented fundus images are missed).
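The kind of imbalance described above can be surfaced with a simple audit of a training set's demographic composition before any model is trained. The sketch below is purely illustrative; the record format and field names are our own assumptions, not those of any real dataset.

```python
from collections import Counter

def demographic_composition(records, field="race"):
    """Return the fraction of training records in each demographic group."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Hypothetical training records for a DR-screening model.
training_set = [
    {"image": "img_001.png", "label": "referable_dr", "race": "White"},
    {"image": "img_002.png", "label": "no_dr", "race": "White"},
    {"image": "img_003.png", "label": "no_dr", "race": "White"},
    {"image": "img_004.png", "label": "referable_dr", "race": "Black"},
]

print(demographic_composition(training_set))
# {'White': 0.75, 'Black': 0.25}
```

A heavily skewed composition like this one is exactly the signal a model card could make visible to end users.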

Because they may be considered proprietary, training data characteristics are often not reported. Furthermore, the vast majority of public imaging databases in ophthalmology do not report demographic information; in a recent global review of publicly available datasets, only 20% published demographic information.5 Two widely used imaging databases are EyePACS, containing over 30,000 fundus images, and Messidor-2, containing nearly 2000. EyePACS was compiled from sites across the US, while Messidor-2 was compiled through a consortium of institutions predominantly based in France. Neither imaging bank publicizes demographic information.

As fundus pigmentation is known to vary with race,6 understanding the demographic composition of training data is important in defining the context of appropriate use. Using a DL system trained on lighter-skin fundus images, Burlina et al7 found that accuracy fell from 73% to 60.5% when the system was tested on simulated darker-skin images, highlighting the susceptibility of AI to produce inequitable outcomes with unbalanced data. These biases may then be overlooked if similarly unbalanced data are used for validation or clinical trial testing.
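Performance gaps of the kind Burlina et al report only become visible when accuracy is computed separately for each subgroup rather than pooled over the whole test set. A minimal sketch, using hypothetical labels and groups:

```python
def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy disaggregated by subgroup; pooled metrics can hide gaps."""
    per_group = {}
    for g in set(groups):
        pairs = [(t, p) for t, p, gr in zip(y_true, y_pred, groups) if gr == g]
        per_group[g] = sum(t == p for t, p in pairs) / len(pairs)
    return per_group

# Hypothetical test labels, predictions, and subgroup memberships.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["light", "light", "light", "light", "dark", "dark", "dark", "dark"]

print(accuracy_by_group(y_true, y_pred, groups))
# {'light': 1.0, 'dark': 0.25}
```

Here the pooled accuracy is 62.5%, a figure that conceals perfect performance in one subgroup and near-failure in the other; reporting such disaggregated metrics is one of the roles a model card can serve.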


Reference standards pose another avenue for the introduction of bias. In the case of DR screening, for which fundus reading centers serve as the reference standard, disease grading may have different implications for different races/ethnicities. The Los Angeles Latino Eye Study showed more severe and faster progression of DR in Hispanics.8 Could reading center-graded mild DR signal more advanced disease that requires closer follow-up among Hispanics? Should algorithms take into account that certain diseases, like DR, disproportionately affect Blacks and Hispanics in the US? Should this be reflected in clinical trial testing?


Issues of representation are particularly relevant to AI using OCT for screening and treatment-response prediction, as reports are based on normative data. Each machine produces analyses based on unique and proprietary reference databases. The Zeiss Cirrus normative database for retinal nerve fiber layer (RNFL) thickness and macular thickness analysis was developed from scans of 282 eyes: 43% of patients were Caucasian, 24% Asian, 18% Black, 12% Hispanic, 1% Native American, and 6% of multiple races.9 Heidelberg Spectralis' basic reference database was founded on values from Caucasians only; a racially diverse database requires the purchase of a premium module.10

Studies have shown differences in macular thickness in healthy eyes by race;11 mean foveal thickness was demonstrated to be 32 microns thicker in Caucasians than in Blacks, and 23 microns thicker in males than in females.11 Algorithms using average macular thickness risk incorrect classification when the normative data do not account for racial, ethnic, and/or sex variations. These implications are further compounded by research demonstrating poorer treatment responses to identical therapies among some races.12 The same is true for RNFL thickness; a recent study found that normal subjects had different thicknesses and cup-to-disc ratios by race.13 As OCT-based AI becomes commercially available, we must be cognizant of the potential for bias through unbalanced normative data.
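The risk posed by a single normative range can be made concrete. Suppose a device flags a thickness measurement as abnormal when it falls outside the reference mean ± 2 SD; if that reference comes from one group, a measurement that is normal for a group whose mean is roughly 32 microns lower may be flagged. All numbers below are illustrative and are not taken from any device's actual database:

```python
def flag_abnormal(thickness_um, ref_mean, ref_sd, n_sd=2.0):
    """Flag a measurement falling outside ref_mean +/- n_sd * ref_sd."""
    return abs(thickness_um - ref_mean) > n_sd * ref_sd

# Illustrative reference built from a single group (mean 270 um, SD 12 um).
single_group_mean, single_group_sd = 270.0, 12.0

# A healthy measurement from a group whose true mean is ~32 um lower.
measurement = 240.0

print(flag_abnormal(measurement, single_group_mean, single_group_sd))
# True -> a normal eye is flagged as abnormal by the unadjusted reference
print(flag_abnormal(measurement, single_group_mean - 32.0, single_group_sd))
# False -> with a group-appropriate reference, the same eye reads as normal
```

The same measurement is classified differently depending solely on which population defined "normal," which is why normative database composition belongs in a model card.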

Ultimately, these issues are emblematic of a larger problem—underrepresentation of minorities in clinical data in the US. Between 2015 and 2019, 78% of drug trial participants were White.14 Producing balanced, diverse datasets is difficult but essential in preventing the magnification of bias at scale.


The need for standardization of reporting is well recognized. The American Academy of Ophthalmology (AAO) AI task force and FDA have begun formulating guidelines and there have been calls for AI ocular imaging standardization in the Asia-Pacific, evidenced by the position statement and recommendation jointly issued by the ophthalmic societies in the region.15

As these regulations begin to take shape, we recommend the use of a specific tool to promote transparency and standardization in ophthalmologic AI. In 2018, Google's ethical AI team introduced the concept of the "model card".16 These cards, analogous to nutrition labels, provide basic qualitative and quantitative information in a straightforward format. We propose that alongside metrics already regularly reported in the medical literature, like intended use and performance measures, these cards should include training data information to better characterize what the model represents. In the case of proprietary models, basic demographic and clinical features should be considered the minimum allowable information, consistent with reporting recommendations for non-AI predictive models in medicine. Finally, ophthalmologic model cards should be tailored to include ethical considerations and assumptions as they pertain specifically to reference standards and normative databases. We provide a suggested card in Figure 1.

Figure 1:
Suggested model card format for ophthalmology, adapted from Mitchell et al16 and Google.
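A model card is, at bottom, structured metadata, and so it can be represented as a simple machine-readable record. The sketch below follows the spirit of Mitchell et al's template16 plus the ophthalmology-specific additions proposed here; the field names and example values are our own illustration, not an established standard:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Minimal model card: general reporting fields plus ophthalmic additions."""
    model_name: str
    intended_use: str
    performance: dict          # e.g., sensitivity/specificity, ideally by subgroup
    training_data: dict        # demographic and clinical composition
    reference_standard: str    # e.g., reading-center grading protocol
    normative_database: str    # OCT reference population, if applicable
    ethical_considerations: list = field(default_factory=list)

# A hypothetical card for a DR-screening model.
card = ModelCard(
    model_name="Hypothetical DR screener v1",
    intended_use="Screening for referable diabetic retinopathy in adults",
    performance={"sensitivity": {"overall": 0.87}, "specificity": {"overall": 0.90}},
    training_data={"race": {"White": 0.75, "Black": 0.25}, "n_images": 30000},
    reference_standard="Fundus photo reading center grading",
    normative_database="Not applicable (fundus photography)",
    ethical_considerations=["Training set underrepresents darker fundi"],
)
print(card.model_name)
```

Because every field is explicit and typed, a missing entry (say, an empty training-data composition) is immediately visible to reviewers and regulators rather than silently omitted.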

Though they have yet to be adopted in medicine, model cards represent an opportunity to help stakeholders (developers, providers, patients, and policymakers) understand and judge models. As AI literature continues to grow, this tool will be increasingly useful for comparing models. AI has the capacity to augment our impact as clinicians and help us improve quality of care for all, but standards remain nebulous. We hope this tool helps us, as clinicians, improve our ability to critically evaluate fairness in AI.


1. Abdullah YI, Schuman JS, Shabsigh R, Al-Aswad LA. Ethics of artificial intelligence in medicine and ophthalmology. Asia Pac J Ophthalmol (Phila) 2021; 10:289–298. doi:10.1097/APO.0000000000000397.
2. Hardesty L. Study finds gender and skin-type bias in commercial artificial-intelligence systems. MIT News. Accessed 2021.
3. Ran A, Cheung CY. Deep learning-based optical coherence tomography and optical coherence tomography angiography image analysis: an updated summary. Asia Pac J Ophthalmol (Phila) 2021; 10:253–260. doi:10.1097/APO.0000000000000405.
4. Lee EB, Wang SY, Chang RT. Interpreting deep learning studies in glaucoma: unresolved challenges. Asia Pac J Ophthalmol (Phila) 2021; 10:261–267. doi:10.1097/APO.0000000000000395.
5. Khan SM, Liu X, Nath S, et al. A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability. Lancet Digit Health 2021; 3:e51–e66.
6. Greenberg JP, Duncker T, Woods RL, et al. Quantitative fundus autofluorescence in healthy eyes. Invest Ophthalmol Vis Sci 2013; 54:5684.
7. Burlina P, Joshi N, Paul W, et al. Addressing artificial intelligence bias in retinal disease diagnostics. Transl Vis Sci Technol 2021; 10:13.
8. Choudhury F, Varma R, McKean-Cowdin R, et al. Risk factors for four-year incidence and progression of age-related macular degeneration: The Los Angeles Latino Eye Study. Am J Ophthalmol 2011; 152:385–395.
9. US Food and Drug Administration. Cirrus HD-OCT with Retinal Nerve Fiber Layer (RNFL), Macular, Optic Nerve Head and Ganglion Cell Normative Databases. 510(k) notification. FDA; 2012. Accessed 2021.
10. US Food and Drug Administration. SPECTRALIS HRA+OCT and variants with High Magnification Module. 510(k) notification. FDA; 2018. Accessed 2021.
11. Kelty PJ, Payne JF, Trivedi RH, et al. Macular thickness assessment in healthy eyes based on ethnicity using stratus OCT optical coherence tomography. Invest Ophthalmol Vis Sci 2008; 49:2668.
12. Osathanugrah P, Sanjiv N, Siegel NH, Ness S, Chen X, Subramanian ML. The impact of race on short-term treatment response to bevacizumab in diabetic macular edema. Am J Ophthalmol 2021; 222:310–317.
13. Nousome D, McKean-Cowdin R, Richter GM, et al. Retinal nerve fiber layer thickness in healthy eyes of Black, Chinese, and Latino Americans: a population-based multiethnic study. Ophthalmology 2021; 128:1005–1015.
14. US Food and Drug Administration. 2015-2019 Drug Trials Snapshot Summary Report: 5-Year Summary and Analysis of Clinical Trial Participants and Demographics. FDA; 2020. Accessed 2021.
15. Ting DSW, Wong TY, Park KH, et al. Ocular imaging standardization for artificial intelligence applications in ophthalmology: the joint position statement and recommendations from the Asia-Pacific Academy of Ophthalmology and the Asia-Pacific Ocular Imaging Society. Asia Pac J Ophthalmol (Phila) 2021; 10:348–349. doi:10.1097/APO.0000000000000421.
16. Mitchell M, Wu S, Zaldivar A, et al. Model Cards for Model Reporting. In: Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19). New York, NY, USA: Association for Computing Machinery; 2019:220–229.
Copyright © 2021 Asia-Pacific Academy of Ophthalmology. Published by Wolters Kluwer Health, Inc. on behalf of the Asia-Pacific Academy of Ophthalmology.