Review Article

Economic Evaluations of Artificial Intelligence in Ophthalmology

Ruamviboonsuk, Paisan MD; Chantra, Somporn MD; Seresirikachorn, Kasem MD; Ruamviboonsuk, Varis MD; Sangroongruangsri, Sermsiri PhD

Asia-Pacific Journal of Ophthalmology: May-June 2021 - Volume 10 - Issue 3 - p 307-316
doi: 10.1097/APO.0000000000000403

WHY HEALTH ECONOMICS?

Data from the Global Health Expenditure Database of the World Health Organization show that global expenditure on health continued to rise, from $7.6 trillion in 2016 to $7.8 trillion in 2017 and $8.3 trillion in 2018. The 2018 figure corresponds to $1,110.81 per capita, or 9.85% of the global gross domestic product.1 Data from the Organisation for Economic Co-operation and Development (OECD), an international organization that formulates policies fostering prosperity, equality, opportunity, and well-being for all, likewise show that global health expenditure grew by an average of 2.0% in 2017, 2.5% in 2018, and 2.4% in 2019.2 Although the OECD projected that growth in health spending per capita over the next 15 years would be slower than its historical rate, it would still outpace growth in the economy.3

In Asia, average health spending per capita, based on data from 41 countries, also increased, from $627.80 in 2016 to $674.20 in 2017 and $712 in 2018.4 The gap in health spending among countries was very wide, with the highest in Japan at $4,266.59 and the lowest in Bangladesh at $41.91.4 Under the framework of the System of Health Accounts, which incorporates patient characteristics, the OECD estimated that diseases of the eye and adnexa constituted about 1.4% to 4.6% of overall health expenditure.5

The coronavirus disease 2019 (COVID-19) pandemic has further exacerbated global health expenditure. The International Monetary Fund estimated that, in a benchmark scenario without capacity constraints and without social distancing or quarantine measures, the cost of responding to the pandemic would reach over $15 trillion globally. With effective social distancing or quarantine measures, the cost could be reduced but would still reach $231 billion.6 This estimate did not yet account for the cost of inadequate treatment of other diseases affected by the COVID-19 pandemic.

Artificial intelligence (AI) has emerged as a disruptive technology across many industries, including health care. It has the potential to play important roles across a spectrum of application domains, including diagnostics, therapeutics, population health management, administration, and regulation. Moreover, it is expected to drive not only significant improvements in medical quality but also notable cost savings.7

Over the past years, we have witnessed rapid growth in studies of AI in the medical literature. Although most of these studies, drawn from various fields of medicine including ophthalmology, have demonstrated convincing efficacy, clinical deployment in the real world remains rare. The adoption of AI in health care still poses many challenges and barriers to patients, health providers, health organizations, and policymakers.8 Thus, it is critical to gain more information about AI in health care. Although the existing data on the efficacy of AI are abundant and of high quality, the data on the economics of AI, which many policymakers weigh when deciding whether to adopt a new technology, remain fragmented and scarce.9

We performed a comprehensive publication search of the following databases: PubMed (Medline), Ovid, Scopus, ProQuest, Google, Web of Science, and Embase. We searched for articles from inception to January 31, 2021, combining the following keywords and Medical Subject Headings terms with "OR/AND" operators: "ophthalmology", "diabetic retinopathy", "age-related macular degeneration", "glaucoma", "cataract", "fundus photograph(s)/image(s)", "automated screening", "health care", "machine learning", "artificial intelligence", "deep learning", "economic evaluation(s)", "cost", "cost-minimization analysis", "cost-effectiveness analysis", "cost-utility analysis", and "cost-benefit analysis". A total of 212 articles were identified across all databases. After exclusion of duplicate and irrelevant articles, 57 potentially relevant articles were retrieved for more detailed evaluation. The abstracts were then reviewed against our objectives. As a result, 7 articles were included in this review in detail; they address the economic impact of AI in ophthalmic conditions, mostly diabetic retinopathy screening. Of these, 5 articles are full economic evaluations and are compared in Table 1.
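For reproducibility, the keyword groups above can be combined into a single boolean search string. The sketch below shows one plausible PubMed-style form; the grouping (OR within each concept, AND across concepts) is our assumption, since the review lists the keywords but not the exact query structure.

```python
# One plausible rendering of the combined search string (assumed grouping:
# OR within each concept group, AND across groups).
query = (
    '("artificial intelligence" OR "machine learning" OR "deep learning" '
    'OR "automated screening") '
    'AND ("ophthalmology" OR "diabetic retinopathy" OR "age-related macular degeneration" '
    'OR "glaucoma" OR "cataract" OR "fundus photograph" OR "fundus image" OR "health care") '
    'AND ("economic evaluation" OR "cost" OR "cost-minimization analysis" '
    'OR "cost-effectiveness analysis" OR "cost-utility analysis" OR "cost-benefit analysis")'
)
print(query)
```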

Table 1 - Summary of Published Economic Evaluation of Artificial Intelligence for Screening of Diabetic Retinopathy
Method | Author, Year, Country, No. of Patients | Perspective | Models and Comparators | Outcome Measures and Economic Results | Strengths or Notable Remarks | Weaknesses
CMA | Xie et al,39 2020, Singapore, 39,006 patients | Health care provider | DT; referral outcomes were similar among human grading, fully automated DL, and semi-automated DL models. | The cost of the least expensive model is the outcome measure. The semi-automated model was the least expensive at $62 per patient per year. | A deterministic (one-way) sensitivity analysis, which measures the effect of uncertainty in critical parameters on the cost estimates, was conducted. Specificity was the most important factor affecting the cost differentials, possibly because a higher rate of false-positive cases drives up costs through a greater number of diagnostic consultations. | Generalizability to lower-income countries may be doubtful. Diabetic macular edema was excluded.
CEA | Scotland et al,24 2007, UK, 6,722 patients | Health care provider | DT; semi-automated ML was compared with manual grading. | The additional cost per additional referable case detected (manual vs semi-automated) was £4,088 (US $5,709); the additional cost per additional appropriate screening outcome was £1,990 (US $2,779). | This may be the earliest economic evaluation of AI for DR screening for implementation in a national screening program. | Costs and consequences were considered over a one-year time horizon. DR was graded only as disease/no disease. The ML was a conventional algorithm using human-engineered feature extraction, with lower performance than DL.
CEA | Tufail et al,42 2017, UK, 20,258 patients | Health care provider | DT; semi-automated ML strategy 1 (ML on top of the first human graders) was compared with semi-automated ML strategy 2 (ML replacing the first human graders) and with conventional manual grading. | The semi-automated ML models (Retmarker and EyeArt) had sufficient specificity to be cost-effective compared with manual grading; the ICERs were $15.36 and $4.43, respectively, for strategy 1, and $18.69 and $7.14, respectively, for strategy 2. | Each commercially available ML model was compared with human grading. | The study was not designed to assess the accuracy of human grading. Both strategies were semi-automated; no fully automated AI screening was evaluated.
CEA | Wolf et al,43 2020, USA, simulated model (no patient number available) | Patient | DT; autonomous DL screening before ECP examination was compared with ECP screening alone. | At the baseline 20% screening rate, autonomous AI resulted in a higher mean patient payment ($8.52 for T1D and $10.85 for T2D) than ECP screening ($7.91 for T1D and $8.20 for T2D). | This strategy can be viewed as another semi-automated DL screening because referrals were additionally examined by an ECP to determine the presence of DR. Notably, AI was cost-saving when the screening rate was 23% or higher. | Costs and consequences were considered over a one-year time horizon. The analysis did not address system, societal, or other third-party costs. The results may apply only to pediatric patients.
CUA | Xie et al,44 2019, Singapore, patients from another study | Health care provider | DT/Markov model; semi-automated DL was compared with manual grading. | The primary outcome measures were the total cost incurred by the health care system and the total QALYs gained per patient. From a health system perspective, semi-automated DL resulted in a lifetime cost saving of SGD 135 (US $100) per patient while maintaining comparable QALYs gained. | The study showed that semi-automated DL was cost-saving over a patient's lifetime horizon. | The economic evaluation model lacked detail.
AI indicates artificial intelligence; CEA, cost-effectiveness analysis; CMA, cost-minimization analysis; CUA, cost-utility analysis; DL, deep learning; DR, diabetic retinopathy; DT, Decision Tree (model); ECP, eye care professional; ICER, incremental cost-effectiveness ratio; ML, machine learning; QALYs, quality-adjusted life-years; T1D, type 1 diabetes; T2D, type 2 diabetes.

ECONOMICS OF AI IN HEALTH CARE: CURRENT SITUATION

In a recent systematic review on the economic impact of AI in health care, Wolff et al reported a shortage of publications on this aspect in both quantity and quality.9 In terms of quantity, searching the PubMed database with terms such as artificial intelligence, machine learning, cost, effectiveness, economic impact, and cost-saving yielded only 66 publications as of mid-2019. By comparison, a search of the Web of Science database for all kinds of publications on AI-related health care identified 5,235 hits as of late 2019.10

As for quality, Wolff et al could include only 6 publications in their review after applying criteria derived from classical health care effectiveness studies11 and digital health assessments12: economic impact assessment, description of the cost-effectiveness of the AI solution, hypothesis formulation, cost-effectiveness perspective, consideration of cost alternatives, benefit today, and verification of the base case. These publications came from varied medical fields; examples include the cost-effectiveness of intervention strategies guided by a machine learning model to reduce total joint replacement readmissions13 and the cost savings of a machine learning model predicting the risk of 30-day readmission among patients with heart failure.14 Notably, none of the publications contained a complete economic evaluation: the initial investment and operational costs of AI infrastructure and AI service delivery were not included. In addition, there was inadequate benchmarking against the costs of other alternatives that might achieve an impact similar to AI.

ECONOMICS OF AI IN OPHTHALMOLOGY: WHERE ARE THE GAPS?

Studies on the economics of AI in ophthalmology follow a trend similar to studies on the economics of AI in other areas of health care. Even without considering AI, economic studies constitute only a small fraction of the ophthalmic literature, although they are gaining importance.15 The number of studies on the economics of AI in ophthalmology is smaller still, even though ophthalmology has been one of the leading fields of AI application. Figure 1 shows an overview of economic evaluation models in medicine and where AI could fit in. There are 2 important components: (1) the costs of AI, and (2) the consequences of AI applications.16 These components are compared to provide evidence on the value for money of the applications, from a provider or societal perspective. Economic evaluations of AI in ophthalmology can also incorporate changes in quality of life (eg, blindness prevented), which is advantageous from the standpoint of value-based medicine.17

FIGURE 1: Overview of economic evaluation of artificial intelligence in medicine.

COSTS OF AI

Costs in health economics can be viewed from various perspectives, including those of the provider and of society. The societal perspective, which is broader in scope, is considered the gold standard.18 It includes the totality of costs, such as costs sustained by patients, the government, and health care providers, as well as costs incurred from the use of the intervention. However, because costs can be difficult to calculate from a societal perspective, costs in health economics are often analyzed from the provider's perspective.

In the regulatory framework, AI software has been viewed as a medical device.19 This analogy may also be applied to AI in economic terms. In applications for disease detection or monitoring, the cost of the medical device is usually the main component of costs from the provider perspective.18 Nevertheless, AI may not be the main cost of an application when it is software incorporated into hardware. For instance, most AI in ophthalmology is developed for use with particular hardware, such as fundus cameras,20 optical coherence tomography devices,21 and slitlamp biomicroscopes.22 In these cases, the main initial cost comes from the hardware. Notably, the cost of the AI software alone may be counted when it is compared with human experts in image interpretation, since the same hardware is used for both modalities.

It is generally recognized that the development of AI is expensive.23 However, the costs of AI in health care, or in ophthalmology specifically, have not yet been standardized or clearly determined in the literature. The costs of AI should include both short- and long-term costs.7 Short-term costs may include the investment in research, development, and validation, as well as the labor invested in data collection, preparation, sorting, and labeling.23 From the perspective of service provision or delivery, however, it may be argued that only the use of the AI software is paid for, with no need to cover the cost of technology and software development. Service providers would still need to cover the cost of integrating the AI system into their existing health care system.

The costs of this initial deployment, including hardware, information technology infrastructure, and human operators, should also be counted in the initial cost. Maintenance costs may form a major part of the long-term use of AI. These costs arise not only from handling increasing amounts of patient data, but also from updating software algorithms and ensuring hardware operability in the long run.7

The study by Scotland et al24 may be the first to economically evaluate AI for diabetic retinopathy (DR) screening and to describe the associated costs in detail. The cost per patient for the automated system (AI) was calculated under the assumption that the software would run on an existing central server covering all of Scotland. The implementation costs comprised further software development (12 months of a computer analyst's time, £31,542 [US $44,045]), integration with the existing system (£60,000 [US $83,784]), and ongoing annual support and maintenance (£12,000 [US $16,762]). The non-recurrent costs were annuitized over 10 years, the assumed useful lifespan of the AI software. The total annual equivalent implementation cost (£23,007, approximately US $32,127) was then divided by the number of patients screened annually (160,000), giving a cost per patient of £0.14 (US $0.19). Although these costs date from 2005 to 2006, they may serve as a model for itemizing the costs of AI.
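The per-patient figure can be reproduced by annuitizing the non-recurrent costs. Below is a minimal sketch; the 3.5% discount rate is our assumption (the standard UK rate for the period), since the study reports the resulting annual equivalent cost rather than the rate itself.

```python
def annuity_factor(rate: float, years: int) -> float:
    """Present value of an annuity of 1 per year for `years` years."""
    return (1 - (1 + rate) ** -years) / rate

# Implementation costs reported by Scotland et al (2005-06 prices, GBP)
software_development = 31_542   # 12 months of a computer analyst's time
system_integration = 60_000     # integration with the existing system
annual_support = 12_000         # recurring support and maintenance

# Annuitize non-recurrent costs over the assumed 10-year software lifespan.
# The 3.5% discount rate is an assumption, not stated in the text above.
annual_equivalent = (software_development + system_integration) / annuity_factor(0.035, 10)
total_annual_cost = annual_equivalent + annual_support   # ~GBP 23,007

patients_per_year = 160_000
cost_per_patient = total_annual_cost / patients_per_year  # ~GBP 0.14
print(f"Annual equivalent cost: £{total_annual_cost:,.0f}; per patient: £{cost_per_patient:.2f}")
```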

Another example of the costing of AI comes from IDx-DR, the first AI diagnostic device approved by the US Food and Drug Administration (FDA), in 2018, for DR screening.25 The clearance of IDx-DR was unusual in that it went through the de novo pathway,7 which is intended for novel medical devices for which general controls provide reasonable assurance of safety and effectiveness.26 Notably, IDx-DR underwent the Automatic Class III (De Novo) premarket pathway and achieved Breakthrough Device designation.7 These processes demonstrate how seriously the FDA regarded automated DR screening. Nevertheless, IDx-DR has not yet been widely adopted, and a crucial factor may be the economic viability of this AI.

A recent article examined the costs of IDx-DR from a provider perspective in the US.27 Three purchase plans were offered to prospective owners of this AI software and its auxiliary equipment: (1) a one-time purchase for approximately $13,000, (2) a capital leasing plan, and (3) a no-capital-expenditure plan requiring a minimum number of quarterly examinations per device. IDx-DR also required a fixed charge of $25 per patient screened, regardless of the purchase plan. Using the current US Medicare reimbursement amount and considering this fixed charge alone, the marginal revenue of IDx-DR would be −$1.18 per patient. The deficit would be even greater, at −$17.50 per patient, in areas where only DR-positive images could be reimbursed. The authors calculated that, at the current fixed charge of $25 per patient, a reimbursement of at least $79.36 per positive patient would be required to break even if only DR-positive images could be reimbursed. A deficit could persist even if private payers reimbursed more than $25 per patient. Notably, even if all images could be reimbursed, the marginal revenue per patient would remain too low to incentivize primary care providers to adopt this technology widely.
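The arithmetic behind these figures can be reconstructed as follows; the Medicare reimbursement (about $23.82 per patient) and the share of reimbursable DR-positive images (about 31.5%) are back-calculated from the reported margins rather than stated in the article, so they should be read as assumptions.

```python
fixed_charge = 25.00    # IDx-DR per-patient fee (reported)
reimbursement = 23.82   # assumption, back-calculated: 23.82 - 25.00 = -1.18
positive_rate = 0.315   # assumption, back-calculated from the -$17.50 margin

# Scenario 1: every screened image is reimbursed
margin_all = reimbursement - fixed_charge                       # ~ -$1.18 per patient

# Scenario 2: only DR-positive images are reimbursed
margin_pos_only = positive_rate * reimbursement - fixed_charge  # ~ -$17.50 per patient

# Break-even reimbursement per positive patient in scenario 2
break_even = fixed_charge / positive_rate                       # ~ $79.36
print(f"{margin_all:.2f} {margin_pos_only:.2f} {break_even:.2f}")
```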

It is not known whether the purchase options, including the per-patient charge set by IDx-DR, will be benchmarked as the reference standard for the costs of AI for DR screening, since more AI screening devices may become available in the future. For instance, although it took about 2 years, a second AI device for DR screening, EyeArt, was finally cleared by the US FDA in mid-2020.28 It remains to be seen whether the supply of and demand for AI in ophthalmology will follow the basic law of supply and demand in economics.29

PERFORMANCE OF AI APPLICATIONS AND ECONOMIC EVALUATIONS

The following are examples of major applications of AI in ophthalmology: (1) detection or screening of diseases, including DR,20 age-related macular degeneration (AMD),30 glaucoma,31 cataract,32 keratoconus,33 and strabismus,34 (2) monitoring of disease activity, such as monitoring retinal fluid on OCT scans,35 and (3) prediction of disease progression, such as predicting conversion from early to late AMD36 and predicting the progression of glaucoma.37

Few of these applications have been evaluated in economic terms. Since the evaluation of consequences in health economics is a comparison of the costs of alternative medical interventions, such an evaluation for AI in health care is essentially a comparison between the costs of AI applications and the costs of alternative interventions, whether human interventions, non-AI interventions, or alternative AI applications. Only a few economic evaluations of AI in ophthalmology have been published, and most compared AI with humans. As shown in Figure 1, there are 4 common methods of economic evaluation, depending on the outcomes.

Cost-Minimization Analysis

This method of economic evaluation may be conducted when there is evidence that 2 alternative interventions of interest are equally effective, so that their costs can be compared directly.38 The intervention with the lower cost can then be chosen as cost-saving. For AI, cost-minimization analysis (CMA) therefore requires evidence of equal effectiveness between AI and human grading. Xie et al39 used CMA to determine which strategy for using an AI model in DR screening saves the most cost, assuming, based on their previous study, that the number of patients referred, and therefore treated, for DR did not differ statistically across 3 screening strategies: (1) a strategy in which conventionally trained human graders were replaced by AI for grading retinal images to detect referable cases (fully automated strategy), (2) a strategy in which trained human graders graded the retinal images for the same purpose (conventional strategy), and (3) a strategy in which AI graded all retinal images as in the first strategy, but trained human graders then re-graded all images that AI had classified as referable before the patients were actually referred (semi-automated strategy).

Although the authors did not estimate the exact cost of AI, they included it in the costs of information technology. They also showed that all strategies were equally effective, with a sensitivity of 89.9%. When the costs of the strategies were compared, the annual cost of the semi-automated strategy was the lowest, at $62 per patient screened, followed by the fully automated strategy at $66; the conventional strategy of human grading was the most expensive at $77. The authors estimated that Singapore could save $15 million by 2050 if the semi-automated strategy were deployed in the national DR screening program.

The finding that the most cost-saving strategy was the semi-automated model is interesting, since this model, which deploys both AI and human graders, requires additional grading time and manpower on top of AI. These higher grading costs were offset by lower overall consultation costs upon referral, because the addition of trained human graders, who filtered out false-positive (FP) cases from AI in the second step, reduced the number of FP referrals. The lower FP rate is reflected in the higher specificity of the semi-automated strategy compared with the fully automated strategy (99.6% vs 81.8%). Notably, the sensitivity analysis of this CMA confirmed that, of all factors affecting costs, the specificity of the model was the most important.
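The role of specificity can be illustrated with a simple per-patient cost decomposition. In the sketch below, only the shared sensitivity and the two specificities come from the study; the prevalence and unit costs are hypothetical.

```python
def expected_cost(prev, sens, spec, c_grading, c_consult):
    """Expected cost per screened patient: grading cost plus downstream
    consultation costs for all screen-positives (true and false)."""
    true_pos = prev * sens
    false_pos = (1 - prev) * (1 - spec)
    return c_grading + (true_pos + false_pos) * c_consult

prev = 0.10        # hypothetical prevalence of referable DR
c_consult = 100.0  # hypothetical specialist consultation cost per referral

# Operating points reported by Xie et al: both strategies share sensitivity 89.9%
fully_auto = expected_cost(prev, 0.899, 0.818, c_grading=5.0, c_consult=c_consult)
semi_auto = expected_cost(prev, 0.899, 0.996, c_grading=8.0, c_consult=c_consult)
print(f"fully automated: ${fully_auto:.2f}, semi-automated: ${semi_auto:.2f}")
# Despite a higher (hypothetical) grading cost from human re-grading, the
# semi-automated arm is cheaper because far fewer false positives are referred.
```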

Since this CMA ultimately compared only the costs of AI and human labor, it is essential to assess how these costs differ among countries. It remains to be seen whether similar findings can be replicated in countries other than Singapore, particularly those where labor costs are low relative to technology costs.40

Cost-Effectiveness Analysis

Generally, the cost-effectiveness analysis (CEA) method is chosen when we want to compare the costs incurred for disease-specific outcomes (effectiveness) between 2 interventions.18 The number of such analyses in health care and ophthalmology is growing, but there are still only a few studies on AI. Since comparing the cost-effectiveness ratios of 2 alternatives does not capture the extra cost of the new alternative, the incremental cost-effectiveness ratio (ICER), which is the difference in costs between the conventional and alternative interventions divided by the difference in their outcomes, is preferred. Broadly, the outcomes compared in CEA are disease-specific.18 For instance, the costs of different drugs aimed at reducing intraocular pressure, a glaucoma-specific outcome, may be compared. This implies that the ability of CEA to compare across a broad range of interventions is limited41 unless the outcome measured is the cost per quality-adjusted life-year (QALY) gained, which is arguably not disease-specific and is more commonly used in cost-utility analysis.
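Formally, writing C for costs and E for effectiveness, with subscript 1 for the new intervention and subscript 0 for the comparator:

```latex
\mathrm{ICER} = \frac{C_1 - C_0}{E_1 - E_0}
```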

The study by Scotland et al,24 conducted in the UK, may be the first CEA of AI, in the form of conventional machine learning, for DR screening. The authors compared manual versus automated grading of retinal images within the 3-level grading workflow of the national DR screening program, whose outcomes were (1) recall for re-screening in 6 months, (2) recall in 12 months, or (3) referral to an ophthalmologist. Manual grading was replaced by AI only at the first grading level. Based on a population of 160,000 diabetic patients in the country, manual grading cost £201,600 (US $281,514) more than AI, but yielded 101 more appropriate screening outcomes overall and 50 more referrals detected.

When all appropriate outcomes were considered, the additional cost per additional outcome was £1,990 (US $2,779); when only referrals to ophthalmologists were considered, the additional cost per additional referral was £4,088 (US $5,709). When the differences in costs and referral outcomes were plotted on a cost-effectiveness plane (CEP), 18% of the plots lay in quadrant SE (II), where AI was the dominant strategy, and the remaining 82% lay in quadrant SW (III), where AI was cheaper but less effective (see Fig. 2).

FIGURE 2: Cost-effectiveness plane (CEP). An intervention falling in quadrant SE (II) is "dominant", being both more effective and less costly, and is typically accepted; an intervention falling in quadrant NW (IV) is "dominated", being both more costly and less effective, and is typically rejected. In the study by Scotland et al,24 18% of the plots of cost differences against outcome differences lay in the area of point b; the remaining 82% lay in the area of point d. In the study by Tufail et al,42 the calculated incremental cost-effectiveness ratios were in quadrant SW (III), presumably around the area of point d.
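The quadrant logic of the CEP can be captured in a small helper function; a minimal sketch following the figure's labeling:

```python
def cep_quadrant(delta_cost: float, delta_effect: float) -> str:
    """Classify an intervention on the cost-effectiveness plane, where
    delta_cost and delta_effect are (new intervention minus comparator)."""
    if delta_effect > 0 and delta_cost < 0:
        return "SE (II): dominant, more effective and less costly; accept"
    if delta_effect < 0 and delta_cost > 0:
        return "NW (IV): dominated, less effective and more costly; reject"
    if delta_effect > 0 and delta_cost > 0:
        return "NE (I): trade-off; compare the ICER with willingness-to-pay"
    return "SW (III): trade-off; cheaper but less effective"

# Scotland et al: relative to manual grading, AI was cheaper (-£201,600)
# but yielded 101 fewer appropriate outcomes in 82% of simulations.
print(cep_quadrant(delta_cost=-201_600, delta_effect=-101))
```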

In another example of a CEA of AI, Tufail et al compared 2 machine learning (ML) models, EyeArt and RetMarker, for DR screening based on screening outcomes and related costs in a cohort of 20,258 diabetic patients in the UK.42 The appropriate outcome for this CEA was the proportion of patients with potentially sight-threatening DR (STDR) missed by either EyeArt or RetMarker. Of the 2,844 cases of STDR, 176 (6%) were missed by EyeArt and 428 (15%) by RetMarker. However, cases of proliferative diabetic retinopathy (PDR) and cases of maculopathy with mild retinopathy were never missed by EyeArt and were missed in only 1.4% of cases by RetMarker.

The authors also explored 2 possible strategies for deploying these 2 ML models in a clinical DR screening pathway containing 3 levels of manual grading. The first strategy replaced the first tier of human graders with ML; the second deployed ML as a filter on top of the first tier of human graders. The authors found no difference in effectiveness between the strategies when the same ML model was used. However, when either ML model replaced the first level of manual grading (leaving 2 levels of manual grading, strategy 1), the cost reduction was greater than when it was used as a filter (with all 3 levels of manual grading in place, strategy 2). For RetMarker, the cost reduction per appropriate outcome missed was $18.69 for strategy 1 and $15.36 for strategy 2; for EyeArt, it was $7.14 for strategy 1 and $4.43 for strategy 2. These results may echo the finding in the CMA by Xie et al39 that labor costs play a significant role in DR screening even when AI is deployed. Figure 3 compares the DR screening workflows of these 2 studies.

FIGURE 3: Proposed DR screening workflows using artificial intelligence (AI) and manual grading, a comparison between 2 studies. A, B, and C are from Xie et al,39 while D, E, and F are from Tufail et al.42 A and D are manual grading. C is standalone AI grading. B, E, and F are hybrid AI and manual grading, with AI situated at different points in the workflow.

When either ML model was compared with manual grading, the authors found that both models were less costly but less effective under both deployment strategies. The calculated ICERs lay in quadrant SW (III) of the CEP, similar to the CEA of AI for DR screening by Scotland et al24 mentioned previously. The authors argued that both ML models should be judged on their clinical acceptability, given that they rarely missed cases of PDR or maculopathy with mild retinopathy, as well as on their cost-effectiveness. Although there were 2 ML models in the study, the authors did not conduct a CEA between them.

A recent study offered a model for a CEA of autonomous deep learning (DL), a more recent form of AI, for DR screening of pediatric patients with diabetes in the US.43 This DL was delivered at the point of care without additional human grading. The study used a decision tree to compare the detection of true DR cases between AI screening and dilated examination by an eye care professional (ECP). In summary, patients entered the simulated model with the options of autonomous AI screening or ECP screening, or of forgoing screening in either arm. If AI screening was positive or undiagnosable, the patient was referred to an ECP. Those who underwent ECP examination could have positive results, which could be true-positive or false-positive, or negative results, which could be true-negative or false-negative. Those who opted out of examination in either arm could have false-negative or true-negative results.

The payoffs in this decision tree focused on true-positive results and on the out-of-pocket costs paid by patients and families for the examination and treatment of DR in the case of a positive ECP examination. Notably, there was no base cost for AI, whereas the base cost for ECP examination was $35; the cost of AI screening was incurred as extra ECP visits when the AI result was positive or undiagnosable. The sensitivity of autonomous AI was 87%, versus 33% to 34% for ECP examination; the specificity of autonomous AI was 91%, versus 95% for ECP examination. The expected true-positive proportions for AI were 0.03 for type 1 diabetes (T1D) and 0.04 for type 2 diabetes (T2D), compared with 0.006 (T1D) and 0.01 (T2D) for ECP examination.
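A simplified version of this payoff calculation is sketched below, assuming a hypothetical DR prevalence. It keeps only the attend-and-detect branch and ignores the undiagnosable and referral branches of the authors' full model, so it illustrates the mechanics rather than reproducing the reported proportions.

```python
def expected_true_positives(adherence, prevalence, sensitivity):
    """Expected true-positive proportion in a one-shot screening decision tree:
    a patient must attend screening, have DR, and be correctly detected."""
    return adherence * prevalence * sensitivity

adherence = 0.20   # base-case screening rate (reported)
prevalence = 0.17  # hypothetical DR prevalence, for illustration only

tp_ai = expected_true_positives(adherence, prevalence, sensitivity=0.87)
tp_ecp = expected_true_positives(adherence, prevalence, sensitivity=0.335)
print(f"AI: {tp_ai:.3f}, ECP: {tp_ecp:.3f}")
# AI's higher sensitivity yields roughly 2.6x more true positives per
# screened cohort under these simplified assumptions.
```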

Under the base-case scenario of 20% adherence to DR screening, the use of AI was estimated to result in a higher mean patient payment ($8.52 for T1D and $10.85 for T2D) than ECP screening ($7.91 for T1D and $8.20 for T2D). The ICERs for T1D and T2D were $31 and $95, respectively, per additional case of DR identified compared with ECP screening. However, when adherence to screening rose to 23% or greater, AI screening became cost-saving relative to ECP screening.

These 3 CEA models of AI for DR screening, all conducted in high-resource settings, implied that, under certain conditions, semi-automated ML could be cost-saving from a health care provider perspective, and that fully automated DL could be cost-saving for screening pediatric patients from a patient perspective. Nevertheless, these economic evaluations shared a limitation: patients with diabetes were not analyzed over a lifetime horizon. That type of analysis is usually conducted using cost-utility analysis.

Cost-Utility Analysis

The cost-utility analysis (CUA) method is similar to CEA but has additional advantages, such as the ability to compare interventions across the board in the same league table using a single outcome, the cost per QALY gained. It is indicated when: (1) quality of life is an important effectiveness outcome, (2) the intervention affects both morbidity and mortality, (3) the intervention has a broad range of potential effectiveness outcomes but a single general outcome is preferable for comparison, and (4) the intervention has already been compared with other interventions in previous CUAs.41 Screening for DR is an example of an ophthalmic intervention that fits these indications well.

In CUA, a hybrid decision tree and Markov model is commonly used to evaluate the costs and consequences of interventions over patients' lifetime horizon. Decision tree models can graphically represent the consequences and costs of one intervention compared with another, accounting for the probabilities of each possible outcome,18 such as the true/false-positive and true/false-negative screening results of AI described under the CEA heading. In the real world, false-positive cases increase unnecessary costs, whereas false-negative cases increase costs through delayed recognition and treatment.

A decision tree model is a cross-sectional classification of each patient into a "health state". A Markov model, by contrast, analyzes cost-effectiveness by cycling patients through health states over a period of time. Patients cycle through predefined health states relevant to the disease being evaluated, and costs and effects accumulate according to the states they occupy and evolve into.41 Patients in the health states of a decision tree model can then be cycled through a Markov model to evaluate the costs and consequences of interventions over their lifetime.
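A minimal Markov cohort sketch is shown below; the health states, transition probabilities, per-state costs, and utility weights are all hypothetical and are not taken from the studies reviewed.

```python
import numpy as np

# Hypothetical annual transition matrix over DR health states (rows sum to 1):
# states: no_DR, mild_DR, severe_DR, blind, dead
P = np.array([
    [0.90, 0.07, 0.02, 0.00, 0.01],
    [0.00, 0.85, 0.11, 0.02, 0.02],
    [0.00, 0.00, 0.85, 0.12, 0.03],
    [0.00, 0.00, 0.00, 0.95, 0.05],
    [0.00, 0.00, 0.00, 0.00, 1.00],
])
annual_cost = np.array([50.0, 200.0, 800.0, 1500.0, 0.0])  # hypothetical, per state
utility = np.array([0.90, 0.85, 0.70, 0.50, 0.0])          # hypothetical QALY weights

def run_markov(start, cycles=40, discount=0.03):
    """Cycle a cohort through the health states, accumulating discounted
    costs and QALYs per patient."""
    state = np.asarray(start, dtype=float)
    total_cost = total_qaly = 0.0
    for t in range(cycles):
        d = (1 + discount) ** -t
        total_cost += d * state @ annual_cost
        total_qaly += d * state @ utility
        state = state @ P   # advance the cohort one annual cycle
    return total_cost, total_qaly

# Example: a cohort entering from a (hypothetical) screening decision tree
cost, qaly = run_markov(start=[0.80, 0.15, 0.05, 0.0, 0.0])
print(f"Lifetime discounted cost: ${cost:,.0f}; QALYs: {qaly:.2f}")
```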

In the CUA of AI for DR screening by Xie et al,44 the authors developed a hybrid decision tree/Markov model to simulate DR progression in a hypothetical cohort of patients with diabetes, using the prevalence of DR and the screening performance of Model A (semi-automated DL plus human grading) and Model B (conventional human grading) reported previously.45 The primary outcome measures were the total QALYs gained per patient and the total cost incurred by the health care system. Over a lifetime horizon, a patient with DR would incur a total cost of SGD 1,177 (US $875) under Model A and SGD 1,312 (US $975) under Model B. Model A thus produced a lifetime cost saving of SGD 135 (US $100) per patient while maintaining comparable QALYs gained.

However, it was not clear in this study which delivery method was used for the semi-automated DL and human graders. Before this, another CUA from Singapore compared DR screening by family physicians at the point of care with screening by trained graders at a reading center via telemedicine.46 This study by Nguyen et al, which was reported clearly and in detail, may serve as a model CUA for comparing the integration of AI into telemedicine against the use of AI at the point of care.

The authors developed a hybrid decision tree/Markov model47 to estimate the cost, effectiveness, and ICER of telemedicine-based DR screening compared with conventional family physician (FP)-based DR screening over patients' lifetime horizon. QALYs gained were used as the effectiveness measure, and the ICER, defined as the difference between the overall costs of the two strategies divided by the difference in their total QALYs gained, was used as the outcome measure.

The costs in this study were clearly defined as direct medical costs, direct non-medical costs, and indirect costs. Direct medical costs had 3 components: (1) the cost of DR screening, (2) the cost of follow-up visits, and (3) the cost of laser treatment for severe DR. Direct non-medical costs included transportation costs associated with visits to primary care clinics and hospitals. Indirect costs included the monetary value of work time lost attending clinic visits. The cost of DR screening under the FP model included the cost of physicians' grading of retinal images plus overheads (eg, information technology maintenance); under the telemedicine model, it included the cost of graders, comprising training, oversight, and quality assurance, plus overheads.
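The perspective chosen determines which of these components enter the analysis; a minimal sketch with purely hypothetical amounts:

```python
# Hypothetical per-visit cost components (illustrative only)
direct_medical = {"screening": 20.0, "follow_up": 35.0, "laser_treatment": 300.0}
direct_non_medical = {"transport": 8.0}
indirect = {"work_time_lost": 25.0}

# Health system perspective counts only direct medical costs;
# the societal perspective adds non-medical and indirect costs.
health_system_cost = sum(direct_medical.values())
societal_cost = (health_system_cost
                 + sum(direct_non_medical.values())
                 + sum(indirect.values()))
print(f"health system: ${health_system_cost:.2f}, societal: ${societal_cost:.2f}")
```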

Both screening models gained a total of 13.11 QALYs under both the societal and the health system perspective. The telemedicine model had significantly lower costs from both perspectives, with total cost savings of SGD 173 (US $129) per person from the societal perspective and SGD 144 (US $107) from the health system perspective. Extrapolating these figures to approximately 170,000 patients with diabetes in Singapore's national DR screening program, the cost savings associated with the telemedicine model were estimated at SGD 29.4 million (US $21.9 million) over a lifetime horizon. It should be remembered that this conclusion was based on human grading; evidence is still required to show whether AI deployed at the point of care or via telemedicine is more cost-effective.

Cost-Benefit Analysis

The cost-benefit analysis (CBA) method is considered the most comprehensive method of economic evaluation and is grounded in traditional welfare economics.41 In this method, the consequences of an intervention are valued in monetary units.48 Because outcomes are reported in monetary units, CBA is the best method for informing allocation decisions. However, this type of economic evaluation is rarely conducted in health care, since assigning monetary values to clinical outcomes (eg, QALYs or blindness prevented) is difficult.38

POTENTIAL FUTURE DIRECTIONS AND CHALLENGES

The result of an economic evaluation is determined by the costs and benefits of the intervention of interest, which can vary widely across populations. Thus, we encourage more studies to be conducted and published from different parts of the world, especially the Asia-Pacific region, which is home to many low-, lower-middle-, and middle-income countries. Although there are indications that AI could be cost-effective when deployed in these countries, adequate evidence is required to support the decisions of health care providers and policymakers toward efficient cost and budget allocation.

Among ophthalmic diseases, DR may have the highest number of economic evaluations, albeit only a handful. More CUAs using a Markov model, which accounts for changes in patients' DR severity over their lifetime, are therefore required. Diabetes is a lifelong disease, so data over a lifetime horizon would provide clearer evidence on the costs and consequences of AI applications.

Another important AI model for DR management that may warrant further economic evaluation is the model for predicting the progression of DR severity, used to prioritize patients for their next screening appointment.49 This AI model may be useful in the world after the COVID-19 pandemic. An economic evaluation comparing the treatment of diabetic macular edema with anti-vascular endothelial growth factor agents against AI applications for better detection of diabetic macular edema50 is another important research area.

Screening for AMD using AI analysis of retinal photographs may be next in line to justify economic evaluation. A study in South Korea found that systematic screening for AMD using retinal photography was cost-effective at an ICER of W3,310,448 (US $2,911) per QALY gained compared with no screening, under a willingness-to-pay threshold of W30,000,000 (US $27,538) per QALY gained.51 However, a challenge for the economic evaluation of AI-based AMD screening is that most AI models for AMD were developed from data and images from previous clinical trials, such as the Age-Related Eye Disease Study (AREDS),52 and have not yet been validated in real-world populations.

Further economic evaluations may also be warranted for other AI models already developed in the Asia-Pacific region53 for screening other common ocular diseases, such as glaucomatous optic neuropathy on retinal photographs among high-risk patients54 and cataract on either retinal photographs55 or slitlamp photographs56 in targeted populations. Retinal photography and slitlamp photography are ubiquitous and practical, and the devices are not considered expensive. Moreover, cataract, AMD, and glaucoma are leading causes of visual loss worldwide and in the Asia-Pacific region,57 and these diseases pose an enormous burden on regional and global economies.

CONCLUSIONS

Our review emphasizes the real need for economic evaluation studies of AI in ophthalmology. The costs of AI itself pose economic challenges and barriers to its general adoption in health care. It is imperative to study the costs of AI in detail, particularly to estimate the price point at which the manufacturer's price and the available reimbursements balance. This is essential when a charge-per-patient pricing model is implemented.

Although there is some economic evidence that AI for DR screening is cost-effective, most of it comes from high-income countries, so whether the results can be replicated and generalized to low- and middle-income countries remains doubtful. Likewise, although there may be enough evidence that telemedicine-based DR screening is cost-saving, it is not known whether integrating AI into existing telemedicine systems would remain cost-saving compared with point-of-care delivery.

There should also be more CUA studies of AI for DR screening with outcome measures over patients' lifetime horizon, especially across countries with different resource levels. The next frontiers for the economic evaluation of AI in ophthalmology may remain focused on screening diseases using retinal photography, such as AMD and glaucoma, which are leading causes of vision loss worldwide.

REFERENCES

1. World Health Organization. Global spending on health: weathering the storm. World Health Organization; 2020.
2. Organisation for Economic Cooperation and Development (OECD). OECD Health Statistics 2020. https://www.oecd.org/health/health-data.htm
3. Lorenzoni L, Marino A, Morgan D, James C. Health spending projections to 2030: new results based on revised OECD methodology. OECD Health Working Paper 110. May 2019. https://doi.org/10.1787/5667f23d-en
4. The Global Economy. Global economy, world economy. Accessed February 9, 2021. https://www.theglobaleconomy.com/
5. Organisation for Economic Cooperation and Development (OECD). OECD Statistics 2021. Accessed February 9, 2021. https://stats.oecd.org/
6. Dudine P, Hellwig KP, Jahan S. A framework for estimating health spending in response to COVID-19. International Monetary Fund working paper no. 20/145. July 2020. Accessed February 11, 2021. https://www.imf.org/en/Publications/WP/Issues/2020/07/24/A-Framework-for-Estimating-Health-Spending-in-Response-to-COVID-19-49550
7. He J, Baxter SL, Xu J, et al. The practical implementation of artificial intelligence technologies in medicine. Nat Med 2019; 25:30–36.
8. Singh RP, Hom GL, Abramoff MD, et al. Current challenges and barriers to real-world artificial intelligence adoption for the healthcare system, provider, and the patient. Transl Vis Sci Technol 2020; 9:1–6.
9. Wolff J, Pauling J, Keck A, Baumbach J. The economic impact of artificial intelligence in health care: systematic review. J Med Internet Res 2020; 22:1–8.
10. Guo Y, Hao Z, Zhao S, et al. Artificial intelligence in health care: bibliometric analysis. J Med Internet Res 2020; 22:1–12.
11. Haycox A, Walley T. Pharmacoeconomics: evaluating the evaluators. Br J Clin Pharmacol 1997; 43:451–456.
12. Sanyal C, Stolee P, Juzwishin D, Husereau D. Economic evaluations of eHealth technologies: a systematic review. PLoS One 2018; 13:1–11.
13. Lee HK, Jin R, Feng Y, et al. An analytical framework for TJR readmission prediction and cost-effective intervention. IEEE J Biomed Health Inform 2019; 23:1760–1772.
14. Golas SB, Shibahara T, Agboola S, et al. A machine learning model to predict the risk of 30-day readmissions in patients with heart failure: a retrospective analysis of electronic medical records data. BMC Med Inform Decis Mak 2018; 18:1–17.
15. Atik A, Barton K, Azuara-Blanco A, Kerr NM. Health economic evaluation in ophthalmology. Br J Ophthalmol 2020; 1–6.
16. Williams A. Health economics: the cheerful face of the dismal science? In: Williams A, ed. Health and Economics. London, UK: Palgrave Macmillan; 1987:1–11.
17. Brown MM, Brown GC, Sharma S, Landy J. Health care economic analyses and value-based medicine. Surv Ophthalmol 2003; 48:204–223.
18. Kuper H, Jofre-Bonet M, Gilbert C. Economic evaluation for ophthalmologists. Ophthalmic Epidemiol 2006; 13:393–401.
19. US Food & Drug Administration. Artificial intelligence and machine learning in software as a medical device. US Food & Drug Administration; 2021. Accessed January 2021. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device
20. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016; 316:2402–2410.
21. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med 2018; 24:1342–1350.
22. Wu X, Liu L, Zhao L, et al. Application of artificial intelligence in anterior segment ophthalmic diseases: diversity and standardization. Ann Transl Med 2020; 8:714.
23. Westerheide F. The artificial intelligence industry and global challenges. Forbes. November 27, 2019. Accessed February 9, 2021. https://www.forbes.com/sites/cognitiveworld/2019/11/27/the-artificial-intelligence-industry-and-global-challenges/?sh=565e73313deb
24. Scotland GS, McNamee P, Philip S, et al. Cost-effectiveness of implementing automated grading within the national screening programme for diabetic retinopathy in Scotland. Br J Ophthalmol 2007; 91:1518–1523.
25. Abràmoff MD, Lavin PT, Birch M, et al. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. npj Digit Med 2018; 1:1–8.
26. Benjamens S, Dhunnoo P, Meskó B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. npj Digit Med 2020; 3:1–8.
27. Chen EM, Chen D, Chilakamarri P, et al. Economic challenges of artificial intelligence adoption for diabetic retinopathy. Ophthalmology 2020; 128:475–477.
28. American Academy of Ophthalmology. Autonomous diabetic retinopathy screening system gains FDA approval. American Academy of Ophthalmology; August 6, 2020. Accessed December 3, 2020. https://www.aao.org/headline/autonomous-diabeticretinopathy-screening-system-g
29. Kirzner IM. The law of supply and demand. Foundation for Economic Education; January 1, 2000. Accessed February 11, 2021. https://fee.org/articles/the-law-of-supply-and-demand/
30. Burlina PM, Joshi N, Pekala M, et al. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol 2017; 135:1170–1176.
31. Hagiwara Y, Koh JEW, Tan JH, et al. Computer-aided diagnosis of glaucoma using fundus images: a review. Comput Methods Programs Biomed 2018; 165:1–12.
32. Zhang H, He Z. Automatic cataract grading methods based on deep learning. Comput Methods Programs Biomed 2019; 182:104978.
33. Smadja D, Touboul D, Cohen A, et al. Detection of subclinical keratoconus using an automated decision tree classification. Am J Ophthalmol 2013; 156:237–246.
34. Yehezkel O, Belkin M, Wygnanski-Jaffe T. Automated diagnosis and measurement of strabismus in children. Am J Ophthalmol 2020; 213:226–234.
35. Chakravarthy U, Goldenberg D, Young G, et al. Automated identification of lesion activity in neovascular age-related macular degeneration. Ophthalmology 2016; 123:1731–1736.
36. Yim J, Chopra R, Spitz T, et al. Predicting conversion to wet age-related macular degeneration using deep learning. Nat Med 2020; 26:892–899.
37. Yousefi S, Goldbaum MH, Balasubramanian M, et al. Glaucoma progression detection using structural retinal nerve fiber layer measurements and functional visual field points. IEEE Trans Biomed Eng 2014; 61:1143–1154.
38. Xie Y, Gunasekeran DV, Balaskas K, et al. Health economic and safety considerations for artificial intelligence applications in diabetic retinopathy screening. Transl Vis Sci Technol 2020; 9:22.
39. Xie Y, Nguyen QD, Hamzah H, et al. Artificial intelligence for teleophthalmology-based diabetic retinopathy screening in a national programme: an economic analysis modelling study. Lancet Digit Health 2020; 2:e240–e249.
40. Dismuke C. Progress in examining cost-effectiveness of AI in diabetic retinopathy screening. Lancet Digit Health 2020; 2:e212–e213.
41. Rudmik L, Drummond M. Health economic evaluation: important principles and methodology. Laryngoscope 2013; 123:1341–1347.
42. Tufail A, Rudisill C, Egan C, et al. Automated diabetic retinopathy image assessment software: diagnostic accuracy and cost-effectiveness compared with human graders. Ophthalmology 2017; 124:343–351.
43. Wolf RM, Channa R, Abramoff MD, Lehmann HP. Cost-effectiveness of autonomous point-of-care diabetic retinopathy screening for pediatric patients with diabetes. JAMA Ophthalmol 2020; 138:1063–1069.
44. Xie Y, Nguyen Q, Bellemo V, et al. Cost-effectiveness analysis of an artificial intelligence-assisted deep learning system implemented in the national tele-medicine diabetic retinopathy screening in Singapore. Invest Ophthalmol Vis Sci 2019; 60:5471.
45. Ting DSW, Cheung CYL, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 2017; 318:2211–2223.
46. Nguyen HV, Siew Wei Tan G, Jennifer Tapp R, et al. Cost-effectiveness of a national telemedicine diabetic retinopathy screening program in Singapore. Ophthalmology 2016; 123:2571–2580.
47. Griebsch I. Economic evaluation in health care: merging theory with practice. Int J Epidemiol 2002; 31:877–878.
48. Frick KD, Foster A, Faal H. Analysis of costs and benefits of the Gambian Eye Care Program. Arch Ophthalmol 2005; 123:239–243.
49. Bora A, Balasubramanian S, Babenko B, et al. Predicting the risk of developing diabetic retinopathy using deep learning. Lancet Digit Health 2020; 3:e10–e19.
50. Varadarajan AV, Bavishi P, Ruamviboonsuk P, et al. Predicting optical coherence tomography-derived diabetic macular edema grades from fundus photographs using deep learning. Nat Commun 2020; 11:1–8.
51. Ho R, Song LD, Choi JA, Jee D. The cost-effectiveness of systematic screening for age-related macular degeneration in South Korea. PLoS One 2018; 13:1–14.
52. Burlina PM, Joshi N, Pacheco KD, et al. Use of deep learning for detailed severity characterization and estimation of 5-year risk among patients with age-related macular degeneration. JAMA Ophthalmol 2018; 136:1359–1366.
53. Ruamviboonsuk P, Cheung CY, Zhang X, et al. Artificial intelligence in ophthalmology: evolutions in Asia. Asia Pac J Ophthalmol (Phila) 2020; 9:78–84.
54. Li Z, He Y, Keel S, et al. Efficacy of a deep learning system for detecting glaucomatous optic neuropathy based on color fundus photographs. Ophthalmology 2018; 125:1199–1206.
55. Xu X, Zhang L, Li J, et al. A hybrid global-local representation CNN model for automatic cataract grading. IEEE J Biomed Health Inform 2020; 24:556–567.
56. Wu X, Huang Y, Liu Z, et al. Universal artificial intelligence platform for collaborative management of cataracts. Br J Ophthalmol 2019; 103:1553–1560.
57. Flaxman SR, Bourne RRA, Resnikoff S, et al. Global causes of blindness and distance vision impairment 1990-2020: a systematic review and meta-analysis. Lancet Glob Health 2017; 5:e1221–e1234.
Keywords:

AI in ophthalmology; artificial intelligence; economic evaluation; health economics; telemedicine

Copyright © 2021 Asia-Pacific Academy of Ophthalmology. Published by Wolters Kluwer Health, Inc. on behalf of the Asia-Pacific Academy of Ophthalmology.