Artificial Intelligence–enabled Decision Support in Surgery: State-of-the-art and Future Directions : Annals of Surgery


Review Paper

Artificial Intelligence–enabled Decision Support in Surgery

State-of-the-art and Future Directions

Loftus, Tyler J. MD*,†; Altieri, Maria S. MD, MS†,‡; Balch, Jeremy A. MD*,†; Abbott, Kenneth L. MD, MS*,†; Choi, Jeff MD, MSc†,§; Marwaha, Jayson S. MD, MSc†,∥,¶; Hashimoto, Daniel A. MD, MS#,†,**; Brat, Gabriel A. MD, MPH∥,†,¶; Raftopoulos, Yannis MD, PhD†,††; Evans, Heather L. MD†,‡‡; Jackson, Gretchen P. MD, PhD†,§§; Walsh, Danielle S. MD∥∥,†; Tignanelli, Christopher J. MD, MS†,¶¶,##,***

Annals of Surgery 278(1):p 51-58, July 2023. | DOI: 10.1097/SLA.0000000000005853



Objective: To summarize state-of-the-art artificial intelligence–enabled decision support in surgery and to quantify deficiencies in scientific rigor and reporting.


Background: To positively affect surgical care, decision-support models must exceed current reporting guideline requirements by performing external and real-time validation, enrolling adequate sample sizes, reporting model precision, assessing performance across vulnerable populations, and achieving clinical implementation; the degree to which published models meet these criteria is unknown.


Methods: Embase, PubMed, and MEDLINE databases were searched from their inception to September 21, 2022, for articles describing artificial intelligence–enabled decision support in surgery that uses preoperative or intraoperative data elements to predict complications within 90 days of surgery. Scientific rigor and reporting criteria were assessed and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines.


Results: Sample sizes ranged from 163 to 2,882,526, with 8 of 36 articles (22.2%) featuring sample sizes of less than 2000; 7 of these 8 articles (87.5%) had below-average (<0.83) area under the receiver operating characteristic curve or accuracy. Overall, 29 articles (80.6%) performed internal validation only, 5 (13.8%) performed external validation, and 2 (5.6%) performed real-time validation. Twenty-three articles (63.9%) reported precision. No articles reported performance across sociodemographic categories. Thirteen articles (36.1%) presented a framework that could be used for clinical implementation; none assessed clinical implementation efficacy.
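The proportions above can be rechecked from the stated counts. The following is a minimal sketch, not part of the authors' analysis: the dictionary labels and the `percent` helper are illustrative names, and the only inputs are the counts given in the paragraph (36 included articles in total). With conventional rounding to one decimal place, every share matches the reported value except external validation, where 5 of 36 computes to 13.9% against the stated 13.8%.

```python
TOTAL_ARTICLES = 36  # number of included articles, per the results paragraph

# Article counts as stated in the results paragraph (labels are illustrative).
counts = {
    "sample size < 2000": 8,
    "internal validation only": 29,
    "external validation": 5,
    "real-time validation": 2,
    "reported precision": 23,
    "implementation framework": 13,
}


def percent(count, total=TOTAL_ARTICLES):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: percent(n) for label, n in counts.items()}

# The three validation categories partition the included articles.
validation_total = (
    counts["internal validation only"]
    + counts["external validation"]
    + counts["real-time validation"]
)

# Share of the small-sample articles with below-average discrimination.
small_sample_low_performance = percent(7, 8)

for label, share in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL_ARTICLES} = {share}%")
print(f"validation categories sum to {validation_total} articles")
print(f"small-sample, below-average performance: {small_sample_low_performance}%")
```

Because the three validation categories sum to all 36 articles, the validation percentages are expected to total roughly 100%, which may explain why the external validation figure was stated as 13.8% rather than 13.9%.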


Conclusions: Artificial intelligence–enabled decision support in surgery is limited by reliance on internal validation, small sample sizes that risk overfitting and sacrifice predictive performance, and failure to report confidence intervals, precision, equity analyses, and clinical implementation. Researchers should strive to improve scientific quality.

Copyright © 2023 Wolters Kluwer Health, Inc. All rights reserved.
