American College of Surgeons NSQIP Risk Calculator Accuracy Using a Machine Learning Algorithm Compared with Regression : Journal of the American College of Surgeons


Original Scientific Articles

American College of Surgeons NSQIP Risk Calculator Accuracy Using a Machine Learning Algorithm Compared with Regression

Liu, Yaoming PhD; Ko, Clifford Y MD, MS, MSHS, FACS; Hall, Bruce L MD, PhD, MBA, FACS; Cohen, Mark E PhD

Journal of the American College of Surgeons 236(5):p 1024-1030, May 2023. | DOI: 10.1097/XCS.0000000000000556


The American College of Surgeons NSQIP risk calculator (RC) uses regression to predict fourteen 30-day surgical outcomes. Although this approach provides accurate risk estimates in terms of discrimination and calibration, those estimates might be improved by machine learning (ML). To investigate this possibility, the accuracy of regression-based risk estimates was compared with that of estimates from an extreme gradient boosting (XGB)-ML algorithm.


A cohort of 5,020,713 NSQIP patient records was randomly divided into 80% for model construction and 20% for validation. Risk predictions using regression and XGB-ML were made for 13 RC binary 30-day surgical complications and one continuous outcome (length of stay [LOS]). For the binary outcomes, discrimination was evaluated using the area under the receiver operating characteristic curve (AUROC) and area under the precision recall curve (AUPRC), and calibration was evaluated using Hosmer–Lemeshow statistics. Mean squared error and a calibration curve analog were evaluated for the continuous LOS outcome.
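As an illustration of the evaluation design described above, the following is a minimal Python sketch of a random 80/20 split and of the two discrimination metrics for a binary outcome. It is not the authors' code. The function names (`split_80_20`, `auroc`, `auprc`) and the seed are hypothetical, AUROC is computed with the rank-based (Mann-Whitney) formulation, and AUPRC is approximated by average precision, which is one common estimator and may differ from the one used in the study.

```python
import random


def split_80_20(records, seed=0):
    """Randomly split records into 80% (construction) and 20% (validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an illustrative assumption
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney rank formulation (ties get average ranks)."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auprc(y_true, y_score):
    """AUPRC estimated as average precision (step-wise sum over thresholds)."""
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    tp = fp = 0
    area = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Records with tied scores cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return area
```

For a rare complication, AUPRC is anchored near the event rate rather than 0.5, which is why reported AUPRC values can be far lower than AUROC values for the same model.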


For every binary outcome, discrimination (AUROC and AUPRC) was slightly greater for XGB-ML than for regression (mean [across the outcomes] AUROC was 0.8299 vs 0.8251, and mean AUPRC was 0.1558 vs 0.1476, for XGB-ML and regression, respectively). For each outcome, miscalibration was greater (larger Hosmer–Lemeshow values) with regression; there was statistically significant miscalibration for all regression-based estimates, but only for 4 of 13 when XGB-ML was used. For LOS, mean squared error was lower for XGB-ML.
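To make the calibration comparison concrete, the sketch below computes a Hosmer–Lemeshow statistic in Python: records are sorted by predicted risk, divided into groups, and the squared gap between observed and expected event counts is summed across groups. This is an illustrative sketch rather than the authors' implementation. The function name `hosmer_lemeshow` is hypothetical, and the default of 10 risk-sorted groups is an assumption, since the abstract does not state how groups were formed.

```python
def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-sorted groups.

    Assumes near-equal-size groups (deciles by default); larger values
    indicate a larger gap between predicted and observed event counts.
    """
    pairs = sorted(zip(y_prob, y_true))  # sort records by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        expected = sum(p for p, _ in group)   # sum of predicted risks
        observed = sum(y for _, y in group)   # count of actual events
        mean_p = expected / size
        variance = size * mean_p * (1 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat
```

The statistic is usually compared against a chi-square reference distribution to test for miscalibration. With very large validation sets, even small gaps between observed and expected counts can reach statistical significance, so the magnitude of the statistic is informative alongside the p-value.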


XGB-ML provided more accurate risk estimates than regression in terms of discrimination and calibration. Differences in calibration between regression and XGB-ML were of substantial magnitude and support transitioning the RC to XGB-ML.


© 2023 by the American College of Surgeons. Published by Wolters Kluwer Health, Inc. All rights reserved.
