A comparison of the National Surgical Quality Improvement Program and the Society of Thoracic Surgery Cardiac Surgery preoperative risk models: a cohort study

Background: Cardiac surgery prediction models and outcomes from the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) have not been reported. The authors sought to develop preoperative prediction models and estimates of postoperative outcomes for cardiac surgery using the ACS-NSQIP and compare these to the Society of Thoracic Surgeons Adult Cardiac Surgery Database (STS-ACSD). Methods: In a retrospective analysis of the ACS-NSQIP data (2007–2018), cardiac operations were identified using cardiac surgeon primary specialty and sorted into cohorts of coronary artery bypass grafting (CABG) only, valve surgery only, and valve+CABG operations using CPT codes. Prediction models were created using backward selection of the 28 non-laboratory preoperative variables in ACS-NSQIP. Rates of nine postoperative outcomes and performance statistics of these models were compared to published STS 2018 data. Results: Of 28 912 cardiac surgery patients, 18 139 (62.8%) were CABG only, 7872 (27.2%) were valve only, and 2901 (10.0%) were valve+CABG. Most outcome rates were similar between the ACS-NSQIP and STS-ACSD, except for lower rates of prolonged ventilation and composite morbidity and higher reoperation rates in ACS-NSQIP (all P<0.0001). For all 27 comparisons (9 outcomes × 3 operation groups), the c-indices for the ACS-NSQIP models were lower by an average of ~0.05 than the reported STS models. Conclusions: The ACS-NSQIP preoperative risk models for cardiac surgery were almost as accurate as the STS-ACSD models. Slight differences in c-indexes could be due to more predictor variables in STS-ACSD models or the use of more disease- and operation-specific risk variables in the STS-ACSD models.


Introduction
Programs for comparing risk-adjusted surgical outcomes are important for quality improvement (QI) efforts. Two early surgical QI programs using risk-adjusted outcomes were the Veterans Affairs Continuous Improvement in Cardiac Surgery Program [1,2] and the Society of Thoracic Surgeons Adult Cardiac Surgery Database (STS-ACSD) [3]. The STS-ACSD includes more than 7.5 million cardiac operations and has served as the gold standard for risk-adjusted postoperative outcomes for cardiac surgery since the original development of its risk prediction models [4][5][6][7][8]. The STS short-term postoperative risk calculator requires manual input of more than 50 variables to predict patient outcomes, and its cardiac models use data only from institutions within the United States and Canada. Predictor variables include patient demographics, medical comorbidities, recent procedures, and laboratory values [9,10]. The STS-ACSD mainly uses disease- and operation-specific preoperative variables to predict postoperative outcomes and calculate risk-adjusted outcomes. Institutions benefit from the STS-ACSD because it allows them to evaluate their specific risk-adjusted rates of postoperative outcomes compared to other participating institutions. If an institution finds that it is a low performer for a certain postoperative complication (e.g. mediastinitis/deep sternal wound infection), then more resources can be allocated and efforts focused toward improving this outcome (e.g. use of negative pressure wound therapy, changes to perioperative antibiotic prophylaxis, etc.).
Based on the success of cardiac surgery QI programs, similar programs were developed for other surgical specialties. The Department of Veterans Affairs started the National Surgical Quality Improvement Program (VA-NSQIP) [11], which was subsequently adopted for civilian operations by the American College of Surgeons (ACS) [12]. The ACS-NSQIP originally encompassed nine non-cardiac surgical specialties. Therefore, by necessity, it collected mainly generic preoperative variables and postoperative outcomes that would be applicable in a broad surgical population for the prediction of postoperative outcomes and risk adjustment. In 2007, the ACS-NSQIP also started collecting data on cardiac operations. This provides the opportunity to compare outcomes of cardiac surgery between the STS-ACSD and ACS-NSQIP, and also to compare risk models using mainly disease-specific and operation-specific predictor variables vs. more generic predictor variables. While the ACS-NSQIP has long been considered the gold standard for surgical reporting and QI efforts in non-cardiac operations [13], the cardiac surgery data within the database have never been comprehensively analyzed. We found only one study in the literature that analyzed the postoperative outcomes of cardiac surgery in the ACS-NSQIP and compared them to outcomes in the STS-ACSD [14]. However, this was a single-institution study and did not attempt to generate preoperative prediction models. If models using more generic variables in a more widely available database were found to be similarly predictive of postoperative outcomes, more institutions, especially internationally, would have access to these data and the ability to perform quality improvement. For example, an institution participating in the cardiac ACS-NSQIP could identify its cases in the ACS-NSQIP database and compare its risk-adjusted outcomes to those of the other institutions participating in the cardiac ACS-NSQIP.
The purpose of this study was to estimate outcome rates and develop preoperative prediction models for postoperative cardiac surgery outcomes in the ACS-NSQIP, comparing these to the gold standard STS-ACSD. We hypothesized that cardiac surgical outcomes in the ACS-NSQIP would be similar to those in the STS-ACSD, and that the preoperative prediction models created using the generic variables of the ACS-NSQIP participant use file (PUF) would achieve similar performance characteristics as those generated using the disease-specific and operation-specific variables of the STS-ACSD. This would be important for the more than 40 institutions outside of North America who participate in NSQIP because they could use these data to perform quality improvement efforts in cardiac surgery.

Study design and patients
This was a retrospective analysis of the prospectively collected ACS-NSQIP cardiac data, 2007-2018. This work has been reported in line with the Strengthening the Reporting of Cohort Studies in Surgery (STROCSS) criteria [15] , Supplemental Digital Content 1, http://links.lww.com/JS9/A510. Because the ACS-NSQIP data are deidentified and publicly available, the study was deemed exempt from review by the Colorado Multiple Institutional Review Board.
The ACS-NSQIP data are collected from over 700 hospitals throughout the US and internationally, including academic referral centres, private-based hospitals, and hospitals in both urban and rural settings. Data are collected by trained ACS clinical nurses and are audited. The STS-ACSD uses these categories of cardiac surgery in their statistical analyses and reporting: coronary artery bypass grafting (CABG) only; valve surgery only [including aortic valve replacement (AVR) only, mitral valve replacement (MVR) only, mitral valve repair (MVr) only, MVR/MVr, and AVR and MVR/MVr]; and valve plus CABG (valve + CABG) surgery (including AVR and CABG, MVR and CABG, and MVr and CABG); and other cardiac operations [10] . We identified the CPT codes for each of these groups and generated similar patient cohorts using the CPT codes in the ACS-NSQIP. We began by including all patients who underwent operations performed by surgeons whose specialty was designated as cardiac surgery. Patients who had incomplete data, who did not have operations with CPT codes for surgical procedures on the heart and pericardium (CPT codes 33016-33999), or who had cardiac operations other than CABG only, valve repair or replacement only, or valve + CABG were excluded. We classified the cardiac surgery cases into the different operation types as follows: CABG only (CPT codes 33508, 33510-33536, 35500, 35572, 35600); AVR only (33405-33412); MVR only (33430); and MVr (33418-33427).
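The cohort assignment described above can be illustrated with a short sketch. It is not the authors' code (the analysis was run in SAS); the function name, the set names, and the rule used to flag "other cardiac operations" are assumptions made for illustration, and only the CPT code ranges are taken from the text.

```python
# Sketch only: cohort assignment from CPT codes, using the ranges quoted in the text.
# Names are hypothetical; this is not the authors' implementation.

CABG_CODES = {33508, 35500, 35572, 35600} | set(range(33510, 33537))
AVR_CODES = set(range(33405, 33413))        # aortic valve replacement
MVR_CODES = {33430}                         # mitral valve replacement
MV_REPAIR_CODES = set(range(33418, 33428))  # mitral valve repair
VALVE_CODES = AVR_CODES | MVR_CODES | MV_REPAIR_CODES


def classify_operation(cpt_codes):
    """Return 'CABG only', 'valve only', 'valve + CABG', or None if excluded."""
    codes = {int(code) for code in cpt_codes}
    has_cabg = bool(codes & CABG_CODES)
    has_valve = bool(codes & VALVE_CODES)
    # Assumption: any other heart/pericardium code (33016-33999) marks an
    # "other cardiac operation", which the study excluded.
    other_cardiac = {c for c in codes if 33016 <= c <= 33999} - CABG_CODES - VALVE_CODES
    if other_cardiac or not (has_cabg or has_valve):
        return None
    if has_cabg and has_valve:
        return "valve + CABG"
    return "CABG only" if has_cabg else "valve only"
```

A case would be passed as the list of all CPT codes recorded for the operation, for example `classify_operation([33430, 33511])`, so that concurrent procedures are considered together.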
The ACS-NSQIP collects 28 non-laboratory preoperative predictor variables that are appropriate for a broad surgical population (e.g. demographic variables; general comorbidities such as diabetes, hypertension, history of congestive heart failure, etc.; functional health status; American Society of Anesthesiologists physical status classification (ASA class); emergency operation; inpatient/outpatient setting, etc.). The ACS-NSQIP preoperative laboratory variables were not used because we previously found that they did not add significant prediction beyond the non-laboratory preoperative variables, and they were often missing not at random [16].
Highlights
• We used the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) to develop cardiac surgery preoperative risk models.
• The models generated achieved almost the same predictive capabilities as the Society of Thoracic Surgeons Adult Cardiac Surgery Database (STS-ACSD).
• These models would be beneficial for institutions participating in the ACS-NSQIP who would like to perform quality improvement for cardiac surgery.

Data statement
The analysis from this study used deidentified and publicly available data from the American College of Surgeons Participant Use File. Therefore, the dataset will not be provided with the manuscript submission.

We used the ACS-NSQIP to generate rates of the following nine postoperative outcomes reported by the STS-ACSD: in-hospital mortality (which also includes any 30-day mortality for patients discharged prior to 30 days); stroke; renal complication; prolonged ventilation (> 48 h); unplanned reoperation; composite morbidity and mortality (defined as the occurrence of any of the previous complications); prolonged postoperative length of stay (PLOS > 14 days); short PLOS (PLOS < 6 days and patient alive at discharge); and mediastinitis/deep sternal wound infection (DSWI) [10]. The STS-ACSD and ACS-NSQIP track all outcomes up to 30 days postoperatively.
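As a rough illustration of how the derived outcomes defined above could be computed per case, the following sketch builds the composite and length-of-stay flags from individual outcome indicators. The field names are hypothetical and are not ACS-NSQIP PUF variable names, and treating mortality as a component of the composite is one reading of "any of the previous complications".

```python
# Sketch only: derived outcome flags for one case, following the stated definitions.
# All field names are hypothetical, not ACS-NSQIP PUF variable names.

COMPOSITE_COMPONENTS = (
    "mortality",
    "stroke",
    "renal_complication",
    "prolonged_ventilation",   # ventilation > 48 h in ACS-NSQIP
    "unplanned_reoperation",
)


def derive_outcomes(case):
    """Return composite and postoperative length-of-stay (PLOS) flags for a case dict."""
    plos = case["postop_los_days"]
    return {
        # Assumption: the composite covers the five outcomes listed before it.
        "composite": any(case[name] for name in COMPOSITE_COMPONENTS),
        "prolonged_plos": plos > 14,
        "short_plos": plos < 6 and case["alive_at_discharge"],
    }
```

Note that short PLOS requires the patient to be alive at discharge, so an early in-hospital death is not counted as a short stay.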

Statistical analysis
The ACS-NSQIP patient non-laboratory preoperative characteristics were compared between the CABG only, valve only, and valve + CABG groups by χ² test for the categorical variables and analysis of variance for the continuous variables. Unadjusted rates of the nine postoperative outcomes were compared between the STS-ACSD and the ACS-NSQIP PUF using χ² tests. Risk-adjusted rates could not be compared between the two databases because we did not have access to the STS-ACSD data. To create preoperative risk models for the nine postoperative outcomes from the ACS-NSQIP, we used backward stepwise logistic regression analysis with each postoperative outcome (yes/no) as the dependent variable and the preoperative non-laboratory ACS-NSQIP variables as the independent variables, with an exit criterion of P value greater than 0.05. The generalizability of the resulting parsimonious models was then assessed using a one-step approximation to leave-one-out cross-validation. We used the resulting predicted probabilities and cross-validation predicted estimates to calculate the area under the receiver operating characteristic curve (c-index) and 95% CI using the DeLong method. We calculated Brier scores and constructed Hosmer-Lemeshow (H&L) graphs of observed to expected values for evaluation of calibration. The rates of the nine outcomes and the c-indexes of the models for the STS-ACSD were from the 2018 STS-ACSD publication [10]. Two-sided P values less than or equal to 0.05 were considered statistically significant. All statistical analyses were performed using SAS version 9.4 (SAS Institute Inc.).
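For readers less familiar with these statistics, the sketch below shows in plain Python how a χ² comparison of two unadjusted event rates, a c-index, and a Brier score can be computed. It is an illustration under stated assumptions and not the authors' SAS code: the function names are hypothetical, the χ² test is the Pearson statistic for a 2 × 2 table without continuity correction, and the backward selection, cross-validation, and DeLong confidence interval steps are not reproduced.

```python
import math

# Sketch only: illustrative implementations, not the study's SAS procedures.


def chi2_two_rates(events_a, n_a, events_b, n_b):
    """Pearson chi-square (1 df, no continuity correction) for two event rates."""
    observed = [[events_a, n_a - events_a], [events_b, n_b - events_b]]
    row_totals = [n_a, n_b]
    col_totals = [events_a + events_b, (n_a - events_a) + (n_b - events_b)]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def c_index(y_true, y_prob):
    """Share of (event, non-event) pairs where the event has the higher predicted risk."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                score += 1.0
            elif p_event == p_non:
                score += 0.5  # ties count as half-concordant
    return score / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

A c-index of 0.5 corresponds to chance discrimination and 1.0 to perfect discrimination, while a lower Brier score indicates predictions closer to the observed outcomes. The pairwise c-index loop is quadratic in the number of cases, so a rank-based implementation would be preferable for cohorts of the size analyzed here.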

Results
There [10]. Table 1 shows the preoperative characteristics of the included patient cohort. Patients undergoing cardiac surgery were mostly white, older than 60 years of age, had one or more medical comorbidities, and were ASA class III or IV. Patients undergoing CABG only, compared to valve only or valve + CABG, tended to have a higher percentage of males, a history of diabetes, smoking, and bleeding disorders, and were more likely to be transferred from an acute care hospital. Patients undergoing valve only tended to have a higher percentage of females, with dyspnoea, hypertension, and having a more complex operation at a higher work RVU. Patients undergoing valve + CABG tended to be older, and more often with chronic obstructive pulmonary disease, congestive heart failure, and classified as ASA class IV. All patient characteristics had statistically significant differences between the three operation groups, except for the preoperative comorbidities of acute renal failure, on dialysis, and having disseminated cancer, which were relatively equal across the operation groups.
Table 2 shows the comparison of unadjusted postoperative outcome rates between the ACS-NSQIP and the STS-ACSD. Most outcome rates, even when statistically different, were clinically similar between the two databases, except for lower rates in the ACS-NSQIP for prolonged ventilation (CABG only: 5.04% ACS-NSQIP vs. 9.33% STS-ACSD; valve only: 6.61% vs. 11.06%; valve + CABG: 11.62% vs. 18
Table 3 shows the performance statistics of 27 models (nine outcomes for each of the three operation groups) in the training and testing datasets using the ACS-NSQIP. In general, the best models (highest c-index and lowest Brier score) were in the CABG cohort (average of c-indexes in test set = 0.684, average Brier score = 0.060), followed by the valve cohort (0.678, 0.068), followed by valve + CABG (0.654, 0.083).
Table 4 compares the testing dataset c-indexes of the 27 preoperative prediction models calculated from the ACS-NSQIP and STS-ACSD databases. All of the ACS-NSQIP models had lower c-indexes than the STS-ACSD models, except for DSWI in the valve + CABG model. The difference in mean c-index in STS-ACSD and ACS-NSQIP models for the CABG cohort was 0.053, for the valve cohort was 0.047, and for the valve + CABG cohort was 0.048, indicating slightly lower discrimination of the ACS-NSQIP models compared to the STS-ACSD models. The STS-ACSD c-index was within the 95% CI of the c-index for the ACS-NSQIP for two of nine CABG models (reoperation and DSWI), two of nine valve models (stroke and renal failure), and five of nine valve + CABG models (stroke, renal failure, prolonged ventilation, reoperation, DSWI). Figure 1 shows the H&L calibration plots computed from the ACS-NSQIP data for mortality for each of the three cohorts, and supplemental Figure 1

Discussion
We successfully used the ACS-NSQIP cardiac surgery data to develop preoperative prediction models and estimate unadjusted incidence rates for the nine postoperative outcomes within the three cardiac surgery patient cohorts used by the STS-ACSD. We believe that this is the first comprehensive analysis of the cardiac surgery data in the ACS-NSQIP and the first comprehensive comparison of these data to the data in the STS-ACSD. While the patient populations outside of North America are likely different in demographic factors, medical comorbidities, and frequency of cardiac operations performed, these models could be beneficial for institutions outside of North America who participate in the ACS-NSQIP but not the STS-ACSD hoping to perform quality improvement activities in cardiac surgery. We found that the distributions of CABG only, valve only, and valve + CABG operations were similar between the ACS-NSQIP and STS-ACSD databases. The unadjusted rates of the nine postoperative outcomes were clinically similar between the two databases, except for lower rates of prolonged ventilation and composite mortality/major morbidity, and higher rates of reoperation, in the ACS-NSQIP. The prolonged ventilation differences were likely due to differences in definition of "prolonged," which the ACS-NSQIP defines as more than 48 h while the STS-ACSD defines as more than 24 h. The higher rates of reoperation in the ACS-NSQIP were likely because ACS-NSQIP counts any reoperation in the 30 days after the patient's cardiac operation while the STS-ACSD only considers cardiac reoperations. We were unable to compare risk-adjusted outcome rates between the two databases, because we did not have access to the STS-ACSD data. Finally, we found that the ACS-NSQIP preoperative prediction models performed almost as well as the STS-ACSD models but were slightly less predictive. 
This was probably due to the STS-ACSD models having larger sample sizes, more predictor variables, and predictor variables that were disease-specific rather than generic. The H&L plots for the ACS-NSQIP models reflected good calibration for the CABG only and valve only cohorts but weaker calibration in the valve + CABG cohort. This study shows that generic surgical variables can be effectively used to create preoperative prediction models for complications in cardiac surgery. Online risk calculators from both programs are freely available and can be used globally to calculate the operative risk of patients undergoing these operations (https://riskcalculator.facs.org; https://riskcalc.sts.org/stswebriskcalc).
The STS-ACSD remains the gold standard cardiac surgical outcomes database. Since implementation in 1989, regular audits have shown greater than 95% concordance of the data submitted to the STS with source data [3]. It has been used to develop the online STS Short-Term Risk Calculator for estimating risk of postoperative complications after common cardiac operations [9,10,17], which aids in informed decision making. There have been other attempts to develop preoperative predictor models for cardiac surgery patients besides the STS-ACSD. A review paper by Prins et al. [18] details 19 historical cardiac surgery models including the EuroSCORE [19,20] and the Parsonnet score [21]. The EuroSCORE was developed using only European patients, so it may not be generalizable to other populations, and it predicts only postoperative mortality, which makes it less useful than models that predict multiple outcomes. The Parsonnet score has fallen out of favour since the early 2000s due to diminished predictive accuracy [22].
The information obtained in this study could be useful for institutions that participate in ACS-NSQIP and do not have access to the STS-ACSD. While the STS-ACSD has achieved 95% institutional penetrance in the United States among Centers for Medicare and Medicaid Services institutions, this leaves more than 50 institutions without access to the STS-ACSD in the USA alone [23]. Department of Defense (DoD) hospitals, which participate in the ACS-NSQIP, might also benefit from non-STS cardiac surgery data. Additionally, the ACS-NSQIP has at least 40 participating institutions in Europe and Asia. Since fewer than ten institutions outside of North America participate in the STS-ACSD, these models using the ACS-NSQIP cardiac surgery data might benefit more international institutions for QI purposes.
For hospitals that participate in both the ACS-NSQIP and STS-ACSD, use of the STS-ACSD for QI efforts in cardiac surgery is preferred for several reasons: (1) The STS-ACSD obtains data from all cardiac operations at contributing institutions, while the ACS-NSQIP only samples 10-15% of operations; (2) since the ACS-NSQIP deidentifies their national
The STS-ACSD preoperative predictor models had higher c-indexes than the ACS-NSQIP models for several potential reasons. First, the sample sizes for the prediction models were much larger in the STS-ACSD vs. the ACS-NSQIP. Second, the total number of potential predictor variables for inclusion in the backward selection model was considerably higher in the STS-ACSD compared to the ACS-NSQIP (65 covariates vs. 28) [10]. Third, while prior literature has shown that preoperative laboratory values are often missing and add little to the prediction of postoperative outcomes in non-cardiac surgery [16], this is untested for cardiac operations. The ACS-NSQIP cardiac models may have been improved with the addition of laboratory variables as potential predictors in the backward selection model. And fourth, the candidate predictor variables included in the STS-ACSD models were more disease-specific and operation-specific for patients undergoing cardiac surgery than the ACS-NSQIP predictors. For example, while the ACS-NSQIP PUF includes "congestive heart failure within 30 days" [16], a covariate specifically related to cardiac function, there are a considerable number of variables specific to heart function and health in the STS-ACSD models (e.g. ejection fraction, preoperative intra-aortic balloon pump, presence of left main coronary artery disease, endocarditis, cardiac arrhythmia and type, heart failure class and timing, among others).
While the number and specificity of these cardiac predictor variables make the STS-ACSD risk calculator more burdensome to use, these more specific predictors likely contribute to its models' superior performance.
Strengths of the study include: (1) to the authors' knowledge, this was the first attempt to perform a comprehensive analysis of the cardiac operations in the ACS-NSQIP to analyze outcome rates and generate preoperative predictor models for postoperative cardiac surgery outcomes; (2) we analyzed cohorts that were similar to the three cohorts analyzed by the STS, which made for more accurate comparison between the two datasets and their outcomes and predictor models. Limitations of this study include: (1) a small sample size used to develop the ACS-NSQIP models; (2) the ACS-NSQIP data are completely deidentified, so we have no knowledge about which hospitals contributed data to the cardiac dataset; (3) we did not compare risk-adjusted outcomes between the two databases, because we did not have access to the STS-ACSD data; (4) we did not compare the postoperative outcomes of patients who underwent aortic operations, despite the fact that aortic operations make up a significant portion of cardiac surgery operations in the STS-ACSD; this could be an area for future research; (5) patient characteristics from the development cohort in the STS-ACSD were not readily available, so it was uncertain whether the development cohorts were similar in the ACS-NSQIP and STS-ACSD; and (6) lack of cardiac-specific preoperative predictors in the ACS-NSQIP may have resulted in models that were inferior to those generated using the STS-ACSD.
In conclusion, we successfully analyzed the ACS-NSQIP to estimate unadjusted rates of postoperative outcomes and to develop models to predict postoperative outcomes for cardiac surgery and compared these to the STS-ACSD models. Most postoperative outcome rates were similar to those reported by the STS even when statistically different. The ACS-NSQIP models performed almost as well as STS-ACSD preoperative prediction models. The STS-ACSD models possibly performed better due to larger sample size and use of more preoperative variables and more that were disease-specific. However, the ACS-NSQIP models did show good prediction and calibration for a number of outcomes and operation groups. The ACS-NSQIP could be used for study of cardiac surgery outcomes for institutions without access to the STS-ACSD.

Ethical oversight statement
The Colorado Multiple Institutional Review Board determined this study exempt from review as it used publicly available deidentified data.