KEY POINTS
Question: What are the advantages of applying machine learning to develop a model to predict blood transfusion requirements using the Pediatric Craniofacial Surgery Perioperative Registry data set?
Findings: We developed a highly accurate prediction model and calculator that uses variables from prospective patients to make an individualized recommendation for intraoperative blood requirement.
Meaning: The future clinical utility of this method lies in the opportunity to modify the inputs to the model for each individual patient, thereby producing precision medicine recommendations.
Craniosynostosis is the premature fusion of ≥1 cranial sutures and often requires surgical intervention. Surgery may involve extensive osteotomies, which can lead to substantial blood loss. The mean intraoperative blood loss has been reported as 61 mL/kg, but the range is wide, with some patients losing >3 total blood volumes. Although 95% of children ≤24 months of age and 79% of children >24 months of age receive ≥1 perioperative blood transfusion,1 12% develop significant postoperative anemia, 17% hypotension, and 19% significant metabolic acidosis.2 These figures suggest that although a large volume of blood loss is anticipated, the loss and intravascular volume depletion are assessed and treated inaccurately with current monitoring capabilities.
Despite the high rate of morbidity in this population, no consensus recommendations or guidelines have been published for intraoperative blood management during craniosynostosis repair.3 Both cell salvage systems4–8 and tranexamic acid9–11 have evidence for clinical effectiveness in this patient population, but anesthesiologists and institutions exhibit a wide variation in practice. Stricker et al1 found that 35% of patients underwent surgery without antifibrinolytic drug infusions and that 87% of operations were undertaken without cell salvage techniques. Despite surgical12–16 and pharmacological advances in reducing intraoperative blood loss, many patients still require blood product transfusion.17
Some institutions have incorporated guidelines for using tranexamic acid in the craniosynostosis surgical population and protocols to guide the administration of blood products.3,18 However, such guidelines are static, one-size-fits-all models that fail to incorporate variable individual patient characteristics.
The aim of this study was to develop and internally validate an accurate model that can predict intraoperative transfusion requirement during pediatric craniofacial surgery (in units of blood product) by using data from the Pediatric Craniofacial Surgery Perioperative Registry (PCSPR). We used machine-learning techniques that enable patient-specific predictions and allow the model to be continuously refined by retraining as the dataset is updated with new patient entries. These recently developed statistical techniques use complex algorithm structures to identify hidden patterns in the data and can be used for more sophisticated problems than standard statistical measures. They are becoming increasingly of interest to perioperative medical researchers.19 Machine-learning models have already been applied successfully to many areas in health care, such as predicting patient outcomes20–22 and reducing unnecessary postoperative complete blood count testing in the pediatric intensive care unit.23 The use of such a model would allow clinicians to predict which patients are more likely to require a blood transfusion during the intraoperative period and how many units of blood each of these patients would require. A secondary aim was to develop a precision medicine calculator capable of evolving as new data are entered into the PCSPR to predict how many units of blood product an individual patient will require during surgery.
MATERIALS AND METHODS
Pediatric Craniofacial Surgery Perioperative Registry
The PCSPR is a prospective observational registry created by the Pediatric Craniofacial Collaborative Group, under the auspices of the Society for Pediatric Anesthesia, to describe the anesthetic and surgical management of pediatric patients undergoing craniofacial surgery. The dataset is available for research-related purposes to all hospitals that contribute to the registry. Ninety-eight percent of registry entries are for craniosynostosis; the remaining cases include reconstruction surgery performed for treatment of other conditions (eg, slit ventricle syndrome, frontonasal dysplasia, hypertelorism). Participating institutions obtained local institutional review board (IRB) approval. The requirement for informed consent was waived by the local IRB at all but 6 institutions. In accordance with local IRB approvals, written informed consent was obtained from participating subjects at 5 of these centers; at the remaining center, verbal informed consent was obtained and written informed consent was explicitly waived by the local IRB. The data are collected and managed with Research Electronic Data Capture (REDCap) electronic data capture tools hosted at the Children’s Hospital of Philadelphia. The full anonymized dataset contained 2390 subjects at the time of this study, collected from June 2012 to October 2017. Subsets of this larger dataset have been the subject of previous studies.1,17,24–26 Individual institutions are represented by institution codes and are not identifiable.
Preprocessing: Data Collection, Entry, and Validation
Data were collected on patient demographics, intraoperative and postoperative transfusion, intraoperative surgical and anesthetic management, postoperative management and laboratory results, the occurrence of prespecified intraoperative and postoperative complications, and the length of intensive care unit and hospital stay in calendar days. Data were submitted to the registry from 33 institutions within and outside the United States beginning in June 2012. The median case capture rate across all participating institutions was 100% (interquartile range, 98%–100%). All participating institutions were required to report data collection and auditing processes to ensure accuracy.1
In addition to the above requirements, the database director at the Data Coordinating Center audited the dataset before analysis by scrutinizing cases for omissions of critical data (eg, age and weight) and identifying outlier data. The investigator also cross-validated data within individual records to optimize data accuracy. Queries based on omissions, outliers, and discrepancies identified through this process were aggregated and sent to site investigators for rectification.
Some patients (n = 345) had the name of the fused suture(s) missing from their registry entry. Through manual examination of the “diagnosis” and “procedure” fields, we were able to find the affected suture(s) for 233 of these patients and included them in the analysis. Imputation of other missing values in the dataset was not considered because the affected patients comprised <3% (n = 54) of the study population and we could not confirm that the data were missing at random, which is the central assumption underlying most imputation techniques. We therefore excluded a total of 166 patients from the dataset and used the remaining 2143 patients to develop and test our model.
We separated the dataset into cross-validation and testing sets by randomly selecting institutions to be part of a geographic validation testing set. We chose institutions at random, iteratively adding them to the testing set until it reached a size of approximately 30% of the full dataset. The cross-validation set consisted of 1465 patients (1302: 88% transfused), whereas the test set consisted of 678 patients (564: 84% transfused). The cross-validation set was used to develop the machine-learning model, and the testing set was used to evaluate the model results. We used 5-fold cross-validation to train the models. The study is reported according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist for prediction model development.27
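The institution-level ("geographic") split described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the record counts per institution are invented, and only the 33-institution count and the ~30% target come from the text.

```python
# Sketch of the "geographic" validation split: whole institutions are moved
# into the testing set until it holds ~30% of patients. Toy data, invented sizes.
import random

random.seed(0)
# hypothetical records of (institution_code, patient_id)
records = [(inst, pid) for inst in range(33) for pid in range(random.randint(20, 120))]

institutions = sorted({inst for inst, _ in records})
random.shuffle(institutions)

test_institutions = set()
n_total = len(records)
for inst in institutions:  # iteratively add institutions, as in the text
    test_institutions.add(inst)
    n_test = sum(1 for i, _ in records if i in test_institutions)
    if n_test >= 0.3 * n_total:
        break

test_set = [r for r in records if r[0] in test_institutions]
cv_set = [r for r in records if r[0] not in test_institutions]
# no institution contributes patients to both sets
```

Splitting by institution rather than by patient means the testing set measures generalization to centers the model has never seen.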
Machine-Learning Modeling
Machine-learning modeling consists of 4 steps: (1) feature extraction, (2) model selection, (3) training and validating, and (4) testing the model.
Feature Extraction
Through literature and physician expert review, we identified 22 available preoperative and demographic features that are known or suspected to have an association with the need for blood transfusion during craniofacial surgery. We used these features as inputs to the machine-learning models. Table 1 presents the list of selected features for machine-learning prediction and a comparison of baseline characteristics in the training and testing datasets. No feature size reduction was performed, and all of the 22 features were used as inputs to every model. One-hot encoding was used for categorical variables.
Table 1. Comparison of Baseline Characteristics for the Training and Testing Datasets

Variable | Data Type | Cross-Validation Set (N = 1465) | Testing Set (N = 678)
Surgical volume (no. of cases per year) per institution, mean (1 SD) | Ordinal | 131 (116), Tx: 129 (109) | 139 (108), Tx: 134 (112)
Race, n (%) | Categorical | |
  Caucasian | | 71 (89) | 72 (85)
  African American | | 11 (87) | 12 (82)
  Other | | 12 (86) | 10 (81)
Ethnicity, non-Hispanic, n (%) | Binary | 85 (89) | 80 (84)
Age (mo), median (range) | Continuous | 12 (1–190), Tx: 13 (1–190) | 12 (1–175), Tx: 11 (1–172)
Sex, female, n (%) | Binary | 63 (91) | 57 (86)
Weight (kg), mean (1 SD) | Continuous | 11.9 (8.7), Tx: 12.1 (8.2) | 12.1 (7.9), Tx: 12.3 (8.4)
ASA status, n (%) | Categorical | |
  I | | 12 (88) | 20 (81)
  II | | 58 (91) | 51 (84)
  III | | 28 (87) | 27 (78)
  IV | | 1 (89) | 1 (82)
Suture, n (%) | Ordinal | |
  1 | | 76 (88) | 79 (85)
  2 | | 1 (86) | 1 (83)
  3 | | 16 (90) | 14 (84)
Procedure performed by plastic surgeon, n (%) | Binary | 98 (88) | 98 (84)
Anatomical site classification, n (%) | Categorical | |
  Anterior | | 60 (87) | 59 (85)
  Mid/posterior | | 29 (89) | 28 (83)
  Total | | 11 (86) | 13 (84)
Distractor placement, n (%) | Binary | 14 (88) | 14 (83)
Craniosynostosis-related syndrome, n (%) | Categorical | 13 (90) | 17 (85)
Patient treated with preoperative erythropoietin, n (%) | Binary | 1 (91) | 1 (86)
Existing tracheostomy, n (%) | Binary | 3 (87) | 3 (84)
Preoperative evidence of elevated intracranial pressure, n (%) | Binary | 17 (89) | 14 (85)
Type of craniosynostosis syndrome, n (%) | Categorical | |
  Crouzon | | 4 (89) | 3 (86)
  Apert | | 3 (86) | 3 (83)
  Other | | 3 (87) | 4 (84)
Prior craniofacial surgery, n (%) | Binary | 20 (91) | 17 (86)
Preoperative hemoglobin level (g/dL), mean (1 SD) | Continuous | 11 (1.4), Tx: 10.8 (1.2) | 12 (1.3), Tx: 11.7 (1.3)
Preoperative hematocrit, mean (1 SD) | Continuous | 34 (3.1), Tx: 33 (3.2) | 35 (3.4), Tx: 34 (3.7)
Preoperative platelet count, mean (1 SD) | Continuous | 365 (112), Tx: 359 (110) | 353 (109), Tx: 358 (98)
Cell salvage used, n (%) | Binary | 15 (95) | 16 (88)
Antifibrinolytic agent used, n (%) | Binary | 73 (89) | 75 (84)
No. of units transfused, n (%) | Ordinal, output | |
  1 | | 807 (54) | 372 (55)
  2 | | 313 (21) | 115 (17)
  3 | | 78 (6) | 47 (7)
  ≥4 | | 104 (8) | 30 (5)

The cross-validation dataset consists of 1465 patients (1302 transfused). The testing dataset consists of 678 patients (564 transfused). Percentages in brackets are the patients who received a transfusion, unless otherwise specified.
Abbreviations: ASA, American Society of Anesthesiologists; SD, standard deviation; Tx, transfused.
Model Selection and Training
The first step in meeting the study’s primary aim was to “classify” the subjects into 2 groups: one group that required transfusion and one that did not. The second part required a machine-learning “regression” model to predict how many units of blood product will be required for an individual during his/her surgery. We considered the recommended number of units of blood to be ordinal, as only whole units can be ordered, and therefore rounded the outcome of the regression model to the closest integer. We trained and tested 6 different machine-learning classification and regression models to help identify patients who would require blood transfusion during craniosynostosis surgery. These models included random forest, adaptive boosting (AdaBoost), neural network, gradient boosting machine (GBM), support vector machine (SVM), and elastic net.
Random forest, which is an ensemble learning method, uses the concept of bootstrap aggregating (bagging), which averages the predictions over bootstrap samples from the original sample set. The observations in bootstrap samples are used for building a set of weak trees; all of the observations that are not in these samples are considered out-of-bag observations and are used for evaluating model performance.
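Bagging with out-of-bag evaluation can be sketched with scikit-learn on synthetic data; only the 22-feature input count mirrors the study, and everything else below is invented for illustration.

```python
# Random forest: bootstrap-aggregated trees, evaluated on out-of-bag samples.
# Synthetic data; 22 features mirrors the study's input count, nothing else does.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=22, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
oob_accuracy = rf.oob_score_  # accuracy on out-of-bag observations
```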
AdaBoost is an ensemble learning method that uses multiple iterations to generate a classifier. AdaBoost creates the classifier by iteratively adding weak learners together. During each round of training, a new weak learner is added to the ensemble and a weighting vector is adjusted to focus on examples that are misclassified in previous rounds. The resulting final classifier has higher accuracy than the weak learners.
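A minimal AdaBoost sketch, again on synthetic data; the ensemble size and learning rate here are arbitrary illustrations, not the study's tuned values.

```python
# AdaBoost: weak learners (shallow trees by default) are added iteratively,
# with misclassified examples reweighted at each round. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=1)
ada.fit(X, y)
train_accuracy = ada.score(X, y)
```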
Neural networks make up a class of machine-learning algorithms that use a directed acyclic graph structure to model the relationship between inputs and outputs. They are able to achieve accurate results with highly varied data because they incorporate many parameters, enabling modeling of highly complex nonlinear systems such as those found in medicine.
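A feed-forward network with two hidden layers can be sketched as follows; the layer sizes are arbitrary stand-ins, and the data are synthetic.

```python
# Neural network: a directed acyclic graph of weighted connections from
# inputs through hidden layers to the output. Synthetic data, arbitrary sizes.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=22, random_state=3)
nn = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=2000, random_state=3)
nn.fit(X, y)
```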
A GBM is another ensemble learning method. At each step, a new weak base-learner model is trained with respect to the error of the whole ensemble learned so far. In a GBM, the learning procedure consecutively fits new models to provide a more accurate estimate of the response variable.
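The GBM regression step, including the whole-unit rounding described under "Model Selection and Training", can be sketched as below; the data and target are synthetic stand-ins for the registry, and the Huber loss echoes Table 2.

```python
# GBM regression with rounding of the predicted transfusion volume to whole
# units, as described in the text. Data and target are invented.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.clip(np.round(X[:, 0] + 2), 0, 6)       # fake "units transfused"

gbm = GradientBoostingRegressor(loss="huber", random_state=0)
gbm.fit(X, y)
units = np.round(gbm.predict(X)).astype(int)   # round to the closest integer
```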
An SVM is a type of statistical learning model that provides good generalization capability in classification and prediction tasks. SVM is based on the concept of structural risk minimization for finding an optimal hyperplane that separates the data into different groups or classes.
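An SVM with the RBF kernel (the kernel selected in Table 2) can be sketched on synthetic data:

```python
# SVM: finds a separating hyperplane (here in the RBF kernel's feature space).
# Synthetic data; C is left at an arbitrary default.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=22, random_state=4)
svm = SVC(kernel="rbf", C=1.0, random_state=4)
svm.fit(X, y)
train_accuracy = svm.score(X, y)
```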
Elastic net is a class of regularized logistic regression models that incorporates L1 and L2 weight penalties to prevent the regression weights from becoming very large during model training, thereby reducing overfitting of the model to the training data.
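One way to express an elastic net classifier is logistic regression with a mixed L1/L2 penalty; in this sketch the mixing ratio is an arbitrary choice and the data are synthetic.

```python
# Elastic net: logistic regression with combined L1 and L2 penalties.
# l1_ratio mixes the two penalties; 0.5 here is an arbitrary illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=22, random_state=2)
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000)
enet.fit(X, y)
```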
After model selection, we sought the optimal set of hyperparameters for each model. This step often requires an exhaustive search of all possible combinations of the model hyperparameters, which is computationally expensive and time consuming. Therefore, in this study, we used a cross-validation “random search” technique to find the hyperparameter values that result in the smallest validation error.28 In a random search, a number of potential hyperparameter combinations are sampled by using a uniform distribution function. In comparison with the commonly used grid search technique, a random search finds better models by effectively exploring a larger configuration space.28
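A cross-validated random search of this kind can be sketched with scikit-learn's RandomizedSearchCV; the parameter ranges below mirror Table 2's u() notation for the random forest, while the data and search budget are invented.

```python
# Cross-validated random search: sample hyperparameter combinations from
# uniform distributions instead of exhaustively enumerating a grid.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=22, random_state=0)
param_dist = {
    "n_estimators": randint(150, 251),   # u(150, 250); upper bound exclusive
    "max_depth": [3, 4, 5],
    "max_features": randint(5, 23),      # u(5, 22)
}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=4, cv=3, random_state=0)
search.fit(X, y)
best = search.best_params_
```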
Model Evaluation
To compare the performance of the classification models, we calculated the area under the receiver operating characteristic curve (AUROC), F-score, sensitivity, and positive predictive value (PPV). The receiver operating characteristic (ROC) curve is a plot of true positive rate versus false positive rate and illustrates the diagnostic capability of a machine-learning classifier. F-score is a harmonic mean of PPV and sensitivity that represents the accuracy of a classifier.
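The classification metrics defined above can be computed directly with scikit-learn; the labels and probabilities below are invented for a worked example.

```python
# Worked example of AUROC, sensitivity, PPV, and F-score on invented labels.
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7, 0.6, 0.9]
y_pred = [int(p >= 0.5) for p in y_prob]

auroc = roc_auc_score(y_true, y_prob)
sensitivity = recall_score(y_true, y_pred)   # true positive rate
ppv = precision_score(y_true, y_pred)        # positive predictive value
f_score = f1_score(y_true, y_pred)           # harmonic mean of PPV and sensitivity
```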
To compare performance of the regression models, we calculated the mean squared error (MSE) and root MSE (RMSE) between the predicted and actual number of blood product units as the main measure of accuracy. We also calculated the R-squared (R²) metric to further analyze the accuracy of the regression models. R² is the proportion of variance in the output that can be explained and predicted from the input variables.
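These regression metrics are simple enough to compute by hand; the unit counts below are invented for a worked example.

```python
# Worked example of MSE, RMSE, and R² on invented unit counts.
predicted = [1, 2, 2, 4, 1, 3]
actual = [1, 2, 3, 4, 2, 2]

n = len(actual)
mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n
rmse = mse ** 0.5
mean_actual = sum(actual) / n
ss_total = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - (mse * n) / ss_total   # proportion of output variance explained
```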
Feature Ranking
Using the GBM classification model, we predicted the impact of each variable on the need for transfusion. A gradient boosting feature selection algorithm reliably extracts relevant features, can identify nonlinear feature interactions, and scales linearly with the number of features and patients. It uses the L1-regularized gradient boosting method to find the features that are most used in building the weak learner models. It is important to note that the feature ranking step is performed independently of the classification and regression machine-learning model development.
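Ranking features from a fitted GBM can be sketched as below. Note this uses scikit-learn's default impurity-based importances on synthetic data, not the study's L1-regularized variant.

```python
# Rank features by a fitted GBM's importance scores (impurity-based here,
# as a stand-in for the study's L1-regularized ranking). Synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           random_state=0)
gbm = GradientBoostingClassifier(random_state=0).fit(X, y)
ranking = np.argsort(gbm.feature_importances_)[::-1]  # most important first
```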
Analytic Software
We used the Python scikit-learn library29 (which contains implementations of all machine-learning models used in this study) to execute all aspects of the machine-learning analysis.
RESULTS
Figure 1 shows a scatter plot of total transfused blood product units versus age, with their respective distributions represented in marginal frequency histograms. Inspection of this plot shows that most patients in the population were under 25 months (≈2 years) of age (a positive skew) and required 0–2 units of blood during their surgical procedure. The plot also provides a class age grouping: 0–6 months (red: 316 patients), 7–36 months (green: 1379 patients), 37–120 months (black: 305 patients), and >120 months (purple: 62 patients). This plot provides a high-level understanding of the population distribution and of correlations between age and transfused blood product and will later assist with the interpretation of the predictive model results.
Figure 1.: Plot of total number of units of blood transfused versus age for the entire patient population. The histogram on the right bar shows the distribution of total number of blood products, and the histogram on the top bar shows the distribution of population age. The various colors indicate the age groups as described in the text. Red = 0–6 mo (316 patients), green = 7–36 mo (1379 patients), black = 37–120 mo (305 patients), purple = >120 mo (62 patients).
We then used the training dataset to find the optimal hyperparameters for each model. Table 2 shows the list of possible values and the optimal set of parameters for each machine-learning model for both the classification and regression tasks. Table 2 also provides the performance results of the machine-learning models.
Table 2. List of Tunable Hyperparameters of Machine-Learning Models and Result of Testing the Models

For each model, the tunable hyperparameters are listed with their search ranges and the optimal values found for the classification (C) and regression (R) models; performance metrics are given as testing / cross-validation.

Random forest
  Hyperparameters (C / R): no. of trees = u(150, 250): 164 / 153; bootstrapping: yes / yes; maximum depth = (3, 4, 5): 5 / 5; maximum no. of features = u(5, 22): 17 / 14
  Classification: sensitivity 0.88 ± 0.03 / 0.91 ± 0.02; specificity 0.82 ± 0.05 / 0.84 ± 0.04; F-score 0.83 ± 0.03 / 0.87 ± 0.05; AUROC 0.83 ± 0.03 / 0.84 ± 0.04
  Regression: MSE 1.32 ± 0.08 / 1.28 ± 0.1; R² 0.72 ± 0.04 / 0.75 ± 0.05; RMSE 1.08 ± 0.05 / 1.15 ± 0.07

AdaBoost
  Hyperparameters (C / R): no. of trees = u(150, 250): 181 / 189; learning rate = u(0.01, 0.99): 0.05 / 0.07; loss = (linear, square, exponential): square / square
  Classification: sensitivity 0.89 ± 0.04 / 0.91 ± 0.02; specificity 0.83 ± 0.02 / 0.85 ± 0.04; F-score 0.86 ± 0.03 / 0.88 ± 0.02; AUROC 0.82 ± 0.04 / 0.87 ± 0.05
  Regression: MSE 1.38 ± 0.05 / 1.3 ± 0.09; R² 0.68 ± 0.03 / 0.71 ± 0.04; RMSE 1.21 ± 0.11 / 1.17 ± 0.14

Neural network
  Hyperparameters (C / R): no. of neurons in first hidden layer = u(80, 120): 85 / 95; no. of neurons in second hidden layer = u(30, 70): 41 / 40; learning rate = u(0.01, 0.5): 0.2 / 0.1
  Classification: sensitivity 0.86 ± 0.05 / 0.90 ± 0.01; specificity 0.79 ± 0.02 / 0.80 ± 0.03; F-score 0.81 ± 0.04 / 0.83 ± 0.03; AUROC 0.79 ± 0.02 / 0.82 ± 0.04
  Regression: MSE 1.36 ± 0.12 / 1.29 ± 0.10; R² 0.67 ± 0.03 / 0.70 ± 0.04; RMSE 1.25 ± 0.10 / 1.20 ± 0.09

Gradient boosting machine
  Hyperparameters (C / R): loss = (Huber, quantile): Huber / Huber; subsample fraction = u(0.01, 0.99): 0.85 / 0.95; learning rate = u(0.01, 0.99): 0.07 / 0.04; maximum depth = u(2, 8): 5 / 6
  Classification: sensitivity 0.92 ± 0.03 / 0.95 ± 0.03; specificity 0.89 ± 0.04 / 0.90 ± 0.03; F-score 0.91 ± 0.04 / 0.93 ± 0.02; AUROC 0.87 ± 0.03 / 0.90 ± 0.03
  Regression: MSE 1.15 ± 0.12 / 1.10 ± 0.07; R² 0.73 ± 0.02 / 0.79 ± 0.05; RMSE 1.05 ± 0.06 / 1.01 ± 0.08

Support vector machine
  Hyperparameters (C / R): penalty = (L1, L2): L2 / L2; penalty parameter = u(0.1, 0.9): 0.1 / 0.2; kernel = (RBF, polynomial, linear): RBF / RBF
  Classification: sensitivity 0.82 ± 0.05 / 0.87 ± 0.04; specificity 0.71 ± 0.02 / 0.76 ± 0.05; F-score 0.75 ± 0.04 / 0.81 ± 0.05; AUROC 0.77 ± 0.05 / 0.80 ± 0.02
  Regression: MSE 1.57 ± 0.15 / 1.43 ± 0.11; R² 0.57 ± 0.07 / 0.63 ± 0.04; RMSE 1.39 ± 0.12 / 1.31 ± 0.11

Elastic net
  Hyperparameters (C / R): α = u(0, 1): 0.3 / 0.4; mixing parameter = u(0, 1): 0.8 / 0.7
  Classification: sensitivity 0.67 ± 0.04 / 0.75 ± 0.03; specificity 0.63 ± 0.04 / 0.69 ± 0.05; F-score 0.64 ± 0.03 / 0.71 ± 0.01; AUROC 0.76 ± 0.04 / 0.77 ± 0.01
  Regression: MSE 1.83 ± 0.14 / 1.77 ± 0.09; R² 0.48 ± 0.06 / 0.52 ± 0.03; RMSE 1.67 ± 0.09 / 1.46 ± 0.16

The term u() represents a uniform random distribution. Specificity is the true negative rate, reported because the number of positive cases in the dataset is high. The reported numbers show performance metrics with 95% confidence intervals for testing sets and standard deviation for cross-validation sets.
Abbreviations: AdaBoost, adaptive boosting; AUROC, area under receiver operating characteristic curve; C, classification; MSE, mean squared error; R, regression; R², R-squared; RBF, radial basis function; RMSE, root mean squared error.
The ROC curves for all of the classifiers are shown in Figure 2. As the results of Table 2 and Figure 2 show, the GBM performed best for both the classification and regression problems. The numbers reported in the text show performance metrics with 95% confidence intervals. The performance of the GBM in classification was AUROC (0.87 ± 0.03) and F-score (0.91 ± 0.04). The performance of the GBM in regression was R² (0.73 ± 0.02), MSE (1.15 ± 0.12), and RMSE (1.05 ± 0.06). The results in Table 2 demonstrate that the standard deviation for the cross-validation training set is very close to the 95% confidence interval of the testing set, especially for the best-performing classification and regression models. This indicates the robustness of the mathematically more complex machine-learning models, such as neural networks and GBMs, in comparison with less sophisticated traditional statistical models. One explanation for this behavior is that the rigorous training methodology of the more advanced machine-learning techniques results in robust estimation of model parameters, which in turn yields better performance on the testing set.
Figure 2.: Comparison of receiver operating characteristic curves for machine-learning classifiers. The gradient boosting machine performed best, reaching 0.5 true positive rate at a very low false positive rate of 0.07. ADA (AdaBoost) AUROC = 0.82; EN AUROC = 0.77; GBM AUROC = 0.88; NN AUROC = 0.79; RF AUROC = 0.83; SVM AUROC = 0.76. AdaBoost (ADA) indicates adaptive boosting; AUROC, area under the receiver operating characteristic; EN, elastic net; GBM, gradient boosting machine; NN, neural network; RF, random forest; SVM, support vector machine.
The GBM model showed the following variables to be of high relative importance for risk of blood transfusion: platelet count, weight (kg), preoperative hematocrit, surgical volume per institution, age (months), and preoperative hemoglobin. Figure 3 shows a comparison of importance of each variable in predicting the need for blood transfusion during craniofacial surgery.
Figure 3.: Plot showing the relative importance of different variables for predicting blood transfusion during craniofacial surgery as derived from the gradient boosting machine classification model. ASA indicates American Society of Anesthesiologists physical status; Hb, hemoglobin; Hct, hematocrit; ICP, intracranial pressure.
Figure 4.: User interface for the blood transfusion recommendation calculator. The anesthesiologist enters patient-specific data, and the calculator provides the recommended number of blood units to be requested. ASA indicates American Society of Anesthesiologists physical status; DX, diagnosis; F, female; FFP, fresh frozen plasma; Hb, hemoglobin; Hct, hematocrit; M, male; PLT, platelets; Preop, preoperative; RBC, packed red blood cells; TXA, tranexamic acid.
We built a user interface that allows anesthesiologists to interact with the GBM regression model by entering parameter values specific to each individual patient. The anesthesiologist enters the patient-specific details that contribute to blood transfusion prediction, and the model produces a prediction of the type and amount of blood products to be requested (Figure 4). The calculator predicts the total intraoperative blood product requirement in whole units of blood but is not able to accurately predict a ratio of components; the recommended “ratio” of blood products is therefore calculated by using the registry average. The calculator will be made publicly available online after peer-reviewed publication, a condition of the funding that supports this provision.
DISCUSSION
In this study, we present machine-learning models for the accurate prediction of blood product requirement during pediatric craniosynostosis surgery, and a calculator to enable these models to be used by clinicians to guide preoperative planning. Current evidence-based practice is driven through review of published trials and other data, and consensus guidelines are implemented over an entire patient population. A calculator that makes individualized data-driven recommendations for a patient’s perioperative blood product requirement is a prototypical application that is the first of its kind, to the best of our knowledge. The future clinical utility of this method lies in the opportunity to modify the inputs to the model for each individual, thereby producing precision medicine recommendations.
The goals in developing our machine-learning models are 2-fold: (1) to support resource allocation by suggesting the optimal amount of allogenic blood to order in advance and (2) to help the operative team minimize transfusion requirements in the perioperative period by suggesting potentially modifiable factors in those at risk of bleeding. Our predictions may also help to prevent waste of overordered blood products and the intraoperative use of emergency transfusion resources.
In the future, clinicians could use the calculator in their decision-making process to make preoperative orders for blood products.
To reduce the predicted transfusion requirement for a particular patient and therefore potentially decrease transfusion-related morbidity, the operative team could consider changing factors such as preoperative hemoglobin and platelet count, age at the time of the operation, and the use of cell salvage and tranexamic acid.
It should be noted that the only variables that can be modified by an anesthesiologist on the day of surgery—the use of antifibrinolytic therapy and the use of a cell salvage system—do not carry high predictive weight in the GBM feature ranking shown in Figure 3 . The current literature, however, indicates that both techniques are underutilized2 and that antifibrinolytic therapy in this patient population is considered low risk.9 , 10 , 24 , 30 , 31 Feature ranking does support previous research showing that centers at which high volumes of this type of surgery are performed transfuse patients less frequently.25 Clinicians are however advised to use caution when drawing conclusions from feature ranking, as it is performed for the entire registry population and the important features for individual cases may vary dramatically. A need to accommodate this wide interpatient variability within the registry population is the reason for developing the precision medicine calculator.
It is important to note that the hierarchical structures within the machine-learning algorithm are such that linear relationships may not exist between features and outputs, so we cannot assume that, for instance, increasing preoperative hemoglobin or choosing to use an antifibrinolytic would lead to a reduction in blood loss for every child. It is also important to highlight that no threshold values are associated with the features. For instance, as shown in Figure 3 , preoperative hemoglobin is an important variable. Yet, the model cannot tell us an exact preoperative hemoglobin value that would predict the need for intraoperative blood transfusion.
Another point of emphasis is that machine learning is a tool that can identify factors playing a predictive role for a given outcome, but it does not prove causality. One of the powers of machine learning is to help identify new hypotheses for further study. In this case, machine learning has identified platelet count to play a role in prediction, but this does not imply a causative role of platelet count in the need for transfusion. Future clinical trials would be required to help us understand the causal nature of relationships identified as important to prediction by machine learning.
In our clinical application, the classification model could be considered as a screening tool to quickly and accurately identify patients in need of transfusion and highlight them to the regression model, which is then used to predict the amount of blood products needed for these patients in a more accurate way than using a regression model alone. The output from the regression model is the “total number of units” of blood required for a given case. The model is unable to predict the type or ratio of components accurately because the data variation introduced by institutional differences in use of whole blood versus separate components (packed red blood cells, fresh frozen plasma, and platelets) is too great. Instead, the calculator uses the registry average to suggest a ratio of products.
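The two-stage use of the models described here (classifier as a screen, regressor for the flagged patients) can be sketched as below; the data, labels, and unit counts are entirely synthetic stand-ins for the registry.

```python
# Two-stage sketch: a classifier flags likely-transfused patients, then a
# regressor predicts whole units only for those flagged. Synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))
needs_tx = (X[:, 0] > -0.5).astype(int)                    # invented labels
units = needs_tx * np.clip(np.round(X[:, 1] + 2), 1, 4)    # invented unit counts

clf = GradientBoostingClassifier(random_state=0).fit(X, needs_tx)
reg = GradientBoostingRegressor(random_state=0).fit(X[needs_tx == 1],
                                                    units[needs_tx == 1])

flagged = clf.predict(X).astype(bool)          # stage 1: screen
pred_units = np.zeros(len(X), dtype=int)
pred_units[flagged] = np.round(reg.predict(X[flagged])).astype(int)  # stage 2
```

Training the regressor only on transfused cases mirrors the rationale in the text: it avoids diluting the unit-count estimate with the zero-transfusion population.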
This prototypical transfusion predictive model is not based on theoretical best practice but instead on current working practice—the average number of units clinicians have historically transfused for all the cases in the PCSPR registry. Because no national guideline exists as a standard for best practice in this patient cohort, we have used the consensus action of practicing expert pediatric anesthesiologists who contributed to the registry. As with all decision support tools, clinical judgment must still be applied to the output of the calculator. The possibility remains that during the actual procedure, the clinician may transfuse more or less blood product than recommended by the calculator for instance to limit blood donor exposure or because actual blood loss varies from that expected.
Predicting blood product transfusion requirement is mathematically complex. For higher prediction accuracy, we first introduced classification modeling (“will the patient need a transfusion?”) and then expanded on it with regression modeling (“if the patient needs a transfusion, how many units of blood will they need?”). The regression model has a lower expected accuracy than the classification model owing to variations in clinical practice among institutions: high variation makes it more difficult for the machine-learning model to predict the exact number of units of blood required. The predictability of the outcomes measured, and therefore the accuracy of the model, could be improved by using a larger dataset or by greater coherence of practice between institutions (for instance, through a national guideline).
This variation in practice observed in the current dataset does however mean that a diverse range of techniques are represented. Hence, the machine-learning model and calculator have the potential to be used in any equivalent health care system, regardless of its participation in PCSPR data collection. The performance of the calculator at sites not represented in the PCSPR dataset could be assessed by using new testing data supplied from locally curated registry information or collected retrospectively from local electronic medical records (EMRs).
In addition to its use in predicting potential transfusion requirements based on retrospective registry data, the proposed machine-learning model can be further enhanced by continuously learning from prospective data as it is entered into the PCSPR or extracted from local EMRs. As the dataset grows in size with input of these new data, and as clinical practice evolves, the prediction calculator may enable an even more accurate prediction of the blood transfusion requirements for each individual patient and could identify new patterns in the data.
The breadth of available data limited the development of our predictive models. The choice of features available to the modeling process was constrained by the number of data fields recorded in the registry system, which is kept narrow to improve completeness of data collection. In statistical terms, the current dataset is considered low fidelity. The availability of additional transfusion-related variables, such as time stamps for blood administration and laboratory data, would improve the models’ predictive performance. During data selection, some patients had to be excluded because their registry entries were missing values. We cannot retrospectively determine the mechanism leading to these missing data, and it is possible that the accuracy of the models could be improved by the inclusion of these patients.
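The complete-case exclusion described above can be sketched in a few lines with pandas. The column names here are hypothetical illustrations, not the actual PCSPR fields.

```python
import pandas as pd

# Toy registry extract; field names are illustrative only.
df = pd.DataFrame({
    "age_months": [7, 11, None, 24],
    "weight_kg": [8.1, 9.5, 10.2, None],
    "units_transfused": [1, 0, 2, 1],
})

# Complete-case analysis: drop any patient with a missing value.
complete = df.dropna()
excluded = len(df) - len(complete)  # patients lost to missingness (here, 2)
```

If the missingness mechanism were known to be random, imputation could retain these patients instead; since we cannot determine the mechanism retrospectively, exclusion was the conservative choice.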
CONCLUSIONS
We have shown that machine-learning methods can be used to develop accurate predictive models for blood product transfusion requirements in individual patients undergoing surgery for craniosynostosis. Our predictive calculator for blood product transfusion is, to our knowledge, the first of its kind and will continue to improve in accuracy with learning from ongoing prospective data input and if higher-fidelity data can be used. Our team is working to implement this prototypical calculator in future clinical workflows.
CONTRIBUTORS
Christopher Abruzzese, DO; Jesus Apuya, MD; Angelina Bhandari, MD; Amy Beethe, MD; Hubert Benzon, MD; Wendy Binstock, MD; Victoria Bradford, MD; Alyssa Brzenski, MD; Stefan Budac, MD; Veronica Busso, MD; Surendrasingh Chhabada, MD; Franklin Chiao, MD; Franklyn Cladis, MD; Danielle Claypool, MD; Michael Collins, MD; Lynnie Correll, MD, PhD; Andrew Costandi, MD; Rachel Dabek, BS; Nicholas Dalesio, MD; Piedad Echeverry, MD; Ricardo Falcon, MD; Patrick Fernandez, MD; John Fiadjoe, MD; Meera Gangadharan, MD; Katherine Gentry, MD; Chris Glover, MD; Susan M. Goobie, MD, FRCPC; Amanda Gosman, MD; Anastasia Grivoyannis, MD; Shannon Grap, MD; Heike Gries, MD; Allison Griffin, BS; John Hajduk, BS; Thorsten Haas, MD; Rebecca Hall, MD; Jennifer Hansen, MD; Mali Hetmaniuk, MD; H. Mayumi Homi, MD; Vincent Hsieh, MD; Henry Huang, MD; Pablo Ingelmo, MD; Iskra Ivanova, MD; Ranu Jain, MD; Siri Kanmanthreddy, MD; Michelle Kars, MD; Mike King, MD; John Koller, MD; Courtney Kowalczyk-Derderian, MD; Jane Kugler, MD; Kristen Labovsky, MD; Indrani Lakheeram, MD; Alina Lazar, MD; Andrew Lee, MPH; Jennifer Lee, MD; Jose Luis Martinez, MD; Brian Masel, MD; Aaron Mason, MD; Eduardo Medellin, BS; Vivek Mehta, MD; Petra Meier, MD; Heather Mitzel Levy, MD; Wallis T. Muhly, MD; Bridget Muldowney, MD; Jonathon Nelson, MD; Julie Nicholson, BS; Kim-Phuong Nguyen; Thanh Nguyen, MD; Margaret Owens-Stubblefield, BSN; Matt Pankratz, PhD; Uma Ramesh Parekh, MD; Jasmine Patel, MD; Roshan Patel, MD; Carolina Perez-Pradilla, MD; Timothy Petersen, PhD; Julian Post, BS; Kim Poteet-Schwartz, MD; Pavithra Ranganathan, MD; Srijaya Reddy, MD; Russell Reid, MD; Karene Ricketts, MD; Megan Rodgers McCormick, MD; Laura Ryan, MD; Kaitlyn Sbrollini, BS; Peggy Seidman, MD; Davinder Singh, MD; Neil R. Singhal, MD; Rochelle Skitt, MD; Codruta Soneru, MD; Emad Sorial, MD; Rachel Spitznagel, MD; Bobbie Stubbeman, BS; Rani Sunder, MD; Wai Sung, MD; Tariq Syed, MS; Peter Szmuk, MD; Brad M. Taicher, DO; Jenna Taylor, MD; Douglas Thompson, MD; Lisa Tretault, RN, BSN, CCRP; Galit Ungar-Kastner, MD; John Wieser, BS; Karen Wong, MBBS; and Hannah Yates, BS.
ACKNOWLEDGMENTS
The authors acknowledge Ariel Vincent, BA, who manages the Pediatric Craniofacial Surgery Perioperative Registry at the Children’s Hospital of Philadelphia, and Ernest Amankwah, PhD, for statistical advice.
DISCLOSURES
Name: Ali Jalali, PhD.
Contribution: This author helped design the study, collect the data, analyze the statistical data, prepare the manuscript, and accept the final manuscript.
Name: Hannah Lonsdale, MBChB.
Contribution: This author helped design the study, collect the data, analyze the statistical data, prepare the manuscript, and accept the final manuscript.
Name: Lillian V. Zamora, MD.
Contribution: This author helped design the study, collect the data, analyze the statistical data, prepare the manuscript, and accept the final manuscript.
Name: Luis Ahumada, MSCS, PhD.
Contribution: This author helped design the study, collect the data, analyze the statistical data, prepare the manuscript, and accept the final manuscript.
Name: Anh Thy H. Nguyen, MSPH.
Contribution: This author helped analyze the statistical data, prepare the manuscript, and accept the final manuscript.
Name: Mohamed Rehman, MD.
Contribution: This author helped design the study, collect the data, prepare the manuscript, and accept the final manuscript.
Name: James Fackler, MD.
Contribution: This author helped design the study, collect the data, prepare the manuscript, and accept the final manuscript.
Name: Paul A. Stricker, MD.
Contribution: This author helped design the study, collect the data, prepare the manuscript, and accept the final manuscript.
Name: Allison M. Fernandez, MD, MBA.
Contribution: This author helped design the study, collect the data, prepare the manuscript, and accept the final manuscript.
This manuscript was handled by: Thomas M. Hemmerling, MSc, MD, DEAA.
References
1. Stricker PA, Goobie SM, Cladis FP, et al.; Pediatric Craniofacial Collaborative Group. Perioperative outcomes and management in pediatric complex cranial vault reconstruction: a multicenter study from the Pediatric Craniofacial Collaborative Group. Anesthesiology. 2017;126:276–287.
2. Stricker PA, Shaw TL, Desouza DG, et al. Blood loss, replacement, and associated morbidity in infants and children undergoing craniofacial surgery. Paediatr Anaesth. 2010;20:150–159.
3. White N, Bayliss S, Moore D. Systematic review of interventions for minimizing perioperative blood transfusion for surgery for craniosynostosis. J Craniofac Surg. 2015;26:26–36.
4. Dahmani S, Orliaguet GA, Meyer PG, Blanot S, Renier D, Carli PA. Perioperative blood salvage during surgical correction of craniosynostosis in infants. Br J Anaesth. 2000;85:550–555.
5. Deva AK, Hopper RA, Landecker A, Flores R, Weiner H, McCarthy JG. The use of intraoperative autotransfusion during cranial vault remodeling for craniosynostosis. Plast Reconstr Surg. 2002;109:58–63.
6. Duncan C, Richardson D, May P, et al. Reducing blood loss in synostosis surgery: the Liverpool experience. J Craniofac Surg. 2008;19:1424–1430.
7. Jimenez DF, Barone CM. Intraoperative autologous blood transfusion in the surgical correction of craniosynostosis. Neurosurgery. 1995;37:1075–1079.
8. Krajewski K, Ashley RK, Pung N, et al. Successful blood conservation during craniosynostotic correction with dual therapy using Procrit and cell saver. J Craniofac Surg. 2008;19:101–105.
9. Dadure C, Sauter M, Bringuier S, et al. Intraoperative tranexamic acid reduces blood transfusion in children undergoing craniosynostosis surgery: a randomized double-blind study. Anesthesiology. 2011;114:856–861.
10. Goobie SM, Meier PM, Pereira LM, et al. Efficacy of tranexamic acid in pediatric craniosynostosis surgery: a double-blind, placebo-controlled trial. Anesthesiology. 2011;114:862–871.
11. Fenger-Eriksen C, D’Amore Lindholm A, Nørholt SE, et al. Reduced perioperative blood loss in children undergoing craniosynostosis surgery using prolonged tranexamic acid infusion: a randomised trial. Br J Anaesth. 2019;122:760–766.
12. Isaac KV, MacKinnon S, Dagi LR, Rogers GF, Meara JG, Proctor MR. Nonsyndromic unilateral coronal synostosis: a comparison of fronto-orbital advancement and endoscopic suturectomy. Plast Reconstr Surg. 2019;143:838–848.
13. Braun TL, Eisemann BS, Olorunnipa O, Buchanan EP, Monson LA. Safety outcomes in endoscopic versus open repair of metopic craniosynostosis. J Craniofac Surg. 2018;29:856–860.
14. Arts S, Delye H, van Lindert EJ. Intraoperative and postoperative complications in the surgical treatment of craniosynostosis: minimally invasive versus open surgical procedures. J Neurosurg Pediatr. 2018;21:112–118.
15. van Veelen MC, Kamst N, Touw C, et al. Minimally invasive, spring-assisted correction of sagittal suture synostosis: technique, outcome, and complications in 83 cases. Plast Reconstr Surg. 2018;141:423–433.
16. Hallén T, Maltese G, Olsson R, Tarnow P, Kölby L. Cranioplasty without periosteal dissection reduces blood loss in pi-plasty surgery for sagittal synostosis. Pediatr Neurosurg. 2017;52:284–287.
17. Fernandez PG, Taicher BM, Goobie SM, et al.; Pediatric Craniofacial Collaborative Group. Predictors of transfusion outcomes in pediatric complex cranial vault reconstruction: a multicentre observational study from the Pediatric Craniofacial Collaborative Group. Can J Anaesth. 2019;66:512–526.
18. Park C, Wormald J, Miranda BH, Ong J, Hare A, Eccles S. Perioperative blood loss and transfusion in craniosynostosis surgery. J Craniofac Surg. 2018;29:112–115.
19. Lonsdale HJA, Ahumada L, Matava C. Machine learning and artificial intelligence in pediatric research: current state, future prospects and applied examples in perioperative and critical care. J Pediatr. 2020;221S:S3–S10.
20. Jalali A, Buckley EM, Lynch JM, Schwab PJ, Licht DJ, Nataraj C. Prediction of periventricular leukomalacia occurrence in neonates after heart surgery. IEEE J Biomed Health Inform. 2014;18:1453–1460.
21. Jalali A, Simpao AF, Gálvez JA, Licht DJ, Nataraj C. Prediction of periventricular leukomalacia in neonates after cardiac surgery using machine learning algorithms. J Med Syst. 2018;42:177.
22. Lundberg SM, Nair B, Vavilala MS, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018;2:749–760.
23. Dewan M, Galvez J, Polsky T, et al. Reducing unnecessary postoperative complete blood count testing in the pediatric intensive care unit. Perm J. 2017;21:1651.
24. Goobie SM, Cladis FP, Glover CD, et al.; The Pediatric Craniofacial Collaborative Group. Safety of antifibrinolytics in cranial vault reconstructive surgery: a report from the Pediatric Craniofacial Collaborative Group. Paediatr Anaesth. 2017;27:271–281.
25. Fernandez AM, Reddy SK, Gordish-Dressman H, et al. Perioperative outcomes and surgical case volume in pediatric complex cranial vault reconstruction: a multicenter observational study from the Pediatric Craniofacial Collaborative Group. Anesth Analg. 2018;129:1069–1078.
26. Goobie SM, Zurakowski D, Isaac KV, et al.; Pediatric Craniofacial Collaborative Group. Predictors of perioperative complications in paediatric cranial vault reconstruction surgery: a multicentre observational study from the Pediatric Craniofacial Collaborative Group. Br J Anaesth. 2019;122:215–223.
27. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. J Clin Epidemiol. 2015;68:134–143.
28. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012;13:281–305.
29. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.
30. Oppenheimer AJ, Ranganathan K, Levi B, et al. Minimizing transfusions in primary cranial vault remodeling: the role of aminocaproic acid. J Craniofac Surg. 2014;25:82–86.
31. Hsu G, Taylor JA, Fiadjoe JE, et al. Aminocaproic acid administration is associated with reduced perioperative blood loss and transfusion in pediatric craniofacial surgery. Acta Anaesthesiol Scand. 2016;60:158–165.