Establishment and validation of an artificial intelligence web application for predicting postoperative in-hospital mortality in patients with hip fracture: a national cohort study of 52 707 cases

Background: In-hospital mortality following hip fractures is a significant concern, and accurate prediction of this outcome is crucial for appropriate clinical management. Nonetheless, there is a lack of effective prediction tools in clinical practice. By utilizing artificial intelligence (AI) and machine learning techniques, this study aims to develop a predictive model that can assist clinicians in identifying geriatric hip fracture patients at a higher risk of in-hospital mortality. Methods: A total of 52 707 geriatric hip fracture patients treated with surgery from 90 hospitals were included in this study. The primary outcome was postoperative in-hospital mortality. The patients were randomly divided into two groups, with a ratio of 7:3. The majority of patients, assigned to the training cohort, were used to develop the AI models. The remaining patients, assigned to the validation cohort, were used to validate the models. Various machine learning algorithms, including logistic regression (LR), decision tree (DT), naïve Bayesian (NB), neural network (NN), eXGBoosting machine (eXGBM), and random forest (RF), were employed for model development. A comprehensive scoring system, incorporating 10 evaluation metrics, was developed to assess the prediction performance, with higher scores indicating superior predictive capability. Based on the best machine learning-based model, an AI application was developed on the Internet. In addition, comparative testing of prediction performance between doctors and the AI application was conducted. Findings: The eXGBM model exhibited the best prediction performance, with an area under the curve (AUC) of 0.908 (95% CI: 0.881–0.932), as well as the highest accuracy (0.820), precision (0.817), specificity (0.814), and F1 score (0.822), and the lowest Brier score (0.120) and log loss (0.374). Additionally, the model showed favorable calibration, with a slope of 0.999 and an intercept of 0.028.
According to the scoring system incorporating 10 evaluation metrics, the eXGBM model achieved the highest score (56), followed by the RF model (48) and NN model (41). The LR, DT, and NB models had total scores of 27, 30, and 13, respectively. The AI application has been deployed online at https://in-hospitaldeathinhipfracture-l9vhqo3l55fy8dkdvuskvu.streamlit.app/, based on the eXGBM model. The comparative testing revealed that the AI application’s predictive capabilities significantly outperformed those of the doctors in terms of AUC values (0.908 vs. 0.682, P<0.001). Conclusions: The eXGBM model demonstrates promising predictive performance in assessing the risk of postoperative in-hospital mortality among geriatric hip fracture patients. The developed AI model serves as a valuable tool to enhance clinical decision-making.


Introduction
Hip fractures are a common and serious injury among the geriatric population, resulting in significant morbidity and mortality [1]. They are a significant public health issue, affecting individuals, healthcare systems, and economies globally [2]. With an aging population, the prevalence and burden of hip fractures are expected to rise [2,3]. In-hospital mortality following hip fractures is a major concern, and accurate prediction of this outcome is crucial for appropriate clinical management and resource allocation [4].
The early prediction of in-hospital mortality in geriatric hip fracture patients has been challenging, but it is of utmost importance because identifying patients at higher risk of mortality can help healthcare providers prioritize their care and implement preventive measures [5,6]. Traditionally, prediction models have relied on clinical risk factors and scoring systems, which have demonstrated moderate accuracy [5]. However, recent advancements in artificial intelligence (AI) and machine learning techniques offer new opportunities for improving mortality prediction in this population [5].
Machine learning algorithms have the ability to analyze large and complex datasets, identify patterns, and generate predictive models. These algorithms can incorporate multiple variables to develop accurate and personalized predictions [7,8]. By harnessing the power of machine learning, we can potentially enhance the early identification of high-risk patients and improve clinical decision-making in the management of geriatric hip fractures [5]. The use of machine learning algorithms in the field of hip fracture prediction has shown promising results [4,5,9]. Studies have demonstrated the potential of these models in diagnosing hip fractures [4,9] and predicting outcomes [5]. However, limited research has focused specifically on predicting in-hospital mortality.
Therefore, this study aims to develop and validate an AI model using machine learning techniques to predict the risk of postoperative in-hospital mortality in geriatric hip fracture patients. The findings of this study have the potential to significantly impact clinical practice by providing clinicians with a valuable tool to assess mortality risk and guide treatment decisions.

Patients
A total of 52 707 geriatric patients with hip fractures treated surgically in 90 hospitals from January 2011 to September 2021 were retrospectively included in this study. Patients with a diagnosis of hip fracture based on ICD-9-CM codes (820.x) or ICD-10-CM codes (S72.x) were extracted. We collected the data through a comprehensive review of electronic medical records, using standardized data collection procedures across all participating hospitals. The participating hospitals were located in various regions of the country, ensuring a broad representation of healthcare settings. In this study, only patients without missing data were included for analysis. Patients were excluded from the analysis if they were not surgically treated, were under the age of 60, or had an unclear fracture type in the hip. The patient flowchart is depicted in Supplementary Figure 1 (Supplemental Digital Content 1, http://links.lww.com/JS9/C559).
In this study, the 'Military Medical and Health No. 1' system was used to extract data, and the system's reliability stems from its ability to gather comprehensive and accurate information from a wide range of medical facilities within the alliance hospitals. By enabling the unified extraction of data from multiple hospitals through a standardized system, we could access a larger, more diverse dataset, enhancing the generalizability and robustness of our findings. Moreover, this approach ensured data consistency and accuracy across all participating hospitals, minimizing errors and bias in the research. Data cleaning procedures and regular quality checks were performed to maintain data accuracy and completeness. A secure data management system was established, and monitoring and auditing processes were conducted to identify and address any issues.
The included patients were randomly divided into two groups, with a 7:3 ratio [10]. The training cohort, consisting of the majority of patients, was used for model development, while the validation cohort, consisting of the remaining patients, was used for validation. The study design is illustrated in Figure 1. This study was registered and approved by the Ethics Committee of our hospital, and it was also registered at a national clinical trial registry. All data were analyzed anonymously, and the study complied with the principles outlined in the Helsinki Declaration. Written informed consent was obtained from all patients, and the study adhered to the STROCSS criteria (Supplemental Digital Content 2, http://links.lww.com/JS9/C560) [11] and the TRIPOD Checklist [12].
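The 7:3 division into training and validation cohorts can be illustrated with a minimal sketch. It assumes that the split is stratified by outcome so that the proportion of deaths is similar in both cohorts; the function name `stratified_split`, the seed, and the toy outcome vector are illustrative assumptions and are not taken from the study.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, valid_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, valid_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)                      # random assignment within the class
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        valid_idx.extend(members[cut:])
    return sorted(train_idx), sorted(valid_idx)

# Toy outcome vector (illustrative only): 1000 patients, 10 in-hospital deaths.
labels = [1] * 10 + [0] * 990
train_idx, valid_idx = stratified_split(labels)
train_deaths = sum(labels[i] for i in train_idx)
valid_deaths = sum(labels[i] for i in valid_idx)
print(len(train_idx), len(valid_idx), train_deaths, valid_deaths)
```

Splitting within each class keeps a rare outcome represented in both cohorts, which a purely random split does not guarantee when events are scarce.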

Variables and outcome
Demographic information (age and sex), fracture type, surgical procedure, and a range of comorbidities (number of comorbidities, anemia, hypertension, coronary heart disease, cerebrovascular disease, heart failure, atherosclerosis, renal failure, nephrotic syndrome, respiratory system disease, gastrointestinal bleeding, gastrointestinal ulcer, liver failure, cirrhosis, gastritis, diabetes, dementia, and cancer) were collected based on the availability of data. Diagnosis was determined by physicians at each institution, following standardized diagnostic guidelines. The primary outcome was in-hospital mortality after surgery. In the study, we collated data on various features, encompassing patient demographics, fracture types, and diverse pre-existing conditions, upon the patient's admission. The outcome measure, in-hospital death, was subsequently recorded following the surgical intervention during the hospital stay. By examining a broad spectrum of variables, this study aims to offer a comprehensive insight into the factors that contribute to in-hospital mortality among elderly patients with hip fracture.

Data process
The SMOTETomek resampling strategy [13,14] was employed to address the issue of imbalanced data and generate robust models. SMOTETomek combines the Synthetic Minority Oversampling Technique with Tomek Links Undersampling to create a new dataset with a larger sample size and a more balanced distribution. This strategy enhances the statistical power and generalizability of the findings, providing a solid foundation for further analyses and model development. Additionally, a data preprocessing pipeline was utilized to ensure consistent and reproducible data transformation, thereby improving the accuracy and reliability of the machine learning models. The scikit-learn library was used for data preprocessing pipelines. A stratified strategy was implemented to maintain consistent outcome class proportions in the sub-datasets.
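The two steps that SMOTETomek combines can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library implementation used in the study: the neighbour count `k`, the seed, the toy coordinates, and the choice to drop both members of every Tomek link are assumptions made for the example.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        u = rng.random()                          # position along the segment base -> other
        synthetic.append(tuple(b + u * (o - b) for b, o in zip(base, other)))
    return synthetic

def remove_tomek_links(X, y):
    """Drop pairs of mutual nearest neighbours that belong to opposite classes."""
    n = len(X)
    nearest = [min((j for j in range(n) if j != i),
                   key=lambda j: math.dist(X[i], X[j])) for i in range(n)]
    drop = {i for i in range(n) if nearest[nearest[i]] == i and y[i] != y[nearest[i]]}
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (illustrative only): 12 majority points on a grid, 4 minority points.
majority = [(float(i % 4), float(i // 4)) for i in range(12)]
minority = [(5.0, 0.0), (5.5, 0.5), (6.0, 0.0), (5.5, 1.0)]
synthetic = smote(minority, n_new=len(majority) - len(minority))
X_bal = majority + minority + synthetic
y_bal = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
X_res, y_res = remove_tomek_links(X_bal, y_bal)
print(len(X_bal), len(X_res))
```

Oversampling first balances the classes, and the Tomek-link step then removes borderline pairs where the two classes overlap. Resampling of this kind is normally applied to the training data only, so that the validation cohort keeps the original outcome distribution.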

Modeling
Various machine learning algorithms, including logistic regression (LR), decision tree (DT), naïve Bayesian (NB), neural network (NN), eXGBoosting machine (eXGBM), and random forest (RF), were used for model development. Each model received the same input features to ensure consistency. Grid and random hyperparameter searches, combined with fivefold cross-validation, were performed to identify the optimal hyperparameters for each model. The area under the curve (AUC) was used as the optimization metric. To account for variability in model performance, a broad range of upper and lower bounds for the hyperparameters was set in the search, resulting in a mix of underfitted and overfitted models.
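The search procedure described above can be sketched as a grid search scored by fivefold cross-validated AUC. The sketch substitutes a toy nearest-neighbour scorer for the six algorithms of the study, and the grid values, seeds, and synthetic data are hypothetical; only the overall loop (stratified folds, AUC per fold, best mean AUC wins) reflects the text.

```python
import math
import random

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_score(train_X, train_y, x, k):
    """Toy model: fraction of events among the k nearest training samples."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in order) / k

def stratified_folds(y, n_folds=5, seed=0):
    """Deal the samples of each class round-robin into n_folds folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds

def cross_validated_auc(X, y, k, n_folds=5):
    """Mean validation AUC of the toy model over the folds."""
    aucs = []
    for held_out in stratified_folds(y, n_folds):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        tr_X, tr_y = [X[i] for i in train], [y[i] for i in train]
        scores = [knn_score(tr_X, tr_y, X[i], k) for i in held_out]
        aucs.append(auc([y[i] for i in held_out], scores))
    return sum(aucs) / n_folds

# Synthetic two-feature data (illustrative only): events are shifted by 1.5 units.
rng = random.Random(1)
y = [1] * 30 + [0] * 30
X = [(rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0)) for label in y]

grid = [1, 5, 15]                                  # hypothetical hyperparameter grid
results = {k: cross_validated_auc(X, y, k) for k in grid}
best_k = max(results, key=results.get)
print(results, best_k)
```

A random search follows the same loop but draws candidate values from a distribution instead of enumerating a fixed grid.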

Validation
The models were validated using a variety of evaluation metrics, including AUC, accuracy, precision, specificity, recall, F1 score, Brier score, log loss, calibration slope, and intercept. AUC was calculated after applying 100 bootstraps. Calibration curves, density curves, and decision curves were utilized to evaluate the calibration ability, discriminative ability, and clinical net benefits of the models, respectively. Furthermore, a scoring system was developed based on previous studies to comprehensively evaluate the predictive performance of the models. The scoring system incorporated 10 evaluation metrics, with higher scores indicating superior predictive performance (range: 0-60). A confusion matrix was used to determine the accuracy, precision, recall (sensitivity), and specificity of the models. The confusion matrix is typically presented as a table with four main components: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Each component represents a specific outcome of the model's predictions.
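One plausible reading of the scoring system is a rank-based rating: on each metric the best model receives the highest rating and the worst the lowest, and the ratings are summed across metrics. The sketch below follows that reading as an assumption. The model names A, B, and C, the metric values, and the tie rule are hypothetical, and calibration slope is assumed to be judged by its distance from 1.

```python
# Each metric maps a raw value to a "badness" so that smaller is always better.
BADNESS = {
    "auc": lambda v: -v,                 # higher AUC is better
    "brier": lambda v: v,                # lower Brier score is better
    "slope": lambda v: abs(v - 1.0),     # calibration slope closest to 1 is better
}

def rate(values, badness):
    """Rate each model as 1 + the number of models that are strictly worse."""
    bad = {model: badness(v) for model, v in values.items()}
    return {model: 1 + sum(other > b for other in bad.values())
            for model, b in bad.items()}

# Hypothetical metric values for three hypothetical models (not study results).
metrics = {
    "auc": {"A": 0.90, "B": 0.85, "C": 0.70},
    "brier": {"A": 0.12, "B": 0.15, "C": 0.20},
    "slope": {"A": 0.95, "B": 1.02, "C": 1.30},
}

totals = {model: 0 for model in metrics["auc"]}
for name, values in metrics.items():
    for model, rating in rate(values, BADNESS[name]).items():
        totals[model] += rating
print(totals)
```

With six models and ten metrics, the same procedure would rate each metric from 1 to 6 and give a maximum total of 60.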
The Brier score, as an overall performance measure, was calculated using the following equation. It is calculated by summing the squared differences between the predicted probability and the actual outcome for each sample, divided by the total sample size.

$$\mathrm{Brier\ score} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^2$$

where N represents the total sample size, p_i represents the predicted risk of postoperative in-hospital mortality, and o_i represents the actual outcome of postoperative in-hospital mortality (coded as 0 or 1).
The log loss formula, commonly used in classification model evaluation, measures the quality of predictions made by the model. It calculates the negative logarithm of the predicted probability for each class, multiplied by the true label. The sum of these values is then divided by the total number of samples.

$$\mathrm{Log\ loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log\left(p_{ij}\right)$$

where N represents the number of samples, M represents the number of classes, y_ij represents the true label of sample i for class j (0 or 1), and p_ij represents the predicted probability of sample i belonging to class j.
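The confusion-matrix metrics, the Brier score, and the log loss defined above can be computed as in the following sketch. The 0.5 classification threshold, the function names, and the toy labels and probabilities are assumptions for illustration. For a binary outcome (M = 2), the log loss reduces to the two-term form used in `log_loss`.

```python
import math

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, TN, FP, FN after thresholding the predicted probabilities."""
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall (sensitivity), specificity, and F1 score."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Toy labels and predicted probabilities (illustrative only).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.2]

counts = confusion_counts(y_true, y_prob)
metrics = classification_metrics(*counts)
brier = brier_score(y_true, y_prob)
ll = log_loss(y_true, y_prob)
print(counts, metrics, brier, ll)
```

The Brier score and log loss use the predicted probabilities directly, whereas the confusion-matrix metrics depend on the chosen threshold.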
We extracted data from a medical center in the United States (Medical Information Mart for Intensive Care III [MIMIC-III] database) [15], and the data on patients with hip fracture were collected as an external validation cohort. Based on the same inclusion and exclusion criteria as in the study, a series of 246 patients were included for analysis. The patient flowchart, baseline characteristics of patients, and detailed information are summarized in Supplementary File 1 (Supplemental Digital Content 3, http://links.lww.com/JS9/C561). The utilization of the MIMIC-III database was authorized by the institutional review board of a medical center in another nation. As the data within the database has been de-identified, patient consent was not necessary. This study pledged to scrupulously adhere to ethical guidelines and legal regulations throughout the research process, assuring the appropriate use and robust protection of patient privacy.

Feature importance
Feature importance was employed to enhance the interpretability of machine learning models [16,17], as it allows for the quantification of the relative importance of each feature in the model's predictions. Shapley additive explanation (SHAP) is a unified framework designed to interpret machine learning predictions, and it serves as a novel approach to explain various black-box machine learning models [18]. By identifying the most influential features, clinicians can better understand the factors that contribute the most to the outcome, making the model's predictions more interpretable and easier to validate. In the present study, SHAP values were employed to determine the importance of each input parameter. SHAP values quantify the contribution of each feature to the model's output.
In the context of this explanation, certain variables were defined to describe the interpretation model:

$$g(z') = \phi_0 + \sum_{j=1}^{M}\phi_j z'_j$$

The variable g symbolizes the interpretation model itself, representing its underlying principles and mechanisms. M denotes the total number of input parameters in the model, and ϕ_0 signifies a constant value within the model. Furthermore, ϕ_j embodies the Shapley value assigned to each specific feature of the model, capturing its contribution to the overall interpretation, and z'_j represents the coalition vector.
Among the coalition vectors, their values play a critical role in providing insight into the interpretability of the model. A value of '1' denotes that a particular feature in the coalition vector aligns with the corresponding feature of the case x being explained. Conversely, a value of '0' indicates that the feature is absent in the present case x. By applying this concept to our analysis, we can consider case x as a scenario where all simplified features hold a value of 1. With this assumption, the SHAP expression can be further simplified and outlined as follows:

$$g(x') = \phi_0 + \sum_{j=1}^{M}\phi_j$$
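The additive property of the explanation model can be checked on a small example by computing exact Shapley values through enumeration of all coalitions. The toy risk function, its coefficients, and the all-zero baseline are hypothetical; practical SHAP implementations for tree ensembles use much faster algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (feasible only for small M)."""
    phi = [0.0] * n_features
    for j in range(n_features):
        others = [f for f in range(n_features) if f != j]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                present = frozenset(coalition)
                # Marginal contribution of feature j when added to this coalition.
                phi[j] += weight * (value_fn(present | {j}) - value_fn(present))
    return phi

def toy_model(z):
    """Hypothetical risk score on a log-odds scale with one interaction term."""
    return -3.0 + 0.5 * z[0] + 1.0 * z[1] + 0.8 * z[0] * z[1] + 0.2 * z[2]

def value_fn(present):
    """Model output when only the features in `present` take the value of case x."""
    return toy_model(tuple(1 if j in present else 0 for j in range(3)))

phi_0 = value_fn(frozenset())            # constant term: no feature present
phi = shapley_values(value_fn, 3)
explained = phi_0 + sum(phi)             # g(x') with every coalition entry equal to 1
print(phi_0, phi, explained, toy_model((1, 1, 1)))
```

The interaction term is shared equally between the two features involved, and the constant plus the sum of the Shapley values reproduces the model output for the explained case.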

Individual prediction
In the present study, we employed the SHAP method to perform individual prediction, aiming to offer a comprehensive insight into the prediction-making process of the model at an individual level [10,19]. By utilizing the trained model, we calculated SHAP values for each feature within the dataset. These SHAP values signify the average marginal contribution of a feature towards the model's predictions. They span from −1 to 1, with −1 denoting a negative contribution and 1 indicating a positive contribution. To render the SHAP values comprehensible, we employed a waterfall plot. This visual aid helped in identifying the crucial features for the model's predictions and in comprehending their individualized contributions towards the predictions.

Development of an online AI application
The online AI application was developed using Streamlit and GitHub. This interactive AI application encompasses various elements that enhance its usability and functionality. One notable component was a user-friendly panel that allowed individuals to select model parameters from a list of seventeen significant variables. This allowed users to tailor their inputs to their specific scenario. The application also included a dedicated interface for calculating the probability of postoperative in-hospital mortality based on the chosen parameters. Through this interface, users could obtain an anticipated probability for this outcome.
To promote a thorough understanding, an additional panel was incorporated that showcases the contribution of each model predictor towards the ultimate outcome, employing the SHAP

Comparative evaluation of prediction performance: humans vs. the AI application
To assess and compare prediction performance, we conducted a study comparing doctors with the AI application. Eight experienced doctors voluntarily took part in this study. Each doctor independently provided predictions on the risk of death during hospitalization, drawing on their own assessments. Through this comparative analysis, our objective was to gain insights into the relative performance and effectiveness of human experts versus the AI platform. In order to ensure a fair evaluation, we established a standardized set of criteria for both the doctors and the AI platform. These criteria encompassed factors such as AUC, accuracy, precision, recall, and overall predictive power. By employing these metrics, we aimed to gauge the capabilities of both human doctors and the AI application in accurately predicting the risk of death.
The study was conducted in a controlled environment, with each doctor and the AI platform being presented with the same dataset of patient records. The doctors meticulously reviewed the medical histories, conducted thorough examinations, and utilized their extensive knowledge and experience to make their predictions. On the other hand, the AI platform employed advanced algorithms and machine learning techniques to analyze the data and generate predictions.
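Because the comparison between raters rests on AUC, a minimal sketch of an AUC estimate with a bootstrap interval may be helpful. It uses 100 resamples to mirror the number of bootstraps mentioned under Validation; the percentile interval, the seeds, and the synthetic scores are assumptions for illustration and are not study data. The same function could be applied to the predictions of each doctor and of the application.

```python
import random

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_interval(y_true, scores, n_boot=100, seed=0):
    """Approximate 95% percentile interval of the AUC from n_boot resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    estimates = []
    while len(estimates) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [y_true[i] for i in sample]
        if 0 < sum(ys) < n:                             # both classes must be present
            estimates.append(auc(ys, [scores[i] for i in sample]))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, upper

# Synthetic outcomes and risk scores (illustrative only).
rng = random.Random(3)
y_true = [1] * 15 + [0] * 45
scores = [rng.gauss(0.7 if y else 0.3, 0.15) for y in y_true]

point_estimate = auc(y_true, scores)
lower, upper = bootstrap_auc_interval(y_true, scores)
print(point_estimate, lower, upper)
```

The pairwise definition of AUC used here is quadratic in the sample size, which is adequate for a sketch but would be replaced by a rank-based computation for a cohort of tens of thousands of patients.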

Statistical analysis
Categorical variables were presented as proportions and compared using the χ² test in subgroup analysis. Python (version 3.9.

Patient's demographics
Table 1 provides an overview of the baseline characteristics of the patients included in the study. The total number of patients in the study was 52 707. In terms of age distribution, 25.2% of the patients were in the 60-69 age range, while 43.8% were aged 70-79. Patients aged 80-89 accounted for 27.5%, and 3.3% were in the 90-100 age range. A small portion of patients (0.1%) were aged 100 or above. The sex distribution showed that 36.4% of the patients were male and 63.6% were female. Fracture type analysis revealed that 57.4% of the patients had femoral neck fractures, while 42.6% had intertrochanteric fractures. In terms of the type of operation performed, hip joint replacement accounted for 51.7%, while internal fixation made up 48.3%.
Figure 2. Heatmap of the scoring system for comprehensively evaluating the prediction performance of all models. The scoring system incorporated 10 evaluation metrics, and each metric was rated on a scale of 1 to 6, where higher scores denoted superior predictive performance. In this visualization, green represents relatively poor prediction performance, while red represents relatively good prediction performance.
The number of comorbidities varied among the patients.

A comparison of clinical characteristics stratified by postoperative in-hospital death
The study further compared the clinical characteristics between patients who experienced in-hospital death and those who did not (Table 1). Age, sex, fracture type, operation type, and the number of comorbidities showed significant differences between the two groups (P < 0.001, P < 0.001, P = 0.018, P = 0.040, and P < 0.001, respectively). In detail, older patients, male patients, patients with intertrochanteric fractures, those who underwent hip joint replacement, and those with a higher number of comorbidities had slightly higher in-hospital mortality rates. In addition, patients with various comorbidities, including coronary heart disease (P < 0.001), cerebrovascular disease (P < 0.001), heart failure (P < 0.001), renal failure (P < 0.001), nephrotic syndrome (P = 0.046), respiratory system disease (P < 0.001), gastrointestinal bleeding (P < 0.001), gastrointestinal ulcer (P = 0.013), liver failure (P = 0.011), cirrhosis (P < 0.001), diabetes (P = 0.001), and cancer (P = 0.004), were more likely to experience in-hospital mortality after surgery than patients without these comorbidities. Thus, the above 17 variables were used as predictors in the model. Due to the significant predictive value of these factors in determining mortality after hip fracture [20-26],
Table 3 (fragment): Zhang et al. [32], 2019, retrospective, 448 patients, 60 years or above, surgery, Bayesian belief network, Bayesian belief network, 1-year mortality: 0.85; Huang et al. [30], 2022

Feature importance and individual prediction
Feature importance analysis using SHAP in the eXGBM model revealed that the number of comorbidities, age, and operation type were the most influential features in predicting postoperative in-hospital mortality (Supplementary Figure 4, Supplemental Digital Content 10, http://links.lww.com/JS9/C568). The result suggested that patients with a higher number of comorbidities, older age, and specific operation types were at a higher risk of in-hospital mortality following surgery. Furthermore, we employed the SHAP method to gain detailed insights into the model's interpretation at an individual level. By quantifying the importance of individual features in the model's prediction, a deeper understanding of the factors influencing the outcome can be achieved. To further enhance the interpretability of the model, we presented two distinct cases. Supplementary Figure 5 (Supplemental Digital Content 11, http://links.lww.com/JS9/C569) depicts a true negative case, whereas Supplementary Figure 6 (Supplemental Digital Content 12, http://links.lww.com/JS9/C570) illustrates a true positive case. The arrows demonstrate the impact of each factor on the prediction. Blue and red arrows denote whether the factor reduces (blue) or increases (red) the risk of in-hospital death. The combined influence of all factors generates the final SHAP value, which corresponds to the prediction score. For the representative case one, the SHAP value was low (−1.90); for the representative case two, the SHAP value was high (5.07).

Online AI prediction
The eXGBM model has been deployed online and is freely accessible at https://in-hospitaldeathinhipfracture-l9vhqo3l55fy8dkdvuskvu.streamlit.app/. By clicking the provided link, users can access the online AI application (Fig. 9). Once the model parameters are chosen and submitted, the risk of postoperative in-hospital mortality is displayed. Additionally, recommendations for mitigating postoperative in-hospital mortality are offered based on this risk assessment. Furthermore, a risk report highlighting risk or protective factors and feature importance is provided within the AI application. For instance, in the depicted case (Fig. 9), the number of comorbidities and age served as significant protective factors, while the surgical process and sex were important risk factors. In the event that the online application becomes unresponsive or inaccessible, users can reactivate it by clicking on 'Yes, get this app back up!'. The web-based application will be operational again within ~30 s after restarting, allowing users to resume using it without waiting for technical support or assistance.

Comparative evaluation of prediction performance between doctors and the AI application
The findings revealed clear differences in the comparative performance of human doctors and the AI platform. While the doctors demonstrated their expertise and ability to consider various factors beyond the dataset, the AI platform showcased its computational power and ability to process vast amounts of data quickly. In terms of discrimination, the AI platform predicted the risk of death during hospitalization with an AUC value of 0.908, while the average AUC value was 0.682 for doctors (P < 0.001) (Fig. 10 and Supplementary Table 5, Supplemental Digital Content 13, http://links.lww.com/JS9/C571). Its ability to analyze large datasets and identify subtle patterns allowed it to make predictions with a higher degree of accuracy compared to human doctors, whose predictions lagged considerably behind. Furthermore, the AI platform demonstrated consistent performance across different subsets of the dataset, indicating its robustness and reliability. On the other hand, the doctors' predictions showed some variability, highlighting the potential for subjective biases and individual differences in their assessments.

Main findings
The study developed and validated an AI model using machine learning techniques to predict the risk of postoperative in-hospital mortality in geriatric hip fracture patients. Among the six models tested, the eXGBM model demonstrated the highest predictive performance, with an AUC of 0.908 and the highest scores in accuracy, precision, specificity, and F1 score. In addition, the eXGBM model achieved the highest total score according to the comprehensive scoring system, indicating superior predictive capability. The developed AI model can serve as a valuable tool to enhance clinical decision-making in assessing the risk of postoperative in-hospital mortality in geriatric hip fracture patients.

Risk factors for predicting in-hospital mortality among hip fracture
Our study demonstrated that older age, male sex, intertrochanteric fracture, hip joint replacement, and comorbidities were associated with in-hospital mortality. The findings were consistent with previous studies, which have also proposed that older age [20-22], male sex [21-23], and the presence of comorbid diseases [20,21] were associated with increased odds of in-hospital mortality. In addition, a meta-analysis revealed that cardiovascular disease, pulmonary disease, and diabetes significantly increased the risk of mortality after hip fracture surgery [26]. Heart failure, renal failure, and pulmonary disease were contributing factors for in-hospital mortality among hip fracture patients after surgery [22]. A small retrospective study illustrated that type of fracture and type of treatment were associated with 1-year mortality after fragility hip fracture [24]. Our study additionally found that fracture type and operation type affect in-hospital mortality, which suggests that the preoperative surgical plan should be made carefully.

Prediction of in-hospital mortality among hip fracture
Several studies have utilized machine learning algorithms to predict mortality in patients with hip fractures (Table 3), considering factors such as age, comorbidities, functional status, and laboratory markers. Forssten et al. [25] developed a machine learning model and achieved an AUC of 0.74 for predicting 1-year mortality in a cohort of 124 707 hip fracture surgery patients. They also developed another model for predicting 30-day mortality, with an AUC of 0.76 [27]. Oosterhoff et al. [28] proposed a machine learning model for predicting 90-day and 2-year mortality in 2478 femoral neck fracture patients aged 65 or above. They obtained a c-statistic of 0.74 for 90-day mortality and 0.70 for 2-year mortality prediction. In a prospective study by Dijkstra et al. [29], machine learning models were established to predict 90-day and 1-year mortality in elderly patients with femoral neck fractures, achieving c-statistics of 0.80 and 0.76, respectively. Our team previously developed a machine learning-based model for predicting in-hospital mortality, specifically in critically ill hip fracture patients [30]. The model included parameters such as age, sex, anemia, mechanical ventilation, cardiac arrest, and chronic airway obstruction, with an AUC of 0.715 [30]. A meta-analysis demonstrated that machine learning models outperformed the main clinical scale (Nottingham Hip Fracture Score, C-index: 0.702) with a pooled C-index of 0.762 in the training set and 0.838 in the validation set [5]. That meta-analysis concluded by suggesting further research to improve predictive performance by incorporating machine learning models into clinical scoring systems.
However, most of these models were limited to specific populations or single-center studies with smaller sample sizes. In addition, the prediction models were mainly designed to predict the 30-day, 90-day, or 1-year mortality among hip fracture patients [24,25,28,29,31-35]. Furthermore, the prediction performance of these models needs improvement. Our study, on the other hand, utilized a national cohort with a large sample size, allowing for greater generalizability of the results. Notably, our comprehensive scoring system incorporating multiple evaluation metrics provides a robust assessment of the prediction performance, allowing for a more accurate comparison between the models. The AUC of our eXGBM model was up to 0.908, indicating favorable prediction performance. The superiority of our eXGBM model can be attributed to its ability to handle complex interactions and nonlinear relationships within the data, thereby capturing important predictive patterns. The high AUC, accuracy, precision, specificity, and F1 score of the eXGBM model also demonstrated its excellent discrimination and predictive capability. Additionally, the high score achieved by the eXGBM model in our comprehensive evaluation system further confirms its superior performance compared to other models.

The epidemiology of in-hospital mortality among hip fracture
Our findings revealed a postoperative in-hospital mortality rate of 0.9%, which is somewhat lower than, though of the same order as, the rates of 1.8 to 3.0% reported in previous studies [22,36,37]. Nonetheless, some studies reported a relatively high in-hospital mortality among hip fracture patients. For instance, Peterle et al. [38] illustrated that the mean hospital mortality rate was 18.4% after analyzing 402 patients over 60 years of age admitted to hospital due to osteoporotic hip fracture. Sanz-Reig et al. [39] reported that the in-hospital mortality rate was 11.4% in a prospective study conducted on 311 hip fracture patients with an age of more than 65 years. The variation in in-hospital mortality could be explained by the heterogeneity of populations. The relatively low in-hospital mortality rate observed in our study can be attributed to several factors. Firstly, the inclusion of geriatric hip fracture patients from multiple hospitals across the nation provides a representative sample, reducing selection bias. Additionally, improvements in surgical techniques, perioperative care, and rehabilitation protocols have contributed to better patient outcomes and decreased mortality rates over the years [40]. It is also possible that the exclusion of patients with severe comorbidities or high-risk surgical candidates in our study might have influenced the mortality rate.

Individualized management for geriatric hip fracture
The implementation of our AI model, developed using the eXGBM technique, holds significant clinical implications. Accurate risk prediction of postoperative in-hospital mortality can guide clinicians in optimizing treatment strategies, such as tailored surgical approaches, specific anesthesia plans, and early mobilization protocols, to improve patient outcomes. Moreover, identification of high-risk patients allows for appropriate allocation of healthcare resources and timely intervention to reduce mortality rates. In this way, individualized management can be achieved.

For geriatric hip fracture patients identified as high-risk for postoperative in-hospital mortality, a proactive and individualized management approach is essential. Close monitoring of vital signs and regular pain assessment should be implemented to promptly identify any signs of deterioration. Early intervention by a multidisciplinary team, together with comprehensive medication management and optimization, is crucial to address potential complications and minimize adverse drug events. Specialized rehabilitation programs tailored to their specific needs and comorbidities can optimize functional recovery. Postoperative infection prevention strategies and regular communication among healthcare providers ensure seamless coordination and continuity of care.

Geriatric hip fracture patients identified as low-risk for postoperative in-hospital mortality also require comprehensive postoperative management. Standard care protocols such as pain management, wound care, and mobilization should be followed consistently. Early mobilization protocols and physical therapy should be integrated into their care plan to facilitate optimal recovery and functional outcomes. Adequate nutrition support, psychological support, and regular follow-up appointments are important for healing and well-being. Patient education and engagement in the care plan ensure a clear understanding of postoperative instructions and adherence to medication regimens, contributing to a smooth transition to a home setting.

Limitations
Despite the promising findings of our study, several limitations should be acknowledged. Firstly, our study focused on predicting in-hospital mortality and does not provide insights into long-term outcomes or mortality after discharge. Future studies should address this limitation to provide a comprehensive understanding of the overall prognosis of geriatric hip fracture patients. Secondly, the variables included in our model were limited to those available in the dataset, and there might be other important predictors not included in our analysis. Incorporation of additional variables, such as frailty indices or surgical complications, could further improve the predictive accuracy of the model. Lastly, our study was retrospective in nature, which might pose challenges in determining the causal relationship between variables. Thus, further research is needed to overcome these limitations and enhance our understanding of the factors influencing the prognosis of geriatric hip fracture patients. Additionally, the use of interventional studies and larger, more diverse datasets could help to validate and refine the predictive models, ultimately improving their accuracy and clinical utility for geriatric hip fracture patients.

Conclusions
In conclusion, our study successfully develops an AI model to predict the risk of postoperative in-hospital mortality in geriatric hip fracture patients. The eXGBM model demonstrates excellent predictive performance, outperforming other models in terms of discrimination and calibration. The implementation of this model in clinical practice could enhance risk stratification and aid clinicians in making informed treatment decisions to optimize patient outcomes and resource allocation in the management of geriatric hip fractures. Further research is warranted to address the limitations and explore the long-term prognostic implications of our AI model.

Figure 1. Study design and machine learning process. The experimental design consisted of three main components: data collection, resampling and randomization, and modeling and validation. The study employed six modeling techniques: the Logistic Regression (LR) model and five machine learning models, namely Decision Tree (DT), Naïve Bayesian (NB), Neural Network (NN), eXGBoosting Machine (eXGBM), and Random Forest (RF).

Figure 3. Area under the curve (AUC) achieved by different models after applying 100 bootstraps. The X-axis of the graph represents the number of bootstraps, while the Y-axis represents the AUC value. Each model is represented by a different color.
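A bootstrap evaluation of the AUC of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `auc`, `bootstrap_auc`, and `percentile_interval` are hypothetical, cases are resampled with replacement from a single set of predictions, and the toy data are made up. How the resamples were actually drawn in the study is not described in this section.

```python
import random

def auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    # Pairwise comparison is quadratic; fine for a sketch, slow for large cohorts.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(y_true, y_prob, n_boot=100, seed=0):
    """Return one AUC per bootstrap resample (cases sampled with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if len(set(labels)) < 2:
            continue  # AUC is undefined when a resample contains only one class
        aucs.append(auc(labels, [y_prob[i] for i in idx]))
    return aucs

def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval read off the sorted bootstrap values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

# Toy example (made-up data, not study data).
toy_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
toy_prob = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.75, 0.9]
toy_aucs = bootstrap_auc(toy_true, toy_prob, n_boot=100, seed=0)
toy_interval = percentile_interval(toy_aucs)
```

Plotting `toy_aucs` against the resample index would give a curve of the type shown in the figure, with one such series per model.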

Figure 5. Calibration curves and histograms of mean prediction probabilities for all the models. A. Calibration curve; B. Logistic Regression; C. Decision Tree; D. Naïve Bayesian; E. Neural Network; F. eXGBoosting Machine; G. Random Forest. The calibration curve plots the mean predicted probability against the fraction of positives in deciles. This curve provides insights into the calibration or alignment of the predicted probabilities with the actual probabilities. Additionally, the histogram displays the mean predicted probability along with its corresponding count.
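The decile-based calibration curve described in this caption can be sketched as follows. This is a minimal illustration, assuming that patients are sorted by predicted probability and split into ten groups of (nearly) equal size; the function name `calibration_deciles` and the toy data are hypothetical, and the study's exact binning procedure is not stated in this section.

```python
def calibration_deciles(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, fraction of positives) per bin."""
    # Sort cases by predicted probability so each bin covers one risk decile.
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        group = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not group:
            continue  # fewer cases than bins: skip empty groups
        mean_pred = sum(p for p, _ in group) / len(group)
        frac_pos = sum(t for _, t in group) / len(group)
        points.append((mean_pred, frac_pos))
    return points

# Toy example (made-up data, not study data): 20 cases, two per decile.
points = calibration_deciles([0] * 10 + [1] * 10, [i / 20 for i in range(20)])
```

For a well-calibrated model, the returned points lie close to the diagonal, that is, the fraction of positives in each decile is close to the mean predicted probability in that decile.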

Figure 6. Density curves for all the models. A. Logistic Regression; B. Decision Tree; C. Naïve Bayesian; D. Neural Network; E. eXGBoosting Machine; F. Random Forest. Green indicates patients without in-hospital mortality, and red indicates patients with in-hospital mortality. The smaller the overlap between the red and green curves, the better the model's ability to discriminate.

Figure 7. Decision curve analysis for all the models in the validation cohort. The horizontal dotted gray line represents the 'treat none' scenario, indicating the net benefit when no treatment is given. The inclined dotted gray line represents the 'treat all' scenario, indicating the net benefit across threshold probabilities when all individuals receive treatment.
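The quantities plotted in a decision curve can be sketched as follows, using the standard net benefit definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The function names and toy data are assumptions made for illustration; the software used for the decision curve analysis in the study is not specified in this section.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    # threshold must lie strictly between 0 and 1.
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

def net_benefit_treat_none():
    """Treating no one yields no true or false positives, so net benefit is 0."""
    return 0.0

# Toy example (made-up data, not study data).
toy_true = [0, 0, 1, 1]
toy_prob = [0.1, 0.4, 0.35, 0.8]
curve = [(pt, net_benefit(toy_true, toy_prob, pt)) for pt in (0.2, 0.5)]
```

A model is considered useful at a given threshold probability when its net benefit exceeds both the 'treat all' and the 'treat none' values at that threshold.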

Figure 8. Subgroup analysis of prediction performance of the developed model. A. Age; B. Sex; C. Fracture type; D. Operation process.

Figure 9. The online AI application is equipped with the optimal machine learning model. A. A panel for selecting model parameters; B. An interface for calculating the probability; C. An interface for model explanation and introduction of the model. To utilize the application, users can select the desired model parameters through the parameter selection panel. Once the parameters are set, they can simply click the 'Submit' button to initiate the calculation process. The application will subsequently generate and display the estimated probability of experiencing in-hospital mortality based on the chosen parameters, along with recommended therapeutic strategies.

Figure 10. Comparative evaluation of prediction performance between doctors and the AI platform using the area under the curve (AUC) analysis.

Table 1
A comparison of clinical characteristics between patients with and without in-hospital death.

This component sheds light on the significance and influence of various features within the model. By leveraging the AI platform and examining the contribution of model features to predictions for each individual case, researchers can pinpoint variables that act as protective or risk factors for specific patients. Moreover, the AI platform assigns priority rankings to variables on an individualized basis, emphasizing the crucial risk factors that necessitate heightened clinical attention. In addition, the application prioritizes clarity by providing an interface solely dedicated to introducing and explaining the model. This section offers in-depth information about the model's methodology, principles, and underlying factors. It serves as a valuable resource for users seeking a detailed understanding of how the model operates. Furthermore, the application classifies patients into high-risk or low-risk groups based on the optimal risk threshold. This categorization enhances the utility of the application by providing actionable insights that can guide medical decision-making.

7) was used for machine learning algorithms and hyperparameter tuning based on scikit-learn (version 1.2.2). R language program (version 4.1.2) was employed for all statistical analyses. A significance level of less than 0.05 was considered statistically significant.

Table 2
Prediction performance of machine learning-based and traditional models in the validation set.

Table 3
Prediction of mortality among hip fracture patients using machine learning according to current literature.

Table 4 (Supplemental Digital Content 9, http://links.lww.com/JS9/C567). The above results indicate that the AUC values of the model were relatively stable across different subgroups, suggesting that the model performed well in predicting outcomes in diverse population subgroups.