Accurate predictions of life expectancy in patients with skeletal metastases are important to help guide appropriate treatment selection. We previously introduced PATHFx, a clinical decision-making support tool for physicians treating patients with metastatic disease. We then externally validated it in Scandinavian and Italian patients. The models were expanded to include 1-month and 6-month survival estimates and were externally validated in Japanese patients. Finally, 18-month and 24-month models were added to provide a more complete survival trajectory on which to base treatment decisions; these were externally validated in Australian patients.
The applicability of any machine-learning tool depends not only on successful external validation in unique patient populations but also on remaining relevant as more effective systemic treatments are introduced. Importantly, the original PATHFx models were created before targeted therapies, such as checkpoint inhibition and adoptive immunotherapy, became widely used in patients with metastatic bone disease. Over time, estimates derived by PATHFx may underestimate survival, particularly for those with non-small cell lung cancer and other diagnoses responding to targeted therapies. To date, validation studies have focused only on patients who have undergone surgical fixation to treat impending or actual pathologic fractures. It remains unclear whether PATHFx may be useful in patients undergoing palliative therapy such as external-beam radiotherapy for symptomatic lesions.
Therefore, we sought to (1) generate updated PATHFx models using recent data from patients treated at one large, urban tertiary referral center and (2) externally validate the models using two contemporary patient populations treated either surgically or nonsurgically with external-beam radiotherapy alone for symptomatic skeletal metastases.
Patients and Methods
After obtaining institutional review board approval, we collected demographic, disease-related, and outcome information from 208 patients undergoing surgical treatment for pathologic fractures at Memorial Sloan Kettering Cancer Center between 2015 and 2018. These data were combined with the original PATHFx training set (n = 189)  to create the final training set (n = 397) used for this study. We created six Bayesian belief networks designed to estimate the likelihood of 1-month, 3-month, 6-month, 12-month, 18-month, and 24-month survival after treatment using the bnlearn package in R© Version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria).
We extracted the records of patients treated between 2016 and 2018 from the International Bone Metastasis Registry. Launched in 2016, the International Bone Metastasis Registry was designed to store data from patients treated for pathologic fractures internationally. In addition, we extracted the records of patients treated nonoperatively with external-beam radiation therapy for symptomatic skeletal metastases from 2012 to 2016, using the Military Health System Data Repository (MDR), a database containing patient-level details on all healthcare encounters for military beneficiaries. From each record, we collected the date of treatment, laboratory values at the time of treatment initiation, demographic data, details of diagnosis, and the date of death. All records reported sufficient follow-up to establish survival (yes/no) at 24 months after treatment.
The International Bone Metastasis Registry contained 197 complete records. In the MDR, we identified 315 records of patients undergoing radiotherapy for symptomatic skeletal metastasis during the time period chosen for this study; however, 123 lacked follow-up information to determine survival at 24 months or were missing more than four PATHFx variables and were therefore excluded. This left 192 radiotherapy (RT)-only records for analysis. For each group, follow-up was sufficient to establish survival at 1 month, 3 months, 6 months, 12 months, 18 months, and 24 months after surgery or radiotherapy, as appropriate. Other recorded variables included age at the time of surgery, race, sex, primary oncologic diagnosis, whether the pathologic fracture was impending or complete, number of bone metastases (solitary or multiple), presence or absence of visceral (organ) metastases, presence or absence of lymph node metastases, preradiation hemoglobin level (g/dL), preradiation absolute lymphocyte count (K/µL), and radiation oncologist-documented Eastern Cooperative Oncology Group Performance Status. We then generated calibration curves that plotted the predicted risk against the actual risk to determine the accuracy of each model.
We evaluated differences between continuous variables using the Bayes factor t-test, alongside Welch’s t-test to compare means, using the BayesFactor library in R© Version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria). Differences between categorical variables were assessed using a Bayes factor contingency table comparison (contingencyTableBF in R©). Similar to any other clinical decision support tool, the development of PATHFx involves internal validation and multiple external validation sets [5, 6, 9, 11, 12]. These previous studies provide evidence of how the distributions vary, which can then be used to determine whether there are meaningful statistical differences in the distributions contained in the two validation sets used for this study. When comparing variables in the training and validation sets, we also included Welch’s t-test, the chi-square test, and Fisher’s exact test as comparisons to Bayesian testing, given their familiarity to most readers. Bayes factor analysis of the model variables demonstrated that the demographic and clinical features of patients in the validation sets differed from those of patients in the training set. The Bayes factor is not a probability but a ratio: the probability of the observed data under the alternative hypothesis (meaningful differences) divided by their probability under the null hypothesis (no meaningful differences). We examined how Bayes factors of different sizes alter a range of prior probabilities to understand what constitutes strong versus weak evidence for the clinical question or hypothesis.
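To illustrate how a Bayes factor acts on a prior belief, the following minimal Python sketch converts a prior probability of the alternative hypothesis into a posterior probability using the identity posterior odds = Bayes factor × prior odds. It is an illustration only: the analyses in this study were performed in R, and the function name and the example values below are hypothetical rather than taken from the study data.

```python
def posterior_probability(bayes_factor: float, prior_probability: float) -> float:
    """Posterior probability of the alternative hypothesis.

    bayes_factor is the ratio P(data | alternative) / P(data | null);
    prior_probability is the prior belief in the alternative (strictly between 0 and 1).
    """
    if bayes_factor < 0:
        raise ValueError("a Bayes factor cannot be negative")
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical Bayes factors and priors, chosen only to show the pattern.
    for bf in (1.0, 3.0, 10.0, 100.0):
        for prior in (0.1, 0.5, 0.9):
            post = posterior_probability(bf, prior)
            print(f"BF={bf:>5}: prior {prior:.2f} -> posterior {post:.3f}")
```

A Bayes factor of 1 leaves any prior unchanged, whereas the same Bayes factor moves a skeptical prior less far in absolute terms than a neutral one, which is why the strength of evidence is best read against a range of priors.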
As the size of the Bayes factor increases, so does the strength of the evidence, supporting our belief that there are meaningful differences between the distributions of features in the training set (Memorial Sloan Kettering Cancer Center) and those in the International Bone Metastasis Registry (Table 1) as well as the RT-only group (Table 2). The features with larger Bayes factors represent notable differences between the training set and the International Bone Metastasis Registry validation set, and they included the presence of visceral metastases, number of bone metastases, and survival longer than 3 months, 6 months, 12 months, and 18 months. On the other hand, differences in the RT-only validation set included sex (more male patients), presence of lymph node metastases, presence of visceral metastases, and number of bone metastases. In the International Bone Metastasis Registry validation set, the variables with greater evidence against the null hypothesis, as indicated by larger Bayes factors, also had relatively low p values in the corresponding Welch’s t-test, chi-square test, and/or Fisher’s exact test (Table 1). The same pattern was observed in the RT-only validation set (Table 2).
Some features in the validation sets had a degree of missing data because the sets originated from large database sources. Notable features included lymph node metastases (missing in 100% of the International Bone Metastasis Registry validation set) and the surgeon’s estimate of survival (missing in 100% of the RT-only validation set), both of which are important first-degree or second-degree predictors of survival in the PATHFx tool.
To perform external validation, we applied the data from each record, as is, to the new PATHFx models. We generated models containing the outcomes of 1-month, 3-month, 6-month, 12-month, 18-month, and 24-month survival (yes or no) and generated calibration curves that plotted the predicted risk against the actual risk to assess the accuracy of model predictions.
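One common way to build a calibration curve of this kind is to group records by predicted probability and compare the mean prediction in each group with the observed proportion of patients who survived. The Python sketch below takes that approach with equal-width bins; the binning scheme, the number of bins, and the function name are assumptions made for illustration, because the text does not specify how the curves were constructed.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    predicted: Sequence[float], observed: Sequence[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed survival fraction, count)
    for each non-empty equal-width probability bin."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    # Per bin: [sum of predictions, number of survivors, number of records]
    totals = [[0.0, 0, 0] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        if not 0.0 <= p <= 1.0:
            raise ValueError("predicted probabilities must lie in [0, 1]")
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        totals[index][0] += p
        totals[index][1] += y
        totals[index][2] += 1
    return [(s / n, k / n, n) for s, k, n in totals if n > 0]


if __name__ == "__main__":
    # Hypothetical predictions and outcomes (1 = survived to the time point).
    preds = [0.05, 0.15, 0.35, 0.45, 0.55, 0.75, 0.85, 0.95]
    outcomes = [0, 0, 0, 1, 1, 1, 1, 1]
    for mean_pred, obs_frac, count in calibration_bins(preds, outcomes, n_bins=4):
        print(f"predicted {mean_pred:.2f}  observed {obs_frac:.2f}  n={count}")
```

Points lying on the diagonal (predicted equal to observed) indicate good calibration; points above it indicate that the model underestimates survival in that range, and points below it indicate overestimation.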
We determined the discriminatory ability of each model in estimating the likelihood of survival at each time interval by sensitivity, specificity, and receiver operating characteristic (ROC) curve analysis. A minimum ROC area under the curve (AUC) of 0.7 was considered an acceptable predictive value [3, 5, 11]. We chose this threshold because we considered it to be the lowest acceptable limit of discriminatory ability. We assessed the accuracy of each model by using the Brier score.
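For readers less familiar with these metrics, the following Python sketch computes the AUC, sensitivity and specificity at a chosen cutoff, and the Brier score from a set of predicted probabilities and observed outcomes. It is a plain-Python illustration under stated assumptions, not the R code used in the study: the function names, the 0.5 cutoff, and the example data are hypothetical.

```python
from typing import Sequence, Tuple


def brier_score(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


def roc_auc(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Probability that a randomly chosen survivor received a higher predicted
    probability than a randomly chosen nonsurvivor (ties count as one half)."""
    survivors = [p for p, y in zip(predicted, observed) if y == 1]
    nonsurvivors = [p for p, y in zip(predicted, observed) if y == 0]
    if not survivors or not nonsurvivors:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for ps in survivors:
        for pn in nonsurvivors:
            if ps > pn:
                wins += 1.0
            elif ps == pn:
                wins += 0.5
    return wins / (len(survivors) * len(nonsurvivors))


def sensitivity_specificity(
    predicted: Sequence[float], observed: Sequence[int], cutoff: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when predictions >= cutoff are called 'survives'."""
    tp = sum(1 for p, y in zip(predicted, observed) if y == 1 and p >= cutoff)
    fn = sum(1 for p, y in zip(predicted, observed) if y == 1 and p < cutoff)
    tn = sum(1 for p, y in zip(predicted, observed) if y == 0 and p < cutoff)
    fp = sum(1 for p, y in zip(predicted, observed) if y == 0 and p >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical predictions and outcomes (1 = survived to the time point).
    preds = [0.9, 0.8, 0.3, 0.2]
    outcomes = [1, 0, 1, 0]
    print("AUC:", roc_auc(preds, outcomes))
    print("Brier:", brier_score(preds, outcomes))
    print("Sens/Spec:", sensitivity_specificity(preds, outcomes))
```

As a point of reference, a model that assigns a probability of 0.5 to every patient has a Brier score of 0.25 regardless of the outcomes, and an AUC of 0.5 corresponds to discrimination no better than chance.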
Decision curve analysis assesses the clinical utility of a model across a range of threshold probabilities; in this study, the threshold is expressed as the surgeon’s estimate of survival. Thresholds vary by surgeon, patient, and clinical presentation; however, surgeons usually have a low threshold for treatment when treating healthy patients with straightforward problems and higher thresholds with sicker patients undergoing more complex care [3, 13]. For example, in the context of this study, the threshold represents the estimated probability of a patient surviving to a certain time point that would have to be met or exceeded for the surgeon to select a particular treatment option, such as radiotherapy alone versus doing nothing or surgical intervention versus doing nothing.
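The quantity plotted in a decision curve is the net benefit, which for a threshold probability p_t is (true positives / n) − (false positives / n) × p_t / (1 − p_t), following Vickers and Elkin [13]. The Python sketch below computes it for a model and for the reference strategy of assuming that every patient survives; the strategy of assuming that none survive has a net benefit of zero by definition. Treating "survives to the time point" as the positive outcome, the function names, and the example data are assumptions made for illustration.

```python
from typing import Sequence


def net_benefit(
    predicted: Sequence[float], observed: Sequence[int], threshold: float
) -> float:
    """Net benefit of acting on the model at a given threshold probability.

    Patients with predicted probability >= threshold are classified as survivors.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    true_pos = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return true_pos / n - (false_pos / n) * weight


def net_benefit_assume_all(observed: Sequence[int], threshold: float) -> float:
    """Net benefit of assuming that every patient survives to the time point."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(observed) == 0:
        raise ValueError("observed must be non-empty")
    survival_rate = sum(observed) / len(observed)
    return survival_rate - (1.0 - survival_rate) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical predictions and outcomes (1 = survived to the time point).
    preds = [0.9, 0.8, 0.3, 0.2]
    outcomes = [1, 0, 1, 0]
    for pt in (0.25, 0.5, 0.75):
        print(pt, net_benefit(preds, outcomes, pt), net_benefit_assume_all(outcomes, pt))
```

A model adds value at a given threshold only when its net benefit exceeds both reference strategies. When nearly all patients survive to a time point, the assume-all strategy already has a high net benefit over most thresholds, which leaves little room for a model to improve on it.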
Finally, we included, where possible, information recommended in the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement.
Generating Updated PATHFx Models
Our updated PATHFx version 3.0 models successfully classified survival at each time interval in both external validation sets and demonstrated appropriate discriminatory ability and model calibration.
Externally Validating PATHFx v3.0
The Bayesian models were reasonably calibrated to the Memorial Sloan Kettering Cancer Center training set (Fig. 1A-F). The models overestimate actual survival over the lower range probabilities and underestimate survival over the higher ranges (Fig. 2A-F). This is a subjective determination limited to the training data. Though methods exist to quantify the degree of calibration along a continuum, there is no consensus regarding what is, and is not, calibrated. However, we believe the favorable Brier scores and decision curve analysis indicate that these models are suitable for clinical use. The Brier scores were all less than 0.20 with upper bound of the 95% CIs all less than 0.25 for both the radiotherapy-only and International Bone Metastasis Registry groups. Additionally, AUC estimates were all greater than 0.70 with lower bounds of the 95% CI all greater than 0.68, except for the 1-month radiotherapy-only group (Table 3). The decision curve analysis indicated that models possessed clinical utility in patients undergoing surgery and those not undergoing surgery for the International Bone Metastasis Registry group (Fig. 2A-F) and the RT-only group.
In this study, the models could be used rather than assuming that all or none of the patients with skeletal metastatic disease treated with surgical fixation will survive longer than 3 months, 6 months, 12 months, 18 months, and 24 months (Fig. 2A-F). However, for the RT-only group, the models could be used when making decisions about patients who survive longer than 12 months, 18 months, or 24 months (Fig. 3A-F). Decision curves are helpful not only in telling us when to use a model but also in telling physicians when support tools such as PATHFx should not be used. For example, for patients treated with RT alone, a clinician may achieve better outcomes by assuming all patients will survive longer than 1 month, 3 months, or 6 months unless their threshold for treatment exceeds 95%. This is most evident for the 1-month model when prescribing RT alone (Fig. 3A-F). This is also seen in the surgically treated patients (Fig. 2A-F). We found it is better to assume patients will survive 1 month than to rely solely on the PATHFx model, partly because more than 90% of patients in all of the sets tested in this study (Memorial Sloan Kettering Cancer Center, International Bone Metastasis Registry, and RT-only) survived for longer than 1 month.
With advancement in medical and surgical therapies, it is necessary to risk stratify patients with metastatic bone disease. Accurate estimates of patient survival are important because they help determine treatment choices. Poor decision-making impacts patient quality of life and resource utilization. PATHFx is a freely available clinical support tool (https://www.pathfx.org) that provides objective survival estimations for patients who undergo surgical fixation for skeletal metastases. It helps guide surgical decision-making and avoids undertreatment or overtreatment of the disease. Once a PATHFx model is developed, it undergoes rigorous evaluation for statistical accuracy and clinical utility. The introduction of a clinical support tool is only the beginning of the model lifecycle. For a tool to remain clinically relevant, it requires regular updating. To establish appropriate model lifecycle management, we successfully linked PATHFx to a large international registry, the International Bone Metastasis Registry. This linkage will allow for regular model improvements as new data become available. Finally, we successfully applied PATHFx to patients undergoing palliative radiotherapy for symptomatic metastatic bone disease.
When evaluating the results of this study, its limitations must be considered. Given the data used in this study, it is possible that other statistical techniques could be used to develop similar or superior prognostic models. Exploring other machine-learning techniques, such as gradient boosting algorithms, decision trees, and random forest models, is slated for future projects. This may be indicated in future studies, given that the models appeared to be miscalibrated, slightly underestimating actual short-term survival over the lower range of probability estimates and overestimating long-term survival over the higher ranges. However, we have extensive experience using Bayesian inference for model development and application. Overfitting can occur with Bayesian belief networks, which would cause the results of the decision curve analysis to be overly optimistic. We sought to mitigate overfitting by including a larger training set than has been previously used for this application, in addition to two unique external validation sets. The techniques used to establish discriminatory ability and net benefit based on external validation have been used multiple times by our study group, both in the validation of PATHFx and in multiple external validation sets [5, 6, 11, 12]. However, these techniques are the minimum statistical requirements for a clinical decision-making support tool. Meares et al. compared PATHFx with six other published means of estimating survival in patients with skeletal metastases and found PATHFx to be most accurate for estimating 3-month and 6-month survival in patients with femoral metastatic bone disease. The same group found PATHFx to be the most consistent tool, providing accurate estimates of survival at all study time periods.
Another limitation is that the data in the training set were derived from a large tertiary referral center, which could limit universal application to community centers. To mitigate this and ensure the models remain applicable to the broadest international and community patient populations, we retained the original Memorial Sloan Kettering Cancer Center training data collected before the widespread use of immunotherapy and targeted therapy. The diverse nature of the training data may help explain the appearance of the calibration curves. Future models may need to acknowledge treatment details such as the use of immunotherapy or targeted therapy, including tyrosine kinase inhibitors, and whether the disease responds to initial treatment. Moreover, the International Bone Metastasis Registry is an international registry that was established in 2016. The validation and linkage of PATHFx to the International Bone Metastasis Registry may address the question of universal utility over time as the registry continues to grow. The RT-only group is from a highly selective, military tertiary center and may not represent other populations, specifically civilian populations. Further external validation studies in additional RT-only patient populations (such as those in the International Bone Metastasis Registry) are necessary before the PATHFx models may be used in additional settings. Both validation sets had model features with varying degrees of missing data. However, the models retained discriminatory ability with AUCs above the a priori cutoff of 0.70 despite missing data. Previous validation sets [5, 6, 11, 12] all had missing input data. This current validation in the presence of missing data once again highlights an advantage of Bayesian belief networks as a machine-learning method. This form of modeling uniquely performs well in the presence of missing data.
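The mechanism by which a Bayesian belief network tolerates a missing input can be shown with a toy example: when a variable is unobserved, the network averages over its possible values, weighted by their probabilities, rather than discarding the record. The Python sketch below uses a hypothetical three-node network in which two independent findings influence survival. The structure and every probability in it are invented for illustration and are not the PATHFx network or its parameters.

```python
from typing import Dict, Optional, Tuple

# Hypothetical prior probability that lymph node metastases are present.
P_LYMPH_NODE: Dict[bool, float] = {True: 0.4, False: 0.6}

# Hypothetical conditional probability of survival given
# (visceral metastases present, lymph node metastases present).
P_SURVIVAL: Dict[Tuple[bool, bool], float] = {
    (True, True): 0.2,
    (True, False): 0.4,
    (False, True): 0.5,
    (False, False): 0.8,
}


def survival_probability(visceral: bool, lymph_node: Optional[bool] = None) -> float:
    """Probability of survival given the observed findings.

    If lymph_node is None (missing), marginalize over both of its states
    using the prior, which is valid here because the two findings are
    modeled as independent parent nodes of survival.
    """
    if lymph_node is not None:
        return P_SURVIVAL[(visceral, lymph_node)]
    return sum(
        P_LYMPH_NODE[state] * P_SURVIVAL[(visceral, state)] for state in (True, False)
    )


if __name__ == "__main__":
    print("Both findings observed:", survival_probability(True, True))
    print("Lymph node status missing:", survival_probability(True))
```

With the lymph node status missing, the estimate falls between the two estimates that would be obtained if the status were known, reflecting the added uncertainty rather than an arbitrary default.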
Additionally, Bayesian hypothesis testing allows investigators to incorporate prior knowledge when comparing groups of variables obtained from experimental observations. We chose to apply a Bayes factor analysis because conventional parametric and nonparametric tests tend to overstate the evidence against the null hypothesis by ignoring prior evidence and assuming that the null hypothesis is true. In contrast, the Bayes factor demonstrates whether and how beliefs derived from prior knowledge of this extensively studied patient population are altered by new data obtained in this study . New information from validation sets allows for a better understanding of the distributions of variables, enabling one to extend the lifecycle of PATHFx through continued model improvement . Similar to any other clinical decision support tool, the development of PATHFx involves internal validation  and multiple external validation sets [5, 6, 11, 12]. However, the differences between each external validation set and the Memorial Sloan Kettering Cancer Center training set indicate that PATHFx continues to have room for improvement as a clinical support tool. As machine-learning methods continue to advance, the objective is to identify a modeling method that can be applied to diverse patient populations such as those chosen for this study.
On external validation, the discriminatory ability of the PATHFx models for the International Bone Metastasis Registry dataset and the RT-only dataset was similar to that observed in previously studied operative populations [3, 5, 6, 12]. Low Brier scores and favorable decision curve analysis results indicate that all models could be used for patients undergoing radiotherapy; however, the 1-month model should be used with caution and interpreted within the context of the other five models. For the RT-only population, the decision curve analysis provides clinically relevant evidence that, when deciding to treat a patient with radiotherapy, it is better to assume a patient will live 1 month than to use PATHFx. With improved survival in patients with skeletal metastases and a heightened awareness of skeleton-related events guiding referrals, this study poses the question as to whether a 1-month model could be defined by other machine-learning methods or whether a 1-month model is clinically necessary. Nevertheless, the probability estimates for the 1-month PATHFx models are displayed in conjunction with the other timepoints, which allows the user to interpret them in the context of each patient’s survival trajectory. Despite this, we successfully showed the ability of the PATHFx Bayesian models to function in patients treated with radiotherapy for skeletal metastatic disease.
In summary, we found the data support upgrading the PATHFx web site, www.pathfx.org, with version 3.0 of the PATHFx models, to improve the lives of patients with metastatic bone disease. The study highlights our commitment to improving PATHFx over time as new therapies continue to extend the lives of patients with metastatic bone disease. Clinicians might use PATHFx version 3.0 to provide survival estimates for patients receiving palliative external-beam radiotherapy alone and those receiving surgical treatment for symptomatic metastatic bone disease, except for 1-month survival in the radiotherapy-only group. Although PATHFx version 3.0 groups oncologic diagnoses together, new models are being developed to allow for patient-centered and disease-specific (for example, breast, lung, and prostate cancer) decision-making. Further studies continue to explore the application of the models to other patient populations and patients undergoing nonoperative or other palliative treatments, such as cryotherapy or radiofrequency ablation of metastatic lesions. With continued advancements in metastatic disease care, it is our fiduciary duty to maintain up-to-date clinical support tools to help patients and other providers navigate these complex treatment algorithms. As machine-learning methods advance, the objective is to identify a modeling method that can be applied to diverse patient populations such as those chosen for this study.
We thank Clare F. Grazal, MS, for her contributions to data analysis and Benjamin K. Potter, MD, FACS, for arranging helpful institutional support.
1. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Ann Intern Med. 2015;162:55.
2. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115:928-935.
3. Forsberg J, Eberhardt J, Boland PJ, Wedin R, Healey JH. Estimating survival in patients with operable skeletal metastases: An application of a Bayesian belief network. PLoS ONE. 2011;6:e19956-8.
4. Forsberg J, Sjoberg D, Chen Q-R, Vickers A, Healey JH. Treating metastatic disease: Which survival model is best suited for the clinic? Clin Orthop Relat Res. 2013;471:843-850.
5. Forsberg J, Wedin R, Bauer HC, Hansen BH, Laitinen M, Trovik CS, Keller JØ, Boland PJ, Healey JH. External validation of the Bayesian Estimated Tools for Survival (BETS) models in patients with surgically treated skeletal metastases. BMC Cancer. 2012;12:493.
6. Forsberg JA, Wedin R, Boland PJ, Healey JH. Can we estimate short- and intermediate-term survival in patients undergoing surgery for metastatic bone disease? Clin Orthop Relat Res. 2017;475:1252-1261.
7. Goodman SN. Toward evidence-based medical statistics. 2: The Bayes factor. Ann Intern Med. 1999;130:1005.
8. Horn L, Mansfield AS, Szczęsna A, Havel L, Krzakowski M, Hochmair MJ, Huemer F, Losonczy G, Johnson ML, Nishio M, Reck M, Mok T, Lam S, Shames DS, Liu J, Ding B, Lopez-Chavez A, Kabbinavar F, Lin W, Sandler A, Liu SV; IMpower133 Study Group. First-line atezolizumab plus chemotherapy in extensive-stage small-cell lung cancer. N Engl J Med. 2018;379:2220-2229.
9. Meares D, Badran D, Dewar D. Prediction of survival after surgical management of femoral metastatic bone disease – a comparison of prognostic models. J Bone Oncol. 2019;15:100225.
10. Newick K, Moon E, Albelda SM. Chimeric antigen receptor T-cell therapy for solid tumors. Mol Ther Oncolytics. 2016;3:16006.
11. Ogura K, Gokita T, Shinoda Y, Kawano H, Takagi T, Ae K, Kawai A, Wedin R, Forsberg JA. Can a multivariate model for survival estimation in skeletal metastases (PATHFx) be externally validated using Japanese patients? Clin Orthop Relat Res.
12. Piccioli A, Spinelli SM, Forsberg JA, Wedin R, Healey JH, Ippolito V, Daolio PA, Ruggieri P, Maccauro G, Gasbarrini A, Biagini R, Piana R, Fazioli F, Luzzati A, Di Martino A, Nicolosi F, Camnasio F, Rosa MA, Campanacci DA, Denaro V, Capanna R. How do we estimate survival? External validation of a tool for survival estimation in patients with metastatic bone disease-decision analysis and comparison of three international patient populations. BMC Cancer. 2015;15:424.
13. Vickers AJ, Elkin EB. Decision curve analysis: A novel method for evaluating prediction models. Med Decis Making. 2006;26:565-574.