Oermann, Eric K. BS*,‡; Kress, Marie-Adele S. MD‡; Collins, Brian T. MD‡; Collins, Sean P. MD, PhD‡; Morris, David MD§; Ahalt, Stanley C. PhD¶,║; Ewend, Matthew G. MD*
Artificial neural networks (ANNs) are computer programs based on the biological functioning of neurons within neural networks. Famously described by Hebb and later summarized by Carla Shatz, the first principle of biological neural networks is that “cells that fire together, wire together.”1 ANNs harness this fundamental principle to learn patterns within a data set and then apply that learning for recognition and prediction. Crucially, ANNs trained on a data set can perform this task without significant prior assumptions and restrictions on the data to be analyzed.2 Because of this, ANNs excel at modeling complex, nonlinear data when little is known of underlying distributions and relationships among variables and when information may be incomplete or unavailable. The use of ANNs for clinical decision support started in the late 1980s with the publication of several articles in the internal medicine and urological-oncology literature.3-6 Within neurosurgery, however, perhaps ironically, there has been little use of this analytical technique despite its obvious appeal for solving the complicated clinical questions that neurosurgeons encounter on a daily basis. One of the most complicated of these tasks is predicting neurosurgical outcomes.7-9
Properly predicting the survival outcomes of patients with brain metastases is a clinical challenge and a necessity for properly formulating a management plan. Traditionally, prognostication has been performed by clinicians using their clinical judgment in light of readily available clinical variables such as age, performance status, and systemic disease burden. Although more advanced prognostic tools have been developed in recent years, they often rely on regression analyses and other common statistical tools to create facile scoring systems (Table 1).10-14 Part of the difficulty with predicting survival, however, by either clinical intuition or traditional predictive statistics, is that survival is often determined by a set of complicated, nonlinear variables with implicit interactions between the variables themselves.
In light of this, we hypothesized that ANN analysis could predict 1-year survival for patients with intracranial metastatic disease with an efficacy equal or superior to that of traditional statistical models represented by logistic regression, the Graded Prognostic Assessment (GPA), and the Golden Grading Scale (GGS).12,13 Additionally, we hypothesized that a sensitivity analysis of the ANN would identify primary tumor histology as the most important individual pretreatment variable for assessing patient prognosis, a variable that is conspicuously absent from many of the earlier prognostic indexes.
PATIENTS AND METHODS
A multi-institutional, retrospective database of patients was created for the purpose of training and validating the ANN. Patients treated at the University of North Carolina at Chapel Hill and Georgetown University between 2007 and 2011 with stereotactic radiosurgery on the CyberKnife (Accuray, Sunnyvale, California) were reviewed for this study. Patients were included in the study if they underwent primary treatment with stereotactic radiosurgery for newly diagnosed brain metastases, had a biopsy-confirmed primary diagnosis, and had > 1 year of potential follow-up. Institutional Review Board approval for the accumulation of clinical data for the purpose of this study was obtained at both universities.
Although the computational and algorithmic framework for implementing ANNs is well established within the computer science and mathematics literature, we believe that it is helpful to provide a simplified theoretical and practical discussion to better acquaint practicing clinicians with these novel computational tools. Our chosen model in the present study was a multilayer perceptron (MLP), the most common type of ANN, and we focus on it for the remainder of this discussion.2,8,15 MLP-ANNs consist of a series of nodes arranged in layers (input, hidden, and output) with each node functioning like a biological neuron. The output of each node, like the action potential of a neuron, is determined by a formula known as an activation function. Nodes, analogous to neurons, are connected by weighted edges that play a role akin to that of synapses. Just as in synapses, each weighted edge either magnifies or diminishes input signals to the node, resulting in a computational equivalent to long-term potentiation or long-term depression. This system of nodes/neurons and weighted edges/synapses constitutes an ANN that can functionally mimic a biological neural network in its capacity to learn. The actual learning in an MLP-ANN, as in a biological neural network, occurs at the weighted edges/synapses through adjustments in the weights (long-term potentiation/long-term depression).
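To make the node analogy concrete, the following sketch (a generic illustration with hypothetical input values, not the SPSS implementation used in this study) computes a single node's output from its weighted inputs with a logistic activation function:

```python
import math

def node_output(inputs, weights, bias):
    # The "synaptic" signal: each input multiplied by its edge weight,
    # summed, then passed through a logistic activation function,
    # the node's analogue of firing.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Three hypothetical scaled inputs and their edge weights.
print(node_output([0.6, 1.0, 0.2], [0.4, -0.3, 0.8], 0.1))  # ≈ 0.55
```

Stacking such nodes in layers, with each layer's outputs feeding the next, yields the feed-forward MLP described above.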
The most common means by which MLP-ANNs adjust weights and learn is a method known as back-propagation. During this process, initial inputs (age, performance status, genomic information, and other variables) and real outcomes (whatever the network is intended to predict) are provided to the network. The initial inputs are then multiplied by the values of the weighted edges and passed to the nodes, where they are fed into the activation functions to generate outputs. The process repeats itself throughout the network in a feed-forward manner until a final output has been reached. This predicted outcome is then compared with the known outcome, which has already been provided, and the discrepancy between the two is calculated as the error in the network. This error is then used to adjust the values of the weighted edges with the intent of minimizing the error on the next iteration of the network.16 Through numerous iterations, network error is minimized as the weights are adjusted, ultimately resulting in an ANN that has been trained on the provided sample.
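As a minimal sketch of this training loop (pure Python for illustration only; the present study used the SPSS Neural Networks package, and all names and parameter choices here are hypothetical), the following trains a one-hidden-layer MLP by back-propagation on a toy nonlinear problem:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(samples, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    # samples: list of (inputs, known_outcome) pairs, outcome in {0, 1}.
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Weighted edges, randomly initialized; the final entry of each row
    # is a bias weight fed by a constant input of 1.0.
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        x1 = list(x) + [1.0]
        hid = [sigmoid(sum(w * v for w, v in zip(ws, x1))) for ws in w_hid]
        out = sigmoid(sum(w * v for w, v in zip(w_out, hid + [1.0])))
        return x1, hid, out

    for _ in range(epochs):
        for x, target in samples:
            x1, hid, out = forward(x)                 # feed-forward pass
            d_out = (out - target) * out * (1 - out)  # error at the output
            for j, h in enumerate(hid):
                d_hid = d_out * w_out[j] * h * (1 - h)  # error pushed backward
                for i, v in enumerate(x1):
                    w_hid[j][i] -= lr * d_hid * v     # adjust hidden-layer weights
            for j, h in enumerate(hid + [1.0]):
                w_out[j] -= lr * d_out * h            # adjust output-layer weights
    return lambda x: forward(x)[2]

# Train on XOR, a classic pattern with a nonlinear decision boundary.
xor = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]
predict = train_mlp(xor)
```

Each pass through the data computes the network's output, measures the discrepancy against the known outcome, and nudges every weight to reduce that error on the next iteration, exactly the loop described above.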
Statistical Methods and ANN Construction
We used the SPSS Neural Networks package (IBM, Inc, Armonk, New York) for our study. Key pretreatment variables identified as having prognostic significance by current prognostic scoring systems were chosen as input variables for our ANN (Figure 1). These variables were age, performance status, number of metastases, systemic disease status, and primary tumor histology. Because of the heterogeneity of performance status scoring used at both institutions, all scores were converted into the Eastern Cooperative Oncology Group system on account of its simplicity.17,18 For the purpose of data classification, age was treated as a continuous variable. The number of metastases on T1 postcontrast MRI, counted by a neuroradiologist, was treated as an ordinal number. All other variables were treated as categorical variables.
Patients were partitioned by a 2:1:1 ratio to generate training, testing, and validation data sets consisting of 98, 49, and 49 patients, respectively. The second set of testing data was reserved to provide an independent assessment of the learning by the ANN and to reduce overfitting. Lastly, validation of the ANNs was performed on the third validation set of data that were used for neither training nor testing purposes to provide an unbiased assessment of network accuracy and error.
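A 2:1:1 split of this kind can be sketched as follows (a hypothetical helper for illustration; the actual partitioning was performed within SPSS):

```python
import random

def partition_211(patients, seed=0):
    # Shuffle, then split into training, testing, and validation sets
    # in a 2:1:1 ratio.
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    return shuffled[: n // 2], shuffled[n // 2 : 3 * n // 4], shuffled[3 * n // 4 :]

training, testing, validation = partition_211(range(196))
print(len(training), len(testing), len(validation))  # 98 49 49
```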
After ANN training, the relative importance of each variable in the ANN model was examined with a sensitivity analysis whereby each variable was removed in turn from the model, which was then tested against the entire data set. The resulting errors were recorded and normalized to generate the relative importance of each variable to the overall ANN results. As a second assessment of ANN utility, a series of 30 ANNs were constructed simultaneously using random 2:1:1 partitions of the patient database. The 5 most accurate ANNs on the validation set then had their outputs averaged to produce a 5-member ANN ensemble that could be compared for performance with both the single ANN model and traditional statistical methods. We then compared the accuracy in predicting 1-year survival of the standard logistic regression with the best single ANN (called single ANN) and the prediction of the 5-member ANN ensemble (called ANN vote). Note that the best ANN is included as 1 of the 5 networks in ANN vote.
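The ensemble step reduces to averaging the member networks' predicted probabilities; a minimal sketch (with plain callables and made-up probabilities standing in for trained ANNs):

```python
def ensemble_vote(models, patient, threshold=0.5):
    # Average the 1-year survival probabilities returned by each member
    # network, then threshold the mean to obtain the ensemble prediction.
    probs = [model(patient) for model in models]
    mean_prob = sum(probs) / len(probs)
    return mean_prob, mean_prob >= threshold

# Five stand-in "networks" that each return a fixed probability.
members = [lambda p, v=v: v for v in (0.62, 0.55, 0.71, 0.48, 0.66)]
prob, predicted_to_survive = ensemble_vote(members, patient=None)
print(round(prob, 3), predicted_to_survive)  # 0.604 True
```

Averaging several independently trained networks in this way tends to smooth out the idiosyncrasies of any single training run, which is the rationale for comparing ANN vote against the best single ANN.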
Traditional statistical analysis of variable significance was performed with standard logistic regression on the same data set on which the ANN was evaluated, using SPSS 20.0 (IBM, Inc). For each individual parameter and for the logistic regression model, receiver-operating characteristic (ROC) curves were generated and used to calculate specificities, positive predictive value (PPV), and negative predictive value (NPV) for all models and parameters at 95% sensitivity. Discrimination capability was assessed by calculating the area under the curve (AUC) from the ROC analysis and plotted as bar graphs for comparison between the models. Significance between the different models was assessed by 2-sided t tests comparing the different predicted distributions. The GPA and GGS were calculated per Table 1.12,13 We scored patients using each index individually and then assessed whether each system predicted patient survival beyond 1 year to facilitate comparison with the ANN model as designed. For variables describing the patient population, continuous variables were compared by use of the Mann-Whitney U test; categorical variables were compared by use of Pearson χ2 testing. Additionally, on univariate analysis, the Bonferroni correction was applied to account for multiple comparisons.
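For readers unfamiliar with these measures, both the AUC and the specificity at a fixed sensitivity can be computed directly from model scores. The following is a small illustrative sketch (with toy scores, not the SPSS routines or the study data), treating death within 1 year as the positive class:

```python
def auc(scores_pos, scores_neg):
    # AUC computed as the probability that a randomly chosen positive case
    # is scored above a randomly chosen negative case (ties count one-half).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def specificity_at_sensitivity(scores_pos, scores_neg, sensitivity=0.95):
    # Lower the decision threshold until at least the requested fraction of
    # positive cases is captured, then report the fraction of negative cases
    # still correctly excluded.
    threshold = sorted(scores_pos, reverse=True)[
        min(len(scores_pos) - 1, int(sensitivity * len(scores_pos)))
    ]
    return sum(1 for n in scores_neg if n < threshold) / len(scores_neg)

# Toy scores for patients dead (positive) vs alive (negative) at 1 year.
dead = [0.9, 0.8, 0.75, 0.6, 0.55]
alive = [0.7, 0.5, 0.4, 0.3, 0.2]
print(auc(dead, alive), specificity_at_sensitivity(dead, alive))  # 0.92 0.8
```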
RESULTS

Patient Sample Characteristics
Two hundred fifty-one charts at both institutions were reviewed for this study, generating a final set of 196 patients after 55 patients were excluded because of insufficient follow-up. Forty-seven percent of the patients were male; 53% were female. The median age was 61 years, with a range of performance statuses (Table 2). The patients surviving at 1 year tended to be female (P = .001), to have better performance status (P = .01), and to have more favorable primary tumor histology (P = .001). The average diameter of treated lesions was 1.49 cm (SEM, 0.07 cm). Thirty-three patients additionally underwent surgical resection of their lesions with palliative intent for symptomatic resolution, and 41 patients underwent whole-brain radiation therapy (WBRT) either before presentation or within the 1-year follow-up period (Table 3). These patients were then randomly partitioned for ANN construction and validation.
ANN Construction and Comparison
The pooled ANN vote performed significantly better than the multivariate logistic regression (P = .02) and trended toward exceeding the individual ANN (P = .07; Figure 2). Table 4 lists comparative values for how the multivariable models and individual parameters performed as tests for predicting 1-year survival. The ANN vote, single ANN, and logistic regression had AUCs on ROC analysis of 84%, 78%, and 75%, respectively, indicating the success rate of each test at discriminating between patients who would be alive or dead at 1 year (Table 4 and Figure 3). With sensitivity fixed at 95%, the ANN vote had a specificity of 38%, whereas the ANN alone had a specificity of 32% and logistic regression had a specificity of 26%. The ANN vote had a PPV of 66% and an NPV of 86%. This compared favorably with the single ANN, with a PPV of 64% and an NPV of 83%, and the logistic regression, with a PPV of 62% and an NPV of 80%.
Sensitivity Analysis and Variable Importance
The sensitivity analysis identified primary tumor histology and Eastern Cooperative Oncology Group performance status as the 2 individual parameters of greatest importance in the ANN model, with performance status scored as 71% as useful as histology to the model (Table 4). Interestingly, systemic disease status was considered only 33% as useful, less so than both number of metastases (65%) and age (58%).
Comparison With Existing Prognostic Indexes
To further test our technique, we sought to directly compare the ANN models with 2 existing prognostic indexes, the GPA and GGS (Table 5). We were able to calculate a GPA score for 158 of the 196 patients. Five of 87 patients were correctly predicted to survive beyond 1 year, and 88 of 109 patients were correctly predicted to die within 1 year. We were able to calculate a GGS score for 158 of the 196 patients as well, with 49 of 87 patients being correctly predicted to survive beyond 1 year and 33 of 109 patients correctly predicted to die within 1 year. For comparison, the ANN vote fixed at 95% sensitivity correctly predicted 33 of 87 patients to survive at 1 year and 104 of 109 patients to die within 1 year. ROC analyses yielded an AUC of 54.3% for the GGS and 53.6% for the GPA (Figure 2).
DISCUSSION

Predicting survival is difficult. Survival is intrinsically a complicated function of numerous parameters that often defy intuitive understanding. Individual parameters such as performance status and the presence or absence of systemic disease can provide a measure of prognostic utility, but as this study shows, their discriminatory ability to independently predict 1-year survival barely rises above chance (an AUC of 50%). To circumvent these problems, multivariate statistical models and comprehensive scoring indexes can synthesize individual parameters into sophisticated tools for predicting outcomes. Such systems, in a sense, attempt to mirror clinical thinking itself by considering variables such as performance status and age in the context of an overarching clinical disease.
In the present study, we sought to take this imitation of clinical thinking a step further by using a series of ANNs designed to predict 1-year mortality using the same pretreatment variables that have been consistently identified as having prognostic significance throughout the neurosurgical literature. We have shown that an ensemble of ANNs can outperform traditional predictive statistical models of survival, represented here by multivariate logistic regression; exceed the ability of individual parameters to predict 1-year mortality; and outperform the GGS and GPA indexes. Limiting our model to already identified prognostic variables serves 2 purposes. First, it allows a fair comparison of the ANN technique itself with other models by restricting the ANN to the same limited set of variables that other models use. Second, given the relatively small number of cases included in this analysis, restricting the number of input variables further reduces the risk of overfitting.
Maximizing the accuracy of the ANN for optimal clinical efficacy was not a goal in the present study, but it is an achievable end point that will be pursued in a follow-up study using an increased number of cases and additional variables. Nevertheless, in its current state as presented, the ANN ensemble is an acceptable test for ruling out the possibility of 1-year mortality. An intuitive metric for interpreting these results is discriminatory capacity, provided by the AUC analysis. Given 2 patients, 1 patient who will die within a year and 1 patient who will not, the ANN vote can distinguish between the 2 patients 84% of the time compared with 78% and 74% for the single ANN and logistic regression models, respectively. In the present case, we included an analysis of sensitivity and specificity with the test parameters chosen so that the sensitivity would be fixed at 95%. Practically speaking, by fixing the sensitivity of the test at 95%, we have ensured that the models will successfully identify 95% of the patients who are at risk of dying within 1 year. Subsequent comparison of the specificities of each model then reveals the percent of patients who will be alive at 1 year who are successfully captured by each model. Our intent in including these statistical measures typically used for comparing screening tests is that they comment on the ability of the models to predict individualized patient outcomes. It is our goal to refine this technique so that it can effectively deliver individualized prognostic results in a clinical setting so that it may be used in a manner similar to common screening tests. A highly specific and highly sensitive computerized prognostic tool could prove to be very useful both in the clinical treatment of patients and perhaps in clinical study design by improving risk stratification.
An example of the importance of a facile determination of prognosis for clinical management is the impact of prognosis on the choice of treatment plans, for instance, with regard to the use of WBRT vs alternative neurocognitive-sparing therapies. Indeed, 4 patients in the present study underwent WBRT and subsequently died of systemic disease in < 1 year. In the present ANN analysis, all of these patients were found to have a 5% chance of surviving > 1 year. Would it have made sense, looking back, to forgo WBRT and its potential side effects in these patients, knowing their poor prognosis? Questions such as this are clinically significant, and ANN analysis proffers a facile means of assisting clinical decision making in this regard.5
The outperformance of the ANN models compared with the GPA and GGS indexes on this data set leads to 3 important observations. First, the increased performance of the ANN is due in large part to the fact that the GGS and the GPA do not take tumor histology into account. Primary tumor histology was identified as the most significant variable on sensitivity analysis; therefore, it comes as no surprise that the ANN model outperformed these 2 indexes, particularly on a data set comprising such a wide variety of tumor types. In several early prognostic studies for brain metastases, primary tumor type was noted as being a significant predictor of patient outcomes despite not being incorporated into many of the early scoring indexes.14,19 Second, the GGS and GPA could not be calculated for several patients, demonstrating one of the weaknesses of scoring indexes for predicting prognosis. Whereas the ANN is able to account for missing information, the indexes simply cannot be used. Third, to appropriately account for both missing data and the differing relative significance of prognostic variables across tumor histologies, as in the disease-specific GPA (DS-GPA), prognostic indexes must become arbitrarily complicated to account for all possible variations and to deliver the personalized prognostic results necessary for clinical implementation. ANNs, however, are able to deliver accurate, individualized prognoses while remaining simple to implement on a computerized platform once constructed. Despite these advantages, ANNs are underused in clinical neurosurgery, with prior studies using ANNs to analyze head trauma outcomes8,20,21 and imaging or genetic profiling in brain tumor patients.22,23 Although early computer hardware limitations hindered the adoption of ANNs, advances in modern hardware have created an amazing opportunity.
Electronic medical records, automatically updating clinical databases, and other forms of electronic data gathering and storage have created environments where it is possible to construct ANN models that continuously learn while simultaneously delivering results in a clinical setting.
A key strength of this study is the multi-institutional pool of patients on which it drew. As with other clinical studies, the more diverse patient pool helps to control for regional or even institutional variation that can make both studies and models less generalizable. By the same token, however, a weakness of the present report is the homogeneity of radiation therapy, with all patients having undergone radiosurgery. Although it is probable that the results are generalizable to patients not undergoing radiosurgery, the study ought to be repeated in a larger, prospective cohort of general brain tumor patients to validate and hopefully improve on these results. Additionally, this study suffers from the drawbacks of all retrospective clinical studies and would benefit from validation in a prospective trial to mitigate bias.
The practical implementation of ANNs and other machine-learning algorithms depends in a large part on the ability of neuro-oncology programs to generate large, diverse data sets on which to train the networks. For the practical implementation of the described techniques, we expect that programs with the necessary data sets and computer science support might venture to create their own ANN ensembles for the prediction of individual patient outcomes within their own institutions. These institutional ANNs could then be incorporated as a standard part of the pretreatment clinical workup to inform clinical decision making. More ambitiously, however, we hope that this pilot project and future work by our group and others may lay the foundation for the creation of a large, freely available ANN within the neurosurgical community for predicting patient prognosis and potentially other clinical measures.
CONCLUSION

ANN analysis is a facile and powerful means of predicting survival in neuro-oncology patients that is well suited to use in a clinical environment. ANN analysis is equivalent or superior to traditional means of predicting survival and has several appealing properties for implementation in a clinical setting. With the explosion in clinical data over the past few years, driven in part by the adoption of electronic medical records and modern computer database techniques, we anticipate a growing need for modern machine-learning techniques such as ANNs to properly make use of the information at our disposal and to realize its full benefit for clinical care.
E.K. Oermann received financial support from the Doris Duke Charitable Foundation. Drs B.T. Collins and S.P. Collins have received honoraria from Accuray, Inc. The other authors have no personal financial or institutional interest in any of the drugs, materials, or devices described in this article.
We would like to thank Rianne Hoffman, BA, and Huma Chaudry, BS, for assisting in the collection of data for this work. We would like to thank Charles Schmidt, PhD, for his thoughtful review of the manuscript. We would also like to thank the Doris Duke Charitable Foundation for providing funding for this research.
1. Hebb DO. The Organization of Behavior: A Neuropsychological Theory. New York, NY: Wiley; 1949.
2. Cross SS, Harrison RF, Kennedy RL. Introduction to neural networks. Lancet. 1995;346(8982):1075–1079.
3. Anagnostou T, Remzi M, Lykourinas M, Djavan B. Artificial neural networks for decision-making in urologic oncology. Eur Urol. 2003;43(6):596–603.
4. Baxt WG. Use of an artificial neural network for the diagnosis of myocardial infarction. Ann Intern Med. 1991;115(11):843–848.
5. Baxt WG. Application of artificial neural networks to clinical medicine. Lancet. 1995;346(8983):1135–1138.
6. Zlotta AR, Remzi M, Snow PB, Schulman CC, Marberger M, Djavan B. An artificial neural network for prostate cancer staging when serum prostate specific antigen is 10 ng./ml. or less. J Urol. 2003;169(5):1724–1728.
7. Dumont TM, Rughani AI, Tranmer BI. Prediction of symptomatic cerebral vasospasm after aneurysmal subarachnoid hemorrhage with an artificial neural network: feasibility and comparison with logistic regression models. World Neurosurg. 2011;75(1):57–63; discussion 25-58.
8. Rughani AI, Dumont TM, Lu Z, et al. Use of an artificial neural network to predict head injury outcome. J Neurosurg. 2010;113(3):585–590.
9. Arle JE, Perrine K, Devinsky O, Doyle WK. Neural network analysis of preoperative variables and outcome in epilepsy surgery. J Neurosurg. 1999;90(6):998–1004.
10. Weltman E, Salvajoli JV, Brandt RA, et al. Radiosurgery for brain metastases: a score index for predicting prognosis. Int J Radiat Oncol Biol Phys. 2000;46(5):1155–1161.
11. Sperduto PW, Chao ST, Sneed PK, et al. Diagnosis-specific prognostic factors, indexes, and treatment outcomes for patients with newly diagnosed brain metastases: a multi-institutional analysis of 4,259 patients. Int J Radiat Oncol Biol Phys. 2010;77(3):655–661.
12. Sperduto PW, Berkey B, Gaspar LE, Mehta M, Curran W. A new prognostic index and comparison to three other indices for patients with brain metastases: an analysis of 1,960 patients in the RTOG database. Int J Radiat Oncol Biol Phys. 2008;70(2):510–514.
13. Golden DW, Lamborn KR, McDermott MW, et al. Prognostic factors and grading systems for overall survival in patients treated with radiosurgery for brain metastases: variation by primary site. J Neurosurg. 2008;109(suppl):77–86.
14. Gaspar L, Scott C, Rotman M, et al. Recursive partitioning analysis (RPA) of prognostic factors in three Radiation Therapy Oncology Group (RTOG) brain metastases trials. Int J Radiat Oncol Biol Phys. 1997;37(4):745–751.
15. Djavan B, Remzi M, Zlotta A, Seitz C, Snow P, Marberger M. Novel artificial neural network for early detection of prostate cancer. J Clin Oncol. 2002;20(4):921–929.
16. Hansen LK, Salamon P. Neural network ensembles. IEEE Trans Pattern Anal Mach Intell. 1990;12(10):993–1001.
17. Ma C, Bandukwala S, Burman D, et al. Interconversion of three measures of performance status: an empirical analysis. Eur J Cancer. 2010;46(18):3175–3183.
18. Oken MM, Creech RH, Tormey DC, et al. Toxicity and response criteria of the Eastern Cooperative Oncology Group. Am J Clin Oncol. 1982;5(6):649–655.
19. Lagerwaard FJ, Levendag PC, Nowak PJ, Eijkenboom WM, Hanssens PE, Schmitz PI. Identification of prognostic factors in patients with brain metastases: a review of 1292 patients. Int J Radiat Oncol Biol Phys. 1999;43(4):795–803.
20. Segal ME, Goodman PH, Goldstein R, et al. The accuracy of artificial neural networks in predicting long-term outcome after traumatic brain injury. J Head Trauma Rehabil. 2006;21(4):298–314.
21. Pang BC, Kuralmani V, Joshi R, et al. Hybrid outcome prediction model for severe traumatic brain injury. J Neurotrauma. 2007;24(1):136–146.
22. Georgiadis P, Cavouras D, Kalatzis I, et al. Improving brain tumor characterization on MRI by probabilistic neural networks and non-linear transformation of textural features. Comput Methods Programs Biomed. 2008;89(1):24–32.
23. Petalidis LP, Oulas A, Backlund M, et al. Improved grading and survival prediction of human astrocytic brain tumors by artificial neural network analysis of gene expression microarray data. Mol Cancer Ther. 2008;7(5):1013–1024.
COMMENT

This article takes on the problem of how poor clinicians are at predicting survival in cancer patients. Understanding prognosis is an important issue in stratifying treatment strategies and looking at treatment benefits. This group goes about this task by using a neural network model, a model that is beneficial when you do not know what variables are really involved in the outcome and exactly how they are involved. The math behind this modeling can be quite complex. This group does a good job of explaining this model and how it can be clinically applied to the everyday practice of the oncologist. As a clinician constantly put in the position of deciding whether treatment of a patient’s disease would benefit the patient, I found this model very useful. I look forward to the day that such algorithms are publicly available online for physician use to help make difficult treatment decisions and give patients and families some real understanding of their outcome.