Machine Learning in Modeling High School Sport Concussion Symptom Resolve : Medicine & Science in Sports & Exercise



Machine Learning in Modeling High School Sport Concussion Symptom Resolve


Medicine & Science in Sports & Exercise 51(7):p 1362-1371, July 2019. | DOI: 10.1249/MSS.0000000000001903



Concussion prevalence in sport is well recognized; so too is the challenge of clinical and return-to-play management for an injury with an inherently indeterminate time course of resolve. Clear, valid insight into the anticipated resolution time could assist in planning treatment intervention.


This study implemented a supervised machine learning–based approach in modeling estimated symptom resolve time in high school athletes who incurred a concussion during sport activity.


We examined the efficacy of 10 classification algorithms using machine learning for the prediction of symptom resolution time (within 7, 14, or 28 d), with a data set representing 3 yr of concussions suffered by high school student-athletes in football (most concussion incidents) and other contact sports.


The most prevalent sport-related concussion reported symptom was headache (94.9%), followed by dizziness (74.3%) and difficulty concentrating (61.1%). For all three category thresholds of predicted symptom resolution time, single-factor ANOVA revealed statistically significant performance differences across the 10 classification models for all learners at the 95% confidence level (P < 0.001). Naïve Bayes and Random Forest with either 100 or 500 trees were the top-performing learners with an area under the receiver operating characteristic curve performance ranging between 0.656 and 0.742 (0.0–1.0 scale).


Considering the limitations of these data specific to symptom presentation and resolve, supervised machine learning demonstrated efficacy, while warranting further exploration, in developing symptom-based prediction models for practical estimation of sport-related concussion recovery in enhancing clinical decision support.

Concussion prevalence in sport is well recognized; so too is the challenge of clinical and return-to-play management for an injury with an inherently indeterminate time course of resolve (1–7). Although it has been commonly reported that most athletes generally recover from a sport-related concussion (SRC) in 7–10 d postinjury (8), a more prolonged recovery course is increasingly recognized and evident for many of those affected (4,9,10). Notably, the most recent Berlin consensus statement on concussion in sport notes that the term “persistent symptoms” after SRC should be used when symptoms linger beyond the normal clinical recovery time frames (i.e., >10–14 d in adults and >4 wk in children) (4). However, regardless of the criteria for prolonged recovery, clear and valid insight into the anticipated resolution time based on reported symptoms and other available relevant information could measurably assist in planning an individualized stratified care approach to medically managing SRC in young athletes.

Certain SRC symptoms are more characteristically prevalent at initial presentation (e.g., headache, fatigue, and dizziness), whereas others (e.g., sleep disturbance, frustration, and forgetfulness) typically develop later, with delayed or unclear timing of onset and/or reporting, and each has varying durations through the course of concussion recovery (9,10). Several notable factors have been reported to increase the risk for prolonged SRC symptoms in youth, including a history of recent or multiple concussions (1,11), although these findings are inconsistent (10,12). Young age has also been recognized as a modulating factor specific to reported symptoms and respective severity and duration, although the effect may be small (13). Some evidence indicates that girls report more symptoms and are purportedly more susceptible to a protracted recovery course than boys (14). However, other findings do not support this perspective (10).

Previous approaches have examined selected symptoms in an attempt to clarify an anticipated protracted SRC recovery. Lau et al. (15) noted dizziness at the time of injury, whereas Teel et al. (6) cited amnesia as predictive of markedly greater neurocognitive deficits and a slower symptom resolve trajectory after concussion in athletes. However, Meehan et al. (16) did not find sex, age, or amnesia to be associated with prolonged symptom duration, whereas initial symptom severity as indicated by total Post-Concussion Symptom Scale score was the most strongly predictive of prolonged symptom resolve. These findings specific to initial symptom severity are consistent with the most recent Berlin consensus statement (4). Nonetheless, there remains a need for better valid prediction models for poor and/or protracted SRC outcome.

With the complexity and consequent challenge in integrating and interpreting a comprehensive neurological and functional approach to concussion assessment and optimally guiding informed clinical management, advances in technology supported by augmented intelligence and supervised machine learning could provide a distinct practical advantage to the practitioner. Bittencourt et al. (17) challenge sports injury researchers and clinicians to extend beyond traditional statistics in adopting a nonlinear complexity paradigm and systems approach to more appropriately address multifaceted and complicated human health conditions. Accordingly, the sport injury domain may indeed be a responsive proving ground and beneficiary via machine learning. Kampakis (18) reported on a preliminary classification examination using machine learning tools to predict injury recovery time in professional football (soccer) players. However, this study used only three learning methods and had very limited data available regarding the circumstances of the injuries, signs and symptoms, and treatment. Moreover, recovery time in this study was based on how long the athlete sat out—not necessarily signs and/or symptom resolve. Recent other examples of using machine learning techniques and predictive modeling have emerged in addressing hamstring and other lower-extremity muscle injuries (19,20). More specific to concussions, machine learning has been used for screening detection. Falcone et al. (21) used support vector machine (SVM) to successfully detect (with high prediction accuracy) the occurrence of a concussion based on isolated vowel sounds pulled from speech recordings. Dabek and Caban (22) also used SVM and validated their model in predicting with high accuracy the likelihood of military service members developing posttraumatic stress disorder after concussion. 
The advent of using machine learning in addressing the complexity of various human health challenges is only recently underway, but the advantages in more aptly and concomitantly considering interrelated multidomain data reflecting a real-world systems biology are increasingly being demonstrated and realized.

Examining data from an established national student-athlete injury database, our study represents a novel pursuit on the practical utility of supervised machine learning techniques for the prediction of SRC symptom resolution time. Our aims were to identify the highest-performing learners and to externally validate the resulting new concussion symptom resolve prediction models based on these data, specific to three recovery category thresholds (7, 14, and 28 d). We anticipated that the models would perform well with a prioritized utility of features that are clinically consistent with previously established indicators of protracted concussion resolve in sport, while also concomitantly providing practical insights into using a viable prediction model–based strategy in managing this complex injury. Demonstrated efficacy in this initial exploration and modern approach would help clarify and advance the expanded application and utility of machine learning as an effective and complementary instrument in sports medicine practice to aid in SRC clinical decision support. In particular, the projected utility and high value of these preliminary machine learning–based models in clinical decision support for managing SRC set the stage for practical development and implementation of readily usable tools in the clinic that will be reinforced with much larger integrated databases that continually expand in breadth and depth, along with corresponding advanced analytics.


National Athletic Treatment, Injury and Outcomes Network data

The data used for this experiment were from the National Athletic Treatment, Injury and Outcomes Network (NATION) injury surveillance program on high school student-athletes between 2011 and 2014 (23), which contained injury data collected over the course of three academic years from 147 high schools in 26 states. These original data included details on 2004 concussion incidents in 22 sports. The NATION project was previously reviewed by the Western Institutional Review Board (Puyallup, WA) and deemed exempt from human subject protections review. Broadly, the data could be reduced to three fundamental groupings of information, all recorded by athletic trainers (AT) in respective electronic medical records, then deidentified and sent to a database using a common data element standard. First, there were circumstances of the injury, including details such as the playing surface on which the incident occurred and the activity of the player at the time of injury. Second, there was nonidentifiable information about the injured student-athlete, including sex and class year. Lastly, there were details on the SRC injury, including whether certain symptoms were present, as well as the amount of time until all symptoms reportedly resolved and time elapsed (number of days) for each affected athlete before returning to scheduled team activities (even if with limitations and/or accommodations) in his or her respective sport. All these data were included in all models. Comparing athlete and injury ID, we were able to deduce the number of concussions each affected student-athlete suffered previously (only within the boundaries of these 3 yr of data collection). However, the data entry for each incident was not always definitive in distinguishing recurrent symptoms from a new injury; thus, this information was not included in the models.

For each SRC incident, symptoms were recorded from verbal reporting by the student-athlete to the AT or responses to an administered 17-item yes/no checklist, based on the National Collegiate Athletic Association Injury Surveillance Program (7). Symptoms could be amended after the initial concussion injury data were recorded, so that an individual’s SRC symptom profile included the aggregate of initial symptoms and any delayed-onset symptoms that presented during the recovery period.

From the original data, we constructed a new data set based on the sport in which the injury occurred. Because more than half of the injuries occurred during (American) football, our new data set included concussive injuries in football, as well as those from all other contact sports (i.e., wrestling and field hockey, as well as boys’ and girls’ basketball, soccer, and lacrosse) as this is where most of the rest of the concussion incidents occurred. We anticipated that focusing the analyses in this way (contact sports only) would improve model performance by minimizing variability in contributing sport characteristics. Moreover, these sports are generally consistent with an accepted definition of contact sports to include those sports that emphasize or require physical contact between participating players, or in which regular physical contact between athletes is an accepted part of play (24).

Each injury contained a categorical variable that indicated the amount of elapsed time until all SRC symptoms were resolved. This was used to create our model classification labels. The possible values for symptom resolution time included “within 1 min,” “within 15 min,” “within 1 h,” “within 1 d,” “within 3 d,” “within 7 d,” “within 14 d,” “within 28 d,” “more than 28 d,” or “did not resolve.” For our data set, we assigned class labels for three distinct category thresholds of symptom resolution time. These category thresholds were chosen based on the distribution of our data, as well as those findings from other research on the general distribution of concussion symptom resolution time. Multiple studies have shown that most SRC symptoms typically resolve within 1 wk (7,25); however, our aggregate contact sports data had a higher frequency of injuries (866) in which symptom resolve took longer than 1 wk (i.e., SRC symptom resolve time recorded as “within 14 d” or longer). The specific distribution was within 7 d (745 instances), 8–13 d (391 instances), and 14 or more days (475 instances).

Accordingly, we examined symptom resolve category thresholds of 7 and 14 d. For the 7-d threshold, the positive class (class of interest) was defined as any SRC injury with a symptom resolve time of “within 7 d” or greater, whereas the negative class encompassed all instances with symptom resolve time inclusive of and between “within 1 min” and “within 3 d.” The data were split in a similar fashion for the 14-d threshold with the positive class including those SRC incidents with a recorded resolve time of “within 14 d” or greater. Besides the practical clinical relevance of these recovery times, the 14-d threshold facilitated splitting the data near evenly between the two classes, which can be beneficial in building more robust machine learning models. Otherwise, with measurably disproportionate class imbalance, the models would tend to be biased toward the higher-populated class. In addition, we looked at a 28-d threshold to specifically account for more protracted concussion symptom resolve time. For this category threshold, the positive class contained SRC instances with a recorded resolve time of “within 28 d,” “more than 28 d,” or “did not resolve,” whereas the negative class contained all other concussive injuries in our data set.
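The class assignment described above can be sketched in a few lines. This is a minimal illustration, assuming the ten recorded categories are ordered from shortest to longest as listed in the text; the names `CATEGORIES`, `binary_label`, and `label_all` are hypothetical and do not come from the study's tooling.

```python
# Recorded resolution-time categories, ordered shortest to longest as in the text.
CATEGORIES = [
    "within 1 min", "within 15 min", "within 1 h", "within 1 d",
    "within 3 d", "within 7 d", "within 14 d", "within 28 d",
    "more than 28 d", "did not resolve",
]
RANK = {name: position for position, name in enumerate(CATEGORIES)}


def binary_label(resolution: str, threshold: str) -> int:
    """Return 1 (positive class) when the recorded category is at or beyond
    the threshold category, and 0 (negative class) otherwise."""
    return int(RANK[resolution] >= RANK[threshold])


def label_all(resolutions, threshold):
    """Label a sequence of recorded categories against one threshold."""
    return [binary_label(r, threshold) for r in resolutions]
```

For example, `binary_label("within 3 d", "within 7 d")` gives the negative class, whereas `binary_label("did not resolve", "within 28 d")` gives the positive class, matching the definitions in the paragraph above.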

After discarding instances in which the symptom resolution time was unknown, we included and examined a total of 922 football concussions and 689 concussions from other contact sports (totaling 1611 concussion incidents from all contact sports).

Descriptive statistics

The distribution of the total number of reported symptoms (0–17), indicated by each respective percentage and number of SRC incidents, was determined for the data set. The prevalence of specific reported symptoms and the percentage of total reported concussions for each respective symptom were also determined. With the predominance of categorical data, we used the Pearson chi-square test to determine whether significant relationships existed between those various features.
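For readers less familiar with the Pearson chi-square test, the following is a minimal sketch of the statistic for a contingency table of counts (for example, sex by presence of a symptom). The function names and the example counts are hypothetical; the closed-form P value shown applies only to 1 degree of freedom (a 2 × 2 table), and the sketch assumes no row or column total is zero.

```python
from math import erfc, sqrt


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c
    table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail P value of the chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))


# Hypothetical counts, for illustration only (not study data).
example = [[30, 10], [20, 40]]
stat, dof = chi_square_statistic(example)
p = p_value_one_dof(stat)
```

Tables with more than two rows or columns need the general chi-square distribution, which a statistics package would normally supply.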


With no available (in the literature or from our own experience) SRC model or otherwise relevant precedent to guide specific classifier selection, 10 candidate classification algorithms were used in this study to determine the optimal learners for our data. The classifiers we selected included logistic regression, Naïve Bayes (NB), SVM, 5-nearest neighbors (5NN), C4.5 Decision Tree (C4.5D and C4.5N), Random Forest (RF100 and RF500), multilayer perceptron, and radial basis function network. All classifiers were trained using the data mining tool Weka (26). Detailed descriptions to contrast these algorithms can be found in the Appendix (see Text, Supplemental Digital Content 1, where details on each of these classification algorithms are provided).

Experimental design and performance metric

Each model was built using 10-fold cross validation. We randomly divided the data into 10 equal segments, using 9 of these segments to train the model and the remaining segment for testing. With our data set, each segment had effectively 161 instances (1/10 of the total number of SRC incidents for this data set). This procedure was repeated 10 times, in which a different segment was used as the test set in each iteration, and the results were combined to build the final model. For each learner, this entire process was repeated 10 times to reduce bias, as a result of the data being split differently each time, and to ensure replicability as well as determine the overall model performance. In total, 300 models were built (10 learners × 3 thresholds × 10 runs = 300 models).
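The splitting scheme can be illustrated with a short sketch. It is not the Weka procedure used in the study; it only shows how 10 repetitions of 10-fold cross validation partition 1611 instances, assuming a simple shuffled, unstratified split. The function names and seeds are hypothetical.

```python
import random


def kfold_indices(n, k, seed):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_kfold(n, k=10, repeats=10, base_seed=0):
    """Yield (run, fold, train_indices, test_indices) for every fold of every
    run; each run reshuffles the data with a different seed."""
    for run in range(repeats):
        folds = kfold_indices(n, k, base_seed + run)
        for fold_number, test in enumerate(folds):
            train = [i for other, fold in enumerate(folds)
                     if other != fold_number for i in fold]
            yield run, fold_number, train, test


splits = list(repeated_kfold(1611))
```

With 1611 instances, one fold per run holds 162 instances and the other nine hold 161, which is consistent with the "effectively 161 instances" per segment noted above.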

To measure model performance and graphically represent the trade-off between sensitivity and specificity, we used the area under the receiver operating characteristic (ROC) curve (AUC). The ROC curve plots the true positive rate (symptom resolution time classified correctly) on the y-axis, that is, the proportion of concussion incidents where symptom resolve occurred within or took longer than the predicted category threshold. Specificity represents the true negative rate; however, the ROC curve uses the false-positive rate, calculated as 1 − specificity. False positives (i.e., the proportion of concussions where symptoms resolved sooner than the predicted category threshold period and were thus misclassified) are on the x-axis. The area under this curve is a value between 0 and 1, with 1 indicating perfect classification performance and 0.5 indicating performance no better than chance.
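As a worked illustration of the metric, AUC can be computed without drawing the curve: it equals the probability that a randomly chosen positive instance receives a higher classifier score than a randomly chosen negative instance, with ties counted as one half. The sketch below uses that pairwise formulation; the function name and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    instance has the higher score; tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores, for illustration only.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of instances, which is acceptable for a data set of this size; rank-based formulas give the same value more efficiently.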

To examine performance differences across the classification models for all learners, with each SRC symptom resolution category threshold, a single-factor (1 × 10) ANOVA at the 95% confidence level, with choice of learner as the factor, was performed. We then examined the overall difference between each of the 10 learners, using the Tukey honest significant difference (HSD) test, comparing the performance mean of the 10 models for each learner.
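The single-factor ANOVA reduces to a ratio of between-group to within-group variance. The following minimal sketch computes the F statistic and its degrees of freedom; in a design like the one described, each group would presumably hold the 10 AUC values for one learner, giving 9 and 90 degrees of freedom. The function name and example numbers are hypothetical, the sketch assumes the within-group variance is not zero, and the Tukey HSD step is not shown because it requires the studentized range distribution from a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values, for illustration only (not study data).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```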

Feature ranking

To better appreciate the predictive value of each SRC symptom (feature), we applied three commonly used feature selection methods to our data set at all three prediction thresholds to examine the strength of the respective associations to each prediction model. Each feature was ranked according to its predictive value in determining the classification decision. The three feature ranking methods used were chi-square, information gain, and gain ratio (see Text, Supplemental Digital Content 1, where details on these feature ranking processes are provided).

The chi-square ranker used the chi-square statistic to measure the importance of each feature. A higher chi-square value indicated dependence between a feature and the class. Information gain compared the entropy of the full data set with that of the data set without the feature of interest. Because information gain could have been biased toward those features with large numbers of values, we also examined gain ratio, which normalized the information gain result by dividing it by the information value of the feature of interest.
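The two entropy-based rankers can be sketched as follows, using the standard textbook definitions: information gain is the class entropy minus the class entropy conditional on the feature, and gain ratio divides that gain by the entropy of the feature's own value distribution. This is an illustrative sketch under those assumed definitions, not the configuration used in the study; the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Class entropy minus the class entropy remaining after splitting on the feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself;
    returns 0.0 for a feature that takes a single value."""
    split_information = entropy(feature)
    if split_information == 0:
        return 0.0
    return information_gain(feature, labels) / split_information


# Toy data: a yes/no symptom that perfectly separates the two classes.
symptom = [1, 1, 0, 0]
outcome = [1, 1, 0, 0]
ig = information_gain(symptom, outcome)
gr = gain_ratio(symptom, outcome)
```

A feature with many distinct values tends to have a large split information, so dividing by it tempers the bias of information gain toward such features.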



For our data set of all contact sports, the total number of symptoms reported per SRC incident ranged from 0 to 17, with 55.0% of the student-athletes reporting five or more symptoms (Fig. 1). The most prevalent reported SRC symptom was headache (94.9%), followed by dizziness (74.3%) and difficulty concentrating (61.1%). Each of the remaining symptoms (or no symptoms) was reported far less frequently (Fig. 2).

Figure 1. Distribution of the total number of reported symptoms after the SRC incidents sustained by these student-athletes representing all contact sports. Percentage of total reported concussions is indicated on the y-axis, with the total number of concussion incidents indicated for each total number of symptoms bar along the x-axis. Adapted from O’Connor et al. (5).
Figure 2. Prevalence of symptoms reported after the SRC incidents sustained by these student-athletes representing all contact sports, with the percentage of total reported concussions for each respective symptom indicated on the x-axis (and labeled at each bar). Adapted from O’Connor et al. (5).

The Pearson chi-square test revealed a significant sex dependency in reported prevalence (~2 times greater in boys) of disorientation (P = 0.0005) and retrograde (P = 0.024) and posttraumatic amnesia (P = 0.002). The relationship between sex and symptom resolution time was statistically significant for the 14- and 28-d categorical thresholds (P = 0.005 and P = 0.017, respectively), but not at the 7-d threshold (P = 0.131). Class year was significantly associated with headache (P = 0.002) and loss of consciousness (P = 0.039), but no other symptoms. However, there was no statistically significant association between class year and any of the three SRC symptom resolution categorical thresholds (P = 0.075–0.082).

Symptom resolve

Table 1 provides details on the distribution of SRC instances across each of the three symptom resolve category thresholds. Instances were considered part of the positive class if the symptom resolution time was within or greater than the category threshold, and negative if it was less than the threshold period. This format was dictated by how the symptom resolve data were originally entered specific to a respective category (vs a numeric value representing a specific number of days from each incident). Notably, when symptom resolve was 7 d or less, there was generally an additional 8–11 d before returning to play, whereas with a recovery period of 14 or 28 d, the student-athletes tended to return right after their symptoms were no longer present.

Table 1. Distribution of SRC positive and negative class instances (and % of 1611 total SRC incidents) of reported symptom resolution time for each symptom resolve category threshold used in developing the prediction modeling.


The classification performance results for all contact sports are shown in Table 2. Tukey HSD results across all learners and each categorical threshold of predicted SRC symptom resolution time can be seen in Figure 3 for all three data subsets.

Table 2. Classification performance (AUC) results for each of the 10 learners.
Figure 3. Tukey HSD test results for each concussion recovery (symptom resolve) prediction threshold in all contact sports. Data points for each of the 10 classification learners are plotted along the x-axis, based on mean learner performance (0.0–1.0) with each respective confidence interval indicated. Those learners’ confidence intervals that are completely within the shaded area indicate the learners in the respective model that share some statistical similarity (overlap) among one or more of the three top-performing learners and are statistically different than the rest (unshaded or not completely shaded learners).

For all learners, the strongest performances were with the 7-d symptom resolution threshold. This was followed by the 14-d threshold for all learners except for 5NN, whose performance with the 28-d predicted symptom resolution was slightly better than for the 14-d resolution category. The strongest overall prediction model was NB with an AUC of 0.727 when predicting a 7-d symptom resolution time.

For all three category thresholds of predicted symptom resolution time, single-factor ANOVA revealed statistically significant performance differences across the 10 classification models for all learners at the 95% confidence level (P < 0.001). With each of the category thresholds for symptom resolve, NB and both versions of Random Forest were the top performers. However, the Tukey HSD tests generally indicated a statistically similar between-learner performance among these top-performing learners (Fig. 3).

For the 7-d symptom resolution threshold (Fig. 3A), NB was the strongest learner, followed by RF500 and then RF100. There was a statistically similar between-learner performance between RF500 and both others, but NB performed significantly better than RF100. For the 14-d threshold for symptom resolution (Fig. 3B), there was no statistically significant difference between any of the top-performing three learners, but all three were significantly stronger than any of the others. With the 28-d threshold for symptom resolve, RF500 was the top performer, but there was no statistically significant difference between it and the other two top performers. Furthermore, there was a statistically similar between-learner performance between RF100, NB, and 5NN (Fig. 3C).

To explore whether the models could be improved, we narrowed the concussion incidents data set and these analyses to football alone, which generally produced only slightly better performance results among each of the learners (Table 2).

Priority features

Among the three prediction categories for symptom resolve in contact sports, all three feature selection methods identified the same top five symptoms, as well as number of symptoms, as the priority features. The ranked order differed depending on the category threshold and the ranking method, but in each case, the symptoms with the most predictive value were identified as difficulty concentrating, sensitivity to noise, sensitivity to light, balance issues, and insomnia. The number of symptoms ranked ahead of any single symptom with chi-square or information gain and ranked among the top five features with gain ratio. Partial results from identification by these three feature selection methods (chi-square, information gain, and gain ratio rankers) of all priority features are shown in Table 3. Importantly, a high predictive value as determined by these rankers does not necessarily indicate that the respective specific feature was used in all (or any) of the models.

Table 3. Partial results from the identification of priority features by the feature selection methods (chi-square, information gain, and gain ratio rankers).

Regarding symptoms with the least predictive value, the rankings varied, but in general, amnesia, loss of consciousness, tinnitus, and hyperexcitability consistently ranked near the bottom of every list. However, these specific symptoms had the fewest reported instances in the data (see Fig. 2).


We examined the efficacy of 10 classification algorithms (learners) across three different recovery category thresholds (7, 14, and 28 d) for predicting SRC symptom resolve using supervised machine learning. With no SRC machine learning prediction model reported in the literature or otherwise relevant precedent to guide specific classifier selection, it was essential to consider more than one algorithm (or even several) based on our own experience with machine learning and sport injury data. With this initial exploratory approach, we achieved our aim by determining the best-performing classifiers and cross validating our models with our current data set. These classifiers could also serve as preferred initial selections for future model development with similar data. In general, the 7-d classification was more robust (learner performances closer to 1.00) than the other two, but there was not always a large model performance difference between the symptom resolution time categories. NB, RF100, and RF500 were the top-performing learning algorithms, as indicated by a moderate AUC performance. Overall, these initial findings uphold (while warranting further exploration) using supervised machine learning in developing symptom-based prediction models for practical estimation of SRC recovery (based on symptom resolve) to enhance clinical decision support.

The prevalence of concussions in our data set that took longer than 2 wk for symptom resolve contrasts with the traditional 7–10 d suggested recovery time for most SRC (8). Although the Berlin consensus statement (4) and others (27) emphasize that there is no typical window of recovery, it is increasingly recognized that many affected young athletes will experience a protracted recovery (9,10). Interestingly, our data also indicated an apparent sense of urgency to return to play when there was a protracted recovery period. While recognizing that some student-athletes returned to play with restrictions and/or limitations, an accelerated resumption generally conflicts with the best-practice standard of care specific to returning only after a symptom-free waiting period, followed by an appropriate graduated return-to-play process (4,28,29).

The prioritized utility of features (indicated by ranked order) in our analysis was consistent with our hypothesis and in the context of other related research specific to those recognized symptoms aligned with a protracted SRC recovery. Focusing on information gain in determining classification performance, we also found total number of symptoms (16), sensitivity to noise or light, difficulty concentrating, insomnia, and balance issues to have priority predictive value, indicating their likely important contributing role and utility in our models. By contrast, we did not find amnesia, hyperexcitability, loss of consciousness, or tinnitus to be relevant candidates for measurably facilitating the top-performing models. However, specific low rankings might have been a function of the limited number of reported instances of those respective presented symptoms. Moreover, other classifiers might work better for another data set, and a single classifier may ultimately prove to be the optimal choice for a valid predictive model when more robust and varied domain data are included.

Further narrowing the data and analysis to football alone generally produced slightly better performance results among each of the learners, compared with the likely already added gain when all contact sports were considered collectively. In this instance, this may have been because the football-only data, and accordingly many of the respective recorded concussion incident characteristics (e.g., activity at the time of injury), were comparatively more homogeneous and narrow in scope. With the relatively limited size of either data set (football only or all contact sports collectively), too many possible entries in any one field reduce the extent of creating learning scenarios with the respective field value held constant, thus limiting the “education” for each learner and the likelihood for that feature to have a significant impact on the final model. The performance of the learners would have likely been similarly improved by examining from our original data any other single sport by itself (had there been enough concussion incidents reported for that sport). Accordingly, with larger data sets, single-sport analysis and model development should be considered to enhance these learner and model performances and utility of any secondary clinical application tools.

Machine learning in SRC symptom resolve modeling

Machine learning is best used and effectual in predictive modeling when there are numerous observations and a concomitant wide array of high-value (contributing) attributes; that is, the useful data are extensive and multi (high)-dimensional (even if unstructured). Thus, it is not surprising that, in our analysis, the lowest ranked symptoms were the ones that were reportedly presented the least frequently. Therefore, these features may be more predictive than apparent with just these data.

A greater number and multifaceted scope of contributing predictor variables more closely reflects the integrated complexity of the systems biology under consideration when examining SRC. With generally no prior assumptions about distribution or any underlying relationships between these variables, machine learning looks for significant patterns in the data. Of course, as the extent of information under consideration includes more data that are not high- or even moderate-value contributors, a proportional degree of more sophisticated and informed dimensionality reduction is often required to eliminate the “noise” before building the desired model(s).

The longstanding additional recognized advantage of machine learning is that the models can synthesize and inductively “learn” from the relevant new information provided by the ongoing use of their application software platform and automatically update the relevant features, algorithms, analyses, and decisions accordingly in real time (30). Thus, the effective decision support utility of our SRC symptom resolve top-performing models for all contact sports will improve as the models are applied and enriched with more cases and pertinent information, resulting in more robust prediction performance.

Clinical relevance and application

This novel application of supervised machine learning to sport concussion epidemiology is an important step in advancing our approach in clinically managing this complex condition. Supervised machine learning has the potential to more effectively reveal meaningful patterns and potentially unique vital insights into the complex interdependent array of clinical determinants in anticipating concussion symptom resolve and myriad other aspects of concussion management and recovery. It can also be effective in data reduction and highlighting priority indicators (in this case, symptoms) when overseeing affected patients.

The practical clinical utility of this supervised machine learning approach in SRC extends immeasurably beyond simply identifying specific discrete symptoms (or other factors) that are predictive. The application of the priority features and our top-performing learners identified here in clinical practice begins with recognizing those individual indicators with high predictive value that would alert clinicians to an increased likelihood of protracted concussion recovery. However, the clinical relevance and the practical value added will be more fully realized in the technology transfer specific to developing readily usable tools (accessible via a handheld device and app, for example) that a health care provider could use in real time in the clinic. Embedded in application software using our top-performing classifiers, the respective models could then be used to assist in considering all presenting symptoms and other contributing measures (features) in combination to estimate a uniquely individualized anticipated concussion recovery period. We also appreciate that this is an early stage of introducing machine learning in SRC predictive modeling with a limited set of retrospectively analyzed records of concussion incidents. Thus, using an aggregate of top-performing classifiers in practice is a more appropriate, conservative approach for clinical decision support than prematurely focusing on just one learner and model.
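One simple way to aggregate several classifiers is an unweighted “soft vote,” in which the positive-class probabilities are averaged and the level of agreement is reported alongside the result. The sketch below assumes this particular aggregation rule, which the study does not specify; the probabilities shown for NB, RF100, and RF500 are hypothetical values for a single patient.

```python
from statistics import mean


def aggregate_probability(probabilities):
    """Unweighted soft vote: mean of the classifiers' positive-class probabilities."""
    return mean(probabilities.values())


def aggregate_decision(probabilities, threshold=0.5):
    """Combine classifier outputs and report how many agree on the positive class."""
    p = aggregate_probability(probabilities)
    votes = sum(1 for v in probabilities.values() if v >= threshold)
    return {
        "probability": p,
        "positive": p >= threshold,
        "votes_for_positive": votes,
        "n_classifiers": len(probabilities),
    }


# Hypothetical model outputs for one patient (not results from this study).
patient = {"NB": 0.71, "RF100": 0.64, "RF500": 0.58}
result = aggregate_decision(patient)
```

Reporting the vote count with the averaged probability lets a clinician see whether the learners agree, which is the conservative behavior the aggregate is meant to provide.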

Immediately after a concussive injury, it is especially important to identify those patients who are likely to have a protracted recovery time. Having advance insight into anticipated symptom resolve duration from these validated prediction models would thus augment a stratified approach to case management and patient care. This same advance clarity on the time course of anticipated concussion resolve could also be valuable in stratifying SRC patients in clinical trials. A narrower distribution of recovery time could reduce the inherent variability in selected outcome measures and thus the number of subjects needed to achieve adequate power. Accordingly, emphasizing high-performing learners with high sensitivity (i.e., minimizing false negatives) is arguably a greater priority than placing undue weight on classification specificity (i.e., minimizing false positives). Notably, however, more data to enhance the models would lessen the need to accommodate this trade-off.
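The sensitivity and specificity trade-off can be made concrete by computing both measures at two decision thresholds. The scores and labels below are hypothetical (label 1 = protracted resolve), and the thresholds are chosen only for illustration.

```python
def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities and true outcomes.
scores = [0.9, 0.8, 0.55, 0.45, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

default_sens, default_spec = sensitivity_specificity(scores, labels, 0.5)
lowered_sens, lowered_spec = sensitivity_specificity(scores, labels, 0.3)
```

Lowering the threshold in this toy example captures every protracted case (fewer false negatives) at the cost of more false positives, which is the direction of the trade-off favored in the text.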

Defining the positive class in the way we did in this analysis practically translates to clinical decision support by indicating to the health care provider that, based on the postconcussion symptoms presented and other data considered in our models, symptom resolve is expected to take “at least this long” specific to each patient’s assigned category threshold. With an anticipated extended disability, a more aggressive and more intensive intervention would be warranted (and potentially cost-effective) versus a traditional “stepwise” approach in which treatment is progressively administered based on the patient’s failure to respond to more conservative interventions first. Such insight would also assist with planning academic accommodations and informing the coaching staff and other stakeholders who must consider personnel needs projections and manage practices and game preparation. Moreover, if framed correctly, this advance insight can mitigate the nocebo effect (31), that is, an unintended prompting of negative symptom perception by suggestion of negative expectations, such as when the affected student-athlete might feel “different” or that he or she has a more “severe concussion” if symptoms do not promptly resolve.

Although our findings are encouraging in this simple initial examination, it is important to broaden the multifactorial scope of the models to include genomics, promising biomarkers, changes in autoregulatory control, and relevant pre- and comorbid factors such as a preexisting diagnosis of, or tendency toward, anxiety disorder, obsessive compulsivity, and other mood- or personality-related factors. Details on the biokinetics of the injury and head impact and participation exposure, as well as postinjury changes in objective assessment technologies, such as quantitative electroencephalography, would be very helpful as well. The aggregate (or subset) of these and other relevant measures of high-quality data would arguably improve our preliminary SRC recovery prediction models and possibly reveal more optimal learners. The practical value added would be analogous to what a multidimensional assessment battery provides in the acute SRC assessment environment (32).

Although SRC symptoms are indeed indicators of the changing integrated combination of physiological, cognitive, and psychological responses, as well as preinjury susceptibility for the expression of these manifestations, symptom resolve (clinical recovery) alone does not equate to complete neurobiological recovery (27). Moreover, applying such a multifactorial machine learning approach longitudinally to concomitantly consider the temporal profile of an array of fluctuating biomarkers and changes in cognitive and other functional indicators could also be more sensitive in revealing ongoing concussion-related deficits or even an early onset of a resultant chronic (possibly neurodegenerative) pathology from repeated head impact exposure. These examples and myriad other practical arguments underscore why a machine learning and multidimensional predictive model strategy that can consider far more information and seamlessly adapt to dynamic scenarios in a complex systems approach is better suited to provide practical clinical utility and decision support.


Overcoming several key shortcomings of these collected data could readily improve the prediction models. For example, in lieu of local institution or state guidelines, the AT managing and reporting the concussive injuries were encouraged to follow the international definition of concussion provided by the 4th International Consensus Statement on Concussion in Sport (33). However, there was no specific instruction provided or recorded documentation to further characterize each “concussion,” that is, whether diagnosis of concussion was definite, probable, or possible. This distinction would have likely influenced clinical management in many instances and improved our modeling. Moreover, reported symptoms were updated throughout each student-athlete’s recovery period without noting elapsed time since the respective SRC incident. Although this allowed for capturing delayed onset of one or more symptoms, the clinical value of an application (resulting from this current research) when used promptly and based on initial symptoms presentation would be somewhat compromised by our models that were built on all reported symptoms with no consideration for concomitant time since the incident. Capturing symptoms presented in the first few days, and then including when in the recovery period any additional symptoms were reported, would have been more ideal for our objective. Nonetheless, we were able to achieve reasonable model performance even with this prevailing limitation. Concussion recovery (symptom resolve) being a pre-defined categorical variable (vs a specific number of days) limited the analyses and development of our models, including potentially determining more appropriate prediction thresholds and precisely examining the potential effects of returning to sport prematurely. 
Similarly, with concussion history, it was challenging to precisely differentiate new concussive injuries from recurrent incidents, that is, when symptoms presented from a previous concussion in the same or previous academic year. Consequently, the contributing role of concussion history in the models could not be confidently determined, and it was not included. Lastly, more data (i.e., concussion incidents and potential contributing factors) would have expanded the groupings and strengthened the models.


We presented a supervised machine learning approach for the prediction of SRC symptom resolution time. From these initial findings, conclusions and key recommendations include the following:

  • NB performed consistently well (based on AUC); however, both versions of Random Forest were also notable standouts. Moreover, with no significant difference between RF100 and RF500 in predicting SRC symptom resolve with any of these data, we recommend a priority focus on using 100 trees. The more efficient use of computing resources with this version of Random Forest could be a measurable time- and cost-saving advantage with Big Data.
  • With total number of symptoms, sensitivity to noise or light, difficulty concentrating, insomnia, and balance issues having priority predictive value, indicating their likely important contributing role and utility in our models, clinical application begins with recognizing these individual indicators as advance insight into a greater likelihood of protracted concussion recovery.
  • We recommend applying these machine learning approaches to single-sport data sets when possible, to enhance learner and model performance so that the resultant clinical decision support tools can be more specific with respect to practical utility.
  • To strengthen and expand the generalization of the prediction models for SRC symptom resolve, future research should use additional machine learning techniques, such as gradient boosting or other robust ensemble methods, or reformulate the concussion resolution prediction as a multiclass classification problem using all thresholds in one model.
  • However, a validated and reliable prediction tool for SRC symptom resolve, even if based only on these initial findings and recognized indicators of likely protracted recovery, would have appreciable utility in guiding stratified case management and treatment decisions, which could result in better outcomes for the affected patients.
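The multiclass reformulation recommended above can be sketched by collapsing the three per-threshold binary labels (7, 14, and 28 d) into a single ordinal class. The sketch assumes that a positive label means symptom resolve took at least as long as the threshold; the exact boundary convention and the function names are assumptions made for illustration.

```python
THRESHOLDS = (7, 14, 28)  # days, matching the three binary models


def ordinal_class(binary_labels):
    """Collapse per-threshold binary labels into one ordinal class.

    `binary_labels` maps threshold -> 1 if resolve took at least that long
    (assumed definition). The class is the number of thresholds reached:
    0 (under 7 d), 1 (7 to under 14 d), 2 (14 to under 28 d), 3 (28 d or more).
    """
    flags = [binary_labels[t] for t in THRESHOLDS]
    # Labels must be monotone: reaching 28 d implies reaching 14 d and 7 d.
    if any(later > earlier for earlier, later in zip(flags, flags[1:])):
        raise ValueError("inconsistent threshold labels")
    return sum(flags)


def binary_labels_from_class(k):
    """Recover the three binary labels from an ordinal class 0..3."""
    return {t: int(i < k) for i, t in enumerate(THRESHOLDS)}


example_class = ordinal_class({7: 1, 14: 1, 28: 0})
round_trip = [ordinal_class(binary_labels_from_class(k)) for k in range(4)]
```

A single multiclass learner trained on this target would replace the three separate binary models, and each binary prediction remains recoverable from the predicted class.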

The ongoing and rapidly accelerating development of technology and availability of data from myriad sources (e.g., wearable sensors and complementary intelligent devices, biomarkers, advanced imaging, and an array of neuropsychological, neuromotor, and cognitive functional testing options) are increasingly publicized to providers and their patients. Not surprisingly, the scope of information available today already extends far beyond one’s unassisted integrative and informed interpretative ability. For health care providers to fully embrace this new opportunity in considering the real-world systems biology of each patient, advanced data mining and analytics and machine learning modeling will imminently be recognized as essential, precisely because of the breadth and complexity of the numerous integrated factors affecting and characterizing the patient in real time. We are introducing a modern approach and new tool in clinical SRC management, a practice that will measurably improve with more, and more inclusive, data.

The authors thank the Datalys Center for Sports Injury Research and Prevention for providing the NATION data for us to conduct this research and examine SRC in a new way. This study would not have been possible without the assistance of the many high school AT who participated in the original data collection. The NATION project was funded by the National Athletic Trainers’ Association Research and Education Foundation and Central Indiana Corporate Partnership Foundation in cooperation with BioCrossroads. The content of this report is solely the responsibility of the authors and does not necessarily reflect the views of any of the funding organizations.

The authors declare that the results of this study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. The results of the present study do not constitute endorsement by the American College of Sports Medicine.


1. Eisenberg MA, Andrea J, Meehan W, Mannix R. Time interval between concussions and symptom duration. Pediatrics. 2013;132(1):8–17.
2. Haider MN, Leddy JJ, Pavlesen S, et al. A systematic review of criteria used to define recovery from sport-related concussion in youth athletes. Br J Sports Med. 2018;52(18):1179–90.
3. Leddy J, Baker JG, Haider MN, Hinds A, Willer B. A physiological approach to prolonged recovery from sport-related concussion. J Athl Train. 2017;52(3):299–308.
4. McCrory P, Meeuwisse W, Dvorak J, et al. Consensus statement on concussion in sport: the 5th International Conference on Concussion in Sport held in Berlin, October 2016. Br J Sports Med. 2017;51(11):838–47.
5. O’Connor KL, Baker MM, Dalton SL, Dompier TP, Broglio SP, Kerr ZY. Epidemiology of sport-related concussions in high school athletes: National Athletic Treatment, Injury and Outcomes Network (NATION), 2011–2012 through 2013–2014. J Athl Train. 2017;52(3):175–85.
6. Teel EF, Marshall SW, Shankar V, McCrea M, Guskiewicz KM. Predicting recovery patterns after sport-related concussion. J Athl Train. 2017;52(3):288–98.
7. Wasserman EB, Kerr ZY, Zuckerman SL, Covassin T. Epidemiology of sports-related concussions in National Collegiate Athletic Association Athletes from 2009–2010 to 2013–2014: symptom prevalence, symptom resolution time, and return-to-play time. Am J Sports Med. 2016;44(1):226–33.
8. Nelson LD, Janecek JK, McCrea MA. Acute clinical recovery from sport-related concussion. Neuropsychol Rev. 2013;23(4):285–99.
9. Eisenberg MA, Meehan WP 3rd, Mannix R. Duration and course of post-concussive symptoms. Pediatrics. 2014;133(6):999–1006.
10. Kerr ZY, Zuckerman SL, Wasserman EB, et al. Factors associated with post-concussion syndrome in high school student-athletes. J Sci Med Sport. 2018;21(5):447–52.
11. Covassin T, Moran R, Wilhelm K. Concussion symptoms and neurocognitive performance of high school and college athletes who incur multiple concussions. Am J Sports Med. 2013;41(12):2885–9.
12. Brooks BL, Silverberg N, Maxwell B, et al. Investigating effects of sex differences and prior concussions on symptom reporting and cognition among adolescent soccer players. Am J Sports Med. 2018;46(4):961–8.
13. Foley C, Gregory A, Solomon G. Young age as a modifying factor in sports concussion management: what is the evidence? Curr Sports Med Rep. 2014;13(6):390–4.
14. Covassin T, Elbin RJ, Bleecker A, Lipchik A, Kontos AP. Are there differences in neurocognitive function and symptoms between male and female soccer players after concussions? Am J Sports Med. 2013;41(12):2890–5.
15. Lau BC, Kontos AP, Collins MW, Mucha A, Lovell MR. Which on-field signs/symptoms predict protracted recovery from sport-related concussion among high school football players? Am J Sports Med. 2011;39(11):2311–8.
16. Meehan WP 3rd, Mannix RC, Stracciolini A, Elbin RJ, Collins MW. Symptom severity predicts prolonged recovery after sport-related concussion, but age and amnesia do not. J Pediatr. 2013;163(3):721–5.
17. Bittencourt NFN, Meeuwisse WH, Mendonca LD, Nettel-Aguirre A, Ocarino JM, Fonseca ST. Complex systems approach for sports injuries: moving from risk factor identification to injury pattern recognition-narrative review and new concept. Br J Sports Med. 2016;50:1309–14.
18. Kampakis S. Comparison of machine learning methods for predicting the recovery time of professional football players after an undiagnosed injury. In: Machine Learning and Data Mining for Sports Analytics ECML/PKDD 2013 Workshop; 2013 September 27 in Prague, Czech Republic. 2013. pp. 46–55.
19. López-Valenciano A, Ayala F, Puerta JM, et al. A preventive model for muscle injuries: a novel approach based on learning algorithms. Med Sci Sports Exerc. 2018;50(5):915–27.
20. Ruddy JD, Shield AJ, Maniar N, et al. Predictive modeling of hamstring strain injuries in elite Australian footballers. Med Sci Sports Exerc. 2018;50(5):906–14.
21. Falcone M, Yadav N, Poellabauer C, Flynn P. Using isolated vowel sounds for classification of mild traumatic brain injury. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, BC. 2013. pp. 7577–81.
22. Dabek F, Caban JJ. Leveraging big data to model the likelihood of developing psychological conditions after a concussion. Procedia Computer Science. 2015;53:265–73.
23. Dompier TP, Marshall SW, Kerr ZY, Hayden R. The National Athletic Treatment, Injury and Outcomes Network (NATION): methods of the surveillance program, 2011–2012 through 2013–2014. J Athl Train. 2015;50(8):862–9.
24. Rice SG; American Academy of Pediatrics Council on Sports Medicine and Fitness. Medical conditions affecting sports participation. Pediatrics. 2008;121(4):841–8.
25. Gessel LM, Fields SK, Collins CL, Dick RW, Comstock RD. Concussions among United States high school and collegiate athletes. J Athl Train. 2007;42(4):495–503.
26. Hall MA, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. SIGKDD Explorations. 2009;11(1):10–8.
27. Kamins J, Bigler E, Covassin T, et al. What is the physiological time to recovery after concussion? A systematic review. Br J Sports Med. 2017;51(12):935–40.
28. Harmon KG, Drezner J, Gammons M, et al. American Medical Society for Sports Medicine position statement: concussion in sport. Clin J Sport Med. 2013;23(1):1–18.
29. Halstead ME, Walter KD, Moffatt K. Sport-related concussion in children and adolescents. Pediatrics. 2018;142(6):e20183074.
30. Samuel AL. Some studies in machine learning using the game of checkers. IBM J Res Dev. 1959;3(3):210–29.
31. Hauser W, Hansen E, Enck P. Nocebo phenomena in medicine: their relevance in everyday clinical practice. Dtsch Arztebl Int. 2012;109(26):459–65.
32. Garcia GP, Broglio SP, Lavieri MS, McCrea M, McAllister T; CARE Consortium Investigators. Quantifying the value of multidimensional assessment models for acute concussion: an analysis of data from the NCAA-DoD care consortium. Sports Med. 2018;48(7):1739–49.
33. McCrory P, Meeuwisse WH, Aubry M, et al. Consensus statement on concussion in sport: the 4th International Conference on Concussion in Sport held in Zurich, November 2012. Br J Sports Med. 2013;47(5):250–8.



Copyright © 2019 by the American College of Sports Medicine