Current return-to-play guidelines specify that an athlete must be asymptomatic, or symptom free, for at least 7 d after a concussive incident before returning to sport activity (9–11,20). Historically, clinicians have used a multifaceted approach to concussion assessment and to decision making about the return of concussed individuals to the field of play. This multifaceted approach involves several categories of evaluation, including self-reported symptoms (SRS), physical examination, posturography, neurocognitive performance, and neuroimaging. Scores from each type of measure presumably reflect a gradient of impairment, which assists clinicians in determining injury severity and measuring postinjury recovery (1,3,11,14–17,21,22,25,26). We are unaware, however, of strong evidence supporting the validity of this inference for scores from self-report measures.
Because decisions about the athlete's condition are based, in part, on scores from self-report measures, it is appropriate to assess score validity. Validity is a unitary concept that refers to the correctness, meaningfulness, and efficacy of the inferences drawn from scores on a measure (24). Validity is based on the degree to which theoretical and empirical evidence supports the inferences that are made from test scores. The unified theory of validity outlines six types of evidence for score validity. One is factorial validity, which describes how well an underlying dimension summarizes the relationships among items. Although all types of validity evidence are important, factorial validity is singularly distinguishable because it provides the basis for generating summative scores, which are then used in subsequent tests of score validity, such as divergent and convergent validity (24).
We are aware of only one investigation that has tested the factorial validity of responses to a summative measure of self-report concussion-related symptoms (26). That study evaluated responses of collegiate athletes (N = 279) to a 16-item measure of symptom duration using confirmatory factor analysis (CFA). Results demonstrated that a theoretically derived three-factor model provided a good, but not excellent, fit to scores from the 16-item symptom scale. Based on substantive arguments about item content, the scale was modified to include only 9 of the original 16 symptoms, and the subsequent analysis indicated that the three-factor model provided strong evidence of factorial validity with an excellent fit for scores from the nine items.
Although previous research provides some initial structural evidence for the validity of scores on a self-reported scale among nonconcussed athletes, we note a limitation of that research. The sample of respondents was large enough to initially evaluate the underlying structure of the instrument, but it was not sufficient to cross-validate either model in an independent sample (13,18). Also, a novel attribute of that study was that the instructions required participants to report the duration of each symptom over a specific time span, whereas self-reported concussion scales typically require a response associated with symptom severity (12,15,21).
To address the issues of the aforementioned research, this investigation evaluated the factorial validity of a self-reported symptom severity scale among a large sample of healthy, nonconcussed high-school athletes.
This study was IRB approved. All participants returned signed parental consent and informed assent documents before participation, in accordance with the rules and regulations designed for the protection of human subjects. Participants (N = 1089) were healthy, nonconcussed, active members of interscholastic high-school football teams from across the eastern United States. The sample was all male, with a mean age of 16.3 ± 0.9 yr.
The instrument used in this study was the graded symptom checklist (GSC) (15,23). The GSC contains 20 concussion-related symptoms that are rated on a seven-point Likert-type scale with anchors ranging from none to mild to moderate to severe. Participants were asked to indicate whether they were experiencing a particular symptom on the day of questionnaire administration and, if so, to rate its severity on a scale ranging from 0 (no symptom reported) through 1 (very mild) to 6 (very severe). The GSC is similar in design to the post concussion scale (PCS) introduced in 1998 (21). Consistent with previous research (26), we evaluated the responses to 16 of the 20 GSC items that were identical to the symptoms on the head injury scale (HIS) (25,26). The 16 items included headache, nausea, vomiting, balance problems, sensitivity to light or noise, numbness or tingling, sleeping more than usual, drowsiness, fatigue, sadness, nervousness, trouble falling asleep, feeling “slowed down,” feeling like “in a fog,” difficulty concentrating, and difficulty remembering. We also evaluated a subset of 9 of the 16 items that has been shown to provide the best potential for model-data fit (26). These nine GSC items are headache, nausea, balance problems, fatigue, drowsiness, trouble falling asleep, feeling “slowed down,” feeling like “in a fog,” and difficulty concentrating.
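Because the GSC is summative, a total severity score is simply the sum of the 0–6 item ratings. A minimal scoring sketch follows; the function and variable names are ours, for illustration only, and are not part of the published instrument:

```python
# Hypothetical scoring sketch for the summative GSC: each item is rated
# 0 (no symptom reported) through 6 (very severe), and the total score
# is the sum of the item severities.

GSC_16_ITEMS = [
    "headache", "nausea", "vomiting", "balance problems",
    "sensitivity to light or noise", "numbness or tingling",
    "sleeping more than usual", "drowsiness", "fatigue", "sadness",
    "nervousness", "trouble falling asleep", "feeling slowed down",
    "feeling in a fog", "difficulty concentrating",
    "difficulty remembering",
]

def total_severity(ratings):
    """Sum 0-6 severity ratings; one rating per GSC item."""
    if len(ratings) != len(GSC_16_ITEMS):
        raise ValueError("expected one rating per item")
    if any(not 0 <= r <= 6 for r in ratings):
        raise ValueError("ratings must be on the 0-6 scale")
    return sum(ratings)

print(total_severity([0] * 16))  # fully asymptomatic baseline -> 0
```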
Based on reported model-data findings from previous research (26), factorial validity was analyzed using a strictly confirmatory approach (2). The models were tested with confirmatory factor analysis, performed using LISREL (version 8.50; Scientific Software International, Chicago, IL) with weighted least squares estimation. The input polychoric correlation matrix and asymptotic variance–covariance matrix were computed with PRELIS (version 2.50; Scientific Software International, Chicago, IL). Weighted least squares estimation with polychoric correlations and asymptotic variances–covariances was selected because of the ordered categorical nature of the data and violations of normality, as evidenced by the skewness and kurtosis estimates in Table 1. The size of the sample was adequate to both estimate and test the models using weighted least squares estimation (19).
We initially tested the fit of a model containing three correlated factors to the 16 items listed previously (Fig. 1); the same model was then tested for the nine GSC items (Fig. 2). The matrix of factor loadings was specified to reflect simple structure (i.e., each item loaded on only a single factor). The factor loading for the first item on each factor was set to 1.0 to establish the metric of the latent variables. The matrix of item uniquenesses was diagonal and did not include correlated uniquenesses. The matrix containing the factor variances and covariances was symmetric.
We then tested the fit of a second-order model for the 16-item (Fig. 3) and 9-item (Fig. 4) versions of the GSC. The second-order model contained three first-order factors subordinate to a single second-order factor. Again, the matrix of factor loadings for the first-order factors was specified to reflect simple structure, and the factor loading for the first item on each first-order factor was set to 1.0 to establish its metric. The matrix of item uniquenesses was diagonal. The matrix of first-order factor variances and covariances was diagonal. The second-order factor loadings were freely estimated, and the second-order factor variance was set to 1.0 to establish its metric.
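The simple-structure constraint (each item loading on exactly one factor) can be sketched as a check on the loading-pattern matrix. The 9 × 3 pattern below is a hypothetical placeholder illustrating the constraint, not the actual item-to-factor assignments, which appear in the figures:

```python
# Sketch of the simple-structure constraint: each observed item is
# allowed a free loading on exactly one factor.  The 9 x 3 pattern
# below is a hypothetical placeholder, NOT the paper's actual
# item-to-factor assignments.

def has_simple_structure(pattern):
    """True if every row (item) has exactly one nonzero loading."""
    return all(sum(1 for loading in row if loading != 0) == 1
               for row in pattern)

# Rows = nine items, columns = three latent factors; 1 marks a free
# loading.  As in the Methods, the first loading on each factor would
# be fixed at 1.0 to set that factor's metric.
pattern_9 = [
    [1, 0, 0], [1, 0, 0], [1, 0, 0],  # items assigned to factor 1
    [0, 1, 0], [0, 1, 0], [0, 1, 0],  # items assigned to factor 2
    [0, 0, 1], [0, 0, 1], [0, 0, 1],  # items assigned to factor 3
]

print(has_simple_structure(pattern_9))  # True: no cross-loadings
```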
The χ2 statistic and subjective indices were used to evaluate the fit of the models. The χ2 statistic assessed the absolute fit of the model to the data, but it is sensitive to sample size and assumes a correctly specified model (4,7,19). Accordingly, subjective indices of fit were employed to judge and compare the fit of the models. The root mean square error of approximation (RMSEA) represents closeness of fit (8). The RMSEA value should approximate or be <0.05 to demonstrate close fit of the model (8). The 90% confidence interval (CI) around the RMSEA point estimate should contain 0.05 to indicate the possibility of close model-data fit (8). Both the non-normed fit index (NNFI) and the comparative fit index (CFI) are incremental fit indices that test the proportionate improvement in fit by comparing the target model with a baseline model having no correlations among observed variables (4,5,8). NNFI and CFI values approximating 0.95 were considered indicative of good model-data fit (6,18).
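These indices are computed directly from the target-model and baseline-model χ² values. A minimal sketch of the standard formulas (4,5,8) follows; because the baseline χ² values are not reported here, the NNFI and CFI functions merely illustrate the definitions, while the RMSEA call reproduces the 16-item result reported below:

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (ref. 8)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def nnfi(chi2_t, df_t, chi2_b, df_b):
    """Non-normed fit index: target (t) vs. baseline (b) model."""
    rb, rt = chi2_b / df_b, chi2_t / df_t
    return (rb - rt) / (rb - 1.0)

def cfi(chi2_t, df_t, chi2_b, df_b):
    """Comparative fit index: target (t) vs. baseline (b) model."""
    d_t = max(chi2_t - df_t, 0.0)
    d_b = max(chi2_b - df_b, d_t)
    return 1.0 - d_t / d_b if d_b > 0 else 1.0

# The reported 16-item result (chi2 = 315.97, df = 101, N = 1089)
# reproduces the published RMSEA:
print(round(rmsea(315.97, 101, 1089), 2))  # 0.04
```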
The means, standard deviations, skewness, and kurtosis values for the individual items on the scale are reported in Table 1. The distributions of responses were both highly skewed and kurtotic, indicating that the data were not normal. This was expected for baseline nonconcussed values.
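The skewness and kurtosis estimates in Table 1 follow the usual moment-based definitions. A minimal sketch, using a hypothetical zero-inflated response pattern of the kind typical at healthy baseline:

```python
def skewness(xs):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based sample excess kurtosis: m4 / m2 ** 2 - 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

# Hypothetical zero-inflated item, as is typical of healthy baselines:
# most athletes report 0, and a few report low severities.
responses = [0] * 90 + [1] * 7 + [2] * 2 + [3]
print(skewness(responses) > 2)  # True: strongly right-skewed
```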
CFA on the 16-item scale.
The three-factor measurement model represented a good, but not excellent, fit to the 16-item scale (χ2 = 315.97, df = 101, RMSEA = 0.04 (90% CI = 0.04–0.05), NNFI = 0.96, CFI = 0.97). The χ2 statistic was significant (P < 0.0001), but the RMSEA was less than the acceptable threshold value of 0.05. The CFI and the NNFI values were above the criterion value of 0.95 recommended by Hu and Bentler (18). This provided adequate support for the good fit of the three-factor model to the data, but previous findings indicate that model-data fit can be improved based on reported model modifications (26). The standardized covariances between latent constructs were 0.93 (somatic and neurobehavioral), 0.94 (somatic and cognitive), and 0.93 (neurobehavioral and cognitive). The mean of the factor loadings for the 16-item scale was 0.85, with a median of 0.86; the factor loadings ranged between 0.68 and 0.96. Consequently, support was seen for the fit of the three-factor model to the 16-item scale.
We tested the fit of a single second-order factor to describe the covariances among the three first-order factors. The higher-order model represented a good fit to the 16-item GSC (χ2 = 315.97, df = 101, RMSEA = 0.04 (90% CI = 0.04–0.05), NNFI = 0.96, CFI = 0.97), fitting identically to the correlated, three-factor measurement model. This was expected because the higher-order model contained only three first-order latent variables (7). Therefore, support was seen for a single second-order factor underlying the three first-order factors of the 16-item GSC.
CFA on the 9-item scale.
The structural analysis of the 9-item scale resulted in stronger indices of fit than those reported for the 16-item scale. The three-factor measurement model represented an excellent fit to the 9-item scale (χ2 = 55.79, df = 24, RMSEA = 0.03 (90% CI = 0.02–0.05), NNFI = 0.98, CFI = 0.99). The χ2 statistic was significant (P < 0.0001), but the RMSEA did not exceed the acceptable threshold value of 0.05 and was lower than the RMSEA of the 16-item scale. The CFI and the NNFI values exceeded the acceptable threshold value of 0.95 and were higher than those for the 16-item scale. These improvements in indices of fit provide strong support for the excellent fit of the three-factor model to the data. The standardized covariances between latent constructs were 0.86 (somatic and neurobehavioral), 0.91 (somatic and cognitive), and 0.90 (neurobehavioral and cognitive). The mean of the factor loadings for the nine-item scale was 0.83, with a median of 0.88; the factor loadings ranged between 0.62 and 0.93. Consequently, strong support was seen for the fit of the three-factor model to the nine-item scale.
We tested the fit of a single second-order factor to describe the covariances among the three first-order factors. The higher-order model represented an excellent fit to the nine-item GSC (χ2 = 55.79, df = 24, RMSEA = 0.03 (90% CI = 0.02–0.05), NNFI = 0.98, CFI = 0.99), fitting identically to the correlated, three-factor measurement model. As with the 16-item model, this was expected because the higher-order model contained only three first-order latent variables (7). Therefore, support was seen for a single second-order factor underlying the three first-order factors of the nine-item GSC.
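The identical fit of the correlated-factors and second-order models is a consequence of parameter counting: with exactly three first-order factors, the three factor covariances are exchanged one-for-one for three second-order loadings, so both specifications estimate the same number of parameters. A sketch of that count under the specification described in the Methods reproduces the reported degrees of freedom:

```python
# Parameter counts for the two specifications described in the Methods.
# Observed moments = p(p + 1) / 2 unique variances and covariances.

def df_correlated(items, factors):
    """df for the correlated-factors model: simple structure, first
    loading per factor fixed at 1.0 to set its metric."""
    moments = items * (items + 1) // 2
    params = (items - factors)               # free factor loadings
    params += items                          # item uniquenesses
    params += factors                        # factor variances
    params += factors * (factors - 1) // 2   # factor covariances
    return moments - params

def df_second_order(items, factors):
    """df for the second-order model: factor covariances replaced by
    second-order loadings (second-order variance fixed at 1.0) plus
    first-order disturbance variances."""
    moments = items * (items + 1) // 2
    params = (items - factors)               # free first-order loadings
    params += items                          # item uniquenesses
    params += factors                        # first-order disturbances
    params += factors                        # second-order loadings
    return moments - params

print(df_correlated(16, 3), df_second_order(16, 3))  # 101 101
print(df_correlated(9, 3), df_second_order(9, 3))    # 24 24
```

With four or more first-order factors the covariances would outnumber the second-order loadings, making the second-order model more restrictive, and the two fits could then differ.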
This study provided additional evidence for the factorial validity of a summative self-reported measure of concussion-related symptoms. Although the variance associated with the present data is small, our results indicate that noninjured athletes report concussion-related symptoms (e.g., headache, nausea, balance difficulty, fatigue, trouble falling asleep, drowsiness, feeling “slowed down,” feeling “in a fog,” and difficulty concentrating) at nonconcussed preseason baseline. In addition, these symptoms are reported in a predictable scoring structure. This structure represents a tightly bound, cohesive group of nine symptoms that can be explained by three underlying latent variables (“somatic” symptoms, “neurobehavioral” symptoms, and “cognitive” symptoms) and a single higher-order factor (concussion symptomatology). Such evidence provides strong support for the structural validity of the measurement of sport-related concussion.
The present study successfully addressed limitations of previous research that examined the structural aspects of score validity for self-reported concussive symptoms (26). The sample of respondents was large enough to establish the underlying scoring structure of the 16-item instrument, and sufficient to support the data-generated 9-item instrument. Moreover, findings suggest that the underlying scoring structure is not affected by the type of descriptor (duration or severity) required by the instrument, although this issue needs to be evaluated in greater detail by more appropriate means in a sample of athletes with concussion.
A novel aspect of this study was that our sample was composed of high-school student athletes. The implication of using this sample is that younger student-athletes respond to a self-reported measure of concussion-related symptoms in a fashion similar to that of the older, college-aged population, and that separate scales are not likely necessary for clinical use in these two populations. The relevance of this finding is also substantiated by the fact that a greater number of physicians and certified athletic trainers actively work with the younger of the two populations. Often, clinicians who serve this level of athlete do not have access to the more expensive and time-consuming measures of sport-related concussion and, thus, rely on more subjective or informal methods of concussion assessment. Finding a similar response structure between the different age groups of student athletes provides strong support for the inferences drawn across varying age groups and justifies the continued development of a systematic measure of self-reported symptoms following sport-related concussion.
It is critical to realize that, across the vast spectrum of the test and measurement validation process, our findings are mere first steps that provide a solid foundation for future investigations. It is commonly realized that self-report methods are susceptible to threats to external and internal validity. Issues related to the generalizability of baseline responses to postconcussion responses, test administration, respondent honesty, socially motivated response bias, clarity of instructions, and relationships of baseline SRS scores to other purported measures of concussion severity all play integral roles in the interpretation and understanding of score meaning. With this understanding, the investigation of these issues is more reliably performed with measures shown to be adequate. The findings from this study build on the adequacy of the investigated measure, which provides support for its use as an investigative tool. It is also important to understand that empirical evidence for the validity of concussed athletes' responses to summative SRS scales has not yet been obtained with large samples. Such evidence must be presented before clinicians base treatment decisions on inferences made from scores obtained via postinjury self-report symptom scales.
1. Alves, W., S. N. Macciocchi, and J. T. Barth. Postconcussive symptoms after uncomplicated mild head injury. J. Head Trauma Rehabil.
8: 48–59, 1993.
2. Anderson, J. C., and D. W. Gerbing. Structural equation modeling in practice: a review and recommended two-step approach. Psychol. Bull.
3. Barth, J. T., S. Macciocchi, A. Giordano, R. W. Rimel, J. A. Jane, and J. Boll. Neuropsychological sequelae of mild head injury. Neurosurgery
4. Bentler, P. M., and D. G. Bonett. Significance tests and goodness of fit in the analysis of covariance structures. Psychol. Bull.
88: 588–606, 1980.
5. Bentler, P. M. Comparative fit indexes in structural models. Psychol. Bull.
6. Bentler, P. M., and D. G. Bonett. Significance tests and goodness of fit in the analysis of covariance structures. Psychol. Bull.
7. Bollen, K. A. Testing Structural Equation Models
, Newbury Park, CA: Sage, 1993, pp. 10–39.
8. Browne, M. W., and R. Cudeck. Alternative ways of assessing model fit. In: Testing Structural Equation Models, K. A. Bollen and J. S. Long (Eds.). Newbury Park, CA: Sage, 1993, pp. 136–162.
9. Cantu, R. C. Return to play guidelines after a head injury. Clin. Sports Med.
10. Cantu, R. C. Posttraumatic retrograde and anterograde amnesia: pathophysiology and implications in grading and safe return to play. J. Athl. Train.
11. Collins, M. W., S. H. Grindel, M. R. Lovell, et al. Relationship between concussion and neuropsychological performance in college football players. JAMA
12. Collins, M. W., G. L. Iverson, M. R. Lovell, D. B. McKeag, J. Norwig, and J. Maroon. On-field predictors of neuropsychological and symptom deficit following sports-related concussion. Clin. J. Sport Med.
13. Cudeck, R., and M. Browne. Cross-validation of covariance structures. Multivariate Behav. Res.
14. Grindel, S. H., M. R. Lovell, and M. Collins. The assessment of sport-related concussion: the evidence behind neuropsychological testing and management. Clin. J. Sport Med.
15. Guskiewicz, K. M., M. McCrea, S.W. Marshall, et al. Cumulative effects associated with recurrent concussion in collegiate football players: the NCAA Concussion Study. JAMA
16. Guskiewicz, K. M. Postural stability assessment following concussion: one piece of the puzzle. Clin. J. Sport Med.
17. Guskiewicz, K. M., S. E. Ross, and S. W. Marshall. Postural stability and neuropsychological deficits after concussion in collegiate athletes. J. Athl. Train.
18. Hu, L., and P. M. Bentler. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling
19. Jöreskog, K., and D. Sörbom. LISREL 8: User's Reference Guide.
Chicago, IL: Scientific Software International, Inc. 1996, pp. 271–274.
20. Kelly, J. P., and J. H. Rosenberg. Practice parameters: the management of concussion in sports: report of the Quality Standards Subcommittee. Neurology
21. Lovell, M. R., and M. W. Collins. Neuropsychological assessment of the college football player. J. Head Trauma Rehabil.
22. Macciocchi, S. N., J. T. Barth, W. Alves, R. W. Rimel, and J. A. Jane. Neuropsychological functioning and recovery after mild head injury in collegiate athletes. Neurosurgery
23. McCrea, M., K. M. Guskiewicz, S. W. Marshall, et al. Acute effects and recovery time following concussion in collegiate football players: the NCAA concussion study. JAMA
24. Messick, S. Validity of psychological assessment: validation of inferences from persons' responses and performances as scientific inquiry into score meaning. Am. Psychol.
25. Peterson, C., M. Ferrara, M. Mrazik, S. Piland, and R. Elliott. Evaluation of neuropsychological domain scores and postural stability following cerebral concussion in sports. Clin. J. Sport Med.
26. Piland, S., R. Motl, M. Ferrara, and C. Peterson. Evidence for the factorial and construct validity of a self-report concussion symptoms scale. J. Athl. Train.