Selection of Predictors for Regression Analysis
Based on the results of the correlation analysis above, 40-Y, 20-Y, 10-Y, and CONE were first selected as potential predictors for the MLR analysis of RBs, as each of these variables had at least a small-effect correlation with the first 3-year Y/A or the career Y/A. However, 40-Y, 20-Y, and 10-Y were highly correlated with one another (i.e., collinearity was present). In fact, 20-Y and 10-Y are the split times of the 40-Y in the NFL Scouting Combine (www.nfl.com/combine/workouts). For the analysis of RBs, we selected 10-Y over 40-Y and 20-Y because RBs typically gain less than 10 yards per attempt, making 10-Y more relevant to their performance. Bench press had a correlation coefficient of −0.143 with the career Y/A in the aforementioned correlation analysis. However, when the correlations of the career Y/A with 10-Y, CONE, and BP, along with HT, WT, #G, and Draft, were recomputed with missing cases deleted listwise, BP no longer had a small-effect correlation (r = −0.079); therefore, BP was excluded from the list of predictors. The other potential predictors still had at least a small-effect correlation with the first 3-year Y/A and career Y/A. Consequently, we chose 10-Y and CONE, along with HT and WT, as the set of predictors for the MLR analysis of RBs, with #G and Draft included as covariates, as mentioned previously. The resulting sample size was 84 for the analyses of both the first 3-year Y/A and the career Y/A.
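The screening logic above (keep a candidate if its correlation with at least one outcome reaches a small effect, then recheck after listwise deletion) can be sketched in Python with pandas. This is a hypothetical illustration, not the authors' code: the 0.10 cutoff is an assumption based on Cohen's conventional small effect for r (4), and the function name and column names are placeholders.

```python
import numpy as np
import pandas as pd

SMALL_EFFECT_R = 0.10  # assumed cutoff: Cohen's conventional "small" correlation

def screen_predictors(df, candidates, outcomes, threshold=SMALL_EFFECT_R):
    """Keep candidates whose |r| with at least one outcome meets the threshold.

    Correlations are computed only on cases that are complete for every
    listed column (listwise deletion). Returns (kept names, n complete cases).
    """
    complete = df[candidates + outcomes].dropna()  # listwise deletion
    corr = complete.corr()                         # Pearson r on complete cases
    kept = [c for c in candidates
            if (corr.loc[c, outcomes].abs() >= threshold).any()]
    return kept, len(complete)
```

The returned case count makes visible how listwise deletion shrinks the sample as more variables are screened together.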
For the MLR analysis of WRs, BP, 40-Y, 20-Y, VJ, and BJ were first chosen as potential predictors because of their small-effect correlations with the first 3-year or career Y/R. Of these variables, BP was excluded first because its sample size was relatively small (n = 33). Next, as in the selection of predictors for RBs, collinearity between 40-Y and 20-Y was a concern. We chose 40-Y for the MLR analysis of WRs because 40-Y had a slightly higher correlation coefficient with Y/R than did 20-Y and because, as mentioned above, the correlation between 20-Y and the career Y/R was below a small effect size. We retained both VJ and BJ in the set of predictors despite a moderately high correlation between these 2 variables (r = 0.596), because we believed that VJ and BJ assess different components of athletic ability (vertical vs. horizontal power) and that both are important for WRs. In addition, an intercorrelation in this range generally does not cause serious multicollinearity problems in an MLR model (11). When the correlation coefficients were recomputed with all of the above potential predictors included and missing values deleted listwise, 40-Y no longer had a small-effect correlation with the first 3-year Y/R (r = −0.060) or the career Y/R (r = −0.080); therefore, 40-Y was also excluded. The other potential predictors still had at least a small-effect correlation with the first 3-year Y/R and career Y/R. As a result, we chose VJ and BJ, along with HT and WT, as the set of predictors for the MLR analysis of WRs, again with #G and Draft included as covariates. The resulting sample size was 170 for the analyses of both the first 3-year Y/R and the career Y/R.
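To make the multicollinearity argument concrete, the following sketch (an illustration, not part of the original analysis) computes the tolerance and variance inflation factor (VIF) implied by r = 0.596 if VJ and BJ were the only two predictors, for comparison with the conventional tolerance cutoff of 0.1 (5). With the other predictors in the model, the actual tolerance of VJ or BJ can only be equal to or lower than this two-variable value, so the figure is an upper bound.

```python
def two_predictor_tolerance(r):
    """Tolerance and VIF of either predictor when only two predictors correlate at r."""
    tolerance = 1.0 - r ** 2  # share of one predictor's variance not explained by the other
    vif = 1.0 / tolerance     # variance inflation factor
    return tolerance, vif

tol, vif = two_predictor_tolerance(0.596)
print(f"tolerance = {tol:.3f}, VIF = {vif:.2f}")  # tolerance = 0.645, VIF = 1.55
```

A tolerance of about 0.645 is far above 0.1, consistent with the statement that an intercorrelation in this range is not expected to cause serious multicollinearity.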
Checking Assumptions for Regression Analysis
First, we inspected the residual plots between each of the selected predictors and the measures of on-field NFL performance (Y/A for RBs and Y/R for WRs); there was no apparent evidence of heteroscedasticity between any predictor and Y/A or Y/R. Next, we examined the normal probability and residual plots and confirmed that the assumptions of normality, linearity, and homoscedasticity were not violated in any of the regression models developed in our analysis. There was no evidence of multicollinearity among the predictors, as no tolerance value fell below 0.1 in any model (5). With respect to outlying cases, none of the Mahalanobis distances in any regression model exceeded the critical value of the χ² distribution at an alpha level of 0.001, indicating no apparent multivariate outliers (11). In addition, no case appeared to exert undue influence on the regression coefficients, as no case had a Cook's distance greater than 1 in any model (11). However, some cases had standardized residuals above 3.0 or below −3.0 and were therefore considered potential outliers (5). Although the Cook's distances for these cases were below 1, we excluded them from the final regression models to eliminate any undue influence on the regression coefficients. As a result, 3 and 2 cases were excluded from the MLR analyses of the first 3-year Y/A and career Y/A for RBs, respectively, whereas 7 cases were excluded from each of the MLR analyses of the first 3-year Y/R and career Y/R for WRs.
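The four screening rules in this paragraph (tolerance below 0.1, squared Mahalanobis distance above the χ² critical value at α = 0.001 with df equal to the number of predictors, Cook's distance above 1, and standardized residuals beyond ±3) can be sketched with NumPy and SciPy as below. This is a hypothetical re-implementation for illustration, not the authors' procedure; in particular, the standardized residual is assumed here to be the raw residual divided by the root mean squared error, and the function name is a placeholder.

```python
import numpy as np
from scipy import stats

def regression_diagnostics(X, y, alpha=0.001):
    """Case and predictor screening for an OLS model of y on X.

    X is an (n, k) array of predictors (k >= 2, no intercept column); y has length n.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = k + 1                              # number of estimated coefficients

    # OLS fit, residuals, and residual mean square
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ coef
    mse = resid @ resid / (n - p)

    # Standardized residuals (residual / root MSE); |value| > 3 flags a potential outlier
    z_resid = resid / np.sqrt(mse)

    # Leverage (diagonal of the hat matrix) and Cook's distance; D > 1 flags undue influence
    hat = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(Xc.T @ Xc), Xc)
    cooks = (resid ** 2 / (p * mse)) * hat / (1 - hat) ** 2

    # Squared Mahalanobis distance of each case from the predictor centroid,
    # to be compared with the chi-square critical value (df = number of predictors)
    centered = X - X.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(X, rowvar=False))
    mahal = np.einsum("ij,jk,ik->i", centered, s_inv, centered)
    chi2_crit = stats.chi2.ppf(1 - alpha, df=k)

    # Tolerance = 1 - R^2 from regressing each predictor on all the others
    tolerance = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        ss_res = np.sum((X[:, j] - others @ b) ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        tolerance[j] = ss_res / ss_tot

    return {
        "tolerance": tolerance,
        "mahalanobis": mahal,
        "chi2_critical": chi2_crit,
        "cooks": cooks,
        "z_resid": z_resid,
    }
```

The flags described in the text then correspond to `d["tolerance"] < 0.1`, `d["mahalanobis"] > d["chi2_critical"]`, `d["cooks"] > 1`, and `np.abs(d["z_resid"]) > 3` for a result `d`.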
Regression Analysis on Combine Measures and Future Performance of Running Backs
The results of the MLR analysis of the Combine measures and the performance of RBs in their first 3 NFL seasons are presented in Table 6. The linear relationship of the selected predictors to the first 3-year Y/A was significant (N = 81, F(6, 74) = 5.907, p < 0.001, R² = 0.324, adjusted R² = 0.269); the regression model explained 26.9% of the variance in the first 3-year Y/A. After adjusting for #G in the first 3 years and Draft, 10-Y was the only significant predictor of the first 3-year Y/A (p = 0.002), uniquely accounting for 9.2% of the variance in Y/A (sr² = 0.092). The negative regression coefficient for 10-Y (B = −4.930) indicated that faster 10-Y times among RBs were associated with greater Y/A in the first 3 NFL seasons. In contrast, CONE was not a significant predictor (p = 0.650), nor were HT or WT. The first 3-year #G was positively related to Y/A.
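The reported fit statistics are linked by standard formulas, sketched below as a consistency check (the helper name is hypothetical). With k = 6 terms (10-Y, CONE, HT, WT, #G, and Draft) and N = 81 cases, the residual degrees of freedom are N − k − 1 = 74, matching F(6, 74).

```python
def model_summary(r2, n, k):
    """Adjusted R^2 and overall F for an OLS model with k predictors fitted to n cases."""
    df_resid = n - k - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid    # penalizes R^2 for the number of terms
    f_stat = (r2 / k) / ((1 - r2) / df_resid)     # explained vs. residual mean square ratio
    return adj_r2, f_stat, (k, df_resid)

adj, f, dfs = model_summary(r2=0.324, n=81, k=6)
print(f"adjusted R^2 = {adj:.3f}, F{dfs} = {f:.2f}")  # adjusted R^2 = 0.269, F(6, 74) = 5.91
```

The small gap between the recomputed F (about 5.91) and the reported 5.907 is expected, because R² is reported to only 3 decimals.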
The analysis of the career Y/A yielded results similar to those for the first 3-year Y/A (Table 7). The regression model was significant (N = 82, F(6, 75) = 12.290, p < 0.001, R² = 0.496, adjusted R² = 0.455), explaining 45.5% of the variance in the career Y/A. After adjusting for career #G and Draft, 10-Y was a significant predictor (p < 0.001), whereas CONE was not (p = 0.250). The 10-yard dash alone uniquely explained 14.5% of the variance in the career Y/A (sr² = 0.145). As in the model for the first 3-year Y/A, the negative regression coefficient for 10-Y (B = −4.979) showed that faster 10-Y times among RBs were associated with greater career Y/A. Furthermore, career #G was positively related to the career Y/A. Height and WT were not significant predictors of the career Y/A.
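The squared semipartial correlation (sr²) reported for each predictor is the drop in R² when that predictor alone is removed from the full model; for example, sr² = 0.145 for 10-Y means that the career Y/A model's R² would fall by about 0.145 without 10-Y. A minimal NumPy sketch of that definition follows; the helper names are hypothetical and no real Combine data are used.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept added)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def squared_semipartial(X, y, j):
    """sr^2 of predictor j: the drop in R^2 when column j is removed from the model."""
    return r_squared(X, y) - r_squared(np.delete(X, j, axis=1), y)
```

Because sr² is tied to the predictor's t statistic (t² = sr² · (N − k − 1) / (1 − R²)), a predictor with a larger unique share also has a smaller p value in the same model.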
Regression Analysis on Combine Measures and Future Performance of Wide Receivers
Table 8 summarizes the results of the MLR analysis of the Combine measures and the performance of WRs in their first 3 NFL seasons. There was a significant linear relationship between the selected predictors and the first 3-year Y/R (N = 163, F(6, 156) = 11.968, p < 0.001, R² = 0.315, adjusted R² = 0.289); the regression model explained 28.9% of the variance in the first 3-year Y/R. After adjusting for #G in the first 3 years and Draft, HT, WT, and VJ were significant predictors of the first 3-year Y/R (p ≤ 0.05). Based on the sr² values, these predictors uniquely explained 12.5, 4.2, and 5.4% of the variance in the first 3-year Y/R, respectively. The positive regression coefficient for VJ (B = 0.264) indicated that higher VJ scores were associated with greater first 3-year Y/R. Height was positively (B = 0.659) and WT negatively (B = −0.055) associated with the first 3-year Y/R, suggesting that taller and lighter WRs tended to have greater Y/R in their first 3 NFL seasons. In contrast, BJ was not a significant predictor (p = 0.152). The #G in the first 3 years was positively related to the first 3-year Y/R. Height was the most important predictor of the first 3-year Y/R, as its standardized regression coefficient (β = 0.486) was the largest among the predictors.
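Because HT, WT, and VJ are measured in different units, their unstandardized coefficients (B) cannot be compared directly; the standardized coefficient (β) expresses each effect in standard-deviation units. The sketch below (hypothetical helper names, synthetic data in the check) shows two equivalent ways to obtain β: refitting on z-scored variables, or rescaling B by SD(x)/SD(y).

```python
import numpy as np

def ols_slopes(X, y):
    """Unstandardized OLS slopes B (intercept fitted but not returned)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef[1:]

def standardized_slopes(X, y):
    """Beta weights: OLS slopes after z-scoring every predictor and the outcome."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    return ols_slopes(Xz, yz)

def beta_from_b(b, X, y):
    """Equivalent conversion: beta_j = B_j * SD(x_j) / SD(y)."""
    return b * X.std(axis=0, ddof=1) / y.std(ddof=1)
```

Ranking predictors by |β| is one convention for "importance"; sr², reported alongside it in the text, ranks them by unique variance explained instead.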
Similar results were obtained for the regression model predicting the career Y/R (Table 9). The regression model was significant (N = 163, F(6, 156) = 11.941, p < 0.001, R² = 0.315, adjusted R² = 0.288), explaining 28.8% of the variance in the career Y/R. After adjusting for career #G and Draft, VJ was a significant predictor (p = 0.004), whereas BJ was not (p = 0.907). Vertical jump alone uniquely explained 3.7% of the variance in the career Y/R (sr² = 0.037). As in the model for the first 3-year Y/R, greater VJ ability was associated with greater career Y/R (B = 0.189). Once again, taller and lighter WRs tended to have better career Y/R (B = 0.627 for HT and B = −0.056 for WT). Career #G was significantly associated with better Y/R (p < 0.001). The standardized regression coefficient for HT (β = 0.532) indicated that HT was the most important factor in predicting the career Y/R of WRs.
The PCA of the 8 athletic drills in the Combine using the RB data revealed 4 underlying components. The number of components was determined from the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (7), which was 0.725 and thus above the recommended cutoff of 0.6; Bartlett's test of sphericity (1), which was significant (p < 0.001); the scree test (3); and the varimax-rotated factor loadings. These 4 components explained 48.8, 14.9, 12.9, and 10.0% of the variance, respectively, for a total of 86.6%. Likewise, the same analyses identified 4 components in the WR data, explaining 45.2, 18.2, 13.1, and 12.6% of the variance, respectively, for a total of 89.1%. Table 10 shows the rotated component matrix of the 8 athletic Combine drills for the RB and WR data. The 8 athletic drills could be classified into the following categories: speed (40-Y, 20-Y, and 10-Y), lower-body strength and power (VJ and BJ), quickness and agility (SHUTTLE and CONE), and upper-body strength (BP). These classifications were in accordance with those proposed by the NFL Scouting Combine (www.nfl.com/combine/workouts).
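Two of the criteria above, the overall KMO measure (7) and Bartlett's test of sphericity (1), together with the per-component variance percentages, can be computed directly from the correlation matrix of the drills. The NumPy/SciPy sketch below is an illustration under stated assumptions (principal components extracted from the correlation matrix, hypothetical function names); it does not reproduce the scree test or the varimax rotation. The percentages it returns are unrotated; varimax rotation redistributes variance among the retained components but leaves their combined total unchanged.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    """Bartlett's test that the population correlation matrix is an identity matrix."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, stats.chi2.sf(stat, df)

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(data, rowvar=False)
    inv_R = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(scale, scale)  # partial correlations, each pair controlling for the rest
    off = ~np.eye(R.shape[0], dtype=bool)      # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

def explained_variance(data):
    """Percent of total variance per principal component of the correlation matrix."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]  # descending
    return 100 * eigvals / eigvals.sum()
```

For a hypothetical n × 8 array of complete drill results, `kmo(drills) >= 0.6` and a small Bartlett p value would correspond to the adequacy checks described above.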
Our results indicate that, of the 8 athletic measures in the NFL Combine, 10-Y for RBs and VJ for WRs appear to be the most important for predicting future NFL performance, when performance is assessed by Y/A for RBs and Y/R for WRs. In addition, HT appears to be critical for producing better Y/R among WRs. However, our analysis also suggests that the Combine measures explain less than half of the variance in the future performance of RBs and WRs in the NFL. Finally, the 8 athletic drills in the Combine were classified into 4 categories (speed, lower-body strength and power, quickness and agility, and upper-body strength), supporting the construct validity of these drills.
We found some significant associations between the Combine measures and the future NFL performance of RBs and WRs that had not been reported previously (8). This discrepancy between studies is likely due to differences in the data analysis strategies used (regression analysis vs. correlation analysis). The MLR analysis used in our study allowed us to examine the relationships between Combine measures and NFL performance while controlling for other Combine variables and potential covariates. Our study also included more Combine data than did Kuzmits and Adams (10 vs. 6 years of data (8)). These factors may have contributed to the different findings in this study.
It is important to underscore that a regression analysis can account for the effects of more than 1 independent/predictor variable when exploring their relationships to an outcome variable. This allows for a better understanding of the importance of predictors that can be overlooked in a correlation analysis. In our data, for example, CONE had the highest correlation with the first 3-year Y/A and career Y/A in RBs. However, the MLR analysis revealed that 10-Y, not CONE, was a significant predictor of Y/A when #G, Draft, HT, and WT, as well as the other drill, were held constant. One explanation for this finding is that 10-Y and CONE account for overlapping portions of the variance in the first 3-year Y/A and career Y/A; therefore, the relationship between CONE and Y/A can be partly explained by the relationship between 10-Y and Y/A. This is not surprising, given that players with faster 10-Y times should, in general, also have faster CONE times. The regression analysis thus identified 10-Y as the more important Combine measure for predicting the future performance of RBs in the NFL, even though CONE had the highest bivariate correlation with Y/A. Similarly, in predicting the future NFL performance of WRs, VJ had the highest bivariate correlation with Y/R, whereas the MLR analysis revealed that HT, not VJ, was the most important predictor when both were accounted for. When 2 players of similar HT are compared, VJ would appear to be the next most important predictor of Y/R. We therefore believe that our analysis uncovered key parameters of the NFL Scouting Combine in relation to the future success of RBs and WRs.
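The idea of shared variance can be made concrete with the standard semipartial correlation for the two-predictor case, where y is the outcome, predictor 1 is the variable of interest, and predictor 2 is the correlated predictor being held constant:

```latex
\[
sr_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{\sqrt{1 - r_{12}^{2}}},
\qquad
sr_1^{2} = R^{2}_{y\cdot 12} - r_{y2}^{2}
\]
```

With purely hypothetical values r_y1 = 0.30, r_y2 = 0.28, and r_12 = 0.70, sr_1 = (0.30 − 0.196)/√0.51 ≈ 0.146, so sr_1² ≈ 0.021 compared with r_y1² = 0.09: most of predictor 1's bivariate association with y overlaps with predictor 2. The models in this study also hold #G, Draft, HT, and WT constant, so the actual unique contributions reflect those terms as well.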
This study has several limitations. Some Combine data contained a large number of missing values because not every player completes all of the Combine measurements. For example, of the 276 RBs examined in this study, only 149 and 146 took the SHUTTLE and CONE tests, respectively, and approximately 71–78% of RBs took the BP, VJ, and BJ drills. Similar trends were observed for WRs; in particular, only 60 of the 447 WRs who participated in the Combine during 2000–09 took the BP test. Because cases with missing values should ideally be excluded listwise in a regression analysis (5), as was done in our study, these missing values limited the sample sizes in our MLR analyses, potentially reducing statistical power. In addition, it is challenging to quantify the performance of football players, including RBs and WRs, precisely. We used Y/A for RBs and Y/R for WRs as the measures of NFL performance; however, these measures could also be influenced by factors other than an individual player's ability. For example, some RBs and WRs are used mainly for short-yardage gains, and their contributions may not be well reflected in their Y/A and Y/R statistics, even though these players are still valuable to their teams. Some RBs are also frequently used as receivers (e.g., LaDainian Tomlinson of the San Diego Chargers and New York Jets had at least 50 receptions in 9 seasons of his career and recorded 100 receptions in the 2003 season, www.pro-football-reference.com/players/T/TomlLa00.htm). Furthermore, playing behind a good offensive line (for RBs) or with a good QB (for WRs) would likely have a positive impact on a player's statistics, which was not accounted for in this study.
If other appropriate measures of on-field NFL performance for RBs and WRs are identified and included as dependent/outcome variables, another analytical approach, such as canonical correlation analysis, which examines the relationship between 2 sets of variables (i.e., more than 1 dependent/outcome variable and more than 1 independent/predictor variable), may reveal predictive value of the Combine measures that was not evident in this study. The NFL Scouting Combine also conducts drills for position-specific skills, which should also be important for football players. In addition, NFL teams review players' medical histories and perform physical examinations during the Combine (2). However, to the best of our knowledge, the results of the position-specific drills, medical history reviews, and physical examinations at the NFL Combine are not readily available in the public domain; therefore, we were unable to include these data in the analysis. Combining position-specific skills and/or injury risk with athletic measures might better predict the future success of football players in the NFL.
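As a rough illustration of the approach mentioned above (not an analysis performed in this study), canonical correlations between a predictor set X (e.g., Combine drills) and an outcome set Y (e.g., Y/A plus any additional performance measures) can be computed from orthonormal bases of the two centered sets: the singular values of their cross-product are the canonical correlations. The function name is hypothetical, and the check uses synthetic data.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two variable sets (rows = cases), largest first."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    qx, _ = np.linalg.qr(Xc)  # orthonormal basis for the centered predictor set
    qy, _ = np.linalg.qr(Yc)  # orthonormal basis for the centered outcome set
    # Singular values of Qx'Qy are the cosines of the principal angles between
    # the two column spaces, i.e., the canonical correlations
    return np.linalg.svd(qx.T @ qy, compute_uv=False)
```

With a single outcome column, the sole canonical correlation reduces to the multiple correlation R of an ordinary regression, which is why canonical correlation analysis can be viewed as the multi-outcome generalization of the MLR used here.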
This study indicates that the NFL Scouting Combine has some value for predicting the future success of RBs and WRs in the NFL, and the Combine data could be used to supplement the evaluation of college football players. Specifically, performance on 10-Y and VJ may be used to predict the future performance of RBs and WRs, respectively. However, the Combine measures leave a large part of the variability in the future performance of RBs and WRs unexplained. Hence, team executives, coaches, and scouts should be cautious when using the Combine measures to select players in the NFL Draft. One approach to potentially improving the predictive value of the NFL Combine is to add tests that are not currently included but could be important for football players, such as visual response/reaction tests for RBs and hand coordination tests for WRs. Another approach would be to tailor the Combine test items to each position. For instance, BP performance does not seem to have significant value for predicting the future success of RBs or WRs; eliminating the BP test for these positions could save NFL teams time and allow them to focus on variables that are more relevant to the skill sets of these positions. We believe that it is also important to consider the overall physicality of players. For example, HT was found to be the most important predictor of Y/R in WRs. As WRs get taller and bigger, both "how high WRs can reach" (i.e., taller players have an advantage) and "how high WRs can jump" are critical in catching the football. Therefore, it may be useful to combine HT and VJ in future analyses of Combine data. Future studies could also determine the usefulness of the NFL Scouting Combine for player evaluation when position-specific skills and/or injury risk are taken into account.
Moreover, the associations of the Combine measures with the future success of players at other positions would be of interest for future investigation.
The authors declare that there is no conflict of interest regarding the publication of this article. This research received no specific funding from any agency in the public, commercial, or not-for-profit sectors.
1. Bartlett MS. A note on the multiplying factors for various χ² approximations. J R Stat Soc Ser B Stat Methodol 16: 296–298, 1954.
2. Brophy RH, Chehab EL, Barnes RP, Lyman S, Rodeo SA, Warren RF. Predictive value of orthopedic evaluation and injury history at the NFL combine. Med Sci Sports Exerc 40: 1368–1372, 2008.
3. Cattell RB. The scree test for the number of factors. Multivariate Behav Res 1: 245–276, 1966.
4. Cohen JW. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.
5. Field A. Discovering Statistics Using SPSS. Los Angeles, CA: SAGE, 2009.
6. Hoffman J. Norms for Fitness, Performance, and Health. Champaign, IL: Human Kinetics, 2006.
7. Kaiser HF. An index of factorial simplicity. Psychometrika 39: 31–36, 1974.
8. Kuzmits FE, Adams AJ. The NFL combine: Does it predict performance in the National Football League? J Strength Cond Res 22: 1721–1727, 2008.
9. Robbins DW. The National Football League (NFL) combine: Does normalized data better predict performance in the NFL draft? J Strength Cond Res 24: 2888–2899, 2010.
10. Stimel D. A statistical analysis of NFL quarterback rating variables. J Quant Anal Sports 5: 1, 2009.
11. Tabachnick BG, Fidell LS. Using Multivariate Statistics. Boston, MA: Pearson Education, Inc., 2007.
Keywords: American football; rushing yards per attempt; receiving yards per reception; correlation; regression; principal component analysis
Copyright © 2016 by the National Strength & Conditioning Association.