Original Research

Test-Retest Reliability and Concurrent Validity of Athletic Performance Combine Tests in 6–15-Year-Old Male Athletes

Gillen, Zachary M.1; Miramonti, Amelia A.1; McKay, Brianna D.1; Leutzinger, Todd J.2; Cramer, Joel T.1

Journal of Strength and Conditioning Research: October 2018 - Volume 32 - Issue 10 - p 2783-2794
doi: 10.1519/JSC.0000000000002498

Introduction

Athletic performance combines, such as the National Football League (NFL) and Under Armour All-American Combines, are used by coaches and scouts to assess athletic performance among high school, college, and professional American football players through a variety of power, agility, and speed tests (17,24). The growth of combines at the high-school level highlights the importance of exposing youth athletes to combine testing to ensure familiarity with the testing procedures. The most common combine tests include the vertical jump (VJ) and broad jump (BJ) to assess power, the pro-agility (PA) and L-cone (LC) drills to assess agility, and the 40-yd (40 yd) dash with 10- and 20-yd (10 and 20 yd, respectively) splits to assess speed (17,20,22,27). In an examination of NFL Combine participants, strong correlations were reported among combine performance tests (20). The authors (20) showed nearly perfect correlations among 9.1- vs. 18.3- vs. 36.6-m sprint times, as well as the PA vs. LC, suggesting that these tests may be assessing similar physiological or biomechanical outcomes. For example, the VJ and BJ were strongly correlated with longer sprint distances (r ≥ −0.66), with similar relationships between the PA or LC and speed or jumping performances (|r| ≤ 0.65). When comparing drafted and undrafted NFL Combine participants, drafted participants scored higher on the 40-yd dash, VJ, PA, and LC (22). Thus, combine performance results may at least partially predict NFL draft success. Growing popularity for exhibitions of athletic performance for elite, sometimes famous, athletes at organized combines may present a unique opportunity for youth athletes to display their own athleticism in similar combines.

As preadolescent and adolescent athletes observe their high school, collegiate, and professional role models perform combine testing, these young athletes, parents, and coaches aspire to follow. At a minimum, youth athletic performance combines provide exciting opportunities to engage children in physical activity that they are familiar with from watching elite-level athletes (role models) perform the same tests. Although previous authors have reported the results of power, agility, and speed tests in youth athletes (9,11,12,16,23,25), none have reported the results of these tests during athletic performance combine testing, and many of these previous studies used different tests and drills than are usually conducted during contemporary combine performance testing. In our opinion, if youth athletes aspire to perform combine testing, then the tests and drills that mimic the popular, contemporary combine events should be evaluated scientifically in youth athletes. Evaluating the reliability and redundancy of the current combine battery of tests may aid coaches and practitioners in identifying talent in various sporting activities, recruitment, and long-term athletic development programs through the development of a valid, reliable battery of athletic performance tests (8).

Evaluations of test-retest reliability for combine performance testing in youth athletes would be practically useful to assess repeatability, errors of measurement, and magnitudes of improvement. The most common methods of reporting reliability are the intraclass correlation coefficient (ICC) and coefficient of variation (CV) (4,28), which provide relative and absolute metrics of reliability, respectively. Specifically, the CV allows for comparisons of errors across measurements, regardless of the units used in each individual test (4). The minimum detectable change (MDC) can also be estimated as a means for interpreting changes in performance over time. For example, Lloyd et al. (11) and Meylan et al. (12) reported ICCs ≥0.67 with CVs ≤15% for the VJ, and ICCs ≥0.91 with CVs ≤5% for the BJ in 9–16-year-old athletes. Also, Stewart et al. (23) reported ICCs ≥0.67, SEM ≤0.13, and CVs ≤2% for the PA, and ICCs ≥0.80, SEMs ≤0.18, and CVs ≤3% for the LC in young athletes (age = 16.7 ± 0.6 years). However, these previous studies did not include evaluations of the MDC or any of the common sprint tests (10, 20, and 40 yds). Thus, further research reporting test-retest reliability of contemporary combine evaluations in youth athletes is necessary before the inclusion of these drills as performance measures in youth combine testing.

Previous research (20) has also suggested that multiple speed, agility, and power tests may be redundant and potentially inefficient when testing athletes. Some authors use Pearson product-moment correlation coefficients to examine concurrent validity, as well as collinearity and redundancy, among athletic performance tests (9,20,23). However, first-order partial correlations may be particularly useful when attempting to determine specific test redundancy. Pedhazur (18) suggests that partial correlations allow for an examination of a correlation from which the effects of another variable are removed. Thus, if 2 tests appear collinear, partialing out shared variability may allow researchers and practitioners to make better informed decisions about which tests are necessary from a measurement perspective. Therefore, the purposes of this study were to report the test-retest reliability and evaluate concurrent validity, and practical measurement redundancy, among contemporary youth athletic performance combine events in 6–15-year-old athletes.

Methods

Experimental Approach to the Problem

A cross-sectional design was used to compare commonly measured athletic performance tests among 3 different age groups in young, male athletes: young (6–9 years), middle (10–11 years), and old (12–15 years). Test-retest reliability was also calculated for each test across the entire sample as well as each individual age group using a repeated-measures design. Subjects visited an indoor field turf (FieldTurf Classic HD; Tarkett, Auchel, France) facility for 2 experimental trials (trials 1 and 2) separated by 5 days. Trials 1 and 2 occurred at the same time of day (17:00–20:00). Performance tests included: VJ, BJ, PA, LC, and the 40-yd dash (40 yd). Ten- and 20-yd splits (10 and 20 yd, respectively) were also captured during the 40 yd. Height, body mass, selected skinfolds, and leg circumferences were measured during trial 1, from which 3 body composition variables were estimated: body fat percentage (BF%), fat-free mass (FFM), and estimated thigh cross-sectional area (eCSA). The cross-sectional, age-related comparisons were evaluated with the data captured during trial 2, whereas test-retest reliability was calculated from trial 1 to trial 2.

Subjects

Seventy-two boys volunteered to participate in this study. Three boys did not participate in the second trial and were excluded from analysis. Therefore, the data from n = 69 (mean ± SD; age = 10.9 ± 2.1 years, height = 154.4 ± 13.6 cm, body mass [BM] = 46.8 ± 16.0 kg) were analyzed. The participants were divided into 3 age groups for analysis: 6–9 years (n = 16), 10–11 years (n = 26), and 12–15 years (n = 27). These age groups have been associated with prepubescence, onset of pubescence, and during pubescence in boys, respectively (1,10). All participants reported participating in one or more sports for 1–5 hours per week during the year before the study. Sports included baseball, basketball, cheerleading, cross country, football, gymnastics, lacrosse, rugby, soccer, softball, swimming/diving, tennis, track and field, trap shooting, volleyball, weightlifting, and wrestling. Eighteen boys also listed 1–5 hours per week of speed/power/agility training as a physical activity. Participants (with the help of one of their parents or legal guardians) completed the PAR-Q+ 2015 (27) and were allowed to participate in the study if questions 1–7 were answered “no” or all the follow-up questions of the PAR-Q+ 2015 were answered “no.” This study was approved by the University of Nebraska-Lincoln Institutional Review Board for the protection of human subjects (IRB # 20160315950EP, Title: The effects of a youth performance training camp on sport-specific performance measures). Each participant signed an approved youth assent form if they were 7–18 years old, whereas 5- and 6-year-old boys verbally assented after being read a simpler assent script. One parent or legal guardian of each boy signed an approved informed written consent document. Each of the minors also provided written informed consent.

Procedures

Anthropometrics and Body Composition

Height and BM were measured using a beam scale with attached stadiometer (Mechanical Column Scale and Stadiometer; Seca GmbH & Co. KG, Hamburg, Germany). Body composition measurements included BF%, FFM, and eCSA. Body fat percentage was calculated from skinfold measurements taken with a Lange caliper (Model 68902; Cambridge Scientific Industries, Inc., Cambridge, MD, USA). Skinfolds were taken on the right side of the body at the triceps (vertical fold in the middle of the upper arm, midway between the acromion and olecranon process) and anterior suprailiac (diagonal fold immediately superior to the anterior superior iliac spine) sites and were recorded to the nearest 0.5 mm (6). Equations established by Housh et al. (5) and Brozek et al. (2) were used to estimate body density and BF%, respectively. A total error of estimate of 3.6% has been reported for the BF% estimates from the Housh et al. (5) equation. Fat-free mass was calculated as the difference between BM and fat mass as determined from BF%. Thigh circumference was measured using a Gulick measurement tape (Baseline measurement tape with Gulick attachment; Fabrication Enterprises, White Plains, NY, USA) and was recorded to the nearest 0.1 cm. Thigh circumference and thigh skinfold (vertical fold on the anterior of the thigh, midway between the hip and knee joints) (6) were used to calculate thigh eCSA using a previously described procedure (14).
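The fat-free mass arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' worksheet: it assumes the commonly cited Brozek conversion from body density to body fat percentage, BF% = (4.57 / density − 4.142) × 100, and the function names and the example density are hypothetical. The skinfold-to-density step (Housh et al.) is not reproduced because its coefficients are not given in this text.

```python
def brozek_bf_percent(body_density: float) -> float:
    """Convert body density (g/cm^3) to body fat % (assumed Brozek form)."""
    return (4.57 / body_density - 4.142) * 100.0


def fat_free_mass(body_mass_kg: float, bf_percent: float) -> float:
    """FFM = body mass minus fat mass, where fat mass = BM x BF% / 100."""
    fat_mass_kg = body_mass_kg * bf_percent / 100.0
    return body_mass_kg - fat_mass_kg


if __name__ == "__main__":
    # Hypothetical inputs: the sample mean body mass and an assumed density.
    bf = brozek_bf_percent(1.06)
    print(round(bf, 1), round(fat_free_mass(46.8, bf), 1))
```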

Vertical Jump

Vertical jump performance was assessed with a Vertec (Freestanding Vertec Jump Trainer; Sports Imports, Hilliard, OH, USA) (22). Standing reach of each subject was measured using a measuring tape fixed to a wall. Subjects overlapped their left and right hands by aligning the third digits while the forearms were fully extended and the arms fully flexed (mimicking a diver's position). Like the VJ, standing reach was measured with shoes on. While facing the wall with the left and right toes of each shoe in contact with the wall, standing reach was recorded as marked by the tips of the aligned, overlapping third digits. To perform the VJ, subjects began with feet shoulder-width apart directly underneath the Vertec. Subjects were instructed to perform a fast countermovement to approximately one quarter squat depth and jump as high as possible without the aid of an approach step. Jump height was recorded as the highest vane touched on the Vertec to the nearest 0.5 inch (1.3 cm). Two attempts were given with a minimum of 30-second rest between attempts. The representative VJ score was calculated as the difference between standing reach and the highest recorded jump achieved on the Vertec.

Broad Jump

Horizontal jumping performance was assessed with the BJ test performed on field turf (22). Subjects began with their toes even with the starting line, directly adjacent to a measuring tape secured to the field turf and perpendicular to the starting line. Subjects were instructed to perform a fast countermovement and jump out as far as possible. Jump distance was recorded as the distance from the starting line to the subject's heel closest to the starting line to the nearest 0.5 inch (1.3 cm). If any part of the subject other than the feet touched the turf during any BJ attempt, that attempt was disqualified. Two valid attempts were given with a minimum of 30-second rest between attempts. The farthest BJ distance was used as the representative score.

Pro-agility Drill

The PA drill (also known as the 5-10-5 drill) was used to assess agility and was measured in seconds using a digital, laser beam–actuated timing gate with motion start (Brower TC Motion Start Timer; Brower Timing Systems, Knoxville, TN, USA) (22). Subjects began by straddling the starting line in a 3-point stance perpendicular to the running direction with the down hand placed in front of the motion sensor. Subjects were instructed to turn 90° to the right and run 5 yds, touch a line on the turf with their right hand, turn 180° to the left and run 10 yds, touch a line on the turf with their left hand, and turn back 180° to the right and run 5 yds through the starting line. If a subject did not touch the line on either side during any attempt, that attempt was disqualified. The test began when the subject moved his hand from the motion sensor and ended when the subject finished through the starting line. Two attempts were given with a minimum of 30-second rest between attempts. The fastest time was used as the representative score.

L-Cone Drill

The LC drill (also known as the 3-cone drill) was used to assess agility and was measured in seconds using a digital, laser beam–actuated timing gate with motion start (Brower TC Motion Start Timer; Brower Timing Systems). Each subject started from a 3-point stance with the down hand placed in front of the motion sensor on the starting line. The following foot placement was instructed: (a) Temporarily place the toe of the nonleading foot even with the starting line, (b) position the toe of the lead-leg foot even with the nonleading heel, and (c) position the nonleading toe approximately 6–8 in behind and 3–4 in laterally from the heel of the lead-leg foot. Once in the 3-point stance, the legs were flexed to a one quarter squat for the start position. From the start position, subjects ran forward for 5 yds, touched a line on the turf, turned 180° and ran back to touch the starting line, turned 180° and ran 5 yds, turned 90° to the right around the first cone, ran 5 yds, ran 180° around the second cone, ran 5 yds, turned 90° to the left, and ran 5 yds through the starting line. If the subject failed to touch the required lines or unsuccessfully navigated the cones, that attempt was disqualified. The test began when the subject moved his hand from the motion sensor and ended when the subject finished through the starting line. Two attempts were given with a minimum of 30-second rest between attempts. The fastest attempt was used as the representative score.

10-, 20-, and 40-yd Dashes

The 40-yd dash was used to assess sprint performance and was measured in seconds using a digital, laser beam–actuated timing gate with motion start (Brower TC Motion Start Timer; Brower Timing Systems) (22). Each subject was instructed to assume the same starting position described earlier for the LC. Each subject was instructed to sprint straight forward as fast as possible through the finish line marked 40 yds from the starting line. The test began when the subject moved his hand from the motion sensor and ended when the subject finished through the finish line. Splits were recorded with laser beams placed at the 10- and 20-yd marks. Two attempts were given with a minimum of 30-second rest between attempts. The fastest attempt of each split (10-, 20-, and 40-yds) was used as the representative score.

Statistical Analyses

All statistical analyses were performed using IBM SPSS version 23 (Chicago, IL, USA) and custom Microsoft Excel 2016 worksheets. Data were assessed for normality using modified Shapiro-Wilk tests (19). One-way analyses of variance (ANOVAs) (age [6–9 vs. 10–11 vs. 12–15]) were used to analyze height, body mass, BF%, FFM, eCSA, and combine performance data.

Repeated-measures ANOVAs were used to assess test-retest reliability using procedures described by Weir (28). Intraclass correlation coefficient model “2,1” was used (21) per the suggestion by Weir (28) that ICCs from model 2,1 can be generalized to other testers and laboratories. The 95% confidence interval was calculated for each ICC2,1 to test if each ICC was equal to zero (21,26). The SEM, CV, and MDC were then calculated using equations previously described (4,28).
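The reliability statistics named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' SPSS or Excel procedure: ICC2,1 is computed from the two-way ANOVA mean squares, the SEM is taken as the square root of the mean square error, the CV is the SEM expressed as a percentage of the grand mean, and the MDC is SEM × 1.96 × √2. These are commonly used forms attributed to Weir (28), but the exact equations used in the study are not shown in this text. The function name and the example data are hypothetical.

```python
import math


def reliability_stats(trial1, trial2):
    """Return ICC(2,1), SEM, CV (%), and MDC for two trials on the same subjects."""
    data = list(zip(trial1, trial2))
    n, k = len(data), 2  # n subjects, k trials
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    trial_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Two-way ANOVA sums of squares (subjects x trials, one score per cell).
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_trials = n * sum((m - grand_mean) ** 2 for m in trial_means)
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_subjects - ss_trials

    ms_subjects = ss_subjects / (n - 1)
    ms_trials = ss_trials / (k - 1)
    # Guard against a tiny negative value caused by floating-point rounding.
    ms_error = max(ss_error / ((n - 1) * (k - 1)), 0.0)

    icc_2_1 = (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_trials - ms_error) / n
    )
    sem = math.sqrt(ms_error)              # assumed form: sqrt(MS error)
    cv = sem / grand_mean * 100.0          # SEM as % of the grand mean
    mdc = sem * 1.96 * math.sqrt(2)        # assumed 95% confidence form
    return {"icc": icc_2_1, "sem": sem, "cv": cv, "mdc": mdc}


if __name__ == "__main__":
    # Hypothetical jump scores (cm) for four subjects on two trials.
    print(reliability_stats([10, 12, 14, 16], [11, 12, 15, 16]))
```

Under these assumptions, a change between sessions would need to exceed the MDC before it could be interpreted as larger than measurement error.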

Pearson product-moment correlation coefficients were calculated to examine the relationships among the combine performance tests. The following qualitative evaluations of the strength of association were made according to Mukaka (15) based on the absolute values of correlation coefficients: 0.90–1.00 = very high, 0.70–0.89 = high, 0.50–0.69 = moderate, 0.30–0.49 = low, and 0.00–0.29 = negligible. As per Pedhazur (18), first-order partial correlations (rxy.z, r2xy.z) were calculated for reliable combine performance tests to assess the unique variability explained by each test. An alpha level of p ≤ 0.05 was considered statistically significant for all comparisons and correlations.
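For readers unfamiliar with first-order partial correlations, the following sketch shows the standard textbook formula, rxy.z = (rxy − rxz·ryz) / √[(1 − rxz²)(1 − ryz²)], together with the qualitative categories listed above. The function names and the example coefficients are hypothetical and are not values from this study. Treating the category boundaries as half-open intervals (for example, 0.70 up to but not including 0.90 as high) is an assumption made so that every value falls into exactly one category.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y with z partialed out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def strength(r: float) -> str:
    """Qualitative label based on |r|, following the thresholds listed above."""
    a = abs(r)
    if a >= 0.90:
        return "very high"
    if a >= 0.70:
        return "high"
    if a >= 0.50:
        return "moderate"
    if a >= 0.30:
        return "low"
    return "negligible"


if __name__ == "__main__":
    # Hypothetical coefficients: x and y each correlate strongly with z.
    r = partial_r(r_xy=0.80, r_xz=0.90, r_yz=0.80)
    print(round(r, 2), strength(r), round(r**2, 2))
```

In this hypothetical case, a high zero-order correlation shrinks considerably once the shared third variable is removed, which is the pattern the redundancy analysis is designed to detect.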

Results

Distributions for all variables in this study were not statistically different from the normal distribution (p = 0.06–0.49). Table 1 shows the age-related differences among measured demographics. Height, body mass, and FFM progressively increased with age such that the 6–9-year group <10–11-year group <12–15-year group (p ≤ 0.01). Estimated CSA for the 12–15-year group was greater than (p ≤ 0.04) the 6–9- and 10–11-year groups. There were no differences (p = 0.34–0.90) among age groups for BF%.

Table 1. The mean values (±95% confidence interval, CI), intraclass correlation coefficients (ICCs), SEM, coefficients of variation (CVs), and minimum detectable changes (MDCs) for each age group.*

For the VJ and BJ, Figure 1 shows that the 6–9-year group <10–11-year group <12–15-year group (p ≤ 0.05). For the PA and LC, Figure 1 shows that the 6–9-year group >10–11- and 12–15-year groups (p < 0.01). The 6–9- and 10–11-year groups >12–15-year group (p ≤ 0.03) for the 10-yd dash (Figure 1), whereas the 6–9-year group >10–11-year group >12–15-year group (p ≤ 0.04) in the 20- and 40-yd dashes (Figure 1). There were no differences among the 10–11- and 12–15-year groups for the PA and LC (p = 0.15–0.22), and no differences between the 6–9- and 10–11-year groups for the 10-yd dash (p = 0.08).

Figure 1. Mean values of the (A) VJ, (B) BJ, (C) PA, (D) LC drill, (E) 10-yd split, (F) 20-yd split, and (G) 40-yd dash for each age group. *Indicates a difference from the age 6–9 group and †indicates a difference from the age 10–11 group. VJ = vertical jump; BJ = broad jump; PA = pro-agility; LC = L-cone; 10 yd = 10-yd split; 20 yd = 20-yd split; 40 yd = 40-yd dash.

Table 1 and Figure 2 show the age-specific reliability statistics for each combine performance test. Table 2 reports the age-specific and overall correlations among each of the combine performance tests. When collapsed across age, all combine performance variables were significantly (p < 0.01) correlated. Specifically, the VJ and BJ exhibited a high positive correlation (r = 0.87), as well as moderate to high negative correlations (r = −0.63 to −0.87) with the PA, LC, and 10-, 20-, and 40-yd dashes. The PA, LC, and 10-, 20-, and 40-yd dashes were intercorrelated with moderate to very high positive correlations (r = 0.60–0.93) with one another. Relationships qualitatively categorized as very high or high (|r| ≥ 0.70) included VJ vs. BJ, PA, LC, 20 yd, and 40 yd; BJ vs. PA, LC, 20 yd, and 40 yd; PA vs. LC, 20 yd, and 40 yd; LC vs. 20 yd, and 40 yd; 10 yd vs. 20 yd and 40 yd; and 20 yd vs. 40 yd. All other relationships were moderate or lower.

Figure 2. Intraclass correlation coefficients ± 95% confidence intervals for the vertical jump (VJ), broad jump (BJ), pro-agility (PA), L-cone (LC), 10-yd split (10yd), 20-yd split (20yd), and 40-yd dash (40yd).

Table 2. Pearson product-moment correlation coefficients for relationships between combine performance tests.*

The 6–9-year group exhibited significant correlations (p ≤ 0.01) among all combine performance variables. Compared with the overall sample, the correlation coefficients for the 6–9-year group were qualitatively similar. Relationships categorized as high or very high were VJ vs. PA, 20 yd, and 40 yd; BJ vs. PA, LC, 10 yd, 20 yd, and 40 yd; PA vs. LC, 10 yd, 20 yd, and 40 yd; LC vs. 10 yd, 20 yd, and 40 yd; 10 yd vs. 20 yd and 40 yd; and 20 yd vs. 40 yd. The 10–11-year group exhibited significant correlations (p ≤ 0.05) among all variables except the relationship between the 10-yd dash and BJ, PA, and LC. Compared with the overall sample, the correlation coefficients for the 10–11-year group were generally qualitatively lower. Relationships categorized as high or very high were VJ vs. BJ, PA, LC, 20 yd, and 40 yd; BJ vs. PA, LC, 20 yd, and 40 yd; PA vs. LC, 20 yd, and 40 yd; LC vs. 20 yd and 40 yd; and 20 yd vs. 40 yd. The 12–15-year group exhibited significant correlations among all combine performance variables (p ≤ 0.02). Compared with the overall sample, the correlation coefficients for the 12–15-year group were also generally qualitatively lower. Relationships categorized as high or very high were VJ vs. BJ, PA, 10 yd, 20 yd, and 40 yd; BJ vs. PA, LC, 20 yd, and 40 yd; PA vs. LC, 20 yd, and 40 yd; 10 yd vs. 40 yd; and 20 yd vs. 40 yd.

When comparing the correlation matrices qualitatively between the overall sample and each individual age group, the relationships that were consistently high or very high included VJ vs. PA, 20 yd, and 40 yd; BJ vs. PA, LC, 20 yd, and 40 yd; PA vs. LC, 20 yd, and 40 yd; and 20 yd vs. 40 yd. The relationships that were high or very high for the overall sample but were moderate or lower for each individual age group included VJ vs. BJ and LC for the 6–9-year group; 10 yd vs. 20 yd and 40 yd for the 10–11-year group; and VJ vs. LC; LC vs. 20 yd and 40 yd; and 10 yd vs. 20 yd for the 12–15-year group.

Table 3 reports first-order partial correlations for all reliable combine performance tests. Relationships between the PA and VJ, BJ, 20 yd, and 40 yd after LC was partialed out were moderate and statistically significant (|rPA,y.LC| = 0.50–0.51, p < 0.01), whereas relationships between the LC and VJ, BJ, 20 yd, and 40 yd after PA was partialed out were negligible and not statistically significant (|rLC,y.PA| = 0.01–0.20, p ≥ 0.11). First-order partial correlations of PA and LC are visually depicted as Venn diagrams in Figure 3, where the white circles represent the variable being partialed out. For speed, when the 40-yd dash was partialed out, the relationships between the 20 yd and VJ and PA were negligible and not statistically significant (|r20 yd,y.40 yd| = 0.09–0.18, p ≥ 0.14), whereas the relationship between the 20 yd and BJ was negligible yet statistically significant (r20 yd,BJ.40 yd = −0.25, p = 0.04). When the 20 yd was partialed out, the relationships between the 40 yd and VJ, BJ, and PA were low and statistically significant (|r40 yd,y.20 yd| = 0.32–0.50, p ≤ 0.01). Among the VJ and BJ, when BJ was partialed out, the relationships between VJ and PA, 20 yd, and 40 yd were negligible to moderate and statistically significant (|rVJ,y.BJ| = 0.25–0.51, p ≤ 0.04). Similarly, when the VJ was partialed out, the relationships between the BJ and PA, 20 yd, and 40 yd were low to moderate (|rBJ,y.VJ| = 0.33–0.50, p ≤ 0.01).

Table 3. First-order partial correlations (rxy.z) for reliable combine performance tests.*

Figure 3. Venn diagrams depicting Pearson product-moment correlation coefficients (r) and coefficients of determination (r2, dark gray) for individual relationships between the (A) pro-agility (PA) and vertical jump (VJ), broad jump (BJ), 20-yd dash (20yd), and 40-yd dash (40yd) as well as (C) L-cone (LC) and VJ, BJ, 20yd, and 40yd. First-order partial correlation coefficients (rxy.z) and coefficients of determination (r2xy.z, dark gray) for individual relationships after (B) LC (rxy.LC) and (D) PA (rxy.PA) have been partialed out (white).

Discussion

Test-retest reliability is considered an essential prerequisite to external validity; thus, examining the reliability of a measurement is important before interpreting any potential differences or changes in that measurement. The most commonly reported measure of reliability is the ICC, allowing for the evaluation of the relative reliability of an individual test; however, the ICC does not provide an index of absolute measurement error (28). The SEM expressed as a percentage of the grand mean, termed the CV, allows for a comparison of measurement error across variables (4). Lloyd et al. (11) and Meylan et al. (12) reported ICCs ≥0.62 and CVs ≤15% for the VJ, and ICCs ≥0.91 and CVs ≤5% for the BJ in 9–16-year-old athletes. To the best of our knowledge, this study is the first to report ICCs and CVs for the PA or LC in youth athletes. However, Stewart et al. (23) reported ICCs ≥0.67 and 0.80 and CVs ≤2 and 3% for the PA and LC, respectively, in adults. In adults, ICCs ≥0.89 have been reported for sprint tests ranging from 10 to 50 m (11–55 yds) (3,13), whereas CVs ≤3% have been reported for 10- and 20-m (11 and 22 yds) sprints (13). This study demonstrated ICCs ≥0.80 and CVs ≤11% for all variables across all age groups, except for the LC and 10-yd dash. The LC in the youngest and oldest age groups showed slightly lower but equivalent ICCs ≥0.76 and CVs ≤5%. The 10-yd dash in the 10–11- and 12–15-year groups showed much lower ICCs = 0.46–0.51 and higher CVs ≤9%, which may have been due to low between-subjects variability, but still calls into question the reliability of the 10-yd dash (7,28). Overall, our study demonstrates consistent absolute and relative reliability for all the youth athletic performance combine tests, except the 10-yd dash.

The findings of this study showed that, within and across age groups, the highest relationships among reliable test results were: 20 yd vs. 40 yd (r = 0.89–0.93), PA vs. LC (r = 0.84–0.91), and BJ vs. VJ (r = 0.66–0.89). These correlations were consistently “very high” or “high” (15). Incidentally, these particular pairings also reflected similar practical and physiological underlying constructs of speed (20 yd vs. 40 yd), agility (PA vs. LC), and lower-body power (BJ vs. VJ). Thus, these high or very high correlations suggest a high degree of collinearity and possible redundancy between the individual predictors of speed, agility, and lower-body power. Collinearity between tests of the same underlying constructs may imply that these tests are measuring the same things. To evoke a certain degree of parsimony for strength and conditioning professionals using these measurements to assess speed, agility, and lower-body power in youth athletes, choosing the fewest, easiest tests may allow for more efficient athletic performance combine events while simultaneously assessing the constructs of interest (speed, agility, and lower-body power).

Vescovi and McGuigan (25) reported a very high relationship (r = 0.94, r2 = 0.88) between 18.3- and 36.6-m (20- and 40-yds) sprint times in high school female soccer players, which were comparable (r = 0.89–0.93, r2 = 0.79–0.87) with the 20- and 40-yd dashes in this study. The 40 yd dash (36.6-m) is still used in the NFL combine to assess speed; therefore, possibly due to popularity and familiarity, it is also used in youth athletic performance testing. However, based on an informal analysis by an investigator of this study (Z.M.G.), the 20-yd dash may be more appropriate for younger athletes than the 40-yd dash. To illustrate, NFL combine participants are taller and typically have longer stride lengths than most younger athletes. We are not aware of any studies that have quantified the average number of strides to complete the 40-yd dash in NFL combine participants. However, an informal count by an investigator of this study (Z.M.G.) from 10 participants' 40-yd dash videos available publicly online from the 2017 NFL combine showed that 18–20 strides were taken during the 40-yd dash. An informal count from the same investigator (Z.M.G.) viewing parent-recorded videos captured from 3 participants in the current study (age ≈ 8–12 years) showed that 27–30 strides were taken during the 40-yd dash, whereas 16–18 strides were taken during the 20-yd dash. In addition, the 40-yd dash times in the 2017 NFL combine ranged from 4.22 to 5.84 seconds (17), whereas in this study, the 40-yd dash times ranged from 5.27 to 8.37 seconds, and the 20-yd dash times ranged from 3.01 to 4.53 seconds. Thus, we hypothesize that due to the similarity in strides and times, the 20-yd dash in 8–12-year-old boys may be more biomechanically and physiologically similar to the 40-yd dash in NFL combine participants. The 20-yd dash may also be more practically convenient for young athletes when considering the space required for testing. 
Although it is tempting to recommend the 20-yd dash rather than the 40-yd dash for testing speed in young athletes, further research is necessary to formally compare the biomechanical and physiological demands in children vs. adults. The results of this study, as well as informal evaluations of available videos, however, suggest that measurements of 20-yd dash times provide very similar (possibly redundant) information about speed in young athletes, are equally reliable, and may be more convenient to obtain than measurements of the 40-yd dash.

To the best of our knowledge, this is the first study to report collinearity between the PA and LC in youth athletes. Robbins (20) reported a very high relationship (r = 0.95, r2 = 0.90) between the PA and LC in NFL combine participants. Our study demonstrated similar associations between the PA and LC (r = 0.84–0.91, r2 = 0.71–0.83) in younger athletes. Robbins (20) noted that among NFL combine participants, the PA has lower correlations with other speed and power drills, indicating that the PA may provide more unique performance information. In addition, the LC in this study exhibited higher measurement errors (CV = 4.92–5.08%) than the PA (CV = 3.65–4.95%), which may be related to the complexity of the LC drill compared with the PA. Therefore, we hypothesize that the PA may be more appropriate for measuring agility than the LC in young male athletes, possibly related to the uniqueness of performance information, simplicity, and lower measurement errors.

Previous research (9) reported a high relationship (r = 0.73, r2 = 0.53) between the VJ and BJ in preadolescent boys. Similarly, Robbins (20) reported a high correlation (r = 0.74, r2 = 0.55) between the VJ and BJ in NFL combine participants. Our findings supported those of Jones and Lorenzo (9) and Robbins (20) by showing moderate to high correlations (r = 0.66–0.89, r2 = 0.44–0.79) between VJ and BJ performance. Although it is tempting to consider a similar collinearity between the VJ and BJ as suggested between the 20- and 40-yd dashes as well as the PA and LC, the relationships between the VJ and BJ in this study as well as previous studies (9,20) did not reach the same magnitude. Furthermore, Robbins (20) reported that among NFL combine participants, the BJ had a stronger relationship with speed and agility drills than VJ. It was suggested that the horizontal propulsion in the BJ may be more closely related to the horizontal movements in speed and agility than the vertical propulsion in the VJ. However, the VJ and BJ demonstrated similar independent relationships with agility (PA and LC, r = −0.61 to −0.88, r2 = 0.37–0.77) and speed (20- and 40-yd dashes, r = −0.70 to −0.94, r2 = 0.49–0.88) in this study. Jones and Lorenzo (9) also reported very similar independent relationships (r = −0.61 to −0.65, r2 = 0.37–0.42) between the VJ/BJ and the PA and 20-yd dash in preadolescent boys. In conjunction, these findings suggest that the independent relationships between the VJ/BJ and speed/agility measures may actually be stronger than the relationships between the VJ and BJ. Therefore, it may be erroneous to suggest redundancy between the VJ and BJ in young athletes to eliminate one or the other. Plus, both tests are quick, easy, and require little equipment.

To further examine issues related to collinearity and to better understand the “need” vs. “redundancy” of common combine performance tests of speed, agility, and power, we chose to examine partial correlations among the test results. Pedhazur (18) states, “A partial correlation is a correlation between two variables from which the linear relations, or effects, of another variable(s) have been removed,” (p. 160). Thus, if 2 tests are overly redundant, partialing out the common variability explained by these 2 tests is one statistical approach to examine the unique variability explained by each separate measurement. For example, we hypothesized earlier that the PA and LC tests are redundant measures of agility, and we recommended the PA over the LC as the most appropriate test in youth athletes. To statistically examine this hypothesis, we examined the relationships between PA and VJ, BJ, 20 yd, and 40 yd after LC had been partialed out (denoted as rPA,VJ.LC, rPA,BJ.LC, rPA,20 yd.LC, and rPA,40 yd.LC, respectively). Similarly, we examined the relationships between LC and VJ, BJ, 20 yd, and 40 yd after PA had been partialed out (rLC,VJ.PA, rLC,BJ.PA, rLC,20 yd.PA, and rLC,40 yd.PA, respectively). The results of these partial correlation analyses indicated that, when the LC is partialed out, the individual relationships between PA and VJ (rPA,VJ.LC = −0.50, r2PA,VJ.LC = 0.25, p < 0.01), BJ (rPA,BJ.LC = −0.51, r2PA,BJ.LC = 0.26, p < 0.01), 20-yd dash (rPA,20 yd.LC = 0.51, r2PA,20 yd.LC = 0.26, p < 0.01), and 40-yd dash (rPA,40 yd.LC = 0.51, r2PA,40 yd.LC = 0.26, p < 0.01) drop from high to moderate and remain statistically significant.
However, when the PA is partialed out, the relationships between LC and VJ (rLC,VJ.PA = 0.01, r2LC,VJ.PA < 0.01, p = 0.93), BJ (rLC,BJ.PA = −0.09, r2LC,BJ.PA < 0.01, p = 0.49), 20-yd dash (rLC,20 yd.PA = 0.09, r2LC,20 yd.PA < 0.01, p = 0.50), and 40-yd dash (rLC,40 yd.PA = 0.20, r2LC,40 yd.PA = 0.04, p = 0.11) drop from high to negligible and are no longer statistically significant. These findings, illustrated in Figure 3 with Venn diagrams, support our hypothesis that the LC test was not able to explain any unique variability beyond what was explained by the PA test in this sample. This also supports our recommendation to use the PA to measure agility, rather than the LC, in youth athletes.
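For readers who wish to reproduce this kind of analysis, the first-order partial correlation can be computed from the three zero-order correlations with the standard formula r_xy.z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The following is a minimal sketch rather than the software procedure used in this study; the function name `partial_corr` and the input correlations are hypothetical and chosen only to illustrate the pattern described above.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y with z partialed out.

    Uses the standard formula:
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        # Happens when x or y is perfectly correlated with z.
        raise ValueError("partial correlation is undefined for |r| = 1 with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical zero-order correlations (NOT values from this study).
r_pa_vj = -0.70  # pro-agility vs. vertical jump
r_pa_lc = 0.90   # pro-agility vs. L-cone
r_lc_vj = -0.65  # L-cone vs. vertical jump

# PA vs. VJ with LC partialed out, and LC vs. VJ with PA partialed out.
r_pa_vj_lc = partial_corr(r_pa_vj, r_pa_lc, r_lc_vj)
r_lc_vj_pa = partial_corr(r_lc_vj, r_pa_lc, r_pa_vj)

print(f"r(PA,VJ.LC) = {r_pa_vj_lc:.2f}")
print(f"r(LC,VJ.PA) = {r_lc_vj_pa:.2f}")
```

With these illustrative inputs, the PA retains a larger partial correlation with the VJ than the LC does, which mirrors the asymmetry reported for the actual data.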

Incidentally, the partial correlation analyses that were used to examine collinearity between the 20 yd and 40 yd, as well as the VJ and BJ test results, did not yield the same conclusions as the PA and LC tests. For speed, when the 40-yd dash is partialed out, first-order partial correlations show the relationships between 20-yd dash and VJ (r20 yd,VJ.40 yd = −0.09, r220 yd,VJ.40 yd = 0.01, p = 0.46) and PA (r20 yd,PA.40 yd = 0.18, r220 yd,PA.40 yd = 0.03, p = 0.14) drop from high to negligible and are no longer statistically significant. However, the relationship between 20-yd dash and BJ (r20 yd,BJ.40 yd = −0.25, r220 yd,BJ.40 yd = 0.06, p = 0.04) drops from high to negligible, yet remains statistically significant. When the 20-yd dash is partialed out, the relationships between 40 yd and VJ (r40 yd,VJ.20 yd = −0.50, r240 yd,VJ.20 yd = 0.24, p < 0.01), BJ (r40 yd,BJ.20 yd = −0.32, r240 yd,BJ.20 yd = 0.10, p = 0.01) and PA (r40 yd,PA.20 yd = 0.45, r240 yd,PA.20 yd = 0.20, p < 0.01) drop from high to low, but remain statistically significant. Therefore, we cannot be certain that the 20- or 40-yd dashes are redundant in this sample, but most of the partial correlations suggest that the 40-yd dash may explain more unique variability than the 20-yd dash. These findings are in opposition to our earlier recommendation that the 20-yd dash may be more physiologically and biomechanically representative of the speed that is displayed in older, elite NFL football players. Thus, there may be measurement value in assessing both the 20- and 40-yd dashes in youth athletes.

Previously, we suggested that both the VJ and BJ provide unique athletic performance information among youth athletes. When partialing out the BJ, the relationships between the VJ and PA (rVJ,PA.BJ = −0.25, r2VJ,PA.BJ = 0.07, p = 0.04), 20-yd dash (rVJ,20 yd.BJ = −0.37, r2VJ,20 yd.BJ = 0.13, p < 0.01), and 40-yd dash (rVJ,40 yd.BJ = −0.51, r2VJ,40 yd.BJ = 0.26, p < 0.01) drop from high to negligible/moderate and remain statistically significant. On the other hand, when the VJ is partialed out, the relationships between the BJ and PA (rBJ,PA.VJ = −0.50, r2BJ,PA.VJ = 0.25, p < 0.01), 20-yd dash (rBJ,20 yd.VJ = −0.40, r2BJ,20 yd.VJ = 0.16, p < 0.01), and 40-yd dash (rBJ,40 yd.VJ = −0.33, r2BJ,40 yd.VJ = 0.11, p = 0.01) drop from high to low/moderate and remain statistically significant. Thus, based on the first-order partial correlations, it seems that the VJ may be more independently related to speed, whereas the BJ may be more independently related to agility. These findings support our previous recommendation that both the VJ and BJ provide unique information in youth athletic performance combine testing.

In conclusion, the VJ, BJ, PA, 20 yd, and 40 yd tests seem to be an appropriate battery of athletic performance tests during combine testing. Limitations of this study include that 18 participants previously participated in 1–5 hours per week of speed/power/agility training during the last year. Because all participants participated in some form of athletic activity for a minimum of 1–5 hours per week, the influence of speed/power/agility training compared with other sport participation was likely negligible. In addition, although previous authors (1,10) have determined the age groups of 6–9, 10–11, and 12–15 years to be associated with prepubescence, the onset of pubescence, and pubescence, respectively, in boys, the current study did not include any measures of sexual maturation such as peak height velocity. Therefore, future assessments of athletic performance combine tests among youth athletes should include measures of sexual maturation.

Practical Applications

Our results show consistent absolute and relative reliability for all tests, except the 10-yd dash, in 6–15-year-old athletes. Furthermore, the indicators of measurement error (SEM, CV, and MDC) in Table 1 will be useful to professionals and researchers who evaluate meaningful vs. trivial changes in these test results over time. The highest correlations within and across age groups were 20 yd vs. 40 yd (r = 0.89–0.93), PA vs. LC (r = 0.84–0.91), and BJ vs. VJ (r = 0.66–0.89), which tentatively suggested collinearity (redundancy) when measuring speed, agility, and power, respectively. For speed (20 yd and 40 yd), an informal video review by an investigator (Z.M.G.) revealed that the 20-yd dash for kids in this study was similar in strides and times to the 40-yd dash in NFL combine participants. However, first-order partial correlations suggest that the 40 yd may explain more unique variability in agility and power measurements than the 20 yd. Thus, it is unclear whether the 20 yd and 40 yd are redundant or complementary, despite physiological and biomechanical similarities. For agility (PA and LC), the PA drill explains more unique performance variability in speed and power than the LC. The PA also has lower measurement errors (Table 1). For power (VJ and BJ), however, the 2 tests may separately provide unique performance information in youth athletes because the first-order partial correlations indicated that the VJ is more independently related to speed, whereas the BJ is more independently related to agility. Based on these findings, we recommend both the 20- and 40-yd dashes to measure speed, the PA drill to measure agility, and the VJ and BJ to assess power in 6–15-year-old youth athletes. The test-retest ICCs and measurement errors (Table 1) are generalizable to other young athletes in this population, which will be useful to examine the training and growth and development necessary to observe meaningful improvements in these performance tests.
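To illustrate how SEM and MDC values can be used to separate meaningful from trivial changes, the sketch below applies one common formulation: SEM = SD × sqrt(1 − ICC) and MDC at 95% confidence = SEM × 1.96 × sqrt(2). This is an assumption for illustration and may differ from the computation behind Table 1. The function names and all numbers are hypothetical and are not taken from this study.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence for a test-retest difference."""
    return sem * 1.96 * math.sqrt(2.0)


def exceeds_mdc(test1: float, test2: float, mdc: float) -> bool:
    """True when the absolute change between two sessions is larger than the MDC."""
    return abs(test2 - test1) > mdc


# Hypothetical sprint example (NOT values from Table 1).
sd_seconds = 0.30  # between-subject standard deviation of sprint time
icc = 0.91         # test-retest intraclass correlation coefficient

sem = sem_from_icc(sd_seconds, icc)
mdc = mdc95(sem)
print(f"SEM = {sem:.3f} s, MDC95 = {mdc:.3f} s")

# A 0.20-s improvement falls inside the measurement error band;
# a 0.30-s improvement exceeds it.
print(exceeds_mdc(5.40, 5.20, mdc))
print(exceeds_mdc(5.40, 5.10, mdc))
```

In practice, the SEM or MDC reported for the relevant test and age group would replace the hypothetical inputs.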

Acknowledgments

The authors thank the following individuals for their assistance with data acquisition: Dr. Karsten Koehler, Dr. Terry J. Housh, Cory Smith, Ethan Hill, Josh Keller, Alegra Mendez, Alex Martin, Chaise Murphy, Jay Peterson, Pete Danielson, Christina Gregory, and Brian Smith. The authors also thank the significant, volunteer, nonfinancial assistance from the following business collaborators in Lincoln, Nebraska: Mike Selvage of Lincoln Midget Football, Inc.; Maj. Paul Erickson, Ann Erickson, Mike Lemanu, and Samantha Gillen of Fundamental Athletics Academy; Dean DeBoer and Preston Harris of Don Beebe's House of Speed; and Dr. Robert Lane of Speedway Village. This study was supported, in part, by a Hatch project from the U.S. Department of Agriculture, National Institute of Food and Agriculture (Accession Number: 1009500), and funding from the Nebraska Beef Council.

References

1. Berkey CS, Dockery DW, Wang X, Wypij D, Ferris B Jr. Longitudinal height velocity standards for U.S. adolescents. Stat Med 12: 403–414, 1993.
2. Brozek J, Grande F, Anderson JT, Keys A. Densitometric analysis of body composition: Revision of some quantitative assumptions. Ann N Y Acad Sci 110: 113–140, 1963.
3. Hetzler R, Stickley C, Lundquist K, Kimura I. Reliability and accuracy of handheld stopwatches compared with electronic timing in measuring sprint performance. J Strength Cond Res 22: 1969–1976, 2008.
4. Hopkins WG. Measures of reliability in sports medicine and science. Sports Med 30: 1–15, 2000.
5. Housh TJ, Johnson GO, Housh DJ, Stout JR, Eckerson JM. Estimation of body density in young wrestlers. J Strength Cond Res 14: 477–482, 2000.
6. Jackson AS, Pollock ML. Practical assessment of body composition. Phys Sportsmed 13: 76–90, 1985.
7. Jenkins NDM, Palmer TB, Cramer JT. Comparing the reliability of voluntary and evoked muscle actions. Clin Physiol Funct Imaging 34: 434–441, 2013.
8. Johnston K, Wattie N, Schorer J, Baker J. Talent identification in sport: A systematic review. Sports Med 48: 97–109, 2018.
9. Jones MT, Lorenzo DC. Assessment of power, speed, and agility in athletic, preadolescent youth. J Sports Med Phys Fitness 53: 693, 2013.
10. Lee PA. Normal ages of pubertal events among American males and females. J Adolesc Health Care 1: 26–29, 1980.
11. Lloyd RS, Oliver JL, Hughes MG, Williams CA. Reliability and validity of field-based measures of leg stiffness and reactive strength index in youths. J Sports Sci 27: 1565–1573, 2009.
12. Meylan C, Cronin J, Oliver J, Hughes M, McMaster D. The reliability of jump kinematics and kinetics in children of different maturity status. J Strength Cond Res 26: 1015–1026, 2012.
13. Moir G, Button C, Glaister M, Stone MH. Influence of familiarization on the reliability of vertical jump and acceleration sprinting performance in physically active men. J Strength Cond Res 18: 276–280, 2004.
14. Moritani T, DeVries HA. Neural factors versus hypertrophy in the time course of muscle strength gain. Am J Phys Med 58: 115–130, 1979.
15. Mukaka MM. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69, 2012.
16. Negra Y, Chaabene H, Hammami M, Amara S, Sammoud S, Mkaouer B, Hachana Y. Agility in young athletes: Is it a different ability from speed and power? J Strength Cond Res 31: 727–735, 2017.
17. NFL events: Combine top performers, 2017. Available at: http://www.nfl.com/combine/top-performers. Accessed May 25, 2017.
18. Pedhazur EJ. Statistical Control: Partial and Semipartial Correlation. In: Multiple Regression in Behavioral Research. Fort Worth, TX: Harcourt Brace College Publ, 1997. pp. 156–194.
19. Rahman MM, Govindarajulu Z. A modification of the test of Shapiro and Wilk for normality. J Appl Stat 24: 219–235, 1997.
20. Robbins D. Relationships between National Football League combine performance measures. J Strength Cond Res 26: 226–231, 2012.
21. Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull 86: 420–428, 1979.
22. Sierer SP, Battaglini CL, Mihalik JP, Shields EW, Tomasini NT. The National Football League combine: Performance differences between drafted and nondrafted players entering the 2004 and 2005 drafts. J Strength Cond Res 22: 6–12, 2008.
23. Stewart PF, Turner AN, Miller SC. Reliability, factorial validity, and interrelationships of five commonly used change of direction speed tests. Scand J Med Sci Sports 24: 500–506, 2014.
24. U.S. Army All American Bowl Website, 2015. Available at: http://www.usarmyallamericanbowl.com/national-combine/testing-information/. Accessed May 11, 2017.
25. Vescovi JD, McGuigan MR. Relationships between sprinting, agility, and jump ability in female athletes. J Sports Sci 26: 97–107, 2008.
26. Vincent WJ, Weir JP. Quantifying Reliability. In: Statistics in Kinesiology. Champaign, IL: Human Kinetics, 2012. pp. 213–228.
27. Warburton DER, Jamnik VK, Bredin SSD, Gledhill N. The physical activity readiness questionnaire for everyone (PAR-Q+) and electronic physical activity readiness medical examination (ePARmed-X+). Health Fit J Can 4: 3–23, 2011.
28. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res 19: 231–240, 2005.
Keywords: youth; power; speed; agility; athletic development

© 2018 National Strength and Conditioning Association