Original Research

Predicting Soldier Task Performance From Physical Fitness Tests: Reliability and Construct Validity of a Soldier Task Test Battery

Spiering, Barry A.; Walker, Leila A.; Larcom, Kathleen; Frykman, Peter N.; Allison, Stephen C.; Sharp, Marilyn A.

Journal of Strength and Conditioning Research: October 2021 - Volume 35 - Issue 10 - p 2749-2755
doi: 10.1519/JSC.0000000000003222

Introduction

Soldiers' duties often include strenuous tasks, such as heavy lifting, load carriage (LC), and various battlefield maneuvers (18). Because these arduous tasks place soldiers at an increased risk of occupational injury (7), a need exists to develop a valid and reliable battery of tests that can help military researchers assess soldiers' fitness for duty. Although multiple physical tests currently exist for soldiers (e.g., Army Physical Fitness Test [APFT] and occupational physical assessment test [OPAT]), these tests have limitations. For instance, the OPAT is intended to predict soldiering performance in untrained soldiers, and although it was validated against soldiering tasks (3), it does not measure performance on soldiering tasks themselves. This can be problematic because predicting soldier task performance, rather than measuring it directly, carries a degree of prediction error.

To address the limitations of the existing soldier fitness tests, our laboratory group has developed several “soldier task tests” that simulate occupational and combat-related duties of soldiers (4,6,12,14,17). These tests are based on observations of common soldier tasks (18), essential tasks listed within the Soldier's Manual of Common Tasks (20), and input from subject matter experts. These tests possess face validity, as they replicate demanding soldiering tasks. Moreover, in a previous study (19), we established the reliability of 7 of these tests when performed individually on separate days.

The overarching aim of the present research was to further assess the usefulness of these soldier task tests for future use by military researchers. More specifically, we sought to develop a battery of tests that could be used to holistically assess a soldier's ability to perform common tasks. To achieve this aim, we selected 4 soldier task tests that represented a cross-section of common soldier tasks, that nearly all soldiers might have to perform when deployed, and that demonstrated acceptable reliability in our previous study (19). Next, we examined the test-retest reliability and the construct validity of the 4 soldier task tests when performed in sequence on the same day. Ultimately, this research allowed us to answer the following questions: (a) does performing the soldier task tests as a battery on the same day affect the test-retest reliability vs. performing the tests individually on separate days? (b) what is the test-retest reliability of composite performance on the overall test battery? (c) what physical fitness constructs underlie performance on the soldier task tests? (This information can be used to design fitness training programs to improve performance on soldier tasks.); and (d) can physical fitness tests be used to predict performance on the soldier task test battery (STTB)? (This information can be used to select field-expedient tests to assess soldier performance.)

Methods

Experimental Approach to the Problem

This study comprised 2 parts. In the first part, 33 enlisted soldiers (31 men and 2 women) completed the 4-event STTB on 4 occasions, each separated by at least 1 week. The STTB consisted of the following tests, in order: (a) 30-m grenade throw (GT) for accuracy; (b) running long jump (RLJ) while wearing a 20.5-kg fighting load; (c) 1 repetition maximum box lift (1RMBL) performed from the ground to the height of 155 cm; and (d) 3.2-km LC time trial while wearing a 33-kg approach march load. Approximately 10–15 minutes of rest was provided between each test. The STTB required approximately 2–3 hours to complete. Raw scores were examined to determine the reliability of each individual test and then compared with our previously published findings (19) to determine whether performing the tests on the same day affected the reliability of the tests compared to performing the tests individually on separate days. Subsequently, raw scores were converted to z-scores and then summed across the 4 tests to generate a composite score that reflected each soldier's overall performance on the STTB for a given trial. The sum of z-scores was then analyzed to assess the test-retest reliability of the STTB as a whole system.

In the second part, 41 enlisted male soldiers (31 of whom participated in the first part) completed the STTB, as well as a series of physical fitness tests. The physical fitness tests included measurements of body composition (i.e., lean mass, fat mass, and percent body fat), muscular strength (i.e., 1RM bench press and 1RM leg press), muscular power (i.e., ballistic bench throw using 30% 1RM, seated medicine ball put, standing long jump [SLJ], and vertical jump), muscular endurance (i.e., 2-minute push-up and 2-minute sit-up), and cardiovascular endurance (i.e., treadmill peak oxygen uptake [V̇O2peak] and 3.2-km run). Physical fitness tests were conducted over 4 visits, and the STTB was conducted on a separate visit. All visits were separated by at least 48 hours. Bivariate correlation was used to determine the physical fitness constructs (i.e., body composition, muscular strength, muscular power, muscular endurance, and cardiovascular endurance) that underlie performance on the soldier task tests. Subsequently, forward-stepwise linear regression was used to determine the validity of using field-expedient tests to predict performance on the common soldier tasks.

Subjects

The first part of the study (test-retest reliability) involved 33 enlisted soldiers (mean ± SD; 31 men and 2 women; 23 ± 3 years; 1.75 ± 0.08 m; and 81.4 ± 12.8 kg). The second part of the study (construct validity) involved 41 enlisted male soldiers (22 ± 3 years; 1.75 ± 0.08 m; and 81.4 ± 12.9 kg). To be included, soldiers had to be 18–35 years old and have passed their most recent APFT. Soldiers were excluded if they were identified as having a current or previous injury or other medical condition that would contraindicate participation, if they recently (within the previous 2 months) began a physical training program designed to increase muscle strength/power, or if they were pregnant. An Army Medical Officer screened all volunteers before participation in the study. Subjects were instructed to maintain their normal exercise routine for the duration of the study. Subjects were informed of the requirements and potential risks of participation and then voluntarily signed a written informed consent document. The study and the consent document were approved by the institutional review board of the U.S. Army Research Institute of Environmental Medicine. The investigators adhered to the policies for protection of human subjects as prescribed in Army Regulation 70-25, and the research was conducted in adherence with the provisions of 45 CFR part 46.

Procedures

Soldier Task Tests

The soldier task tests were all performed as previously described (19). A brief description of the individual tests is provided below.

Grenade Throw

Volunteers began the test in a squatting position, with both feet behind a line, and the nonthrowing arm/shoulder pointed toward the target. Volunteers stood up, took 1 step forward, and threw a mock grenade at a target placed on the ground 30 m away. For each trial, soldiers were given 1 practice attempt followed by 5 throws for record. The distance from where the grenade landed to the center of the target was measured using a laser distance meter (Disto Plus; Leica Geosystems, Inc., Norcross, GA). Volunteers were given ∼30–60 seconds of rest between attempts. The average distance to the target across the 5 throws was the variable entered into analysis.

Running Long Jump

Soldiers performed the RLJ while wearing a 20.5-kg fighting load. Soldiers ran 3 m and leapt as far forward as possible. The distance from the take-off point to the landing point closest to the take-off point was measured using a laser distance meter. Subjects were given 3 attempts, with ∼30–60 seconds of rest between attempts. The average distance of the 3 attempts was the variable entered into analysis.

One Repetition Maximum Box Lift

The 1RMBL test assessed the heaviest box that a soldier could lift from the ground and place onto a 155-cm platform (i.e., the height of a 5-ton army truck bed). Weight was gradually increased until the lift could not be completed using correct technique. Volunteers were given ∼2–3 minutes of rest between attempts. The variable entered into analysis was the maximal weight lifted onto the platform.

Load Carriage

The LC test consisted of a 3.2-km time trial (run or walk as fast as possible) on level ground while carrying an approach march load (∼33 kg), which consisted of weighted vest (15.5 kg), combat boots (2.3 kg), uniform (1.4 kg), helmet (1.4 kg), backpack (9.3 kg), and simulated rifle (3.4 kg). The variable entered into analysis was time-to-completion.

Predictor Tests: Body Composition

Body composition was assessed through whole-body scans using dual-energy x-ray absorptiometry (Prodigy; GE Medical Systems, Chicago, IL). Total-body estimates of percent fat, bone mineral density, and body content of bone, fat, and nonbone lean tissue were determined using manufacturer described procedures and supplied algorithms (Total Body Analysis, version 3.6; Lunar Corp., Madison, WI).

Muscular Strength

One repetition maximum was assessed for the bench press and leg press. Following a warm-up, weight was gradually increased until the lift could not be completed using correct technique. In general, <7 attempts were given to reach 1RM, with ∼2–3 minutes of rest between attempts.

Muscular Power

Lower-body explosive power was measured using a vertical jump and a horizontal jump test. Vertical jump height using a countermovement jump and arm swing was measured using a Vertec measuring device (Gill Athletics, Champaign, IL). Vertical jump height was measured as the distance from standing reach height to peak jump height to the nearest half-inch. Standing long jump was measured with the volunteer standing with toes behind a line. Using a countermovement and arm swing, the volunteer jumped as far forward as possible. The distance from the starting line to the heel of the foot closest to the starting line was recorded. Subjects were given 3 attempts for each test, with ∼30–60 seconds of rest between each attempt. The maximal score was used for analysis.

Upper-body explosive power was measured using a 2-handed medicine ball put and a ballistic bench throw. For the medicine ball put, the volunteer sat in a chair placed against a wall with the spine pushed firmly against the chair back and held a 2-kg medicine ball to the chest with both hands. On command, the volunteer pushed the ball off the chest and threw/putted it as far forward as possible. Subjects were observed to ensure that their backs remained pressed against the chair and that feet remained planted during the throw. Volunteers were provided 3 attempts, with ∼30–60 seconds of rest between attempts. The farthest distance achieved was recorded. For the ballistic bench throw, the volunteer lay on a flat bench while in a MaxRack (MaxRack, Inc., Columbus, OH) interfaced with a ballistic measurement system (Optimal Kinetics, Muncie, IN). The barbell was loaded with 30% of measured 1RM. The subject threw the bar vertically as high as possible in a ballistic, explosive manner. Subjects were observed to ensure that backs and hips remained pressed against the bench and that feet remained planted during the throw. Volunteers were given 3 attempts, with ∼30–60 seconds of rest between attempts. Peak power (W and W·kg−1 body mass) was the primary variable of interest.

Cardiovascular Endurance

Peak oxygen uptake (V̇O2peak) was measured using a continuous treadmill running protocol and a metabolic measurement system (TrueMax 2400; ParvoMedics, Salt Lake City, UT). Subjects performed a 5-minute warm-up at 0% grade and 2.68 m·s−1. After the warm-up, the treadmill grade was increased to 5%. The grade was then increased by 2.5% every 3 minutes until voluntary exhaustion. The criteria for achieving V̇O2peak were a plateau (<2 ml·kg−1·min−1) in oxygen consumption despite an increased workload, heart rate in excess of 90% of age-predicted maximum, or a respiratory exchange ratio in excess of 1.0.
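The incremental protocol and end-point criteria described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the laboratory's software: the function names are hypothetical, the first 5% stage is assumed to last 3 minutes like the later stages, and the age-predicted maximum heart rate is passed in as an argument because the prediction equation is not stated in the text. The three criteria are reported separately rather than combined, since the text does not specify how many had to be met.

```python
def treadmill_grade(minutes_elapsed):
    """Treadmill grade (%) at a given time since the start of the warm-up.

    Assumption: the 5% stage lasts 3 minutes, like every later stage.
    """
    warmup_minutes = 5.0
    if minutes_elapsed < warmup_minutes:
        return 0.0  # warm-up at 0% grade
    completed_stages = int((minutes_elapsed - warmup_minutes) // 3)
    return 5.0 + 2.5 * completed_stages


def peak_criteria(vo2_rise_ml_kg_min, heart_rate, predicted_max_hr, rer):
    """Evaluate each end-point criterion separately (hypothetical helper)."""
    return {
        "plateau": vo2_rise_ml_kg_min < 2.0,          # <2 ml/kg/min rise despite higher workload
        "heart_rate": heart_rate > 0.90 * predicted_max_hr,
        "rer": rer > 1.0,                             # respiratory exchange ratio
    }
```

For instance, `treadmill_grade(8)` corresponds to the start of the second graded stage under these assumptions.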

Army Physical Fitness Test

Each soldier's most recent APFT scores (3.2-km run, 2-minute push-ups, 2-minute sit-ups, and overall score) were obtained through self-report. Previous research has reported the validity of this approach (9,13).

All testing was conducted indoors, with the exception of the GT and LC tests, which were performed outdoors. Outdoor testing was suspended if the Wet Bulb Globe Temperature (WBGT) was above heat category 2 (31° C WBGT).

Statistical Analyses

The statistical approach used to assess reliability was in accordance with procedures recommended by Atkinson and Nevill (1) and described in detail in our previous publication (19). Briefly, repeated-measures analysis of variance (ANOVA) was used to identify potential learning effects (i.e., significant improvements in performance from one trial to the next). In the event of a significant (p ≤ 0.05) F value, a Fisher least significant difference post hoc test was used to determine pairwise differences. Next, the reliability of each test was examined (see Ref. (1) for a review of reliability analyses). Because learning effects existed for some of the tests (i.e., significant F scores), and to standardize the analysis between the various tests, only trial 3 and trial 4 were compared to calculate the reliability statistics (i.e., intraclass correlation coefficients [ICCs], SEM, and limits of agreement [LOA]) for each of the 4 soldier task tests.
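As a rough illustration of the reliability statistics named above, the following Python sketch computes ICC (2, 1), SEM, and absolute 95% LOA from two trials. It is a sketch under stated assumptions, not the analysis code used in this study: the function name is hypothetical, SEM is assumed to be the square root of the ANOVA error mean square (one of several definitions in use), and the LOA are the absolute form (1.96 × SD of the between-trial differences), so the ratio LOA used for heteroscedastic data are not covered.

```python
import math
from statistics import mean, stdev


def reliability_two_trials(trial_a, trial_b):
    """Test-retest statistics for one test measured on two trials."""
    rows = list(zip(trial_a, trial_b))            # one (trial_a, trial_b) pair per subject
    n, k = len(rows), 2
    grand = mean(list(trial_a) + list(trial_b))

    # Two-way ANOVA sums of squares (subjects = rows, trials = columns).
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((m - grand) ** 2 for m in (mean(trial_a), mean(trial_b)))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = max(ss_total - ss_rows - ss_cols, 0.0)  # guard against rounding below zero

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # ICC (2, 1): 2-way random-effects model, single measures.
    icc = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

    sem = math.sqrt(ms_error)                     # assumed SEM definition
    diffs = [b - a for a, b in rows]
    return {
        "icc": icc,
        "sem": sem,
        "bias": mean(diffs),                      # mean change between trials
        "loa": 1.96 * stdev(diffs),               # absolute 95% limits of agreement
    }
```

Passing the trial 3 and trial 4 scores for a given test would mirror the comparison described above; dividing `sem` by the mean score expresses it as a percentage.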

To assess the test-retest reliability of the STTB as a whole system, raw scores for each test were converted to z-scores and then summed across the 4 tests to generate a composite score that reflected each soldier's overall performance on the test battery. The inverse score was used for the GT and LC, so that a higher z-score indicated a better outcome (smaller throwing error and faster time, respectively). The reliability of the composite performance was assessed using ICC. To remain consistent with the methods above, only trial 3 and trial 4 were compared to calculate the ICC.
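The composite score can be illustrated with a short Python sketch. Two details are assumptions rather than statements from the text: the "inverse score" is implemented as a sign flip of the z-score, and z-scores are computed within a single trial using the sample SD. The function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize raw scores to mean 0 and SD 1 (sample SD assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_scores(gt_cm, rlj_cm, box_kg, lc_min):
    """Sum of z-scores per soldier; higher always means better performance."""
    z_gt, z_rlj, z_box, z_lc = (z_scores(v) for v in (gt_cm, rlj_cm, box_kg, lc_min))
    # GT (throwing error) and LC (time) are sign-flipped because lower raw scores are better.
    return [-g + r + b - l for g, r, b, l in zip(z_gt, z_rlj, z_box, z_lc)]
```

By construction the composite scores average zero across the group, so a positive value indicates better-than-average overall performance on the battery.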

For the construct validity part of the research, Pearson product-moment correlations were used to determine the associations between soldier task tests and the physical fitness tests/underlying physical fitness constructs. To evaluate the best predictors of the STTB, forward-stepwise multiple linear regression was used to predict performance on the individual soldier task tests and the composite score using the physical fitness tests.
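For readers who wish to reproduce the correlation step, the sketch below computes a Pearson correlation and the smallest |r| that reaches significance for a given sample size. It is illustrative only: the function names are hypothetical, and the critical t value (2.023 for a 2-tailed test at α = 0.05 with 39 degrees of freedom, i.e., n = 41) is taken from a standard t table rather than from the text.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that is significant for a given critical t and sample size."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


T_CRIT_DF39 = 2.023  # assumption: 2-tailed, alpha = 0.05, df = 39 (standard t table)
```

With n = 41, `critical_r(T_CRIT_DF39, 41)` is roughly 0.31, which gives a quick threshold for judging which coefficients in a correlation matrix of this size reach p < 0.05.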

Results

In the first part of the study, repeated-measures ANOVA revealed significant (p < 0.05) learning effects for RLJ, 1RMBL, and LC (Table 1). Interestingly, RLJ did not demonstrate any learning effects between trial 1 and trial 2; however, performance improved 2% (p < 0.05) between trial 2 and trial 3 and 2% (p < 0.05) between trial 3 and trial 4. The 1RMBL required 3 trials (i.e., 2 familiarization trials and 1 trial for record) to obtain stable values, while LC required 2 trials. More specifically, 1RMBL performance increased 5% between trial 1 and trial 2 and 4% between trial 2 and trial 3, and LC performance improved by 4% between trial 1 and trial 2. Average GT performance did not change significantly across trials.

Table 1 - Performance (mean ± SD) during repeated measurements for individual soldier readiness tests (n = 33).*
Test Trial 1 Trial 2 Trial 3 Trial 4
GT (cm from target) 430 ± 312 435 ± 340 406 ± 301 384 ± 301
RLJ (cm) 215 ± 31 218 ± 33 222 ± 35 227 ± 37
1RMBL (kg) 59 ± 14 61 ± 13 64 ± 13 64 ± 15
LC (min) 26.4 ± 3.8 25.5 ± 3.2 25.9 ± 3.3 25.9 ± 3.8
*GT = mock grenade throw for accuracy; RLJ = running long jump; 1RMBL = 1 repetition maximum box lift; LC = 3.2-km load carriage time trial.
Significantly (p < 0.05) different from following trial.

Values for ICCs ranged from 0.81 to 0.97 for all tests (Table 2). The RLJ, 1RMBL, and LC produced SEM values that were 4–6% of the mean, while GT produced an SEM of 18%. With regard to the LOA analysis, LC and GT data were heteroscedastic and the RLJ and 1RMBL data were homoscedastic; the LOA results are provided in Table 2. The ICC of the composite performance (reflected by the sum of z-scores) was 0.95 (Figure 1).

Table 2 - Reliability of soldier task tests when performed as a battery on the same day (current study) vs. performed individually on separate days (Spiering et al. (19)).*
Test | Data source | Trial 1 value (mean ± SD) | Learning effects | ICC (95% CI) | SEM (% of mean) | LOA | Ratio LOA
GT (cm from target) | Current | 430 ± 312 | None | 0.97 (0.94–0.99) | 73 cm (18%) | - | ±51%
GT (cm from target) | Spiering et al. (19) | 401 ± 290 | Trial 1 v. 2, trial 2 v. 3 | 0.79 (0.62–0.89) | 113 cm (36%) | - | ±99%
RLJ (cm) | Current | 215 ± 31 | Trial 2 v. 3, trial 3 v. 4 | 0.96 (0.92–0.98) | 9 cm (4%) | ±26 cm | -
RLJ (cm) | Spiering et al. (19) | 221 ± 34 | None | 0.89 (0.80–0.94) | 11 cm (5%) | ±31 cm | -
1RMBL (kg) | Current | 59 ± 14 | Trial 1 v. 2, trial 2 v. 3 | 0.94 (0.88–0.97) | 3 kg (5%) | ±10 kg | -
1RMBL (kg) | Spiering et al. (19) | 62 ± 10 | Trial 1 v. 2 | 0.88 (0.78–0.94) | 4 kg (5%) | ±10 kg | -
LC (min) | Current | 26.4 ± 3.8 | Trial 1 v. 2 | 0.81 (0.65–0.90) | 1.6 min (6%) | - | ±17%
LC (min) | Spiering et al. (19) | 23.8 ± 2.6 | None | 0.81 (0.62–0.91) | 1.3 min (5%) | - | ±15%
*CI = confidence interval; ICC = intraclass correlation coefficient; LOA = limits of agreement; GT = mock grenade throw for accuracy; RLJ = running long jump; 1RMBL = 1 repetition maximum box lift; LC = 3.2-km load carriage time trial.
Trial 1 values are included as a point of reference. Trial 3 and trial 4 data were used to calculate ICC, SEM, and LOA values for the current study.
Significantly (p < 0.05) different than values obtained in the current study.

Figure 1.
Reliability of the summed z-scores for individual soldiers (n = 33) during trial 3 and trial 4. Summed z-scores provide an index of the individual's overall performance on the 4 soldier task tests. ICC (2, 1) = intraclass correlation coefficient, 2-way random-effects model, single measures.

Tables 3 and 4 list the descriptive statistics for each soldier task test and physical fitness test, respectively, for the second part of the study. Examining the correlations between the soldier task tests and the physical fitness tests (Table 5), GT was not significantly correlated with any measure of body composition or physical fitness; RLJ was related to body composition (i.e., lean mass, fat mass, and percent body fat) and measures of muscular power (i.e., bench throw, medicine ball put, SLJ, and vertical jump); 1RMBL was related to lean body mass and measures of strength, power, and cardiovascular endurance, but was not correlated with the APFT events; LC was correlated with V̇O2peak. The sum of z-scores was correlated to lean body mass and measures of strength, power, and cardiovascular endurance, but was not correlated with the APFT events.

Table 3 - Descriptive statistics for the soldier task tests (n = 41).*
Soldier task tests Mean ± SD Range
Grenade throw (cm) 427 ± 282 123–1,218
Running long jump (cm) 217 ± 30 161–281
1RM box lift (kg) 59.6 ± 11.2 38.6–90.9
Load carriage (min) 26.27 ± 3.44 18.70–35.62
*1RM = 1 repetition maximum.

Table 4 - Descriptive statistics for the physical fitness tests (n = 41).*
Physical fitness tests Mean ± SD Range
Lean mass (kg) 60.5 ± 8.0 40.7–79.4
Fat mass (kg) 17.4 ± 6.3 4.0–33.5
Body fat (%) 22.0 ± 5.7 8.8–32.4
Bench press (kg) 88.5 ± 21.5 50.0–147.7
Leg press (kg) 203.0 ± 43.4 127.7–355.2
Bench throw peak power (W) 693 ± 181 408–1,185
Bench throw peak power (W·kg−1) 8.52 ± 1.70 5.79–12.85
Medicine ball put (cm) 707 ± 91 554–950
Standing long jump (cm) 210 ± 27 151–260
Vertical jump (cm) 52 ± 8 37–69
APFT 2-min push-ups (#) 65 ± 13 45–100
APFT 2-min sit-ups (#) 69 ± 11 34–93
V̇O2peak (ml·kg−1·min−1) 51.6 ± 6.9 34.8–64.7
V̇O2peak (L·min−1) 4.13 ± 0.52 2.87–4.99
APFT 3.2-km run (min) 14.59 ± 1.32 11.70–18.40
APFT overall score 251 ± 28 195–300
*APFT = Army Physical Fitness Test.

Table 5 - Pearson product-moment correlations between soldier task tests and physical fitness tests (n = 41).*
Physical fitness test Underlying physical construct Grenade throw (cm) Running long jump (cm) 1RM box lift (kg) Load carriage (min) Sum of z-scores
Lean mass (kg) BC −0.20 0.32 0.74 −0.21 0.53
Fat mass (kg) BC −0.11 −0.35 0.27 0.12 −0.04
Body fat (%) BC −0.04 −0.54 0.02 0.16 −0.23
Bench press 1RM (kg) MS −0.26 0.13 0.60 −0.10 0.39
Leg press 1RM (kg) MS −0.04 0.19 0.62 0.00 0.31
Bench throw peak power (W) MP −0.12 0.28 0.69 −0.03 0.41
Bench throw peak power (W·kg−1) MP −0.01 0.33 0.36 0.02 0.24
Medicine ball put (cm) MP −0.23 0.32 0.65 −0.10 0.47
Standing long jump (cm) MP −0.08 0.86 0.41 −0.10 0.52
Vertical jump (cm) MP −0.09 0.70 0.39 0.02 0.42
APFT 2-min push-ups (#) ME 0.04 0.10 0.05 −0.13 0.09
APFT 2-min sit-ups (#) ME 0.13 0.13 −0.05 −0.21 0.06
V̇O2peak (ml·kg−1·min−1) CV −0.12 0.17 −0.38 −0.32 0.02
V̇O2peak (L·min−1) CV 0.04 0.23 0.43 −0.49 0.46
APFT 3.2-km run (min) CV −0.06 −0.15 0.12 0.30 −0.10
APFT overall score Other 0.17 0.15 −0.05 −0.29 0.08
*1RM = 1 repetition maximum; BC = body composition; MS = muscular strength; MP = muscular power; ME = muscular endurance; CV = cardiovascular endurance; APFT = Army Physical Fitness Test.
p < 0.05.

The linear regression models developed to predict soldier task performance are listed in Table 6. Linear regression revealed no significant predictors of GT performance, as there were no significant correlations between GT and any of the predictors. Standing long jump accounted for 73% of the variation in RLJ performance. Lean mass and peak power during the bench throw (W) accounted for 59% of the variation in 1RMBL. Thirty percent of the variance in LC performance was accounted for by V̇O2peak (L·min−1) and fat mass (kg). Forty-one percent of the variability in the sum of z-scores was accounted for by lean body mass and SLJ distance. The coefficient of variation was within 12% for the RLJ, 1RMBL, and LC predictive equations.

Table 6 - Predictive equations, adjusted R2 values, SEE, and coefficients of variation (CV) for individual soldier task tests as well as overall performance on the test battery (sum of z-scores).
Predictive equation | Adjusted R2 | SEE (CV%)
GT: no significant predictors were identified | - | -
RLJ = 14.78 + 0.96 (SLJ cm) | 0.73 | 15 cm (4%)
1RMBL = −2.66 + 0.69 (lean mass kg) + 0.02 (bench throw W) | 0.59 | 7.2 kg (12%)
LC = 39.53 − 3.96 (V̇O2peak L·min−1) + 0.18 (fat mass kg) | 0.30 | 2.89 min (11%)
Sum of z-scores = −17.87 + 0.15 (lean mass kg) + 0.04 (SLJ cm) | 0.41 | 2.11 (N/A)
CV = SEE/mean; GT = grenade throw; RLJ = running long jump; SLJ = standing long jump; 1RMBL = 1 repetition maximum box lift; LC = load carriage; N/A = CV not calculated for sum of z-scores because mean value equals zero.
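The predictive equations in Table 6 can be applied directly, as in the following Python sketch. The function names are hypothetical, and because the published coefficients are rounded to 2 decimal places, the outputs should be treated as approximations (the bench throw coefficient in particular is small relative to its rounding).

```python
def predict_rlj_cm(slj_cm):
    """Running long jump (cm) from standing long jump (cm)."""
    return 14.78 + 0.96 * slj_cm


def predict_box_lift_kg(lean_mass_kg, bench_throw_w):
    """1RM box lift (kg) from lean mass (kg) and bench throw peak power (W)."""
    return -2.66 + 0.69 * lean_mass_kg + 0.02 * bench_throw_w


def predict_lc_min(vo2peak_l_min, fat_mass_kg):
    """3.2-km load carriage time (min) from absolute VO2peak (L/min) and fat mass (kg)."""
    return 39.53 - 3.96 * vo2peak_l_min + 0.18 * fat_mass_kg


def predict_sum_z(lean_mass_kg, slj_cm):
    """Composite sum of z-scores from lean mass (kg) and standing long jump (cm)."""
    return -17.87 + 0.15 * lean_mass_kg + 0.04 * slj_cm
```

As a plausibility check, entering the sample means from Table 4 (SLJ 210 cm; V̇O2peak 4.13 L·min−1; fat mass 17.4 kg) gives a predicted RLJ of about 216 cm and a predicted LC time of about 26.3 minutes, close to the observed means in Table 3 (217 cm and 26.27 minutes).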

Discussion

The overarching aim of the present research was to develop a battery of tests that could be used by military researchers to holistically assess a soldier's ability to perform common tasks. In general, we found that the present results agreed with our previous findings (19) (see Table 2 for details), indicating that performing the tests as a battery had minimal effect on the reliability of the individual tests. However, there were some exceptions. With respect to the GT test, the current study found no learning effects, while our previous study found significant learning effects. In addition, reliability outcomes for the GT test were better in the current study compared with our previous study (ICC: 0.97 vs. 0.79; SEM: 73 vs. 113 cm; LOA: ±51 vs. ±99%, respectively). One procedural difference between the 2 studies might partially explain these discrepant findings. In the previous study (19), soldiers were allowed 5 practice throws before performing the GT test for record. However, in this study, only 1 practice throw was allowed to improve time efficiency when testing groups of soldiers. We speculate that perhaps this reduction in practice contributed to the lack of significant learning effects in this study. We are unable to explain why the reliability of the GT test was improved in the current study compared with our earlier study (19). To the best of our knowledge, there are no other published studies on the reliability of GT accuracy to which we might compare these findings.

The RLJ test is a unique measure of soldier performance developed by our laboratory and not otherwise reported in the literature. For the RLJ test, the present results strongly agree with our previous findings (19). Neither study found significant learning effects between trial 1 and trial 2. However, in the current study, soldiers' RLJ performance significantly improved by 2% between trial 2 and trial 3, and by another 2% between trial 3 and trial 4. Although statistically significant, these 2% improvements represent a relatively small effect size (<0.2). The 1RMBL results also agree with our previous findings (both studies found an SEM of 5% and LOA of ± 10 kg). That said, in the current study, soldiers required 3 trials to reach a stable value, while in our previous study, the soldiers required only 2 trials. Conversely, other previously published research indicated no learning effects for maximal box lifting task tests (16). With respect to the LC test, the reliability indices were nearly identical between the 2 studies (Table 2). The only discrepancy was that soldiers in this study required 2 trials to obtain a stable value, which might be explained by the significantly slower mean values obtained during trial 1 in this study compared with our previous study (26.4 vs. 23.8 minutes, respectively; Table 1). The LC reliability indices generally agree with previously published reports as well (2). Overall, the results of our 2 studies indicate (a) 4 trials are necessary to obtain stable baseline values on all tests; (b) the RLJ, 1RMBL, and LC tests demonstrate excellent reliability; (c) the GT test demonstrates marginal reliability; and, importantly, (d) performing the tests as a battery had minimal influence on the reliability of the individual tests.
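The small effect size noted above for the RLJ improvements can be approximated from the Table 1 means and SDs. The sketch below is one possible calculation, not necessarily the one used in this study: it assumes a standardized mean difference (Cohen's d) with the SD pooled across the 2 trials, and it ignores the correlation between repeated measurements. The function name is hypothetical.

```python
import math


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Standardized mean difference using the pooled SD of two equal-sized trials."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_2 - mean_1) / pooled_sd


# RLJ values (cm) from Table 1: trial 2 = 218 ± 33, trial 3 = 222 ± 35, trial 4 = 227 ± 37.
d_trial_2_to_3 = cohens_d(218, 33, 222, 35)
d_trial_3_to_4 = cohens_d(222, 35, 227, 37)
```

Under these assumptions both values fall between 0.1 and 0.15, consistent with the reported effect size of less than 0.2.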

The second part of this research was designed to assess the construct validity of the test battery. We accomplished this goal by examining the correlations between the physical fitness tests and the soldier task tests. The RLJ test was highly correlated with measures of body composition (i.e., lean mass, fat mass, and percent body fat) and muscular power (i.e., bench throw, medicine ball put, vertical jump, and SLJ). Therefore, to enhance RLJ performance, training should focus on developing lean body mass and muscular power.

The GT test is considered a common task for soldiers, and its importance can be inferred from its inclusion in the U.S. Marine Corps' Combat Fitness Test. Although GT may be an important skill for the military, it stands out as a test that is not highly correlated with standard measures of physical fitness. No reports of significant correlations between GT accuracy and measures of physical fitness were found in the literature. This is likely because GT is a highly skilled, technique-dependent activity. Therefore, to enhance GT accuracy, training should focus on performing GT-specific practice and drills.

The 1RMBL was associated with lean mass and all measures of physical fitness except for the APFT. Recently, Hauschild et al. (5) published a systematic review and meta-analysis of correlations between military-relevant occupational tasks and physical fitness tests. They (5) reported correlations between single-repetition lifting tasks and upper-body strength/power of r = 0.75 (95% confidence interval [CI]: 0.66–0.81), lower-body strength/power of r = 0.60 (95% CI: 0.52–0.67), and cardiovascular endurance of r = 0.30 (95% CI: 0.15–0.44); their findings (5) generally agree with the present results (Table 5). Similarly, Hydren et al. (8) published a systematic review and meta-analysis of correlations between physical fitness tests and maximal lift capacity. Our results strongly agree, in that both studies found (a) lean mass was the strongest overall predictor and (b) tests of muscular strength had stronger correlations than tests of muscular endurance.

The LC was significantly correlated with V̇O2peak (L·min−1) (r = −0.49). A previous report supports this finding (11). Extrapolation of these results supports the notion that soldiers should conduct endurance exercise training to improve their LC performance. However, previous research indicates that soldiers need to conduct supplementary, LC-specific training to optimize improvements in LC performance (10). More specifically, previous research indicates that twice-monthly sessions of supplementary LC training produce optimal improvements in LC performance (10).

The sum of z-scores was used to quantify performance on the overall test battery. Using this metric, we found significant correlations between overall performance and lean body mass, as well as measures of muscular strength, power, and cardiovascular endurance. Extrapolation of these results indicates that the overall performance on the test battery might be enhanced by conducting a resistance and aerobic training program designed to improve lean body mass, and measures of muscular strength, power, and cardiovascular endurance.

This study also sought to determine whether physical fitness tests can be used to predict performance on the battery of soldier task tests. This information can be used to select inexpensive, field-expedient tests to assess soldier performance. We accomplished this goal by conducting multiple linear regression analysis between soldier task tests and basic physical fitness tests. The advantages of using field-expedient tests are, in many cases, low cost and time requirements of test administration (due to the lack of specialized equipment), as well as potentially greater safety. Alternatively, the advantage of conducting the soldier task tests themselves is greater face validity and acceptance by soldiers and, ostensibly, greater specificity of associated training (i.e., “train as you would fight”).

Stepwise forward linear regression identified SLJ as the sole predictor of RLJ; the equation produced an adjusted R2 of 0.73 and a coefficient of variation of 4%. The SLJ predictor test could easily be implemented in a field setting to assess RLJ performance. Linear regression indicated that lean mass and bench throw peak power (W) were significant predictors of 1RMBL performance (adjusted R2 = 0.59; SEE = 7.2 kg). Considering the equipment-intensive nature of the bench throw test, we tried removing it from the model to find a more field-expedient approach to predicting 1RMBL performance; subsequent forward-stepwise regression revealed that lean body mass plus SLJ performance produced an adjusted R2 of 0.57 and an SEE of 7.3 kg. This indicates that 1RMBL performance can be predicted using field-expedient tests. Linear regression identified V̇O2peak (L·min−1) and fat mass (kg) as significant predictors of LC performance; the equation produced an adjusted R2 of 0.30 and an SEE of 2.89 minutes (coefficient of variation of 11%). Finally, we found that the sum of z-scores could be predicted using lean body mass and SLJ performance, although the validity of this approach (adjusted R2 = 0.41; SEE = 2.11) is less than optimal.

The primary limitation of this test battery is that the individual tests were not derived from a thorough job task analysis (15), as was performed in a previously published study of ours (2). Therefore, this test battery cannot be used as a valid physical employment test or to assign soldiers to a specific military occupational specialty. With this limitation in mind, however, the advantages of using the STTB to assess soldier performance are that it directly measures performance on soldiering tasks (e.g., materials handling and LC) and it has greater face validity (and thus, presumably, greater acceptability by the user) than a general physical fitness test.

In conclusion, we investigated the reliability and construct validity of a battery of soldier task tests. The salient findings were (a) performing these tests as a battery on the same day resulted in test-retest reliability similar to that of performing these tests individually on different days; (b) the overall test battery demonstrated excellent reliability (ICC: 0.95); (c) the overall test battery was correlated with measures of lean mass, muscular strength, muscular power, and cardiovascular endurance, but not with measures of muscular endurance; and (d) field-expedient tests could be used to predict RLJ, 1RMBL, and, to a lesser extent, LC performance, but not GT accuracy. The results of this study provide practical information for researchers and practitioners who wish to assess and develop soldiers' fitness for occupational and combat-related duties.

Practical Applications

Researchers who wish to assess the occupational and combat-related fitness of soldiers should use tests that are valid (i.e., represent common soldier tasks) and reliable (i.e., provide repeatable results). The battery of tests described in this article represents common soldier tasks (GT accuracy, RLJ while wearing a fighting load, heavy lifting, and LC); moreover, the tests provide consistent results. Researchers should consider using this battery of tests as a benchmark with which to assess soldiers' fitness. In addition, this study sought to identify the basic physical abilities underlying the performance of soldier task tests. Based on these associations, customized physical training programs can be developed to preferentially improve performance on each of the soldier tasks. More specifically, strength and conditioning specialists should focus on developing soldiers' strength, power, and lean body mass (to enhance RLJ and box lifting performance) and cardiovascular endurance (to enhance LC performance). Supplementary, event-specific training is necessary to enhance GT accuracy and LC performance.

Acknowledgments

The authors thank Reeshemah Ward for her help during data collection. This research was supported by appointments to the Research Participation Program at the U.S. Army Research Institute of Environmental Medicine administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Army Medical Research and Materiel Command. This study was funded by the U.S. Army Medical Research and Materiel Command. The opinions or assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the U.S. Army, the Department of Defense, the U.S. Government, or the National Strength and Conditioning Association.

References

1. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med 26: 217–238, 1998.
2. Foulis SA, Redmond JE, Frykman PN, et al. U.S. Army physical demands study: Reliability of simulations of physically demanding tasks performed by combat arms soldiers. J Strength Cond Res 31: 3245–3252, 2017.
3. Foulis SA, Sharp MA, Redmond JE, et al. Army physical demands study: Development of the occupational physical assessment test for combat arms soldiers. J Sci Med Sport 20(Suppl 4): S74–S78, 2017.
4. Harman EA, Gutekunst DJ, Frykman PN, et al. Effects of two different eight-week training programs on military physical performance. J Strength Cond Res 22: 524–534, 2008.
5. Hauschild VD, DeGroot DW, Hall SM, et al. Fitness tests and occupational tasks of military interest: A systematic review of correlations. Occup Environ Med 74: 144–153, 2017.
6. Hendrickson NR, Sharp MA, Alemany JA, et al. Combined resistance and endurance training improves physical capacity and performance on tactical occupational tasks. Eur J Appl Physiol 109: 1197–1208, 2010.
7. Hollander IE, Bell NS. Physically demanding jobs and occupational injury and disability in the U.S. Army. Mil Med 175: 705–712, 2010.
8. Hydren JR, Borges AS, Sharp MA. Systematic review and meta-analysis of predictors of military task performance: Maximal lift capacity. J Strength Cond Res 31: 1142–1164, 2017.
9. Jones SB, Knapik JJ, Sharp MA, Darakjy S, Jones BH. The validity of self-reported physical fitness test scores. Mil Med 172: 115–120, 2007.
10. Knapik JJ, Bahrke M, Staab J, Reynolds K, Vogel J, O'Connor J. Frequency of Loaded Road March Training and Performance on a Loaded Road March. Natick, MA: U.S. Army Research Institute of Environmental Medicine, 1990.
11. Knapik JJ, Staab J, Bahrke M, et al. Relationship of Soldier Load Carriage to Physiological Factors, Military Experience and Mood States, Technical Report No. T17-90. Natick, MA: U.S. Army Research Institute of Environmental Medicine, 1990.
12. Kraemer WJ, Mazzetti SA, Nindl BC, et al. Effect of resistance training on women's strength/power and occupational performances. Med Sci Sports Exerc 33: 1011–1025, 2001.
13. Martin RC, Grier T, Canham-Chervak M, et al. Validity of self-reported physical fitness and body mass index in a military population. J Strength Cond Res 30: 26–32, 2016.
14. Nindl BC, Leone CD, Tharion WJ, et al. Physical performance responses during 72 h of military operational stress. Med Sci Sports Exerc 34: 1814–1822, 2002.
15. Payne W, Harvey J. A framework for the design and development of physical employment tests and standards. Ergonomics 53: 858–871, 2010.
16. Rayson M, Holliman D. Physical Selection Standards for the British Army. Phase 4: Predictors of Task Performance in Trained Soldiers, DRA/CHS/PHYS/CR95/017, 1–87. Farnborough, United Kingdom: United Kingdom Defense Research Agency, 1995.
17. Sharp MA, Harman EA, Boutilier BE, Bovee MW, Kraemer WJ. Progressive resistance training program for improving manual materials handling performance. Work 3: 62–68, 1993.
18. Sharp MA, Patton JF, Vogel JA. A Database of Physically Demanding Tasks Performed by U.S. Army Soldiers, Technical Report No. T98-12. Natick, MA: U.S. Army Research Institute of Environmental Medicine, 1998.
19. Spiering BA, Walker LA, Hendrickson NR, et al. Reliability of military-relevant tests designed to assess soldier readiness for occupational and combat-related duties. Mil Med 177: 663–668, 2012.
20. Headquarters, Department of the Army. Soldier's Manual of Common Tasks, STP-21-1-SMCT. Washington, DC: Government Printing Office, 2009.
Keywords:

army; materials handling; military; occupational

© 2019 National Strength and Conditioning Association