An analysis of the physical requirements of special forces soldiers (4) clearly shows that aerobic endurance, agility, and muscle strength, power, and endurance of the upper limbs are required (15). During the diverse operations performed in their daily activities by military, police, and emergency personnel (e.g., emergency medical technicians, fire fighters), these individuals must control their body mass and their relatively heavy equipment with their upper limbs. This physical requirement is therefore of paramount importance for their overall physical performance, their personal safety, and the safety of others. The inability to move themselves and their equipment rapidly and reliably over or around obstacles can result in injury and possibly death. Similarly, many athletes can only perform optimally if they have sufficient relative strength and power to maneuver their body mass (e.g., gymnasts, rock climbers) as well as their equipment (e.g., hockey goalies). Hence, appropriate training and reliable and valid testing are necessary to prepare and identify the personnel who are ready to perform challenging operations and activities.
Historically, the typical methods for assessing upper-limb power have been pull-ups (25), push-ups (7), bench press power tests (6), and medicine ball put tests (24). To assess upper-limb power, execution time (ET) and relative power output (RPO) indices are widely used in sport-specific tests (18), standard field tests (6), and standard laboratory tests (28). In the context of military, police, emergency medical personnel, and athletes, a strong individual with a lower body mass has an advantage in weight-bearing tests and activities (e.g., pull-up or rope climb tests) (3). Compared with individuals with greater muscle mass, however, the lighter individual is disadvantaged when required to pull, push, lift, or carry an object with a greater absolute mass (e.g., a victim, goalie equipment). Dhahbi et al. (11) reported that RPO was a more convenient parameter than ET in the specific rope-climbing test (RCT), a test recently validated for assessing upper-limb power in Commando soldiers (11).
Concurrent validity, reliability, and responsiveness are basic attributes used to evaluate any test in sport physiology (1,16). The external responsiveness and intrasession reliability of the RCT have not been reported; Dhahbi et al. (11) only considered its intersession reliability and criterion-related validity. External responsiveness determines the discriminative ability of a test and is usually assessed by testing differences between 2 groups of individuals with different performance profiles (16). Because one of the most important aims of the RCT is soldier selection, the test should be able to discriminate between soldiers of different specialty operations levels (e.g., Commandos vs. Intervention-Brigade). The intrasession error is free of methodological errors, cannot be reduced, and therefore serves as an appropriate baseline for comparisons, remaining independent of other error sources (22). An unreliable or invalid test could lead to the placement of incapable professionals (or athletes), which could compromise the safety of the individual and of those who depend on them (e.g., victims of a fire, an injured victim of a car accident).
This theoretical background reveals a lack of knowledge regarding the ability of the RCT to distinguish performance profiles and regarding its intrasession reliability. Therefore, the aims of this study were (a) to investigate the discriminant ability of the RCT (Commandos vs. Intervention-Brigade) and (b) to examine the absolute and relative intrasession reliability of the RCT.
Experimental Approach to the Problem
The external responsiveness of the RCT was determined by comparing ET, absolute power output (APO), and RPO between 2 groups of soldiers of different specialty operations levels (Commandos vs. Intervention-Brigade). During the second study phase, which aimed to establish the relative and absolute intrasession reliabilities of RCT, the experimental protocol consisted of performing 3 trials of RCT in a single session.
Forty male soldiers (aged 22 to 29 years) belonging to the special units of the National Guard volunteered to participate (Table 1). Twenty-one Commando soldiers were recruited to investigate the discriminant ability of the RCT. The inclusion criterion for Commando soldiers was having trained regularly for at least 4 months at the National Guard School of Commandos, for ∼32 h·wk−1, divided into ∼14 h·wk−1 of fitness training and ∼18 h·wk−1 of technical and tactical training. Another group of 19 soldiers was recruited from an Intervention-Brigade. The inclusion criterion for Intervention-Brigade soldiers was having trained for at least 8 weeks at the National Guard School of Intervention-Brigade/Commandos, in 4 sessions per week of approximately 2 hours each (1 strength and conditioning session and 3 technical and tactical sessions). Both groups were used to establish external responsiveness, whereas only the anti-terrorism Commandos participated in the intrasession reliability study.
All participants were free from any injury or pain that would prevent maximal effort during performance testing. All the participants gave their written informed consent to the study after receiving a thorough explanation about the protocol. This protocol conformed to internationally accepted policy statements regarding the use of human subjects and was approved by the University Ethics Committee in accordance with the Declaration of Helsinki.
Participants were asked to follow their normal diet, eat a light meal at least 3 hours before each session, keep their usual sleep schedule, and avoid any strenuous activity during the 24 hours before testing. Seven days before baseline testing, 1 session was conducted to familiarize the participants with the measurement protocol. Before testing, participants completed a 15-minute standardized specific warm-up followed by 5 minutes of rest. Data were collected at approximately the same time of day (between 9:00 and 11:00 AM) to eliminate any influence of circadian variation on performance (12).
The session was performed outdoors under the following conditions (monitored every 30 minutes during the experiment by a digital environmental station; Vaisala Oyj, Helsinki, Finland): temperature ranged from 15 to 17 °C, humidity from 55 to 56%, and wind velocity was light (less than 10 km·h−1). Participants performed the tests wearing the army combat uniform and tactical footwear, without a bulletproof vest (the mass of the equipment was ∼5 kg). The protocol consisted of 3 RCT trials separated by 5 minutes of rest. The experimenter provided strong verbal encouragement during the tests to elicit maximal efforts. The rating of perceived exertion (RPE) was recorded immediately after the RCT using the Borg scale (RPE, 1–10) (14).
The RCT was performed according to the criteria outlined by Dhahbi et al. (11). The participant was instructed to climb the rope as fast as possible and hit the finish mark (see description below). The manual timer was started at the assessor's signal and stopped when the participant touched the mark situated 5 m above the starting mark. Dhahbi et al. (11) showed excellent concurrent validity of hand timing: there was no significant difference between stopwatch and video timing, the systematic bias was low (0.18 seconds), and standard error of measurement (SEM) values differed very little (<5%). Moreover, Dhahbi et al. (11) found high agreement both within and between the 2 timing methods (r = 0.99, p < 0.001; intraclass correlation coefficient [ICC] = 0.98). The RCT began with the participant sitting on his buttocks with the rope between his legs and both hands placed on the rope no higher than the starting mark, which was situated 1 m above the ground. The climb was performed without kipping (i.e., without momentum), without gloves, and without the lower limbs (i.e., the legs and feet were not allowed to touch the rope to assist climbing) (Figure 1).
The ET was defined as the time between the starting signal and the sound of the hand slapping the finish mark. The assessor used both visual and auditory cues to ensure that firm, solid contact was made with the finish mark. The 2 best of the 3 trials were retained for analysis; discarding the worst trial was intended to ensure that a single poor performance did not substantially affect the analysis. To improve reproducibility, a single assessor measured all ETs (eliminating interassessor differences in reaction and movement time). The ET measurements were used to estimate APO and RPO, which were calculated using the following equations:
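As a minimal sketch only, assuming the common work-over-time formulation (not confirmed by this text), APO can be taken as the work done raising the body through the 5-m climb (body mass × g × 5 m) divided by ET, and RPO as APO divided by body mass. Whether the ∼5 kg of equipment enters the mass term, the value g = 9.81 m·s−2, and averaging the 2 retained trials are all assumptions; the function names are hypothetical.

```python
G = 9.81              # gravitational acceleration (m·s^-2); assumed value
CLIMB_HEIGHT_M = 5.0  # finish mark is 5 m above the starting mark


def best_two_trials(times_s):
    """Return the 2 fastest (shortest) of 3 execution times, dropping the worst."""
    if len(times_s) != 3:
        raise ValueError("expected exactly 3 trials")
    return sorted(times_s)[:2]


def absolute_power_w(body_mass_kg, et_s, height_m=CLIMB_HEIGHT_M):
    """Assumed APO (W): work against gravity (m * g * h) divided by execution time."""
    return body_mass_kg * G * height_m / et_s


def relative_power_w_per_kg(body_mass_kg, et_s, height_m=CLIMB_HEIGHT_M):
    """Assumed RPO (W/kg): APO normalised to body mass, which reduces to g * h / ET."""
    return absolute_power_w(body_mass_kg, et_s, height_m) / body_mass_kg


if __name__ == "__main__":
    # Hypothetical soldier: 75 kg body mass, three execution times in seconds.
    kept = best_two_trials([21.0, 19.5, 20.2])
    et = sum(kept) / len(kept)  # averaging the 2 kept trials is an assumption
    print(kept, round(et, 2))
    print(round(absolute_power_w(75.0, et), 1), round(relative_power_w_per_kg(75.0, et), 2))
```

Under these assumptions RPO reduces to g × 5 m / ET, so an ET of 20.14 seconds gives about 2.44 W·kg−1, close to the 2.43 W·kg−1 RPO cut-off reported later. This agreement is suggestive but does not confirm the authors' exact equations.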
Data analyses were performed using SPSS version 18.0 for Windows. Means and SDs were calculated after verifying the normality of distributions using the Kolmogorov-Smirnov test. Effect sizes, mean differences, and 95% confidence intervals (CIs) were estimated to protect against type II errors. Independent t-tests were used to compare mean RCT ET, APO, RPO, and RPE values between Commandos and Intervention-Brigade soldiers. The external responsiveness of the RCT was analyzed using receiver operating characteristic (ROC) curves (16), which determine the sensitivity and specificity of a tool in classifying individuals according to a fixed criterion (9). The relative intrasession reliability (i.e., the degree to which individuals maintain their position in a sample over repeated measurements (2)) of ET, APO, and RPO was determined by calculating the ICC (ICC[3,1]), and the absolute intrasession reliability (i.e., the degree to which repeated measurements vary for individuals (2)) was expressed as the SEM and the coefficient of variation (CV). Heteroscedasticity was also examined. Significance for all statistical tests was set a priori at p ≤ 0.05.
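The reliability indices named above can be sketched in Python. This is a minimal illustration, not the SPSS procedure the authors used: ICC(3,1) is computed from a two-way ANOVA decomposition (subjects × trials, consistency, single measure), the SEM is taken as the square root of the ANOVA error mean square (one option discussed by Weir (29); SD × √(1 − ICC) is another), and the CV is expressed as SEM divided by the grand mean × 100, which is one of several CV definitions in use. The SEM and CV definitions chosen here are assumptions, and the function name is hypothetical.

```python
import math
from statistics import mean


def reliability_indices(data):
    """ICC(3,1), SEM and CV (%) for a subjects x trials table.

    data: list of per-subject lists, each holding one value per trial.
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    subject_means = [mean(row) for row in data]
    trial_means = [mean(row[j] for row in data) for j in range(k)]

    # Two-way ANOVA sums of squares: total = subjects + trials + error.
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_trials = n * sum((m - grand) ** 2 for m in trial_means)
    ss_error = max(ss_total - ss_subjects - ss_trials, 0.0)  # guard tiny negative rounding

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc = (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)
    sem = math.sqrt(ms_error)   # assumed SEM definition: sqrt(MS_error)
    cv = 100.0 * sem / grand    # assumed CV definition: SEM / grand mean
    return icc, sem, cv


if __name__ == "__main__":
    # Hypothetical ET values (s) for 5 soldiers over 2 retained trials.
    et = [[18.9, 19.2], [20.4, 20.1], [21.7, 22.0], [19.5, 19.4], [23.1, 22.6]]
    icc, sem, cv = reliability_indices(et)
    print(f"ICC(3,1) = {icc:.3f}, SEM = {sem:.2f} s, CV = {cv:.1f}%")
```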
Discriminant Ability of RCT
Anthropometric characteristics and RCT indices (ET, APO, RPO, and RPE) for each group (Commandos and Intervention-Brigade) are displayed in Tables 1 and 2, respectively. Residuals for anthropometric characteristics and RCT indices were normally distributed (p = 0.052–0.200). Independent-sample t-tests revealed no difference between groups for age (years) (t = −0.188, p = 0.852, dz = 0.06 [trivial]), body mass (kilograms) (t = −1.018, p = 0.315, dz = 0.32 [moderate]), height (centimeters) (t = −0.043, p = 0.966, dz = 0.01 [trivial]), body mass index (kg·m−2) (t = −0.921, p = 0.363, dz = 0.29 [moderate]), or RPE (t = −0.269, p = 0.789, dz = 0.09 [trivial]). However, compared with the Intervention-Brigade group, Commandos had a significantly shorter ET (t = −5.918, dz = 1.87 [large]) and significantly higher APO (t = 4.255, dz = 1.33 [large]) and RPO (t = 5.122, dz = 1.52 [large]) (all p < 0.001). ROC analysis between Commandos and Intervention-Brigade soldiers revealed very good discriminant ability of the RCT: the areas under the ROC curves for ET, APO, and RPO were 0.91, 0.85, and 0.90, respectively (95% CI: 0.77 to 0.98, 0.70 to 0.94, and 0.77 to 0.98, respectively; p < 0.001) (Figure 2).
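The ROC quantities reported above can be illustrated with a short, hypothetical Python sketch. It uses the equivalence between the AUC and the probability that a randomly chosen member of the "positive" group scores higher than a randomly chosen member of the other group (ties counted as one half), and it selects a cut-off by maximizing Youden's index (sensitivity + specificity − 1). Youden's index is an assumed, commonly used criterion; how the authors chose their cut-offs is not stated here, and which group is treated as positive (and whether scores at or above the cut-off count as positive) is the analyst's choice.

```python
def auc_mann_whitney(positives, negatives):
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(positives, negatives, cutoff):
    """Classify a score >= cutoff as positive; return (sensitivity, specificity)."""
    true_pos = sum(1 for p in positives if p >= cutoff)
    true_neg = sum(1 for q in negatives if q < cutoff)
    return true_pos / len(positives), true_neg / len(negatives)


def youden_cutoff(positives, negatives):
    """Observed score maximising sensitivity + specificity - 1 (first one on ties)."""
    candidates = sorted(set(positives) | set(negatives))
    return max(candidates,
               key=lambda c: sum(sensitivity_specificity(positives, negatives, c)) - 1)


if __name__ == "__main__":
    # Hypothetical ET values (s): slower climbers treated as the "positive" group.
    brigade = [20.3, 21.9, 22.4, 19.8, 23.0, 21.1]
    commandos = [18.6, 19.4, 20.0, 18.9, 19.7, 20.6]
    cut = youden_cutoff(brigade, commandos)
    sens, spec = sensitivity_specificity(brigade, commandos, cut)
    print(f"AUC = {auc_mann_whitney(brigade, commandos):.2f}, cut-off = {cut} s, "
          f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

For ET, lower scores indicate better performance, so treating the slower group as positive keeps the AUC above 0.5; for APO and RPO the direction would be reversed.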
Absolute and Relative Intrasession Reliability of RCT
Absolute and relative intrasession reliability indices are presented in Table 3. Dependent t-tests showed no significant test-retest bias for ET (seconds) (t = −0.62, p = 0.55, dz = 0.13 [trivial]), APO (watts) (t = 0.78, p = 0.44, dz = 0.17 [trivial]), RPO (W·kg−1) (t = 0.85, p = 0.41, dz = 0.21 [moderate]), or RPE (t = 0.17, p = 0.87, dz = 0.05 [trivial]). ET, APO, and RPO showed a high degree of relative reliability between trials (ICC[3,1] ranging from 0.96 to 0.97). The SEMs of ET, APO, and RPO were 0.23 seconds, 3.25 W, and 0.05 W·kg−1, respectively, and their CVs were all under 10%. Heteroscedasticity coefficients for ET, APO, RPO, and RPE were all statistically nonsignificant (r = 0.01 [p = 0.96], r = 0.40 [p = 0.08], r = 0.43 [p = 0.06], and r = −0.31 [p = 0.16], respectively).
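The test-retest bias checks above compare each soldier's trials in pairs. As a minimal sketch under the usual definitions (which this text does not spell out), the paired t statistic is the mean within-subject difference divided by its standard error, and Cohen's dz is the same mean difference divided by the SD of the differences, so t = dz × √n. p values are omitted because they require the t distribution (e.g., from SciPy), which is outside the standard library; the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_and_dz(trial_1, trial_2):
    """Paired t statistic, Cohen's dz and degrees of freedom for trial_2 - trial_1."""
    diffs = [b - a for a, b in zip(trial_1, trial_2)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)                     # sample SD of the differences
    t = mean_diff / (sd_diff / math.sqrt(n))   # mean difference / standard error
    dz = mean_diff / sd_diff                   # standardised mean difference
    return t, dz, n - 1


if __name__ == "__main__":
    # Hypothetical ET values (s) for 5 soldiers over 2 trials.
    first = [18.9, 20.4, 21.7, 19.5, 23.1]
    second = [19.2, 20.1, 22.0, 19.4, 22.6]
    t, dz, df = paired_t_and_dz(first, second)
    print(f"t({df}) = {t:.2f}, dz = {dz:.2f}")
```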
The inability to provide reliable and valid strength and power testing to identify and progressively train athletes, military, police, and emergency medical personnel could result in serious injury to these individuals or to those who depend on them. Hence, this study assessed the ability of the RCT to discriminate soldiers' specialty level and established its absolute and relative intrasession reliability. The main findings showed that the RCT has high intrasession reliability and is a sensitive tool for differentiating upper-limb power between 2 groups of soldiers of different operational capacity levels.
One of the main characteristics of the RCT is its discriminant ability. ET, APO, and RPO differed significantly between the Commandos and Intervention-Brigade groups. Impellizzeri and Marcora (16) suggested that the ROC curve is an appropriate tool to validate the discriminant ability (and responsiveness) of physiological and performance tests, because it determines test sensitivity and specificity in classifying individuals according to a fixed criterion (5). The ROC curve is a plot of the “true positive rate” (sensitivity) vs. the “false positive rate” (1 − specificity) for each of several possible cut-off scores (10). The area under the ROC curve (AUC) was interpreted as the probability of correctly discriminating Commandos from Intervention-Brigade soldiers using the RCT protocol. An AUC value of 0.5 is interpreted as no discriminatory ability and 1.0 as complete discriminatory ability (9), with an AUC >0.70 considered to indicate good discriminative ability (10,21). In the present study, the AUC values were 0.91, 0.85, and 0.90 for ET, APO, and RPO, respectively (10). The cut-off scores that differentiated between groups of soldiers of different operational capacity levels were 20.14 seconds, 185.64 W, and 2.43 W·kg−1 for ET, APO, and RPO, respectively. These cut-off values gave a sensitivity (true positive rate) of 73.7% for ET, APO, and RPO, and a specificity of 95.2%, 85.7%, and 95.2% for ET, APO, and RPO, respectively (Figure 2). Therefore, the RCT has excellent discriminant ability when its purpose is to differentiate between Commandos and other specialty soldiers. These results complement those of Dhahbi et al. (11), who included the same group of Commandos who participated in this study.
They assessed the internal responsiveness (i.e., the ability to detect longitudinal changes) of the RCT by calculating the likelihood that differences in RCT outcomes were substantial (i.e., whether the smallest worthwhile change was larger than the SEM) (19). This was the case for ET, APO, and RPO (11), indicating that these measures have good potential to detect real changes in upper-limb power output. Dhahbi et al. (11) also used the minimal detectable change (2) to find the score threshold corresponding to a true change in performance. They showed that changes of at least 1.62 seconds, 31.45 W, and 0.41 W·kg−1 in ET, APO, and RPO, respectively, were necessary to be 95% confident that a true change had occurred in Commando soldiers.
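The minimal detectable change mentioned above is commonly computed from the SEM as MDC95 = SEM × 1.96 × √2, a formulation discussed in the reliability literature (e.g., Weir (29)): 1.96 is the two-sided 95% z value, and √2 accounts for measurement error in both the baseline and follow-up scores. The short sketch below assumes that formulation; whether Dhahbi et al. (11) used exactly this formula is not stated here, and the SEM in the example is hypothetical.

```python
import math


def mdc95(sem):
    """Minimal detectable change at 95% confidence: SEM * 1.96 * sqrt(2)."""
    return sem * 1.96 * math.sqrt(2)


def is_true_change(baseline, follow_up, sem):
    """True when the absolute change exceeds the MDC95 threshold."""
    return abs(follow_up - baseline) > mdc95(sem)


if __name__ == "__main__":
    sem_et = 0.5  # hypothetical SEM of ET in seconds
    print(f"MDC95 = {mdc95(sem_et):.2f} s")
    print(is_true_change(21.0, 19.2, sem_et))  # a 1.8 s change vs the threshold
```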
Although the typical methods for assessing upper-limb power have been pull-ups (25), push-ups (7), bench press power tests (6), and medicine ball put tests (24), few studies provide data on their discriminant ability. For example, no data have been reported on the discriminant ability of 15-second pull-ups (23), 15-second push-ups (23), bench press (6,26), medicine ball puts or throws (6,27), or single-arm seated shot puts (23). Using a laboratory Wingate test rather than a field test, Koutedakis and Sharp (17) achieved excellent discrimination, correctly classifying 91.8% of their subjects. A good level of discrimination was reported for a bench press repeated power test (13) and a medicine ball throw test (8) in youth basketball players and children aged 5–7 years, respectively. A rock climbing–specific test (arm jump board test) could discriminate between novice and experienced climbers (18). Hence, given its excellent discriminant ability, substantiated by a powerful statistical tool such as the ROC curve, the RCT should be considered an important assessment tool for professionals and practitioners in the field. Moreover, no significant difference was found between groups in RPE responses, which strongly suggests that both groups made comparable, most probably maximal, efforts. The absence of significant anthropometric and age differences between groups suggests that these variables did not substantially affect performance.
The variability between trials may be considered “intrinsic variation,” as it provides a basic indication of variation independent of other sources of error. Intrasession reliability of RCT performance is critically important to ensure that observed differences between trials are not due to systematic bias, such as a learning effect or fatigue, or to random error arising from biological or mechanical variation. This variability is usually caused by the subject's emotional state between trials and his level of adaptation to the measuring system (22). The results demonstrated a very high level of relative reliability of the RCT. Other upper-limb field tests such as 15-second pull-ups (0.99) (23), 15-second push-ups (0.96) (23), bench press (0.92–0.98) (6,13), medicine ball throws (0.92–0.97) (6,8), and a rock climbing–specific test (0.98) (18) have also shown excellent ICC reliability scores. However, one weakness of the ICC as a measure of relative repeatability is that it is affected by sample heterogeneity (29). Therefore, the SEM, which provides an absolute index of reliability, should be examined in conjunction with the ICC to confirm the ICC results (20). The SEM is not affected by intersubject variability (29) and provides an estimate of measurement error. In addition, if data are homoscedastic, as in the current study (r = 0.01, r = 0.40, and r = 0.43; p > 0.05 for ET, APO, and RPO, respectively), the SEM is more appropriate than the CV for establishing absolute reliability (2,29). In this study, SEMs were low for all parameters (less than 5%), confirming the excellent absolute intrasession reliability of the RCT. Similarly, Dhahbi et al. (11) found excellent intersession reliability of the RCT: for ET, APO, and RPO, ICC[3,1] values were all higher than 0.90, SEM% values all less than 5%, and CV% values all less than 10%. Thus, it can be concluded that the RCT has excellent intrasession and intersession reliability.
In conclusion, the RCT has excellent relative and absolute intrasession reliability and a good discriminant ability to detect differences in upper-limb power performance between 2 groups of soldiers of different operational capacity levels. Cut-off values of 20.14 seconds, 185.64 W, and 2.43 W·kg−1 for ET, APO, and RPO, respectively, discriminated elite Commandos from less-trained Intervention-Brigade soldiers. Although these scores were reliable and discriminant in the current study population, the cut-off points may differ in other populations, and this should be examined in future studies.
The RCT is a fitness-specific field test designed to evaluate upper-limb power in Commando soldiers. The results showed that this test has good absolute and relative reliability and successfully discriminates between soldiers of different operational levels. Considering that (a) reliability and (b) discriminant ability are 2 important attributes of a test, the RCT can be recommended for similar professionals, such as military, police, fire fighters, and emergency medical personnel.
The authors are grateful to all the participants for their enthusiasm and commitment to the completion of this study. They also wish to express their sincere gratitude to Colonel Imed Mekki, director of the Tunisian Commandos' School, for advice and cooperation in this study.
1. Aaronson N, Alonso J, Burnam A, Lohr KN, Patrick DL, Perrin E, Stein RE. Assessing health status and quality-of-life instruments: Attributes and review criteria. Qual Life Res 11: 193–205, 2002.
2. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med 26: 217–238, 1998.
3. Bishop PA, Crowder TA, Fielitz LR, Lindsay TR, Woods AK. Impact of body weight on performance of a weight-supported motor fitness test in men. Mil Med 173: 1108–1114, 2008.
4. Carlson MJ, Jaenen SP. The development of a preselection physical fitness training program for Canadian Special Operations Regiment applicants. J Strength Cond Res 26(Suppl 2): S2–S14, 2012.
5. Chaabene H, Hachana Y, Franchini E, Mkaouer B, Montassar M, Chamari K. Reliability and construct validity of the karate-specific aerobic test. J Strength Cond Res 26: 3454–3460, 2012.
6. Clemons JM, Campbell B, Jeansonne C. Validity and reliability of a new test of upper body power. J Strength Cond Res 24: 1559–1565, 2010.
7. Cuddy JS, Slivka DR, Hailes WS, Ruby BC. Factors of trainability and predictability associated with military physical fitness test success. J Strength Cond Res 25: 3486–3494, 2011.
8. Davis KL, Kang M, Boswell BB, DuBose KD, Altman SR, Binkley HM. Validity and reliability of the medicine ball throw for kindergarten children. J Strength Cond Res 22: 1958–1963, 2008.
9. de Vet HC, Bouter LM, Bezemer PD, Beurskens AJ. Reproducibility and responsiveness of evaluative outcome measures. Theoretical considerations illustrated by an empirical example. Int J Technol Assess Health Care 17: 479–487, 2001.
10. Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: An analogy to diagnostic test performance. J Chronic Dis 39: 897–906, 1986.
11. Dhahbi W, Chaouachi A, Padulo J, Behm DG, Chamari K. Five Meters rope-climbing test: Commando-specific power test of the upper-limbs. Int J Sports Physiol Perform 10: 509–515, 2015.
12. Drust B, Waterhouse J, Atkinson G, Edwards B, Reilly T. Circadian rhythms in sports performance—An update. Chronobiol Int 22: 21–44, 2005.
13. Gonzalo-Skok O, Tous-Fajardo J, Arjol-Serrano JL, Mendez-Villanueva A. Determinants, reliability, and usefulness of a bench press repeated power ability test in young basketball players. J Strength Cond Res 28: 126–133, 2014.
14. Haddad M, Chaouachi A, Castagna C, Hue O, Wong DP, Tabben M, Behm DG, Chamari K. Validity and psychometric evaluation of the French version of RPE scale in young fit males when monitoring training loads. Sci Sports 28: e29–e35, 2013.
15. Harman EA, Gutekunst DJ, Frykman PN, Sharp MA, Nindl BC, Alemany JA, Mello RP. Prediction of simulated battlefield physical performance from field-expedient tests. Mil Med 173: 36–41, 2008.
16. Impellizzeri FM, Marcora SM. Test validation in sport physiology: Lessons learned from clinimetrics. Int J Sports Physiol Perform 4: 269–277, 2009.
17. Koutedakis Y, Sharp NC. A modified Wingate test for measuring anaerobic work of the upper body in junior rowers. Br J Sports Med 20: 153–156, 1986.
18. Laffaye G, Collin JM, Levernier G, Padulo J. Upper-limb power test in rock-climbing. Int J Sports Med 35: 670–675, 2014.
19. Liow DK, Hopkins WG. Velocity specificity of weight training for kayak sprint performance. Med Sci Sports Exerc 35: 1232–1237, 2003.
20. Looney MA. When is the intraclass correlation coefficient misleading? Meas Phys Educ Exerc Sci 4: 73–78, 2000.
21. Mannion AF, Elfering A, Staerkle R, Junge A, Grob D, Semmer NK, Jacobshagen N, Dvorak J, Boos N. Outcome assessment in low back pain: How low can you go? Eur Spine J 14: 1014–1026, 2005.
22. McGinley JL, Baker R, Wolfe R, Morris ME. The reliability of three-dimensional kinematic gait measurements: A systematic review. Gait Posture 29: 360–369, 2009.
23. Negrete RJ, Hanney WJ, Kolber MJ, Davies GJ, Ansley MK, McBride AB, Overstreet AL. Reliability, minimal detectable change, and normative values for tests of upper extremity function and power. J Strength Cond Res 24: 3318–3325, 2010.
24. Sharp MA, Knapik JJ, Walker LA, Burrell L, Frykman PN, Darakjy SS, Lester ME, Marin RE. Physical fitness and body composition after a 9-month deployment to Afghanistan. Med Sci Sports Exerc 40: 1687–1692, 2008.
25. Sporis G, Harasin D, Bok D, Matika D, Vuleta D. Effects of a training program for special operations battalion on soldiers' fitness characteristics. J Strength Cond Res 26: 2872–2882, 2012.
26. Stock MS, Beck TW, DeFreitas JM, Dillon MA. Test-retest reliability of barbell velocity during the free-weight bench-press exercise. J Strength Cond Res 25: 171–177, 2011.
27. Stockbrugger BA, Haennel RG. Validity and reliability of a medicine ball explosive power test. J Strength Cond Res 15: 431–438, 2001.
28. Vossen JF, Kramer JE, Burke DG, Vossen DP. Comparison of dynamic push-up training and plyometric push-up training on upper-body power and strength. J Strength Cond Res 14: 248–253, 2000.
29. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res 19: 231–240, 2005.
Copyright © 2016 by the National Strength & Conditioning Association.
Keywords: military; field testing; intrasession error; discriminant ability