Between-Session Reliability of Four Hop Tests and the Agility T-Test : The Journal of Strength & Conditioning Research

Original Research

Between-Session Reliability of Four Hop Tests and the Agility T-Test

Munro, Allan G; Herrington, Lee C

Journal of Strength and Conditioning Research 25(5):p 1470-1477, May 2011. | DOI: 10.1519/JSC.0b013e3181d83335



Outcome measurement is an important tool in sports exercise science and medicine. It can be used to assess, evaluate, and justify training methods, treatment, and rehabilitation interventions through the identification of an athlete's ability to cope with the physical demands placed upon them (6). During rehabilitation, the use of outcome measurement allows practitioners to evaluate an athlete's progress to minimize the risk of reinjury on return to training and competition.

Outcome measures commonly used to assess knee joint function and subsequently inform when an athlete is ready to return to participation have included clinical measures such as knee joint laxity, range of motion, thigh circumference, and quadriceps strength (14,16). Recently, however, the relationship between such clinical measures and readiness for return to sport has been refuted (3,9,14). Barber et al. (3) realized that for functional limitations of the knee joint to be evaluated, testing that provided an objective measurement while simulating sporting activity was required. A number of tests that mimic sporting performance have been devised and investigated in recent years; these have been termed functional performance tests (FPTs). Functional performance tests, such as the hop tests and Agility T-test (3,6,24), are closed chain in nature and therefore simulate the joint loading forces and kinematics that occur functionally (14).

A number of studies have shown that the hop tests can detect differences between limbs in injured subjects (3,10,17,20,22), and therefore, they are most commonly used with injured subjects to determine patient function. Hop tests can also be used in healthy populations to evaluate limb symmetry and predict muscular strength and power (11). Limb symmetry index (LSI) is the most commonly used method to assess this by giving a percentage value of 1 limb vs. the other (3,12,20). An LSI of ≥85% indicates that ‘normal’ limb symmetry exists and, with regard to injured populations, that function of the injured limb is being restored (3). The reliability of hop tests in both injured and uninjured subjects has been investigated and shown to be high (1,2,4,5,12,18,22,23); however, methodologies employed have varied throughout. Firstly, only 2 of the studies to date (1,23) gave information on participants' activity levels; this is important because findings from an athletic population cannot be applied to a sedentary population and vice versa. Furthermore, studies have often used an unequal mix of men and women (1,4,5,12,22), despite the fact that one study has shown significant differences in hop scores between genders (3), which may skew subsequent data analysis and reliability scores. Interestingly, authors have also reported that learning effects were present in some studies (1,4,5,12,22), which may make the reliability values of these studies invalid. Despite reports of learning effects, only 1 study has adequately examined differences between trials (4); it found that 3 practice trials were adequate for the triple, crossover, and timed hops, whereas 4 trials may be needed for the single hop. The authors of this study concluded that further investigation of learning effects associated with the hop tests was required.

The Agility T-test has been shown to be sensitive to changes in training patterns and differences in athlete skill levels (8,15). With this in mind, the Agility T-test may be a useful tool to assess athlete function and changes in performance during training and rehabilitation programs. Pauole et al. (19) found that within-day reliability for the Agility T-test was excellent and indicated that only 1 trial was needed to achieve a true score. However, participants in this study were given practice trials until they felt comfortable with the test, so it is unclear how many would be needed before performance stability is reached. Furthermore, the studies mentioned (8,15,19) have all employed differing numbers of practice and measured trials but have not presented data on the changes between these trials.

Although the reliability of the hop and Agility T-tests has been investigated previously, learning effects and reliability have not been adequately assessed. Furthermore, no study to date has taken into account the differences between genders reported previously (3) and clearly delineated between the 2 groups. Therefore, the aims of this study were firstly to investigate the learning effects associated with the 4 hop tests and Agility T-test. Secondly, once learning effects were established, a standardized protocol could be determined and the reliability of this protocol investigated to ascertain measurement error values that enable practitioners to evaluate changes in an individual's performance.


Experimental Approach to the Problem

The purpose of this study was threefold: (a) to establish whether gender differences are apparent for each test; (b) to assess learning effects using a single-group repeated-measures design; and (c) to establish a standardized protocol and assess the reliability and associated measurement error of the protocol for the single hop for distance, triple hop for distance, crossover hop for distance, 6-m timed hop, and the Agility T-test.

A sample of recreational athletes was used to determine the learning effects and reliability of the tests. Recreational athletes were used so that the results would be applicable to the active populations encountered by most practitioners. Gender differences were assessed using a t-test. To assess learning effects, all trials of each test were measured and analyzed through a repeated-measures 1-way analysis of variance (ANOVA). Reliability and measurement error were analyzed by repeating the testing over 3 sessions, each separated by 1 week, and analyzing scores using intraclass correlation coefficients, SEMs, smallest detectable differences, and 95% confidence intervals.


Twenty-two participants (11 women: age 22.3 ± 3.7 years, height 167.7 ± 6.2 cm, weight 59.2 ± 6.9 kg and 11 men: age 22.8 ± 3.1 years, height 179.8 ± 4 cm, weight 79.6 ± 10 kg) all of whom were university students volunteered for the study. Subjects were required to confirm that they had been free from lower extremity injury, defined as any complaint that stopped the participant from undertaking their normal exercise routine, for at least 6 months before testing, and have no history of lower extremity surgery. To qualify as recreationally active, subjects were required to participate in a minimum of 30 minutes of physical activity 3 times a week on a regular basis over the past 6 months, which included recreational and competitive sports. All participants gave written informed consent to participate, and the research was approved by the University of Salford Research and Ethics Committee.


Participants were tested at the same time of day on 3 separate occasions, separated by 1 week. All participants were asked not to participate in strenuous exercise in the 24 hours before testing and not to eat in the hour prior to testing. Participants were also asked to wear the same training shoes on each occasion so as to negate the effect of different designs of shoe and the support they provide on individual performance. Dominant legs were noted as the leg with which the subject would preferentially kick a ball. Each participant's leg lengths were measured on the first test occasion, from the anterior superior iliac spine to the distal tip of the medial malleolus, using a standard tape measure while subjects lay supine. Leg length was used to normalize excursion distances by dividing the distance reached by leg length and multiplying by 100, with the result presented as a percentage value. Limb symmetry index was calculated by dividing the dominant limb score by the nondominant limb score and multiplying by 100, and was also presented as a percentage value.
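The two calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' own analysis script: the function names are hypothetical, and the example distances and leg length are made-up values rather than data from the study.

```python
def normalize_to_leg_length(distance_cm: float, leg_length_cm: float) -> float:
    """Express a hop distance as a percentage of leg length."""
    if leg_length_cm <= 0:
        raise ValueError("leg length must be positive")
    return distance_cm / leg_length_cm * 100.0


def limb_symmetry_index(dominant: float, nondominant: float) -> float:
    """LSI as described in the text: dominant / nondominant * 100."""
    if nondominant <= 0:
        raise ValueError("nondominant score must be positive")
    return dominant / nondominant * 100.0


# Hypothetical values for illustration only (not data from the study).
dominant_pct = normalize_to_leg_length(171.0, 90.0)     # about 190 % of leg length
nondominant_pct = normalize_to_leg_length(180.0, 90.0)  # about 200 % of leg length
lsi = limb_symmetry_index(dominant_pct, nondominant_pct)  # about 95 %
```

Because both limbs are normalized by the same leg length in this sketch, the LSI is the same whether it is computed from raw or normalized distances.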

Hop Tests

The single hop for distance, triple hop for distance, 6-m timed hop and crossover hop for distance tests were originally described by Noyes et al. (17). A 6-m long, 15-cm-wide line was marked on the floor, along the middle of which was a standard tape measure, perpendicular to the starting line. To record time for the 6-m timed hop 2 sets of electronic timing gates (Fitness Technology Inc., Australia) were placed on tripods at a height of 0.75 m, 3 m apart, at the start and finish line of the 6-m course. The setup for each hop test is shown in Figure 1.

Figure 1:
Hop test setup.

Subjects performed 6 trials of each hop test, with all trials being measured. Both limbs were tested, and no restrictions were given to subjects regarding the use of arm movement. A rest period of 30 seconds was given between trials and 2 minutes between each of the 4 hop tests (22). Each hop test began with the great toe of the testing leg on the marked start line and the distance hopped was measured to the rear of the foot upon final landing. Subjects were required to maintain the final landing in the single, triple and crossover hop tests for a minimum of 2 seconds. Unsuccessful hops were classified as a loss of balance, an extra hop on landing or touching down of either the contralateral lower extremity or the upper extremity (18).

For the single hop, subjects were required to hop forwards as far as possible along the line of the tape measure and land on the same limb. The triple hop involved participants performing 3 consecutive maximal hops along the line of the tape measure, whereas in the crossover hop subjects maximally hopped forward 3 times, alternately crossing the 15-cm-wide line. Distance was measured from the start line to the rear of the foot upon final landing. In the 6-m timed hop, participants hopped forward as quickly as possible from the start line through the timing gates at the end of the 6-m course. Time was measured from when the subject passed through the first timing gate and stopped when they passed through the second.

Agility T-Test

The Agility T-test was administered as originally set out by Semenick (24). Four cones were arranged in a T shape, with a cone placed 9.14 m from the starting cone and 2 further cones placed 4.57 m on either side of the second cone. All times were recorded using an electronic timing gate (Fitness Technology Inc.), set at a height of 0.75 m and 3 m wide, in line with the marked starting point. The test setup is shown in Figure 2.

Figure 2:
T-test setup.

Subjects were asked to sprint forwards 9.14 m from the start line to the first cone and touch the tip with their right hand, shuffle 4.57 m left to the second cone and touch with their left hand, then shuffle 9.14 m to the right to the third cone and touch with their right, shuffle 4.57 m back left to the middle cone and touch with their left hand before finally back pedaling to the start line. Time began upon subjects passing through the timing gates and stopped upon them passing through on return.

Trials were deemed unsuccessful if participants failed to touch a designated cone, crossed their legs while shuffling, or failed to face forwards at all times. Subjects performed 4 trials, all of which were measured so that learning effects could be evaluated. One minute's recovery was given between trials.

Statistical Analyses

All statistical analyses were conducted using SPSS for Windows version 16.0 (SPSS Inc., Chicago, IL, USA). Independent t-tests were carried out to assess differences between men and women. Separate 1-way repeated-measures ANOVAs were then carried out on week 1 scores to assess learning effects, with Bonferroni correction applied in instances where significant differences were found. Alpha levels were set at 0.05 for all tests. Effect sizes were determined using the Cohen δ method (25), which defines 0.2, 0.5, and 0.8 as small, medium, and large respectively. Intraclass correlation coefficients (ICCs) (3,1) (21) assessed between-session reliability, from which 95% confidence intervals (CIs), SEM, and smallest detectable difference (SDD) were calculated to establish random error scores. Intraclass correlation coefficient values were interpreted according to the following criteria (7): Poor = <0.40; Fair = 0.40-0.70; Good = 0.70-0.90; and Excellent = >0.90. SEM was calculated using the formula:

$$\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - \mathrm{ICC}}$$

(25), whereas SDD was calculated from the formula:

$$\mathrm{SDD} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$$
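As a minimal sketch of the error statistics named above, the Python below assumes the commonly used forms SEM = SD × √(1 − ICC) and SDD = 1.96 × √2 × SEM. The SDD form is consistent with the values quoted later in the paper for the women's single hop (SEM 7.93, SDD 21.98); the SEM form is an assumption based on standard usage. All function names are hypothetical, and the handling of the 0.70 boundary in the ICC categories is a choice made here because the quoted ranges overlap at that value.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1 in this sketch")
    return sd * math.sqrt(1.0 - icc)


def sdd_from_sem(sem: float) -> float:
    """Smallest detectable difference, assuming SDD = 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def interpret_icc(icc: float) -> str:
    """Categories quoted in the text (7); 0.70 is placed in 'Good' by assumption."""
    if icc < 0.40:
        return "Poor"
    if icc < 0.70:
        return "Fair"
    if icc <= 0.90:
        return "Good"
    return "Excellent"


# Check against the values reported for the women's single hop.
sdd_single_hop_women = round(sdd_from_sem(7.93), 2)  # 21.98
```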



Firstly, the results showed that men performed significantly better than women in all tests (p < 0.05); therefore, genders were separated for all further analysis. Effect sizes were high for all tests, ranging from 1.08 to 2.99, except the timed hop, for which the effect size was 0.47. Statistical power was therefore low for the timed hop (0.28) and high for all other tests (0.79-1) (25).

Learning Effects

The results showed that learning effects were present in all tests in both men and women, where scores improved across trials. Table 1 shows the means and SDs for all tests and indicates where significant differences between trials were found. For the single and triple hop for distance tests, scores stabilized after 3 trials in all subjects, whereas crossover hop scores stabilized after 4 trials for all subjects. The timed hop stabilized after 4 trials in women and 3 in men. Only 1 trial was needed before scores stabilized for all subjects in the Agility T-test.

Table 1:
Week 1 mean ± SD values for all trials of the 4 hop tests and Agility T-test for male and female subjects (% of leg length * 100, except for timed hop and Agility T-test).

Between-Session Reliability

After establishing how many trials were needed for the scores to stabilize, subsequent trials were used for reliability analysis. Therefore, trials 4-6 for the single and triple hop for all subjects and timed hop in men, trials 5-6 for the crossover hop in all subjects and timed hop for women and trials 2-4 for all subjects in the Agility T-test were used to calculate ICC, 95% CI, SEM, and SDD values. These values are presented in Table 2.
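The trial selection described above can be expressed as a simple lookup. The following is a minimal sketch: the practice-trial counts are those reported in the text, but the dictionary keys, the function name, and the example scores are hypothetical.

```python
# Number of practice trials before scores stabilized, as reported in the text.
PRACTICE_TRIALS = {
    ("single_hop", "men"): 3, ("single_hop", "women"): 3,
    ("triple_hop", "men"): 3, ("triple_hop", "women"): 3,
    ("crossover_hop", "men"): 4, ("crossover_hop", "women"): 4,
    ("timed_hop", "men"): 3, ("timed_hop", "women"): 4,
    ("t_test", "men"): 1, ("t_test", "women"): 1,
}


def measured_trials(test: str, sex: str, scores: list[float]) -> list[float]:
    """Drop the practice trials and keep the remaining scores for analysis."""
    n_practice = PRACTICE_TRIALS[(test, sex)]
    if len(scores) <= n_practice:
        raise ValueError("not enough trials recorded after practice")
    return scores[n_practice:]


# Hypothetical scores: six crossover hop trials leave trials 5-6 for analysis.
kept = measured_trials("crossover_hop", "women", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```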

Table 2:
Mean, SD, 95% CIs, SEM, SDD, and ICC values for the 4 hop tests and T-test (after practice trials).*

Limb Symmetry Index

The mean and SD LSI scores for both men and women are shown in Table 3 and ranged from 98.38 to 101.61%. Table 4 also shows that all subjects achieved an LSI score of at least 90% in all hop tests, whereas 40% of subjects achieved at least 95% LSI on all hop tests.

Table 3:
Limb symmetry index mean and SD values for all subjects.

Table 4:
Results of limb symmetry index values for each hop test and all 4 combined.*


The use of FPTs has become increasingly popular as a mode of assessment during rehabilitation and training programs. However, it is important that these tests are reliable and that the results of the tests can be interpreted appropriately. Therefore, information regarding whether practice trials are needed because of learning effects and the development of a reliable, standardized protocol that takes this into account is highly important for practitioners.

The results of the current study indicate that learning effects are present in the administration of the hop tests and Agility T-test. Bolgla and Keskula (4) previously described learning effects being present during hop test administration, where they indicated that 3 practice trials should be included for all hop tests, but may not be adequate. In the current study, we found that 3 practice trials were enough in the single and triple hop tests, whereas 4 trials were needed during the crossover hop, probably because of the increased complexity of the task at hand. Learning effects were different between genders for the timed hop, with men needing less familiarization than women did. Learning effects in the Agility T-test have not previously been investigated, with studies simply stating that participants were allowed to familiarize themselves with the test or that no practice trials were given at all (8,19). We found that only 1 practice trial was needed in both genders for familiarization purposes.

For the results of these tests to be reliable when used with subjects, it is important for the correct number of practice trials to be included to allow subjects the chance to familiarize. In turn, this will give more consistent and reliable results that reflect an individual's performance.

Reliability is an important aspect of performance testing; if a test is not reliable, we are unable to gain anything from the results it produces. All hop tests in the current study, except the timed hop for men, had good or excellent test-retest reliability scores (7). Previous studies have reported ICC values of between 0.66 and 0.99 (1,2,4,5,18,22,23), which reflect the findings of the current study. Interestingly, the lowest score of 0.66 was for the timed hop (4), which mirrors our finding for the male timed hop. When the values for the timed hop are removed, ICC scores, including those from the current study, range from 0.80 to 0.99, which indicates that the hop for distance tests are reliable (8). The low reliability scores for the timed hop are reflected in the small effect size and power values this test produced; this calls into question whether this particular test should be included in injury and rehabilitation screening.

Only 2 studies that we are aware of have calculated SEM values for raw (nonnormalized) hop test scores (4,5). Booher et al. (5) only looked at the single and timed hop tests, whereas Bolgla and Keskula (4) conducted all 4 tests. In each of these studies, the number of men and women was unequal, and participant activity levels were not disclosed making direct comparison difficult. A comparison of SEM values for raw scores can be seen in Table 5. SEM values for the single hop in both of these studies were lower than in the current study, whereas timed hop values were higher. Values compared well for men in the triple hop and in the crossover hop for women. Differences in SEM values were more than likely caused by the differences between SD scores across the studies, where SDs were much higher in the 2 comparison studies (4,5) than the current one. The SDs have a marked effect on the SEM values produced; this suggests that the SEM values for the current study are likely to be more accurate because SD scores were much lower. The SD values in the comparison studies were probably higher because of a higher range of scores as a result of pooling men and women despite the significant differences in performance, which has been shown previously (3) and in the current study.

Table 5:
Comparison of SEM scores between studies.

The mean raw scores for the 4 hop tests in the current study, shown in Table 6, also compare well with those found on healthy subjects in previous data (1,4,5), although direct comparison is again difficult. The results of the current study compare favorably to those conducted on patients with previous anterior cruciate ligament (ACL) injury (3,10,12,22). In these cases, we compared our results to those of the uninjured limb, although once again direct comparison can only be made with 1 of these studies. The higher scores found in the recreational athletes in the current study compared to those of the uninjured limb of individuals in previous studies may suggest that these particular individuals possess functional deficiencies, which caused them to be at a greater risk of ACL injury. However, the decreased performance could also be a bilateral deficit that is a result of the injury itself.

Table 6:
Mean and SD raw scores for all hop tests for men and women.

The 4 hop tests have been shown to detect differences between injured and uninjured limbs (3,10,17,20,22). Measures of symmetry and statistical differences between limbs have been used previously to demonstrate the differences between limbs. Limb symmetry index is a measure commonly used to assess these differences by giving a percentage value of 1 limb vs. the other. In the case of a patient going through rehabilitation, this would compare the injured against the uninjured limb, whereas in healthy subjects, it may compare the dominant and nondominant limbs. An LSI of ≥85% indicates that ‘normal’ limb symmetry exists and that function of the injured limb is being restored (3). Further to the previous idea that lower mean hop scores in the uninjured limb of ACL patients may show a functional deficiency which predisposed the individuals to injury, it may also be possible to screen healthy individuals for limb symmetry to see whether there is any relationship with future injury occurrence. Mean scores for LSI for subjects of the current study were between 98 and 102%. Upon further analysis, all subjects had an overall LSI value of >85% (±15%) in all hop tests. Perhaps of more importance is that we also found all subjects to have an LSI value of >90% (±10%) for all hop tests, whereas only 1 subject had an LSI of <95% (±5%) in all 4 tests. Furthermore, 40% of subjects had an LSI of ≥95% in all tests, and at least 64% of subjects had an LSI of ≥95% for one hop test. These findings suggest that despite previous recommendations that LSI scores of ≥85% indicate that ‘normal’ limb symmetry exists (3), this value should in fact be increased to 90%.

The Agility T-test is commonly used to assess the ability of team sport athletes to change direction, including acceleration, deceleration, and lateral movement during preseason testing protocols. The times achieved by the participants of the current study compare well with those of subjects in other studies (8,15,19), one of which included recreational athletes (19). To our knowledge, no previous study has investigated the learning effects or the between-session reliability of the Agility T-test. There was evidence of a learning effect taking place in administration of the Agility T-test; therefore, we recommend a standardized protocol that includes the use of 1 practice trial followed by 3 measured trials. The ICC, SEM, and SDD values presented in this paper will give both coaches and practitioners reference data from which they can gain greater information regarding an individual's performance.

During the administration of the triple hop for distance, we found that some male subjects were consistently achieving scores in excess of 6 m; therefore, we would recommend extending the course to 7 m in future.

Practical Applications

The most important finding of the current study is the fact that all subjects achieved an LSI score of at least 90%, despite previous suggestions that 85% LSI is adequate (3). Therefore, we recommend practitioners adopt 90% LSI as a measure of adequate symmetry between limbs during rehabilitation and conditioning. Hop tests are frequently used during rehabilitation from injury; the idea of using the hop tests for the prediction of future risk of injury warrants further investigation but was beyond the scope of this study.

Assessment of learning effects during administration of these particular tests suggests that practice trials should be allowed for a more reliable outcome to be achieved. We suggest that 3 practice trials should be allowed for the single and triple hop for distance and 4 trials for the crossover hop, whereas the timed hop requires 3 practice trials for men and 4 for women. One practice trial is adequate for the Agility T-test.

Good to excellent ICC scores allow practitioners to use the hop tests and agility T-test confidently with both male and female athletes to assess lower limb function during rehabilitation and conditioning. The SEM and SDD values presented give practitioners measures that allow them to make more informed decisions about changes in an individual's hop and Agility T-test performance. The SEM values show the range in which an individual's true score is likely to lie (25), whereas SDD values allow practitioners to decide whether a change in an individual's performance is significant (13). This means that a woman's true score for the single-hop test would lie within 7.93 (as a % of leg length) of the observed score, whereas a true improvement in performance could only be considered if their score improved by 21.98 (as a % of leg length).
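The interpretation described above can be sketched as follows. This is an illustrative example only: the SEM and SDD constants are the women's single hop values quoted in the text, while the function names and the observed scores are hypothetical. Treating a change of exactly the SDD as detectable is an assumption of this sketch.

```python
def true_score_band(observed: float, sem: float) -> tuple[float, float]:
    """Range (observed - SEM, observed + SEM) in which the true score is likely to lie."""
    return observed - sem, observed + sem


def exceeds_sdd(baseline: float, follow_up: float, sdd: float) -> bool:
    """True when the absolute change is at least as large as the SDD."""
    return abs(follow_up - baseline) >= sdd


# SEM and SDD for the women's single hop (% of leg length), from the text.
SEM_SINGLE_HOP_WOMEN = 7.93
SDD_SINGLE_HOP_WOMEN = 21.98

# Hypothetical observed scores (% of leg length), for illustration only.
low, high = true_score_band(180.0, SEM_SINGLE_HOP_WOMEN)
small_gain = exceeds_sdd(180.0, 195.0, SDD_SINGLE_HOP_WOMEN)  # change of 15
large_gain = exceeds_sdd(180.0, 205.0, SDD_SINGLE_HOP_WOMEN)  # change of 25
```

In this sketch the 15-point change falls inside the measurement error and would not be read as a real improvement, whereas the 25-point change would.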


No financial support was received, and the authors had no conflicts of interest while undertaking this study.


1. Ageberg, E, Zätterström, R, and Moritz, U. Stabilometry and one-leg hop test have high test-retest reliability. Scand J Med Sci Sports 8: 198-202, 1998.
2. Bandy, WD, Rusche, KR, and Tekulve, FY. Reliability and limb symmetry for five unilateral functional tests of the lower extremities. Isokinet Exerc Sci 4: 108-111, 1994.
3. Barber, SD, Noyes, FR, Mangine, RE, McCloskey, JW, and Hartman, W. Quantitative assessment of functional limitations in normal and anterior cruciate ligament-deficient knees. Clin Orthop Relat Res 255: 204-214, 1990.
4. Bolgla, LA and Keskula, DR. Reliability of lower extremity functional performance tests. J Orthop Sports Phys Ther 26: 138-142, 1997.
5. Booher, LD, Hench, KM, Worrell, TW, and Stikeleather, J. Reliability of three single-leg hop tests. J Sport Rehab 2: 165-170, 1993.
6. Clark, NC. Functional performance testing following knee ligament injury. Phys Ther Sport 2: 91-105, 2001.
7. Coppieters, M, Stappaerts, K, Janssens, K, and Jull, G. Reliability of detecting ‘onset of pain’ and ‘submaximal pain’ during neural provocation testing of the upper quadrant. Physiother Res Int 7: 146-156, 2002.
8. Delextrat, A and Cohen, D. Physiological testing of basketball players: Toward a standard evaluation of anaerobic fitness. J Strength Cond Res 22: 1066-1072, 2008.
9. Eastlack, M, Axe, M, and Snyder-Mackler, L. Laxity, instability, and functional outcome after ACL injury: Copers versus non-copers. Med Sci Sports Exerc 31: 210-215, 1999.
10. Goh, S and Boyle, J. Self evaluation and functional testing two to four years post ACL reconstruction. Aus J Physiother 43: 255-262, 1997.
11. Hamilton, RT, Shultz, SJ, Schmitz, RJ, and Perrin, DH. Triple-hop distance as a valid predictor of lower limb strength and power. J Athl Train 43: 144-151, 2008.
12. Hopper, DM, Goh, SC, Wentworth, LA, Chan, DYK, Chau, JHW, Wootton, GJ, Strauss, GR, and Boyle, JJW. Test-retest reliability of knee rating scales and functional hop tests one year following anterior cruciate ligament reconstruction. Phys Ther Sport 3: 10-18, 2002.
13. Kropmans, TJB, Dijkstra, PU, Stegenga, B, Stewart, R, and de Bont, LGM. Smallest detectable difference in outcome variables related to painful restriction of the temporomandibular joint. J Dent Res 78: 784-789, 1999.
14. Lephart, S, Perrin, D, Fu, F, Gieck, J, McCue, F, and Irrgang, J. Relationship between selected physical characteristics and functional capacity in the anterior cruciate ligament insufficient athlete. J Orthop Sport Phys Therapy 16: 174-181, 1992.
15. Miller, MG, Herniman, JJ, Ricard, MD, Cheatham, CC, and Michael, TJ. The effects of a 6-week plyometric training program on agility. J Sports Sci Med 5: 459-465, 2006.
16. Neeb, T, Aufdemkampe, G, Wagener, J, and Mastenbroek, L. Assessing anterior cruciate ligament injuries: The association and differential value of questionnaires, clinical tests, and functional tests. J Orthop Sports Phys Ther 26: 324-331, 1997.
17. Noyes, FR, Barber, SD, and Mangine, RE. Abnormal lower limb symmetry determined by function hop tests after anterior cruciate ligament rupture. Am J Sports Med 19: 513-518, 1991.
18. Paterno, MV and Greenberger, HB. The test-retest reliability of a one legged hop for distance in young adults with and without ACL reconstruction. Isokinet Exerc Sci 6: 1-6, 1996.
19. Pauole, K, Madole, K, Garhammer, J, Lacourse, M, and Rozenek, R. Reliability and validity of the T-test as a measure of agility, leg power, and leg speed in college-aged men and women. J Strength Cond Res 14: 443-450, 2000.
20. Petschnig, R, Baron, R, and Albrecht, M. The relationship between isokinetic quadriceps strength test and hop tests for distance and one-legged vertical jump test following anterior cruciate ligament reconstruction. J Orthop Sports Phys Ther 28: 23-31, 1998.
21. Rankin, G and Stokes, M. Reliability of assessment tools in rehabilitation: An illustration of appropriate statistical analyses. Clin Rehab 12: 187-199, 1998.
22. Reid, A, Birmingham, TB, Stratford, PW, Alcock, GK, and Giffin, JR. Hop testing provides a reliable and valid outcome measure during rehabilitation after anterior cruciate ligament reconstruction. Phys Ther 87: 337-349, 2007.
23. Ross, MD, Langford, B, and Whelan, PJ. Test-retest reliability of 4 single-leg horizontal hop tests. J Strength Cond Res 16: 617-622, 2002.
24. Semenick, D. Tests and measurements: The T-test. Strength Cond J 12: 36-37, 1990.
25. Thomas, JR, Nelson, JK, and Silverman, SJ. Research Methods in Physical Activity. Champaign, IL: Human Kinetics, 2005.

functional performance tests; ACL; outcome measure

© 2011 National Strength and Conditioning Association