
APPLIED SCIENCES: Physical Fitness and Performance

Reliability of Time-to-Exhaustion versus Time-Trial Running Tests in Runners


Medicine & Science in Sports & Exercise: August 2007 - Volume 39 - Issue 8 - p 1374-1379
doi: 10.1249/mss.0b013e31806010f5


Both time-to-exhaustion and time-trial exercise test protocols are commonly used to examine the influence of experimental interventions on endurance performance. An exercise test performed at a constant speed or power output is often termed a time-to-exhaustion test. Traditionally for this test, subjects are requested to perform to exhaustion at specific submaximal exercise intensities until they can no longer maintain the required work rate (i.e., speed or power output) (4,20). Conversely, a time-trial test is defined as an endurance performance test with a known endpoint. In this test format subjects are required to either complete a set distance in as fast a time as possible, or complete as much work as they can within a given time period. Moreover, subjects are usually made aware of the trial distance or duration in a time-trial so that they can adjust their work output in order to pace themselves towards this known endpoint (1).

Central to the administration of a meaningful physiological performance test is the concept of reproducibility or reliability, which indicates how well a physical performance test is able to give the same result repeatedly (3,12,14). The relative reliability of time-to-exhaustion and time-trial tests has received increasing attention. Jeukendrup and colleagues (15) were the first to provide evidence that the variability of time-to-exhaustion tests was greater than that of time-trial tests, although they drew this conclusion from different groups of subjects performing varying types of exercise protocols (15). Since then, a meta-analysis by Hopkins et al. (14) has also shown time-to-exhaustion tests to be more variable than time-trial tests. These authors argue, however, that the apparent poor reliability of time-to-exhaustion tests is an artifact of the relationship between power output and exercise duration (14). Using this relationship, Hinckson and Hopkins (10) have recently shown how log-log modeling can be used to predict time-trial performance from a series of time-to-exhaustion tests. The authors did not, however, validate their prediction equations against actual time-trials, and their interpretation of the data has been debated (2,10,11,13,16). To date, no study has directly compared the reliability of time-to-exhaustion versus time-trial exercise tests of similar intensity and duration in the same subjects, and the log-log time-trial prediction equations of Hinckson and Hopkins (10) require validation.

The main purpose of the present study was to examine the reliability of comparable time-to-exhaustion and time-trial treadmill running tests in trained male distance runners. Because there is some debate in the literature that reliability may be altered with differences in exercise intensity and duration (2,14), we chose to examine reliability using a high (1500-m time-trial pace) and moderately high (5-km time-trial pace) exercise intensity (and distance/duration). Comparison of the reliability of test protocols over these different exercise distances also allowed us to validate the log-log time-trial prediction equations of Hinckson and Hopkins (10) with actual time-trial performances.


METHODS

Subjects.

Eight endurance-trained male distance runners (age: 31 ± 6 yr; weight: 70.4 ± 6.9 kg; V̇O2max: 61 ± 8 mL·kg−1·min−1) from the local university community and local running clubs volunteered for this study. Subjects were required to have a history of completing at least one 5-km running race in less than 20 min. All study procedures were approved by the university's central human research ethics committee, and subjects gave their written informed consent to participate before commencement of the study.

Procedures and protocol.

Before each trial, subjects consumed the same breakfast and reported to the laboratory (22°C, 50% relative humidity) 2 h postprandially, having consumed one glass (~250 mL) of water 30 min before arrival. Subjects were instructed to consume 6 g of carbohydrate per kilogram of body weight on the day before each test, and to avoid strenuous exercise in the 24-h period before all tests. All tests were separated by 2-5 d and conducted at the same time of day (± 1 h). For each trial, subjects wore the same exercise clothing. Tests were completed on a motorized running treadmill (Trackmaster TM500E, JAS Manufacturing; Carrollton, TX), where a 35-cm fan was placed 1 m in front of the subject and provided a wind speed of 3.6 m·s−1. A standardized warm-up was performed before each test, whereby subjects were required to run on the treadmill at 10 km·h−1 for a period of 5 min and then at 14 km·h−1 for 3 min. The subject was then allowed 5 min to either stretch or to continue warming up (jogging or walking) before the test started.

To reduce the impact of learning effects on test results (17), subjects first completed familiarization trials for both the 1500-m and 5-km distances. Subjects were given the option of completing both familiarization trials on the same day, separated by 30 min of rest, or on separate days. For all trials, the gradient was set at 0%, and visual feedback provided to the subjects was limited to running speed and distance traveled only. Throughout the trials, subjects were not provided with any reference to time. Further, no routine measurements were made, and no music, fluids, or food were permitted during any of the trials.

Performance tests.

In total, subjects completed two 5-km and two 1500-m time-trials, as well as four time-to-exhaustion trials: two at the equivalent mean 5-km running speed and two at the equivalent mean 1500-m running speed. Tests were conducted in a semicounterbalanced, randomized order. The study design was semicounterbalanced because the initial 5-km and 1500-m time-trials needed to be conducted first to determine the running speeds for the time-to-exhaustion tests. Throughout the time-trials, subjects were permitted to adjust their speed as required and were asked to complete the distance in as fast a time as possible. During the time-to-exhaustion tests, subjects were instructed to run for as long as possible at their individually prescribed running speeds. The speed chosen for each respective time-to-exhaustion test was the average speed achieved during the subject's first 5-km or 1500-m time-trial.

Data analysis.

Differences in performance times and physiological variables for each specific test were examined first using a Student's t-test for paired samples. The mean coefficient of variation (standard deviation/mean × 100%) was calculated for the two performance times of each specific test. Statistical analysis was conducted on SPSS version 13.0 for Windows and significance was accepted at an alpha level of 0.05.
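The within-subject coefficient of variation described above can be sketched in a few lines. The subject times below are hypothetical, and the snippet illustrates the calculation only; it is not the authors' analysis code.

```python
# Illustration (not the study's analysis code): CV = SD / mean x 100%
# for each subject's pair of repeated performance times, then averaged
# across subjects. Subject times are hypothetical.
import statistics

def pair_cv(time1, time2):
    """Coefficient of variation (%) of one subject's two test times."""
    mean = (time1 + time2) / 2
    sd = statistics.stdev([time1, time2])  # sample SD of the pair
    return sd / mean * 100

def mean_cv(pairs):
    """Mean CV across subjects; each pair = (trial 1, trial 2) in s."""
    return statistics.mean(pair_cv(t1, t2) for t1, t2 in pairs)

# Hypothetical 5-km time-trial times (s) for three subjects:
trials = [(1180, 1165), (1210, 1198), (1150, 1172)]
print(round(mean_cv(trials), 2))
```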

A spreadsheet for analyzing reliability was used to further examine the reliability of consecutive time-to-exhaustion and time-trial tests at high and moderately high running speeds. This spreadsheet determined the mean difference (change in mean), intraclass correlation coefficient, limits of agreement, and the typical error of the estimate as a coefficient of variation between consecutive trials.
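A minimal sketch of the reliability statistics named above (change in mean, limits of agreement, and typical error expressed as a CV) follows. It is not the spreadsheet used in the study: the ICC is omitted, and the 1.96 multiplier and raw-scale CV are simplifying assumptions (the spreadsheet works with log-transformed data and exact t-values).

```python
# Sketch of reliability statistics for two consecutive trials.
# Simplifications (1.96 multiplier, raw-scale CV) are assumptions,
# not the study's spreadsheet; the ICC is omitted.
import math
import statistics

def reliability(trial1, trial2):
    """Change in mean, 95% limits of agreement, typical error as CV (%)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    change_in_mean = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    loa = (change_in_mean - 1.96 * sd_diff,
           change_in_mean + 1.96 * sd_diff)
    typical_error = sd_diff / math.sqrt(2)      # TE = SDdiff / sqrt(2)
    grand_mean = statistics.mean(trial1 + trial2)
    te_cv = typical_error / grand_mean * 100    # TE as a CV (%)
    return change_in_mean, loa, te_cv

# Hypothetical 5-km times (s) for four subjects, trials 1 and 2:
t1 = [1180, 1210, 1150, 1235]
t2 = [1165, 1198, 1172, 1229]
change, loa, te_cv = reliability(t1, t2)
```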

Following the methods of Hinckson and Hopkins (10), we used the relationship between the logarithm of running speed and the logarithm of time from each subject's four time-to-exhaustion tests to determine the standard error of the estimate for the prediction of time-trial time for each subject. We averaged the standard error of the estimate across subjects and corrected it by a factor of 1 + 1/(4 × 2 df) (6). We then compared the log-log predicted time-trial times from these regression equations with the logarithm of the actual time-trial performance (N = 16 for both 5-km and 1500-m time-trials) and calculated the average standard error. We validated these methods (10) using the same spreadsheet-based measures of reliability previously described. Data are expressed as means and 95% confidence limits, except where indicated.
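The log-log prediction can be sketched as follows: fit log(speed) against log(time) over a subject's time-to-exhaustion tests (speed = c·time^k), then solve distance = speed × time for the predicted time-trial time. All numbers below are hypothetical, and this is a simplified reading of the method in (10), not the authors' spreadsheet.

```python
# Simplified log-log sketch (hypothetical data, not the study's code):
# fit log(speed) = log(c) + k*log(time) over time-to-exhaustion tests,
# then predict the time-trial time for a given distance.
import math

def fit_loglog(times_s, speeds_ms):
    """Least-squares fit of log(speed) on log(time); returns (c, k)."""
    xs = [math.log(t) for t in times_s]
    ys = [math.log(v) for v in speeds_ms]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    k = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - k * mx), k

def predict_tt_time(distance_m, c, k):
    """distance = c * t**k * t  =>  t = (distance / c) ** (1 / (1 + k))."""
    return (distance_m / c) ** (1 / (1 + k))

# Four hypothetical time-to-exhaustion results (duration s, speed m/s):
tte = [(300, 5.2), (420, 5.0), (1000, 4.5), (1300, 4.3)]
c, k = fit_loglog([t for t, _ in tte], [v for _, v in tte])
predicted_5km = predict_tt_time(5000, c, k)  # predicted 5-km time (s)
```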

RESULTS

There were no significant differences between the first and second tests of the 5-km time-trial, 5-km time-to-exhaustion, 1500-m time-trial, and 1500-m time-to-exhaustion tests (Table 1; all P > 0.05). Although the mean ± SD time of the two 5-km time-trials (1175 ± 101 s) tended to be longer than that of the two 5-km time-to-exhaustion times (1085 ± 175 s; P = 0.105), these times were not significantly different. Conversely, the mean ± SD of the two 1500-m time-trial times (314 ± 29 s) was significantly less than the mean ± SD of the two 1500-m time-to-exhaustion times (391 ± 58 s; P < 0.05). The mean ± SD coefficients of variation for the 5-km time-trial, 5-km time-to-exhaustion, 1500-m time-trial, and 1500-m time-to-exhaustion times were 1.7 ± 1.2, 11.2 ± 7.4, 2.6 ± 1.8, and 10.2 ± 10.1%, respectively. The mean ± SD difference between the second and first trials, intraclass correlation coefficient, limits of agreement, and typical error of the estimate (± 95% CL) between the various test protocols and distances are shown in Table 1 and Figure 1.

TABLE 1. Mean (± SD) performance times for the first and second 5-km and 1500-m time-trial (TT) and time-to-exhaustion (TTE) tests, as well as the mean difference between the second and first trials, the intraclass correlation coefficient (ICC), limits of agreement (LOA), and typical error of measurement (TEM) as a coefficient of variation (CV; 95% CL).
FIGURE 1—Time difference of the second trial from the first trial in relation to the first trial (trial 2 − 1 vs trial 1) for the 5-km time-trial (TT) (A), 5-km running speed time-to-exhaustion (TTE) (B), 1500-m TT (C), and 1500-m running speed TTE (D) tests.

Using the log-log modeling methods of Hinckson and Hopkins (10) to predict time-trial performance, we determined that the standard error of the estimate for predicted time-trial running speed was 0.67%. Comparison of our log-log time-trial predictions with actual time-trial performance, using the same measures of reliability previously described, revealed good agreement between predicted and actual 5-km and 1500-m time-trial performances. However, the prediction of actual 5-km time-trial time was more precise than the prediction of actual 1500-m time-trial time (Table 2, Fig. 2).

TABLE 2. Mean (± SD) performance times for the actual and log-log predicted 5-km and 1500-m time-trial tests, as well as the mean difference between the actual and predicted time-trial times, the intraclass correlation coefficient (ICC), limits of agreement (LOA), and typical error of measurement (TEM) as a coefficient of variation (CV; 95% CL).
FIGURE 2—Time difference of the log-log predicted time-trial times (10) minus actual time-trial times (predicted − actual vs actual) for the 5-km (A) and 1500-m (B) time-trials (TT).

DISCUSSION

The aims of the present study were to evaluate the reliability of comparable time-to-exhaustion versus time-trial running tests in trained runners, and to validate Hinckson and Hopkins' (10) log-log time-trial prediction methods using multiple time-to-exhaustion tests. The study has shown 1) a greater level of absolute reliability (less variability) of time-trial compared with time-to-exhaustion running tests, and 2) good reliability for Hinckson and Hopkins' (10) log-log modeling methods for predicting time-trial performance from time-to-exhaustion tests.

The first important finding of the present study was the lower level of absolute reliability shown for the completion time of constant-speed time-to-exhaustion running tests compared with variable, self-selected-speed running time-trials (Table 1). Although there was no significant difference between the first and second time-trial or time-to-exhaustion tests, the typical error of the estimate was consistently higher for the time-to-exhaustion than for the time-trial tests (Table 1, Fig. 1). The difference in variability between trial protocols was apparent despite our best efforts at creating tests of comparable intensity and using design controls such as familiarization tests and semirandomization of the test order. Although previous studies have provided evidence that time-trial performance is less variable than time-to-exhaustion performance (14,15), this is, to our knowledge, the first study to directly compare the same subjects under analogous test conditions. In light of these previous findings (14,15) and data from the present study (Table 1, Fig. 1), we can be confident in concluding that the absolute repeatability of time-trial tests is greater than that of time-to-exhaustion tests.

The lower variability found in the time-trial compared with time-to-exhaustion tests may be attributable to the fact that the consequences of fatigue, boredom, and lack of motivation may be more dramatic and influential on performance during a time-to-exhaustion protocol than during a time-trial test (2). During time-trial tests, athletes are able to increase or decrease their exercise intensity according to interactions between their perception of fatigue and external motivational cues (7). In time-to-exhaustion tests, however, exercise intensity is fixed; as an athlete experiences increases in their perception of fatigue, he or she is left only with the choice of continuing or stopping the trial completely. This increased variability in time-to-exhaustion protocols should be appreciated by exercise scientists when designing exercise tests suitable to the requirements of their research. This in no way implies that a time-to-exhaustion test is not useful. Indeed, such a test protocol may be advantageous compared with time-trial exercise tests when assessment of exercise capacity at a steady rate of exercise is sought. Time-trial tests permit fluctuation of exercise intensity and may therefore add "noise" to the measurement of physiological markers (10). Jeukendrup and Currell (16) argue, however, that "pacing strategy is an inherent component of real performance rather than something that should be omitted from a performance test." With respect to findings from the present study, however, the issue of pacing noise (11) actually may have been responsible for the inaccurate assessment of endurance ability within the 1500-m time-trial.
Although the 5-km time-trial time was not different from the time-to-exhaustion test performed at the same mean running speed, the runners were able to run 18% longer at their mean 1500-m running speed (P < 0.05), and prediction of 1500-m time-trial performance using log-log modeling was not as reliable as for the 5-km distance (Table 2, Fig. 2). This may have occurred because of a lack of experience at running the shorter distance, because most subjects were experienced long-distance runners who routinely participated in road races ranging from 5 km to the half-marathon.

Hopkins et al. (14) define reliability as the consistency or reproducibility of performance when someone performs a test repeatedly. Thus, reliability gives an indication of how well a physical performance test is able to give the same result repeatedly. By this basic definition, then, the reliability of the time-to-exhaustion test is less than that of a time-trial test. What has been discussed less, however, is whether the level of measurement error found is acceptable. Atkinson and Nevill (3) note that "reliability could be considered as the amount of measurement error that has been deemed acceptable for the effective practical use of a measurement tool." In this respect, time-to-exhaustion and time-trial tests might be considered different measurement tools, each having its own level of acceptable noise or variability. In fact, it has been argued that the signal-to-noise ratio may be greater in time-to-exhaustion than in time-trial tests (11,14). By noise, Hinckson and Hopkins (11) refer to the variability of the measurement, whereas by signal, they refer to the change that can be detected by the measurement. To illustrate, whereas the variance in a time-trial is generally low (Table 1, Fig. 1), the change in time-trial performance after a training intervention is usually also relatively small (usually < 5% (18)). Conversely, time-to-exhaustion tests have a higher variability (Fig. 1) but can elicit large changes after interventions (> 20% (4,20)). This is because small changes in the ability to produce power result in much larger changes in time to exhaustion (14). Perhaps a reliable test should instead be defined as one in which the described measurement error is judged to be acceptable on the basis of its sensitivity for detecting real changes. Research is needed to examine the reliability and subsequent change in time-trial and time-to-exhaustion exercise tests to determine the protocol that elicits the best possible signal-to-noise ratio.
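The signal-to-noise argument above can be made concrete with assumed numbers: from the log-log relation speed = c·time^k, a small fractional improvement in sustainable speed multiplies time to exhaustion at a fixed test speed by (1 + improvement)^(−1/k). The exponent k = −0.1 below is an assumed, typical-order value, not a figure from the study.

```python
# Back-of-envelope illustration with an assumed exponent (k = -0.1):
# a 1% gain in sustainable speed produces roughly a 10% gain in time
# to exhaustion at a fixed test speed.
def tte_change(speed_improvement, k=-0.1):
    """Fractional change in time to exhaustion at a fixed test speed."""
    return (1 + speed_improvement) ** (-1 / k) - 1

print(round(tte_change(0.01) * 100, 1))  # % change for a 1% speed gain
```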

The second important finding of the present study was the validation of Hinckson and Hopkins' (10) log-log modeling methods for predicting time-trial performance from time-to-exhaustion tests. Our results showed a standard error of measurement of 0.67% for the prediction of time-trial speed from time-to-exhaustion tests, and a typical error of the estimate of 1.6 and 2.5% between actual and predicted 5-km and 1500-m time-trials, respectively (Table 2, Fig. 2). These good measures of reliability were found despite the nonoptimal time-to-exhaustion tests used for determining the prediction equations from regression analysis (data from the present study were collected before the publication of Hinckson and Hopkins (10)). It should be mentioned, however, that the log-log predictive equations were less reliable at predicting actual 1500-m time-trial performance (Table 2, Fig. 2); this observation is likely an artifact of the 18% longer time subjects ran at their mean 1500-m time-trial speed (P < 0.05), again likely attributable to inexperience at running the shorter time-trial distance. The log-log predictive methods for determining time-trial performance in the present study might also be improved by using time-to-exhaustion exercise durations in the 1- to 10-min range (9). Despite the nonoptimal exercise durations used for our log-log time-trial prediction equations, our findings illustrate the equivalency of reliability in the time-trial and time-to-exhaustion tests, as originally predicted by Hopkins et al. (14).

Conjecture surrounds the choice of endurance performance protocol in exercise science research (2,10,11,13,16). Some authors believe that there is little external validity in the ability of time-to-exhaustion test protocols to assess endurance performance (16). That is, there may be few real-life exercise performance scenarios in which an athlete exercises at a constant intensity until voluntarily stopping. However, not all endurance events are individual time-trials completed at self-selected exercise intensities. Indeed, during a number of athletic events, exercise intensity may be externally controlled inadvertently through competitors or other team members. For example, during mass-start races (i.e., running and cycling), athletes may be required to maintain high exercise intensities for a prolonged period in an attempt to stay with the race leaders. Similarly, during various competitions, professional athletes may attempt to perform trials at the specific exercise intensity necessary to finish among the medalists or at a personal best pace (5). Although these events are not strictly time-to-exhaustion tasks, such tactics often result in premature fatigue and consequent exhaustion before completion of the race. Moreover, one of the best predictors of endurance performance ability is the progressive exercise test (8,19), which could be considered a type of time-to-exhaustion test (14).

Some limitations of the present study warrant mention. First, subjects performing the time-trial tests needed to manually increase or decrease their speed dependent on their perception of their ability to run faster or slower. Technology is now available whereby treadmill speed can be automatically controlled based on the subject's position on the treadmill; such technology may reduce the error of measurement when subjects perform running time-trials. Second, time-to-exhaustion trials are typically run at lower submaximal speeds than those used in the present study. It is not clear how the reliability of time-to-exhaustion trials might change with increasing duration, but some evidence suggests that it may be reduced by external factors and boredom (2,14). It should be mentioned, however, that no difference in time-to-exhaustion variability was shown between the longer and shorter trials in the present study (Table 1, Fig. 1). Third, the present study used only males to avoid possible hormonal differences attributable to the menstrual cycle. The influence of gender and/or menstrual cycle phase on variability in time-to-exhaustion and time-trial performance is not known.

In summary, the present study has shown a lower level of absolute reliability for the completion time of constant-speed time-to-exhaustion running tests compared with variable, self-selected-speed time-trial tests in trained runners. We have also presented calculations that validate Hinckson and Hopkins' (10) log-log time-trial prediction methods using serial time-to-exhaustion tests, and we have shown good agreement between predicted and actual time-trial performance. The differences between time-trial and time-to-exhaustion tests should be appreciated by researchers when designing studies to assess changes in exercise performance after a training intervention.

The authors thank the subjects of this study for their enthusiastic participation. We are particularly grateful for the extensive comments of the reviewers of the manuscript and the Associate Editor, who served to improve our analysis, interpretation, and presentation of the data.

REFERENCES

1. Albertus, Y., R. Tucker, A. St Clair Gibson, E. V. Lambert, D. B. Hampson, and T. D. Noakes. Effect of distance feedback on pacing strategy and perceived exertion during cycling. Med. Sci. Sports Exerc. 37:461-468, 2005.
2. Atkinson, G., and A. Nevill. Mathematical constants that vary? Med. Sci. Sports Exerc. 37:1822, 2005.
3. Atkinson, G., and A. M. Nevill. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 26:217-238, 1998.
4. Denadai, B. S., and M. L. Denadai. Effects of caffeine on time to exhaustion in exercise performed below and above the anaerobic threshold. Braz. J. Med. Biol. Res. 31:581-585, 1998.
5. Foster, C., J. J. deKoning, F. Hettinga, et al. Effect of competitive distance on energy expenditure during simulated competition. Int. J. Sports Med. 25:198-204, 2004.
6. Gurland, J., and R. C. Tripathi. A simple approximation for unbiased estimation of the standard deviation. Am. Stat. 25:30-32, 1971.
7. Hampson, D. B., A. St Clair Gibson, M. I. Lambert, and T. D. Noakes. The influence of sensory cues on the perception of exertion during exercise and central regulation of exercise performance. Sports Med. 31:935-952, 2001.
8. Hawley, J. A., and T. D. Noakes. Peak power output predicts maximal oxygen uptake and performance in trained cyclists. Eur. J. Appl. Physiol. 65:79-83, 1992.
9. Hill, D. W. The critical power concept. A review. Sports Med. 16:237-254, 1993.
10. Hinckson, E. A., and W. G. Hopkins. Reliability of time to exhaustion analyzed with critical-power and log-log modeling. Med. Sci. Sports Exerc. 37:696-701, 2005.
11. Hinckson, E. A., and W. G. Hopkins. Should time trial performance be predicted from several serial time-to-exhaustion tests? Response. Med. Sci. Sports Exerc. 37:1821, 2005.
12. Hopkins, W. G. Measures of reliability in sports medicine and science. Sports Med. 30:1-15, 2000.
13. Hopkins, W. G., and E. A. Hinckson. Mathematical constants that vary? Response. Med. Sci. Sports Exerc. 37:1823, 2005.
14. Hopkins, W. G., E. J. Schabort, and J. A. Hawley. Reliability of power in physical performance tests. Sports Med. 31:211-234, 2001.
15. Jeukendrup, A., W. H. Saris, F. Brouns, and A. D. Kester. A new validated endurance performance test. Med. Sci. Sports Exerc. 28:266-270, 1996.
16. Jeukendrup, A. E., and K. Currell. Should time trial performance be predicted from three serial time-to-exhaustion tests? Med. Sci. Sports Exerc. 37:1821, 2005.
17. Laursen, P. B., C. M. Shing, and D. G. Jenkins. Reproducibility of a laboratory-based 40-km cycle time-trial on a stationary wind-trainer in highly trained cyclists. Int. J. Sports Med. 24:481-485, 2003.
18. Laursen, P. B., C. M. Shing, J. M. Peake, J. S. Coombes, and D. G. Jenkins. Interval training program optimization in highly trained endurance cyclists. Med. Sci. Sports Exerc. 34:1801-1807, 2002.
19. Noakes, T. D., K. H. Myburgh, and R. Schall. Peak treadmill running velocity during the VO2max test predicts running performance. J. Sports Sci. 8:35-45, 1990.
20. Wilber, R. L., and R. J. Moffatt. Influence of carbohydrate ingestion on blood glucose and performance in runners. Int. J. Sport Nutr. 2:317-327, 1992.


© 2007 The American College of Sports Medicine