Original Research

Validity and Reliability of the PowerCal Device for Estimating Power Output During Cycling Time Trials

Costa, Vitor P.1; Guglielmo, Luiz G.A.2; Paton, Carl D.3

Journal of Strength and Conditioning Research: January 2017 - Volume 31 - Issue 1 - p 227-232
doi: 10.1519/JSC.0000000000001466



The frequency and duration of training sessions in cycling are components of the training program that are relatively easy to control; however, the measurement of exercise intensity is more complex because of the stochastic nature of cycling (13). Cycling is a sport in which power output (PO) can be measured directly using mobile power meters, which have become available in the past decade (11). Most competitive cyclists are not professionals, and they normally use an affordable device to monitor and control training sessions. Heart rate (HR) monitors are affordable and provide useful information about exercise intensity during aerobic training (2,8). However, HR values may not reflect the demands of the activity under hot environmental conditions, dehydration, changes in terrain gradient, and sprint efforts (2,13).

Recently, a new device called the PowerCal (PowerTap, Madison, WI, USA) was developed, which estimates PO from the HR response during exercise (10). Essentially, the PowerCal estimates PO using an algorithm developed to reduce the variation between PO and HR during exercise. Thus, it is believed that the PowerCal predicts PO accurately for aerobic intervals (i.e., ≤maximal HR). The device also costs less (∼99 USD) than other available power meters (∼700–3,000 USD) while still indicating the magnitude of the power level on a given ride. Furthermore, the PowerCal device is simple to use (i.e., it is worn as an HR strap) and does not require any calibration before use. Collectively, these features appear to make the device attractive to competitive cyclists for monitoring training sessions and racing.

To date, very few studies have assessed the validity and reliability of the PowerCal device. In one recent study examining the PowerCal PO in cycling, Costa et al. (10) reported that the PowerCal PO during sprints of 15, 30, and 45 seconds showed a high within-subject variation (6.7–21.5%) in well-trained cyclists. The authors also found a high bias ranging from 32 to 129 W and limits of agreement ranging from −236 to 303 W between the PowerCal and the Velotron cycle ergometer. In addition, the maximal and mean POs during sprints were underestimated by 6.6–13.9% and 14.9–27.6%, respectively, when compared with the Velotron. It is not clear how the algorithm estimates PO for anaerobic stimuli. HR is not recommended for controlling such intensities because of its naturally slow response in the first seconds of maximal sprint exercise (2,8). In this regard, it is not surprising that the PowerCal gives inaccurate values during sprints compared with the Velotron, because the PO response of the cycle ergometer is rapid and almost instantaneous. Thus, the estimates of PO from the PowerCal in short maximal activities are dubious (10); however, because of the linear relationship between HR and PO during aerobic exercise (3), it is hypothesized that the PowerCal estimates PO accurately compared with the cycle ergometer.

A common measure of cycling performance is the time trial (TT) race (16). In a TT competition, cyclists race maximally against the clock, completing a pre-established distance in the shortest time possible. Most TT events cover a distance between 5 and 60 km and are performed individually (5), except in grand tour and velodrome events that include team TTs. The distribution of PO can be influenced by a number of factors, such as training, nutritional strategies, aerodynamics, terrain gradient, environmental conditions, and psychological factors (1,5). Mathematical models of cycling performance show that athletes should vary the distribution of PO in response to changes in environmental resistance, as found in a hilly TT (7). We found no studies investigating the capability of the PowerCal to estimate PO during a hilly TT; such studies could provide important insights into the accuracy of the estimated PO from the beginning to the end of a task. Therefore, the aim of this study was to determine the validity and reproducibility of the PowerCal device for estimating PO during hilly cycling TTs.


Experimental Approach to the Problem

To investigate the validity and reproducibility of the PowerCal device for estimating PO during hilly cycling TTs, an incremental exercise test and 3 TTs were performed on the Velotron cycle ergometer (Velotron Dynafit Pro; RacerMate Inc., Seattle, WA, USA), which has been shown to be reliable and valid (4,9,15,17,20,21). For this purpose, 21 well-trained male cyclists participated in 4 different sessions. In the first session, the subjects performed an incremental exercise test. In the second session, cyclists were familiarized with the self-paced TT, whereas in the third and fourth sessions, they performed a self-paced TT while also wearing the PowerCal device.


Twenty-one well-trained male cyclists (34.1 ± 10.6 years; 73.2 ± 3.2 kg; 176.8 ± 6.2 cm; maximal PO, 334 ± 31 W; maximal oxygen uptake, 61.0 ± 4.2 ml·kg−1·min−1) volunteered to participate in this study. The cyclists had at least 3 years of regular competition experience. The study was performed in the competitive season after a period of base and precompetition training. Because of the nature of each cyclist's competition program, it was not possible to control their individual training leading up to the study. However, in the 2 weeks immediately before the start of the study, cyclists were completing individual self- or coach-determined training regimes consisting of a minimum of 10 hours (300 km) of mixed-intensity training per week. At the start of laboratory testing, cyclists were required to be in a well-prepared and nonfatigued state. All cyclists were informed of the purpose and the risks associated with participation before giving their written informed consent to participate. The study was approved by the institutional research ethics committee in accordance with the Declaration of Helsinki and ACSM guidelines.


All cyclists had previously participated in laboratory cycle ergometer testing and were familiar with general exercise testing procedures. Cyclists reported to the laboratory on 4 separate occasions over a period of 2 weeks. During the initial visit to the laboratory, cyclists completed an incremental exercise test until volitional exhaustion to determine peak PO and maximal oxygen uptake. After the initial test, cyclists completed three 20-km cycling TT sessions, each separated by at least 72 hours. All testing sessions were conducted on an electronically braked Velotron cycle ergometer set up to replicate each cyclist's individual bicycle position. Furthermore, during the tests, cyclists wore the PowerCal (CycleOps), an HR strap that calculates PO from the HR response recorded during exercise. All testing was conducted in a laboratory under controlled environmental conditions. Air temperature and relative humidity were 20.4 ± 1.7°C and 53.1 ± 2.9%, respectively.

Incremental Exercise Test

Cyclists completed an incremental exercise test until volitional exhaustion to determine their physiological parameters. The cycle ergometer was adjusted to replicate the participant's preferred racing position, which was recorded and replicated for the performance sessions. All tests were conducted on an electronically braked cycle ergometer. Cyclists performed a 15-minute warm-up at a self-selected intensity followed by 5 minutes of rest. Thereafter, the incremental exercise test started at 100 W and PO was increased by 40 W every 4 minutes until volitional exhaustion. The participants were instructed to maintain their preferred cadence (∼90 rpm). If the final stage of the exercise test was not completed, the peak power output (PPO) was calculated using the equation of Kuipers et al. (14):

$$\mathrm{PPO} = P_f + \left(\frac{t}{240}\right) \times 40$$

where Pf is the last completed workload (in watts), t is the time in seconds spent in the uncompleted workload, 240 is the duration of each stage in seconds, and 40 is the workload increment of each stage (in watts). Expired respiratory gases were analyzed using a Metamax 3B metabolic system (Cortex, Leipzig, Germany). Before each test, the system was calibrated in accordance with the manufacturer's instructions using known alpha gas standards. V̇O2max was defined as the highest 30-second average oxygen uptake recorded during the test. The HR was registered using a coded strap and recorded by the metabolic system.
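As an illustration of the PPO correction for an uncompleted final stage, the following minimal Python sketch assumes the equation takes the form PPO = Pf + (t / 240) × 40, which is consistent with the definitions above. The function and parameter names are hypothetical and are not taken from the study.

```python
def peak_power_output(last_completed_w, seconds_in_unfinished_stage,
                      stage_duration_s=240.0, stage_increment_w=40.0):
    """Estimate PPO (W) when the final stage is not completed.

    Assumed form: PPO = Pf + (t / stage duration) * stage increment.
    Defaults mirror the protocol in the text (240-s stages, 40-W steps).
    """
    if not 0 <= seconds_in_unfinished_stage <= stage_duration_s:
        raise ValueError("time in the unfinished stage must lie within one stage")
    fraction_completed = seconds_in_unfinished_stage / stage_duration_s
    return last_completed_w + fraction_completed * stage_increment_w


# Hypothetical rider: completed the 300-W stage, then stopped 120 s into the next one.
print(peak_power_output(300, 120))  # prints 320.0
```

A rider who stops halfway through a stage is therefore credited with half of the 40-W increment.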

Time Trials

Cyclists completed 3 hilly cycling TTs on a computer-simulated 20-km 3D course using the same cycle ergometer as in the incremental exercise test. The cycle ergometer allowed the cyclists to shift gears manually while pedaling, using shifters mounted on the handlebar. For each test, the Velotron was connected to a laptop computer interfaced with a projector that displayed the computer-generated image of the 3D course profile in front of the cyclist. The screen can display the current speed, power, cadence, HR, draft, distance, and time. In our study, the main performance variables were hidden so that this feedback could not influence pacing strategy; participants could view only their progress over the course, the distance completed, and gear selection. Cyclists initially completed a 20-minute standardized warm-up that included 3 repeated increasing-intensity bouts. Each bout comprised 2 minutes at 2–2.5 W·kg−1, followed by 2 minutes at 3–3.5 W·kg−1, and finally 1 minute at 4–4.5 W·kg−1; the 3 bouts were performed consecutively. For the final 5 minutes, cyclists pedaled at a fixed intensity of 100 W. Thereafter, a 20-km self-paced maximal TT was performed. The first TT was used for familiarization. The TT was completed on a purpose-designed course, which replicated a typical racing circuit and contained numerous changes in gradient, with both ascents and descents (9). The total elevation gain over the 20 km was 300 m, giving an average gradient of ∼1.5% (9). No verbal encouragement was provided, because it could interfere with an individual's pacing effort. Participants were asked to complete each TT as quickly as possible with no restriction on gear selection, cadence, or cycling posture (seated or standing). Cyclists wore the PowerCal HR strap throughout.
Experimental sessions were conducted at the same time of day for each individual to control for diurnal variation and were separated by at least 72 hours. Moreover, cyclists were required to present themselves in a hydrated and non–carbohydrate-depleted state. Throughout the experimental sessions, cyclists were cooled with standing floor fans and permitted to consume only water ad libitum.
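The standardized warm-up described above scales with body mass, so the absolute power targets differ between riders. The sketch below is one way to lay out the schedule, assuming the three-step bout is repeated 3 times before the final 5 minutes at 100 W. The function name and the 70-kg example rider are hypothetical.

```python
def warmup_schedule(body_mass_kg, repeats=3, final_minutes=5, final_power_w=100.0):
    """Return a list of (minutes, low_watts, high_watts) warm-up steps.

    Relative targets (W/kg) follow the protocol in the text:
    2 min at 2-2.5, 2 min at 3-3.5, and 1 min at 4-4.5 W/kg.
    """
    bout = [(2, 2.0, 2.5), (2, 3.0, 3.5), (1, 4.0, 4.5)]
    schedule = []
    for _ in range(repeats):
        for minutes, low_w_per_kg, high_w_per_kg in bout:
            schedule.append((minutes,
                             low_w_per_kg * body_mass_kg,
                             high_w_per_kg * body_mass_kg))
    # Final fixed-intensity step, independent of body mass.
    schedule.append((final_minutes, final_power_w, final_power_w))
    return schedule


steps = warmup_schedule(70.0)  # hypothetical 70-kg rider
for minutes, low_w, high_w in steps:
    print(f"{minutes} min at {low_w:.0f}-{high_w:.0f} W")
print("total minutes:", sum(minutes for minutes, _, _ in steps))
```

Three 5-minute bouts plus the final 5 minutes give the 20-minute total stated in the protocol.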

Statistical Analyses

Descriptive statistics are presented as mean (±SD). To display the pattern of PO during the TT, PO from the Velotron and the PowerCal was averaged over each 1-km interval. Power output during the trials was compared using a 2-way analysis of variance with repeated measures, with TT session (1 vs. 2) and distance (in kilometers) as factors. When necessary, subsequent post hoc comparisons were made using the Bonferroni correction. Power output of the Velotron vs. the PowerCal was compared using the mean scores of the 2 TT sessions for each device. A 2-way analysis of variance with repeated measures was used across the 2 devices (Velotron and PowerCal) and distance (in kilometers), with subsequent post hoc comparisons (Bonferroni) when necessary. The bias and limits of agreement (LoAs) of the mean PO differences between the PowerCal and Velotron were determined using the method of Bland and Altman (6). Reliability was quantified using the typical error between successive trials, expressed as a percentage coefficient of variation (CV), and the intraclass correlation coefficient (ICC), both derived from log-transformed data (12). Confidence intervals (CIs) were set at 90%. Statistical significance was accepted at p ≤ 0.05.
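To make the reliability statistic concrete, the sketch below shows one common way of computing the typical error as a CV from log-transformed test-retest data: the SD of the between-trial differences in log units is divided by √2 and back-transformed to a percentage. This is an assumed reading of the approach cited in the text, not the authors' spreadsheet, and the power values are invented for illustration.

```python
import math
import statistics


def typical_error_cv(trial1, trial2):
    """Typical error between two trials, expressed as a CV (%).

    Steps (assumed): log-transform each score, take the per-subject
    difference, divide the SD of the differences by sqrt(2), and
    back-transform the result to a percentage.
    """
    if len(trial1) != len(trial2) or len(trial1) < 2:
        raise ValueError("need paired scores from at least 2 subjects")
    log_diffs = [math.log(b) - math.log(a) for a, b in zip(trial1, trial2)]
    typical_error_log = statistics.stdev(log_diffs) / math.sqrt(2)
    return 100.0 * (math.exp(typical_error_log) - 1.0)


# Hypothetical mean PO (W) for 5 cyclists in two time trials.
tt1 = [280, 300, 265, 310, 290]
tt2 = [284, 296, 270, 305, 295]
print(f"CV = {typical_error_cv(tt1, tt2):.1f}%")
```

Because the calculation uses differences of logs, a uniform percentage shift between trials (for example, every rider improving by 2%) contributes nothing to the CV; only inconsistent changes between riders do.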


The test-retest mean PO (mean ± SD), CV, and ICC values for the Velotron and PowerCal are presented in Table 1. There were no significant differences between Velotron POTT1 and POTT2 (p = 0.07). However, significant differences were found between PowerCal POTT1 and POTT2 (p = 0.02). The mean PO from the Velotron was significantly higher than that from the PowerCal (p = 0.0001). The CV of the mean PO was low for the Velotron (1.8%) and high for the PowerCal (4.9%). High ICC values were found for the mean PO between the TTs for the Velotron (0.96) and PowerCal (0.82).

Table 1. Mean power output, coefficient of variation, and intraclass coefficient of correlation of the PowerCal and Velotron during the time trials.*

High CV scores for the Velotron test-retest were concentrated at the beginning and end of the TT (∼6.0%), whereas the CVs were lower in the middle of the trials (∼3.0%) (Figure 1A). In contrast, the PowerCal test-retest CV was high (∼6.0%) in each kilometer of the TT (Figure 1B).

Figure 1. Velotron and PowerCal coefficients of variation during a 20-km hilly time trial.

The PO during each kilometer of the 20-km cycling test for the Velotron vs. the PowerCal is shown in Figure 2. Regarding the distribution of PO from the Velotron, cyclists adopted a fast start, with the first 4 km above the mean PO (282 ± 27 W); the remaining distance was below the mean PO, except in the sixth, 10th, and final 2 km of the test. The PO in each kilometer of the test was significantly (p < 0.01) higher for the Velotron (by 5.8–23.4%) than that estimated by the PowerCal device. The PowerCal PO was above its mean PO (242 ± 28 W) in the first, 11th, and final 3 km of the test.

Figure 2. Velotron and PowerCal distribution of power output during a 20-km hilly time trial.
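A per-kilometer power profile of this kind requires the raw recordings to be averaged within each 1-km interval. The sketch below shows one way to do that, assuming each recording is a list of (distance in km, power in W) samples. That data layout and the function name are assumptions for illustration, not the export format of either device.

```python
from collections import defaultdict


def mean_power_per_km(samples, total_km=20):
    """Average power (W) within each 1-km interval of a time trial.

    samples: iterable of (distance_km, power_w) pairs.
    Returns a list of length total_km; an interval with no samples is None.
    """
    bins = defaultdict(list)
    for distance_km, power_w in samples:
        # A sample recorded exactly at the finish line belongs to the last interval.
        index = min(int(distance_km), total_km - 1)
        bins[index].append(power_w)
    return [sum(bins[k]) / len(bins[k]) if k in bins else None
            for k in range(total_km)]


# Hypothetical samples: two in the first kilometer, one in the second, one at the finish.
recording = [(0.2, 300.0), (0.8, 280.0), (1.5, 250.0), (20.0, 310.0)]
profile = mean_power_per_km(recording)
print(profile[0], profile[1], profile[19])  # prints 290.0 250.0 310.0
```

Applying the same binning to both devices gives two 20-value series that can be compared kilometer by kilometer.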

The Bland-Altman plots of mean PO for the Velotron vs. the PowerCal are presented in Figure 3. The mean PO bias was 32 W, with LoA ranging from −51 to 115 W.

Figure 3. Bland-Altman plots between the mean power output of the Velotron and PowerCal.
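For readers unfamiliar with the Bland-Altman approach, the bias is the mean of the paired differences between the two devices and the LoA are the bias ± 1.96 SD of those differences. The sketch below assumes differences are taken as reference minus estimate (Velotron minus PowerCal), which is a sign convention chosen here for illustration. The power values are invented and are not the study data.

```python
import statistics


def bland_altman(reference, estimate, z=1.96):
    """Return (bias, lower LoA, upper LoA) for paired measurements.

    Differences are computed as reference - estimate (assumed convention),
    and the limits of agreement are bias +/- z * SD of the differences.
    """
    if len(reference) != len(estimate) or len(reference) < 2:
        raise ValueError("need at least 2 paired measurements")
    diffs = [r - e for r, e in zip(reference, estimate)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical mean PO (W) for 4 cyclists on each device.
velotron = [280.0, 300.0, 260.0, 290.0]
powercal = [250.0, 260.0, 240.0, 250.0]
bias, lower, upper = bland_altman(velotron, powercal)
print(f"bias = {bias:.1f} W, LoA = {lower:.1f} to {upper:.1f} W")
```

A positive bias under this convention means the estimate sits below the reference on average, and wide LoA indicate that individual estimates can deviate substantially even when the average offset is known.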


The aim of this study was to assess the reliability and validity of the PO distribution from the PowerCal device during a 20-km hilly cycling TT in competitive cyclists. The main finding was that the mean PO of the PowerCal showed high within-subject variation (4.9%). High CV scores were also found in each kilometer between the test and retest TTs. The PowerCal underestimated mean PO by 15% compared with the Velotron cycle ergometer. In addition, the PowerCal underestimated PO in each kilometer of the TT by 5.8–23.4%. The mean PO showed a high bias and wide LoA between the PowerCal and Velotron. Therefore, the PowerCal device displayed poor reliability, and the hypothesis of this study was rejected because the device is unlikely to be valid during hilly cycling TTs.

The time course of the reliability of mean PO during hilly TTs investigated in this study was previously demonstrated in a group of competitive cyclists (9). Clark et al. (9) found that the CV of mean PO from the Velotron was close to 2% when a short period between the trials was used; however, they reported a substantial decrease in the reliability of cycling performance with increasing time between trials (∼3.5%). In our study, the mean PO showed a similarly low within-subject variation (1.8%) for the Velotron test-retest. Sporer and McKenzie (17) evaluated the reliability of a 20-km TT using the Velotron and found a CV for the mean PO of 2.1% between trials 1 and 2 and a CV of 1.9% between trials 2 and 3. In a further study, Zavorsky et al. (21) found a slightly higher CV of 3.4% between trials 1 and 2, and a CV of 3.6% between trials 2 and 3, for a repeated 20-km TT using the Velotron. However, for the 8 top performers, the reliability improved (a CV of 1.2%). Noreen et al. (15) tested an 8-mile uphill TT using the Velotron and reported a CV for the mean PO of 3.5% between trials 1 and 2, and a CV of 1.7% between trials 2 and 3. In contrast to the previous studies using the Velotron, the reliability of the mean PO from the PowerCal device was poor, with a high CV of 4.9%. To the best of our knowledge, only one study, from our group, has examined the reliability of the PowerCal device in cycling (10). Costa et al. (10) found that the mean sprint PO estimated by the PowerCal was also unreliable for maximal intervals of 15, 30, and 45 seconds, with CVs of 13.5, 6.5, and 7.0%, respectively. Taken together, the studies above indicate that cycling TT performance measured on the Velotron over a variety of simulated terrains is reliable in well-trained cyclists when the period between test and retest is short, whereas the PowerCal device is not reproducible.

Most of the studies that have investigated the reproducibility of performance measures during cycling TTs have focused on the mean PO and the time to complete the trial, without analyzing the pacing strategy during the tests. Thomas et al. (20) reported a high degree of variability at the start and finish of a 20-km cycling TT using the Velotron. Moreover, the authors found a parabolic “U”-shaped curve for PO, indicating a fast start and a high end spurt. Consistent with this, we found high CV values for the Velotron at the start and finish of the trials, combined with high PO at the beginning and end of the trials. Our study therefore showed that the cyclists adopted a fast start followed by a slow decrease in PO and a high end spurt. In contrast, the CV scores of the PowerCal were high throughout the test. Furthermore, the PowerCal distribution of PO over the test was nearly flat, with a slight increase at the end of the trials.

The mean PO estimated by the PowerCal (242 ± 28 W) was significantly lower than that measured by the cycle ergometer (282 ± 27 W; p < 0.001). During each kilometer of the performance test, the PowerCal PO was significantly lower than the Velotron PO (by 5.8–23.4%; p < 0.01). Moreover, the mean PO of the tests showed a high bias and wide LoA between the device and the cycle ergometer (Figure 3). Costa et al. (10) reported that the estimates of PO from the PowerCal during short sprints in cycling were significantly lower than those from the Velotron, by 6.6 to 27.6%. The authors found a high bias (32–129 W) and wide LoA (−236 to 303 W) between the peak and mean PO of the PowerCal and Velotron. Taken together, the significant differences in PO values, combined with the high bias and wide LoA between the PowerCal and the Velotron, show that the PowerCal is unlikely to be valid during a hilly TT or during short maximal sprints of 15, 30, and 45 seconds in well-trained cyclists (10).

Several factors may have influenced the reliability and validity of the PowerCal. First, the PowerCal device contains a proprietary embedded algorithm for calculating PO based on cycling-specific HR and time data. The PowerCal estimate of PO is therefore tied to the individual HR measured while cycling. Individual PowerCal data from our study show that cyclists with low HR values during the TT displayed low estimated PO (i.e., older and fitter cyclists). In contrast, cyclists with high HR produced high estimated PO (i.e., younger and less-trained cyclists). For untrained subjects and those with lower physical fitness, it is expected that, with the adaptations to a regular aerobic training program, HR will decrease at rest and during submaximal exercise, whereas submaximal and maximal PO will increase. Although we did not find any training study that detected changes in PO using the PowerCal, we hypothesize that the PowerCal will probably produce lower estimates of PO after training if the submaximal HR decreases.

Second, cycling is a stochastic sport in which a variety of factors can influence the relationship between HR and PO, and consequently the PO estimates from the PowerCal device (2,13). In a hilly TT, the terrain gradient is the major factor that could influence the relationship between HR and PO (19). The TT in this study was not flat and included numerous changes in gradient, with both ascents and descents (9). We found a drop in the Velotron PO during the long descent of the TT, irrespective of changes in HR and in the PO estimated by the PowerCal. The decrease in PO was due to the resistive forces on the flywheel of the cycle ergometer simulating the downhill phase of the course profile. During steep climbs and descents, the dissociation between HR and PO increases (18). In fact, a linear relationship between HR and PO has been found only under controlled testing conditions (i.e., a graded exercise test) (3). Therefore, it is not surprising that the estimates of PO from the PowerCal differ from those of the Velotron.

Practical Applications

The PowerCal is a device that estimates PO from HR response during exercise. The results from our study showed that the PowerCal is a nonreproducible device and also underpredicts the PO compared with the traditional cycle ergometer Velotron during hilly TT. The PowerCal was designed to be an affordable tool to monitor and control PO during cycling training and racing. Indeed, it has a low price and does not need any calibration before training. However, our previous sprint study (10) and this study demonstrated that the PowerCal device should be used with caution during cycling activities because it is not reliable and underestimates PO.


1. Abbiss CR, Laursen PB. Describing and understanding pacing strategies during athletic competition. Sports Med 38: 239–252, 2008.
2. Achten J, Jeukendrup AE. Heart rate monitoring: Applications and limitations. Sports Med 33: 517–538, 2003.
3. Arts FJ, Kuipers H. The relation between power output, oxygen uptake and heart rate in male athletes. Int J Sports Med 15: 228–231, 1994.
4. Astorino TA, Cottrell T. Reliability and validity of the Velotron RacerMate cycle ergometer to measure anaerobic power. Int J Sports Med 33: 205–210, 2012.
5. Atkinson G, Peacock O, St Clair Gibson A, Tucker R. Distribution of power output during cycling: Impact and mechanisms. Sports Med 37: 647–667, 2007.
6. Bland M, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 327: 307–310, 1986.
7. Boswell GP. Power variation strategies for cycling time trials: A differential equation model. J Sports Sci 30: 651–659, 2012.
8. Buchheit M. Monitoring training status with HR measures: Do all roads lead to Rome? Front Physiol 27: 1–19, 2014.
9. Clark B, Paton CD, O'Brien BJ. The reliability of performance during computer-simulated varying gradient cycling time trials. J Sci Cycling 3: 29–33, 2014.
10. Costa VP, Guglielmo LGA, Paton CD. Reproducibility and validity of the PowerCal device for estimating power output during sprints in well-trained cyclists. Isok Exerc Sci 23: 127–132, 2015.
11. Gardner AS, Stephens S, Martin DT, Lawton W, Lee H, Jenkins D. Accuracy of SRM and PowerTap power monitoring systems for bicycling. Med Sci Sports Exerc 36: 1252–1258, 2004.
12. Hopkins WG. Analysis of reliability with a spreadsheet. A new view of statistics. Available at: 2007. Accessed August 2015.
13. Jeukendrup A, Van Diemen A. Heart rate monitoring during training and competition in cyclists. J Sports Sci 16: S91–S99, 1998.
14. Kuipers H, Verstappen FT, Keizer HA, Geurten P, Van Kranenburg G. Variability of aerobic performance in the laboratory and its physiologic correlates. Int J Sports Med 6: 197–201, 1985.
15. Noreen E, Yamamoto K, Clair K. The reliability of a simulated uphill time trial using the Velotron electronic bicycle ergometer. Eur J Appl Physiol 110: 499–506, 2010.
16. Paton CD, Hopkins WG. Tests of cycling performance. Sports Med 31: 489–496, 2001.
17. Sporer BC, McKenzie DC. Reproducibility of a laboratory based 20-km time trial evaluation in competitive cyclists using the Velotron Pro ergometer. Int J Sports Med 28: 940–944, 2007.
18. Stapelfeldt B, Schwirtz A, Schumacher YO, Hillebrecht M. Workload demands in mountain bike racing. Int J Sports Med 25: 294–300, 2004.
19. Swain DP. The influence of body mass in endurance bicycling. Med Sci Sports Exerc 26: 58–63, 1994.
20. Thomas K, Stone MR, Thompson KG, St Clair Gibson A, Ansley L. Reproducibility of pacing strategy during simulated 20-km cycling time trials in well-trained cyclists. Eur J Appl Physiol 112: 223–229, 2012.
21. Zavorsky GS, Murias JM, Gow J, Kim DJ, Poulin-Harnois C, Kubow S, Lands LC. Laboratory 20-km cycle time trial reproducibility. Int J Sports Med 28: 743–748, 2007.

Key Words: reproducibility; cyclists; performance; testing

© 2016 National Strength and Conditioning Association