

Reliability of Maximal Cardiopulmonary Exercise Testing in Men with Prostate Cancer


Medicine & Science in Sports & Exercise: January 2015 - Volume 47 - Issue 1 - p 27-32
doi: 10.1249/MSS.0000000000000370


Prostate cancer (PC) is the most common non-skin cancer diagnosis among North American men (1). Radical prostatectomy (RP), a primary treatment for early-stage localized PC, is associated with 10-yr survival rates of approximately 90% (1,24). However, RP’s side effects include urinary incontinence, erectile dysfunction, pain, and reduced physical functioning (6). Collectively, these side effects negatively affect health-related quality of life for up to 12 months after surgery (9). Despite significant advances in pharmacotherapy, continuing development of new interventions to improve health-related quality of life remains a high priority for this patient population (22). We, and others, have proposed that aerobic-based exercise training may improve cardiorespiratory fitness and cardiovascular risk profiles in patients with PC (18,23). However, to accurately assess exercise interventions and evaluate acute and chronic cardiovascular late effects in patients with an early-stage disease, consistently reliable functional outcome measures must be obtained.

Over the past half decade, noninvasive cardiopulmonary exercise testing (CPET) has emerged as a research and clinical tool of immense value given its ability to accurately evaluate aerobic capacity (V˙O2peak) (which, in turn, can be used for risk stratification or evaluation of intervention efficacy), to assess the principal determinants of exercise intolerance, and to develop individualized aerobic training prescriptions (14,16,17). Importantly, the within-patient variability of CPET variables in PC has not been ascertained. Consequently, establishing repeatability of CPET in PC is critical not only to ensure accurate assessment of changes in cardiorespiratory fitness across interventions but also to precisely characterize acute and chronic cardiovascular effects for risk stratification. Accordingly, as a prespecified substudy of a randomized trial investigating the efficacy of aerobic training in men with clinically localized PC after RP, we evaluated the reproducibility of V˙O2peak and other important CPET parameters to assess the need for repeat testing.


Patient cohort, setting, and procedures.

This is a secondary analysis of baseline data obtained in the context of a randomized trial investigating the efficacy of aerobic training in men with clinically localized PC after RP. The full details of the study methods have been reported previously (18). In brief, the major eligibility criteria were as follows: 1) ≤4 months from RP to study enrollment, 2) no absolute contraindications to a maximal CPET, 3) willingness to travel to Duke University Medical Center, 4) primary attending urologist’s approval, and 5) V˙O2peak (mL·kg−1·min−1) below that of age-matched sedentary normative values (18). All study procedures were reviewed and approved by the Duke University Medical Center institutional review boards. All subjects signed a written consent before the initiation of any study-related procedures. After the initial eligibility determination, all patients performed a maximal CPET as part of the study’s screening procedures (visit 1). If no CPET-related adverse events were detected, all patients returned to perform a repeat CPET (visit 2).

Maximal CPET procedures.

All CPET were performed using an incremental treadmill test (modified Balke protocol) on a motorized treadmill with 12-lead ECG monitoring (Mac® 5000; GE Healthcare). Participants were provided with detailed, standardized, verbal instructions before each test and were told to avoid exercise, caffeine, and alcohol for 24 h and consume medications as usual. All testing was performed between 8:00 and 10:00 a.m. Complete calibration of gas concentrations using primary standard gases and flow using a 3-L syringe were performed before each test. All tests were performed by the same two certified exercise physiologists. To avoid bias, testers intentionally did not view the results of test 1 immediately before conducting test 2. Before beginning the test, subjects were familiarized with the mouthpiece and walking on the treadmill for approximately 10 min, followed by 10 min of rest. After stable resting metabolic values were achieved (including blood pressure and HR), subjects began walking at a pace selected in the warm-up as a comfortable but brisk pace (approximately 2.5–4.0 mph) at 0% grade for 2 min. During the test, participants were encouraged to walk for as long as possible, resulting in a peak, symptom-limited, exhaustive effort. Grade was increased every 2 min until ventilatory threshold (VT) (determined independently in each test) and then every minute thereafter until exhaustion or a symptom limitation (15). RPE were evaluated at the end of each workload using the modified Borg scale. Whether patients gave a maximal effort was determined on the basis of a combination of symptoms (dyspnea on exertion and/or fatigue as primary end points) and physiological measures (defined as a plateau in oxygen consumption during the final stage, HRmax > 85% of age-adjusted predicted HRmax, and RER > 1.10) (10). Metabolic gas exchange was measured continuously during exercise and averaged over 30-s intervals (Parvo Medics TrueOne® 2400; Parvo Medics, Sandy, UT). 
V˙O2peak was defined as the highest V˙O2 value for a given 30-s interval within the last 60 s of exercise. The V˙O2 at VT was independently determined by a blinded reader using standard methods (2,26). The minute ventilation–carbon dioxide production relation (V˙E/V˙CO2 slope) was determined by measuring the slope across the entire duration of the test (3).
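
As a minimal illustration of the definitions above (not the authors' analysis code), the following Python sketch picks V˙O2peak from a series of 30-s averaged values and checks the physiological effort criteria. The function names and arguments are hypothetical; the predicted HRmax is passed in by the caller because the age-adjustment formula is not stated here, and the plateau judgment is supplied as a flag rather than computed.

```python
def vo2_peak(interval_means, interval_s=30, window_s=60):
    """Highest interval-averaged VO2 within the final `window_s` seconds.

    `interval_means` holds consecutive 30-s averages in time order
    (hypothetical input format; units are whatever the caller uses).
    """
    if not interval_means:
        raise ValueError("need at least one averaged interval")
    n_last = max(1, window_s // interval_s)  # 60 s / 30 s -> last 2 intervals
    return max(interval_means[-n_last:])


def physiological_criteria(hr_max, predicted_hr_max, rer_peak, plateau):
    """Report each physiological criterion separately.

    Thresholds follow the text: HRmax > 85% of predicted and RER > 1.10.
    `plateau` is a caller-supplied judgment of a VO2 plateau in the final stage.
    How the criteria are combined with symptoms is left to the reader.
    """
    return {
        "plateau": bool(plateau),
        "hr_above_85pct_predicted": hr_max > 0.85 * predicted_hr_max,
        "rer_above_1_10": rer_peak > 1.10,
    }


# Example with made-up values (L/min), not study data
print(vo2_peak([1.20, 1.80, 2.30, 2.41, 2.39]))
print(physiological_criteria(hr_max=160, predicted_hr_max=165,
                             rer_peak=1.15, plateau=True))
```

Restricting the search to the last two 30-s intervals means an earlier transient spike is not reported as the peak value.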

Statistical analyses.

Means are expressed using either ±SD or 10th and 90th percentiles. Within-subject variability from test 1 to test 2 was evaluated by within-subject absolute change (i.e., either an increase or decrease from test 1 to test 2) and tested using paired samples t-tests. Bland–Altman plots were constructed, defined as the difference in values from test 2 and test 1 plotted versus the average of the values from tests 1 and 2 with SD of the difference (average difference ± 1.96 SD of the difference) (7). Reliability of the CPET outcome measures was assessed by Pearson correlations and intraclass correlations (ICC). One-sided 95% confidence intervals (CI) of Pearson correlations were made using the Fisher z transformation. For each variable, the coefficient of variation (CV) was defined as follows: (within-subject SD/within-subject mean) × 100%. Significance for all tests was set at P < 0.05.
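
The quantities described above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch under stated assumptions, not the software used in the study: the function names are hypothetical, the CV is computed per subject from the two tests and then averaged (one common reading of "within-subject SD/within-subject mean"), and the one-sided interval is taken to be a lower bound because the direction is not specified in the text.

```python
import math
from statistics import NormalDist, mean, stdev


def bland_altman(test1, test2, z=1.96):
    """Mean difference (test 2 - test 1) and limits of agreement (bias +/- z*SD)."""
    diffs = [b - a for a, b in zip(test1, test2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


def within_subject_cv(test1, test2):
    """Average per-subject CV (%): SD of a subject's two values / their mean x 100."""
    cvs = [stdev([a, b]) / mean([a, b]) * 100.0 for a, b in zip(test1, test2)]
    return mean(cvs)


def pearson_r(x, y):
    """Pearson correlation between paired measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_lower_bound(r, n, level=0.95):
    """One-sided lower confidence bound for r via the Fisher z transformation.

    Assumes |r| < 1 and n > 3; lower-bound direction is an assumption.
    """
    z = math.atanh(r)                     # Fisher z transform
    se = 1.0 / math.sqrt(n - 3)           # standard error on the z scale
    z_crit = NormalDist().inv_cdf(level)  # about 1.645 for a one-sided 95% bound
    return math.tanh(z - z_crit * se)     # back-transform to the r scale


# Example with made-up paired values, not study data
t1 = [20.0, 25.0, 30.0, 35.0]
t2 = [21.5, 25.5, 31.0, 34.5]
print(bland_altman(t1, t2))
print(within_subject_cv(t1, t2))
print(fisher_lower_bound(pearson_r(t1, t2), n=len(t1)))
```

For a Bland–Altman plot, the differences returned here would be plotted against the per-subject average of the two tests, with horizontal lines at the bias and the two limits of agreement.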


Participant characteristics.

Details regarding response rates and profiles of the participants have been reported previously (18). In brief, 50 patients were randomized in the parent trial and 40 (80%) successfully completed two baseline CPET (2) (Table 1). Participant recruitment took place between November 2009 and July 2012. The two baseline CPET were separated by 5.6 ± 5.5 d. No serious or life-threatening adverse events were observed during any CPET procedures. During baseline CPET, ischemic ECG changes (≥2.0-mm ST depression) were observed in three patients.

TABLE 1. Participant characteristics (n = 40).

Test–retest reliability.

Pearson correlations and ICC for peak and submaximal variables are summarized in Table 2. The two tests were significantly correlated, with correlation coefficients of 0.94 (absolute V˙O2peak, HR), 0.96 (VT), and 0.67 (RER). ICC were 0.927 (absolute V˙O2peak, RER, exercise time), 0.790 (VT), and 0.938 (HR).

TABLE 2. Reliability of CPET.

Within-subject test–retest variability.

Test–retest variability (CV) for peak and submaximal parameters ranged from 2.2% (HR) to 7.6% (exercise duration), with a CV of 4.2% for indexed and absolute V˙O2peak. Exercise duration (11.3 ± 2.2 vs 11.5 ± 2.2 min), V˙O2peak (2.39 ± 0.5 vs 2.49 ± 0.5 L·min−1; 27.0 ± 5.6 vs 28.1 ± 5.3 mL·kg−1·min−1; P < 0.01), peak HR (161 ± 17 vs 163 ± 16 bpm), VT (1.91 ± 0.4 vs 1.97 ± 0.4 L·min−1), and V˙E/V˙CO2 (32.1 ± 5.8 vs 32.8 ± 3.4) increased from test 1 to test 2 (Table 3). The percentage of subjects whose values increased on test 2 was greater than the percentage whose values decreased for V˙O2peak (72%), exercise time (75%), HR (68%), V˙E/V˙CO2 (83%), and VT (65%). As shown in Figure 1A and B, there was high variability between tests for peak variables (95% CI of 25.2–29.8 mL·kg−1·min−1, 2.2–2.7 L·min−1 (V˙O2peak), and 155–167 bpm (HR)) and submaximal variables (Fig. 2A and B) (95% CI of 31.0–33.8 (V˙E/V˙CO2), 1.72–2.09 L·min−1 (VT)). Variability was similar for subjects with lower versus those with higher V˙O2peak (Figs. 1 and 2).

TABLE 3. Test–retest variability of CPET variables.
FIGURE 1. Bland–Altman plot for peak variables: V˙O2peak (A) and peak HR (B).
FIGURE 2. Bland–Altman plot for submaximal variables: V˙E/V˙CO2 (A) and VT (B).


The principal finding of this ancillary analysis was that despite the good test–retest CPET reliability, there was a significant within-subject variability in peak and submaximal variables (CV ranging from 2% to 8%) in men with localized PC. Accordingly, we contend that our data suggest the need to perform two CPET at baseline to obtain valid baseline V˙O2peak and other CPET results in men after RP for clinically localized PC. Such variability has critical implications for the acquisition of accurate baseline V˙O2 data, which, in turn, could affect prognosis and risk stratification and the magnitude of intervention efficacy to change V˙O2 in patients with cancer.

Some studies have closely examined the variability of measurements obtained during CPET in other clinical populations, with conflicting results. In the “Heart Failure: A Controlled Trial Investigating Outcomes of Exercise TraiNing (HF-ACTION)” trial, the mean V˙O2peak did not differ between tests 1 and 2 (15.2 ± 5.0 vs 15.2 ± 5.0 mL·kg−1·min−1, P = 0.78) (5). However, there was a large within-subject variability (6.6%) for V˙O2peak, which the authors attributed to variations in subject factors specific to patients with heart failure such as daily hemodynamic and volume status fluctuations. A more recent investigation involving patients with heart failure with preserved ejection fraction reported no difference in V˙O2peak between tests 1 and 2 (14.4 vs 14.3 mL·kg−1·min−1), with the authors concluding that the incremental gain in information should be balanced against the increase in participant burden and cost (25). In contrast, current CPET guidelines in clinical populations with cardiovascular and respiratory disease recommend that at least two CPET be conducted in all clinical and research settings; these recommendations were formulated on the basis of previous work demonstrating that V˙O2peak significantly varies over serial tests (2,12). For example, Elborn et al. (8) performed three consecutive treadmill-based CPET separated by 2 wk in 30 subjects with heart failure. The mean V˙O2peak improved significantly from the first to the second test (14.1 vs 14.9 mL·kg−1·min−1, P < 0.005), with no difference between tests 2 and 3, whereas the average within-subject CV was 6%. These findings are consistent with those observed here, in which V˙O2peak increased significantly from 27.0 to 28.1 mL·kg−1·min−1 and the within-subject CV was 4.2%. In addition, the percentage of subjects whose V˙O2peak increased on test 2 (72%) was greater than the percentage whose V˙O2peak decreased (28%).
Submaximal responses such as VT and V˙E/V˙CO2 slope have been used with greater frequency to classify exercise intolerance and to risk-stratify in clinical populations (5,25). In contrast to previous investigations on patients with heart failure reporting nearly identical mean values on tests 1 and 2 for submaximal indices (5,25), here, we found that 83% (33/40) and 65% (26/40) of subjects increased V˙E/V˙CO2 slope and VT, respectively, on test 2. Taken together, these findings are indicative of variability in men with localized PC. We did not conduct a third CPET; however, on the basis of previous literature, an additional test is unlikely to provide clinically relevant changes (8,13,21).

The variability in both maximal and submaximal CPET parameters observed in the present study is potentially clinically meaningful. For example, there is growing evidence supporting the use of V˙O2 for both prognostic (14,20) and risk stratification (4,27) purposes in numerous oncology clinical scenarios. These findings underscore the significance of accurate CPET-derived peak and submaximal measures, particularly given the increasing number of cancer diagnoses combined with the large and rapidly growing number of cancer survivors who could benefit from a global measure of the integrative capacity of the pulmonary, cardiovascular, and skeletal muscle systems to deliver and use O2. Furthermore, in the parent trial, we found that mean V˙O2peak increased from 28.1 mL·kg−1·min−1 at baseline (as recorded in the second CPET) to 30.1 mL·kg−1·min−1 at week 24, an increase of 9% among patients randomized to supervised aerobic training. There were no differences in V˙O2peak between baseline (as recorded in the second CPET) and 6 months in the usual care group. It could be speculated that had we not conducted two CPET at baseline, the increase in V˙O2peak may have been inflated to 12% (27.0 vs 30.1 mL·kg−1·min−1) in the exercise group and increased in the usual care group (27.9 vs 29.2 mL·kg−1·min−1). Interestingly, in a meta-analysis of six randomized controlled trials assessing the efficacy of exercise training on direct measurement of V˙O2peak in patients with cancer, the mean increase in V˙O2peak was 2.91 mL·kg−1·min−1 or approximately 12% compared with that in controls (19). It is important to note that all of these trials conducted only one CPET at baseline. On the basis of the present findings, testing variability may have contributed to the observed improvements. 
Although some testing variability could potentially be controlled for by comparing changes in exercise groups with those in control groups, our findings raise important questions regarding CPET methodology before, during, or after intervention and for cardiovascular risk assessment in the oncology setting. For example, given that V˙O2peak decreased in 28% of subjects on test 2 in the present investigation, future studies should examine whether the highest data point, the mean of two points, or the second data point should be used for final baseline assessments.

To accurately assess test–retest reliability, it is important to adhere to stringent CPET methodology. To this end, we adhered to the following procedures: standardized participant instructions, testing at the same time of the day, screening for intercurrent acute illness or injury or change in condition, calibration and validation of the gas exchange unit before each test, and strong encouragement to reach volitional exhaustion or symptom limitation. However, there were several factors and testing methodology limitations that may contribute to increased variability in the measurement of peak and submaximal CPET parameters. First, to facilitate the achievement of peak aerobic capacity in a timely and standardized way, several maximum incremental ramp protocols are available. These protocols are classified according to the application of work rate, as follows: constant increments (i.e., application of the same workload increments for all patients) or individualized increments (i.e., variable workload increments based on patient characteristics). We used individualized protocols, as recommended by the American Thoracic Society (2). Individualized protocols enable the selection of workload increments that are appropriate for a patient. Such flexibility is particularly useful for CPET of patients with cancer, many of whom have been treated with cytotoxic combination therapy and present with concomitant comorbid disease (16). Second, there is considerable day-to-day biological variation in humans that is well documented in healthy subjects (11). In patients with a chronic condition on a variety of medications, the biological variance could be even greater. Indeed, although all subjects in the present investigation achieved the maximal effort as outlined by the American Thoracic Society (2), there may have been inherent variability in the effort that deconditioned subjects were willing to exert from day to day (21). 
Consistent with this, HR varied between tests, suggesting inconsistency of effort. Third, there is also considerable technical variation in when V˙O2peak occurs, depending on how many breaths fall within the sampling period (30 s for the present investigation). A small change may arise from the timing of gas sampling relative to the breathing pattern. Finally, this was a prespecified substudy of a randomized trial with a modest sample size.

In summary, the findings of the present study demonstrating an increase in maximal and submaximal CPET-derived measures in the second test among men with clinically localized PC suggest that two baseline CPET are required, when feasible, in all studies examining the efficacy of interventions on V˙O2peak in the oncology setting. Importantly, analyses of similar studies in both children and adult patients with cancer that include evaluation of CPET before, during, and after therapy are now required to establish standardized methodology and interpretation of results, to formulate reference values, and to increase the clinical utility of CPET. Such efforts will lead to the development of standardized CPET guidelines/recommendations for patients with cancer.

This study was supported by a research grant from the National Cancer Institute (R21-CA133895) awarded to L. W. J.

The authors declare no conflicts of interest. The results of the present study do not constitute endorsement by the American College of Sports Medicine.


1. American Cancer Society [Internet]. Atlanta: American Cancer Society. Available from: Accessed February 21, 2014.
2. American Thoracic Society; American College of Chest Physicians. ATS/ACCP statement on cardiopulmonary exercise testing. Am J Respir Crit Care Med. 2003; 167 (2): 211–77.
3. Arena R, Myers J, Aslam SS, Varughese EB, Peberdy MA. Technical considerations related to the minute ventilation/carbon dioxide output slope in patients with heart failure. Chest. 2003; 124 (2): 720–7.
4. Beckles MA, Spiro SG, Colice GL, Rudd RM. Initial evaluation of the patient with lung cancer: symptoms, signs, laboratory tests, and paraneoplastic syndromes. Chest. 2003; 123 (1 Suppl): 97S–104S.
5. Bensimhon DR, Leifer ES, Ellis SJ, et al. Reproducibility of peak oxygen uptake and other cardiopulmonary exercise testing parameters in patients with heart failure (from the Heart Failure and A Controlled Trial Investigating Outcomes of exercise traiNing). Am J Cardiol. 2008; 102 (6): 712–7.
6. Bhatnagar V, Stewart ST, Huynh V, Jorgensen G, Kaplan RM. Estimating the risk of long-term erectile, urinary and bowel symptoms resulting from prostate cancer treatment. Prostate Cancer Prostatic Dis. 2006; 9 (2): 136–46.
7. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986; 1 (8476): 307–10.
8. Elborn JS, Stanford CF, Nicholls DP. Reproducibility of cardiopulmonary parameters during exercise in patients with chronic cardiac failure. The need for a preliminary test. Eur Heart J. 1990; 11 (1): 75–81.
9. Ficarra V, Novara G, Galfano A, et al. Twelve-month self-reported quality of life after retropubic radical prostatectomy: a prospective study with Rand 36-Item Health Survey (Short Form-36). BJU Int. 2006; 97 (2): 274–8.
10. Fletcher GF, Ades PA, Kligfield P, et al. Exercise standards for testing and training: a scientific statement from the American Heart Association. Circulation. 2013; 128 (8): 873–934.
11. Hawkins MN, Raven PB, Snell PG, Stray-Gundersen J, Levine BD. Maximal oxygen uptake as a parametric measure of cardiorespiratory capacity. Med Sci Sports Exerc. 2007; 39 (1): 103–7.
12. Hunt SA, Abraham WT, Chin MH, et al. 2009 focused update incorporated into the ACC/AHA 2005 guidelines for the diagnosis and management of heart failure in adults a report of the American College of Cardiology Foundation/American Heart Association Task Force on practice guidelines developed in collaboration with the International Society for Heart and Lung Transplantation. J Am Coll Cardiol. 2009; 53 (15): e1–90.
13. Janicki JS, Gupta S, Ferris ST, McElroy PA. Long-term reproducibility of respiratory gas exchange measurements during exercise in patients with stable cardiac failure. Chest. 1990; 97 (1): 12–7.
14. Jones LW, Courneya KS, Mackey JR, et al. Cardiopulmonary function and age-related decline across the breast cancer survivorship continuum. J Clin Oncol. 2012; 30 (20): 2530–7.
15. Jones LW, Douglas PS, Eves ND, et al. Rationale and design of the Exercise Intensity Trial (EXCITE): a randomized trial comparing the effects of moderate versus moderate to high-intensity aerobic training in women with operable breast cancer. BMC Cancer. 2010; 10: 531.
16. Jones LW, Eves ND, Haykowsky M, Joy AA, Douglas PS. Cardiorespiratory exercise testing in clinical oncology research: systematic review and practice recommendations. Lancet Oncol. 2008; 9 (8): 757–65.
17. Jones LW, Eves ND, Mackey JR, et al. Safety and feasibility of cardiopulmonary exercise testing in patients with advanced cancer. Lung Cancer. 2007; 55 (2): 225–32.
18. Jones LW, Hornsby W, Freedland SJ, et al. Effects of nonlinear aerobic training on erectile dysfunction and cardiovascular function following radical prostatectomy for clinically localized prostate cancer. Eur Urol. 2014; 65 (5): 852–5.
19. Jones LW, Liang Y, Pituskin EN, et al. Effect of exercise training on peak oxygen consumption in patients with cancer: a meta-analysis. Oncologist. 2011; 16 (1): 112–20.
20. Jones LW, Watson D, Herndon JE 2nd, et al. Peak oxygen consumption and long-term all-cause mortality in nonsmall cell lung cancer. Cancer. 2010; 116 (20): 4825–32.
21. Katzel LI, Sorkin JD, Macko RF, Smith B, Ivey FM, Shulman LM. Repeatability of aerobic capacity measurements in Parkinson disease. Med Sci Sports Exerc. 2011; 43 (12): 2381–7.
22. Mina DS, Matthew AG, Trachtenberg J, et al. Physical activity and quality of life after radical prostatectomy. Can Urol Assoc J. 2010; 4 (3): 180–6.
23. Ottenbacher AJ, Day RS, Taylor WC, et al. Long-term physical activity outcomes of home-based lifestyle interventions among breast and prostate cancer survivors. Support Care Cancer. 2012; 20 (10): 2483–9.
24. Quinn M, Babb P. Patterns and trends in prostate cancer incidence, survival, prevalence and mortality. Part I: international comparisons. BJU Int. 2002; 90 (2): 162–73.
25. Scott JM, Haykowsky MJ, Eggebeen J, Morgan TM, Brubaker PH, Kitzman DW. Reliability of peak exercise testing in patients with heart failure with preserved ejection fraction. Am J Cardiol. 2012; 110 (12): 1809–13.
26. Wasserman K, McIlroy MB. Detecting the threshold of anaerobic metabolism in cardiac patients during exercise. Am J Cardiol. 1964; 14: 844–52.
27. Win T, Jackson A, Groves AM, Sharples LD, Charman SC, Laroche CM. Comparison of shuttle walk with measured peak oxygen consumption in patients with operable lung cancer. Thorax. 2006; 61 (1): 57–60.


© 2015 American College of Sports Medicine