Aerobic exercise training in patients with stable heart failure (HF) has been shown to be beneficial with respect to increasing functional capacity and potentially reducing clinical events (10,5). However, because of individual heterogeneity, it is possible that functional capacity, as measured by peak oxygen uptake (V˙O2) during cardiopulmonary exercise (CPX) testing, significantly decreases in some individuals in response to exercise training. Such negative responders to exercise training, if they exist, would be important to identify so that they would not be prescribed this intervention.
In non-HF, sedentary subjects undergoing 4–6 months of aerobic exercise training, Bouchard et al. (4) recently reported negative response rates of 8%–13% in some metabolic variables, including resting systolic blood pressure, fasting HDL-cholesterol, triglycerides, and insulin. Prior investigations of the heterogeneous responses to exercise training have used gene mapping and linkage analysis (11) as well as familial aggregation methods (12). More recently, a 21-wk combined endurance and strength training study in older adults resulted in individual peak V˙O2 responses ranging from an 8% decrease to a 42% increase (8).
The primary challenge in identifying a negative responder to exercise training is that a decrease in peak V˙O2 can occur for reasons unrelated to the training intervention. These reasons can include true physiological changes due to cardiac disease progression or other changes in health such as anemia, pulmonary dysfunction, or neuromuscular disorders. A decrease in peak V˙O2 can also occur due to random variability, which includes day-to-day biological variability and technical variability involved in CPX testing. Thus, it is very important to set a reasonable threshold for declaring an individual to be a negative responder to training. Setting such a threshold requires a control group of subjects who did not exercise train but who received the same baseline and follow-up peak V˙O2 measurements as the exercise training group.
The recent randomized clinical trial of 2331 HF subjects, Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training (HF-ACTION), provides a unique opportunity to estimate the proportion of negative responders to exercise training among patients with HF with a reduced left ventricular ejection fraction (LVEF) (10). HF-ACTION randomized 2331 subjects in a 1:1 fashion to exercise and control groups. The vast majority of subjects in both groups underwent protocol-specified baseline and 3-month follow-up peak V˙O2 tests. The main objective of this article was to search for evidence of a negative peak V˙O2 response among subjects assigned to the HF-ACTION exercise training group using the distribution of change in peak V˙O2 in the control subjects to set a reasonable threshold for a negative response. Our secondary objective was to estimate the random variability inherent in peak V˙O2 testing using a subset of 405 subjects who underwent two peak V˙O2 tests at baseline, within approximately 1 wk of each other (1). This allowed us to determine the within-subject random variability, which helped gauge the relative magnitude of random variability versus the true change in peak V˙O2. Since exercise intolerance is a key manifestation of HF, better understanding peak V˙O2 response heterogeneity to exercise training is relevant to both patient care and whether or not exercise therapy should remain within guideline-based recommendations.
METHODS
Subjects.
A complete description of the design and primary results of HF-ACTION has been previously reported (14,10). Briefly, HF-ACTION was a multicenter trial that enrolled 2331 subjects with LVEF ≤35% and New York Heart Association functional class II–class IV HF symptoms despite optimal medical and device therapy. Subjects were randomized in a 1:1 fashion to either the exercise training group or the control group. The protocol was approved by the institutional review board or ethics committee for each center, and subjects provided written informed consent. The present study included the 1870 subjects (972 in the exercise group and 898 in the control group) who had both baseline and 3-month CPX testing data.
Exercise training protocol.
Exercise group subjects participated in a supervised exercise program of walking or stationary cycling with a goal of three sessions per week for a total of 36 sessions in 3 months. Exercise was initiated at 15–30 min per session at a heart rate of 60% heart rate reserve (i.e., maximal heart rate on the baseline CPX test minus resting heart rate). After six sessions, the duration of the exercise was increased to 30–35 min, and intensity was increased to 70% of heart rate reserve. After completing 18 sessions, subjects were asked to add a 2-d·wk−1 home-based exercise program, which continued throughout a mean follow-up of 2.5 yr.
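As a minimal sketch of how a target heart rate could be derived from the prescription above, the following assumes the common Karvonen convention (resting heart rate plus a fraction of heart rate reserve). The function name and the example subject are hypothetical, and the article does not state that the trial computed targets exactly this way.

```python
def target_heart_rate(max_hr: float, resting_hr: float, fraction: float) -> float:
    """Target heart rate (beats/min) at a given fraction of heart rate reserve."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Heart rate reserve as defined in the text: maximal minus resting heart rate.
    reserve = max_hr - resting_hr
    # Assumption: Karvonen convention, target = resting + fraction * reserve.
    return resting_hr + fraction * reserve


# Hypothetical subject: maximal HR of 130 on the baseline CPX test, resting HR of 70.
initial_target = target_heart_rate(130, 70, 0.60)  # first six sessions
later_target = target_heart_rate(130, 70, 0.70)    # after six sessions
```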
Control group.
Subjects in the control group were not provided with a formal exercise prescription. Both exercise and control group subjects received HF-related educational materials at the time of enrollment, including information on medications, fluid management, symptom exacerbation, sodium intake, and general physical activity recommendations of 30 min (as tolerated) of moderate-intensity exercise on most days of the week (7).
Cardiopulmonary exercise testing.
CPX testing was performed at baseline and 3 months after randomization. Subjects were tested using either a modified Naughton treadmill protocol (n = 1706) or a ramp (10 W·min−1, n = 164) stationary cycle protocol; the same modality was used at both time points. During testing, subjects were encouraged to achieve a rating of perceived exertion >17 (very hard) on the Borg scale and respiratory exchange ratio >1.10. Peak V˙O2, as determined in the HF-ACTION CPX Core Laboratory, was defined as the highest V˙O2 per kilogram body mass for a given 15- or 20-s interval within the last 90 s of exercise or first 30 s of recovery (1).
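The peak V˙O2 definition above can be sketched as a simple window search. This is an illustration only: the record layout (interval start, interval end, mean V˙O2, with times in seconds relative to the end of exercise) and the handling of intervals that straddle a window boundary are assumptions, not the Core Laboratory's documented procedure.

```python
from typing import Iterable, Tuple

# Each record: (start_s, end_s, vo2), with times relative to the end of
# exercise (negative = during exercise, positive = recovery). Hypothetical layout.
Interval = Tuple[float, float, float]


def peak_vo2(intervals: Iterable[Interval],
             exercise_window_s: float = 90.0,
             recovery_window_s: float = 30.0) -> float:
    """Highest interval-averaged VO2 within the eligible window."""
    eligible = [
        vo2 for start_s, end_s, vo2 in intervals
        # Assumption: an interval counts only if it lies entirely in the window.
        if start_s >= -exercise_window_s and end_s <= recovery_window_s
    ]
    if not eligible:
        raise ValueError("no interval averages fall inside the eligible window")
    return max(eligible)


# Hypothetical 20-s interval averages (mL/kg/min) around the end of exercise.
records = [
    (-120, -100, 16.4), (-100, -80, 14.0), (-80, -60, 14.6), (-60, -40, 15.1),
    (-40, -20, 15.4), (-20, 0, 15.9), (0, 20, 15.2), (20, 40, 16.8),
]
result = peak_vo2(records)
```

Values outside the window, such as the early 16.4 and the late-recovery 16.8 in this made-up series, are ignored even though they are numerically larger.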
Negative training response threshold.
Because the goal of this study was to estimate the proportion of negative responders to exercise training, we needed to use data that were independent of the exercise group’s to set a negative peak V˙O2 response threshold. Because the control group did not undergo formal exercise training, we used its distribution of baseline-to-3-month change in peak V˙O2 to set that threshold. In particular, a subject was identified as a negative responder if the baseline-to-3-month peak V˙O2 decreased by at least 2 SDcontrol mL·kg−1·min−1, where SDcontrol is the SD of the baseline-to-3-month changes in the control group. Those control group changes comprised a combination of true physiologic changes as well as those due to random biological and technical variability. However, the control group’s changes did not include those due to a formal training program. Thus, if there were significantly more negative responders in the exercise group, that would suggest that exercise training could be a culprit. Assuming a zero mean normal distribution for the control group’s changes, we would expect approximately 95% of the control group’s changes to be within ±2 SDcontrol mL·kg−1·min−1 of 0. The normality of the control group’s changes was assessed by a histogram.
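The threshold rule described above can be sketched as follows. The data are made up, the function names are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, since the article does not specify the estimator.

```python
import statistics


def negative_response_threshold(control_changes, k=2.0):
    """Threshold (a negative change) equal to -k times the control group's SD."""
    # Assumption: sample SD with an n - 1 denominator.
    sd_control = statistics.stdev(control_changes)
    return -k * sd_control


def negative_responders(changes, threshold):
    """Changes that are a decrease of at least |threshold|."""
    return [c for c in changes if c <= threshold]


# Hypothetical baseline-to-3-month changes in peak VO2 (mL/kg/min).
control_changes = [-2.5, 0.0, 2.5]
exercise_changes = [-6.1, -5.0, -1.0, 0.8, 3.2]

threshold = negative_response_threshold(control_changes)
flagged = negative_responders(exercise_changes, threshold)
```

The key design point is that the threshold comes only from the control group's changes, so it is independent of the exercise group's data.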
Within-subject random variability.
For a single peak V˙O2 measurement, we may write the observed peak V˙O2 value (Y) as the sum of the true peak V˙O2 value (T) and the within-subject random variability (E).
$$Y = T + E$$
By “true” peak V˙O2, we mean the average peak V˙O2 measurement we would have obtained if the subject had undergone several peak V˙O2 tests within a short time period. If the magnitude of the variability does not depend on the true value—for example, the variability is not larger for higher V˙O2 values—then we may quantify the magnitude of within-subject random variability in a single measurement by the SD of E, which we denote by SDEsingle. A Bland–Altman plot (2) verified that the within-subject random variability was independent of the subject’s true peak V˙O2 value.
For a single peak V˙O2 measurement, we only observe Y, but not T and E, separately. The only way to estimate SDEsingle is by obtaining repeat baseline peak V˙O2 tests on some subset of subjects. To estimate SDEsingle, we used the 405 HF-ACTION subjects who had two repeat peak V˙O2 tests at baseline. For all 405 subjects, both tests were within approximately 1 wk of each other. Those 405 subjects were a nonrandom sample of the HF-ACTION cohort since the protocol specified that for quality control purposes, the first five subjects at each clinical site as well as the first 100 subjects overall would undergo repeat baseline CPX tests. SDEsingle was obtained by computing variances for each of the 405 subjects, then taking the average of those variances (one variance for each subject), and finally taking the square root of the average of those variances. We verified that the baseline demographic and clinical characteristics of the 405 subjects were similar to the 1870 subjects with peak V˙O2 measurements at baseline and 3 months, so that SDEsingle could be accepted as a reasonable estimate of within-subject random variability for the 1870 subjects.
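The pooled estimate described above (per-subject variance, averaged, then square-rooted) can be sketched as follows for subjects with exactly two tests. The pairs are hypothetical, and the per-subject variance is assumed to be the sample variance, which for two values reduces to half the squared difference.

```python
import math


def within_subject_sd(pairs):
    """Pooled within-subject SD from duplicate measurements (one pair per subject)."""
    # For two values y1 and y2, the sample variance (n - 1 = 1) is (y1 - y2)^2 / 2.
    variances = [((y1 - y2) ** 2) / 2.0 for y1, y2 in pairs]
    mean_variance = sum(variances) / len(variances)
    return math.sqrt(mean_variance)


# Hypothetical repeat baseline peak VO2 tests (mL/kg/min) for three subjects.
repeat_tests = [(14.0, 16.0), (15.0, 15.0), (12.0, 14.0)]
sd_e_single = within_subject_sd(repeat_tests)
```

Averaging the variances before taking the square root, rather than averaging per-subject SDs, matches the order of operations stated in the text.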
The within-subject SD (SDEdelta) inherent in the baseline-to-3-month random change in peak V˙O2 equals √2 SDEsingle (3). To see why this is so, we note there is random variability inherent in both the baseline and follow-up peak V˙O2 measurements:
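$$Y_{\mathrm{pre}} = T_{\mathrm{pre}} + E_{\mathrm{pre}}, \qquad Y_{\mathrm{post}} = T_{\mathrm{post}} + E_{\mathrm{post}}$$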
Therefore, the change in peak V˙O2 is
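$$Y_{\mathrm{post}} - Y_{\mathrm{pre}} = (T_{\mathrm{post}} - T_{\mathrm{pre}}) + (E_{\mathrm{post}} - E_{\mathrm{pre}})$$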
SDEdelta is the SD of Epost − Epre. We make the common assumption (6) that Epre and Epost are independent with the same SD (SDEsingle). Because the variances (i.e., SD squared) of independent variables sum to the variance of their difference, we have
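$$\mathrm{SD}_{E\mathrm{delta}}^{2} = \mathrm{SD}_{E\mathrm{single}}^{2} + \mathrm{SD}_{E\mathrm{single}}^{2} = 2\,\mathrm{SD}_{E\mathrm{single}}^{2}, \qquad \text{so} \qquad \mathrm{SD}_{E\mathrm{delta}} = \sqrt{2}\,\mathrm{SD}_{E\mathrm{single}}$$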
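The √2 relationship can be checked with a small simulation. This is only a sketch: it assumes independent, normally distributed errors with equal SD at both time points, uses the single-test SD of 1.33 mL·kg−1·min−1 reported later in RESULTS, and the number of simulated subjects is arbitrary.

```python
import math
import random
import statistics

SD_E_SINGLE = 1.33     # mL/kg/min, single-test SD reported in RESULTS
N_SUBJECTS = 100_000   # arbitrary number of simulated subjects

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumption: independent zero-mean normal errors at baseline and follow-up.
e_pre = [rng.gauss(0.0, SD_E_SINGLE) for _ in range(N_SUBJECTS)]
e_post = [rng.gauss(0.0, SD_E_SINGLE) for _ in range(N_SUBJECTS)]

# SD of the random part of the change, E_post - E_pre.
simulated_sd_delta = statistics.stdev([post - pre for pre, post in zip(e_pre, e_post)])
theoretical_sd_delta = math.sqrt(2) * SD_E_SINGLE
```

With this many simulated subjects, the simulated and theoretical values should agree closely, illustrating why a change score is noisier than a single measurement.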
Analyses were performed using R, version 2.15.1 (The R Foundation for Statistical Computing, Vienna, Austria). All P values are two sided.
RESULTS
Table 1 gives the baseline characteristics for the 1870 subjects who had peak V˙O2 data both at baseline and 3 months. Table 1 also gives the baseline characteristics for the 405 subjects with two peak V˙O2 tests at baseline that were within approximately 1 wk of each other. As can be seen in Table 1, there were no practically important baseline differences between the two cohorts.
TABLE 1: Baseline characteristics for the 405 HF-ACTION subjects with repeat baseline tests and 1870 subjects with baseline and 3-month peak V˙O2 tests.
Figure 1 is a Bland–Altman plot of the difference in repeat baseline peak V˙O2 tests for the 405 subjects versus the average of those two tests. This plot shows that the mean difference in baseline tests was 0.05 mL·kg−1·min−1, which was not significantly different from 0 (t-test P value = 0.57). Moreover, variability was independent of the true peak V˙O2, for example, the variability was not larger for higher V˙O2 values. Also, there was no significant difference between the treadmill and the cycle subjects with respect to within-subject variability (Levene test P value = 0.32). Thus, SDEsingle is a reasonable estimate of the within-subject random variability in a single peak V˙O2 test. SDEsingle was calculated to be 1.33 mL·kg−1·min−1. Thus, the random variability SDEdelta inherent in the baseline-to-3-month change in peak V˙O2 was √2 SDEsingle = 1.9 mL·kg−1·min−1.
FIGURE 1: Bland–Altman plot of difference between the repeat baseline peak V˙O2 tests versus the average of the repeat baseline peak V˙O2 tests for the 405 HF-ACTION subjects with repeat baseline tests.
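The quantities behind a Bland–Altman plot of repeat tests can be sketched as below. The pairs are hypothetical, and the one-sample t statistic on the paired differences is an assumption about how the mean difference was tested; the article only reports a t-test P value.

```python
import math
import statistics


def bland_altman_summary(pairs):
    """Per-subject differences and means, plus the mean difference and its t statistic."""
    diffs = [second - first for first, second in pairs]        # y-axis of the plot
    means = [(first + second) / 2.0 for first, second in pairs]  # x-axis of the plot
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    # Assumption: one-sample t statistic testing whether the mean difference is 0.
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return {"diffs": diffs, "means": means, "mean_diff": mean_diff, "t_stat": t_stat}


# Hypothetical repeat baseline peak VO2 tests (mL/kg/min).
summary = bland_altman_summary([(14.0, 15.0), (16.0, 15.0), (12.0, 12.5), (18.0, 17.5)])
```

Plotting `diffs` against `means` and looking for a fan shape is the visual check that variability does not grow with the underlying peak V˙O2.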
Figure 2a is a histogram of the baseline-to-3-month change in peak V˙O2 for the 898 control group subjects. Their mean change in peak V˙O2 was 0.2 mL·kg−1·min−1 with SDcontrol = 2.5 mL·kg−1·min−1. Thus, our negative response threshold was a decrease of 2 SDcontrol = 5 mL·kg−1·min−1. We deemed this threshold reasonable because the histogram was approximately normal and because 21 (2.3%) of the control group subjects met this threshold, which is nearly equal to the 2.5% of control group subjects we would have expected if the change in peak V˙O2 were perfectly normally distributed.
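The comparison between the observed and expected lower-tail proportions can be reproduced with the standard normal distribution. This is a sketch using only counts given in the text; it assumes a zero-mean normal distribution for the changes, as the Methods do.

```python
from statistics import NormalDist

# Expected share of changes at or below -2 SD under a zero-mean normal distribution.
expected_share = NormalDist().cdf(-2.0)

# Observed share in the control group, from the counts reported in the text.
observed_share = 21 / 898
```

Note that the exact lower-tail probability at 2 SD is slightly below the rounded 2.5% quoted in the text (which corresponds to roughly 1.96 SD), so the observed 2.3% sits very close to the normal expectation either way.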
FIGURE 2: A. Histogram of the 0-to-3-month change in peak V˙O2 for the 898 control group subjects. The mean change in peak V˙O2 was 0.2 mL·kg−1·min−1 (SD = 2.5 mL·kg−1·min−1) and 21 subjects (2.3%) were negative responders with a decrease in peak V˙O2 of at least 5 mL·kg−1·min−1. The negative response cut point is denoted by the dashed line. B. Histogram of the 0-to-3-month change in peak V˙O2 for the 972 exercise group subjects. The mean change in peak V˙O2 was 0.8 mL·kg−1·min−1 and nine subjects (0.9%) were negative responders with a decrease in peak V˙O2 of at least 5 mL·kg−1·min−1. The negative response cut point is denoted by the dashed line.
Figure 2b is a histogram of the baseline-to-3-month change in peak V˙O2 for the 972 subjects in the exercise group. Their mean change in peak V˙O2 was 0.8 mL·kg−1·min−1, and nine subjects (0.9%) met the negative response threshold of a decrease in peak V˙O2 ≥ 5 mL·kg−1·min−1. Importantly, the 2.5-mL·kg−1·min−1 SD of the change in peak V˙O2 in the exercise group (SDexercise) was the same as that in the control group. Indeed, the histograms in Figures 2a and 2b were nearly identical. This suggests that negative responses in the exercise group were virtually all unrelated to training. Otherwise, we would have expected to see a larger SD in the exercise group and a longer negative tail in Figure 2b, neither of which occurred. Of note, SDEdelta / SDexercise = SDEdelta / SDcontrol, that is, 1.9 mL·kg−1·min−1 / 2.5 mL·kg−1·min−1 = 0.76, so 76% of the variation in the change in peak V˙O2 among subjects in both exercise and control groups was due to within-subject random variability.
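The arithmetic behind the 0.76 figure can be laid out explicitly. The inputs are the values reported in this article; the variable names are illustrative, and the variance-scale ratio is included only for comparison and is not a quantity the article reports.

```python
import math

sd_e_single = 1.33  # mL/kg/min, within-subject SD of a single test (reported)
sd_control = 2.5    # mL/kg/min, SD of the 3-month change in both groups (reported)

# Random variability in a change score: sqrt(2) times the single-test SD.
sd_e_delta = math.sqrt(2) * sd_e_single

# Ratio on the SD scale, using the rounded 1.9 as in the text.
sd_ratio_reported = round(sd_e_delta, 1) / sd_control

# For comparison only: the same ratio expressed on the variance scale.
variance_ratio = (sd_e_delta / sd_control) ** 2
```

The text's 76% is a ratio of SDs; the share expressed as a ratio of variances is smaller, so the scale should be kept in mind when interpreting the percentage.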
Finally, there was variable adherence to the training program in the exercise group, with subjects averaging 95 min·wk−1 of exercise (5th percentile = 0 min·wk−1, 1st quartile = 58 min·wk−1, 3rd quartile = 128 min·wk−1, 95th percentile = 185 min·wk−1). For the 438 exercise group subjects who exercised at least 90 min·wk−1, the mean change in peak V˙O2 was 1.1 mL·kg−1·min−1 (SD = 2.5 mL·kg−1·min−1). As seen in Figure 3, their histogram of the 0-to-3-month change in peak V˙O2 is slightly skewed to the right, corresponding to peak V˙O2 increases. Indeed, only 2 of those 438 subjects (0.5%) met the negative response threshold.
FIGURE 3: Histogram of the 0-to-3-month change in peak V˙O2 for the 438 exercise group subjects who exercised ≥90 min·wk−1 during the first 3 months of training. The mean change in peak V˙O2 was 1.1 mL·kg−1·min−1 (SD = 2.5 mL·kg−1·min−1) and two subjects (0.5%) were negative responders with a decrease in peak V˙O2 of at least 5 mL·kg−1·min−1. The negative response cut point is denoted by the dashed line.
DISCUSSION
The current analyses suggest that any apparent negative effect of exercise training on peak V˙O2 or exercise tolerance occurs in a very small percentage of patients with HF and reduced LVEF. Indeed, using a negative response threshold that corresponded to a two SD decline in the control group, slightly more control subjects (2.3%) met that threshold than did exercise subjects (0.9%) (Fisher’s P value = 0.02). Moreover, the two groups had the same variability in observed changes in peak V˙O2 (SD = 2.5 mL·kg−1·min−1 in both groups) and very similar histograms. This suggests that negative responses were likely due to health changes unrelated to exercise training, random day-to-day biological variability, and the technical variability involved in the measurement of peak V˙O2. This is an important finding, given that just two decades ago exercise training was not routinely embraced as a safe and effective therapy for patients with reduced LVEF, and it supports current evidence-based guidelines that recommend exercise therapy to improve exercise tolerance in these patients.
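The between-group comparison of negative response rates can be sketched with a hand-rolled two-sided Fisher exact test on the counts reported above (21 of 898 control subjects, 9 of 972 exercise subjects). The two-sided rule used here, summing all tables no more probable than the observed one, is an assumption about the convention behind the reported P value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x "events" in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Rows: control, exercise. Columns: negative responder, not a negative responder.
p_value = fisher_exact_two_sided(21, 898 - 21, 9, 972 - 9)
```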
A recent study by Bouchard et al. (4) of 1687 non-HF sedentary subjects enrolled in six different exercise studies raised some potential concerns when they cited the relatively common occurrence of negative metabolic response to exercise training. They identified 12.2% of their subjects with a negative response for resting systolic blood pressure, 8.4% for insulin, 10.4% for triglycerides, and 13.3% for HDL-cholesterol. Unfortunately, our HF-ACTION database only includes follow-up exercise tolerance data and does not include follow-up data for those metabolic risk factors studied by Bouchard et al. Thus, we could not perform similar analyses to confirm their results in HF subjects. It is possible that exercise training negatively impacts metabolic risk factors in some individuals without negatively impacting exercise tolerance. However, it is important to note two differences in the methodology of Bouchard et al. from our own. First, Bouchard et al. defined a negative response as a 2 SDEsingle decrease between baseline and follow-up at approximately 6 months. As shown previously, SDEsingle underestimates by a factor of √2 the within-subject random variability SDEdelta inherent in the difference between the baseline and the follow-up measurements. Thus, even if all changes were due to random variability with no true changes unrelated to training, we would have expected approximately 8% of subjects to meet the 2 SDEsingle negative response threshold for normal distribution reasons. Second, and more importantly, Bouchard et al. did not compare their negative response rates and the SD of the risk factor changes to those of a control group who did not receive the exercise training. Without such a comparison, it is difficult to know the extent to which negative response was associated with training. 
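The "approximately 8%" figure follows from the normal distribution once the threshold is expressed in units of the change-score SD. The sketch below assumes, as the argument above does, that all changes are zero-mean normal random variability with SD equal to √2 SDEsingle.

```python
import math
from statistics import NormalDist

# A threshold of 2 * SD_E_single, expressed in units of SD_E_delta = sqrt(2) * SD_E_single.
threshold_in_delta_sds = 2.0 / math.sqrt(2)  # about 1.41 SDs of the change score

# Share of subjects expected to fall below that threshold by chance alone.
expected_false_negative_share = NormalDist().cdf(-threshold_in_delta_sds)
```

Under these assumptions, roughly 1 in 13 subjects would be labeled a negative responder by chance alone, which is of the same order as the 8%–13% rates discussed above.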
Indeed, in our data, we estimated the random variability SDEdelta for the change in peak V˙O2 to be 1.9 mL·kg−1·min−1, while the overall variability SDcontrol in the control group was 2.5 mL·kg−1·min−1. Thus, approximately one-fourth of the change in peak V˙O2 in the control group was true health-related change. Accounting for these true changes led us to use 2 SDcontrol = 5 mL·kg−1·min−1 as the negative response threshold. Clearly, it will be very important to conduct further research on metabolic response to exercise in comparison with a control group of nonexercisers.
Our study had the following limitations. First, approximately 9% of the subjects (164 of 1870) were tested using a bike protocol while the remaining subjects were tested on a treadmill. Because bike values are likely to be approximately 10% lower than treadmill values, it is possible that the mean and SD values of the peak V˙O2 change were different than if all subjects were treadmill tested. Second, adherence to the training exercise program was quite variable in the exercise group with 5th and 95th percentiles of 0 and 185 min·wk−1 of exercise, respectively. Third, it is possible that some of the control subjects engaged in exercise training. More generally, as has been noted in the literature (9,13), to reliably identify a particular subject as a negative responder to an intervention (e.g., exercise training) would require a different study design than the parallel arm design that was used for our study. Indeed, a subject would need to undergo at least two distinct periods of exercise training as well as two distinct periods not training to estimate that subject’s individual variability associated with training and not training, respectively. Clearly such a “repeated crossovers” design would be more costly and difficult to execute than a parallel arm design. It is possible that in the future, patient characteristics will be discovered to be associated with negative response to training. However, until such time and given that the HF-ACTION study showed exercise to be safe while providing a modest reduction in risk for both all-cause mortality or hospitalization and cardiovascular mortality or HF hospitalization, recommendations to exercise may be made for the general HF patient rather than targeted towards specific HF patients.
The HF-ACTION trial was supported by grants from the National Heart, Lung, and Blood Institute. The authors thank the HF-ACTION participants and investigators who gave their time and effort to completion of the study. The authors also thank Drs. Nancy Geller, Michael Lauer, and James Troendle for their critical review of the manuscript. The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.
The results of the present study do not constitute endorsement by the American College of Sports Medicine.
The data used for this analysis were collected through a study that was supported by the National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland. There were no conflicts of interest to be declared with companies or manufacturers who will benefit from the results of the present study.
REFERENCES
1. Bensimhon DR, Leifer ES, Ellis SJ, et al. Reproducibility of peak oxygen uptake and other cardiopulmonary exercise testing parameters in patients with heart failure (from the heart failure and a controlled trial investigating outcomes of exercise training).
Am J Cardiol. 2008; 102: 712–7.
2. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement.
Lancet. 1986; 1 (8476): 307–10.
3. Bland JM, Altman DG. Statistics notes: measurement error.
BMJ. 1996; 313 (7059): 744.
4. Bouchard C, Blair SN, Church TS, et al. Adverse response to regular exercise: is it a rare or common occurrence?
PLoS One. 2012; 7 (5): e37887.
5. Davies EJ, Moxham T, Rees K, et al. Exercise based rehabilitation for heart failure.
Cochrane Database Syst Rev. 2010; (4): CD003331.
6. Daw EW, Province MA, Gagnon J, et al. Reproducibility of the HERITAGE family study intervention protocol: drift over time.
Ann Epidemiol. 1997; 7: 452–62.
7. Hunt SA, Abraham WT, Chin MH, et al. American College of Cardiology; American Heart Association Task Force on Practice Guidelines; American College of Chest Physicians; International Society for Heart and Lung Transplantation; Heart Rhythm Society. ACC/AHA 2005 guideline update for the diagnosis and management of chronic heart failure in the adult: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Writing Committee to Update the 2001 Guidelines for the Evaluation and Management of Heart Failure): developed in collaboration with the American College of Chest Physicians and the International Society for Heart and Lung Transplantation: endorsed by the Heart Rhythm Society.
Circulation. 2005; 112 (12): e154–e235.
8. Karavirta L, Häkkinen K, Kauhanen A, et al. Individual responses to combined endurance and strength training in older adults.
Med Sci Sports Exerc. 2011; 43 (3): 484–90.
9. Obarzanek E, Proschan MA, Vollmer WM, et al. Individual blood pressure responses to changes in salt intake: results from the DASH-Sodium trial.
Hypertension. 2003; 42: 459–67.
10. O’Connor CM, Whellan DJ, Lee KL, et al. for the HF-ACTION Trial Investigators. Efficacy and safety of exercise training in patients with chronic heart failure: HF-ACTION randomized controlled trial.
JAMA. 2009; 301 (14): 1439–50.
11. Rankinen T, Pérusse L, Rauramaa R, et al. The human gene map for performance and health-related fitness phenotypes: the 2001 update.
Med Sci Sports Exerc. 2002; 34 (8): 1219–33.
12. Rice T, Després J-P, Pérusse L, et al. Familial aggregation of blood lipid response to exercise training in the Health, Risk Factors, Exercise Training, and Genetics (HERITAGE) family study.
Circulation. 2002; 105: 1904–8.
13. Senn S. Individual therapy: new dawn or false dawn.
Drug Inform J. 2001; 35: 1479–94.
14. Whellan DJ, O’Connor CM, Lee KL, et al. for the HF-ACTION Trial Investigators. Heart failure and a controlled trial investigating outcomes of exercise training (HF-ACTION): design and rationale.
Am Heart J. 2007; 153: 201–11.