
Basic Sciences: Epidemiology

Reproducibility of cardiovascular, respiratory, and metabolic responses to submaximal exercise: The HERITAGE Family Study


Medicine & Science in Sports & Exercise: February 1998 - Volume 30 - Issue 2 - p 259-265


The ability to track accurately change in physiological variables consequent to an exercise training intervention is in large part dependent on the reproducibility of the measurements, which reflects both trial-to-trial reliability and day-to-day biological variability. When measurements are less reliable (i.e., with large measurement error) or not highly reproducible, it is difficult to demonstrate change accurately. Unfortunately, the literature is replete with exercise training studies that have conducted only a single assessment pre-intervention and post-intervention without establishing reproducibility of their measurements.

The HERITAGE Family Study is a large multicenter clinical trial investigating the possible genetic basis for the variability in the response of physiological measures and risk factors for cardiovascular disease and non-insulin-dependent diabetes mellitus to endurance exercise training. With a multicenter trial, it is critical that all clinical centers use exactly the same protocols for measurements. Further, to detect relatively small changes in critical variables and to establish the possible genetic basis for these changes, it is imperative to have extremely high reproducibility. Thus, the purpose of this study was to establish the reproducibility of important cardiovascular and metabolic variables obtained during submaximal exercise at two different rates of work, 50 W and 60% of maximal oxygen uptake (˙VO2max), before and after a 20-wk endurance training program in HERITAGE Family Study subjects, and in a separate group of quality control subjects, across four participating clinical centers. Details of the HERITAGE Family Study have been presented previously (2).


Subjects. The HERITAGE subject population is composed of families, including the natural father and mother and at least two (African-American families) or three (Caucasian families) offspring 17 yr of age or older. Subjects were recruited by each of the four clinical centers, located at Indiana University, Laval University, the University of Minnesota, and The University of Texas at Austin. Subjects for this study represent the first 390 subjects to participate in the HERITAGE Family Study protocol. Of these subjects, 192 were women and 198 were men, and their ages ranged from 17 to 65 yr. Subject characteristics are presented in Table 1. In addition to the above criteria, all subjects had to meet a set of inclusion criteria (2) and to pass a physician-administered physical examination. This included a resting and exercise 12-lead electrocardiogram, the latter obtained during a maximal exercise test (2). The study protocol had been previously approved by each clinical center's institutional review board, and informed consent was obtained from each subject.

Each clinical center recruited additional subjects every 6 months over three consecutive 6-month periods to participate in an intracenter quality control substudy (ICQC substudy) during the second and third year of data collection. Subjects in the ICQC substudy were required to meet all criteria for admission to the HERITAGE Family Study with the exception of family membership. Data were available for 55 subjects across all four centers. Their characteristics are presented in Table 1.

Experimental design. Each subject in the HERITAGE Family Study completed an extensive battery of tests prior to starting the 20-wk endurance training program, including three cycle ergometer exercise tests conducted on separate days: a maximal exercise test (Max), a submaximal exercise test (Submax), and a submaximal/maximal exercise test (Submax/Max) which combined the previous Submax and Max tests (2). The training program was conducted on cycle ergometers (Universal Aerobicycle, Cedar Rapids, IA) interfaced with a Mednet computer system (Universal Gym Mednet, Cedar Rapids, IA) to control the power output of the ergometers to maintain constant training heart rates. Subjects started training at the heart rate associated with 55% of their initial maximal oxygen uptake (˙VO2max) for 30 min·d-1 and gradually progressed to the heart rate associated with 75% of their initial ˙VO2max for 50 min·d-1 at the end of 14 wk. They maintained this intensity and duration throughout the remaining 6 wk. Frequency was maintained at three sessions per week throughout the 20-wk training program. See Bouchard et al. (2) for further details concerning the training program. At the end of the 20-wk training program, the subjects completed an identical post-training test battery.

Subjects enrolled in the ICQC substudy completed the initial maximal exercise test to establish the power output necessary to elicit 60% of their ˙VO2max. They then performed three Submax/Max tests within a 3-wk period to establish reproducibility of the test results at each clinical center. These tests were separated by at least 4 d, but not by more than 10 d.

Exercise test methodology. Maximal and submaximal exercise tests were conducted on a cycle ergometer (SensorMedics Ergo-Metrics 800S, Yorba Linda, CA). Subjects completed an initial maximal exercise test using a graded exercise test protocol starting at 50 W for 3 min. The rate of work was then increased by 25 W every 2 min thereafter to the point of exhaustion. For older, smaller, or less fit subjects, the test was started at 40 W and increased by 10 to 20 W increments. Using the results of this initial maximal test, subjects then performed the Submax exercise test at 50 W and at 60% of their initial ˙VO2max. The Submax/Max exercise test was then performed, starting with the Submax protocol, i.e., 50 W and 60% of the initial ˙VO2max, and progressing to a maximal level of exertion. For this last test, a venous catheter was inserted into the left arm and blood samples were obtained at rest, during exercise at 50 W, 60% of ˙VO2max, and 80% of ˙VO2max, and immediately upon completion of the maximal test. The results of these tests were used to establish the endurance training program work rates and to quantify the magnitude of the training response (2).
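The standard graded protocol described above (50 W for 3 min, then 25 W added every 2 min) can be sketched as a simple step function. This is only an illustration: the function name `work_rate_at` and its parameters are hypothetical, and the stage durations for the modified 40 W protocol are not stated in the text, so the defaults cover the standard protocol only.

```python
import math


def work_rate_at(minutes_elapsed, start_w=50.0, increment_w=25.0,
                 first_stage_min=3.0, stage_min=2.0):
    """Return the prescribed work rate (W) at a given time into the graded test.

    Defaults follow the standard protocol in the text: 50 W for the first
    3 min, then +25 W at the start of each subsequent 2-min stage.
    """
    if minutes_elapsed < 0:
        raise ValueError("minutes_elapsed must be non-negative")
    if minutes_elapsed < first_stage_min:
        return start_w
    # Number of 2-min stages fully completed after the initial 3-min stage.
    completed = math.floor((minutes_elapsed - first_stage_min) / stage_min)
    return start_w + increment_w * (completed + 1)
```

Under these defaults, a subject 6.5 min into the test would be cycling at 100 W. The test itself ends at exhaustion, which this sketch does not model.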

During the Submax and Submax/Max tests, two cardiac output, heart rate, and blood pressure measurements were obtained and averaged both at 50 W and at 60% of the initial ˙VO2max. Subjects exercised for approximately 12-15 min at each work rate, with a 4-min period of seated rest between work rates. Cardiac output was determined using the Collier CO2 rebreathing technique (3), as described by Wilmore et al. (18). Each clinical center used the same electronic mixing system to assure the proper volume and concentration of CO2 for rebreathing dependent on the subjects' steady state ˙VO2 and end tidal PCO2. For all tests, ˙VO2, ˙VCO2, expiratory minute ventilation (˙VE), and the respiratory exchange ratio (RER) were determined every 20 s and reported as a rolling average of the three most recent 20-s values using a SensorMedics 2900 metabolic measurement cart (MMC). ˙VO2max was defined as the peak ˙VO2 obtained during the test. Heart rate was determined by electrocardiography, and values were recorded during the last 15 s of each stage of the maximal test and once steady state had been achieved at each of the submaximal work rates during the Submax and Submax/Max tests. Blood pressures were obtained during the last minute of each stage of the maximal test protocol, and once when steady state was achieved at each of the submaximal work rates using a Colin STBP-780 (San Antonio, TX) automated blood pressure unit. Two electrocondenser microphones are embedded in the cuff, and the sound signal is synchronized to the ECG R-wave. Earphones allow the technician to confirm the blood pressure values selected by the instrument's detection algorithm.
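The rolling average of the three most recent 20-s values can be illustrated with a short sketch. The function name is hypothetical, and the choice to report nothing until three samples are available is an assumption; the text does not say how the metabolic cart handled the first two samples.

```python
from collections import deque


def rolling_mean_of_recent(values, window=3):
    """Average each value with the (window - 1) values that preceded it.

    Assumption: no average is reported until `window` samples exist.
    """
    recent = deque(maxlen=window)  # keeps only the most recent samples
    averaged = []
    for value in values:
        recent.append(value)
        if len(recent) == window:
            averaged.append(sum(recent) / window)
    return averaged
```

With one sample every 20 s, each reported value therefore summarizes the preceding 60 s of gas exchange, which smooths breath-to-breath noise at the cost of a short lag.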

Quality assurance, quality control, and statistical methodology. Several important quality assurance and quality control procedures were instituted across all four clinical centers, as detailed by Gagnon et al. (4). Staff from all clinical centers were trained centrally on several occasions, and all staff from each clinical center had to attain certification on each technique for which they were responsible. A detailed Manual of Procedures (MOP) was developed, and staff were required to review those sections of the MOP for which they were responsible every 6 months. Once each year for the first 2 yr of the study, a traveling crew of four subjects, two men and two women, went to each of the clinical centers over a 3- to 4-wk period, and were tested following the HERITAGE Family Study protocol at each clinical center, allowing comparisons to be made across the four clinical centers on these same four subjects (4).

All data were analyzed using the SAS statistical package. Except where noted, data are expressed as mean ± SD. Technical errors, coefficients of variation, and intraclass correlations were computed to evaluate the reproducibility in both the HERITAGE sample and the ICQC sample using the model of Shrout and Fleiss (13). With this model, the ith measurement on the jth subject, $x_{ij}$, is given by:

$$x_{ij} = \mu + b_j + w_{ij}$$

where $\mu$ is the population mean, $b_j$ is the difference from $\mu$ of the mean of the measurements on the jth subject, and $w_{ij}$ is the difference from $\mu + b_j$ of the ith measurement on the jth subject. Both $b_j$ and $w_{ij}$ are assumed to be normally distributed and independent with SDs of $\sigma_\tau$ and $\sigma_\omega$, respectively. $\sigma_\omega$ is the within-subjects SD, also called technical error. The coefficient of variation within subjects was computed as:

$$\mathrm{CV} = 100 \times \frac{\sigma_\omega}{\mu}$$

To compute the intraclass correlation coefficient, PROC GLM in SAS was used to run an ANOVA, providing a between-subjects mean square (BMS) and a within-subjects mean square (WMS). These were used to estimate the intraclass correlation (ICC) according to Shrout and Fleiss (13):

$$\mathrm{ICC} = \frac{\mathrm{BMS} - \mathrm{WMS}}{\mathrm{BMS} + (k - 1)\,\mathrm{WMS}}$$

where k is the number of replicate measurements on a subject.
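As a rough illustration of these three reproducibility statistics, the sketch below computes them from a balanced set of replicate measurements using a one-way ANOVA decomposition. It is not the authors' SAS code. The function name and the example data are hypothetical, the technical error is estimated as the square root of WMS, and the coefficient of variation is assumed to be 100 times the technical error divided by the grand mean.

```python
def reliability_stats(data):
    """Compute technical error, CV (%), and ICC from replicate measurements.

    `data` is a list with one inner list per subject, each holding the same
    number k (at least 2) of replicate measurements (balanced design assumed).
    """
    n = len(data)
    k = len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 replicates")

    grand_mean = sum(x for row in data for x in row) / (n * k)
    subject_means = [sum(row) / k for row in data]

    # Between-subjects mean square (BMS), n - 1 degrees of freedom.
    bms = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subjects mean square (WMS), n * (k - 1) degrees of freedom.
    wms = sum((x - m) ** 2
              for row, m in zip(data, subject_means)
              for x in row) / (n * (k - 1))

    technical_error = wms ** 0.5                  # within-subjects SD estimate
    cv_percent = 100.0 * technical_error / grand_mean
    icc = (bms - wms) / (bms + (k - 1) * wms)     # one-way ICC of Shrout and Fleiss
    return {"bms": bms, "wms": wms, "technical_error": technical_error,
            "cv_percent": cv_percent, "icc": icc}


# Hypothetical heart rates (beats per min) for 4 subjects tested on 2 days.
example = [[100.0, 102.0], [110.0, 108.0], [120.0, 124.0], [130.0, 126.0]]
stats = reliability_stats(example)
```

In this made-up example the subjects differ widely from one another relative to their day-to-day differences, so the ICC is high while the CV is small.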

A multiple testing ANOVA was implemented using the General Linear Models Procedure to assess whether there were differences across the two pre-training and two post-training tests for the total sample and across the three Submax/Max tests for the ICQC sample. Tukey's Studentized Range (HSD) Test was used to determine between which trials there were significant differences. The multiple testing ANOVA used controls for all potential sources of variation. Statistical significance was set at the 0.05 level.


The pre-training cardiovascular, respiratory, and metabolic responses for the total sample at 50 W and 60% of ˙VO2max during the Submax and Submax/Max tests are presented in Tables 2 and 3. The data in these tables represent the means of the two trials at each work rate for each test. There was excellent agreement between the Submax and Submax/Max test for all variables. While there were significant mean differences for pre-training heart rate at 50 W and pre-training ˙VCO2 at 50 W, these differences were small and physiologically insignificant. The technical errors, coefficients of variation, and intraclass correlations were well within the expected range at both power outputs, but the data suggest a better reproducibility of test results at 60% of ˙VO2max than at 50 W.

Table 4 presents the intraclass correlations for each of the four clinical centers for the cardiovascular, respiratory, and metabolic variables, both pre- and post-training. With the exception of the RER, the intraclass correlations were reasonably high and consistent across the four centers.

Table 5 presents the results for the ICQC substudy. There was excellent agreement across the three trials. The only significant differences were observed for ˙VO2 and RER at 60% of ˙VO2max. Again, although these differences were statistically significant, they were small and of no physiological significance. The technical errors, coefficients of variation, and intraclass correlations were similar to those for the HERITAGE Family Study sample presented in Tables 2 and 3.


At its conclusion, approximately 750 subjects will have completed the entire HERITAGE Family Study protocol, including the assessment of cardiovascular, respiratory, and metabolic responses to three different submaximal and maximal exercise test protocols administered on separate days, both pre- and post-training. To determine pre- to post-training changes in these responses to both submaximal and maximal exercise, it is important to have accurate and highly reproducible measurements. Whereas the reproducibility of maximal exercise test data, such as maximal heart rate and ˙VO2max, has been well established (7,10,15,17), there is little information on the reproducibility of submaximal exercise responses. Since a primary purpose of the HERITAGE Family Study is to determine the genetic basis of change in submaximal as well as maximal physiological responses to exercise, it is imperative to have highly reproducible measures.

The reproducibility estimates presented in this paper reflect intra-individual variability across days and the variability associated with measurement error. Reproducibility estimates were also calculated across trials on the same day for the Submax/Max test only. Since we were interested in the day-to-day variability, these data were not reported. However, the intraclass correlations across trials on the same day, when compared with the mean of the two trials across days, were generally higher by 0.01 to 0.10, with the exception of diastolic blood pressure and RER at 60% of ˙VO2max, which were higher by 0.11 and 0.17, respectively.

The reproducibility of the cardiovascular response data for the HERITAGE subjects (Table 2) and for the ICQC subjects (Table 5) compares favorably with the limited data available from other studies using cycle ergometry and the Collier technique (3) for determining cardiac output. Zeidifard et al. (20) determined the reproducibility of cardiac output, heart rate, and stroke volume in seven adults and three children during exercise at a ˙VO2 of 1,200 mL·min-1, a ˙VO2 approximately midway between our mean values for 50 W and 60% of ˙VO2max work rates. They reported coefficients of variation, across at least 4 d, of 5.7% for cardiac output, 6.8% for heart rate, and 5.6% for stroke volume. Wolfe et al. (19) determined reproducibility of several cardiovascular variables over two separate days at three power outputs in 20 men. Using Pearson product-moment correlations, they reported correlations ranging from 0.83 to 0.94 for cardiac output, 0.90 to 0.92 for heart rate, 0.81 to 0.94 for stroke volume, and 0.60 to 0.86 for systolic blood pressure.

Van Herwaarden et al. (16) reported a coefficient of variation of 4.1% for cardiac output and 2.6% for heart rate at a mean submaximal ˙VO2 of 1.45 L·min-1. Their values appear to be substantially lower than those reported in the present study, but the details of their study have not been provided and it is unclear whether their coefficients of variation represent comparisons across trials or across days. Paterson et al. (12) conducted five repeat tests within 2 to 3 wk in 12 boys to determine the reproducibility of cardiac output, stroke volume, and heart rate at three submaximal power outputs. The coefficients of variation for cardiac output varied from 6.6% to 8.5%, for stroke volume from 7.2% to 10.8%, and for heart rate from 4.3% to 6.0%. Kirby and Shea (8) tested 15 subjects across three separate days and reported intraclass correlation coefficients for cardiac output of 0.69 at rest and at a low workload, and 0.87 at higher workloads. Our results compare favorably with these.

Becque et al. (1) investigated the reproducibility of heart rate, and systolic and diastolic blood pressures during submaximal cycle ergometry at power outputs of 50 W, 125 W, and 55% of each subject's maximum work rate. Four subjects exercised for 10 min at each power output and repeated each power output at least ten times on separate days. The coefficient of variation across the three power outputs varied from 2.9% to 3.4% for heart rate, 5.5% to 8.7% for systolic blood pressure, and from 7.6% to 13.1% for diastolic blood pressure, while the reliability coefficients ranged from 0.86 to 0.89 for heart rate, from 0.27 to 0.77 for systolic blood pressure, and from 0.64 to 0.84 for diastolic blood pressure. Finally, Hartung et al. (5) conducted repeat tests within approximately 1 wk of each other on 38 women who cycled at a submaximal work rate to attain steady-state heart rates. The correlation coefficient for heart rate between the two trials was 0.89, with a mean heart rate of 136.9 and 137.9 beats·min-1 for the first and second trials, respectively.

The reproducibility of the respiratory and metabolic data for the HERITAGE subjects (Table 3) and for the ICQC subjects (Table 5) compares favorably with data from other studies. Taylor (14) was one of the first to look at reproducibility of submaximal exercise responses. He compared the responses of 31 subjects to treadmill walking for 4 min on two separate days at a rate that resulted in a mean ˙VO2 of 1.65 L·min-1. Correlation coefficients for ˙VO2, ˙VE, and heart rate were 0.59, 0.49, and 0.78, respectively, with coefficients of variation of 6.6%, 8.0%, and 4.1%, respectively. Henry (6) reported the variability of ˙VO2 at 16 W (N = 35) and 100 W (N = 25) on a cycle ergometer, with reliability coefficients of 0.69 and 0.55, respectively. Both Taylor and Henry used closed circuit spirometry to measure ˙VO2, a method known to have low reliability.

Wolfe et al. (19), as described previously, reported the reliability coefficients for ˙VO2 at three power outputs on a cycle ergometer to range from 0.88 for light work up to 0.96 for heavy work. Becque et al. (1), as detailed earlier, found that the coefficient of variation across the three power outputs varied from 5.3% to 7.7% for ˙VE, and from 4.0% to 4.6% for ˙VO2, while the reliability coefficients ranged from 0.69 to 0.97 for ˙VE and from 0.79 to 0.95 for ˙VO2. Melanson et al. (9) evaluated the reliability of ˙VE and ˙VO2 in 13 men at three rates of work (i.e., slow walk, slow jog, and run) on a treadmill across 2 d. Intraclass correlations for ˙VE ranged from 0.87 to 0.93 and for ˙VO2 from 0.77 to 0.94. Finally, Morgan et al. (11) reported an intraclass correlation of 0.95 for ˙VO2 during level treadmill running (3.33 m·s-1) across two test days, 1 to 3 d apart in 31 male runners.

We were unable to find studies that had reported any measure of reproducibility for submaximal RER. In the present data set for both the HERITAGE Family Study subjects (Table 3) and the ICQC subjects (Table 5), the intraclass correlations are generally low, yet the coefficients of variation and the technical errors are also low. At first, this would seem to be a contradiction, as the intraclass correlations would suggest poor reproducibility, whereas the coefficients of variation and technical errors would suggest the opposite. The main reason for the low intraclass correlation coefficients is the small range of variation across subjects for a given rate of work (i.e., the SD within each trial is quite small; see Table 5): because the intraclass correlation expresses between-subject variance as a fraction of total variance, it falls when subjects differ little from one another, even if measurement error is small. In addition, the diet 12 h prior to the exercise bout can affect submaximal RER values, and there were no specific dietary controls enforced prior to exercise in these subjects.
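The apparent contradiction can be made concrete with a small numerical sketch. Under the one-way random-effects model, the quantity the intraclass correlation estimates is the between-subject variance divided by the sum of the between-subject and within-subject variances. The RER values below are purely hypothetical and chosen for round numbers; they are not taken from the study tables.

```python
def icc_from_sds(between_sd, within_sd):
    """Population ICC implied by between- and within-subject SDs."""
    between_var = between_sd ** 2
    within_var = within_sd ** 2
    return between_var / (between_var + within_var)


def cv_percent(within_sd, mean_value):
    """Within-subject coefficient of variation, in percent."""
    return 100.0 * within_sd / mean_value


# Hypothetical RER: same mean and same measurement error in both scenarios.
mean_rer = 0.95
within_sd = 0.03

# Scenario A: subjects are very similar to one another (narrow range).
icc_narrow = icc_from_sds(between_sd=0.03, within_sd=within_sd)
# Scenario B: subjects differ more from one another (wide range).
icc_wide = icc_from_sds(between_sd=0.09, within_sd=within_sd)

cv = cv_percent(within_sd, mean_rer)  # identical in both scenarios
```

With identical measurement error, narrowing the spread between subjects alone lowers the ICC from 0.9 to 0.5 in this sketch, while the coefficient of variation stays the same.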

There was excellent agreement across the four clinical centers with respect to the intraclass correlation coefficients (Table 4). For each variable, the intraclass correlation coefficients for both the 50 W and 60% ˙VO2max power outputs were, for the most part, very consistent across the clinical centers, and they were nearly identical to the intraclass correlation coefficients for the pooled data for each variable. Further, there was excellent agreement between the intraclass correlation coefficients pre- and post-training, indicating that reproducibility was not compromised longitudinally, which is particularly important when looking for changes in response to the training program. These results reflect the considerable effort that was put into standardizing the procedures across the four clinical centers, including identical equipment and protocols, centralized training and certification of all technical staff, and constant monitoring of the technical staff to assure compliance with the HERITAGE Family Study Manual of Procedures. The use of the traveling crew (see Methods, Quality Assurance) and the ICQC substudy also contributed to the quality of the data across the four clinical centers.

For many of the variables measured, there was better reproducibility at 60% of ˙VO2max compared with those at 50 W. This is largely because for these variables (˙VO2, ˙VE, ˙VCO2, cardiac output, and stroke volume) the variation in response is much greater at different rates of work compared with the same rate of work. To elaborate, at 50 W on a cycle ergometer, subjects will have similar ˙VO2 values since there is little variation between subjects in their efficiency to cycle at this absolute rate of work. Thus, intraclass correlations would be expected to be much lower for the same absolute rate of work where there is little intersubject variability in their physiological response compared with the same relative rate of work (%˙VO2max) where the actual rate of work varies between subjects and intersubject variability in their physiological responses would be high. This would also be true for ˙VE and ˙VCO2. Since the major purpose of increasing cardiac output is to provide increased oxygen delivery to meet the needs for a given rate of work, it too, as well as stroke volume, would exhibit less variability between subjects for the same absolute rate of work compared with the same relative rate of work. This would result in lower intraclass correlations. These results point out important differences between using a fixed absolute rate of work and using a fixed relative rate of work for obtaining physiological responses at submaximal work rates.

In summary, the results from these early analyses of the HERITAGE Family Study data on cardiovascular, respiratory, and metabolic variables suggest good measurement reproducibility. This is critical to any subsequent analyses of this data set in which changes subsequent to endurance exercise training and the genetic basis of these changes are evaluated. Of particular importance was the close agreement across the four clinical centers, since the data from all clinical centers will be pooled for almost all subsequent analyses.

Thanks are expressed to all of the co-principal investigators, investigators, coinvestigators, local project coordinators, research assistants, laboratory technicians, and secretaries who have contributed to this study (see reference 2). Finally, the HERITAGE consortium is very thankful to those hardworking families whose participation has made these data possible.


1. Becque, M. D., V. Katch, C. Marks, and R. Dyer. Reliability and within subject variability of ˙VE, ˙VO2, heart rate, and blood pressure during submaximum cycle ergometry. Int. J. Sports Med. 14:220-223, 1993.
2. Bouchard, C., A. S. Leon, D. C. Rao, J. S. Skinner, J. H. Wilmore, and J. Gagnon. The HERITAGE Family Study: aims, design, and measurement protocol. Med. Sci. Sports Exerc. 27:721-729, 1995.
3. Collier, C. R. Determination of mixed venous CO2 tensions by rebreathing. J. Appl. Physiol. 9:25-29, 1956.
4. Gagnon, J., M. A. Province, C. Bouchard, et al. The HERITAGE Family Study: Quality assurance and quality control. Ann. Epidemiol. 6:520-529, 1996.
5. Hartung, G. H., R. J. Blancq, D. A. Lally, and L. P. Krock. Estimation of aerobic capacity from submaximal cycle ergometry in women. Med. Sci. Sports Exerc. 27:452-457, 1995.
6. Henry, F. M. Individual differences in oxygen metabolism of work at two speeds of movement. Res. Q. 22:324-333, 1951.
7. Katch, V. L., S. S. Sady, and P. Freedson. Biological variability in maximum aerobic power. Med. Sci. Sports Exerc. 14:21-25, 1982.
8. Kirby, T. E. The CO2 rebreathing technique for determination of cardiac output: Part I. J. Card. Rehabil. 5:97-101, 1985.
9. Melanson, E. L., P. S. Freedson, D. Hendelman, and E. Debold. Reliability and validity of a portable metabolic measurement system. Can. J. Appl. Physiol. 21:109-119, 1996.
10. Mitchell, J. H., B. J. Sproule, and C. B. Chapman. The physiological meaning of the maximal oxygen intake test. J. Clin. Invest. 37:538-547, 1958.
11. Morgan, D. W., P. E. Martin, G. S. Krahenbuhl, and F. D. Baldini. Variability in running economy and mechanics among trained male runners. Med. Sci. Sports Exerc. 23:378-383, 1991.
12. Paterson, D. H., D. A. Cunningham, M. J. Plyley, C. J. R. Blimkie, and A. P. Donner. The consistency of cardiac output measurement (CO2 rebreathe) in children during exercise. Eur. J. Appl. Physiol. 49:37-44, 1982.
13. Shrout, P. E., and J. L. Fleiss. Intraclass correlations: uses in assessing rater reliability. Psychol. Bull. 86:420-428, 1979.
14. Taylor, C. Some properties of maximal and submaximal exercise with reference to physiological variation and the measurement of exercise tolerance. Am. J. Physiol. 142:200-212, 1944.
15. Taylor, H. L., E. Buskirk, and A. Henschel. Maximal oxygen intake as an objective measure of cardio-respiratory performance. J. Appl. Physiol. 8, 1955.
16. van Herwaarden, C. L. A., R. A. Binkhorst, J. F. M. Fennis, and A. van't Laar. Reliability of the cardiac output measurement with the indirect Fick-principle for CO2 during exercise. Pflügers Arch. 385:21-23, 1980.
17. Wilmore, J. H. Influence of motivation on physical work capacity and performance. J. Appl. Physiol. 24:459-463, 1968.
18. Wilmore, J. H., P. A. Farrell, A. C. Norton, et al. An automated, indirect assessment of cardiac output during rest and exercise. J. Appl. Physiol. 52:1493-1497, 1982.
19. Wolfe, L. A., D. A. Cunningham, G. M. Davis, and P. A. Rechnitzer. Reliability of noninvasive methods for measuring cardiac function in exercise. J. Appl. Physiol. 44:55-58, 1978.
20. Zeidifard, E., M. Silverman, and S. Godfrey. Reproducibility of indirect (CO2) Fick method for calculation of cardiac output. J. Appl. Physiol. 33:141-143, 1972.


© 1998 The American College of Sports Medicine