Gas exchange (GE) data are key components in the evaluation of patients with congestive heart failure (CHF). Cardiopulmonary exercise testing is routinely used to estimate prognosis, assess the severity of disability, and triage patients for possible cardiac transplantation (8,15,20). However, there is no accepted standard for evaluating and reporting GE data from computer-assisted systems. In a recent review, Howley et al. (11) raise concerns regarding the lack of guidelines and the need for consistent terminology. Howley et al. (11) suggested a few guidelines regarding exercise testing, but there are no specific guidelines regarding the evaluation and reporting of GE data.
The distinction between CHF patients and others is not trivial because of the significant differences in GE kinetics. Patients with a cardiomyopathy display an oscillating pattern of oxygen consumption (V̇O2), carbon dioxide production, and minute ventilation (V̇E) during exercise testing (25). This physiologic phenomenon, combined with nonphysiologic artifacts, commonly complicates the measurement of GE (14) and makes the reporting of results difficult. Assessments of GE in patients with CHF frequently require examination of a graphical display of the data.
Data can be presented by averaged time intervals, rolling time intervals, averaging of a number of breaths, and rolling averages of breaths. In the literature, research is often presented without the sampling interval mentioned or with descriptions that are vague. Although previous investigators have evaluated different averaging techniques (2,12,14,16,23), only one study (12) has addressed patients with CHF. Studies involving CHF and GE have used a variety of averaging techniques. Some studies use a time average of 30 s (15), whereas some textbooks present cardiomyopathy cases using a rolling average of eight breaths (25).
Previous studies have evaluated averaging techniques during exercise, but without evaluations of resting and recovery data. In the text Exercise Gas Exchange in Heart Disease, Kraemer (13) devotes an entire chapter to the use of GE in recovery and concludes that recovery GE may ultimately provide a more accurate gauge of patient symptoms. Kraemer (13) also states that there are few published studies in this area. One CHF study (18) used a graphical analysis of steady state and graded exercise GE data similar to that of the current study. Additionally, Myers (22) recommends collecting 1 to 5 min of resting GE to ascertain that the cardiopulmonary exercise test begins with the patient in a relaxed state. These resting data are clinically useful because they can identify patients who exhibit periodic breathing. Periodic breathing is known to be associated with increased mortality risk (1,6,9,10,19) and elevated pulmonary capillary wedge pressures (1,7,19). One author has expressed the need for standardization of sampling intervals for exercise testing (23) and has since recommended using 30-s averages printed every 10 s (21,22). This “rolling time average” (RTA) is unique and is not commonly available on automated systems. In the review by Howley et al. (11), a 30-s time interval is suggested for exercise testing.
The purpose of this study was to determine the usefulness of a new graphical method for evaluating GE as applied to three common averaging techniques during rest, exercise, and recovery in patients with CHF.
This protocol was designed to mirror GE interpretation in clinical practice. It is the University of Michigan Heart Failure Program’s practice to determine numerical values of each test from the printed graphs of data. This procedure is referred to as the “graphical method”. This method allows clinicians to determine values directly and to account for outliers themselves, without the bias of the computer’s analysis. This was a retrospective study where the GE of 50 consecutive patients was evaluated during three different stages of the test: rest, exercise, and recovery. An exercise physiologist who has used this technique over a 2-yr period in transplant evaluations and research protocols tested all patients and evaluated each graph. Before each test, comprehensive written informed consent was obtained from each patient, and the metabolic cart was calibrated using the manufacturer’s standard procedures (17).
Three different averaging techniques were incorporated: rolling middle five of seven breaths (5/7), rolling average of eight breaths (AVG 8), and time average of 30 s (30-s). The 5/7 technique uses the current breath and the six preceding it, omits the minimum and maximum values of the product of V̇O2 and V̇E, and averages the remaining five. The AVG 8 technique uses the current breath and averages it with the seven preceding breaths, whereas the 30-s technique averages all of the data collected over a 30-s period (17). All patients were tested using a Medical Graphics CPX/D metabolic cart with a disposable mouthpiece and pneumotach (Medical Graphics Corporation, St. Paul, MN) during all stages of the test.
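The three averaging techniques described above can be sketched in a few lines of Python. This is only an illustration of the definitions given in the text, not the Medical Graphics implementation; the function names, the sample breaths, and the use of non-overlapping 30-s bins are assumptions made for the example. For the 5/7 technique, breaths are ranked by the product of V̇O2 and V̇E, following the description above.

```python
from statistics import mean

def rolling_avg8(values):
    """AVG 8: mean of the current breath and the seven preceding breaths."""
    return [mean(values[i - 7:i + 1]) for i in range(7, len(values))]

def rolling_mid5of7(values, keys=None):
    """5/7: in each window of seven breaths, drop the breath with the
    smallest key and the breath with the largest key, then average the
    remaining five. `keys` defaults to the values themselves."""
    keys = values if keys is None else keys
    out = []
    for i in range(6, len(values)):
        window = range(i - 6, i + 1)
        ordered = sorted(window, key=lambda j: keys[j])
        kept = ordered[1:-1]  # always five breaths, even with ties
        out.append(mean(values[j] for j in kept))
    return out

def time_avg(times, values, width=30.0):
    """30-s: mean of all breaths in consecutive bins `width` seconds wide
    (non-overlapping bins are an assumption of this sketch)."""
    bins = {}
    for t, v in zip(times, values):
        bins.setdefault(int(t // width), []).append(v)
    return [mean(bins[k]) for k in sorted(bins)]

# Hypothetical breath-by-breath data with one artifact at the fourth breath.
times = [2.0, 8.0, 14.0, 20.0, 26.0, 31.0, 40.0, 55.0]   # s
vo2 = [10.0, 11.0, 12.0, 30.0, 12.0, 11.0, 10.0, 12.0]   # mL/kg/min
ve = [30.0, 31.0, 32.0, 60.0, 33.0, 31.0, 30.0, 32.0]    # L/min
product = [a * b for a, b in zip(vo2, ve)]

print(rolling_avg8(vo2))
print(rolling_mid5of7(vo2, keys=product))
print(time_avg(times, vo2))
```

In this made-up series the 5/7 technique discards the artifact breath, whereas AVG 8 and the 30-s bin both absorb it into their means.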
V̇O2 and V̇E were analyzed for all stages, with ventilatory threshold (VT) and respiratory exchange ratio (RER) included in the exercise stage. As in clinical practice, the VT was evaluated from the display of data on the computer screen using the V-Slope method (3), which is built into the Medical Graphics software. This method has been reported as being superior in evaluating patients with CHF (24), and the computer software allows the user to toggle between each data point.
The GE variables of each stage of the test were graphed against time, and were printed to the limits of the operating system (approximately one page each), for each of the averaging techniques. Each graph was evaluated using a fine mechanical pencil and a transparent graphing triangle. Figure 1 displays one subject’s test using the graphical method.
After approximately 10 min in Fowler’s position (supine with torso elevated 30 degrees on an exam table) during preparation for exercise, GE was collected for three consecutive minutes. The graphical method resting values were defined as the average of the last two of the 3 min of data collected during rest. The graphing triangle was placed at the estimated mean value of each variable over the last 2 min of rest, and a line was drawn to its corresponding axis to determine its value (Fig. 1).
The graphical method peak values were defined as the highest linear value achieved during incrementally ramped treadmill exercise. The graphing triangle was placed along the estimated mean of the oscillating exercise data during the terminal portion of the exercise test, and a line of best fit was drawn. A line from the vertical axis was drawn parallel to the horizontal axis to the point where the data line (connection of data points) and the line of best fit intersected to determine the value of each variable (Fig. 1).
The graphical method recovery values were defined as the lowest value achieved postexercise after 5 min of recovery: 2 min of active recovery at 1 mph and 3 min of inactive recovery in Fowler’s position. A line parallel to the horizontal axis was drawn from this lowest point to the vertical axis to determine the value of each variable (Fig. 1).
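For readers who want a numerical analogue of the resting and recovery definitions, a minimal sketch follows. It is not the graphical method itself, which is performed by eye on printed graphs and lets the reader discount artifact; the function names and sample values are hypothetical. The peak definition is omitted because it depends on a hand-drawn line of best fit.

```python
from statistics import mean

def resting_value(times, values, start=60.0, end=180.0):
    """Mean of the samples in the last 2 min of a 3-min rest collection
    (times in seconds from the start of the rest period)."""
    return mean([v for t, v in zip(times, values) if start <= t < end])

def recovery_value(values):
    """Lowest value recorded during the 5-min recovery period."""
    return min(values)

# Hypothetical averaged V̇O2 samples (mL/kg/min).
rest_times = [0.0, 30.0, 60.0, 90.0, 120.0, 150.0]
rest_vo2 = [5.0, 4.6, 4.0, 4.2, 3.8, 4.0]
recovery_vo2 = [9.0, 7.5, 6.1, 5.2, 5.6]

print(resting_value(rest_times, rest_vo2))
print(recovery_value(recovery_vo2))
```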
Precision of the graphical method was evaluated using V̇O2 from each of the three stages from 10 tests. All investigators were blinded to the subject and averaging technique and were provided with their own copy of the graphs. Intrainvestigator precision involved repeated measures of the same tests, with a 12-wk period between repeated measures. Interinvestigator precision involved three investigators: the initial investigator (investigator No. 1), a cardiologist in the heart failure program who uses this technique in practice (investigator No. 2), and a student who was inexperienced in the technique (investigator No. 3). Investigator No. 3 was provided with only the written instructions from this manuscript to evaluate each graph.
The Medical Graphics Breeze 3.02a software (Medical Graphics Corporation, St. Paul, MN) was used on the metabolic cart and provided the resting and exercise computer-derived values. All of the patients’ computer-derived exercise and rest variables were recorded by 5/7, AVG 8, and 30-s averaging techniques for comparison to the graphical method. The computer software package does not evaluate recovery data.
The graphical method and computer analysis were compared with an RTA as an objective value for resting and exercise data. The RTA was evaluated over the last minute of data collection, which was updated every 20 s. GE variables were printed from the metabolic cart breath by breath, and the RTA was computed by hand.
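The hand computation of a rolling time average can be sketched as follows. The window width and update step are left as parameters because the text mentions more than one combination (30-s averages printed every 10 s in the recommendation cited in the introduction, and a 20-s update here); the defaults, the function name, and the sample data are assumptions for illustration only.

```python
from statistics import mean

def rolling_time_average(times, values, window=30.0, step=10.0):
    """Return (t_end, mean) pairs, where each mean covers the samples with
    t_end - window < t <= t_end and t_end advances by `step` seconds."""
    out = []
    t_end = window
    last = max(times)
    while t_end <= last + 1e-9:
        in_window = [v for t, v in zip(times, values)
                     if t_end - window < t <= t_end]
        if in_window:  # skip windows that contain no breaths
            out.append((t_end, mean(in_window)))
        t_end += step
    return out

# Hypothetical breath times (s) and V̇O2 values (mL/kg/min).
rta_times = [5.0, 15.0, 25.0, 35.0, 45.0]
rta_vo2 = [10.0, 12.0, 14.0, 16.0, 18.0]
print(rolling_time_average(rta_times, rta_vo2))
```

Calling the function with `window=60.0, step=20.0` gives a 1-min window updated every 20 s, which is one possible reading of the procedure described above.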
One-way ANOVA was performed between different averaging techniques of each variable and between investigators with a level of significance of 0.05. Two-sided t-tests were performed between the computer and graphical method of each averaging technique for each variable. Data are presented in figures using Bland and Altman procedures (5). All statistical procedures were performed using Microsoft Excel 7.0 software (Microsoft Corporation, Seattle, WA).
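The Bland and Altman procedure (5) summarizes agreement between two methods by the mean of the paired differences (the bias) and the limits of agreement, bias ± 1.96 SD of the differences. The sketch below shows that calculation with the Python standard library; the function name and the paired values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements,
    using bias +/- 1.96 * SD of the paired differences."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical peak V̇O2 values (mL/kg/min) from two methods.
computer = [18.0, 15.0, 21.0, 12.0]
graphical = [17.0, 15.5, 19.0, 12.5]

bias, lower, upper = bland_altman(computer, graphical)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```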
Fifty patients’ metabolic data were evaluated. The patients’ mean characteristics were as follows: 49.1 ± 9.54 yr old, 11 women and 39 men, 77.3 ± 1.8 kg, 173.9 ± 9.8 cm tall, most recent left ventricular ejection fraction of 24.6 ± 11.6% retrieved from patient records, and a peak V̇O2 of 17.9 ± 4.5 mL·kg−1·min−1.
Figure 2 displays the comparisons between the computer analysis, graphical method, and RTA for the resting data. There were no statistically significant differences between RTA and the graphical method for V̇O2, whereas resting V̇O2 was statistically different between the computer and both the graphical method and RTA. Resting V̇E was statistically different for both the graphical method and the computer in comparison to the RTA, whereas the computer and graphical method V̇E were not significantly different.
Figure 3 displays the comparisons between the computer analysis, graphical method, and RTA for the exercise data. There were no statistically significant differences between RTA and either the graphical method or the computer results, whereas there were differences between the computer and the graphical method for exercise V̇O2, V̇E, and RER.
The graphical method revealed no statistically significant differences between trained and untrained investigators (Figs. 4, 5, and 6), or between different averaging techniques (Table 1) for rest, exercise, or recovery. Three of the 50 patients did not achieve VT by any of the techniques, whereas one patient clearly attained VT using both 5/7 and AVG 8, but VT was indeterminate by 30-s. Table 2 shows the P values from the two-sided t-tests between the computer and graphical methods for the resting and exercise values.
The graphical method proved to be helpful in the evaluation of GE in patients with CHF. The computer-derived results were statistically different and proved unreliable compared with the graphical method, particularly at rest. Although the “true” value is unknown, Figure 1 displays one example when the computer reported unrealistic values that were unsubstantiated by the data. The computer was also in poor agreement with the graphical method and the RTA (Fig. 2). The reasons for this discrepancy are unknown but likely reflect technical error in the computer software. There were three cases where the computer resting V̇O2 values exceeded the reported VT values, and eight cases where the V̇O2 or V̇E exceeded twice the graphical method value. The precision of the graphical method was limited by the variability within patients. Many patients exhibited artifact and periodic breathing where the graphs were quite variable, making it difficult to consistently assign exact numeric values to the data.
The exercise stage of the evaluation can be particularly difficult for the computer to evaluate because of abnormal ventilation and nonphysiologic artifact (14). Periodic breathing is a common ventilatory abnormality in exercising patients with CHF (4), which tends to be more prevalent in the types of patients referred for transplant evaluation who have more severe heart failure (1,6,9,10,19). Additionally, GE artifact can be exaggerated when the workloads are altered by factors such as use of the treadmill handrails.
These factors demonstrate the importance of smoothing data in a manner where the sampling interval is large enough to be indicative of the patient’s physiology. The use of short averaging intervals extrapolates a brief sample to a minute value, which may multiply measurement errors. This tendency is visible in Figure 3, where the bias for all measures is greatest for the sampling intervals that are less than 30 s. The computer uses a systematic search for the highest V̇O2 attained, which may be an outlying spike from the graph, and reports it as the patient’s peak minute value. When using larger time intervals, this error is reduced but not removed.
The V̇E and RER comparisons in Figure 3 are not reliable because of the manner in which the computer selects peak values. In the computer’s analysis the peak V̇O2 is noted, and the RER and V̇E associated with this V̇O2 are reported as the “peak” values. These values are not peaks, but rather are those associated with what the computer believes is the peak V̇O2. In Figure 3, these values are being compared with the true peak values as assessed using the graphical method. The computer analysis has the potential to lead to error and inconsistencies when values are reported that may not be representative of the patient’s true capacity. A patient may truly achieve an RER ≥ 1.10, but a lower RER may be reported if it was associated with an outlying V̇O2. The graphical method allows the clinician to select all GE variables independently, accounting for artifact to determine a more representative value of the patient’s true capacity.
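The difference between reporting the values associated with the peak V̇O2 and selecting each variable independently can be illustrated with a small sketch. This is not the software's actual algorithm, only a restatement of the behavior described in the paragraph above; the function names and the data rows are hypothetical.

```python
def associated_peaks(rows):
    """Find the row with the highest V̇O2 and report that row's V̇E and RER
    as the 'peak' values (the behavior described for the computer)."""
    peak = max(rows, key=lambda r: r["vo2"])
    return peak["vo2"], peak["ve"], peak["rer"]

def independent_peaks(rows):
    """Select the highest value of each variable independently."""
    return (max(r["vo2"] for r in rows),
            max(r["ve"] for r in rows),
            max(r["rer"] for r in rows))

# Hypothetical averaged exercise data; the second row is an outlying spike.
rows = [
    {"vo2": 14.0, "ve": 40.0, "rer": 1.00},
    {"vo2": 19.5, "ve": 42.0, "rer": 1.02},
    {"vo2": 15.0, "ve": 50.0, "rer": 1.08},
    {"vo2": 15.8, "ve": 55.0, "rer": 1.12},
]

print(associated_peaks(rows))
print(independent_peaks(rows))
```

With these made-up rows, the associated-value approach reports an RER of 1.02 even though the patient later reached 1.12. Note that taking independent maxima does not remove the V̇O2 spike itself; in the graphical method that judgment is made by the clinician reading the graph.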
The graphical method was superior in the exercise evaluations, as in the resting data. Figure 3 displays less bias using the graphical method than the computer analysis compared with the RTA for peak V̇O2. The bias was always greatest with the computer results, exceeding 4 mL·kg−1·min−1 for both of the shorter intervals. The range of the computer 30-s bias was more acceptable (−1.4 to 2.0 mL·kg−1·min−1) but was still greater than the range of the bias of the graphical method (−1.4 to 0.4 mL·kg−1·min−1).
Clinicians base important clinical judgments on a test that is lacking standardization throughout the medical community (11). Many medical centers use peak V̇O2 as a strict transplant criterion; if peak V̇O2 is less than 14.0 mL·kg−1·min−1 with a peak RER ≥ 1.10, patients can be listed for transplantation (8), whereas cardiac transplantation may be deferred with a peak V̇O2 > 14.0 mL·kg−1·min−1 (20). In this scenario, standard procedures are of great importance in evaluating peak V̇O2 and for evaluating RER independent of the peak V̇O2 to ensure the proper allocation of donor hearts. The patient whose test is displayed in Figure 1 would be placed on the transplant list using the graphical technique, yet would not be listed if the computer-derived value were used. The intrainvestigator precision of peak V̇O2 measurement using the graphical method was excellent, which can help to standardize transplant evaluations at individual medical centers. The precision of the measurement decreased as other investigators were added, yet the bias was typically within 1.0 mL·kg−1·min−1 (Fig. 5).
GE measurements can have profound economic impacts on health care providers, insurance companies, and the Social Security Administration because of the payment of disability benefits. Patients who have peak V̇O2 values under 15 mL·kg−1·min−1 can be declared unfit or incapacitated for employment, and patients may be employed if their peak V̇O2 is in the range of 15–24 mL·kg−1·min−1 (26). At times the bias between the computer and the RTA, and between the computer and the graphical method, exceeded 4 mL·kg−1·min−1, which could easily misclassify many patients. The lack of standardization of oxygen consumption testing potentially misclassifies many patients and prevents consistent evaluation.
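A short arithmetic sketch shows how a bias of this size can move a patient across the 15 mL·kg−1·min−1 boundary quoted above. The function name, category labels, and example values are hypothetical, and the sketch is not a clinical decision rule; it only encodes the two ranges cited from reference 26.

```python
def employment_category(peak_vo2):
    """Map peak V̇O2 (mL/kg/min) to the ranges quoted in the text."""
    if peak_vo2 < 15.0:
        return "may be declared unfit for employment"
    if peak_vo2 <= 24.0:
        return "may be employed"
    return "above the ranges quoted in the text"

representative_vo2 = 13.0   # hypothetical value read from the graph
measurement_bias = 4.0      # bias of the size reported in the text
reported_vo2 = representative_vo2 + measurement_bias

print(employment_category(representative_vo2))
print(employment_category(reported_vo2))
```

Here a patient whose representative value is 13.0 would be reported at 17.0 and placed in the other category.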
The graphical method may be used effectively in clinical practice, in research studies, and in large clinical trials to standardize reporting of results because it was shown to be precise regardless of the experience of the investigator (Figs. 4, 5, and 6).
The agreement between the three techniques was consistent with previous studies that found shorter averaging intervals to elicit higher values (12,16). This trend was displayed in both the graphical method and the computer selections. There were no significant differences for the exercise data, but VT was difficult to pick using the 30-s technique because fewer data points limit the investigator’s ability to demarcate a threshold. This limitation is particularly evident during short exercise tests, which are common in the evaluation of end-stage heart failure patients.
In conclusion, the graphical method may be used to standardize GE evaluation because it displays excellent intrainvestigator precision and good interinvestigator precision between experienced and inexperienced investigators. Averaging techniques less than 30 s have greater bias when using computer-derived values, although when incorporating the graphical method, the averaging technique chosen has little influence on all measures.
1. Bard, R. L., K. D. Aaronson, S. Nioguy, and J. M. Nicklas. Periodic breathing is associated with increased mortality risk in patients with congestive heart failure. Med. Sci. Sports Exerc. 30:S221, 1998.
2. Beaver, W. L., K. Wasserman, and B. J. Whipp. On-line computer analysis and breath-by-breath graphical display of exercise function tests. J. Appl. Physiol. 34:128–132, 1973.
3. Beaver, W. L., K. Wasserman, and B. J. Whipp. A new method for detecting anaerobic threshold by gas exchange. J. Appl. Physiol. 60:2020–2027, 1986.
4. Ben-Dov, I., K. E. Sietsema, R. Casaburi, and K. Wasserman. Evidence that circulatory oscillations accompany ventilatory oscillations during exercise in patients with heart failure. Am. Rev. Respir. Dis. 145:776–781, 1992.
5. Bland, J. M., and D. G. Altman. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 8476:307–310, 1986.
6. Bradley, T. D., and J. S. Floras. Review. Pathophysiologic and therapeutic implications of sleep apnea in congestive heart failure. J. Cardiac Failure 2:223–240, 1996.
7. Churchill, E. D., and E. Cope. The rapid shallow breathing resulting from pulmonary congestion and edema. J. Exp. Med. 49:531–537, 1929.
8. Costanzo, M. R., S. Augustine, R. Bourge, et al. Selection and treatment of candidates for heart transplantation: a statement for health professionals from the committee on heart failure and cardiac transplantation of the council on clinical cardiology, American Heart Association. Circulation 92:3593–3612, 1995.
9. Dowell, A. R., C. E. Buckley, R. Cohen, R. E. Whalen, and H. O. Sieker. Cheyne-Stokes respiration: a review of clinical manifestations and critique of physiological mechanisms. Arch. Intern. Med. 127:712–726, 1971.
10. Hanley, P., N. Zuberi, and R. Gray. Pathogenesis of Cheyne-Stokes respiration in patients with congestive heart failure. Chest 104:1079–1084, 1993.
11. Howley, E. T., D. R. Bassett, Jr., and H. G. Welch. Criteria for maximal oxygen uptake: review and commentary. Med. Sci. Sports Exerc. 27:1292–1301, 1995.
12. Johnson, J. S., J. J. Carlson, and R. L. Vanderlaan. Effects of sampling interval on peak oxygen consumption in patients evaluated for heart transplantation. Chest 113:816–819, 1998.
13. Kraemer, M. D. Gas exchange during recovery from exercise in patients with heart failure. In: Exercise Gas Exchange in Heart Disease, K. Wasserman (Ed.). Armonk, NY: Futura Publishing Company, Inc., 1996, pp. 67–69.
14. Lamarra, N., B. J. Whipp, S. A. Ward, and K. Wasserman. Effect of interbreath fluctuations on characterizing exercise gas exchange kinetics. J. Appl. Physiol. 62:2003–2012, 1987.
15. Mancini, D. M., H. Eisen, W. Kussmaul, R. Mull, L. H. Edmunds, and J. R. Wilson. Value of peak exercise oxygen consumption for optimal timing of cardiac transplantation in ambulatory patients with heart failure. Circulation 83:778–786, 1991.
16. Matthews, J. I., B. A. Bush, and F. M. Morales. Microprocessor exercise physiology systems vs a nonautomated system, a comparison of data output. Chest 92:696–703, 1987.
17. Medical Graphics Corporation. Cardiopulmonary Exercise Testing System (CPX/D) Operators Manual. Part No. 142077–001: Rev. B. St. Paul, MN: Medical Graphics Corporation, 1996, pp. 8–21.
18. Meyer, K., M. Schwaibold, R. Hajric, et al. Delayed V̇O2 kinetics during ramp exercise: a criterion for cardiopulmonary exercise capacity in chronic heart failure. Med. Sci. Sports Exerc. 30:643–648, 1998.
19. Mortara, A., G. D. Pinna, R. Maestri, et al. Cheyne-Stokes respiration during awake day-time in chronic heart failure: prognostic implication. JACC 31:249a, 1998.
20. Mudge, G. H., S. Goldstein, L. J. Addonizio, et al. Task Force 3: Recipient Guidelines/Prioritization. From 24th Bethesda Conference: Cardiac Transplantation. JACC 22:21–31, 1993.
21. Myers, J. Essentials of Cardiopulmonary Exercise Testing. Champaign, IL: Human Kinetics, 1996, pp. 68–69.
22. Myers, J. Ventilatory gas exchange in heart failure: techniques, problems and pitfalls. In: Exercise and Heart Failure, G. J. Balady and I. L. Pina (Eds.). Armonk, NY: Futura Publishing Company, Inc., 1997, pp. 230–231.
23. Myers, J., D. Walsh, M. Sullivan, and V. Froelicher. Effect of sampling variability and plateau in oxygen uptake. J. Appl. Physiol. 68:404–410, 1990.
24. Wasserman, K. Overview and future directions. Circulation 81(Suppl. II):II59–II64, 1990.
25. Wasserman, K., J. E. Hansen, D. Y. Sue, B. J. Whipp, and R. Casaburi. Principles of Exercise Testing and Interpretation, 2nd Ed. Malvern, PA: Lea and Febiger, 1994, pp. 226–241.
26. Zavala, D. C. Manual on Exercise Testing: a Training Handbook, 3rd Ed. Iowa City, IA: The University of Iowa, 1993, p. 128.
Keywords: CARDIOPULMONARY EXERCISE TESTING; CARDIOMYOPATHY; OXYGEN CONSUMPTION; METABOLISM
© 2000 Lippincott Williams & Wilkins, Inc.