# Ventilatory threshold: a useful method to determine aerobic fitness in children?

HEBESTREIT, H., B. STASCHEN, and A. HEBESTREIT. Ventilatory threshold: a useful method to determine aerobic fitness in children? *Med. Sci. Sports Exerc.,* Vol. 32, No. 11, pp. 1964–1969, 2000.

Purpose: The objective of this study was to assess intra- and inter-evaluator reliability and validity of ventilatory threshold (VT) determination in children.

Methods: At the age of 6–12 yr, 35 children born prematurely and 20 controls born at term performed an incremental continuous cycling task until volitional fatigue. Fifteen-second averages of V̇E/V̇O_{2}, V̇E/V̇CO_{2}, and respiratory exchange ratio were plotted 1) over time (X-time) and 2) over V̇O_{2} (X-V̇O_{2}). V̇CO_{2} was plotted over V̇O_{2} only (X-V̇O_{2}). Two experienced evaluators, blind to the identity of plots, independently assessed VT from X-time and X-V̇O_{2} plots on two occasions, 6 wk apart. Thus, for each of the 55 subjects, four VT values were expected from X-time plots and four from X-V̇O_{2} plots (2 evaluators, 2 occasions).

Results: VT expressed as V̇O_{2} in mL·min^{−1} could be determined by both evaluators on both occasions in 40/55 children from X-time and in 45/55 children from X-V̇O_{2}. VT was significantly different between evaluators for X-time plots. Using X-time plots, intraevaluator ICC were 0.88 and 0.98 and interevaluator ICC were 0.82 and 0.79. The respective values for X-V̇O_{2} plots were 0.94 and 0.95, and 0.96 and 0.92. Intra- and inter-evaluator reliability of VT determinations tended to be slightly lower in children born prematurely compared with those born at term. There was a close association between VT and V̇O_{2peak} (r = 0.92).

Conclusion: Plotting gas exchange data over V̇O_{2} is likely to be the method of choice for determining VT. Although a minority of children have uninterpretable X-V̇O_{2} plots, VT can be reliably interpreted in the remainder. Furthermore, VT is a valid marker of aerobic capacity. Thus, VT is a useful measure of aerobic fitness in children.

Universität Kinderklinik, D-97080 Würzburg, Germany

Submitted for publication July 1999.

Accepted for publication January 2000.

Address for correspondence: Helge Hebestreit, Univ. Kinderklinik Würzburg, Josef-Schneider-Str. 2, D-97080 Würzburg, Germany; E-mail: Hebestreit@mail.uni-wuerzburg.de.

Exercise testing is commonly employed to determine aerobic fitness of pediatric patients with various chronic health conditions, e.g., congenital and acquired cardiac disease (^{20,21}), asthma (^{25}), or sequelae of bronchopulmonary dysplasia (^{2,24}). In cystic fibrosis, low aerobic fitness is a strong predictor of future morbidity and mortality (^{13}).

The highest oxygen uptake a person can achieve during an exhaustive exercise test (V̇O_{2peak}) is considered the best single physiologic measure for an individual’s aerobic fitness (^{8}). However, the assessment of an individual’s V̇O_{2peak} requires a maximal effort, which might not be warranted in patients with cardiac or pulmonary impairment. Furthermore, some children—healthy or not—may not perform with a true maximal effort (^{22}). Therefore, methods to determine aerobic fitness that do not require a maximal effort are needed in addition to the measurement of V̇O_{2peak}.

Wasserman and McIlroy (^{30}) first noted that, during incremental exercise tests, there is a certain point above which ventilation (V̇E) increases out of proportion to V̇O_{2}. This point was termed the ventilatory anaerobic threshold (VT). Since then, different indicators for determining VT such as V̇E/V̇O_{2}, V̇E/V̇CO_{2}, or respiratory exchange ratio (RER) plotted over exercise time or over V̇O_{2} have been suggested (^{3,6}). The combination of various parameters seems to increase reliability (^{12,28}).

The VT can be determined noninvasively during exercise without requiring a maximal effort. Because VT provides a good marker for aerobic fitness (^{4,23}), it has been used frequently in healthy children (^{4,16,19,29}) and those with a chronic health condition (^{14,20,21}). The VT has been shown to be a more sensitive indicator of physical performance than V̇O_{2} at a heart rate of 170·min^{−1} in 50 children with intracardiac left-to-right shunts (^{20}). It can also be used to follow the effects of endurance training (^{7,11}).

However, in adults, a large variability of VT values has been obtained by different reviewers using the same set of data (^{12,31}). Therefore, the usefulness of VT has been questioned (^{10}). Little is known about VT differences between evaluators in children. Ohuchi et al. (^{14}) reported test-retest interclass correlation coefficients between three evaluators ranging from 0.88 to 0.99. However, the use of interclass correlation coefficients is an inappropriate method to assess reliability (^{27}). No data are available on intraevaluator reliability of VT in the pediatric population.

In children born prematurely, especially in those with sequelae of bronchopulmonary dysplasia, follow-up exercise testing is important to evaluate cardiopulmonary function (^{24}). The risk of desaturation during maximal exercise in these children (^{2}) can make determination of V̇O_{2peak} difficult. Therefore, it is especially valuable to use the VT in addition to or instead of V̇O_{2peak} as a measure of aerobic fitness. However, it has been suggested that children born prematurely have a different breathing pattern during exercise compared with children born at term (^{15,24}), and altered breathing patterns might interfere with the identification of VT (^{28}).

The objective of the present study was to determine inter- and intra-evaluator reliability of VT assessment in children when the same set of data is analyzed twice, 6 wk apart, by two experienced evaluators. Furthermore, we intended to assess the reliability of VT separately in children born prematurely and those born at term. A secondary objective of this study was to support previous reports on the validity of VT as a marker of aerobic fitness by assessing the relationship between VT and V̇O_{2peak}.

## METHODS

### Subjects.

Fifty-five healthy children aged 6–12 yr took part in this study. Thirty-five of them, 21 girls and 14 boys, were born prematurely with a gestational age ≤ 32 wk and a birth weight ≤ 1500 g. Twenty children, 14 girls and 6 boys, were born at term. All children participated in a larger study on the exercise capacity of children born prematurely. Table 1 summarizes the characteristics of the groups. All children had normal lung function and a normal echocardiogram with no signs of pulmonary hypertension on the day of exercise testing. The study design and informed consent form were approved by the Ethics Committee of the University of Würzburg. Before participating in the study, all subjects gave their oral consent, and their parents read and signed the informed consent form.

### Exercise test.

All children performed a continuous incremental exercise test on a calibrated cycle ergometer (ErgometrX CardiO_{2} Cycle, St. Paul, MN) as part of the study mentioned above. Work rate was 0 W·kg^{−1} body weight for the initial 2 min of the test. Work rate was then increased every 2 min by 1 W·kg^{−1} body weight. After a total of three stages of 2 min, including the initial stage, work rate was increased every min by 0.5 W·kg^{−1} until volitional fatigue. Subjects breathed through a mouthpiece with saliva trap and a light-weight, low dead-space pneumotach during the test (Pneumotach, MedGraphics, St. Paul, MN; total dead space including mouthpiece, 85 mL). Ventilatory and respiratory parameters were determined breath-by-breath using a commercially available metabolic cart calibrated before and after each test with gases of known concentrations (CPX/D, MedGraphics). A maximal effort during the test was assumed when the appearance of the child suggested a maximal effort and one of the following two criteria was fulfilled at peak exercise intensity: heart rate >185 beats per min or RER >1.00 (see (^{1,22})). Of 55 subjects, five children born prematurely did not meet the above criteria. In the other 50 children, V̇O_{2peak} was taken as the highest V̇O_{2} over 30 s during the exercise test.
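The protocol and the maximal-effort rule described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the sketch assumes that the third 2-min stage ends at minute 6 and that the first 0.5 W·kg^{−1} increment takes effect at that moment.

```python
def work_rate_w_per_kg(t_min: float) -> float:
    """Work rate (W per kg body weight) at elapsed test time t_min (minutes).

    Assumption: three 2-min stages at 0, 1, and 2 W/kg, then +0.5 W/kg
    at the start of every following minute, beginning at minute 6.
    """
    if t_min < 0:
        raise ValueError("elapsed time must be non-negative")
    if t_min < 6.0:
        # Stage index 0, 1, or 2 equals the work rate in W/kg.
        return float(int(t_min // 2))
    # Number of 1-min stages started since minute 6 (including the current one).
    return 2.0 + 0.5 * (int(t_min - 6.0) + 1)


def maximal_effort(looks_maximal: bool, peak_hr_bpm: float, peak_rer: float) -> bool:
    """Subjective appearance plus at least one objective criterion."""
    return looks_maximal and (peak_hr_bpm > 185 or peak_rer > 1.00)


print(work_rate_w_per_kg(7.5))           # work rate in the second 1-min stage
print(maximal_effort(True, 190, 0.97))   # heart rate criterion met
```

Under these assumptions the average increment is 0.5 W·kg^{−1} per minute throughout the test, in both the 2-min and the 1-min stages.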

### Ventilatory threshold.

Ventilatory parameters were averaged over 15-s intervals because this procedure improves the detection of the VT compared with the analysis of breath-by-breath data (^{28}). For each individual, two computer plots were prepared: 1) exercise time was plotted on the X-axis with V̇E/V̇CO_{2}, V̇E/V̇O_{2}, and RER on the Y-axis (X-time); and 2) V̇O_{2} was plotted on the X-axis with V̇E/V̇CO_{2}, V̇E/V̇O_{2}, RER and—additionally—V̇CO_{2} on the Y-axis (X-V̇O_{2}). Each X-time and each X-V̇O_{2} plot was photocopied three times so that a total of eight plots of the same data set became available (4 with time on the X-axis and 4 with V̇O_{2}). All plots were randomly number-coded by a coordinator not involved in the subsequent determination of VT.
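As a rough illustration of the averaging step, the sketch below bins breath-by-breath values into 15-s intervals and derives the plotted ratios. The input layout and the function name are hypothetical, and forming the ratios from the bin means (rather than averaging per-breath ratios) is an assumption; the metabolic cart software may have done this differently.

```python
from collections import defaultdict
from statistics import mean


def fifteen_second_averages(breaths, bin_s=15.0):
    """Average breath-by-breath data over fixed time bins.

    breaths: iterable of (t_s, ve, vo2, vco2) tuples, one per breath,
    with ve, vo2, and vco2 in the same flow units (e.g., L/min).
    Returns one dict per non-empty bin, ordered by time.
    """
    bins = defaultdict(list)
    for t_s, ve, vo2, vco2 in breaths:
        bins[int(t_s // bin_s)].append((ve, vo2, vco2))

    rows = []
    for idx in sorted(bins):
        ve = mean(b[0] for b in bins[idx])
        vo2 = mean(b[1] for b in bins[idx])
        vco2 = mean(b[2] for b in bins[idx])
        rows.append({
            "t_end_s": (idx + 1) * bin_s,  # end of the averaging interval
            "vo2": vo2,
            "vco2": vco2,
            "ve_vo2": ve / vo2,    # ventilatory equivalent for O2
            "ve_vco2": ve / vco2,  # ventilatory equivalent for CO2
            "rer": vco2 / vo2,     # respiratory exchange ratio
        })
    return rows


# Made-up breaths, for illustration only.
demo = [(2.0, 20.0, 0.8, 0.6), (9.0, 22.0, 0.8, 0.8), (17.0, 30.0, 1.0, 1.0)]
for row in fifteen_second_averages(demo):
    print(row)
```

The X-time plot then uses `t_end_s` on the x-axis, and the X-V̇O_{2} plot uses `vo2`.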

Two experienced evaluators were asked to independently assess VT from X-time and X-V̇O_{2} plots on two occasions, 6 wk apart. On each of the two occasions, each evaluator received 55 X-time plots and 55 X-V̇O_{2} plots, a set of one X-time and one X-V̇O_{2} plot per subject. Because the plots were number-coded, the evaluators of VT were blind concerning the relationship between plots and the identity of individuals. VT from X-time plots was recorded as time (in seconds). VT from the X-V̇O_{2} plots was documented as V̇O_{2} (in mL O_{2}·min^{−1}). Criteria for VT determination were: 1) increase in V̇E/V̇O_{2} without corresponding increase in V̇E/V̇CO_{2} (^{6}), 2) increase in RER (^{5}), and 3) (for X-V̇O_{2} plots only) the nonlinear increase in V̇CO_{2} (^{3}). Evaluators were allowed to apply all criteria. They were asked to document for each plot whether the identification of VT was easy, possible, or not possible. The individual results were collected by the coordinator. In summary, if a VT could have been identified for all plots, for each of the 55 subjects, four VT values should have become available from X-time plots and four VT values from X-V̇O_{2} plots (2 independent evaluators, 2 occasions).

### Data analysis.

All plots were decoded after the second evaluation. For all X-time plots, the coordinator determined the V̇O_{2} (in mL O_{2}·min^{−1}) corresponding to the exercise time at VT. For the remainder of the analysis, VT was either expressed as V̇O_{2} in mL·min^{−1}, or relative to body weight in mL·min^{−1}·kg^{−1}. If one of the evaluators had termed a plot impossible to interpret on one of the two occasions, the respective subject was excluded from reliability calculations.
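The text does not say how the V̇O_{2} corresponding to a given exercise time was read off. One plausible approach, shown here purely as an assumption, is linear interpolation between the 15-s averages; the function names and the example numbers are hypothetical.

```python
def vo2_at_time(times_s, vo2_values, t_vt_s):
    """Linearly interpolate VO2 (mL/min) at the time of VT.

    times_s must be strictly increasing and aligned with vo2_values.
    """
    if len(times_s) != len(vo2_values) or len(times_s) < 2:
        raise ValueError("need at least two aligned samples")
    if not times_s[0] <= t_vt_s <= times_s[-1]:
        raise ValueError("VT time lies outside the recorded test")
    for i in range(len(times_s) - 1):
        t0, t1 = times_s[i], times_s[i + 1]
        if t0 <= t_vt_s <= t1:
            frac = (t_vt_s - t0) / (t1 - t0)
            return vo2_values[i] + frac * (vo2_values[i + 1] - vo2_values[i])


def vo2_per_kg(vo2_ml_min, body_mass_kg):
    """Express VO2 relative to body weight (mL/min/kg)."""
    return vo2_ml_min / body_mass_kg


# Illustrative values only, not data from the study.
vt_abs = vo2_at_time([0.0, 15.0, 30.0], [600.0, 700.0, 900.0], 22.5)
print(vt_abs, vo2_per_kg(vt_abs, 32.0))
```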

### Statistics.

The frequency of uninterpretable plots was compared between X-time and X-V̇O_{2} and between children born prematurely and at term by using chi-square statistics. Differences in VT determinations between evaluators were assessed by a 2 × 2 ANOVA for repeated measures (2 determinations, 2 evaluators). Overall intraindividual reliability of VT for X-time plots was assessed by calculating the standard deviation of all four VT values determined for each subject (2 evaluators on 2 occasions). Likewise, overall reliability of VT determination from X-V̇O_{2} plots was computed for each individual. Reliability of VT was compared between the two plotting modes by the Wilcoxon signed ranks test because the intraindividual standard deviations did not follow a normal distribution. Intraclass correlation coefficients (ICC) were used as an additional measure of VT reliability (^{27}). ICC were calculated from an ANOVA for repeated measures (^{27}). Standard error of measurement (SEM) was then calculated based on ICC (^{27}). The relationship between V̇O_{2peak} and VT was assessed using linear regression analysis. All statistical procedures were performed using BMDP statistical software (BMDP Statistical Software Incorporated, Los Angeles, CA). Significance was accepted at *P* < 0.05.
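The reliability indices named above can be sketched as follows. Several choices here are assumptions rather than the authors' documented procedure: the sample (n − 1) standard deviation, the absolute-agreement single-rating ICC derived from a two-way repeated-measures ANOVA, and the commonly used relation SEM = SD·√(1 − ICC). The function names are hypothetical and the analysis in the study itself was run in BMDP.

```python
from math import sqrt
from statistics import mean, stdev


def intraindividual_sd(vt_values):
    """Sample SD (n - 1 denominator, an assumption) of one subject's VT values."""
    return stdev(vt_values)


def icc_two_way(ratings):
    """Single-rating ICC from a two-way repeated-measures ANOVA.

    ratings: one row per subject, one column per determination
    (e.g., 2 occasions or 2 evaluators). The absolute-agreement form
    is an assumption; the source does not name the ICC model.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ms_subjects = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_columns = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_columns - ms_error) / n
    )


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and the ICC (assumed formula)."""
    return sd * sqrt(1.0 - icc)


# Illustrative numbers only, not data from the study.
print(intraindividual_sd([800, 820, 780, 800]))
print(icc_two_way([[1, 2], [3, 4], [5, 6]]))
print(standard_error_of_measurement(100.0, 0.96))
```

In the toy matrix every second determination is exactly one unit higher than the first, so the ICC stays below 1 even though the rank order is perfect; an absolute-agreement ICC penalizes such a systematic shift.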

## RESULTS

By using X-time plots, VT could not be determined in 10 of 35 children born prematurely and 5 of 20 children born at term by at least one of the evaluators on at least one occasion. By using X-V̇O_{2} plots, VT was not identified in 9 of the 35 premature subjects and in 1 of 20 children born at term. The difference between the two subgroups in the frequency of unidentifiable plots was not significant. The frequency of subjects in whom VT could not be identified in at least one plot was not significantly different between X-time and X-V̇O_{2} (27% vs 18%).
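The group comparison can be checked approximately from the counts given above. The sketch uses an uncorrected Pearson chi-square for a 2 × 2 table; whether the original analysis applied a continuity correction is not stated, so the exact *P* values may differ from those the authors obtained. The function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability of a
    chi-square distribution with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2.0))


# Rows: premature, term. Columns: VT not identified, VT identified.
x_time = chi_square_2x2(10, 25, 5, 15)
x_vo2 = chi_square_2x2(9, 26, 1, 19)
print(x_time, x_vo2)
print(round(100 * 15 / 55), round(100 * 10 / 55))  # percent uninterpretable
```

With these counts, neither comparison reaches *P* < 0.05 under the uncorrected test, which agrees with the statement that the difference between subgroups was not significant.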

There was little agreement between the two evaluators concerning uninterpretable plots: Only in 3 of the 15 subjects with uninterpretable X-time plots did both evaluators not define a VT. In the remainder, a VT was identified at least once by one of the evaluators. By using X-V̇O_{2} plots, agreement between evaluators was also low (one of ten subjects).

Mean VT determined by evaluator 1 from X-time plots and expressed as V̇O_{2} was 767 ± 222 mL·min^{−1} on the first occasion and 748 ± 193 mL·min^{−1} on the second. The respective values for evaluator 2 were 821 ± 260 and 805 ± 248 mL·min^{−1}. The difference between evaluators was significant (*P* < 0.05). Using X-V̇O_{2} plots, both evaluators found comparable VT (evaluator 1: 815 ± 195 and 826 ± 221 mL·min^{−1}; evaluator 2: 825 ± 192 and 823 ± 213 mL·min^{−1}; *P* > 0.05).

Intraindividual SD, used as an index of reliability, were smaller for X-V̇O_{2} plots than for X-time plots when VT was expressed as V̇O_{2} in mL·min^{−1} (median/range: 37.3/7.1–149.8 vs 69.8/9.3–390.7 mL·min^{−1}) or as V̇O_{2} relative to body weight in mL·min^{−1}·kg^{−1} (median/range: 1.31/0.21–5.91 vs 2.15/0.47–10.91 mL·min^{−1}·kg^{−1}). The difference approached but did not reach significance for either comparison (two-tailed Wilcoxon signed ranks test, *P* = 0.057 and *P* = 0.063).

By using the VT determinations from X-time plots in the 40 subjects in whom both evaluators could identify a VT on both occasions, intraevaluator ICC were 0.88 and 0.98 and interevaluator ICC were 0.82 and 0.79 when VT was expressed in mL V̇O_{2}·min^{−1}. When VT was expressed relative to body mass (in mL·min^{−1}·kg^{−1}), the respective values were 0.88 and 0.96 (intraevaluator ICC), and 0.76 and 0.74 (interevaluator ICC).

Table 2 displays the information on intraevaluator reliability using X-V̇O_{2} plots. ICC and the related SEM were slightly lower for children born prematurely compared to those born at term.

Table 3 summarizes the data on interevaluator reliability employing X-V̇O_{2} plots. Interevaluator reliability was somewhat lower in prematurely-born compared with term-born children. The interevaluator ICC and SEM were similar to the intraevaluator ICC and SEM.

Figure 1 shows the relationship between VT, as determined from X-V̇O_{2} plots, and V̇O_{2peak} in 41 subjects. The five subjects who did not fulfill the criteria for a maximal effort during the incremental cycling task were excluded from this analysis. Likewise, all subjects in whom one of the evaluators could not determine a VT on one occasion were excluded. VT was calculated as the average VT from all four determinations (2 evaluators, 2 occasions).

## DISCUSSION

When respiratory data were plotted over time, VT, identified by evaluator 1, was significantly lower compared with VT determined by evaluator 2. Plotting the data over V̇O_{2} yielded similar VT in both evaluators, indicating a better interevaluator reliability. In line with this interpretation, interevaluator ICC were lower for the X-time compared with the X-V̇O_{2} plotting mode. Furthermore, VT could be determined in fewer subjects when the data were plotted over time than when the data were displayed over V̇O_{2}, although the difference was not statistically significant.

Several investigators have used the X-time plotting mode to determine VT in children (^{11,29}), whereas others have employed plots of respiratory data over V̇O_{2} (^{4,14,16}). One advantage of using the time axis for VT determination is an even distribution of the respiratory data across the x-axis, which might enhance the detection of VT. However, it should be noted that the use of time as the x-axis variable might only be appropriate when the work rate increment is constant throughout the test, as in the present study. On the other hand, using time as the x-axis variable is not consistent with the definition of VT, namely an increase in pulmonary ventilation that is out of proportion to the increase in V̇O_{2} during graded exercise.

The advantage of plotting the data over V̇O_{2} rather than over time might be related to the additional plot of V̇CO_{2} over V̇O_{2}. By including this plot, the V-slope method (^{3}), which employs two linear regression lines between V̇CO_{2} and V̇O_{2}, can be applied to determine VT. It has been shown that the reliability of VT between separate exercise tests is quite good when the V-slope method is employed to determine VT (^{16,18}). The higher number of parameters available to determine VT from X-V̇O_{2} plots might, in itself, also have increased the reliability of this plotting mode (^{12,28}). Further studies will be necessary to determine how many and which respiratory parameters are necessary to obtain optimal reliability (and validity) of VT determination.
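To make the idea of two regression lines concrete, the following sketch fits one line to the lower and one to the upper part of the V̇CO_{2}-over-V̇O_{2} relationship and reports the V̇O_{2} at which they intersect. It is a simplified, brute-force stand-in and not the published V-slope algorithm (^{3}), and it is not how VT was obtained in this study, where evaluators read the plots visually. The function names, the minimum segment length, and the synthetic data are assumptions.

```python
def _fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def two_line_breakpoint(vo2, vco2, min_points=3):
    """VO2 at the intersection of the best-fitting pair of regression lines.

    vo2 must be sorted in ascending order with distinct values.
    Returns None if no split has a steeper upper line that crosses
    the lower line inside the measured VO2 range.
    """
    best = None
    for i in range(min_points, len(vo2) - min_points + 1):
        s1, b1, e1 = _fit_line(vo2[:i], vco2[:i])
        s2, b2, e2 = _fit_line(vo2[i:], vco2[i:])
        if s2 <= s1:
            continue  # VCO2 must rise faster above the threshold
        x_cross = (b1 - b2) / (s2 - s1)
        if not vo2[0] <= x_cross <= vo2[-1]:
            continue
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, x_cross)
    return None if best is None else best[1]


# Synthetic data: slope 0.8 up to 800 mL/min, slope 1.2 above.
vo2 = [500, 600, 700, 800, 900, 1000, 1100, 1200]
vco2 = [400, 480, 560, 640, 760, 880, 1000, 1120]
print(two_line_breakpoint(vo2, vco2))
```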

With data plotted over V̇O_{2}, VT could not be determined at least once by one of the evaluators on one occasion in 18.2% of the subjects. This is in agreement with the study by Washington et al. (^{29}), who could not determine VT in 34 of 185 healthy children (=18%). The problems in identifying VT in some subjects were attributed by Washington et al. (^{29}) to erratic breathing patterns blurring the clear-cut VT. Cooper et al. (^{4}) reported that VT could not be determined in 4% of 114 healthy subjects aged 6–17 yr. Ohuchi et al. (^{14}) found 12% uninterpretable plots in 25 cardiac patients aged 7–21 yr. The relatively higher number of uninterpretable plots in our data compared with the two latter studies might be explained by several factors. On average, subjects were younger in our study than those tested by Cooper et al. (^{4}) and Ohuchi et al. (^{14}). It has been proposed that the detection of VT is more difficult in younger children (^{14}). Another explanation might be that we excluded individuals from further analysis if one evaluator on one occasion could not detect a VT. In other studies, VT was determined by one or two evaluators on one occasion only. Had only one evaluator determined VT just once in the present study, the frequency of plots with unidentifiable VT on a given occasion would have been 2–9%. Using two evaluators on one occasion, the failure rate to detect a VT from X-V̇O_{2} plots would have been 9% and 12% for occasions 1 and 2, respectively, if all subjects in whom one or both evaluators could not determine VT on that occasion were treated as “no VT identified.”

ICC between duplicate determinations of VT by the same evaluator 6 wk apart were 0.94 and 0.95, respectively. We are aware of only one study on intraevaluator reliability of VT determination that was performed in adults and yielded a median ICC of 0.97 (^{9}). The relatively higher ICC in the study by Gladden et al. (^{9}) might reflect methodological differences. In their investigation, no time interval was scheduled between duplicate determinations. Moreover, duplicates were “secretly” put in among a set of plots. Possibly, the evaluators of that study noticed duplicates, leading to a very good reliability. This possibility was acknowledged by the authors (^{9}).

The agreement of VT determination (expressed as mL O_{2}·min^{−1}) between independent evaluators was quite high. In children, only one study attempted to assess interevaluator reliability by calculating interclass correlation coefficients (^{14}). Acknowledging the inappropriateness of this method for assessing reliability (^{27}), our interclass correlation coefficients were 0.86 and 0.91 compared with 0.91–0.96 reported by Ohuchi et al. (^{14}). The interevaluator intraclass correlation coefficients in our study were 0.92 and 0.96 for the two occasions, which is similar to the interevaluator ICC in most (^{17,26}) but not all (^{9}) studies in adults. The lower interevaluator reliability with a median ICC value of 0.70 reported by Gladden et al. (^{9}) might be explained by the use of time as the x-axis variable for the plots of respiratory data in their study.

Yeh et al. (^{31}) criticized the VT because of low consistency between four evaluators. They found a standard deviation of VT determinations (expressed in % V̇O_{2peak}) between four evaluators of about 8.0% V̇O_{2peak}, which was more than half of the between-subject variability of VT (13.7% V̇O_{2peak}). In our study, the between-evaluator standard deviation was 3.8% V̇O_{2peak} for the first assessment and 4.3% V̇O_{2peak} for the second assessment. The corresponding between-subject variability of VT was 31.6% V̇O_{2peak} and 26.2% V̇O_{2peak}, respectively. The ratio of interevaluator variability to intersubject variability was thus much lower in the present study. This indicates a lesser interference of interevaluator variation with discrimination between subjects than assumed previously. The small standard error of the measurement (Table 3) supports this conclusion.
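The ratios referred to above follow directly from the figures quoted in this paragraph. The short calculation below makes them explicit; the function name is hypothetical.

```python
def variability_ratio(between_evaluator_sd, between_subject_sd):
    """Interevaluator SD as a fraction of intersubject SD (both in % VO2peak)."""
    return between_evaluator_sd / between_subject_sd


yeh = variability_ratio(8.0, 13.7)        # Yeh et al.
first = variability_ratio(3.8, 31.6)      # present study, first assessment
second = variability_ratio(4.3, 26.2)     # present study, second assessment
print(round(yeh, 2), round(first, 2), round(second, 2))  # 0.58 0.12 0.16
```

Interevaluator variation thus amounted to roughly 58% of the intersubject variation in the data of Yeh et al., compared with roughly 12% and 16% here.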

Although interevaluator reliability of VT determinations was high for plots in which both evaluators identified VT on both occasions, there was little consistency between evaluators as to whether or not the VT could be identified at all. This finding impairs the interevaluator reliability overall and might compromise comparisons between different studies. For future studies, it might therefore be recommended that two evaluators should analyze the plots independently for VT and exclude those plots in which no agreement is reached on whether or not a VT is detectable. Alternatively, in these situations, a review of the plots by a third evaluator might be warranted.
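The recommendation can be written down as a simple decision rule. The sketch below is one possible reading: the status labels and the function name are hypothetical, and letting the third evaluator's verdict settle a disagreement is an assumption, since the text only suggests that such a review might be warranted.

```python
from typing import Optional


def plot_status(
    vt_eval1: Optional[float],
    vt_eval2: Optional[float],
    third_detects_vt: Optional[bool] = None,
) -> str:
    """Classify one plot from two independent readings.

    A reading of None means the evaluator judged the VT unidentifiable.
    third_detects_vt is only consulted when the two evaluators disagree.
    """
    found1 = vt_eval1 is not None
    found2 = vt_eval2 is not None
    if found1 and found2:
        return "interpretable"
    if not found1 and not found2:
        return "uninterpretable"
    if third_detects_vt is None:
        return "excluded"  # disagreement and no third review available
    return "interpretable" if third_detects_vt else "uninterpretable"


print(plot_status(810.0, 825.0))
print(plot_status(810.0, None))
print(plot_status(810.0, None, third_detects_vt=True))
```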

We found a strong correlation between V̇O_{2peak} and VT in the present study, suggesting a good validity of VT as a marker of aerobic capacity in children. Although some investigators found a weaker association between V̇O_{2peak} and VT in children (^{19}), our finding is in line with the results of other studies (^{4,23}). Cooper et al. (^{4}) and Rowland and Green (^{23}) reported correlation coefficients for the relationship between V̇O_{2peak} and VT of r = 0.92 and r = 0.85, respectively.

Traditionally, maximal V̇O_{2} has been regarded as the best single indicator of aerobic fitness (^{8}). However, maximal V̇O_{2} can only be measured at the end of a truly maximal exercise test. Furthermore, only a minority of children who can be subjected to such a test reach a plateau of V̇O_{2} (^{1}). Consequently, several objective and subjective criteria are required to determine a maximal effort, and the validity of this decision depends on the experience of the investigator (^{1}). Because the findings presented in this study contradict earlier reports of low interevaluator reliability of VT determinations (^{31}), VT can be used in addition to V̇O_{2peak} to assess aerobic fitness in all children who can be allowed to exercise maximally. In all other children who cannot or will not perform with a maximal effort, VT is a method of choice to determine aerobic fitness. The use of the VT in these circumstances is only limited by the fact that it cannot be identified in about 10–12% of subjects. In these cases, a repetition of the exercise test after habituation of the child to the laboratory environment and test procedures might yield a more clearly recognizable VT.

In conclusion, plotting gas exchange data over V̇O_{2} is likely to be the method of choice for determining VT. Although a minority of children have uninterpretable X-V̇O_{2} plots, VT can be reliably interpreted in the remainder. Furthermore, a close association between VT and V̇O_{2peak} suggests a good validity of VT as a marker of aerobic capacity. In summary, our findings do support the usefulness of VT as a measure of aerobic fitness in children.

This study was supported, in part, by a grant from the Sanitätsrat Dr. Emil Alexander Huebner und Gemahlin Stiftung im Stifterverband für die Deutsche Wissenschaft.

## REFERENCES

V̇O_{2max} changes in children following endurance training. Med. Sci. Sports Exerc. 21: 425–431, 1989.

O_{2} uptake in athletic boys. J. Appl. Physiol. 62: 2051–2057, 1987.

**Keywords:**

ANAEROBIC THRESHOLD; METHODOLOGY; PREMATURITY