Practical Considerations for Assessing Pulmonary Gas Exchange and Ventilation During Flume Swimming Using the MetaSwim Metabolic Cart : The Journal of Strength & Conditioning Research


Original Research


Lomax, Mitch; Mayger, Billy; Saynor, Zoe L.; Vine, Christopher; Massey, Heather C.

Journal of Strength and Conditioning Research 33(7):p 1941-1953, July 2019. | DOI: 10.1519/JSC.0000000000002801



Because of technological limitations, pulmonary oxygen uptake (Vo2) and ventilation (VE) during swimming have traditionally been determined using the Douglas bag (DB) method. Expired air has either been collected during swimming exercise (1,3,18,26,27) or collected after swimming has stopped, with backward extrapolation used to estimate end-swimming values (12,19,28,35). Although the DB method is considered the gold standard for assessing respiratory gas exchange (11,30), it has limitations. For example, it cannot detect rapid changes in ventilation or in the components of ventilation. Nor can it detect rapid changes in expired O2 or carbon dioxide (CO2) fractions (30), which makes it unsuitable for studying oxygen uptake kinetics, an approach that is gaining popularity in swimming research (31,33,37). It also places a much greater burden on the investigator than more contemporary, portable, open-circuit systems do (30).

A number of portable online metabolic carts (e.g., the Oxylog by P.K. Morgan, the K2 and K4b2 by Cosmed, the Oxycon by Jaeger, and Cortex's Metamax I, II, and 3B) have been used to assess VE, Vo2, and CO2 output (Vco2) during terrestrial activities. The accuracy of these and similar systems has typically been determined by comparing resting, submaximal, and vigorous/maximal exercise values with those obtained using the DB method (23–25,34,40). Compared with the DB method, the Metamax 3B, Oxycon, and K2 systems reportedly overestimate Vo2 by 3–14%, Vco2 by 3–17%, and VE by 4–8% during moderate and vigorous cycling and rowing exercise collectively (25,34,40). Test-retest variability (percentage difference or coefficient of variation [CV]) also differs widely between studies, ranging from <1 to 15% for Vo2 measured at rest and during moderate and vigorous exercise, 2–12% for maximal Vo2 (Vo2max) (23,24,34,40), 3–7% for submaximal Vco2, <1–6% for submaximal VE, and <5% for maximal VE (23,34,40).

Recent technological advances have led to the development of 2 aquatic-specific metabolic cart systems: the Cosmed Aquatrainer, which is used in conjunction with the Cosmed K4b2, and the Cortex MetaSwim (MS). Both systems can be used conventionally with a mask or in an aquatic environment through a specialized freestyle snorkel. The Aquatrainer is the more popular of the 2, with a number of agreement (4,15,20) and oxygen uptake kinetics (31,33,37) studies published using this system. However, compared with the mask and K4b2 assembly, the Aquatrainer has been shown to underestimate Vo2, Vco2, and VE during both submaximal and maximal cycling by 4–21% (15,20), with greater variability at maximal than at lower intensities (15). By contrast, Baldari et al. (4) reported only minimal differences in Vo2, Vco2, and VE but observed greater variation during swimming than during cycling.

Collectively, these studies demonstrate that perfect agreement and repeatability between and within different open-circuit spirometry approaches do not exist. This is not surprising, given that biological and technical variability will influence the data (5,13,24). Because the MS samples expired air at the mouth on a breath-by-breath basis, it is more versatile than the DB method and can provide information on pulmonary gas exchange that the DB method cannot. Its snorkel assembly is also less cumbersome than that of the Aquatrainer, the only other aquatic-specific alternative. However, it is currently unknown how the agreement and repeatability of MS-derived physiological data compare with those of other open-circuit spirometry approaches, whether on land or during swimming. Similarly, it remains to be seen whether the test-retest variability of the MS is small enough to permit detection of changes in physiological data over time or of differences between individuals (16).

The aims of this study were: (a) to determine the agreement between Vo2, Vco2, and VE using the MS and DB methods during flume swimming; and (b) to assess the repeatability of these and other MS-derived measures of pulmonary gas exchange and ventilation during flume-based swimming exercise of different intensities.


Experimental Approach to the Problem

The study consisted of 2 phases, preceded by a single familiarization session and a combined incremental and supramaximal verification test (Figure 1) to determine swimmers' Vo2max and gas exchange threshold (GET). Phase 1 was designed to assess the agreement between Vo2, VE, and Vco2 measured using the MS and DB methods when swimming at different intensities. It was also used to assess the within-trial and between-trial repeatability of Vo2, Vco2, and VE for the MS vs. DB methods. Phase 2, which took place on different days from phase 1, was designed to assess the between-day variation, and hence the repeatability, of MS-derived ventilatory measures (tidal volume [VT] and breathing frequency [fr]) and pulmonary gas exchange parameters during constant-velocity submaximal swimming based on critical velocity. In addition to Vo2, VE, and Vco2, these included the end-tidal partial pressures of O2 and CO2 (PETO2 and PETCO2, respectively).

Figure 1.
Schematic of the protocol. *Limit of tolerance and †order of DB and MS collections counterbalanced between participants. Duration denotes duration of the stage. Sample denotes time point (minutes) within a given stage of phase 1 that expired air was measured. DB = Douglas bag; MS = MetaSwim.

All testing was completed using the front crawl stroke in the same swimming flume (SwimEx 600-T Therapy pool; length 4.2 m, width 2.3 m, depth 1.5 m) housed within a climate-controlled chamber. Although swimmers could take part in both phases, only 2 swimmers completed both because of the required time commitment. In addition, during each Vo2max test and each experimental trial of phases 1 and 2, a flow turbine meter (Model 001, Current flow meter; ValePort, Totnes, United Kingdom) was used to independently assess flume speed.

Douglas Bag and MetaSwim Overview

Briefly, DB collections were made using a modified snorkel connected through standard respiratory tubing (32 OD; Hans Rudolph, Kansas, USA) to a DB rig containing multiple 150-L bags (Cranlea, Birmingham, United Kingdom). Standardized equations (11) were used to calculate Vo2 (standard temperature and pressure, dry [STPD]), Vco2 (STPD), and VE (body temperature and pressure, saturated with water vapor [BTPS]) from the measured fractions of expired O2 and CO2 (Rapidox 3100 gas analyzer; Sensotec, Cambridge, United Kingdom), bag volume (dry gas meter; Harvard Apparatus, Holliston, MA, USA), and expired air temperature (MCP multi digital thermometer, Shanghai, China).
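The standardized DB equations are not reproduced in the text. As a minimal sketch, assuming the conventional Haldane transformation, expired gas saturated with water vapor at the measured bag temperature, dry inspired room air with FIO2 = 0.2093 and FICO2 = 0.0004, and an Antoine-equation approximation for water vapor pressure, the conversion from bag volume and gas fractions to VE (BTPS), Vo2 (STPD), and Vco2 (STPD) could look as follows. The function names are illustrative and do not correspond to any software used in the study.

```python
"""Sketch of conventional Douglas bag calculations (Haldane transformation).

Assumptions (not taken from the study): bag gas is saturated with water
vapor at the measured bag temperature; inspired air is dry room air with
FIO2 = 0.2093 and FICO2 = 0.0004; pressures are in mmHg.
"""


def sat_vapor_pressure_mmhg(temp_c: float) -> float:
    """Saturated water vapor pressure (mmHg), Antoine equation for water (~1-100 C)."""
    return 10 ** (8.07131 - 1730.63 / (233.426 + temp_c))


def douglas_bag_gas_exchange(bag_volume_l, collection_min, bag_temp_c, baro_mmhg,
                             feo2, feco2, fio2=0.2093, fico2=0.0004):
    """Return VE (BTPS and STPD), Vo2 and Vco2 (STPD), all in L/min."""
    ph2o = sat_vapor_pressure_mmhg(bag_temp_c)
    ve_atps = bag_volume_l / collection_min  # ambient temperature and pressure, saturated
    temp_k = 273.15 + bag_temp_c

    # ATPS -> STPD: 0 C, 760 mmHg, dry
    ve_stpd = ve_atps * (273.15 / temp_k) * ((baro_mmhg - ph2o) / 760.0)
    # ATPS -> BTPS: 37 C, ambient pressure, saturated (PH2O at 37 C is ~47 mmHg)
    ve_btps = ve_atps * (310.15 / temp_k) * ((baro_mmhg - ph2o) / (baro_mmhg - 47.0))

    # Haldane transformation: inert gas (mainly N2) is neither taken up nor released
    vi_stpd = ve_stpd * (1.0 - feo2 - feco2) / (1.0 - fio2 - fico2)

    vo2 = vi_stpd * fio2 - ve_stpd * feo2
    vco2 = ve_stpd * feco2 - vi_stpd * fico2
    return {"ve_btps": ve_btps, "ve_stpd": ve_stpd, "vo2": vo2, "vco2": vco2}
```

For a hypothetical 100-L, 1-minute collection at 25° C and 760 mm Hg with FEO2 = 0.17 and FECO2 = 0.035, `douglas_bag_gas_exchange(100, 1, 25, 760, 0.17, 0.035)` gives a Vo2 of roughly 3.6 L·min−1 and a Vco2 of roughly 3.1 L·min−1.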

Breath-by-breath VE, VT, fr, Vo2, Vco2, PETO2, and PETCO2 were measured using the MS. A Triple V digital flow sensor (manufacturer-reported resolution of 7 ml and accuracy of ±2%) was placed at the end of the snorkel and protected by 2 lightweight splash protectors. The snorkel contained a twin tube, which was connected to a tube-in-tube gas sample line through a hydrophobic filter (Figure 2). Expired O2 and CO2 were sampled at the mouth and analyzed by an electrochemical O2 sensor and a nondispersive infrared CO2 sensor housed within the MS device (Figures 2 and 3).

Figure 2.
MetaSwim and snorkel. For clarity, only one splash protector is shown.
Figure 3.
Participant swimming while instrumented with the MetaSwim.

Before testing, the gas analyzer and MS were calibrated using ambient air and gases of known concentration in accordance with the manufacturers' instructions, and the MS flow sensor was calibrated using the 3-L calibration syringe supplied with the MS (Cortex, Leipzig, Germany).


Sixteen trained club-level competitive swimmers (10 females; aged 19–39 years) volunteered for this study, which consisted of 2 phases. Mean absolute maximal Vo2 (Vo2max), measured during front crawl at the start of the study, was 3.49 L·min−1; mean ± SD body-mass-relative Vo2max, age, body mass, and stature were 48.5 ± 10.7 ml·kg−1·min−1, 22 ± 5 years, 72.0 ± 10.4 kg, and 1.75 ± 0.07 m, respectively. All participants provided fully informed written consent, and ethical approval was granted by the University of Portsmouth's institutional review board before the study commenced.


Familiarization and Vo2max Determination

Before any testing took place, participants were familiarized with the operation of the swimming flume and became fully accustomed to swimming in it while wearing the relevant snorkels. All swimmers had used the flume and a snorkel before familiarization, either through participation in other swimming research studies or, in the case of the snorkel, in training. Participants then determined a self-selected warm-up velocity that could be comfortably sustained for 10 minutes without any increase in perceived effort. This velocity (0.93 ± 0.09 m·s−1) was used as the warm-up and cool-down velocity for all subsequent phase 1 and 2 tests. The familiarization session lasted approximately 20 minutes.

The Vo2max test was completed in the same session as familiarization, after a 15-minute rest. After a 5-minute warm-up, swimmers completed a progressive-intensity swimming test consisting of 2-minute stages, with velocity increased by 0.05–0.10 m·s−1 at the end of each stage until the limit of tolerance (inability to maintain velocity). Swimmers then undertook a 5-minute cool-down, followed by 10 minutes of passive seated rest on poolside. Participants then completed a supramaximal constant-velocity test to verify that their measured Vo2peak reflected Vo2max: after a 3-minute warm-up, velocity was stepped up to 105% of the final velocity each swimmer had achieved during the incremental test (adapted from Ref. 36), and participants swam at this velocity until reaching their limit of tolerance. The highest 10-second average Vo2 achieved during either the incremental or the verification test was taken to represent Vo2max (Figure 1).
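The text does not state how the 10-second averages were formed from breath-by-breath data. A minimal sketch, assuming an unweighted mean of all breaths whose timestamps fall within a rolling 10-second window that starts at a breath and ends no later than the final breath, is shown below; the helper names are hypothetical.

```python
from statistics import mean


def highest_window_average(times_s, values, window_s=10.0):
    """Highest unweighted mean of breath values within any rolling window.

    Assumes times_s is sorted (seconds from test start) and that each window
    starts at a breath and ends no later than the final breath.
    """
    if len(times_s) != len(values) or not values:
        raise ValueError("times_s and values must be non-empty and equal length")
    best = None
    for start in times_s:
        if start + window_s > times_s[-1]:
            break  # later windows would extend past the end of the recording
        window = [v for t, v in zip(times_s, values) if start <= t < start + window_s]
        avg = mean(window)
        if best is None or avg > best:
            best = avg
    if best is None:
        raise ValueError("recording is shorter than one window")
    return best


def vo2max_from_tests(incremental, verification, window_s=10.0):
    """Return (Vo2max, incremental peak, verification peak).

    Each test is a (times_s, vo2_values) pair; Vo2max is taken as the higher
    of the two peak window averages.
    """
    peak_inc = highest_window_average(*incremental, window_s=window_s)
    peak_ver = highest_window_average(*verification, window_s=window_s)
    return max(peak_inc, peak_ver), peak_inc, peak_ver
```

Averaging over a fixed time window rather than a fixed number of breaths keeps the averaging period comparable between swimmers whose breathing frequencies differ.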

The GET was identified from the incremental test by 2 independent observers trained in the technique, using the V-slope method, and was verified using the ventilatory equivalents for O2 and CO2 and the end-tidal gas tension methods (7,11). The GET was subsequently used to set the swimming velocities in phase 1.
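The V-slope method (7) locates the point at which the slope of Vco2 plotted against Vo2 increases. The sketch below is a simplified two-segment least-squares version, assuming Vo2 and Vco2 pairs (breath-by-breath or time-averaged) with distinct Vo2 values; it omits the data editing and smoothing steps of Beaver et al. and is not the visual procedure used by the observers in this study.

```python
def _line_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, residual SS)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("segment needs at least 2 distinct Vo2 values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def v_slope_breakpoint(vo2, vco2, min_points=3):
    """Vo2 at the intersection of the best-fitting pair of Vco2-vs-Vo2 lines."""
    pts = sorted(zip(vo2, vco2))
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    n = len(xs)
    if n < 2 * min_points:
        raise ValueError("too few data points")
    best = None
    # Try every split that leaves at least min_points in each segment and keep
    # the split with the smallest combined residual sum of squares.
    for k in range(min_points, n - min_points + 1):
        s1, b1, e1 = _line_fit(xs[:k], ys[:k])
        s2, b2, e2 = _line_fit(xs[k:], ys[k:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, s1, b1, s2, b2, xs[k])
    _, s1, b1, s2, b2, x_split = best
    if s2 == s1:
        return x_split  # parallel lines: fall back to the split point
    return (b1 - b2) / (s2 - s1)
```

In practice, a breakpoint found this way would still be checked against the ventilatory equivalents and end-tidal tensions, as described above.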

Phase 1 Procedure

Nine swimmers (5 females; age: 22 ± 6 years; height: 1.77 ± 0.06 m; body mass: 77.6 ± 8.8 kg; and Vo2max: 48.6 ± 13.3 ml·kg−1·min−1) completed 2 variable-intensity swimming tests (barometric pressure: 764 ± 4 mm Hg; ambient temperature: 20.1 ± 0.7° C; and water temperature: 27.7 ± 0.4° C). Swimmers completed a 5-minute warm-up (at a velocity not exceeding that of stage 1), followed by 10 minutes of swimming at an intensity 15% below GET (stage 1: low) and 10 minutes of swimming at the velocity immediately below GET (stage 2: mod) (modified from Ref. 4). Stage 1 and 2 velocities were chosen to ensure that participants would reach a steady state within 3 minutes, so that MS and DB collections could be made interchangeably during each 10-minute stage. Swimmers wore a nose clip throughout, along with either the MS snorkel connected to the MS metabolic cart or a modified snorkel connected to the DB rig, depending on which part of the stage was being collected.

Within each 10-minute stage (low, mod), one 5-minute half was designated as the MS collection phase and the other as the DB collection phase. Expired air was collected only during the final 2 minutes of each 5-minute phase (minutes 3–5), in two 60-second bouts. This allowed Vo2, Vco2, and VE to be calculated for each 60-second bout of the 2-minute MS and DB collection periods per stage (Figure 1).

After completion of stage 2, velocity was increased by 0.05–0.10 m·s−1 every 2 minutes until the limit of tolerance was reached (stage 3). The highest Vo2, VE, and Vco2 values observed during stage 3 were recorded as peak values. Because stage 3 involved non–steady-state swimming, expired air was collected continuously, in 60-second bouts within each 2-minute stage, using only the MS in one test and only the DB in the other. Whether the MS or DB was used for stage 3 of test 1 in participant 1 was determined by a coin toss and then counterbalanced for all subsequent participants. In test 2, stages 1 and 2 were collected in the same order as in test 1; however, if stage 3 was collected using the DB in test 1, it was collected using the MS in test 2, and vice versa (Figure 1). Thus, although the order of MS and DB collections and the number of 2-minute collection periods (excluding stage 3) were identical within each participant across the 2 variable-intensity tests, the order of MS and DB collections was counterbalanced between participants.
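One way to encode the counterbalancing scheme described above is sketched below. It assumes simple alternation between consecutive participants and, as a further assumption not stated in the text, ties the stage 1 and 2 collection order to the stage 3 method used in test 1; the function name and data layout are hypothetical.

```python
import random

OTHER = {"MS": "DB", "DB": "MS"}


def assign_collection_order(n_participants, seed=None):
    """Return a per-participant plan of MS/DB collection order for the 2 tests."""
    rng = random.Random(seed)
    first = rng.choice(["MS", "DB"])  # the "coin toss" for participant 1
    plan = []
    for p in range(n_participants):
        # Alternate the leading method between consecutive participants.
        lead = first if p % 2 == 0 else OTHER[first]
        # Order of the 2 five-minute halves in stages 1 and 2 (same in both tests).
        stages_1_2 = (lead, OTHER[lead])
        plan.append({
            "participant": p + 1,
            "test1": {"stages_1_2": stages_1_2, "stage_3": lead},
            # Stage 3 switches method in test 2; stages 1 and 2 keep their order.
            "test2": {"stages_1_2": stages_1_2, "stage_3": OTHER[lead]},
        })
    return plan
```

Passing a fixed `seed` makes the coin toss reproducible, which can be useful when the allocation has to be documented in advance.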

Phase 2 Procedure

Nine swimmers (6 females; age: 22 ± 7 years; height: 1.72 ± 0.07 m; body mass: 70.0 ± 13.2 kg; and Vo2max: 44.4 ± 7.8 ml·kg−1·min−1) completed 3 or 4 six-minute constant-velocity swimming tests (barometric pressure: 767 ± 2 mm Hg; ambient temperature: 24.1 ± 0.7° C; and water temperature: 27.8 ± 0.1° C) on different days. The velocity of these swims was based on critical velocity (VCrit: 1.08 ± 0.13 m·s−1), which was determined separately from the distance–time relationship of a 400-m (346.1 ± 48.7 seconds) and an 800-m (721.7 ± 95.5 seconds) time-trial pool swim, administered in counterbalanced order on separate days after a standardized competition warm-up (22). VCrit was chosen because it reflects the highest sustainable swimming intensity (14) and demarcates the heavy and severe exercise-intensity domains, providing a measure of swimming endurance (38).

Each 6-minute constant-velocity swimming test began with 10 minutes of seated rest. Participants were then instrumented with the MS snorkel and donned a nose clip, which they wore for the remainder of the trial. They then undertook 3 minutes of prone floating (baseline, during which a low current was switched on to aid buoyancy), followed immediately by 6 minutes of constant-velocity swimming at a pace 5% slower than VCrit (VCrit5% slower). After 30 minutes of seated poolside recovery, participants again floated for 3 minutes in the flume, followed immediately by 6 minutes of constant-velocity swimming at a pace 5% faster than VCrit (VCrit5% faster).
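Assuming the conventional two-distance approach, in which VCrit is the slope of the distance–time line through the 400-m and 800-m time trials, VCrit and the ±5% target velocities could be computed as sketched below. Applied to the group-mean times, the slope is about 1.06 m·s−1; this need not equal the reported mean of 1.08 m·s−1, which averages individual swimmers' values. The function names are illustrative.

```python
def critical_velocity(d1_m, t1_s, d2_m, t2_s):
    """Slope (m/s) of the distance-time line through two time trials."""
    if t2_s == t1_s:
        raise ValueError("time-trial durations must differ")
    return (d2_m - d1_m) / (t2_s - t1_s)


def target_velocities(v_crit, percent=5.0):
    """Return (slower, faster) velocities offset by +/- percent of VCrit."""
    return v_crit * (1 - percent / 100), v_crit * (1 + percent / 100)


# Group-mean 400-m and 800-m times reported in the text (illustration only).
v_crit = critical_velocity(400, 346.1, 800, 721.7)
slower, faster = target_velocities(v_crit)
```

In practice, the calculation would be applied to each swimmer's own time trials to set that swimmer's VCrit5% slower and VCrit5% faster velocities.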

Statistical Analyses

All data were first assessed for normality using the Shapiro-Wilk test and were found to be normally distributed. Vo2max is reported as the mean ± SD of all 16 swimmers. The agreement and within-trial DB and MS repeatability data (phase 1) were based on the 9 swimmers who completed phase 1, and the MS repeatability data (phase 2) on the 9 swimmers who completed phase 2.

Phase 1: Variable-Intensity Tests

Vo2peak, Vco2peak, and VEpeak were compared between methods (DB − MS) using limits of agreement (LoA), along with bias, random error, and 95% confidence intervals (CIs), in accordance with previously reported methods (5,9,10). Paired-samples t-tests (IBM SPSS, v24; α = 0.05) were used to test for significant bias between MS and DB measurements per stage and per variable.
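A minimal sketch of the LoA calculation is given below. It assumes the approximate standard errors given by Bland and Altman (10) (SE of the bias = SD/√n; SE of each limit ≈ √(3·SD²/n)) and a t critical value supplied by the user (for example, 2.306 for n = 9, i.e., 8 degrees of freedom), because the Python standard library has no t distribution. The paired t statistic is the bias divided by its SE. Names are illustrative.

```python
import math
from statistics import mean, stdev


def limits_of_agreement(db, ms, t_crit, z=1.96):
    """Bland-Altman bias, random error, 95% LoA and approximate 95% CIs (DB - MS)."""
    if len(db) != len(ms) or len(db) < 2:
        raise ValueError("need at least 2 paired observations")
    diffs = [a - b for a, b in zip(db, ms)]  # positive bias = MS reads lower than DB
    n = len(diffs)
    bias = mean(diffs)
    sd = stdev(diffs)
    random_error = z * sd
    lower, upper = bias - random_error, bias + random_error
    se_bias = sd / math.sqrt(n)
    se_limit = math.sqrt(3 * sd ** 2 / n)
    return {
        "bias": bias,
        "random_error": random_error,
        "loa": (lower, upper),
        "bias_ci": (bias - t_crit * se_bias, bias + t_crit * se_bias),
        "lower_ci": (lower - t_crit * se_limit, lower + t_crit * se_limit),
        "upper_ci": (upper - t_crit * se_limit, upper + t_crit * se_limit),
        "t_paired": bias / se_bias,  # paired t statistic with n - 1 df
    }
```

Because differences are taken as DB − MS, a negative lower limit indicates how far the MS may read above the DB, and a positive upper limit how far it may read below.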

Because heteroscedasticity was present in some stage 1 and 2 data, Vo2, Vco2, and VE were log-transformed (natural log) before analysis, and the results were antilogged and expressed as ratios (5,9,10). Accordingly, Vo2, Vco2, and VE were compared between DB and MS using ratio LoA, bias, random error, and 95% CIs, following the methods of Bland and Altman (9,10). Specifically, the values from the last 2 minutes of stages 1 and 2 of each variable-intensity test were averaged and compared between methods for each test. The replicate 2-minute averages from the 2 variable-intensity tests were analyzed as 2 separate repeatability studies, so that the agreement estimates from each test could be compared (5).
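On the log scale, ratio LoA follow from the same calculation applied to ln(DB) − ln(MS), with the results antilogged. The sketch below assumes strictly positive values and reports the ratio bias together with the ratio random error, which this article writes as "±" (i.e., the bias multiplied or divided by that factor). Names are illustrative.

```python
import math
from statistics import mean, stdev


def ratio_limits_of_agreement(db, ms, z=1.96):
    """Ratio bias, ratio random error and 95% ratio LoA for DB/MS (values > 0)."""
    log_diffs = [math.log(a) - math.log(b) for a, b in zip(db, ms)]  # ln(DB/MS)
    bias = mean(log_diffs)
    sd = stdev(log_diffs)
    return {
        "ratio_bias": math.exp(bias),            # geometric mean of DB/MS
        "ratio_random_error": math.exp(z * sd),  # multiply/divide the bias by this
        "loa": (math.exp(bias - z * sd), math.exp(bias + z * sd)),
    }
```

A ratio bias of 1.10, for example, would mean that DB values are on average 10% higher than MS values, and the LoA are symmetric about the bias on the multiplicative scale.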

To determine within-trial repeatability for the MS and DB, the two 60-second values of the 2-minute collection in each stage were compared using the CV and the repeatability coefficient (CR). The CV was calculated by dividing the SD by the mean and multiplying by 100 (2). The CR was calculated by multiplying the within-subject SD (the square root of the residual mean square) by 2.77 (1.96 multiplied by the square root of 2) (5,39); the CR accounts for both random and systematic error and is preferred over Pearson's r and the intraclass correlation coefficient (39). Because heteroscedasticity was evident in some stage 1 and 2 Vo2, Vco2, and VE data, these data were log-transformed (natural log) before the CR was calculated; the results were antilogged, expressed as ratios (including the geometric mean), and displayed along with the 95% LoA (1,5,9,39). In addition, the CV and CR for achieved velocity were calculated per stage (within-trial) and between tests; because only mechanical, not biological, variation would be present, these were calculated across all swimmers as a whole, and the velocity CR is expressed in the original units of measurement.
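A minimal sketch of these repeatability calculations follows. It assumes that each participant contributes 2 or more replicate values (unequal numbers are allowed, as with the 3 or 4 tests in phase 2), that the CV is computed per participant and then averaged, and that the ratio CR is obtained by computing the CR on natural-log values and antilogging the result. Function names are illustrative.

```python
import math
from statistics import mean, stdev


def within_subject_sd(replicates):
    """Square root of the residual (within-subject) mean square.

    replicates: one list per participant, each with >= 2 values; lists may
    differ in length (e.g., 3 or 4 repeat tests).
    """
    ss_within, df_within = 0.0, 0
    for values in replicates:
        m = mean(values)
        ss_within += sum((v - m) ** 2 for v in values)
        df_within += len(values) - 1
    return math.sqrt(ss_within / df_within)


def repeatability_coefficient(replicates):
    """CR = 1.96 * sqrt(2) * Sw (about 2.77 * Sw), in the original units."""
    return 1.96 * math.sqrt(2) * within_subject_sd(replicates)


def ratio_repeatability_coefficient(replicates):
    """CR computed on natural-log values and antilogged (values must be > 0)."""
    logs = [[math.log(v) for v in values] for values in replicates]
    return math.exp(repeatability_coefficient(logs))


def mean_cv_percent(replicates):
    """Per-participant CV (SD / mean * 100), averaged across participants."""
    return mean(100 * stdev(values) / mean(values) for values in replicates)
```

Pooling squared deviations across participants before taking the square root weights each participant by their degrees of freedom, which is what the residual mean square of a one-way analysis of variance does.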

Phase 2: Swimming Above and Below VCrit

Along with measured velocity, the final minute of baseline and exercise data (VE, VT, fr, Vo2, Vco2, PETO2, and PETCO2) was averaged and compared across the 3 or 4 replicate tests. Repeatability was determined using the CR and CV as described for phase 1. Because some data were heteroscedastic, all CR comparisons were made using ratio data.


Vo2max and Vo2max Verification

The highest Vo2 value determined during the Vo2max test was 3.46 ± 0.90 L·min−1 (48.5 ml·kg−1·min−1). The supramaximal verification test produced a Vo2peak of 2.05 ± 0.53 L·min−1. In only 3 participants was Vo2peak higher in the verification test (by 0.14–0.20 L·min−1).

Phase 1: Variable-Intensity Tests

Velocities (CV in parentheses) at stages 1, 2, and 3 were 0.98 ± 0.14 m·s−1 (5.0 ± 2.8%), 1.15 ± 0.15 m·s−1 (4.8 ± 1.8%), and 1.47 ± 0.17 m·s−1 (3.2 ± 1.9%), respectively. The CR for velocity at stages 1, 2, and 3 was 0.15 m·s−1. The 95% lower and upper LoA were −0.04 and 0.25 m·s−1 for stage 1, 0.04 and 0.20 m·s−1 for stage 2, and −0.02 and 0.18 m·s−1 for stage 3. The GET occurred at 66 ± 7% (2.48 ± 0.63 L·min−1) of Vo2max.

Vo2peak (t = 1.588, p = 0.151), Vco2peak (t = 0.95, p = 0.37), and VEpeak (t = 1.25, p = 0.25) were not statistically different between MS and DB methods. Nevertheless, there was a tendency for absolute values to be lower during MS measurements, and both bias and random error were large (Table 1 and Figure 4).

Table 1.
Douglas bag (DB) vs. MetaSwim (MS) limits of agreement (LoA) and precision of LoA for Vo2peak, VEpeak, and Vco2peak determined during variable-intensity swimming.*
Figure 4.
Difference in Vo2peak (A), Vco2peak (B), and VEpeak (C) between DB and MS plotted against their mean values. Heavy line = bias; solid lines = bias ± 1.96 SD; p = significance of the bias; r = correlation between the absolute difference between DB and MS and their mean. DB = Douglas bag; MS = MetaSwim.

Bias (p > 0.05) and random error for Vo2, Vco2, and VE during low and moderate swimming velocities are presented in Table 2. For within-trial measurements, the CV and CR of MS-derived Vo2, Vco2, and VE were typically as good as, if not better than, those of the DB in both tests (Table 3).

Table 2.
Douglas bag (DB) vs. MetaSwim (MS) ratio limits of agreement (LoA) per stage and per variable-intensity test, including estimated precision of the LoA.*
Table 3.
Repeatability coefficient (CR), coefficient of variation (CV), and absolute data for Douglas bags (DBs) and MetaSwim (MS) during steady-state swimming per variable-intensity test: within-equipment comparisons.*

Phase 2: Swimming Above and Below VCrit

The CR and CV for velocity, VE, VT, fr, Vo2, Vco2, PETO2, and PETCO2 are presented in Table 4. The repeatability of the physiological parameters was better for exercising values than baseline values during both VCrit5% slower and VCrit5% faster.

Table 4.
MetaSwim absolute, repeatability coefficient (CR), and coefficient of variation (CV) data at rest (base) and when swimming (swim) 5% slower (VCrit5% slower) and 5% faster (VCrit5% faster) than VCrit.*†


The aims of this study were to assess the agreement between MS- and DB-derived measurements of Vo2, Vco2, and VE and to determine the repeatability of VE, VT, fr, Vo2, Vco2, PETO2, and PETCO2 measured using the MS during flume-based swimming exercise. Agreement between the MS and DB methods was poor, and the MS typically underestimated peak and submaximal Vo2, Vco2, and VE. However, the within-trial repeatability of the MS was at least as good as that of the DB, and the test-retest variability (CV) in VE, VT, fr, Vo2, Vco2, PETO2, and PETCO2 was consistent with that reported in the literature (23,24,34,40), although the CR was large.

When compared to the DB method, the MS underestimated Vo2peak by 13% (0.39 L·min−1), Vco2peak by 9% (0.26 L·min−1), and VEpeak by 11% (9.08 L·min−1) (Table 1). This is similar to the observations of Gayda et al. (15), who found that maximal Vo2, Vco2, and VE were underestimated by 15% (0.50 L·min−1), 6% (0.22 L·min−1), and 9% (10 L·min−1), respectively, when using the Aquatrainer system vs. the K4b2 face mask during cycle ergometry.

The MS also tended to underestimate (bias in parentheses) submaximal Vo2 (2–17%), Vco2 (2–11%), and VE (0–17%). This was slightly better than the underestimation in Vo2 (21%), Vco2 (2–14%), and VE (18%) reported by Gayda et al. (15) during submaximal (100 W) cycle ergometry, but worse than that observed by both Keskinen et al. (20) and Baldari et al. (4) when comparing the K4b2 face mask with the Aquatrainer system during cycle ergometry. Keskinen et al. (20) reported a pooled mean difference between the face mask and Aquatrainer of 5–7% (174 ml·min−1) for Vo2, 4–6% (138 ml·min−1) for Vco2, and 3–5% (3.05 L·min−1) for VE. Baldari et al. (4) reported even smaller differences in Vo2 (0.9–2.8 ml·min−1), Vco2 (5.1–11.3 ml·min−1), and VE (0.10–0.14 L·min−1). However, when only the Aquatrainer system was used, during either swimming or cycle ergometry, the mean difference in Vo2 was 3-fold higher during swimming, and that in Vco2 and VE 2-fold higher (4). This suggests that the variability in Vo2, Vco2, and VE is greater in an aquatic environment than in a terrestrial one.

Although no statistically significant bias in peak or submaximal Vo2, Vco2, and VE was observed, a high level of random error was present, and given the small sample size, it would have been difficult to detect statistically significant bias (2). The wide LoA for peak and submaximal Vo2, Vco2, and VE mean that if the same participants were tested again, Vo2peak determined using the MS could be as much as 1.84 L·min−1 below or 1.06 L·min−1 above DB values (Table 1). Submaximal MS-derived Vo2 may also underestimate or overestimate DB values by as much as 35% during low-intensity swimming and 78% during moderate-intensity swimming because of measurement error alone (Table 2). This lack of agreement between DB and MS measurements is not acceptable. Although the mean difference observed across swimming intensities is consistent with that reported between DB and other metabolic carts for Vo2 (3–14%), Vco2 (3–17%), and VE (4–8%) (15,25,34,40), the data indicate that the MS and DB cannot be used interchangeably during flume swimming. Despite this, the within-test repeatability (CV and CR) for Vo2, Vco2, and VE during submaximal swimming was similar between MS and DB measurements for the 2 repeat tests, with the MS typically exhibiting better repeatability (Table 3).

Only 2 studies have examined the test-retest performance of metabolic carts, and both limited the number of comparisons to 2 (23,40). This lack of test-retest data is disappointing, especially because high breath-to-breath variability can create a low signal-to-noise ratio, reducing confidence in kinetic parameters and their interpretation (21).

The repeatability of VE, VT, fr, Vo2, Vco2, PETO2, and PETCO2 was worse at baseline than during swimming, with CVs ranging from 4 to 27% and ratio CRs of ±1.09–1.75 (Table 4). This could reflect the manner in which these data were collected. During the 3 minutes of prone floating (baseline), the flume was switched on and a current was applied to aid buoyancy. This created a small amount of natural sway and likely increased convective heat loss because of water flowing over the skin (29). Although a standard pool temperature of 28° C was used herein, this would not have been thermoneutral during floating (32). Some swimmers reported feeling cold and shivering during this phase, which would be expected to increase metabolic demand and thus Vo2 (29). These factors could have impaired the repeatability of the physiological data at baseline; during swimming, they would have been less of a problem because metabolic heat production would have increased.

All physiological variables measured during swimming (VE, VT, fr, Vo2, Vco2, PETO2, and PETCO2) produced a test-retest CV <9% (6–7% for Vo2; Table 4). This is consistent with the CV (24,34) or percentage difference (23,40) reported in the literature for Vo2 (<1–15%), Vco2 (3–7%), and VE (<1–6%) during treadmill exercise, cycle ergometry, or rowing ergometry. These differences have been shown to be inversely related to work rate (24,30,34). Few studies have examined the repeatability of VT and fr, and none have examined PETO2 and PETCO2. The CVs observed here for VT (8%) and fr (4–6%) are better than the 12% reported for VT and similar to the 5% reported for fr (15).

Although the exercising CV data of this study are consistent with those of others, this does not mean that the test-retest variability is inconsequential. The LoA for all CR analyses were wide: with ratio CRs of up to ±1.26 for Vo2 and ±1.34 for VE (the worst CRs observed for any parameter during swimming at either intensity), Vo2 and VE could vary by as much as 26 and 34%, respectively, in the same participants during repeat testing. A change of at least this magnitude would be needed in future trials to be 95% confident that a real change in, or difference between, Vo2 and VE values was evident (5,39). This level of variability was similar for Vco2 and fr but slightly better for VT, PETO2, and PETCO2 (Table 4). Whether the MS can detect a real change, and is therefore suitable for evaluative purposes, will depend on the size of the change expected or the minimum difference considered meaningful (16).
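To illustrate how a ratio CR might be applied to a future repeat test, the sketch below flags a change only when the ratio of 2 measurements falls outside the interval from 1/CR to CR. This is an illustration, not a procedure used in the study, and the function name is hypothetical. On this multiplicative scale, a ratio CR of 1.26 corresponds to a rise of 26% or a fall of about 21%.

```python
def exceeds_ratio_cr(first, second, ratio_cr):
    """True if second/first lies outside [1/ratio_cr, ratio_cr] (values > 0)."""
    if first <= 0 or second <= 0 or ratio_cr <= 1:
        raise ValueError("values must be positive and ratio_cr must exceed 1")
    ratio = second / first
    return ratio > ratio_cr or ratio < 1 / ratio_cr
```

For example, with the Vo2 ratio CR of 1.26, a hypothetical increase from 3.0 to 3.6 L·min−1 (ratio 1.20) would not be flagged, whereas an increase to 3.9 L·min−1 (ratio 1.30) would.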

The hydrodynamic and fluid-flow differences between flume and pool swimming affect stroke characteristics: at a given velocity, stroke cycle duration is shorter, stroke rate is higher, and the catch and glide phases are reduced during flume vs. pool swimming (17). It is not clear whether such changes to routine stroke kinematics affect the variability of physiological data during flume swimming; swimmers had some experience of swimming in the flume before data collection, but this was limited. Any such effect could be exacerbated if velocity control is more variable in a flume because of inherent mechanical variation. In this study, the CR for velocity was ±0.15 m·s−1 during submaximal and maximal swimming in phase 1, with a CV as high as 5%. Phase 2 was slightly better, with a ratio CR of ±1.09 for VCrit5% slower and ±1.13 for VCrit5% faster and a test-retest CV of <3%. This CV is worse than that reported (<1%) between target and achieved velocity when swimming at the same relative intensities (VCrit5% slower and VCrit5% faster) in an indoor swimming pool (22).

In light of this, day-to-day repeatability might improve if data were collected in a swimming pool rather than a flume. Baseline variability could probably also be reduced by decreasing the likelihood of shivering, for example by shortening the period over which baseline data are collected while floating (although reducing this to less than 3 minutes is questionable), by increasing the water temperature, or by undertaking baseline measurements on poolside. In this study, prone floating baseline measurements were recorded in the flume to reflect the body position and environment experienced during front crawl. These recommendations require testing, and data would still be subject to the biological variability occurring between replicate tests, which can account for as much as 90% of the total variability in Vo2 (5,13,24). In addition, breathing during front crawl is constrained by the swimming stroke. How this affects the repeatability of VE, fr, PETO2, and PETCO2 compared with freely breathing activities and with other swimming strokes has not been investigated.

It should also be acknowledged that all metabolic carts can be affected by errors arising from sensor nonlinearity and from temporal mismatch between ventilation and gas fraction signals during breath-by-breath sampling (30). The water environment itself could exacerbate such errors and contribute to the level of random error observed. For example, the hydrophobic filter separating the tube-in-tube sample line from the twin tube can become saturated, allowing water to be drawn into the analyzer. Although Drierite is used to reduce water vapor in the sample line, it was more effective when the tube was placed vertically rather than horizontally, as recommended by the manufacturers. Water intrusion into the twin tube was further reduced by wrapping the filter and other snorkel and electrical interfaces in disposable plastic paraffin film (Parafilm laboratory film; American National Can, Greenwich, CT, USA).

Finally, condensation of expired air was frequently observed inside the snorkel, indicating that the expired air cooled between leaving the mouth and reaching the flow sensor. The flow measured at the flow sensor would therefore not exactly equal the flow at the mouth (8). Furthermore, the temperature sensor is located within the MS analyzer unit rather than within the snorkel or flow sensor housing. Given that the majority of variation in Vo2 measured with metabolic carts arises from the measurement of ventilation (6), temperature-differential errors could have increased the variability in VE and, in turn, in Vo2 and Vco2.
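The possible size of this effect can be estimated with the ideal gas law. Assuming constant pressure, gas saturated with water vapor at both the mouth and the sensor, and that condensed water does not reach the sensor (so the amount of dry gas is conserved), the volume at the sensor relative to the volume leaving the mouth is (T_sensor/T_mouth) × (PB − PH2O,mouth)/(PB − PH2O,sensor). The temperatures used below are hypothetical, not values measured in this study, and the Antoine constants for water are a standard approximation.

```python
def sat_vapor_pressure_mmhg(temp_c):
    """Saturated water vapor pressure (mmHg), Antoine equation for water (~1-100 C)."""
    return 10 ** (8.07131 - 1730.63 / (233.426 + temp_c))


def saturated_volume_ratio(t_mouth_c, t_sensor_c, baro_mmhg=760.0):
    """Volume at the sensor divided by volume at the mouth for the same gas.

    Dry-gas amount is conserved: Pdry1 * V1 / T1 = Pdry2 * V2 / T2, where
    Pdry = barometric pressure - saturated water vapor pressure.
    """
    t1, t2 = 273.15 + t_mouth_c, 273.15 + t_sensor_c
    dry1 = baro_mmhg - sat_vapor_pressure_mmhg(t_mouth_c)
    dry2 = baro_mmhg - sat_vapor_pressure_mmhg(t_sensor_c)
    return (t2 / t1) * (dry1 / dry2)


# Hypothetical example: gas leaves the mouth at 34 C and reaches the sensor at 28 C.
ratio = saturated_volume_ratio(34, 28)
```

Under these assumptions the ratio is roughly 0.965, i.e., about 3.5% less volume at the sensor, which shows why the location of the temperature sensor matters for the correction applied to measured flow.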

Practical Applications

The test-retest data of the MS are consistent with those of other metabolic carts, suggesting similar levels of repeatability. The test-retest performance of the MS and DB method is similar, with the MS typically exhibiting smaller CV and CR values. The MS is more convenient to use than the DB method, making it appealing for practical use, and its breath-by-breath data collection makes it more versatile. For example, in addition to the traditional parameters of Vo2, Vco2, and VE that can be assessed with the DB method, pulmonary oxygen uptake kinetics, the GET, and rapid changes in ventilation and in expired O2 and CO2 fractions can be examined with the MS. Although the MS can be used to assess the response of such parameters to training, its day-to-day variability is not inconsequential. Whether the MS is suitable as an evaluative tool will therefore depend on the size of the effect one wishes to detect.

The poor agreement and wide LoA between the MS and DB indicate that the two methods cannot be used interchangeably during flume swimming. Biological and technical variability make perfect agreement very unlikely, and the disparity between MS- and DB-derived Vo2, Vco2, and VE values is consistent with that reported between other metabolic carts and the DB method. Moreover, because the MS provides a greater range of physiological data, it is unlikely that the two methods would be used interchangeably in practice.
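LoA of the kind discussed here are conventionally computed following Bland and Altman (10): the bias is the mean difference between methods, and the 95% limits are the bias ± 1.96 × the SD of the differences. The sketch below assumes one MS-DB pair per swimmer (repeated measurements per swimmer require the extension described in (9)) and uses hypothetical Vo2 values and a hypothetical maximum acceptable difference. Interchangeability is judged by whether both limits fall within that predefined threshold.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a: list[float], method_b: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower LoA, upper LoA) for paired measurements, A minus B."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # 95% limits, assuming roughly normal differences
    return bias, bias - half_width, bias + half_width


# Hypothetical Vo2 (L/min) measured with the MS and DB for five swimmers.
ms = [3.30, 2.95, 3.60, 3.05, 3.40]
db = [3.10, 2.85, 3.40, 2.95, 3.20]
bias, lower, upper = limits_of_agreement(ms, db)

# Hypothetical maximum difference (L/min) that would be acceptable in practice.
max_acceptable = 0.15
interchangeable = -max_acceptable <= lower and upper <= max_acceptable
print(f"bias = {bias:.3f}, LoA = {lower:.3f} to {upper:.3f}, interchangeable: {interchangeable}")
```

The acceptable difference must be set before the comparison on physiological or practical grounds; the statistics only describe agreement and cannot decide on their own whether it is good enough.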


The authors thank the swimmers who took part and Ms. Anne-Marie Smith.


1. Altman DG, Bland JM. Measurement in medicine: The analysis of method comparison studies. Stat 32: 307–317, 1983.
2. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med 26: 217–238, 1998.
3. Åstrand PO, Saltin B. Maximal oxygen uptake and heart rate in various types of muscular activity. J Appl Physiol 16: 977–981, 1961.
4. Baldari C, Fernandes RJ, Meucci M, Ribeiro J, Vilas-Boas JP, Guidetti L. Is the new Aquatrainer snorkel valid for VO2 assessment in swimming? Int J Sports Med 34: 336–344, 2013.
5. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: Analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 31: 466–475, 2008.
6. Bassett DR, Howley ET, Thompson DL, King GA, Strath SJ, McLaughlin JE, et al. Validity of inspiratory and expiratory methods of measuring gas exchange with a computerized system. J Appl Physiol 91: 218–224, 2001.
7. Beaver WL, Wasserman K, Whipp BJ. A new method for detecting anaerobic threshold by gas exchange. J Appl Physiol 60: 2020–2027, 1986.
8. Beaver WL, Wasserman K, Whipp BJ. On-line computer analysis and breath-by-breath graphical display of exercise function tests. J Appl Physiol 34: 128–132, 1973.
9. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 8: 135–160, 1999.
10. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1: 307–310, 1986.
11. Cooper CB, Storer TW. Data Integration and Interpretation. In: Exercise Testing and Interpretation: A Practical Approach. Cambridge, United Kingdom: Cambridge University Press, 2001. pp. 149–180.
12. Costill DL, Kovaleski J, Porter D, Kirwan J, Fielding R, King D. Energy expenditure during front crawl swimming: Predicting success in middle-distance events. Int J Sports Med 6: 266–270, 1985.
13. de Vet HCW, Terwee CB, Knol DL, Bouter LM. When to use agreement versus reliability measures. J Clin Epidemiol 59: 1033–1039, 2006.
14. Dekerle J, Sidney M, Hespel JM, Pelayo P. Validity and reliability of critical speed, critical stroke rate, and anaerobic capacity in relation to front crawl swimming performances. Int J Sports Med 23: 93–98, 2002.
15. Gayda M, Bosquet L, Juneau M, Guiraud T, Lambert J, Nigam A. Comparison of gas exchange data using the Aquatrainer system and the facemask with Cosmed K4b2 during exercise in healthy subjects. Eur J Appl Physiol 109: 191–199, 2010.
16. Guyatt G, Walter S, Norman G. Measuring change over time: Assessing the usefulness of evaluative instruments. J Chronic Dis 40: 171–178, 1987.
17. Guignard B, Rouard A, Chollet D, Ayad O, Bonifaz M, Vedova DD, et al. Perception and action in swimming: Effects of aquatic environment on upper limb inter-segmental coordination. Hum Mov Sci 55: 240–254, 2017.
18. Holmér I. Oxygen uptake during swimming in man. J Appl Physiol 33: 502–509, 1972.
19. Kapus J, Ušaj A, Kapus V, Štrumbelj B. Assessment of ventilation during swimming using backward extrapolation of the ventilation recovery curve. Kinesiol 36: 69–74, 2004.
20. Keskinen KL, Rodríguez FA, Keskinen OP. Respiratory snorkel and valve system for breath-by-breath gas analysis in swimming. Scand J Med Sci Sports 13: 322–329, 2003.
21. Koga S, Shiojiri T, Kondo N. Measuring VO2 kinetics: The practicalities. In: Oxygen Uptake Kinetics in Sport, Exercise and Medicine. Jones AM, Poole DC, eds. Abingdon, United Kingdom: Routledge, 2005. pp. 39–61.
22. Lomax M, Thomaidis SP, Iggleden C, Toubekis AG, Tiligadas G, Tokmakidis SP, et al. The impact of swimming speed on respiratory muscle fatigue during front crawl swimming: A role for critical velocity? Int J Swimming Kinetics 2: 14–29, 2013.
23. Lucia A, Fleck SJ, Gotshall RW, Kearney JT. Validity and reliability of the Cosmed K2 instrument. Int J Sports Med 14: 380–386, 1993.
24. Macfarlane DJ. Automated metabolic gas analysis systems. Sports Med 31: 841–861, 2001.
25. Macfarlane DJ, Wong P. Validity, reliability and stability of the portable Cortex Metamax 3B gas analysis system. Eur J Appl Physiol 112: 2539–2547, 2012.
26. Magel JR, Foglia GF, McArdle WD, Gutin B, Pechar GS, Katch FI. Specificity of swim training on maximum oxygen uptake. J Appl Physiol 38: 151–155, 1974.
27. McArdle WD, Glaser RM, Magel JR. Metabolic and cardiorespiratory response during free swimming and treadmill walking. J Appl Physiol 30: 733–738, 1971.
28. Montpetit RR, Léger LA, Lavoie JM, Cazorla G. VO2 peak during free swimming using the backward extrapolation of the O2 recovery curve. Eur J Appl Physiol 47: 385–391, 1981.
29. Nadel ER, Holmér I, Bergh U, Åstrand PO, Stolwijk JAJ. Energy exchanges of swimming man. J Appl Physiol 36: 465–471, 1974.
30. Overstreet BS, Bassett DR, Crouter SE, Rider BC, Parr BB. Portable open-circuit spirometry systems. J Sports Med Phys Fitness 57: 227–237, 2017.
31. Pelarigo JG, Machado L, Fernandes RJ, Greco CC, Vilas-Boas JP. Oxygen uptake kinetics and energy system's contribution around maximal lactate steady state swimming intensity. PLoS One 12: e0167263, 2017.
32. Pendergast DR, Lundgren CEG. The underwater environment: Cardiopulmonary, thermal and energetic demands. J Appl Physiol 106: 276–283, 2009.
33. Reis JF, Alves FB, Bruno PM, Vleck V, Millet GP. Effects of aerobic fitness on oxygen uptake kinetics in heavy intensity swimming. Eur J Appl Physiol 112: 1689–1697, 2012.
34. Rosdahl H, Gullstrand L, Salier-Eriksson J, Johansson P, Schantz P. Evaluation of the Oxycon mobile metabolic system against the Douglas bag method. Eur J Appl Physiol 109: 159–171, 2010.
35. Rodríguez FA. Maximal oxygen uptake and cardiorespiratory response to maximal 400-m free swimming, running and cycling tests in competitive swimmers. J Sports Med Phys Fitness 40: 87–95, 2000.
36. Saynor ZL, Barker AR, Oades PJ, Williams CA. Impaired pulmonary VO2 kinetics in cystic fibrosis depend on exercise intensity. Med Sci Sports Exerc 48: 2090–2099, 2016.
37. Sousa AC, Vilas-Boas JP, Fernandes RJ. VO2 kinetics and metabolic contributions whilst swimming at 95, 100, and 105% of the velocity at VO2max. Biomed Res Int 2014: 675363, 2014.
38. Toussaint HM, Wakayoshi K, Hollander AP, Ogita F. Simulated front crawl swimming performance related to critical speed and critical power. Med Sci Sports Exerc 30: 144–151, 1998.
39. Vaz S, Falkmer T, Passmore AE, Parsons R, Andreou P. The case for using the repeatability coefficient when calculating test-retest reliability. PLoS One 8: e73990, 2013.
40. Vogler AJ, Rice AJ, Gore CJ. Validity and reliability of the Cortex MetaMax3B portable metabolic system. J Sports Sci 28: 733–742, 2010.

Douglas bags; oxygen uptake kinetics; reliability

© 2018 National Strength and Conditioning Association