The value of accurately estimating V˙O2max (mL·kg−1·min−1) has been highlighted in a recent large, population-based cohort study (14) from the Jebsen Center for Exercise in Medicine at the Norwegian University of Science and Technology. The study demonstrated that a simple estimation of V˙O2max can predict long-term cardiovascular disease and all-cause mortality. Hence, the accuracy and validity of estimating V˙O2max are paramount when reporting the association between V˙O2max and all-cause mortality.
Several studies have reported nonlinear associations between V˙O2max and age, and between V˙O2max and body mass (5,10,16). Hence, it was surprising that Nes et al. (13) adopted a linear model to estimate V˙O2max (mL·kg−1·min−1) that was subsequently used by Nes et al. (14) to predict long-term all-cause mortality and cardiovascular disease. The authors reported the following linear regression models to estimate V˙O2max (mL·kg−1·min−1) for men: 100.27 − (0.296 × age) − (0.369 × WC) − (0.155 × RHR) + (0.226 × PA index); for women: 74.74 − (0.247 × age) − (0.259 × WC) − (0.114 × RHR) + (0.198 × PA index), where WC is the waist circumference, RHR is the resting heart rate, and PA index is the physical activity index. The authors reported that their models were unable to detect any interaction or polynomial terms, i.e., the inclusion of such terms was unable “to influence the R2 of the models appreciably.”
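The constant slope implied by these linear models can be illustrated with a minimal sketch. The coefficients are those quoted above; the function name, the input values, and the assumption that WC is entered in centimeters are illustrative only and are not taken from Nes et al. (13).

```python
def nes_vo2max(sex, age, wc, rhr, pa_index):
    """Estimate VO2max (mL/kg/min) from the linear models quoted in the text.

    Assumption: wc is waist circumference in cm and rhr is in bpm.
    """
    if sex == "men":
        return 100.27 - 0.296 * age - 0.369 * wc - 0.155 * rhr + 0.226 * pa_index
    if sex == "women":
        return 74.74 - 0.247 * age - 0.259 * wc - 0.114 * rhr + 0.198 * pa_index
    raise ValueError("sex must be 'men' or 'women'")


# Hypothetical participant: only age changes between the two calls.
younger = nes_vo2max("men", age=30, wc=90, rhr=60, pa_index=5)
older = nes_vo2max("men", age=40, wc=90, rhr=60, pa_index=5)

# The predicted loss per decade is 10 x 0.296 regardless of the starting age.
decline_per_decade = younger - older
print(round(decline_per_decade, 2))
```

Repeating the comparison for ages 70 and 80 gives the same difference, which is the property questioned in the paragraphs that follow.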
There are at least three major concerns with these linear, additive models. First, both models suggest a linear decline with age that has the same rate (same slope parameter) for participants in their twenties as for those in their fifties, sixties, or eighties. However, there is evidence in the literature that indicates a curvilinear decline in V˙O2max with age, suggesting the need for a nonlinear or quadratic age term to be incorporated into the model; see Astrand and Rodahl (5), Figure 7–15 on p. 337, and Hawkins (10). The second concern is the absence of a weight/body-mass term in both models. Nevill et al. (19) and Astrand and Rodahl (5), in their Figure 9–4 on p. 400, reported a strong negative association between V˙O2max (mL·kg−1·min−1) and body mass. This is because absolute V˙O2max (L·min−1) scales with body mass raised to a power (M^0.67), and hence when researchers calculate V˙O2max (mL·kg−1·min−1) by dividing V˙O2max (L·min−1) by body mass (M), the resulting ratio “overscales,” leaving V˙O2max (mL·kg−1·min−1) proportional to M^−0.33. This nonlinear association with mass should have been considered by Nes et al. (13). Incorporating a power-function body mass term as a predictor in both models is likely to improve the accuracy when predicting V˙O2max (mL·kg−1·min−1).
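The overscaling argument can be written out in one step, taking the 0.67 mass exponent stated above as given:

```latex
\dot{V}\mathrm{O}_{2\max}\,(\mathrm{L\cdot min^{-1}}) \propto M^{0.67}
\quad\Longrightarrow\quad
\frac{\dot{V}\mathrm{O}_{2\max}\,(\mathrm{L\cdot min^{-1}})}{M}
\propto \frac{M^{0.67}}{M} = M^{0.67-1} = M^{-0.33}
```

The ratio standard therefore retains a negative dependence on body mass rather than removing it.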
Another major concern with these fitted models is the fact that the residuals from both linear models are unlikely to be (a) normally distributed (16) and (b) independent of the predictor variables (in particular age). If the residuals demonstrate a lack of normality and independence, then the validity of the models (i.e., the statistical significance of the estimated parameters) will be questionable. For example, we cannot be confident that the decline with age is linear, as discussed previously; fitting an alternative, biologically sound allometric model might have detected a nonlinear or curvilinear decline with age and a curvilinear power-function term in body mass. For a brief and concise history of allometric modeling, see Winter and Nevill (23).
Hence, the purpose of this study was to fit the same linear, additive model adopted by Nes et al. (13) to both estimated and directly measured V˙O2max (mL·kg−1·min−1) data from two previously published studies (study 1, the Allied Dunbar National Fitness Survey [ADNFS] (2,16), and study 2, data reported by Amara et al. (3)) and to compare the linear model with an alternative, proportional allometric model to discover whether the latter provides 1) a superior quality of fit (using R2, maximum log-likelihood [MLL], and the Akaike information criterion [AIC]), 2) more normally distributed residuals, and 3) a more plausible, biologically sound, and interpretable model.
METHODS (STUDY 1)
All variables and measurements used in the current study have been previously described and published (16) or reported in a technical report (7). Cardiopulmonary fitness or V˙O2max was assessed using a progressive incremental test on a motorized treadmill. In reality, the V˙O2max measurements are estimates based on the linear relationship (for each subject) between the oxygen cost and the HR, recorded breath by breath (n > 50) during a submaximal exercise test using an automated respiratory gas analyzer (Quinton Q-plex) and a diagnostic electrocardiogram (Quinton Q4000). The test continued until the end of a 1-min stage in which the subject's HR had reached 85% of estimated maximum for age (210 − 0.65 × age, bpm). For a given individual, the estimated V˙O2max is the predicted oxygen cost at an assumed maximum HR, taken to be 210 − 0.65 × age (11). All submaximal tests used to estimate V˙O2max are associated with an SE of prediction which is typically in the range of 10%–15% (5). One advantage of the protocol used in the ADNFS (2,7) is that the V˙O2 of each stage was directly measured, which eliminates variations in mechanical efficiency associated with using workload. However, the accuracy of the method is still dependent on the variability in predicted maximum HR, which in normal adult participants has been shown to have an SD of 10–12 bpm (4). The validity of the linear extrapolation method described by Lange-Anderson et al. (12) to predict V˙O2max using measured submaximal V˙O2 values to a predicted maximum HR has been assessed against directly determined treadmill V˙O2max, where it was shown to underpredict by 13% with an SE of 1.4 mL·kg−1·min−1 (9), which is within the range typically reported for estimations of V˙O2max.
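The extrapolation step described above can be sketched as follows. This is a minimal illustration, not the ADNFS software: the function name and the submaximal data points are hypothetical, and a simple least-squares line is assumed for the individual V˙O2 versus HR relationship.

```python
def estimate_vo2max(heart_rates, vo2_values, age):
    """Extrapolate a subject's submaximal VO2-HR line to the assumed maximum HR.

    heart_rates in bpm, vo2_values in mL/kg/min; maximum HR is taken as
    210 - 0.65 * age, as stated in the text.
    """
    n = len(heart_rates)
    if n < 2 or n != len(vo2_values):
        raise ValueError("need at least two paired HR and VO2 observations")
    mean_hr = sum(heart_rates) / n
    mean_vo2 = sum(vo2_values) / n
    sxx = sum((h - mean_hr) ** 2 for h in heart_rates)
    sxy = sum((h - mean_hr) * (v - mean_vo2)
              for h, v in zip(heart_rates, vo2_values))
    slope = sxy / sxx                      # least-squares slope of VO2 on HR
    intercept = mean_vo2 - slope * mean_hr
    hr_max = 210 - 0.65 * age              # assumed maximum HR for age
    return intercept + slope * hr_max


# Hypothetical, perfectly linear submaximal data: VO2 = 0.3 * HR - 10.
hr = [100, 120, 140, 155]
vo2 = [20.0, 26.0, 32.0, 36.5]
estimate = estimate_vo2max(hr, vo2, age=40)   # assumed maximum HR = 184 bpm
print(round(estimate, 1))
```

Because the estimate depends directly on the assumed maximum HR, an error of 10–12 bpm in that value shifts the extrapolated V˙O2max by the slope multiplied by that error.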
For our measure of physical activity, we adopted the number of 20-min bouts of vigorous exercise (VIGEX), defined as activities that were >7.5 kcal·min−1 or >60% of aerobic capacity reported during the 4 wk before the exercise test. There are well-established limitations to methods of physical activity assessment that rely on self-report, which have been shown to introduce measurement error and bias (1). However, in a preliminary study for the ADNFS, the recall of participants was shown to be consistent in more than 80% of repeat interviews that were completed 1 month apart (see the technical report by Fentem et al. [7], p. 11).
Waist girth measurements were obtained using a standardized protocol (see the technical report by Fentem et al. [7], p. 54). From behind the subject, the administrator identified the iliac crest and the 12th rib, keeping the second (index) and the fourth fingers on the sites. A mark, using a dermographic pencil, was put on the skin midway between the two sites using the third (middle) finger as an indicator. This was repeated on the other side of the body. The tape was placed around the waist to cover the two marked spots and to lie in a horizontal plane around the body. The subject was instructed to stand upright in the standard anatomical position and to breathe normally. The reading was noted at the onset of inhalation and of exhalation, and a mean value was recorded to the nearest millimeter.
RHR measurements were also obtained using a standard protocol for obtaining blood pressure and RHR using an automated sphygmomanometer (Accutorr 1, Data Corporation, Cambridge, UK; see the technical report by Fentem et al. [7], p. 57). Measurements were conducted after the anthropometry and flexibility test but before any strenuous tests. At least three measurements were recorded at 1-min intervals, after the participants had been seated with their legs uncrossed for at least 3 min. The value used for RHR was that associated with the lowest diastolic blood pressure measurement.
METHODS (STUDY 2)
A detailed description of subject selection and recruitment is provided by Amara et al. (3). Briefly, the subjects were independently living women (n = 146) and men (n = 152) who volunteered to participate in the study and indicated verbally that they were able to walk a distance of 80 m (self-paced walk test). Body mass M was assessed to the nearest 0.1 kg using calibrated Leverbalance scales (HealthOMeter, Inc., Bridgeview, IL), and body height was measured using a stadiometer to the nearest 0.1 cm with the subject standing, lightly clothed, and without footwear. Harpenden skinfold calipers (Harpenden, British Indicators Ltd., UK) were used to measure skinfold thickness at four sites (biceps, triceps, suprailiac, and subscapular) on the right side of the body. Total body density was estimated from the log of the sum of four skinfold measurements with the equation from Durnin and Womersley (6) for adults 50 yr and older. Percentage body fat and subsequent fat-free mass (FFM) were estimated using Siri's equation (21).
The methods for determining V˙O2max are also described by Amara et al. (3). In brief, while breathing through a mouthpiece with nose clips, subjects performed an incremental ramp test to volitional or symptom-limited fatigue on a motorized treadmill. The protocol consisted of a 4-min warm-up at 0.76 m·s−1 (1.7 mph) and a 0% gradient followed by gradient and/or speed changes such that oxygen uptake increased each minute by 1–3 mL·kg−1·min−1, and the total duration of the test was between 8 and 12 min. Subjects were encouraged verbally throughout the test to perform to the limit of their tolerance. Gas exchange and ventilatory variables were analyzed using a calibrated mass spectrometer (PerkinElmer MGA110) and a bidirectional turbine and volume transducer (SensorMedics VMM2A), respectively. HR was monitored throughout the test using a bipolar chest lead (CM5).
The physical activity of the participants in study 2 was assessed by the Minnesota Leisure Time Physical Activity questionnaire (22). Amara et al. (3) chose to include only the heavy intensity activity scores in their analysis because they should theoretically provide the greatest cardiorespiratory stimulus. The heavy intensity activities were those requiring >6 METs (1 MET = 3.5 mL·kg−1·min−1). This value was age adjusted based on previous data (D. H. Paterson, unpublished) from their laboratory to account for the age-associated decline in V˙O2max such that the male heavy intensity activity code decreased by 1.00% per year and the female heavy intensity activity code decreased by 1.04% per year for 55 yr and older. Each subject's heavy intensity physical activity was determined as time spent and energy expenditure (METs·yr−1).
As discussed previously, given that body mass M is likely to be strongly (albeit negatively) associated with V˙O2max (mL·kg−1·min−1) and allowing the possibility of a nonlinear association with age, we adopted the following multiplicative model with allometric body size components for study 1 as proposed by Amara et al. (3), Nevill and Holder (17), and Nevill et al. (18),
where ε is a multiplicative error ratio that assumes the error will be in proportion to V˙O2max (mL·kg−1·min−1), see Figure 1.
The model (equation 1) can be linearized with a log transformation. A linear regression analysis on log(V˙O2max) can then be used to estimate the unknown parameters in the log-transformed model; that is, the transformed model (equation 2) is now additive and conforms with the assumptions associated with ordinary least squares:
where the residual error log(ε) is assumed to be normally distributed and the intercept a and the other parameters b_i are allowed to vary for various categorical or group differences within the population (e.g., sex).
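As a sketch of the model structure described above, a multiplicative allometric model and its log-transformed counterpart take the following general form. The predictor set (body mass M, stature H, WC, age, age2, RHR, and VIGEX) is inferred from the terms discussed in the results for study 1, and the symbol names k1 to k3 and b1 to b4 are illustrative assumptions rather than the notation of the cited papers.

```latex
\dot{V}\mathrm{O}_{2\max} = M^{k_1}\, H^{k_2}\, WC^{k_3}\,
  \exp\!\left(a + b_1\,\mathrm{age} + b_2\,\mathrm{age}^2
  + b_3\,\mathrm{RHR} + b_4\,\mathrm{VIGEX}\right)\,\varepsilon

\log\!\left(\dot{V}\mathrm{O}_{2\max}\right) = k_1 \log(M) + k_2 \log(H) + k_3 \log(WC)
  + a + b_1\,\mathrm{age} + b_2\,\mathrm{age}^2
  + b_3\,\mathrm{RHR} + b_4\,\mathrm{VIGEX} + \log(\varepsilon)
```

Taking logarithms turns the product of power and exponential terms into a sum, so every unknown parameter enters linearly and ordinary least squares applies.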
Study 1 results using linear, additive models
Fitting a similar linear model for V˙O2max (mL·kg−1·min−1) as Nes et al. (13), we obtained the following equations for V˙O2max,
where the residual error ε is assumed to be normally distributed. Note that the PA index variable, used by Nes et al. (13), has been replaced by VIGEX, the number of 20-min bouts of vigorous exercise, defined as activities that were >7.5 kcal·min−1 or >60% of aerobic capacity reported during the 4 wk before the exercise test. The R2 was 0.638 (adjusted R2 = 0.636).
The residuals saved from the previous analysis were neither normally distributed (Kolmogorov–Smirnov statistic = 0.031, P < 0.001; Shapiro–Wilk statistic = 0.983) nor independent of either the predicted values (see Fig. 1) or the key predictor variable age; that is, the correlation of the absolute residuals with the predicted values was r = 0.173 (P < 0.001) and with age was r = −0.127 (P < 0.001). The lack of normality and the heteroscedastic residual errors observed in Figure 1 must cast serious doubt on the validity of the predictor variables (questioning the statistical significance of some of the fitted variables but, more likely, the lack of significance or absence of body mass or higher order polynomial terms, in particular an age2 term). The systematically increasing spread of residuals observed in Figure 1 and the negative correlation between absolute residuals and age must also cast serious doubt on the accuracy/precision of predicting V˙O2max, especially for young/fit participants with high estimates of V˙O2max (where the residual errors are at their widest/greatest, see Fig. 1).
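The heteroscedasticity check used here, a correlation between the absolute residuals and the predicted values, can be sketched as below. The helper names and the five data points are hypothetical and chosen only to show a residual spread that widens with the predicted value; they are not ADNFS data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def abs_residual_correlation(observed, predicted):
    """Correlate |observed - predicted| with the predicted values."""
    abs_residuals = [abs(o - p) for o, p in zip(observed, predicted)]
    return pearson_r(abs_residuals, predicted)


# Hypothetical example: the size of the error grows with the predicted value.
predicted = [20.0, 30.0, 40.0, 50.0, 60.0]
observed = [21.0, 28.0, 43.0, 46.0, 65.0]
r = abs_residual_correlation(observed, predicted)
print(round(r, 3))
```

A correlation close to zero would be consistent with a constant error spread; a clearly positive value, as in this constructed example, indicates errors that grow with the predicted V˙O2max.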
Study 1 results using allometric, multiplicative models
The parsimonious allometric model for V˙O2max (mL·kg−1·min−1) was found to be
The R2 was 0.653 (adjusted R2 = 0.651). The fitted age2 parameter was −0.000106 (SE = 0.000003; 95% confidence interval = −0.000112 to −0.0000995). Neither the age term nor the waist (WC) term was significant (P > 0.05). The residuals saved from the previous analysis were normally distributed (Kolmogorov–Smirnov statistic 0.021, P = 0.064; Shapiro–Wilk statistic 0.997) and acceptably independent of both the predicted values (see Fig. 2) and age; that is, the correlation of the absolute residuals with the predicted values (log-transformed) was r = −0.048 (P = 0.044) and with age was r = 0.033 (P = 0.169).
The negative age2 term within an exponential function is now biologically sound. The model now predicts that the age decline of V˙O2max will follow the right-hand side of the bell-shaped normal distribution type curve, see Figure 3, where the slope of age decline in V˙O2max is flat/zero at 0 yr (i.e., it reaches a plateau), and as age increases to old age, V˙O2max tends toward a zero asymptote; that is, it can never become negative unlike the negative linear age decline proposed and fitted by Nes et al. (13).
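The shape of this age term can be explored with a short sketch that uses only the fitted age2 parameter reported for study 1. It computes V˙O2max relative to its value at age 0 with all other predictors held fixed; the function names are illustrative, and the remaining model terms are deliberately omitted.

```python
import math

AGE2_PARAM = -0.000106  # fitted age2 parameter reported for study 1


def relative_vo2max(age):
    """Multiplier exp(b * age^2), equal to 1 at age 0 and always positive."""
    return math.exp(AGE2_PARAM * age ** 2)


def slope(age):
    """Derivative of the multiplier with respect to age."""
    return 2 * AGE2_PARAM * age * relative_vo2max(age)


def decade_drop(start_age):
    """Fall in the multiplier over the 10 yr that follow start_age."""
    return relative_vo2max(start_age) - relative_vo2max(start_age + 10)


# How much steeper is the decline in the 60s than in the 20s?
ratio_60s_to_20s = decade_drop(60) / decade_drop(20)
print(slope(0), round(ratio_60s_to_20s, 2))
```

The slope is zero at age 0, the multiplier never reaches zero however large the age, and the loss over the decade starting at 60 yr is between 1.5 and 2 times the loss over the decade starting at 20 yr.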
Study 2 results using linear, additive models
Fitting a linear model for V˙O2max (mL·kg−1·min−1) as proposed by Nes et al. (13) but using the variables available to Amara et al. (3) plus body mass (for the reasons described in the introduction), we obtained the following equations for V˙O2max:
where the residual error ε is assumed to be normally distributed. Note that the PA index variable, used by Nes et al. (13), has been replaced by the results from the Minnesota Leisure Time Physical Activity questionnaire (22). The R2 was 0.469 (adjusted R2 = 0.456).
As in study 1, the residuals saved from the linear, additive model were not normally distributed (Kolmogorov–Smirnov statistic 0.067, P = 0.007; Shapiro–Wilk statistic = 0.965, P < 0.001). The lack of normality must cast doubt on the validity of the predictor variables (questioning the statistical significance of some of the fitted variables but, more likely, the lack of significance or absence of higher order polynomial terms, in particular the age2 term).
Study 2 results using allometric, multiplicative models
The parsimonious allometric model for V˙O2max (mL·kg−1·min−1) was found to be
The R2 was 0.491 (adjusted R2 = 0.481). The fitted age2 parameter was −0.00011 (SE = 0.00001; 95% confidence interval = −0.000124 to −0.000087), and as in study 1, the linear age term was not significant (P > 0.05). The residuals saved from the previous analysis were acceptably normally distributed (Kolmogorov–Smirnov statistic 0.031, P > 0.200; Shapiro–Wilk statistic 0.995, P = 0.546).
The goodness of fit of the competing linear and allometric models
Clearly, because the models are not nested or hierarchical, a direct comparison between two competing model forms (linear vs allometric) is not possible using traditional criteria such as the residual sum of squares, the SE, and the coefficient of determination (R2). However, Nevill and Holder (16) and Nevill et al. (18) chose the maximum likelihood criterion and the AIC as their standard criterion of model assessment (quality of fit) that does not require the competing models to be either nested or hierarchical.
A simple modification of the MLL criterion is able to produce the AIC (AIC = −2 × [MLL] + 2 × [number of parameters fitted]) that would take into account the different number of fitted parameters in the two model structures to be compared, see goodness-of-fit data from both studies 1 and 2 (Table 1).
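The AIC calculation can be sketched as follows. The `aic` function implements the formula given above. The Gaussian log-likelihood helper and the change-of-variable (Jacobian) adjustment for a log-transformed response are standard results included here as assumptions about how such a comparison might be carried out; they are not a description of the procedure in the cited papers, and all function names are illustrative.

```python
import math


def aic(mll, n_params):
    """AIC = -2 x MLL + 2 x (number of parameters fitted)."""
    return -2.0 * mll + 2.0 * n_params


def gaussian_mll(residuals):
    """Maximized Gaussian log-likelihood given least-squares residuals."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n   # ML estimate of the variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def log_model_mll_original_scale(log_residuals, observed):
    """Log-likelihood of a model fitted to log(y), expressed on the y scale.

    Assumption: the density of y is the density of log(y) divided by y,
    so sum(log(y)) is subtracted before comparing with a model fitted to y.
    """
    return gaussian_mll(log_residuals) - sum(math.log(y) for y in observed)


# Formula check with hypothetical numbers: MLL = -100 and 5 fitted parameters.
example_aic = aic(-100.0, 5)
print(example_aic)
```

A smaller AIC indicates the preferred model, so an extra parameter is worthwhile only if it raises the MLL by more than 1.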
On the basis of the concerns discussed in the introduction, the results from both studies confirm that the allometric models proposed by Amara et al. (3), Nevill and Holder (17), and Nevill et al. (18) (equation 1) performed better than the linear model proposed by Nes et al. (13) in all three major areas of concern.
The goodness of fit is superior when fitting allometric models. The R2 was greater but more importantly the MLL was also greater, and the AIC was smaller, compared with the linear additive models (see Table 1).
Furthermore, the residuals from both studies saved from fitting the linear, additive models violate the assumption of normality and reveal evidence of heteroscedastic errors associated with both the predicted values and age. This calls into serious question 1) the selection (or more importantly the nonselection) of possible predictor variables and 2) the accuracy when predicting V˙O2max, in particular, of the young and fit individuals in study 1 (who had the greatest predicted V˙O2max) where the residual errors were at their greatest (see Fig. 1). By contrast, the log-transformed allometric model resulted in residuals from both studies that were normally distributed and, in the case of study 1, independent of both the predicted values and the key predictor variable age. When we fitted the quadratic in age in both studies, the parsimonious solution identified only an age2 term within an exponential function as the appropriate model to describe the age decline in V˙O2max (i.e., the right-hand side of a normal, bell-shaped frequency distribution curve). Note that because the age2 parameters in the allometric models fitted to study 1 and study 2 were very similar, the curvilinear decline with age will be almost identical (Fig. 3). These models, see Figure 3, are now biologically sound and interpretable. To illustrate this based on the results of study 1, compare the systematic errors likely if we use the linear model proposed by Nes et al. (13). The linear model predicts the age decline as 2.96 and 2.47 (mL·kg−1·min−1) per decade (for all ages and decades) for men and women, respectively. However, the more realistic age decline (see, e.g., Astrand and Rodahl (5), Fig. 7–15 on p. 337) using the allometric model (see Fig. 1) was only 2.58 and 1.80 (mL·kg−1·min−1) for men and women in their 20s, but almost double that rate, found to be 4.66 and 3.25 (mL·kg−1·min−1), for men and women in their 60s.
Further support for the allometric model (1) comes from the fitted stature/height and body mass exponents obtained in study 1, found to be M^−0.436 × H^0.790. Nevill et al. (19) anticipated that when researchers calculate V˙O2max (mL·kg−1·min−1) by dividing V˙O2max (L·min−1) by body mass M, the ratio “overscales,” leaving V˙O2max (mL·kg−1·min−1) theoretically proportional to M^−0.33. The fitted body mass exponent (−0.436; SE = 0.027) was greater in magnitude than that anticipated (−0.333) but confirms the need for its inclusion and the concern about its absence from the Nes et al. (13) linear models. However, when taken together, the two allometric body size components can be rearranged as (H^1.81 × M^−1)^0.436. This too has a sound biological interpretation, as the resulting index is a stature-to-body mass ratio that closely approximates the inverse body mass index, thought to be a measure of leanness (15,20). Clearly, having a greater lean body mass index, as described by Nevill and Holder (15), should also be strongly associated with predicting V˙O2max (mL·kg−1·min−1).
A similar “leanness” ratio was identified in study 2. The fitted FFM and body mass exponents were found to be M^−0.872 × FFM^0.679. Again taken together, the two allometric body size components can also be rearranged as (FFM^0.779 × M^−1)^0.872. The resulting FFM-to-body mass ratio is physiologically similar to the ratio reported in study 1, as a greater FFM is a strong determinant of V˙O2max (mL·kg−1·min−1) (8).
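The rearrangement of the two body size components into a single ratio raised to a power can be checked numerically. The sketch below uses only the fitted exponents reported for the two studies; the function name and the example height and mass are illustrative.

```python
def leanness_form(numerator_exp, mass_exp):
    """Rewrite X^p * M^q (q < 0) as (X^(p / -q) * M^-1)^(-q).

    Returns (inner exponent on X, outer exponent on the ratio).
    """
    outer = -mass_exp
    inner = numerator_exp / outer
    return inner, outer


study1 = leanness_form(0.790, -0.436)   # stature H and body mass M
study2 = leanness_form(0.679, -0.872)   # fat-free mass FFM and body mass M
print(round(study1[0], 2), round(study2[0], 3))

# Numerical check with a hypothetical height (cm) and mass (kg).
height, mass = 175.0, 75.0
direct = height ** 0.790 * mass ** -0.436
rearranged = (height ** study1[0] / mass) ** study1[1]
```

With a stature exponent close to 2, the study 1 ratio would be exactly the inverse of the body mass index; the fitted value of about 1.81 is why the text describes it as a close approximation.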
We acknowledge that the current study is not without limitations. The fact that we have demonstrated the benefits of allometric models for V˙O2max using just two data sets is not ideal. Clearly, future research should explore the benefits of allometric models using many more V˙O2max data sets, especially ones where linear, additive models such as those reported by Nes et al. (13) have been adopted/reported.
In summary, the quality of fit associated with predicting V˙O2max (mL·kg−1·min−1) using allometric models in both studies was superior to linear, additive models based on all criteria (R2, MLL, and AIC). Furthermore, it would appear that by fitting the linear, additive models proposed by Nes et al. (13), systematic errors are likely when predicting V˙O2max (mL·kg−1·min−1), see Figure 3. The linear models fitted to study 1 will systematically overestimate V˙O2max for participants in their 20s and systematically underestimate V˙O2max for participants in their 60s. The failure by Nes et al. (13) to identify curvature in their age decline or the presence of a body mass power function term might well have been explained by examining the residuals saved from their analyses. The residuals from the linear regression analysis from both study 1 and study 2 were neither normally distributed nor independent of the predicted values and key predictor variables such as age. This will almost certainly explain their possible invalid inclusion of some terms, or more likely the absence of other key variables such as body mass and the quadratic term in age2, both crucially identified using the allometric models proposed by Nevill and coworkers. Not only does the curvilinear age decline within an exponential function follow a more realistic age decline (right-hand side of the bell-shaped curve, see Astrand and Rodahl (5), Fig. 7–15 on p. 337), but the allometric models also identified a stature-to-body mass ratio (study 1) or an FFM-to-body mass ratio (study 2), both known to be associated with leanness, new insights that lead to a more plausible, biologically sound, and interpretable model when predicting V˙O2max (mL·kg−1·min−1).
This study was unfunded. The authors have no conflicts of interest. The results of the study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. The results of the present study do not constitute endorsement by the American College of Sports Medicine.
1. Ainsworth BE, Caspersen CJ, Matthews CE, Mâsse LC, Baranowski T, Zhu W. Recommendations to improve the accuracy of estimates of physical activity derived from self report. J Phys Act Health. 2012; 90(1 Suppl):S76–84.
2. Allied Dunbar National Fitness Survey. A Report on Activity Patterns and Fitness Levels: Main Findings. London: Sports Council and Health Education Authority; 1992. pp. 1–398.
3. Amara CE, Koval JJ, Johnson PJ, Paterson DH, Winter EM, Cunningham DA. Modelling the influence of fat-free mass and physical activity on the decline in maximal oxygen uptake with age in older humans. Exp Physiol
4. Arena R, Myers J, Williams MA, et al. Assessment of functional capacity in clinical and research settings: a scientific statement from the American Heart Association Committee on Exercise, Rehabilitation, and Prevention of the Council on Clinical Cardiology and the Council on Cardiovascular Nursing. Circulation
5. Astrand PO, Rodahl K. Textbook of Work Physiology. 3rd ed. New York: McGraw-Hill; 1986. p. 337.
6. Durnin JV, Womersley J. Body fat assessed from total body density and its estimation from skinfold thickness: measurements on 481 men and women aged from 16 to 72 years. Br J Nutr. 1974; 32:77–97.
7. Fentem PH, Collins MF, Tuxworth W, Allied Dunbar National Fitness Survey. Technical Report. London: Sports Council; 1994. p. 1–294.
8. Goran M, Fields DA, Hunter GR, Herd SL, Weinsier RL. Total body fat does not influence maximal aerobic capacity. Int J Obes Relat Metab Disord
9. Grant S, Corbett K, Amjad AM, Wilson J, Aitchison T. A comparison of methods of predicting maximum oxygen uptake. Br J Sports Med
10. Hawkins S, Wiswell R. Rate and mechanism of maximal oxygen consumption decline with aging: implications for exercise training. Sports Med
11. Jones N, Campbell E. Clinical Exercise Testing. Philadelphia: WB Saunders Co.; 1975. p. 214.
12. Lange-Anderson K, Shephard RH, Denoln H, Varnauskas E, Masironi R. Fundamentals of Exercise Testing. Geneva: World Health Organization; 1971. p. 134.
13. Nes BM, Janszky I, Vatten LJ, Nilsen TI, Aspenes ST, Wisløff U. Estimating V˙O2peak from a non-exercise prediction model: the HUNT study. Med Sci Sports Exerc
14. Nes BM, Vatten LJ, Nauman J, Janszky I, Wisløff U. A simple nonexercise model of cardiorespiratory fitness predicts long-term mortality. Med Sci Sports Exerc
15. Nevill AM, Holder RL. Body mass index: a measure of fatness or leanness? Br J Nutr
16. Nevill AM, Holder RL. Modelling maximum oxygen uptake—a case study in non-linear regression model formulation and comparison. J R Stat Soc Ser C Appl Stat
17. Nevill AM, Holder RL. Scaling, normalizing, and per ratio standards: an allometric modeling approach. J Appl Physiol (1985)
18. Nevill AM, Holder RL, Baxter-Jones A, Round JM, Jones DA. Modeling developmental changes in strength and aerobic power in children. J Appl Physiol (1985)
19. Nevill AM, Ramsbottom R, Williams C. Scaling physiological measurements for individuals of different body size. Eur J Appl Physiol Occup Physiol
20. Nevill AM, Stavropoulos-Kalinoglou A, Metsios GS, et al. Inverted BMI rather than BMI is a better proxy for percentage of body fat. Ann Hum Biol
21. Siri WE. Body composition from fluid space and density: analysis of methods. In: Brozek J, Henschel A, editors. Technique for Measuring Body Composition. Washington (DC): National Academy of Sciences; 1961. pp. 223–44.
22. Taylor HL, Jacobs DR, Schucker B, Knudsen J, Leon AS, Debacker G. A questionnaire for the assessment of leisure time physical activities. J Chronic Dis
23. Winter EM, Nevill AM. Scaling: adjusting for differences in body size. In: Eston RG, Reilly T, editors. Kinanthropometry and Exercise Physiology Laboratory Manual for Tests, Procedures and Data. 3rd ed. Abingdon (Oxon): Routledge; 2009. p. 300–14.