Validation, Recalibration, and Predictive Accuracy of Published V̇O2max Prediction Equations for Adults Ages 50–96 Yr : Medicine & Science in Sports & Exercise

SPECIAL COMMUNICATIONS: Methodological Advances

Validation, Recalibration, and Predictive Accuracy of Published V̇O2max Prediction Equations for Adults Ages 50–96 Yr

SCHUMACHER, BENJAMIN T.1; DI, CHONGZHI2; BELLETTIERE, JOHN1; LAMONTE, MICHAEL J.3; SIMONSICK, ELEANOR M.4; PARADA, HUMBERTO Jr5,6; HOOKER, STEVEN P.7; LACROIX, ANDREA Z.1

Medicine & Science in Sports & Exercise 55(2):p 322-332, February 2023. | DOI: 10.1249/MSS.0000000000003033

Abstract

The capacity of the circulatory and respiratory systems to deliver oxygen to skeletal muscles for use during physical activity (PA) and exercise can be quantified by one’s cardiorespiratory fitness (CRF) level (1). CRF is a physiological attribute determined by several factors including age, sex, health status, and genetics; however, the principal modifiable determinant is habitual PA level (1). Through increases in the frequency, duration, and intensity of PA, CRF can incrementally increase, especially among the sedentary, although CRF declines soon after the frequency, duration, and/or intensity of PA declines. Thus, CRF often is used as an objective surrogate of recent PA patterns. Decades of clinical, epidemiologic, and exercise science studies have reported that higher CRF is a strong and independent predictor of a myriad of beneficial health outcomes (2–4). Low CRF is among the strongest predictors of cardiovascular and all-cause mortality, with associations as strong or stronger than those of smoking, obesity, and high blood pressure with the same outcomes (5,6). Likewise, higher CRF is associated with lower coronary heart disease/cardiovascular disease incidence and mortality (7–9), incidence of cardiometabolic risk factors (10,11), cancer incidence and cancer mortality (12–14), dementias (15) including Alzheimer’s disease (16) and their progression, depression symptoms (17,18), rates of loss of independence for older adults (19), and all-cause mortality (5,6,8,20).

The gold-standard measure of CRF is maximal oxygen uptake (V̇O2max) (1). In research settings, V̇O2max measurements are conducted using maximal graded exercise tests on a treadmill or stationary cycle ergometer and require specialized testing equipment, highly trained personnel, and direct physician supervision in most instances. Furthermore, in vulnerable populations such as older adults, V̇O2max testing may be contraindicated because it requires maximal, strenuous activity to the point of absolute exhaustion. Thus, conducting direct measures of V̇O2max in large epidemiologic cohort studies is largely infeasible (21). As an alternative approach, several non–exercise-based V̇O2max prediction equations have been published to enable the approximation of V̇O2max in a variety of settings, including large epidemiologic cohorts (7,22–29). However, few equations have been developed specifically for use in older adult populations (25,27). There is a critical need for accurate V̇O2max prediction models in older adults, given that by the year 2060, almost a quarter of the US population will be composed of adults 65 yr or older (i.e., older adults) (30), and V̇O2max has been identified as a hallmark biomarker of successful aging (31). Given the shifting demographics, the challenges older adults face with V̇O2max testing, and the benefits of increased CRF on health, we aimed to quantify the performance of published V̇O2max prediction models in relation to measured V̇O2max in the Baltimore Longitudinal Study of Aging (BLSA), recalibrate the equations to the BLSA cohort, and assess their predictive accuracy in relation to all-cause mortality.

METHODS

Study Participants

The analytic sample for the present study was derived from the BLSA, the longest running scientific study of aging (32,33). The BLSA was established in 1958 and is conducted by the National Institute on Aging Intramural Research Program (34). BLSA participants have been asked to visit the BLSA testing facility every 1 to 4 yr to undergo a 3-d battery of health, cognitive, and functional evaluations. More than 3000 participants have participated in the BLSA since its inception, and more than 1300 participants are still active (32). To date, 1080 BLSA participants have had laboratory-based V̇O2max measurements that meet the criteria for a maximal test. Extensive details about the design, recruitment, and measurements collected in the BLSA have been published elsewhere (33). This study was approved by the relevant institutional review boards, and all participants provided written informed consent.

Measures

V̇O2max measurement

V̇O2max (measured in milliliters of O2 uptake per kilogram body weight per minute; mL·kg−1·min−1) was assessed in the BLSA using a modified Balke treadmill testing protocol (35,36). This protocol consists of a graded exercise test: walking on a treadmill at a constant pace of 3.0 mph for women and 3.5 mph for men, with the incline of the treadmill increasing 3% every 2 min until the participant indicates that he or she has reached exhaustion. During this test, expired gas volumes were measured using a Parkinson–Cowan gas meter, and concentrations of oxygen and carbon dioxide were measured using a medical mass spectrometer (Perkin-Elmer MGA-1110), which was calibrated daily using standard gases. A computerized interface between the gas meter and mass spectrometer calculated average expired gas concentrations every 30 s throughout the test, and the highest 30-s value for O2 uptake defined the participant’s V̇O2max.

Achievement of maximal effort during the treadmill test was defined as reaching a respiratory exchange ratio (RER) >1.0. Fifty-two participants had a V̇O2max test just below this RER cutoff when the treadmill test was stopped. Of these 52 participants, 11 achieved ≥85% of their age-predicted maximal heart rate (in beats per minute; calculated as 220 − age) and reported a Borg rating of perceived exertion ≥17 on a 20-point scale, so their tests were considered to reflect maximal effort and were included in the present analysis. Of the remaining 41 participants with an RER <1.0 at the time the treadmill was stopped, 31 were excluded because they had no other V̇O2max test that met the aforementioned maximal effort criteria, and 10 participants who provided a subsequent V̇O2max test that fit the criteria for a maximal test were included, resulting in a final analytic sample of 1080. For participants with multiple V̇O2max measurements, the first measurement satisfying these criteria was used in the present study.
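The maximal-effort rule described above can be sketched in a few lines of Python. This is an illustrative reading of the criteria, not the authors' code: the function name and arguments are hypothetical, and the sketch applies the heart rate and Borg fallback to any test with an RER at or below 1.0, whereas the study applied it only to tests that fell just below the cutoff.

```python
def is_maximal_test(rer, age_yr, peak_hr_bpm, borg_rpe):
    """Return True if a treadmill test meets the maximal-effort criteria.

    Primary criterion: respiratory exchange ratio (RER) > 1.0.
    Fallback (assumed here to apply whenever RER <= 1.0):
    peak heart rate >= 85% of age-predicted maximum (220 - age)
    AND Borg rating of perceived exertion >= 17.
    """
    if rer > 1.0:
        return True
    age_predicted_max_hr = 220 - age_yr
    reached_hr = peak_hr_bpm >= 0.85 * age_predicted_max_hr
    reached_rpe = borg_rpe >= 17
    return reached_hr and reached_rpe


# Example: a 70-yr-old with RER 0.98, peak HR 130 bpm, Borg 17.
# Age-predicted maximum is 150 bpm, so 85% of it is 127.5 bpm.
example = is_maximal_test(rer=0.98, age_yr=70, peak_hr_bpm=130, borg_rpe=17)
```

Both fallback conditions must hold together; a high Borg rating alone does not qualify a test under this reading.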

Nonexercise test V̇O2max prediction models

Google Scholar was used to query previously published studies using the terms “non–exercise-based V̇O2max prediction models” and “older adults,” yielding a total of 12 V̇O2max prediction equations from nine published studies that were assessed in the present study. Studies were not included if their V̇O2max prediction equations were derived solely for use in younger populations, were developed using any form of exercise testing or physical performance as a predictor of V̇O2max, or included variables not available in the BLSA. Each prediction equation included sex, age, and some measure of body mass. Some equations additionally included variables such as self-reported PA scores, smoking history, height, and resting heart rate. In the present analysis, covariates in the published V̇O2max prediction equations were matched with their closest equivalent covariate in the BLSA.

Outcome ascertainment

All-cause mortality status and date of death were ascertained by linking participants to the National Death Index, a centralized database of death record information compiled from state vital statistics records, and by correspondence from relatives (37). Follow-up for mortality occurred from the first V̇O2max test date, the earliest of which was January 1, 2007, through April 15, 2021. Mortality ascertainment was high, with 96% of participants having a classified vital status. Over a median follow-up time of 9.6 yr (range, 0.60–14.1 yr), 141 participants died of any cause.

Covariates

Covariates for the V̇O2max prediction equations or their closest approximations in the BLSA included participant’s sex, age, body mass index (BMI), resting heart rate, self-reported PA/exercise level, self-rated general health status, and smoking history. In the BLSA, a participant’s sex and age were self-reported during each health history interview. Height and weight were measured using a stadiometer and calibrated scale, respectively, and BMI was calculated as weight in kilograms divided by height in meters squared. Resting heart rate was assessed by a nurse after the participant had been sitting quietly for at least 5 min (38). Participants were asked how much time they spent each week engaging in weight/circuit training, moderate-to-high intensity exercise, or brisk walking, which was then categorized as 0–29 min (coded as 0), 30–74 min (coded as 1), 75–149 min (coded as 2), or ≥150 min (coded as 3). Health-related quality of life was assessed using the 12-item Short-Form Health Survey (SF-12) (39). Smoking history (never, current, or former smoker) was self-reported using a standardized questionnaire (40). The following covariates were not used in any V̇O2max prediction models but were used in the description of the study sample: self-reported race (White, Black, Asian/Other Pacific Islander, Other/not classifiable), self-reported educational attainment (noncollege graduate, college graduate, postcollege graduate), β-blocker use (yes or no), and systolic and diastolic blood pressures (mm Hg; oscillometric brachial blood pressure was measured with the participant in a supine position on both arms three times and the minimum systolic and diastolic blood pressures were used).
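As a small illustration of two of the derived covariates, the sketch below computes BMI and maps weekly exercise minutes onto the 0 to 3 coding described above. The function names are hypothetical, and the handling of fractional minutes (for example, 29.5 min falling in category 0) is an assumption, because the text defines the categories only in whole minutes.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def exercise_code(minutes_per_week):
    """Map weekly exercise minutes to the four-level BLSA exercise code.

    0-29 min -> 0, 30-74 min -> 1, 75-149 min -> 2, >=150 min -> 3.
    """
    if minutes_per_week < 30:
        return 0
    if minutes_per_week < 75:
        return 1
    if minutes_per_week < 150:
        return 2
    return 3


# Example participant: 81 kg, 1.80 m, 90 min of exercise per week.
example_bmi = bmi(81.0, 1.80)      # 81 / 3.24
example_code = exercise_code(90)   # falls in the 75-149 min category
```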

Statistical Analysis

We compared covariates by sex-specific quartiles of measured V̇O2max using χ2 tests for categorical variables and ANOVA tests for continuous variables.

Predicted V̇O2max was calculated using each V̇O2max prediction equation as originally published. The performance (ability to accurately predict measured V̇O2max) of each equation was evaluated by comparing the predicted V̇O2max with the measured V̇O2max using the root mean square error (RMSE), bias, mean absolute percentage error (MAPE), the Bland–Altman 95% limits of agreement (LOA) (41), correlation coefficients, and R2. These analyses were conducted in the overall sample and within sex strata. In brief, RMSE quantifies the concentration of the data around the line of best fit and was computed by squaring each predicted V̇O2max minus measured V̇O2max difference, taking the mean of these squared differences, and taking the square root of that mean. Bias was computed by taking the mean of the measured V̇O2max minus predicted V̇O2max differences. MAPE was computed by taking the mean of the absolute value of the percent deviation of the predicted V̇O2max from the measured V̇O2max. The lower the RMSE, absolute bias, and MAPE, the better the performance of the prediction model, with 0 indicating perfect prediction of the measured V̇O2max. The calculation for the Bland–Altman 95% LOA has been described elsewhere, but these limits are expected to capture 95% of the differences between measured and predicted V̇O2max; a narrower range of limits indicates a better prediction (41). The Bland–Altman 95% LOA were obtained using the blandr package in R (42).
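The following minimal sketch shows how the four agreement metrics defined above could be computed for paired measured and predicted values. It is written in Python rather than the R used in the study, the function name is hypothetical, and the LOA are computed with the standard Bland–Altman formulation (mean difference ± 1.96 × SD of the differences), which is assumed here rather than taken from the blandr implementation.

```python
import math
from statistics import mean, stdev


def performance_metrics(measured, predicted):
    """Agreement metrics for paired measured and predicted VO2max values."""
    # Differences follow the sign convention in the text: measured minus predicted.
    diffs = [m - p for m, p in zip(measured, predicted)]
    rmse = math.sqrt(mean([d * d for d in diffs]))
    bias = mean(diffs)
    # Absolute deviation of predicted from measured, as a percentage of measured.
    mape = 100 * mean([abs(d) / m for d, m in zip(diffs, measured)])
    # Bland-Altman 95% limits of agreement (standard formulation, assumed).
    sd = stdev(diffs)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    return {"rmse": rmse, "bias": bias, "mape": mape, "loa": loa}


# Toy data (not from the BLSA): three participants.
toy = performance_metrics([20.0, 25.0, 30.0], [18.0, 26.0, 33.0])
```

Because RMSE squares the differences, its value does not depend on the sign convention, whereas the sign of the bias does: under the convention above, a negative bias means the equation overpredicts on average.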

Because the accuracy of each V̇O2max prediction equation is strongly influenced by the distribution of covariates and measured V̇O2max in the source population from which the equation was derived, the application of a prediction equation from one population to another can affect predictive accuracy. Therefore, each V̇O2max prediction equation was recalibrated by regressing measured V̇O2max in the BLSA on the BLSA covariates representing those used in each prediction equation. With recalibration, the regression coefficients for each covariate in relation to measured V̇O2max are derived fully from the BLSA, as opposed to applying regression weights calculated in a different population to BLSA covariates. Recalibration has been used in other settings to evaluate the accuracy of prediction equations when transported from the source to other populations (43). Residuals versus fitted, normal Q–Q, scale–location, and residuals versus leverage plots were used to assess model diagnostics of the recalibrated V̇O2max prediction equations (44). After evaluation of all recalibrated equations, their predicted V̇O2max values were output. Performance metrics for the recalibrated equations included the same metrics as described previously for evaluation of the original equations, as well as the 10-fold cross-validation RMSE and R2 values.
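To make the recalibration step concrete, the sketch below refits a linear equation by ordinary least squares and estimates a 10-fold cross-validated RMSE. It is an assumption-laden illustration, not the authors' R code: the helper names are hypothetical, the fit solves the normal equations directly, folds are assigned by index without shuffling, and the RMSE is pooled over all held-out predictions rather than averaged per fold.

```python
import math


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Apply fitted coefficients to one covariate row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def cv_rmse(rows, y, folds=10):
    """Pooled RMSE of held-out predictions across index-based folds."""
    sq_errors = []
    for f in range(folds):
        train = [i for i in range(len(y)) if i % folds != f]
        test = [i for i in range(len(y)) if i % folds == f]
        coefs = fit_ols([rows[i] for i in train], [y[i] for i in train])
        sq_errors += [(y[i] - predict(coefs, rows[i])) ** 2 for i in test]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

In this sketch each row would hold one participant's covariates in a fixed order, for example (sex, age, BMI), and `y` would hold the measured V̇O2max values; the returned coefficients play the role of the recalibrated formula.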

To further evaluate the validity of predicted V̇O2max values, sequentially adjusted Cox proportional hazards regression models were used to estimate the associations of quartiles of (1) measured V̇O2max, (2) predicted V̇O2max, and (3) recalibration-predicted V̇O2max with all-cause mortality. Model 1 was unadjusted, model 2 adjusted for age and sex, and model 3 adjusted for model 2 covariates in addition to race and ethnicity, and education. Linear trends across quartiles (P value for trend) were tested by specifying the quartile indicator in the model as a continuous variable. Associations between a 1-SD increase in each V̇O2max variable and all-cause mortality were also assessed using the same modeling approach, and P values for the centered and scaled V̇O2max variable in the model are presented. The concordance statistic (c statistic), the proportion of pairs of participants for which the model correctly predicts which participant will experience a mortality event first, is also presented. The proportional hazards assumption was tested using the cox.zph function of the survival package (45) in R, which tests the correlation of each covariate’s (and the whole model’s) scaled Schoenfeld residuals with time to ensure independence between the residuals and time; no violations were noted. Variance inflation factors were used to assess multicollinearity between independent variables; no values were outside the range of 0.25–4 (46).
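The c statistic defined above can be illustrated with a short pairwise computation. The sketch assumes a Harrell-type definition for right-censored data, in which a pair is comparable only when the participant with the shorter follow-up time died, and tied risk scores count as half-concordant; pairs with tied times are ignored. The function name is hypothetical, and the survival package may handle ties differently.

```python
def concordance(times, events, risk_scores):
    """Harrell-type concordance for right-censored survival data.

    times:       follow-up time for each participant
    events:      1 if the participant died, 0 if censored
    risk_scores: model-predicted risk (higher = expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: i died, and did so before j's follow-up ended.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (not from the BLSA): the third participant is censored at 6 yr.
c_toy = concordance([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.1, 0.5, 0.7])
```

A value of 0.5 corresponds to chance-level ordering and 1.0 to perfect ordering of the comparable pairs.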

All analyses were conducted in R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria).

RESULTS

Sample characteristics

The 565 women and 515 men with measured V̇O2max included in this study had a mean age, BMI, and V̇O2max of 69.0 ± 10.4 yr, 27.0 ± 4.4 kg·m−2, and 21.6 ± 5.9 mL·kg−1·min−1, respectively (Table 1). Two-thirds of study participants were non-Hispanic White, one-fourth were non-Hispanic Black, 4.6% were non-Hispanic Asian, 3.2% were Hispanic, and the remaining 0.7% were from other non-Hispanic race/ethnicity groups or could not be classified. The majority of the sample (61.9%) had a postcollege education. Current smoking prevalence was 1.8%, and mean systolic and diastolic blood pressures were 114.1 ± 14.1 and 66.7 ± 8.8 mm Hg, respectively. Age, BMI, current smoking, and systolic blood pressure were inversely related to incremental quartiles of measured V̇O2max, whereas education, self-reported exercise, self-rated health status, and diastolic blood pressure were positively related to V̇O2max (Table 1).

TABLE 1 - Characteristics of BLSA participants overall and according to quartiles of measured V̇O2max (n = 1080).
Characteristic a Measured V̇O2max
Total (n = 1080) Quartile 1 b (n = 270) Quartile 2 b (n = 277) Quartile 3 b (n = 265) Quartile 4 b (n = 268) Pc
Age (yr), mean (SD) 69.0 (10.4) 75.5 (8.8) 72.1 (9.7) 67.3 (8.9) 60.9 (8.2) <0.01
Race, n (%) <0.01
 Non-Hispanic, White 708 (65.6) 169 (62.6) 177 (63.9) 177 (66.8) 185 (69.0)
 Non-Hispanic, Black 279 (25.8) 87 (32.2) 82 (29.6) 60 (22.6) 50 (18.7)
 Non-Hispanic, Asian/Other Pacific Islander 50 (4.6) 8 (3.0) 9 (3.2) 14 (5.3) 19 (7.1)
 Hispanic 35 (3.2) 4 (1.5) 6 (2.2) 11 (4.2) 14 (5.2)
 Non-Hispanic, Other/not classifiable 8 (0.7) 2 (0.7) 3 (1.1) 3 (1.1) 0 (0.0)
Highest attained education, n (%) <0.01
 Postcollege 669 (61.9) 152 (56.3) 168 (60.6) 169 (63.8) 180 (67.2)
 College 225 (20.8) 51 (18.9) 53 (19.1) 57 (21.5) 64 (23.9)
 Noncollege graduate 183 (16.9) 67 (24.8) 56 (20.2) 39 (14.7) 21 (7.8)
BMI (kg·m−2), mean (SD) 27.0 (4.4) 28.9 (4.7) 27.4 (4.6) 26.6 (4.1) 24.9 (3.4) <0.01
β-Blocker use, n (%) 152 (14.1) 78 (28.9) 39 (14.1) 22 (8.3) 13 (4.9) <0.01
Minutes of exercise, n (%) <0.01
 0–29 465 (43.1) 171 (63.3) 127 (45.8) 93 (35.1) 74 (27.6)
 30–74 169 (15.6) 36 (13.3) 48 (17.3) 33 (12.5) 52 (19.4)
 75–149 165 (15.3) 25 (9.3) 42 (15.2) 52 (19.6) 46 (17.2)
 150+ 272 (25.2) 36 (13.3) 59 (21.3) 84 (31.7) 93 (34.7)
Self-rated health, n (%) <0.01
 Excellent 339 (31.4) 43 (15.9) 84 (30.3) 90 (34.0) 122 (45.5)
 Very good/good 715 (66.2) 219 (81.1) 185 (66.8) 170 (64.2) 141 (52.6)
 Fair/poor 14 (1.3) 5 (1.9) 6 (2.2) 2 (0.8) 1 (0.4)
Systolic BP (mm Hg), mean (SD) 114.1 (14.1) 117.3 (14.8) 116 (13.3) 113 (13.7) 110.2 (13.3) <0.01
Diastolic BP (mm Hg), mean (SD) 66.7 (8.8) 65 (8.4) 66.3 (9.3) 66.9 (8.6) 68.5 (8.5) <0.01
Smoking status, n (%) <0.01
 Never 682 (63.1) 149 (55.2) 169 (61.0) 180 (67.9) 184 (68.7)
 Former 372 (34.4) 112 (41.5) 103 (37.2) 83 (31.3) 74 (27.6)
 Current 19 (1.8) 7 (2.6) 4 (1.4) 1 (0.4) 7 (2.6)
Maximal exercise test
 V̇O2max (mL·kg−1·min−1), mean (SD) 21.6 (5.9) 15.5 (2.5) 19.8 (2.1) 23.5 (2.2) 28.8 (4.5) <0.01
 RER, mean (SD) 3.3 (68.1) 1.2 (0.1) 1.2 (0.1) 1.2 (0.1) 9.5 (136.8) 0.18
 Borg score, Mean (SD) 16.5 (1.7) 16.1 (1.7) 16.2 (1.7) 16.7 (1.7) 17 (1.6) <0.01
 % of maximum predicted HR, mean (SD) d 98.8 (50.2) 89.6 (13.3) 97.5 (9.3) 100.3 (8.5) 107.8 (98.3) <0.01
Bold indicates significance at the P < 0.05 level.
aPercentages may not sum to 100% because of missing data.
bSex-specific quartile definitions were as follows: Q1: men, <19.9 (n = 129); women, <16.5 (n = 141). Q2: men, ≥19.9 and ≤23.7 (n = 131); women, ≥16.5 and ≤19.9 (n = 146). Q3: men, >23.7 and ≤27.4 (n = 128); women, >19.9 and ≤23.7 (n = 137). Q4: men, >27.4 (n = 127); women, >23.7 (n = 141).
cP value for continuous variables from one-way ANOVA and χ2 goodness-of-fit test for categorical variables across V̇O2max quartiles.
dMaximum predicted heart rate: 220 − age.
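The sex-specific quartile definitions in footnote b can be expressed as a small lookup, sketched below. The function and constant names are hypothetical, and the sketch assumes that within each quartile the first set of cutoffs applies to men and the second to women, which is consistent with the reported counts (515 men and 565 women).

```python
# Upper bounds (mL/kg/min) of quartiles 1-3 from footnote b of Table 1.
# Quartile 1 is strictly below the first bound; quartiles 2 and 3 include
# their upper bounds; quartile 4 is everything above the third bound.
CUTOFFS = {
    "men": (19.9, 23.7, 27.4),
    "women": (16.5, 19.9, 23.7),
}


def vo2max_quartile(sex, vo2max):
    """Return the sex-specific quartile (1-4) of a measured VO2max value."""
    q1_upper, q2_upper, q3_upper = CUTOFFS[sex]
    if vo2max < q1_upper:
        return 1
    if vo2max <= q2_upper:
        return 2
    if vo2max <= q3_upper:
        return 3
    return 4


# The same measured value can fall in different quartiles for men and women.
quartile_man = vo2max_quartile("men", 21.6)
quartile_woman = vo2max_quartile("women", 21.6)
```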

V̇O2max prediction equations

When each prediction equation (Table 2) was used to estimate V̇O2max in the BLSA sample, the lowest and highest RMSE values (in units of mL·kg−1·min−1) of the V̇O2max prediction equations were 4.2 (Bradshaw et al. [22]) and 20.4 (Jang et al. [28]), respectively (Table 3). The absolute value of the bias (in the same units) ranged from 0.1 (Matthews et al. [25]) to 19.3 (Jang et al. [28]). Bradshaw et al. (22) had the lowest MAPE value (15.4%), and Jang et al. (28) had the highest MAPE value (97.7%).

TABLE 2 - Extant V̇O2max prediction equations, adaptations for the BLSA, and the recalibrated formulas.
Study Prediction Equation from Study Variable Definitions Variable Adaptations
Jurca et al. (23); NASA Original: 18.07 + 2.77(Sex) − 0.10(Age) − 0.17(BMI) − 0.03(Resting HR) + 0.32(SRPA1) + 1.06(SRPA2) + 1.76(SRPA3) + 3.03(SRPA4)
Recalibrated: 58.64 + 4.62(Sex) − 0.31(Age) − 0.47(BMI) − 0.04(Resting HR) − 2.39(exercise0) − 1.73(exercise1) − 1.15(exercise2)
Sex coded as men = 1 and women = 0.
SRPA0 = little activity other than walking for pleasure (ref.). SRPA1 = some regular participation in modest physical activities involving sports, recreational activities. SRPA2 = aerobic exercise such as run/walk for 20 to 60 min·wk−1. SRPA3 = aerobic exercise such as run/walk for 1 to 3 h·wk−1. SRPA4 = aerobic exercise such as run/walk for >3 h·wk−1.
BLSA participants self-reported the weekly minutes of weight/circuit training, moderate-to-high intensity exercise, and/or brisk walking and were categorized into 0–29 min (coded as 0), 30–74 min (coded as 1), 75–149 min (coded as 2), and 150+ min (coded as 3). Dummy variables were created, and exercise0 = SRPA1, exercise1 = SRPA2, exercise2 = SRPA3, and exercise3 = SRPA4.
Jurca et al. (23); ACLS Original: 18.81 + 2.49(Sex) − 0.08(Age) − 0.17(BMI) − 0.05(Resting HR) + 0.81(SRPA1) + 1.17(SRPA2) + 2.16(SRPA3) + 3.05(SRPA4)
Recalibrated: 58.64 + 4.62(Sex) − 0.31(Age) − 0.47(BMI) − 0.04(Resting HR) − 2.39(exercise0) − 1.73(exercise1) − 1.15(exercise2)
Sex coded as men = 1 and women = 0. SRPA0 = no activity (ref.). SRPA1 = participated in sporting or leisure-time PA other than walking, jogging, or running. SRPA2 = walk, jog, or run up to 10 miles per week. SRPA3 = walk, jog, or run from 10 to 20 miles per week. SRPA4 = walk, jog, or run >20 miles per week.
BLSA participants self-reported the weekly minutes of weight/circuit training, moderate-to-high intensity exercise, and/or brisk walking and were categorized into 0–29 min (coded as 0), 30–74 min (coded as 1), 75–149 min (coded as 2), and 150+ min (coded as 3). Dummy variables were created, and exercise0 = SRPA1, exercise1 = SRPA2, exercise2 = SRPA3, and exercise3 = SRPA4.
Jurca et al. (23); ADNFS Original: 21.41 + 2.78(Sex) − 0.11(Age) − 0.17(BMI) − 0.05(Resting HR) + 0.35(SRPA1) + 0.29(SRPA2) + 0.64(SRPA3) + 1.21(SRPA4)
Recalibrated: 58.64 + 4.62(Sex) − 0.31(Age) − 0.47(BMI) − 0.04(Resting HR) − 2.39(exercise0) − 1.73(exercise1) − 1.15(exercise2)
Sex coded as men = 1 and women = 0. SRPA0 = from 0 to 4 occasions of at least moderate activity in the past 4 wk (ref.). SRPA1 = from 5 to 11 occasions of at least moderate activity in the past 4 wk. SRPA2 = ≥12 occasions of moderate activity in the past 4 wk. SRPA3 = ≥12 occasions of a mix of moderate and vigorous activities in the past 4 wk. SRPA4 = ≥12 occasions of vigorous activity in the past 4 wk.
BLSA participants self-reported the weekly minutes of weight/circuit training, moderate-to-high intensity exercise, and/or brisk walking and were categorized into 0–29 min (coded as 0), 30–74 min (coded as 1), 75–149 min (coded as 2), and 150+ min (coded as 3). Dummy variables were created, and exercise0 = SRPA1, exercise1 = SRPA2, exercise2 = SRPA3, and exercise3 = SRPA4.
Bradshaw et al. (22) Original: 48.073 + 6.178(Sex) − 0.246(Age) − 0.619(BMI) + 0.712(PFA) + 0.671(PA-R)
Recalibrated: 46.61 + 4.82(Sex) − 0.31(Age) − 0.46(BMI) + 1.58(SF health) + 0.57(exercise)
Sex coded as men = 1 and women = 0. PFA = 2 questions that ascertain how fast participants feel they can cover a 1- and 3-mile distance at a comfortable pace. The participant’s sum total of both 13-point questions is counted as the PFA score (range, 2–26). PA-R = individuals rate their activity level over the previous 6 mo using a 10-point scale. Reverse coded SF-12 self-rated health score was used in lieu of PFA (5 = “excellent,” 4 = “very good,” 3 = “good,” 2 = “fair,” 1 = “poor”; used the aforementioned exercise variable in lieu of PA-R).
Jackson et al. (24) Original: 56.363 + 1.921(PA-R) − 0.381(Age) − 0.754(BMI) + 10.987(Sex)
Recalibrated: 53.89 + 0.74(exercise) − 0.31(Age) − 0.5(BMI) + 4.7(Sex)
Sex coded as men = 1 and women = 0. SRPA0 = little activity other than walking for pleasure (ref.). PA-R from Jurca et al. NASA equation. Used the aforementioned exercise variable in lieu of PA-R
Matthews et al. (25) Original: 34.142 + 11.403(Sex) + 0.133(Age) − (0.005(Age*Age)) + 1.463(PAS) + 9.170*(ht meters) − 0.254(Body mass in kg)
Recalibrated: 32.58 + 5.15(Sex) − 0.32(Age) + 0.74(exercise) + 12.97(ht meters) − 0.18(wt kg)
Sex coded as men = 1 and women = 0. PAS = physical activity status (0–7); this instrument has subjects rate their last month of PA participation on a 0–7 scale. Responses of 0 and 1 represented no regular PA, whereas a response of 2 or 3 represented moderate-intensity activities, and responses of 4 to 7 represented regular vigorous PA participation of increasing exercise time. Used the aforementioned exercise variable in lieu of PAS
Sloan et al. (26) Original: Male with HR: 52.23 − 0.20(Age) − 0.35(BMI) − 0.06(HRrest/min) + 2.05(PA score)
Original: Female with HR: 47.79 − 0.21(Age) − 0.35(BMI) − 0.06(HRrest/min) + 2.08(PA score)
Original: Male w/o HR: 49.9 − 0.21(Age) − 0.36(BMI) + 2.12(PA score)
Original: Female w/o HR: 43.27 − 0.22(Age) − 0.37(BMI) + 2.17(PA score)
Recalibrated with HR: 56.06 + 4.62(Sex) − 0.31(Age) − 0.47(BMI) − 0.04(Resting HR) + 0.78(exercise)
Recalibrated w/o HR: 53.89 + 4.7(Sex) − 0.31(Age) − 0.5(BMI) + 0.74(exercise)
PA score: in accordance with the procedure outlined by Jurca et al. (23), participants were asked to select 1 of the 5 levels of self-reported PA that best described their usual activity pattern: (a) level 0, inactive or little activity other than usual daily activities; (b) level 1, regular (>5 d·wk−1) participation in physical activities requiring low levels of exertion that result in slight increases in breathing and heart rate for at least 10 min at a time; (c) level 2, participation in aerobic exercises such as brisk walking, jogging or running, cycling, swimming or vigorous sports at a comfortable pace, or other activities requiring similar levels of exertion for 20–60 min·wk−1; (d) level 3, participation in aerobic exercises such as brisk walking, jogging or running at a comfortable pace, or other activities requiring similar levels of exertion for 1–3 h·wk−1; and (e) level 4, participation in aerobic exercises such as brisk walking, jogging or running at a comfortable pace, or other activities requiring similar levels of exertion for >3 h·wk−1. Used the aforementioned exercise variable in lieu of PA score
de Souza e Silva et al. (27) Original: 44.74 − 10.9(Sex) − 0.35(Age) − 0.15(Weight pounds) + 0.68(Height inches); treadmill constant into the intercept
Recalibrated: 44.3 − 5.31(Sex) − 0.33(Age) − 0.09(Weight pounds) + 0.35(Height inches)
Sex coded as men = 1 and women = 2
Baynard et al. (29) Original: 77.96 − 10.35(Sex) − 0.32(Age) − 0.92(BMI)
Recalibrated: 61.45 − 4.82(Sex) − 0.33(Age) − 0.54(BMI)
Sex coded as men = 0 and women = 1
Jang et al. (28) Original: 50.543 − 0.069(Age) + 13.525(Sex) − 0.403(BMI) − 1.530(Smoking)
Recalibrated: 56.58 − 0.33(Age) + 4.85(Sex) − 0.53(BMI) − 1.49(Smoking)
Smoking: 0 = never or former, 1 = current
Myers et al. (7) Original: 79.9 − 0.39(Age) − 13.7(Sex) − 0.127(wt lb)
Recalibrated: 62.71 − 0.35(Age) − 6.85(Sex) − 0.08(wt lb)
Sex coded as men = 0 and women = 1
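As a worked example of how the entries in Table 2 are applied, the sketch below evaluates the original and recalibrated Baynard et al. (29) equations for a single hypothetical participant. The function names and the example participant are illustrative only; the coefficients and the sex coding (men = 0, women = 1) are taken from the table.

```python
def baynard_original(sex, age_yr, bmi):
    """Original Baynard et al. equation; sex coded men = 0, women = 1."""
    return 77.96 - 10.35 * sex - 0.32 * age_yr - 0.92 * bmi


def baynard_recalibrated(sex, age_yr, bmi):
    """BLSA-recalibrated Baynard et al. equation; same sex coding."""
    return 61.45 - 4.82 * sex - 0.33 * age_yr - 0.54 * bmi


# Hypothetical participant: a 70-yr-old woman with a BMI of 27 kg/m^2.
original_estimate = baynard_original(sex=1, age_yr=70, bmi=27)
recalibrated_estimate = baynard_recalibrated(sex=1, age_yr=70, bmi=27)
```

For this participant the original equation gives 77.96 − 10.35 − 22.40 − 24.84 = 20.37 mL·kg−1·min−1, and the recalibrated equation gives 61.45 − 4.82 − 23.10 − 14.58 = 18.95 mL·kg−1·min−1. Sex coding differs between equations in Table 2, so it should be checked for each equation before it is applied.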

TABLE 3 - Performance metrics for previously published V̇O2max prediction equations compared with measured V̇O2max in the BLSA.
Formula RMSE Bias MAPE LOA Correlation with Measured V̇O2max R 2
Jurca et al. (23); NASA 15.8 15.1 67.8 (5.7, 24.5) 0.68 0.46
 Male 16.8 15.8 63.5 (4.9, 26.7) 0.60 0.36
 Female 15.0 14.5 71.2 (6.7, 22.3) 0.67 0.45
Jurca et al. (23); ACLS 15.1 14.3 63.5 (4.6, 23.9) 0.66 0.44
 Male 16.1 15.1 60.2 (4.0, 26.2) 0.58 0.34
 Female 14.2 13.6 66.2 (5.5, 21.6) 0.64 0.41
Jurca et al. (23); ADNFS 15.4 14.7 65.4 (5.1, 24.2) 0.69 0.48
 Male 16.4 15.4 61.5 (4.4, 26.4) 0.66 0.44
 Female 14.6 14.1 68.6 (6.0, 22.1) 0.68 0.46
Bradshaw et al. (22) 4.2 1.0 15.4 (−7.1, 9.0) 0.72 0.52
 Male 4.6 0.2 15.7 (−8.8, 9.2) 0.67 0.45
 Female 3.8 1.7 15.2 (−5.1, 8.5) 0.74 0.55
Jackson et al. (24) 7.2 4.7 29.4 (−6.0, 15.4) 0.69 0.48
 Male 5.1 1.5 17.1 (−8.0, 11.1) 0.64 0.41
 Female 8.7 7.6 40.5 (−0.7, 15.9) 0.73 0.53
Matthews et al. (25) 5.3 −0.1 20.7 (−10.4, 10.2) 0.72 0.52
 Male 5.6 −2.3 20.1 (−12.2, 7.6) 0.67 0.45
 Female 5.0 1.9 21.2 (−7.2, 10.9) 0.72 0.52
Sloan et al. (26); HR 5.1 −2.3 21.0 (−11.2, 6.6) 0.67 0.45
 Male 6.0 −2.9 23.2 (−13.1, 7.4) 0.58 0.34
 Female 4.2 −1.8 19.2 (−9.4, 5.7) 0.66 0.44
Sloan et al. (26); no HR 5.3 −2.3 21.3 (−11.6, 7.0) 0.65 0.42
 Male 6.4 −4.0 26.0 (−13.9, 5.8) 0.57 0.32
 Female 3.9 −0.8 17.0 (−8.3, 6.8) 0.67 0.45
de Souza e Silva et al. (27) 5.4 −1.6 21.4 (−11.8, 8.5) 0.67 0.45
 Male 6.6 −4.4 26.3 (−14.1, 5.2) 0.61 0.37
 Female 4.1 0.9 16.9 (−6.9, 8.7) 0.71 0.50
Baynard et al. (29) 6.3 −3.6 25.7 (−13.8, 6.6) 0.66 0.44
 Male 8.0 −6.4 33.7 (−15.9, 3.2) 0.61 0.37
 Female 4.2 −1.1 18.4 (−9.0, 6.9) 0.70 0.49
Jang et al. (28) 20.4 −19.3 97.7 (−32.5, −6.1) 0.44 0.19
 Male 24.8 −24.2 113.3 (−35.1, −13.2) 0.45 0.20
 Female 15.4 −14.8 83.4 (−23.0, −6.7) 0.61 0.37
Myers et al. (7) 5.7 −2.3 22.2 (−12.4, 7.8) 0.66 0.44
 Male 7.0 −5.0 28.4 (−14.6, 4.5) 0.61 0.37
 Female 4.0 0.2 16.6 (−7.7, 8.0) 0.68 0.46
Bold indicates significance at the P < 0.01 level.

After recalibration of the equations to the BLSA data, every equation improved on all performance metrics (Tables 3, 4). The recalibrated formulas’ cross-validated RMSE values ranged from 3.9 (Bradshaw et al. [22]) to 4.2 (Myers et al. [7]), and as expected, all bias values were 0. MAPE values were similar across the recalibrated prediction equations, ranging from 14.4% (Bradshaw et al. [22]) to 15.7% (Myers et al. [7]). The R2 for the recalibrated equations ranged from 49% (Myers et al. [7]) to 58% (Bradshaw et al. [22]), which compares favorably with an age- and sex-adjusted model R2 of 36%. Additional recalibrated performance metrics including sex-stratified performance metrics are reported in Tables 3, 4.

TABLE 4 - Performance metrics for recalibrated V̇O2max prediction equations compared with measured V̇O2max in the BLSA.
Recalibrated Formula RMSE MAPE LOA Correlation with Measured V̇O2max R 2a
Jurca et al. (23) 4.1 15.4 (−8.1, 8.1) 0.73 0.53
 Male 4.8 16.2 (−9.4, 9.4) 0.68 0.46
 Female 3.5 14.8 (−6.8, 6.8) 0.73 0.53
Bradshaw et al. (22) 3.9 14.4 (−7.6, 7.6) 0.76 0.58
 Male 4.3 14.6 (−8.4, 8.4) 0.72 0.52
 Female 3.4 14.3 (−6.7, 6.7) 0.74 0.55
Jackson et al. (24) 4.0 15.0 (−7.9, 7.9) 0.73 0.53
 Male 4.5 15.3 (−8.9, 8.9) 0.67 0.45
 Female 3.5 14.7 (−6.8, 6.8) 0.73 0.53
Matthews et al. (25) 4.0 15.0 (−7.9, 7.9) 0.73 0.53
 Male 4.5 15.3 (−8.9, 8.9) 0.67 0.45
 Female 3.5 14.7 (−6.9, 6.9) 0.73 0.53
Sloan et al. (26); HR 4.1 15.4 (−8.1, 8.1) 0.73 0.53
 Male 4.8 16.1 (−9.4, 9.4) 0.68 0.46
 Female 3.5 14.8 (−6.8, 6.8) 0.72 0.52
Sloan et al. (26); no HR 4.0 15.0 (−7.9, 7.9) 0.73 0.53
 Male 4.5 15.3 (−8.9, 8.9) 0.67 0.45
 Female 3.5 14.7 (−6.8, 6.8) 0.73 0.53
de Souza e Silva et al. (27) 4.1 15.4 (−8.1, 8.1) 0.72 0.52
 Male 4.6 15.7 (−9.1, 9.1) 0.66 0.44
 Female 3.6 15.1 (−7.1, 7.1) 0.71 0.50
Baynard et al. (29) 4.1 15.4 (−8.1, 8.1) 0.72 0.52
 Male 4.6 15.8 (−9.1, 9.1) 0.66 0.44
 Female 3.6 15.1 (−7.0, 7.0) 0.71 0.50
Jang et al. (28) 4.1 15.4 (−8.1, 8.1) 0.72 0.52
 Male 4.6 15.7 (−9.1, 9.1) 0.66 0.44
 Female 3.6 15.1 (−7.0, 7.0) 0.71 0.50
Myers et al. (7) 4.2 15.7 (−8.2, 8.2) 0.70 0.49
 Male 4.6 15.9 (−9.1, 9.1) 0.65 0.42
 Female 3.7 15.5 (−7.3, 7.3) 0.68 0.46
RMSE and R2 were obtained using 10-fold cross-validation. Bold indicates significance at the P < 0.01 level.
aIn a model with age and sex alone, the R2 for the entire sample was 0.356, 0.315 for men, and 0.251 for women.

V̇O2max associations with mortality

When assessing the associations between quartiles of measured V̇O2max and all-cause mortality, a steep inverse gradient in mortality risk across incremental V̇O2max quartiles was evident in both unadjusted and adjusted models. Adjusting for model 3 covariates, the hazard ratios (HR) and 95% confidence intervals (CI) were 0.55 (0.37–0.82), 0.30 (0.17–0.54), and 0.34 (0.15–0.75) for quartile 2 (Q2)–Q4 relative to Q1 of measured V̇O2max, respectively (Ptrend < 0.001; Table 5). To further investigate the robustness of measured V̇O2max to adjustments beyond the model 3 covariates, we additionally adjusted for the following variables: BMI, smoking history, self-rated health, diagnosed diabetes, glucose intolerance or high blood sugar, history of heart attack or myocardial infarction, history of heart failure or congestive heart failure, history of stroke, mini stroke, or slight stroke, and current hypertension. The HRs from this model slightly strengthened in magnitude, remained statistically significant, and maintained their trend across quartiles (HR for Q2–Q4 relative to Q1: 0.56 [0.36–0.88], 0.30 [0.16–0.59], and 0.31 [0.13–0.75]; Ptrend < 0.001).

TABLE 5 - HR of all-cause mortality by measured and predicted V̇O2max in the selected BLSA sample (n = 1080).
Columns, left to right: Model; HR (95% CI) for sex-specific quartiles of V̇O2max (Q1, Q2, Q3, Q4); P-trend; HR (95% CI) for 1-SD increase; P; c statistic. Rows are grouped by author.
Measured
1 1.00 (ref.) 0.43 (0.29–0.63) 0.16 (0.09–0.29) 0.10 (0.05–0.20) <0.01 0.46 (0.38–0.57) <0.01 0.71 (0.02)
2 1.00 (ref.) 0.55 (0.37–0.81) 0.30 (0.17–0.54) 0.34 (0.16–0.75) <0.01 0.51 (0.39–0.66) <0.01 0.79 (0.02)
3 1.00 (ref.) 0.55 (0.37–0.82) 0.30 (0.17–0.54) 0.34 (0.15–0.75) <0.01 0.50 (0.38–0.66) <0.01 0.79 (0.02)
Baynard et al. (29)
1 1.00 (ref.) 0.67 (0.45–0.99) 0.42 (0.27–0.66) 0.15 (0.07–0.29) <0.01 0.89 (0.75–1.05) 0.16 0.66 (0.02)
2 1.00 (ref.) 0.72 (0.49–1.07) 0.82 (0.51–1.33) 0.58 (0.28–1.20) 0.12 0.91 (0.66–1.24) 0.55 0.78 (0.02)
3 1.00 (ref.) 0.71 (0.48–1.06) 0.85 (0.53–1.37) 0.63 (0.30–1.32) 0.19 0.94 (0.68–1.30) 0.72 0.78 (0.02)
Bradshaw et al. (22)
1 1.00 (ref.) 0.66 (0.44–0.98) 0.33 (0.20–0.54) 0.24 (0.14–0.42) <0.01 0.79 (0.67–0.93) <0.01 0.64 (0.02)
2 1.00 (ref.) 0.74 (0.50–1.10) 0.76 (0.45–1.28) 1.09 (0.58–2.02) 0.62 0.90 (0.68–1.19) 0.47 0.78 (0.02)
3 1.00 (ref.) 0.73 (0.49–1.09) 0.78 (0.47–1.32) 1.27 (0.67–2.41) 0.83 0.93 (0.70–1.25) 0.64 0.78 (0.02)
de Souza e Silva et al. (27)
1 1.00 (ref.) 0.59 (0.40–0.88) 0.43 (0.28–0.66) 0.14 (0.07–0.28) <0.01 0.86 (0.73–1.01) 0.07 0.66 (0.02)
2 1.00 (ref.) 0.66 (0.44–0.98) 0.87 (0.54–1.39) 0.57 (0.28–1.18) 0.13 0.89 (0.65–1.21) 0.45 0.78 (0.02)
3 1.00 (ref.) 0.65 (0.44–0.97) 0.90 (0.56–1.45) 0.64 (0.31–1.34) 0.21 0.93 (0.67–1.28) 0.64 0.78 (0.02)
Jackson et al. (24)
1 1.00 (ref.) 0.47 (0.31–0.72) 0.32 (0.20–0.51) 0.19 (0.11–0.34) <0.01 0.81 (0.69–0.96) 0.01 0.66 (0.02)
2 1.00 (ref.) 0.69 (0.46–1.05) 0.82 (0.50–1.35) 0.98 (0.51–1.89) 0.52 0.92 (0.68–1.25) 0.59 0.77 (0.02)
3 1.00 (ref.) 0.69 (0.46–1.06) 0.83 (0.51–1.37) 1.12 (0.57–2.21) 0.70 0.94 (0.69–1.29) 0.71 0.78 (0.02)
Jang et al. (28)
1 1.00 (ref.) 1.08 (0.70–1.67) 0.88 (0.56–1.38) 0.54 (0.33–0.91) 0.02 1.33 (1.12–1.57) <0.01 0.57 (0.02)
2 1.00 (ref.) 1.03 (0.67–1.59) 0.74 (0.47–1.17) 1.00 (0.59–1.69) 0.49 0.80 (0.38–1.67) 0.55 0.78 (0.02)
3 1.00 (ref.) 1.05 (0.68–1.63) 0.76 (0.48–1.20) 1.05 (0.62–1.77) 0.60 0.87 (0.41–1.84) 0.71 0.78 (0.02)
Jurca et al. (23); ACLS
1 1.00 (ref.) 0.55 (0.31–0.99) 0.33 (0.17–0.64) 0.28 (0.13–0.58) <0.01 0.74 (0.59–0.94) 0.01 0.63 (0.04)
2 1.00 (ref.) 0.65 (0.36–1.17) 0.65 (0.33–1.26) 0.89 (0.40–1.96) 0.39 0.87 (0.60–1.26) 0.47 0.80 (0.03)
3 1.00 (ref.) 0.62 (0.34–1.12) 0.60 (0.31–1.18) 0.90 (0.40–2.01) 0.34 0.87 (0.60–1.27) 0.47 0.80 (0.03)
Jurca et al. (23); ADNFS
1 1.00 (ref.) 0.76 (0.45–1.29) 0.24 (0.11–0.50) 0.14 (0.06–0.37) <0.01 0.67 (0.52–0.85) <0.01 0.67 (0.03)
2 1.00 (ref.) 1.18 (0.69–2.02) 0.74 (0.33–1.68) 1.31 (0.42–4.09) 0.95 0.89 (0.54–1.46) 0.64 0.80 (0.03)
3 1.00 (ref.) 1.14 (0.66–1.97) 0.72 (0.32–1.64) 1.33 (0.41–4.36) 0.89 0.88 (0.53–1.47) 0.63 0.80 (0.03)
Jurca et al. (23); NASA
1 1.00 (ref.) 0.44 (0.24–0.79) 0.31 (0.16–0.59) 0.22 (0.10–0.48) <0.01 0.67 (0.52–0.85) <0.01 0.64 (0.04)
2 1.00 (ref.) 0.69 (0.38–1.26) 0.69 (0.35–1.33) 1.12 (0.47–2.66) 0.58 0.84 (0.57–1.24) 0.39 0.80 (0.03)
3 1.00 (ref.) 0.67 (0.36–1.22) 0.65 (0.33–1.27) 1.15 (0.48–2.78) 0.54 0.84 (0.57–1.25) 0.40 0.81 (0.02)
Matthews et al. (25)
1 1.00 (ref.) 0.25 (0.16–0.40) 0.19 (0.12–0.31) 0.09 (0.04–0.18) <0.01 0.60 (0.50–0.71) <0.01 0.71 (0.02)
2 1.00 (ref.) 0.47 (0.29–0.75) 0.57 (0.32–1.03) 0.60 (0.25–1.45) 0.04 0.82 (0.57–1.16) 0.26 0.78 (0.02)
3 1.00 (ref.) 0.47 (0.29–0.75) 0.62 (0.34–1.13) 0.69 (0.28–1.68) 0.07 0.85 (0.59–1.22) 0.37 0.79 (0.02)
Myers et al. (7)
1 1.00 (ref.) 0.62 (0.42–0.91) 0.36 (0.23–0.56) 0.13 (0.06–0.26) <0.01 0.83 (0.70–0.98) 0.03 0.66 (0.02)
2 1.00 (ref.) 0.82 (0.55–1.21) 0.80 (0.49–1.30) 0.70 (0.32–1.53) 0.24 0.78 (0.56–1.07) 0.13 0.77 (0.02)
3 1.00 (ref.) 0.86 (0.58–1.27) 0.86 (0.53–1.40) 0.79 (0.35–1.76) 0.43 0.82 (0.59–1.15) 0.26 0.78 (0.02)
Sloan et al. (26); HR
1 1.00 (ref.) 0.41 (0.22–0.75) 0.32 (0.17–0.60) 0.22 (0.10–0.47) <0.01 0.65 (0.51–0.83) <0.01 0.64 (0.04)
2 1.00 (ref.) 0.63 (0.34–1.15) 0.61 (0.32–1.17) 1.05 (0.44–2.50) 0.37 0.84 (0.59–1.20) 0.34 0.80 (0.03)
3 1.00 (ref.) 0.62 (0.34–1.16) 0.58 (0.30–1.12) 1.08 (0.44–2.61) 0.34 0.84 (0.59–1.21) 0.36 0.81 (0.02)
Sloan et al. (26); no HR
1 1.00 (ref.) 0.38 (0.24–0.59) 0.39 (0.25–0.60) 0.26 (0.15–0.43) <0.01 0.86 (0.73–1.02) 0.08 0.64 (0.02)
2 1.00 (ref.) 0.67 (0.42–1.05) 0.74 (0.47–1.15) 0.94 (0.53–1.66) 0.39 0.94 (0.72–1.24) 0.67 0.78 (0.02)
3 1.00 (ref.) 0.67 (0.43–1.06) 0.73 (0.46–1.14) 1.04 (0.58–1.88) 0.50 0.96 (0.72–1.26) 0.75 0.78 (0.02)
Model 1 = V̇O2max quartiles; crude. Model 2 = model 1 + age + sex. Model 3 = model 2 + race and ethnicity + education.

Results from the Cox proportional hazards regression models estimating the associations between predicted V̇O2max (each equation separately) and all-cause mortality are shown in Table 5. For most equations, predicted V̇O2max was associated with mortality in a pattern and strength similar to that of measured V̇O2max in the crude model (model 1), but adjustment for basic covariates in models 2 and 3 attenuated the HR, widened the confidence intervals to statistical insignificance, and eliminated all linear trends (Table 5).
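Table 5 reports HR by sex-specific quartiles of V̇O2max. As an illustration of how such quartile groups could be formed before fitting a Cox model, the sketch below computes cut points separately within each sex and assigns Q1 to Q4. The function name, the tie-handling rule, and the example values are assumptions for illustration, not the authors' procedure or BLSA data.

```python
import bisect
import statistics


def sex_specific_quartiles(values, sexes):
    """Assign each value a quartile (1-4) using cut points from its own sex group."""
    cuts = {}
    for sex in set(sexes):
        group = [v for v, s in zip(values, sexes) if s == sex]
        # statistics.quantiles with n=4 returns the three quartile cut points.
        cuts[sex] = statistics.quantiles(group, n=4)
    # A value equal to a cut point is placed in the higher quartile (assumed rule).
    return [bisect.bisect_right(cuts[s], v) + 1 for v, s in zip(values, sexes)]


# Hypothetical V̇O2max values in mL/kg/min.
vo2 = [18.0, 21.5, 25.0, 30.2, 16.1, 19.4, 22.8, 27.5]
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]
print(sex_specific_quartiles(vo2, sex))
```

Computing the cut points within sex keeps men and women evenly represented in each quartile, so that the quartile comparison is not driven by the sex difference in V̇O2max.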

After recalibration, unadjusted HR for Q2–Q4 relative to Q1 of predicted V̇O2max exhibited patterns and magnitudes of association that more closely reflected those for measured V̇O2max. For example, no published equation had an HR of 0.10 (the Q4 HR of measured V̇O2max relative to Q1) in Q4 relative to Q1, but the recalibrated Q4 HR was ≤0.10 for most equations. However, after adjustment for covariates in models 2 and 3, the HR values were attenuated again, CI values widened to statistical insignificance, and linear trends were not statistically significant (Table 6).

TABLE 6 - HR of all-cause mortality by measured and recalibrated, predicted V̇O2max in the selected BLSA sample (n = 1080).
Columns, left to right: Model; HR (95% CI) for sex-specific quartiles of V̇O2max (Q1, Q2, Q3, Q4); P-trend; HR (95% CI) for 1-SD increase; P; c statistic. Rows are grouped by author.
Measured
1 1.00 (ref.) 0.43 (0.29–0.63) 0.16 (0.09–0.29) 0.10 (0.05–0.20) <0.01 0.46 (0.38–0.57) <0.01 0.71 (0.02)
2 1.00 (ref.) 0.55 (0.37–0.81) 0.30 (0.17–0.54) 0.34 (0.16–0.75) <0.01 0.51 (0.39–0.66) <0.01 0.79 (0.02)
3 1.00 (ref.) 0.55 (0.37–0.82) 0.30 (0.17–0.54) 0.34 (0.15–0.75) <0.01 0.50 (0.38–0.66) <0.01 0.79 (0.02)
Baynard et al. (29)
1 1.00 (ref.) 0.39 (0.26–0.58) 0.23 (0.14–0.37) 0.10 (0.05–0.20) <0.01 0.60 (0.50–0.71) <0.01 0.70 (0.02)
2 1.00 (ref.) 0.75 (0.48–1.16) 0.78 (0.43–1.39) 0.98 (0.39–2.46) 0.48 0.90 (0.64–1.27) 0.55 0.77 (0.02)
3 1.00 (ref.) 0.79 (0.51–1.22) 0.85 (0.47–1.55) 1.19 (0.47–3.05) 0.75 0.94 (0.66–1.33) 0.72 0.78 (0.02)
Bradshaw et al. (22)
1 1.00 (ref.) 0.50 (0.33–0.73) 0.26 (0.16–0.42) 0.13 (0.07–0.25) <0.01 0.60 (0.50–0.71) <0.01 0.68 (0.02)
2 1.00 (ref.) 0.87 (0.57–1.32) 0.83 (0.48–1.46) 1.16 (0.50–2.70) 0.77 0.86 (0.63–1.17) 0.34 0.78 (0.02)
3 1.00 (ref.) 0.90 (0.59–1.36) 0.91 (0.51–1.62) 1.34 (0.56–3.19) 0.97 0.89 (0.65–1.22) 0.48 0.78 (0.02)
de Souza e Silva et al. (27)
1 1.00 (ref.) 0.41 (0.27–0.60) 0.23 (0.14–0.37) 0.10 (0.05–0.20) <0.01 0.59 (0.49–0.70) <0.01 0.70 (0.02)
2 1.00 (ref.) 0.82 (0.53–1.27) 0.78 (0.44–1.39) 1.03 (0.41–2.59) 0.55 0.86 (0.61–1.21) 0.40 0.77 (0.02)
3 1.00 (ref.) 0.88 (0.57–1.36) 0.87 (0.48–1.58) 1.26 (0.50–3.23) 0.89 0.91 (0.64–1.29) 0.58 0.78 (0.02)
Jackson et al. (24)
1 1.00 (ref.) 0.40 (0.27–0.60) 0.23 (0.14–0.38) 0.10 (0.05–0.21) <0.01 0.61 (0.51–0.72) <0.01 0.69 (0.02)
2 1.00 (ref.) 0.65 (0.43–1.00) 0.72 (0.41–1.26) 0.82 (0.35–1.96) 0.22 0.91 (0.66–1.25) 0.57 0.77 (0.02)
3 1.00 (ref.) 0.65 (0.42–0.99) 0.78 (0.44–1.39) 0.95 (0.40–2.29) 0.34 0.94 (0.68–1.30) 0.72 0.78 (0.02)
Jang et al. (28)
1 1.00 (ref.) 0.40 (0.27–0.60) 0.23 (0.15–0.38) 0.10 (0.05–0.20) <0.01 0.59 (0.50–0.70) <0.01 0.70 (0.02)
2 1.00 (ref.) 0.79 (0.51–1.22) 0.84 (0.47–1.50) 1.07 (0.42–2.69) 0.66 0.90 (0.64–1.28) 0.56 0.78 (0.02)
3 1.00 (ref.) 0.83 (0.53–1.29) 0.91 (0.50–1.65) 1.33 (0.52–3.40) 0.96 0.94 (0.66–1.34) 0.72 0.78 (0.02)
Jurca et al. (23)
1 1.00 (ref.) 0.39 (0.22–0.68) 0.21 (0.11–0.43) 0.09 (0.03–0.24) <0.01 0.52 (0.40–0.66) <0.01 0.71 (0.03)
2 1.00 (ref.) 0.66 (0.37–1.19) 0.74 (0.33–1.66) 0.92 (0.25–3.34) 0.42 0.82 (0.52–1.29) 0.38 0.80 (0.03)
3 1.00 (ref.) 0.64 (0.35–1.16) 0.73 (0.32–1.65) 1.03 (0.27–4.02) 0.43 0.81 (0.50–1.29) 0.37 0.81 (0.02)
Matthews et al. (25)
1 1.00 (ref.) 0.41 (0.27–0.61) 0.24 (0.15–0.38) 0.11 (0.05–0.21) <0.01 0.60 (0.50–0.71) <0.01 0.69 (0.02)
2 1.00 (ref.) 0.70 (0.46–1.07) 0.74 (0.42–1.29) 0.79 (0.34–1.84) 0.25 0.88 (0.64–1.21) 0.43 0.77 (0.02)
3 1.00 (ref.) 0.69 (0.45–1.06) 0.79 (0.45–1.41) 0.88 (0.37–2.07) 0.37 0.92 (0.66–1.27) 0.60 0.78 (0.02)
Myers et al. (7)
1 1.00 (ref.) 0.41 (0.27–0.60) 0.21 (0.13–0.35) 0.09 (0.04–0.19) <0.01 0.56 (0.47–0.67) <0.01 0.70 (0.02)
2 1.00 (ref.) 0.76 (0.50–1.15) 0.66 (0.37–1.19) 0.86 (0.33–2.22) 0.22 0.76 (0.54–1.08) 0.13 0.77 (0.02)
3 1.00 (ref.) 0.80 (0.52–1.21) 0.72 (0.40–1.31) 1.11 (0.42–2.95) 0.42 0.81 (0.57–1.16) 0.26 0.78 (0.02)
Sloan et al. (26); HR
1 1.00 (ref.) 0.39 (0.22–0.67) 0.19 (0.09–0.39) 0.08 (0.03–0.23) <0.01 0.51 (0.40–0.66) <0.01 0.71 (0.03)
2 1.00 (ref.) 0.61 (0.34–1.10) 0.61 (0.27–1.41) 0.78 (0.22–2.82) 0.21 0.81 (0.51–1.27) 0.36 0.80 (0.03)
3 1.00 (ref.) 0.60 (0.33–1.08) 0.59 (0.25–1.37) 0.86 (0.22–3.32) 0.21 0.80 (0.50–1.28) 0.35 0.81 (0.03)
Sloan et al. (26); no HR
1 1.00 (ref.) 0.40 (0.27–0.60) 0.23 (0.14–0.38) 0.10 (0.05–0.21) <0.01 0.61 (0.51–0.72) <0.01 0.69 (0.02)
2 1.00 (ref.) 0.65 (0.43–1.00) 0.72 (0.41–1.26) 0.82 (0.35–1.96) 0.22 0.91 (0.66–1.25) 0.57 0.77 (0.02)
3 1.00 (ref.) 0.65 (0.42–0.99) 0.78 (0.44–1.39) 0.95 (0.40–2.29) 0.34 0.94 (0.68–1.30) 0.72 0.78 (0.02)
Model 1 = V̇O2max quartiles; crude. Model 2 = model 1 + age + sex. Model 3 = model 2 + race and ethnicity + education.
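Tables 5 and 6 report a c statistic for each model. As a rough illustration of what a concordance statistic measures with censored follow-up, the sketch below implements a basic form of Harrell's concordance index. It is a simplified version (pairs with tied follow-up times are skipped) with a hypothetical function name and toy data; it is not the routine the authors used.

```python
def harrell_c(times, events, risk_scores):
    """Basic Harrell's concordance index for right-censored data.

    A pair is usable when the subject with the shorter follow-up time had the
    event. The pair is concordant when that subject also has the higher risk
    score; tied risk scores count as half. Pairs with tied times are skipped.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data: follow-up years, death indicator (1 = died), and a risk score.
# For fitness, a risk score could be the negative of V̇O2max (assumption).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
print(harrell_c(times, events, risk))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of who dies earlier.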

DISCUSSION

In the present study, we sought to provide validation, recalibration, and predictive accuracy metrics of published V̇O2max prediction equations with the aim of enabling large-scale epidemiologic cohorts of older, ambulatory, community-dwelling adults to accurately estimate V̇O2max. Performance metrics of several of the extant equations yielded reasonable results relative to measured V̇O2max. For example, the Bradshaw (22) equation had an RMSE value of 4.2 mL·kg−1·min−1, meaning that, on average, this equation’s errors were within ~1.2 METs, assuming the standard conversion of 3.5 mL·kg−1·min−1 to 1 MET. The Matthews (25) equation had an absolute bias value of 0.1 mL·kg−1·min−1, meaning that, on average, this model’s predictions were within 0.03 METs of measured V̇O2max. The recalibration of these equations using the BLSA-measured V̇O2max and covariate data improved every performance metric, although such recalibration would not be possible in epidemiologic cohorts unless V̇O2max and the covariates used in the derivation cohort were directly measured.
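The MET figures quoted above follow directly from the stated conversion of 3.5 mL·kg−1·min−1 per MET. A minimal worked example, with a hypothetical helper name:

```python
ML_PER_KG_MIN_PER_MET = 3.5  # standard conversion cited in the text


def to_mets(vo2_ml_kg_min):
    """Convert an oxygen uptake value in mL/kg/min to METs."""
    return vo2_ml_kg_min / ML_PER_KG_MIN_PER_MET


print(round(to_mets(4.2), 2))  # Bradshaw RMSE of 4.2 mL/kg/min -> 1.2 METs
print(round(to_mets(0.1), 2))  # Matthews absolute bias of 0.1 mL/kg/min -> 0.03 METs
```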

Cox proportional hazards modeling showed measured V̇O2max is an extremely powerful predictor of all-cause mortality in BLSA participants in both the unadjusted and adjusted models. Compared with participants in the lowest quartile of measured V̇O2max, those in the highest quartile had a threefold reduction in the risk of all-cause mortality, after adjusting for age, sex, race and ethnicity, and education. These HR values are similar to, although slightly stronger than, those reported in other studies of V̇O2max and all-cause mortality for those with the highest levels of CRF relative to those with the lowest CRF (47–49).
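The "threefold reduction" here and the "66% lower risk" stated in the Conclusions are two readings of the same model 3 estimate for Q4 versus Q1 (HR = 0.34). The short sketch below shows the arithmetic; the helper names are hypothetical, and treating the HR as a relative risk follows the paper's own wording rather than a strict statistical equivalence.

```python
def percent_lower(hr):
    """Percent reduction implied by a hazard ratio below 1."""
    return (1.0 - hr) * 100.0


def fold_reduction(hr):
    """Reciprocal of the hazard ratio, read as an x-fold reduction."""
    return 1.0 / hr


hr_q4_vs_q1 = 0.34  # model 3, measured V̇O2max (Table 5)
print(round(percent_lower(hr_q4_vs_q1)))       # 66
print(round(fold_reduction(hr_q4_vs_q1), 1))   # 2.9
```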

Among the previously published V̇O2max prediction models, there was no discernible pattern of covariate types (i.e., demographics, body mass, self-reported PA) that contributed to the performance of the model more than others. For example, the Bradshaw equation (22), one of the best performing models, has the same covariates as the Jurca equations (23), which did not perform as well in relation to measured V̇O2max in the BLSA. Several of the published equations yielded HR similar in pattern and magnitude to those of measured V̇O2max before adjustment, but these associations were not robust to even minimal adjustments. After adjustment for only age and sex, the ability of the equations to predict mortality was substantially weakened, suggesting that much of the association observed in the unadjusted models was due to these two variables alone. In contrast, in a very large study (n = 43,356), sex-stratified estimates of the association between predicted CRF and all-cause mortality remained statistically significant after adjustment for age (50). In unadjusted regression models using the recalibrated equations, the patterns of association were more similar to those estimated using measured V̇O2max, that is, closer to the unadjusted HR (95% CI) for Q1–Q4 of measured V̇O2max: 1.00 (reference value), 0.43 (0.29–0.63), 0.16 (0.09–0.29), and 0.10 (0.05–0.20).

Despite the pattern of the recalibrated equations’ HR in unadjusted models, these associations were still not robust to adjustment. These findings strongly suggest that, although the equations may be valid and useful, to varying degrees, for individual exercise prescriptions in the field, their ability to predict mortality is severely compromised after adjustment for basic demographic and anthropometric covariates, some of which are components of the prediction equations themselves. V̇O2max and CRF in general are complex constructs reflecting an integration of multifaceted organ systems and metabolic processes (51). Without direct measures of the physiologic variability across individuals inherent in measured CRF, even well-performing prediction equations based on basic demographic and health characteristics do not predict mortality independent of sex and age. To a large extent, this is because demographic and behavioral characteristics do not adequately capture the integrated physiological signal reflected in measured V̇O2max.

There are some limitations to the present study. First, not all covariates from the published equations had exact counterpart covariates in the BLSA. Although these discrepancies could potentially limit the performance metrics of the equations when applied in the BLSA, this limitation would be eliminated once the equations were recalibrated to the BLSA-measured V̇O2max. Next, the majority of the sample (61.9%) had a postcollege education, a higher proportion than in the general population. One substantial strength of the present study is the prospective follow-up, enabling the evaluation of the accuracy of predicted V̇O2max with respect to measured V̇O2max and their associations with mortality. BLSA enrolled a large group of racially and ethnically diverse older adults, included laboratory-based measurements of V̇O2max, followed participants for mortality outcomes after V̇O2max assessment, and collected data that enabled adjustment for confounders. The conclusions drawn from these data and analyses are robust across our approaches: the performance metrics and the HR contribute to a consistent and unified narrative regarding the importance of accurately assessing V̇O2max in older adults and the relevance of this aging biomarker (31) to clinical outcomes such as all-cause mortality.

CONCLUSIONS

Measured V̇O2max is an extremely strong predictor of all-cause mortality in aging men and women. Those in the highest sex-specific quartile of measured V̇O2max experienced a 66% lower risk of death relative to those in the lowest quartile of V̇O2max after adjustment for age, race, sex, and education. Several published V̇O2max prediction models yielded 1) reasonable performance metrics relative to measured V̇O2max, especially when recalibrated, and 2) all-cause mortality HR similar to those of measured V̇O2max, especially when recalibrated, yet 3) these associations were not robust to adjustment for basic demographic covariates. These findings make an important contribution to research on the development of an inexpensive surrogate for direct measurement of CRF that could be broadly used to guide healthy aging in the older population. Future studies should investigate whether modern analytic methods such as machine learning can improve prediction of V̇O2max in community-dwelling older adults so that this critical “vital sign” can be more broadly studied as a modifiable target for promoting functional resiliency and healthy aging.

The authors would like to acknowledge the BLSA participants and staff for their participation in this important scientific endeavor. The authors would like to thank Sandy Liles for his thorough review of and contributions to this article.

This research was supported in part by the Intramural Research Program of the National Institute on Aging.

H. P. J. was supported by the National Cancer Institute (K01 CA234317), the SDSU/UCSD Comprehensive Cancer Center Partnership (U54 CA132384 and U54 CA132379), and the Alzheimer’s Disease Resource Center for advancing Minority Aging Research at the University of California San Diego (P30 AG059299). The content is solely the responsibility of the authors and does not necessarily represent the official views of funding agencies.

The authors have no conflicts of interest to declare. The results of the study have been presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. Publication of these results does not constitute endorsement by the American College of Sports Medicine.

REFERENCES

1. Garber CE, Blissmer B, Deschenes MR, et al. American College of Sports Medicine position stand. Quantity and quality of exercise for developing and maintaining cardiorespiratory, musculoskeletal, and neuromotor fitness in apparently healthy adults: guidance for prescribing exercise. Med Sci Sports Exerc. 2011;43(7):1334–59.
2. Surgeon general’s report on physical activity and health. JAMA. 1996;276(7):522.
3. 2018 Physical Activity Guidelines Advisory Committee. 2018 Physical Activity Guidelines Advisory Committee Scientific Report. Washington, DC: U.S. Department of Health and Human Services; 2018.
4. Physical Activity Guidelines Advisory Committee Report, 2008 to the Secretary of Health and Human Services: (525442010-001). 2008 [cited 2022 Feb 11]. Available from: http://doi.apa.org/get-pe-doi.cfm?doi=10.1037/e525442010-001.
5. Lee D, Artero EG, Sui X, Blair SN. Mortality trends in the general population: the importance of cardiorespiratory fitness. J Psychopharmacol. 2010;24(4 Suppl):27–35.
6. Wei M, Kampert JB, Barlow CE, et al. Relationship between low cardiorespiratory fitness and mortality in normal-weight, overweight, and obese men. JAMA. 1999;282(16):1547–53.
7. Myers J, McAuley P, Lavie CJ, Despres J-P, Arena R, Kokkinos P. Physical activity and cardiorespiratory fitness as major markers of cardiovascular risk: their independent and interwoven importance to health status. Prog Cardiovasc Dis. 2015;57(4):306–14.
8. Kodama S, Saito K, Tanaka S, et al. Cardiorespiratory fitness as a quantitative predictor of all-cause mortality and cardiovascular events in healthy men and women: a meta-analysis. JAMA. 2009;301(19):2024–35.
9. Sui X, LaMonte MJ, Blair SN. Cardiorespiratory fitness as a predictor of nonfatal cardiovascular events in asymptomatic women and men. Am J Epidemiol. 2007;165(12):1413–23.
10. LaMonte MJ, Barlow CE, Jurca R, Kampert JB, Church TS, Blair SN. Cardiorespiratory fitness is inversely associated with the incidence of metabolic syndrome. Circulation. 2005;112(4):505–12.
11. Barlow CE, LaMonte MJ, Fitzgerald SJ, Kampert JB, Perrin JL, Blair SN. Cardiorespiratory fitness is an independent predictor of hypertension incidence among initially normotensive healthy women. Am J Epidemiol. 2006;163(2):142–50.
12. Lakoski SG, Willis BL, Barlow CE, et al. Midlife cardiorespiratory fitness, incident cancer, and survival after cancer in men: the Cooper Center Longitudinal Study. JAMA Oncol. 2015;1(2):231–7.
13. Peel JB, Sui X, Adams SA, Hébert JR, Hardin JW, Blair SN. A prospective study of cardiorespiratory fitness and breast cancer mortality. Med Sci Sports Exerc. 2009;41(4):742–8.
14. Sui X, Lee D-C, Matthews CE, et al. Influence of cardiorespiratory fitness on lung cancer mortality. Med Sci Sports Exerc. 2010;42(5):872–8.
15. Liu R, Sui X, Laditka JN, et al. Cardiorespiratory fitness as a predictor of dementia mortality in men and women. Med Sci Sports Exerc. 2012;44(2):253–9.
16. Vidoni ED, Honea RA, Billinger SA, Swerdlow RH, Burns JM. Cardiorespiratory fitness is associated with atrophy in Alzheimer’s and aging over 2 years. Neurobiol Aging. 2012;33(8):1624–32.
17. Milani RV, Lavie CJ. Impact of cardiac rehabilitation on depression and its associated mortality. Am J Med. 2007;120(9):799–806.
18. Sui X, Laditka JN, Church TS, et al. Prospective study of cardiorespiratory fitness and depressive symptoms in women and men. J Psychiatr Res. 2009;43(5):546–52.
19. Shephard RJ. Maximal oxygen intake and independence in old age. Br J Sports Med. 2009;43(5):342–6.
20. Sui X, LaMonte MJ, Laditka JN, et al. Cardiorespiratory fitness and adiposity as mortality predictors in older adults. JAMA. 2007;298(21):2507–16.
21. Fletcher GF, Ades PA, Kligfield P, et al. Exercise standards for testing and training: a scientific statement from the American Heart Association. Circulation. 2013;128(8):873–934.
22. Bradshaw DI, George JD, Hyde A, et al. An accurate V̇O2max nonexercise regression model for 18–65-year-old adults. Res Q Exerc Sport. 2005;76(4):426–32.
23. Jurca R, Jackson AS, LaMonte MJ, et al. Assessing cardiorespiratory fitness without performing exercise testing. Am J Prev Med. 2005;29(3):185–93.
24. Jackson AS, Blair SN, Mahar MT, Wier LT, Ross RM, Stuteville JE. Prediction of functional aerobic capacity without exercise testing. Med Sci Sports Exerc. 1990;22(6):863–70.
25. Matthews CE, Heil DP, Freedson PS, Pastides H. Classification of cardiorespiratory fitness without exercise testing. Med Sci Sports Exerc. 1999;31(3):486–93.
26. Sloan RA, Haaland BA, Leung C, Padmanabhan U, Koh HC, Zee A. Cross-validation of a non-exercise measure for cardiorespiratory fitness in Singaporean adults. Singapore Med J. 2013;54(10):576–80.
27. de Souza e Silva CG, Kaminsky LA, Arena R, et al. A reference equation for maximal aerobic power for treadmill and cycle ergometer exercise testing: analysis from the FRIEND registry. Eur J Prev Cardiol. 2018;25(7):742–50.
28. Jang T-W, Park S-G, Kim H-R, Kim J-M, Hong Y-S, Kim B-G. Estimation of maximal oxygen uptake without exercise testing in Korean healthy adult workers. Tohoku J Exp Med. 2012;227(4):313–9.
29. Baynard T, Arena RA, Myers J, Kaminsky LA. The role of body habitus in predicting cardiorespiratory fitness: the FRIEND registry. Int J Sports Med. 2016;37(11):863–9.
30. Vespa J, Medina L, Armstrong DM. Demographic Turning Points for the United States: Population Projections for 2020 to 2060 Population Estimates and Projections Current Population Reports. [date unknown]. Available from: www.census.gov/programs-surveys/popproj.
31. Kritchevsky SB, Forman DE, Callahan KE, et al. Pathways, contributors, and correlates of functional limitation across specialties: workshop summary. J Gerontol A Biol Sci Med Sci. 2019;74(4):534–43.
32. BLSA History. National Institute on Aging. [date unknown]; [cited 2021 Aug 17]. Available from: http://www.nia.nih.gov/research/labs/blsa/history.
33. Ferrucci L. The Baltimore Longitudinal Study of Aging (BLSA): a 50-year-long journey and plans for the future. J Gerontol A Biol Sci Med Sci. 2008;63(12):1416–9.
34. Stone JL, Norris AH. Activities and attitudes of participants in the Baltimore longitudinal study. J Gerontol. 1966;21(4):575–80.
35. Balke B, Ware RW. An experimental study of physical fitness of air force personnel. U S Armed Forces Med J. 1959;10(6):675–88.
36. Simonsick E, Fan E, Fleg J. Estimating cardiorespiratory fitness in well-functioning older adults: treadmill validation of the long distance corridor walk. J Am Geriatr Soc. 2006;54(1):127–32.
37. Data Access—National Death Index—About 2021; [cited 2022 Mar 1]. Available from: https://www.cdc.gov/nchs/ndi/about.htm.
38. Schrack JA, Leroux A, Fleg JL, et al. Using heart rate and accelerometry to define quantity and intensity of physical activity in older adults. J Gerontol A Biol Sci Med Sci. 2018;73(5):668–75.
39. Ware J, Kosinski M, Keller SD. A 12-item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220–33.
40. Wanigatunga AA, Gresham GK, Kuo P-L, et al. Contrasting characteristics of daily physical activity in older adults by cancer history. Cancer. 2018;124(24):4692–9.
41. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;327(8476):307–10.
42. Datta D. deepankardatta/blandr: Version 0.4.2. 2017 [cited 2022 Feb 24]. Available from: https://zenodo.org/record/824524.
43. D’Agostino RB, Grundy S, Sullivan LM, Wilson P; CHD Risk Prediction Group. Validation of the Framingham coronary heart disease prediction scores: results of a multiple ethnic groups investigation. JAMA. 2001;286(2):180–7.
44. Altman N, Krzywinski M. Regression diagnostics. Nat Methods. 2016;13(5):385–6.
45. Therneau TM. A Package for Survival Analysis in R. [date unknown]. Available from: https://cran.r-project.org/package=survival.
46. Freund RJ, Littell RC, Creighton L. Regression Using JMP. Cary, NC: J. Wiley; 2003. 286 p.
47. Farrell SW, Braun L, Barlow CE, Cheng YJ, Blair SN. The relation of body mass index, cardiorespiratory fitness, and all-cause mortality in women. Obes Res. 2002;10(6):417–23.
48. Park M-S, Chung S-Y, Chang Y, Kim K. Physical activity and physical fitness as predictors of all-cause mortality in Korean men. J Korean Med Sci. 2009;24(1):13–9.
49. Salier Eriksson J, Ekblom B, Andersson G, Wallin P, Ekblom-Bak E. Scaling V̇O2max to body size differences to evaluate associations to CVD incidence and all-cause mortality risk. BMJ Open Sport Exerc Med. 2021;7(1):e000854.
50. Artero EG, Jackson AS, Sui X, et al. Longitudinal algorithms to estimate cardiorespiratory fitness: associations with nonfatal cardiovascular disease and disease-specific mortality. J Am Coll Cardiol. 2014;63(21):2289–96.
51. Mitchell JH, Blomqvist G. Maximal oxygen uptake. N Engl J Med. 1971;284(18):1018–22.
Keywords: CARDIORESPIRATORY FITNESS; AGING; EPIDEMIOLOGY; ASSESSMENT; RECALIBRATION

Copyright © 2022 by the American College of Sports Medicine