Criterion methods for the estimation of body fat such as underwater weighing, deuterium oxide dilution, magnetic resonance imaging, and computed tomography are seldom feasible in population-level applications because they are expensive, usually not portable, and require subject compliance (^{25}). As an alternative, equations predicting percentage body fat (PBF) using anthropometric measurements are used (^{19,25}). Nevertheless, most equations that use anthropometrics to predict PBF were developed (^{1–5,8–12,17,18,20,24,27,30,33–36,38}) and internally validated (^{3,12,20,35,36,38}) in relatively small samples of predominantly white individuals assembled by convenience. Thus, questions often remain about the generalizability of developed equations.

A limited number of studies in adults have validated equations using an independent, external sample, and those studies also used samples assembled by convenience (^{15,19,36}). Although those studies do increase confidence in the external validity of calculated PBF estimates, they do not give a clear indication of the generalizability of results to a defined population or give clear insights into the usefulness of equations in different sex, age, and race/ethnic groups. Data from the 1999–2004 National Health and Nutrition Examination Survey (NHANES) provide an opportunity to examine generalizability in a sample that is representative of the United States. In the 1999–2004 NHANES, PBF was measured using dual-energy x-ray absorptiometry (DXA), and participants also provided a battery of anthropometric and demographic measurements. Therefore, PBF results calculated from published equations that used anthropometric and demographic variables can be compared with those measured using DXA.

The purpose of this study was to evaluate the generalizability of PBF calculated using selected, published equations by examining their accuracy, precision, and bias separately in the adult men and women in NHANES. Because the relationships between anthropometric measurements and PBF change with age (^{6}), sex, race/ethnicity, and obesity status (^{17}), investigators strive to use equations developed in a sample similar to the one under study. For equations that were developed in a defined subgroup (e.g., 20–29 yr old white men), we conducted our validation analyses in a subgroup from NHANES constructed to have similar characteristics. We also tested the generalizability in the whole NHANES sample ages ≥20 yr and within age, body mass index (BMI), and race/ethnicity categories. Analyses within these categories were conducted to provide insight into the usefulness of different equations in specific groups. To examine differential bias (i.e., systematic/nonrandom error between comparison groups), we compared differences in PBF estimates between normal weight and obese men and women.

#### METHODS

Data were from the 1999–2004 NHANES (^{22}). NHANES is a stratified, multistage probability sample that represents the U.S. civilian noninstitutionalized population. Non-Hispanic blacks, Mexican Americans, low-income whites (beginning in 2000), adolescents ages 12–19 yr, and persons ages ≥60 yr were oversampled to provide more reliable estimates for those subgroups. All participants provided written informed consent, and the study protocol was approved by the institutional review board at the Centers for Disease Control and Prevention (Atlanta, GA).

##### Anthropometric measurements

Body measurements were recorded by a trained technician in a mobile examination center following standard procedures (^{21}). Standing height without shoes was measured with a stadiometer to the nearest 1 mm. Weight was measured to the nearest 0.1 kg in an examination gown without shoes using a Toledo digital scale. BMI was calculated as body weight in kilograms divided by the square of height in meters. Participants were categorized as underweight (<18.5 kg·m^{−2}), normal weight (≥18.5 to <25.0 kg·m^{−2}), overweight (≥25.0 to <30.0 kg·m^{−2}), or obese (≥30.0 kg·m^{−2}) (^{39}). Waist circumference (WC) was assessed with a measuring tape at the uppermost lateral border of the hip crest (ilium) to the nearest 0.1 cm. Triceps and subscapular skinfolds were measured to the nearest 0.1 mm using a Holtain skinfold caliper.
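The BMI calculation and the four weight-status categories described above can be sketched as follows. This is a minimal illustration rather than the authors' code; the function names and the example participant are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(bmi_value: float) -> str:
    """Classify BMI (kg/m^2) using the cut points stated in the text."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal weight"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


# Hypothetical participant: 70.0 kg, 1.75 m
example_bmi = bmi(70.0, 1.75)
print(round(example_bmi, 1), bmi_category(example_bmi))
```

Each lower bound is inclusive and each upper bound exclusive, matching the "≥ … to <" ranges given above.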

##### DXA measurements

Whole-body DXA scans were taken using a Hologic QDR 4500A fan-beam densitometer (Hologic, Inc., Bedford, MA) following the manufacturer’s acquisition procedures in the fast mode to determine PBF (^{23}). Participants were excluded from the DXA examination if they were pregnant, reported taking tests with radiographic contrast material or participating in nuclear medicine studies in the past 3 d, or their self-reported weight or height exceeded the DXA table limit (300 lb or 6 ft 5 inches). When possible, the National Center for Health Statistics imputed DXA data. Five imputations (^{23}) of the missing data were provided to allow the assessment of variability due to imputation.

##### Analytic sample

The NHANES 1999–2004 sample included 14,221 participants ages ≥20 yr who were interviewed and examined. Participants who were missing PBF from DXA (*n* = 1127), height (*n* = 133), weight (*n* = 31), WC (*n* = 342), and triceps or subscapular skinfolds (*n* = 2654) were excluded from all analyses. The total analysis sample included 9934 participants. DXA data were imputed for 1769 (17.8%) participants.

##### Equations evaluated

A literature search was conducted using PubMed and EMBASE to identify equations estimating PBF, fat mass, lean body weight/fat-free mass, and/or body density. Four categories of search terms were used: anthropometrics (WC and skinfold thickness), body composition, criterion methods for body composition measurement, and adults. All English-language publications indexed in these databases through February 4, 2013, were included. Equations were selected for evaluation if they were developed in white, black, or Indian European adults and used anthropometric variables that were available from the NHANES (height, weight, WC, triceps, or subscapular skinfolds). To focus on equations using anthropometrics, those that included measurements from bioelectrical impedance analysis were not evaluated. When authors presented multiple equations from analyses of randomly selected subsamples (equation development and internal validation subsets), we selected the equation developed using the whole sample under study (^{3,12,36}). However, when equations were developed in informative subgroups (e.g., in specific age or sex groups) or using different variables in the whole sample, all equations recommended by the authors were evaluated (^{5,8,20,38}). The reference lists of publications containing evaluated equations were also reviewed. Overall, we identified 22 publications (^{1–5,8–12,17,18,20,24,27,29,30,33–36,38}), 4 of which presented two sets of equations (^{5,8,20,38}). Of the 26 equations, 2 are only for males and 3 are only for females (Table 1). The publication by Lean et al. (^{20}) is listed twice in Table 1 because the two equations published in this article were categorized into two different groups (i.e., equations that used WC/waist-to-height ratio [WHtR] and those that used multiple anthropometric variables).
Lean body weight and fat-free mass were converted to PBF using (weight − lean body weight) / weight × 100 and (weight − fat-free mass) / weight × 100, respectively, and body density was converted to PBF using Siri’s equation (^{32}).
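The conversions described above can be sketched as below. The lean-mass conversion follows the formula in the text; the Siri conversion is written in its commonly stated form, PBF = (4.95 / density − 4.50) × 100, which the article cites but does not spell out, so it should be read as an assumption here. Function names and example values are hypothetical.

```python
def pbf_from_lean_mass(weight_kg: float, lean_mass_kg: float) -> float:
    """PBF from lean body weight or fat-free mass: (weight - lean) / weight * 100."""
    return (weight_kg - lean_mass_kg) / weight_kg * 100.0


def pbf_from_density_siri(density_g_per_ml: float) -> float:
    """PBF from body density, using the commonly stated form of Siri's equation."""
    return (4.95 / density_g_per_ml - 4.50) * 100.0


# Hypothetical values: 80 kg body weight with 60 kg fat-free mass,
# and a body density of 1.05 g/mL.
print(pbf_from_lean_mass(80.0, 60.0))
print(round(pbf_from_density_siri(1.05), 1))
```

Both functions return PBF in percentage points, so their output can be compared directly with PBF from DXA.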

##### Analytic approach

The identified equations were applied to the NHANES demographic and anthropometric variables, and PBF was calculated. All analyses were stratified by sex. In one set of calculations, the sex, age range, and race/ethnicity of the sample in which the equation was developed were applied to the NHANES data as inclusion criteria. The equation by Deurenberg et al. (^{4}) was not examined because the age range in the original sample in which the equation was developed was not published. Also, this equation-specific analysis was limited to adults ages ≤84 yr because NHANES coded ages of ≥85 yr as 85 yr. In additional sets of calculations, the equations from the literature were used to estimate mean PBF in the entire NHANES sample ages ≥20 yr and in predefined subgroups by age, obesity status, or race/ethnicity. Equations were evaluated using *R*^{2}, root mean square error (RMSE, square root of the average of squared differences between the PBF from the equation and the PBF from DXA), and mean signed difference (MSD, the average of the differences between the calculated PBF and the PBF estimated from DXA) (^{37}). We also examined the absolute value of the discrepancy between MSD in normal weight and obese participants to illustrate the size of the differential bias found between participants in those BMI categories.
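The three metrics defined above can be illustrated with a short, unweighted sketch. It uses the definitions given in this paragraph (RMSE and MSD from the differences between equation-based PBF and DXA PBF; *R*^{2} from a one-predictor regression, which equals the squared correlation). It deliberately ignores the NHANES sampling weights and design, so it is not a reproduction of the published analysis; the function name and data are hypothetical.

```python
import math


def evaluate_equation(predicted, dxa):
    """Return unweighted R^2, RMSE, and MSD for paired PBF values.

    predicted: PBF calculated from an anthropometric equation
    dxa:       PBF measured by DXA for the same participants
    """
    n = len(predicted)
    if n != len(dxa) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    diffs = [p - d for p, d in zip(predicted, dxa)]
    msd = sum(diffs) / n                              # mean signed difference
    rmse = math.sqrt(sum(x * x for x in diffs) / n)   # root mean square error

    mean_p = sum(predicted) / n
    mean_d = sum(dxa) / n
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in zip(predicted, dxa))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((d - mean_d) ** 2 for d in dxa)
    r2 = sxy ** 2 / (sxx * syy)                       # squared Pearson correlation

    return {"r2": r2, "rmse": rmse, "msd": msd}


# Hypothetical data: the equation underestimates every participant by 1 point.
print(evaluate_equation([20.0, 25.0, 30.0, 35.0], [21.0, 26.0, 31.0, 36.0]))
```

The hypothetical data show why the metrics are reported together: a constant underestimate leaves *R*^{2} at its maximum while MSD and RMSE both register the error.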

##### Statistical analysis

All analyses were conducted using SAS (version 9.2; SAS Institute, Cary, NC). *R*^{2} and RMSE were calculated using the SURVEYREG procedure with the PBF by DXA as the dependent variable and predicted PBF as the only independent variable. Because there was only one independent variable in the regression analysis, adjusted *R*^{2} was not calculated. MSD and its SEM were calculated using the SURVEYMEANS procedure. Domain analysis was applied to obtain estimates in subgroups stratified by sex and age, obesity status, or race/ethnicity. All analyses accounted for the complex sample design.

Standardized analyses were performed to account for the variability due to the five imputations in DXA data (^{23}). Briefly, every analysis involving PBF from DXA was run five times by using the five imputation data sets, and the mean *R*^{2}, RMSE, and MSD were calculated. The combined SEM of MSD was calculated as the square root of total variance, where total variance was calculated as *T* = *W* + (6 / 5)*B*, where *W* is the within-imputation variance that was calculated as the mean of the variances of the five individual estimates of MSD and *B* is the between-imputation variance calculated as

$$B = \frac{1}{4}\sum_{i=1}^{5}\left(Q_i - \bar{Q}\right)^2,$$

where $Q_i$ represents the five individual estimates and $\bar{Q}$ is the mean of the five individual estimates of MSD.
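The combining step described above can be sketched as follows. The total variance uses *T* = *W* + (1 + 1/*m*)*B*, which equals *W* + (6 / 5)*B* for *m* = 5 imputations. The between-imputation variance is taken as the sample variance of the individual estimates (divisor *m* − 1), the usual multiple-imputation combining rule; that divisor is an assumption in this sketch. The function name and input values are hypothetical.

```python
import math


def combine_imputations(estimates, variances):
    """Combine per-imputation MSD estimates and their variances.

    estimates: MSD estimated in each imputed data set
    variances: squared standard error of each of those estimates
    Returns (combined estimate, combined SEM).
    """
    m = len(estimates)
    if m != len(variances) or m < 2:
        raise ValueError("need matching estimates and variances from >= 2 imputations")

    q_bar = sum(estimates) / m                              # mean of the estimates
    w = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b                                 # total variance; 6/5 factor when m = 5
    return q_bar, math.sqrt(t)


# Hypothetical MSD estimates (percentage points) and variances from five imputations.
msd, sem = combine_imputations([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(round(msd, 3), round(sem, 3))
```

The combined SEM is larger than the SEM from any single imputed data set whenever the five estimates disagree, which is how the extra uncertainty due to imputation is carried into the reported results.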

#### RESULTS

The mean age was 44.3 yr for males and 46.6 yr for females in the study population (Table 2). The mean PBF was 26.8% for males and 38.1% for females. The prevalence of overweight was 44.5% for males and 31.9% for females, and the prevalence of obesity was around 19.0% in both sexes. Mexican Americans, non-Hispanic whites, and non-Hispanic blacks accounted for 7%, 73%, and 9% of the study population, respectively.

We evaluated the accuracy and precision of PBF prediction equations stratified by sex using three metrics (i.e., *R*^{2}, RMSE, and MSD) in the NHANES subgroups that were matched to the sex, age range, and race/ethnicity of the sample used to develop each equation (Table 3). No single equation performed the best in all three metrics. We emphasized the MSD because it represents the ability of an equation to accurately estimate the mean level of PBF. In males, the equation by Slaughter et al. (^{33}) produced a relatively small MSD (−0.9 percentage points) and RMSE (3.0 percentage points) and a large *R*^{2} (0.770), but it was developed in an adult sample younger than 29 yr. Over a wide age range (20–64 yr), the equation by Kagawa et al. (^{18}) performed the best in males. This equation explained 68.2% of variance in PBF, produced a relatively small RMSE of 3.1 percentage points (Fig. 1), and only slightly overestimated mean PBF by 1.1 percentage points. Compared with the two equations by Lean et al. (^{20}), the Kagawa et al. (^{18}) equation produced a smaller MSD, especially in those age 40–64 yr (0.8 vs 2.0 and 3.4 percentage points, data not shown). Compared with the equation by Gomez-Ambrosi et al. (^{10}), the equation by Kagawa et al. (^{18}) produced a similar absolute magnitude of MSD (1.1 vs −1.0 percentage points) but had a greater *R*^{2} (0.682 vs 0.603) and smaller RMSE (3.1 vs 3.5 percentage points). Similarly, in females, the equation developed in a sample with a relatively narrow age range (18–27 yr) by Rush et al. (^{30}) performed the best (*R*^{2} = 0.721, RMSE = 3.3 and MSD = 0.2 percentage points). Over a wide age range (20–80 yr), the equations by Gomez-Ambrosi et al. (^{10}) and the equation by Lean et al. (^{20}) that used WC and triceps skinfold performed the best overall (*R*^{2} ≈ 0.69, RMSE ≈ 3.6, and |MSD| ≈ 1.3 percentage points) in females. Although the equations by Kagawa et al. (^{18}) and Lean et al. (^{20}) that used WC only produced a slightly smaller MSD (≈0.5 percentage points), they tended to have a smaller *R*^{2} (≈ 0.56) and a larger RMSE (= 4.4 percentage points). *R*^{2}, RMSE, and MSD in the subgroups stratified by sex and age groups are shown in Supplemental Digital Content Tables 1–3. (See Table, Supplemental Digital Content 1, http://links.lww.com/MSS/A315, *R*^{2} of regression analyses using PBF estimated from published equations using anthropometric measures to predict PBF from DXA in NHANES 1999–2004 in adult males and females matched by age range and race/ethnicity with the sample in which the equations were developed. See Table, Supplemental Digital Content 2, http://links.lww.com/MSS/A316, Root mean square errors of regression analyses using PBF estimated from published equations using anthropometric measures to predict PBF from DXA in NHANES 1999–2004 in adult males and females matched by age range and race-ethnicity with the sample in which the equations were developed. See Table, Supplemental Digital Content 3, http://links.lww.com/MSS/A317, Mean difference in PBF estimated from published equations using anthropometric measures and DXA in NHANES 1999–2004 in adult males and females matched by age range and race-ethnicity with the sample in which the equations were developed.)


As would be expected, the amount of variance explained in the whole NHANES study population (Table 4) tended to be smaller than that found when equations were applied to the NHANES subset assembled to mimic the sample in which the equation was developed (Table 3). Similarly, Figure 1 shows that the RMSE was larger in the whole study population than that in the equation-specific population. The RMSE generally ranged between 3.0 and 4.0 percentage points of PBF in males and between 3.5 and 4.5 percentage points in females in the whole study population (Fig. 1 and Table, Supplemental Digital Content 4, http://links.lww.com/MSS/A319, Root mean square errors of regression analyses using PBF estimated from published equations using anthropometrics to predict PBF from DXA in males and females age ≥20 yr by overall, age, obesity status, and race/ethnicity: NHANES 1999–2004). Supplemental Digital Content 5 (http://links.lww.com/MSS/A318) shows the MSD according to subgroups by sex and age, weight status, and race/ethnicity in the whole study population (see Table, Supplemental Digital Content 5, http://links.lww.com/MSS/A318, Mean signed differences in PBF estimated from published equations using anthropometrics and from DXA in males and females age ≥20 yr by overall age, obesity status and race/ethnicity: NHANES 1999–2004).

Taking into account *R*^{2}, RMSE, and MSD, the comparison of equations using a single category of anthropometrics (i.e., skinfold thickness, BMI, or WC/WHtR) found that the two equations by Kagawa et al. (^{18}) (*R*^{2} = 0.664, RMSE = 3.3 percentage points, and MSD = 0.9 percentage points) and Lean et al. (^{20}) (*R*^{2} = 0.677, RMSE = 3.2 percentage points, and MSD = 0.4 percentage points) that used WC/WHtR performed the best overall in males, whereas those by Deurenberg et al. (^{3}), Gomez-Ambrosi et al. (^{10}), Pasco et al. (^{27}), and Rush et al. (^{30}) using BMI were the best in females (*R*^{2} = 0.600–0.668, RMSE = 3.7–4.1 percentage points, MSD = −1.7 to −0.9 percentage points). Compared with the equations using WC/WHtR or BMI, those that used skinfold thickness performed less favorably in older adults (*R*^{2} = 0.396–0.696 vs 0.652–0.777, RMSE = 4.2–5.9 vs 3.7–4.2 percentage points, MSD = −6.8 to −2.5 vs −3.5 to 4.5 percentage points).

Comparison across race/ethnicity found that equations did not perform consistently better in one race/ethnicity group than in others. For example, in females, the *R*^{2} estimates tended to be the largest in non-Hispanic whites (Table 4). Nevertheless, RMSE tended to be the lowest in Mexican Americans (Supplemental Digital Content 4, http://links.lww.com/MSS/A319, RMSE in the whole study population), and MSD tended to be the lowest in non-Hispanic blacks (Supplemental Digital Content 5, http://links.lww.com/MSS/A318, MSD in the whole study population).

In general, the equations by Kagawa et al. (^{18}) and by Lean et al. (^{20}) that used WC/WHtR performed the best in males overall and in subgroups (Table 4 and Supplemental Digital Content 4, http://links.lww.com/MSS/A319, RMSE in the whole study population, and Supplemental Digital Content 5, http://links.lww.com/MSS/A318, MSD in the whole study population). Compared with the two equations, the equation by Gomez-Ambrosi et al. (^{10}) had a similar MSD (−0.9 percentage points), but it produced a smaller *R*^{2} (0.590) and a larger RMSE (3.6 percentage points). However, in females, the equations by Gomez-Ambrosi et al. (^{10}) and by Rush et al. (^{30}) that used BMI performed the best overall and across the subgroups. In contrast, the equation by Kagawa et al. (^{18}) had a relatively small MSD, but it produced a smaller *R*^{2} (0.540) and a larger RMSE (4.4 percentage points).

Figure 2 shows the absolute values of differences in the MSD in obese compared with normal weight males and females. This value represents the size of the differential bias between the estimates, but not the direction. For example, the value of 10.2 percentage points for the study by Smith and Boyce (^{34}) in females is a result of the equation underestimating the PBF by 6.4 percentage points in the normal weight group while overestimating PBF in the obese group by 3.8 percentage points. For ease of visual evaluation, the results are shown in order of the size of the discrepancy, and it can easily be seen that only 8 of 23 equations for males (^{1,2,5,8,11,36,38}) and 4 of 24 equations for females (^{1,5,20,24}) had a differential bias of less than 2 percentage points.
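As a worked check of the Smith and Boyce example, take underestimation as a negative MSD and overestimation as a positive MSD, which is the sign convention implied by the definition of MSD in the Methods:

$$\left|\mathrm{MSD}_{\text{obese}} - \mathrm{MSD}_{\text{normal weight}}\right| = \left|3.8 - (-6.4)\right| = 10.2 \text{ percentage points}$$

Because the two errors have opposite signs, they add rather than cancel. An overall MSD near zero can therefore coexist with a large differential bias.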

#### DISCUSSION

We found that most equations evaluated provided *R*^{2} values between 0.5 and 0.7 and RMSE estimates between 3.0 and 4.0 percentage points for males and between 3.5 and 4.5 percentage points for females when tested in the full NHANES data. The overall MSD was generally between −5.0 and 2.0 percentage points. The equations by Kagawa et al. (^{18}) and by Lean et al. (^{20}) that used WC/WHtR provided the most accurate and precise (*R*^{2}, RMSE, and MSD) estimate of overall average PBF in males, while the equations by Gomez-Ambrosi et al. (^{10}) and Rush et al. (^{30}) that used BMI performed the best overall in females. However, in males, the differential bias of results from the equations by Kagawa et al. (^{18}) and by Lean et al. (^{20}) that used WC/WHtR in normal weight compared with obese adults was greater than that found for the two equations by Durnin and Womersley (^{5}) and the equations by Hassager et al. (^{11}), Chapman et al. (^{1}), Visser et al. (^{36}), Wilmore and Behnke (^{38}) that estimated lean body weight, Deurenberg et al. (^{2}), and the simplified equation by Gallagher et al. (^{8}). In females, the differential bias of results from the equations by Gomez-Ambrosi et al. (^{10}) and by Rush et al. (^{30}) that used BMI was greater than that produced by the Noppa et al. (^{24}), Chapman et al. (^{1}), Durnin and Womersley (^{5}) age-specific, and Lean et al. (^{20}) WC equations. Because NHANES collected only two skinfolds (triceps and subscapular) and did not collect hip circumference, some of the more popular skinfold and/or WC-based prediction equations that require other skinfolds or hip circumference could not be evaluated.

Matching the sex, age range, and race/ethnicity of the sample in which an equation was developed generally resulted in a higher *R*^{2} and lower RMSE and MSD compared with the whole NHANES study population, but differences were usually not large. Differences were larger for equations developed in a relatively narrow age range of young adults, perhaps because application of these equations to an older population was problematic. In addition, previous studies indicated that body fat prediction equations should only be applied to populations similar to those in which the equation was developed (^{15,36}). However, our study found that the Gomez-Ambrosi et al. (^{10}) equation developed in a sample with a higher prevalence of overweight and obesity than the NHANES 1999–2004 sample (85% vs 57%) provided the best estimate of overall average PBF in females.

In our analysis, equations using skinfold thickness alone performed less favorably in older adults than those using BMI or WC alone. Similarly, in a study of 60–87 yr old adults, Visser et al. (^{36}) found that relative to PBF measures from underwater weighing, an equation that included skinfolds (sum of triceps, subscapular, biceps, and suprailiac) explained a smaller proportion of variance than an equation that used BMI (58% vs 67%). Skinfold thickness may provide weak prediction of PBF in older individuals because body fat accumulates internally with aging (^{5}), resulting in a lower correlation between skinfold thickness and body composition (^{6,28}).

It is well known that at the same BMI, women have a much higher PBF than men, and therefore, almost all equations are either stratified by sex or include sex in the prediction equation. Sex comparisons of equations that included only one type of anthropometric assessment (skinfold or circumference or BMI) found that those using WC performed the best in males while those using BMI performed the best in females. This finding is supported by a study by Flegal et al. (^{6}) that used the same NHANES data used here and by other studies (^{20}). There is no unified measurement protocol for WC. In the equations evaluated, WC was measured at the umbilicus (^{29}), the midway (^{20}) or the narrowest point (^{18}) between the lateral lower ribs and the iliac crests, or laterally at the level of the iliac crests and anteriorly at the umbilicus (^{38}), whereas WC was assessed at the top of the iliac crest in NHANES. The difference between the WC measurement sites used for the development of equations and the site used in NHANES might have influenced the performance of these equations in our study.

In our study, we used four different metrics (i.e., *R*^{2}, RMSE, MSD, and differential bias) to indicate the validity of calculated PBF values, but there is no established criterion for acceptable accuracy and precision for each metric and no direct way to combine the information. It is likely that the relative need for strong performance in each of these metrics may vary given the goals of the investigator. If the main goal is to predict the mean level of PBF in one group, then a small RMSE and an overall MSD close to zero will be important. However, if accurate ranking of individuals in a sample by PBF is important to the investigator, then *R*^{2} will become increasingly important as will differential error within key subgroups. We emphasized the systematic difference in bias in estimates of percent body fat in normal weight and obese adults because it is likely to often be important to investigators that there is not a differential error in the estimation of PBF over the range of BMI. Differential bias in the estimation of body fat between adults in different BMI categories can warp the association of obesity with its risk factors and consequences.

We found that the equations using BMI or WC tended to underestimate PBF in normal weight adults but slightly overestimate PBF in obese adults. In contrast, equations using skinfolds were likely to underestimate PBF in obese women to a greater extent than in nonobese women. A similar pattern was seen in the study by Heyward et al. (^{13}) that evaluated an equation that included skinfolds in 77 nonobese and 71 obese women compared with underwater weighing. Consistent with previous studies (^{14,26}), we found that compared with non-Hispanic whites, equations that used skinfolds or BMI systematically overestimated PBF in non-Hispanic blacks and underestimated PBF in Mexican Americans. Because most of those equations underestimated average PBF in non-Hispanic whites (^{17,19}), those equations provided the most accurate estimation of average PBF in non-Hispanic blacks in the representative American population.

Previous investigators have reported a curvilinear relationship between BMI (^{7,17}), skinfold thickness (^{33}), WC (^{7}), and PBF. Studies have also reported interactions or effect modification by demographic variables of the relationship between anthropometrics and PBF (^{16,17,27}). The prediction of PBF from anthropometric measures might be improved by taking into account these complex relationships between anthropometrics and PBF. In our analysis, the equations by Gomez-Ambrosi et al. (^{10}) and Pasco et al. (^{27}) that included BMI^{2} and interaction terms between age, sex and BMI, and BMI^{2} explained 2%–8% more of the variance in PBF than those using BMI in its linear form only. Similarly, Freedman et al. (^{7}) compared equations with and without nonlinear terms against DXA in 1151 healthy men and women ages 18–110 yr. Their results indicated that adding a nonlinear term for BMI or WC increased the *R*^{2} from 0.79 to 0.83 for the equation using BMI and from 0.77 to 0.79 for that using WC. Inclusion of nonlinear and interaction terms would make equations more difficult for hand calculation, but with the wide availability of computational technology, this would pose an obstacle in only very select settings.

To our knowledge, this is the first study to evaluate the performance of equations that estimate PBF from anthropometric measures in a large-scale, nationally representative American sample, and the use of the NHANES data was a strength of this study. Nevertheless, it was a limitation that we used DXA rather than a four-component model as the criterion measure of body fat. Twenty-two of the 26 equations were developed using criterion methods other than DXA. A previous study indicated that the DXA device used in NHANES underestimates PBF compared with body density methods (^{31}). Therefore, the public release NHANES DXA data were adjusted based on the work by Schoeller et al. (^{31}). Nevertheless, it is still possible that using DXA to evaluate equations developed against other criterion methods may introduce bias (^{15}), possibly making equations that used DXA as the criterion method appear more favorable. In this study, we identified four equations that performed the best (i.e., the equations by Kagawa et al. [^{18}] and by Lean et al. [^{20}] that used WC/WHtR for males, and the equations by Gomez-Ambrosi et al. [^{10}] and by Rush et al. [^{30}] for females); however, only the equation by Kagawa et al. (^{18}) was developed against DXA. In addition, the other three equations developed using DXA (^{1,27,35}) did not perform well in this study. Further, DXA data for 17.8% of participants were imputed. In addition, equations including anthropometric variables that were not measured in the NHANES could not be evaluated. Equations for children were not examined here because of the many different issues that affect PBF in children compared with adults.

Although equations generally performed better in subgroups similar to the ones in which they were developed, this trend was not always seen and differences were not large for some equations. Performance of equations in specific demographic subgroups varied and appeared to be related to the types of anthropometric measures that were included. It is possible that the prediction of PBF could be improved by multiple anthropometric measures in a single equation and by inclusion of nonlinear terms and interactions. We believe more work needs to be done comparing differential error in informative subgroups in addition to the examination of overall *R*^{2}, RMSE, and MSD. In addition, the elucidation of the relative importance of the different metrics used to assess the validity of equations to predict PBF would be helpful to future investigators.

This study was not supported by external funding.

The authors have no conflicts of interest to declare.

The results of the present study do not constitute endorsement by the American College of Sports Medicine.