# Validation of the Wilks powerlifting formula

VANDERBURGH, P. M. and A. M. BATTERHAM. Validation of the Wilks powerlifting formula. *Med. Sci. Sports Exerc.,* Vol. 31, No. 12, pp. 1869–1875, 1999.

Purpose Because maximal strength varies with body mass, the International Powerlifting Federation (IPF) has adopted a method of adjusting performances in the powerlifting events (bench press, BP; squat, SQ; deadlift, DL; and total lift, TOT, the sum of BP, SQ, and DL) for body mass. This method, the Wilks formula, multiplies a lifter's result by an index based on body mass so that lifters of different size can be compared on the same event. The Wilks formula is not, however, based on published data and has yet to be critically evaluated. The purpose of this investigation, therefore, was to validate the Wilks formula.

Methods This was done by 1) examining residual bias to verify that the adjusted score does, in fact, show no systematic bias with respect to body mass and 2) applying a more theoretically supportable allometric model to the same data and comparing its fit with that of the Wilks approach. Subjects were the current men’s and women’s world record holders as well as the top two performers for each event in the IPF’s 1996 and 1997 World Championships (a total of 30 men and 27 women for each lift).

Results Results of data analysis regarding the Wilks formula indicate that: 1) there is no bias for men’s or women’s BP and TOT; 2) there is a favorable bias toward intermediate weight class lifters in the women’s SQ, with no bias for men’s SQ; and 3) there is a linear unfavorable bias toward heavier men and women in the DL. Furthermore, the allometric approach indicated a bias against light and heavy men and women, which may be considered acceptable given that half as many lifters are found in the lightest and heaviest weight classes as in the intermediate weight classes.

Conclusion As used currently (BP and TOT only), the Wilks formula appears to be a valid method to adjust powerlifting scores by body mass.

Department of Health and Sport Science, University of Dayton, Dayton, OH 45469-1210; and School of Social Sciences, University of Teesside, Middlesbrough TS1 3BA, UNITED KINGDOM

Submitted for publication May 1998.

Accepted for publication January 1999.

Address for correspondence: Paul M. Vanderburgh, Ed. D., HSS Dept., University of Dayton, 300 College Park, Dayton, OH 45469-1210. E-mail: vanderbu@yar.udayton.edu.

Four events make up the sport of powerlifting: bench press (BP), squat (SQ), deadlift (DL), and the total of the three (TOT). As with any maximal-effort index of strength, performance in these events is related to body size; hence, different weight classes have been established for powerlifting competitions. Nearly all competitions, even at the world-class level, have nearly twice as many competitors per weight class in the intermediate weight classes as in the lightest and heaviest weight classes. This poses a dilemma for meet organizers: whether to award medals to the top performers in each weight class (even if there may be fewer than three per weight class), to have all competitors compete against one another with some adjustment for body mass, or to do both.

Several conventions can be used to adjust for body mass. The first is the simple ratio score: one’s lift divided by one’s body mass, with a higher score denoting greater achievement. This simple method yields a score with reasonable face validity; it is clearly an expression of one’s strength-to-weight ratio, a variable that has been shown to be meaningful in certain scenarios such as military field training or sports such as rock climbing or gymnastics (^{13}). Further, it can be combined with ratio scores from other maximal lifts to create an overall index of total-body muscular strength.

Ratio scaling has been criticized, however, for its failure to properly account for differences in body mass, and this argument has theoretical support (^{1}). Astrand and Rodahl (^{2}) detailed why muscular strength (related to cross-sectional area, a two-dimensional construct) is not directly proportional to body mass (a three-dimensional construct), a necessary condition for ratio scaling to be valid. Therefore, the proper theoretical relationship would be muscular strength ∝ body mass^{2/3}. Note that the ratio standard assumes a body mass exponent of one. In fact, empirical data support the theory that the body mass exponent for events of maximal strength is not one. Croucher (^{5}) showed that men’s world-record Olympic-style lifts (Olympic-style weightlifting entails different lifts than powerlifting) were related to body mass^{0.58}. Hui et al. (^{6}) found body mass exponents ranging from 0.73 to 0.87 for maximal isometric strength in 1098 adult men. Vanderburgh et al. (^{15}) reported that, for a maximal lift to a uniform height (152 cm) among 18-yr-old military service recruits, body mass exponents were 0.697 for 384 men and 0.726 for 321 women. Finally, Batterham and George (^{3}) recently found body mass exponents for men and women competitors at the 1996 (Olympic-style) World Weightlifting Championships ranging from 0.45 to 0.48 when the heaviest weight class was included. Omitting this weight class (the only one with no maximal allowable body mass, the implications of which are addressed later in this paper) resulted in exponents of 0.68 for men. In virtually all of these cases, “1” was not within the confidence interval of the exponent. These data indicate that a ratio score would tend to penalize heavier subjects: too great an adjustment is made for body mass, disproportionately “deflating” the scores of heavier subjects (^{14}).
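Written out, the dimensional argument attributed above to Astrand and Rodahl runs as follows, assuming geometrically similar bodies with a characteristic length $\ell$ (a symbol introduced here for illustration, not used in the original), with $S$ for maximal strength, $A$ for muscle cross-sectional area, and $V$ for body volume:

$$
S \propto A \propto \ell^{2}, \qquad M \propto V \propto \ell^{3} \;\Rightarrow\; \ell \propto M^{1/3} \;\Rightarrow\; S \propto \left(M^{1/3}\right)^{2} = M^{2/3}
$$

Under the same assumption a ratio score behaves as $S/M \propto M^{2/3}/M = M^{-1/3}$, which falls as body mass rises even for lifters of equal relative ability; this is the penalty on heavier subjects described above.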

Such exponents lend themselves to another type of body mass adjustment termed “allometric scaling.” Allometry is based on the relationship $Y = aX^{b}\varepsilon$, where Y is some outcome variable (e.g., strength or maximal oxygen uptake), X is a body size variable, ε is a multiplicative error term, and *a* and *b* are constants. This relationship is assumed to describe the best-fit curve between X and Y. First, it has theoretical merit because it forces a zero y-intercept, a plausible condition since one with zero body mass would be expected to have zero strength (^{14}). Second, because ε is multiplicative, the model accommodates residuals (the vertical distance of each point from the best-fit curve) that diverge as body size increases. This quality, termed heteroscedasticity, is likely more representative of true physiology than the constant variance across the range of body size assumed by simple linear regression (^{9}). Third, the allometric model accommodates any body mass exponent (including one, the value implied by the ratio method), so it is clearly more versatile than simple ratio scaling.
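The heteroscedasticity point can be illustrated with a small simulation. This is a sketch under stated assumptions: the constants a = 3 and b = 0.67, the error spread, and the 40 to 120 kg body-mass range are hypothetical values chosen for illustration, not data from this study, and the function names are illustrative. Raw residuals from the curve fan out as body mass increases, whereas residuals on the log scale keep roughly constant spread.

```python
import math
import random
import statistics


def simulate_allometric(n, a=3.0, b=0.67, sigma=0.1, m_lo=40.0, m_hi=120.0, seed=1):
    """Draw (M, Y) pairs from Y = a * M**b * eps, with ln(eps) ~ Normal(0, sigma).

    All default parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        m = rng.uniform(m_lo, m_hi)
        eps = math.exp(rng.gauss(0.0, sigma))  # multiplicative error term
        data.append((m, a * m ** b * eps))
    return data


def residual_spread(data, a=3.0, b=0.67, split=80.0):
    """Return {group: (SD of raw residuals, SD of log residuals)} for light/heavy."""
    groups = {"light": [], "heavy": []}
    for m, y in data:
        key = "light" if m < split else "heavy"
        raw_res = y - a * m ** b  # vertical distance from the curve
        log_res = math.log(y) - math.log(a) - b * math.log(m)  # equals ln(eps)
        groups[key].append((raw_res, log_res))
    return {
        key: (
            statistics.stdev([r for r, _ in pairs]),
            statistics.stdev([l for _, l in pairs]),
        )
        for key, pairs in groups.items()
    }


spread = residual_spread(simulate_allometric(2000))
# Raw-residual SD is larger for "heavy"; log-residual SDs are similar (about sigma).
print(spread)
```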

Allometric scaling is neither exempt from criticism nor free of limitations. While allometric exponents can be easily determined, their values clearly vary with the type of lift (^{3,6}), so the resulting indices cannot be combined (or added). This is simply because the indices of strength divided by body mass^{b} would be in different units (because of the different exponents). Furthermore, exponents developed for one sample can only be generalized to the specific population represented by that sample. In short, allometric models often yield many different exponents because of differences in lift type and in the populations tested.

A further criticism of allometric modeling applied to world-class weightlifting is that it may not properly adjust for the effects of body mass. Batterham and George’s (^{3}) analyses indicated that allometric modeling yielded performance indices that penalized weightlifters in the heaviest and lightest weight classes, for both men and women. This was based on analysis of the residuals (the actual lift minus the lift predicted by the best-fit allometric curve) plotted against body mass, which showed a clear pattern favoring intermediate weight class lifters. Consequently, that investigation favored a second-order polynomial model, which showed a superior statistical fit with no bias.

Other models to “handicap” lifters of different size have been employed, but only for the sport of weightlifting, the only type of strength lifting that is an Olympic sport. Weightlifting consists of only two lifts: the “snatch” (the maximum weight lifted from the ground to directly overhead in one movement) and the “clean and jerk” (the maximum weight lifted from the ground to shoulder level, then pressed overhead). As such, it incorporates more of a power component (^{8}) and is also likely to be more influenced by grip strength than powerlifting (the DL is the only powerlifting event that requires substantial grip strength). The formulas of O’Carroll et al. (^{10,11}) were based on pre-1972 Olympic weightlifting data and, as such, incorporated the “press” movement, which has since been eliminated from competition. Vorobyev’s formula (^{16}) adjusts for the fact that the lifter lifts not only the weight but also his/her own body weight. Neither of these two formulas has been rigorously evaluated for weightlifting performances at the elite level.

The International Powerlifting Federation (IPF) relies on its own method of adjusting for body mass: the Wilks formulas, one for each gender. Developed by Wilks (unpublished data), they are based on a 5^{th}-order polynomial reflecting the best-fit relationship between body mass (or weight class category, in kg) and “informed estimations” of what world-class lifters should be capable of lifting (personal communication, R. Wilks, 1997), derived from various IPF national and international men’s and women’s competition data from 1987 to 1994. Though not real data, these estimations apparently take into account Wilks’ concern about the undue influence of the small numbers of competitors in the extreme weight classes as well as drug-induced outliers. The formulas have been used to develop correction factors that are applied to all competitors’ lifts before comparisons are made between individuals of different size. For example, a man in the 65-kg class lifts 165 kg on the BP while another from the 100-kg class lifts 205 kg. The lighter lifter’s BP is multiplied by 0.7952 and the heavier lifter’s by 0.6086, yielding scores of 131.2 and 124.8 kg, respectively. Therefore, the 65-kg man is computed to have achieved a superior body-mass-adjusted BP.
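The correction-factor arithmetic in this example can be sketched in Python. The coefficient values are an assumption: they are the commonly published men’s coefficients for the original Wilks formula, which this paper does not list, although they do reproduce the 0.7952 and 0.6086 factors quoted above. The women’s formula uses a different coefficient set (not shown), and the function names are illustrative.

```python
# Men's Wilks polynomial coefficients, constant term first. ASSUMPTION: these are
# the widely circulated original Wilks values; this paper does not print them,
# but they reproduce the 0.7952 and 0.6086 factors in the worked example.
MEN_WILKS = (-216.0475144, 16.2606339, -0.002388645,
             -0.00113732, 7.01863e-06, -1.291e-08)


def wilks_coefficient(body_mass_kg, coeffs=MEN_WILKS):
    """Return 500 / P(M), where P is the fifth-order polynomial in body mass M."""
    denom = sum(c * body_mass_kg ** i for i, c in enumerate(coeffs))
    return 500.0 / denom


def wilks_score(lift_kg, body_mass_kg, coeffs=MEN_WILKS):
    """Body-mass-adjusted score: the lift multiplied by the Wilks coefficient."""
    return lift_kg * wilks_coefficient(body_mass_kg, coeffs)


# The worked BP example from the text: 65-kg and 100-kg men.
for mass, bench in [(65, 165), (100, 205)]:
    print(mass, round(wilks_coefficient(mass), 4), round(wilks_score(bench, mass), 1))
# prints:
# 65 0.7952 131.2
# 100 0.6086 124.8
```

Multiplying by 500 / P(M) is equivalent to dividing the lift by a scaled version of the fitted curve, so the adjustment is multiplicative rather than a subtraction of the curve’s value.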

Despite the fact that the IPF has adopted the Wilks formula for all major competitions (to determine the “Champion of Champions,” or the body-mass-adjusted best lifter of the event or meet), the convention has never been critically evaluated in a peer-reviewed process. Furthermore, the polynomial nature of the formulas has not been scrutinized with respect to theoretical soundness. The purpose of this paper, then, is two-fold: to examine residual bias to verify that the adjusted Wilks score does, in fact, show no systematic bias with respect to body mass, and to apply a more theoretically supportable allometric model to the same data and compare its residual bias with that of the Wilks approach.

### Subjects.

The sample comprised three subjects per weight class for each event and gender: the IPF’s official men’s and women’s world record holders as of March 1998 and the top two performers in the 1996 and 1997 IPF World Championships (30 men and 27 women for each event). Because the subjects were at or near the physiological extreme (i.e., elite), the sample size for each gender is arguably sufficient to describe the related functions (^{3}). Because the present analysis was not designed to examine group differences, however, a power analysis was not indicated. Within each weight class, one stipulation was made to preserve independence of data points (a necessary condition for regression analysis, upon which allometric modeling is based): the top three performances had to be from different subjects. All data were obtained from the IPF’s official Internet site (http://www.ipf.com); because these were publicly available competition records, subject informed consent was not obtained. The highest weight class lifters were not included in the data analysis, as theirs is the only weight class with no upper limit. Any modeling that included these subjects would likely introduce the confounding influence of fat mass (as lifters in the heaviest class are not constrained by a maximal allowable body mass). This approach has been substantiated elsewhere (^{3}). One important eligibility criterion for these subjects is drug-free status, which must be ascertained before either an official world record or a World Championships award is accepted. As done by Croucher (^{5}), all body mass values for world record holders were assumed to be the maximal allowable, primarily because subjects’ exact weights were not recorded. This appeared to be a prudent measure based on visual analysis of Batterham and George’s (^{3}) weightlifting world championships data, which show that the top three finishers in each weight category were at or very near the maximal allowable weight (also considered the actual weight category). World Championships data, however, included subjects’ exact weights at the time of competition; these were therefore used as the body mass values for the two non-world-record performances for each event in each weight class.

### Procedures.

All subjects complied with the detailed procedures for each lift regarding movement, sequence, dress, and body mass allowances. These details can be viewed on the aforementioned official IPF Internet site.

### Data analysis.

To examine the first research question regarding bias, scatterplots were developed for the following relationships for each lift (L): L versus body mass and Wilks-adjusted score versus body mass. Each scatterplot included both genders, identified separately. Following the recommendations of Batterham and George (^{3}), visual examination of the scatterplots for trends was preferred over computing correlations, primarily because of the likelihood of curvilinear relationships. For the second research question, regarding the theoretical soundness of the Wilks formula compared with that of allometric models, the best-fit allometric relationships were developed for each lift by gender. The resulting curves were superimposed over the L versus body mass scatterplots to examine the possibility of bias. For each lift, exponent differences between genders were checked using the diagnostics recommended by Vanderburgh (^{12}) and Batterham et al. (^{4}). This technique compares men’s with women’s exponents and, if they are not significantly different, computes a single common exponent. Differences in common exponents were then evaluated between lifts and compared with those expected by theory.
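As a numeric companion to the visual inspection described above (a hypothetical aid, not the authors’ procedure), one can average the adjusted scores within each weight class and express each average as a deviation from the overall mean. Grouping by class, rather than fitting a single straight line, lets a curvilinear pattern such as a mid-class advantage show up. The function name and the data below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def class_deviations(records):
    """Mean adjusted score per weight class, minus the overall mean.

    records: iterable of (weight_class_kg, adjusted_score) pairs.
    With no body-mass bias the deviations scatter around zero with no trend;
    a run of negative values at the heavy end would suggest a penalty there.
    """
    records = list(records)
    overall = mean(score for _, score in records)
    by_class = defaultdict(list)
    for weight_class, score in records:
        by_class[weight_class].append(score)
    return {wc: mean(scores) - overall for wc, scores in sorted(by_class.items())}


# Hypothetical Wilks-adjusted scores tagged with weight class (kg); invented data.
records = [(60, 400.0), (60, 410.0), (75, 420.0), (75, 415.0), (90, 380.0), (90, 385.0)]
for wc, dev in class_deviations(records).items():
    print(f"{wc:>3} kg: {dev:+.1f}")
# prints:
#  60 kg: +3.3
#  75 kg: +15.8
#  90 kg: -19.2
```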

The allometric modeling was based on the following relationship (in this example, using BP and M for body mass):

$$BP = a\,M^{b}\,\varepsilon$$

where *a* and *b* are constants and ε is the multiplicative error term. Log-transforming this relationship creates the equation of a straight line, the parameters of which can be solved for via multiple regression (^{4,8,15}):

$$\ln(BP) = \ln(a) + b\,\ln(M) + \ln(\varepsilon)$$
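The log-linear fit can be sketched for a single group with ordinary least squares on the log-transformed data. This is a minimal sketch, not the authors’ software: the function names are hypothetical, and it takes “SEE” in the Results to mean the standard error of the exponent, which is an assumption. The interval uses the ±1.96 multiplier quoted in the Results; with samples this small a t-distribution critical value would be somewhat wider.

```python
import math


def fit_allometric(masses, lifts):
    """Least-squares fit of ln(L) = ln(a) + b*ln(M); returns (a, b, se_b).

    se_b is the standard error of the slope b. Needs at least three points.
    """
    x = [math.log(m) for m in masses]
    y = [math.log(v) for v in lifts]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    ln_a = y_bar - b * x_bar
    rss = sum((yi - ln_a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return math.exp(ln_a), b, se_b


def exponent_ci95(b, se_b):
    """Approximate 95% CI for the exponent: b +/- 1.96 * SE."""
    return b - 1.96 * se_b, b + 1.96 * se_b


# Hypothetical, noise-free data generated from L = 3 * M**0.67.
masses = [52.0, 56.0, 60.0, 67.5, 75.0, 82.5, 90.0, 100.0, 110.0]
lifts = [3.0 * m ** 0.67 for m in masses]
a, b, se_b = fit_allometric(masses, lifts)
print(round(a, 3), round(b, 3))  # 3.0 0.67
```

Fitting on the log scale is what lets the multiplicative error term become an ordinary additive residual, so standard least squares applies.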

## RESULTS

Figures 1–4 depict scatter plots of lift versus body mass by gender, as well as of the Wilks-adjusted score versus body mass. Clearly, the former relationships are curvilinear, with a shape similar to those reported by Batterham and George (^{3}) for Olympic-style weightlifting. These figures, which also show the best-fit allometric curves superimposed, indicate two primary findings. First, the Wilks residuals are apparently randomly distributed (showing no trend for bias) in the BP (Fig. 2) and TOT (Fig. 4) for both men and women. In the DL (Fig. 3), a notable trend to penalize heavier lifters regardless of gender is apparent. For the SQ (Fig. 1), no bias is evident for men, but a favorable bias toward women in the intermediate weight classes is seen. Second, the allometric curves, which fit the data accurately as judged by the correlation between actual and predicted scores (visible as the vertical distance of points from the curve), display a tendency toward negative bias for lifters in the lightest and heaviest weight classes. This finding is also similar to that reported by Batterham and George (^{3}) for Olympic-style weightlifting.

For each lift, men’s and women’s exponents were not significantly different, although separate exponents are shown in Table 1. Incidentally, these exponents and their associated SEE make clear that an “isometric scaling” approach (i.e., scaling by body mass raised to the first power) is not appropriate, because the 95% confidence interval of the exponent (exponent ± 1.96 × SEE) does not contain “1” in any of the four events. Using the technique of Vanderburgh (^{12}), the DL common exponent of 0.51 was significantly different from the other three common exponents. The common exponents for SQ, BP, and TOT were not significantly different from each other, and the single best common exponent for these three lifts for both genders was 0.670, a surprisingly close match to the 2/3 predicted by theory.
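The compare-then-pool logic can be sketched as follows. This is a simplified stand-in, not necessarily the diagnostics of Vanderburgh (^{12}) or Batterham et al. (^{4}): it compares the two gender-specific exponents with a normal-approximation z statistic and, if they do not differ, pools them as the shared slope of a log-log model with a separate intercept (ln *a*) for each gender. An interaction term in a multiple regression would be the more usual formal test. Function names and data are hypothetical.

```python
import math


def log_slope(masses, lifts):
    """OLS exponent b from ln(L) on ln(M), with its standard error, Sxx and Sxy."""
    x = [math.log(m) for m in masses]
    y = [math.log(v) for v in lifts]
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    b = sxy / sxx
    rss = sum(((yi - yb) - b * (xi - xb)) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(rss / (n - 2) / sxx), sxx, sxy


def common_exponent(men, women, z_crit=1.96):
    """Compare two genders' exponents; pool them if not significantly different.

    men, women: (masses, lifts) tuples. Returns (z, pooled_b), where pooled_b is
    None when |z| >= z_crit. The pooled exponent is the shared slope of a model
    with a separate intercept (ln a) for each gender.
    """
    b_m, se_m, sxx_m, sxy_m = log_slope(*men)
    b_w, se_w, sxx_w, sxy_w = log_slope(*women)
    z = (b_m - b_w) / math.sqrt(se_m ** 2 + se_w ** 2)
    if abs(z) >= z_crit:
        return z, None
    return z, (sxy_m + sxy_w) / (sxx_m + sxx_w)


masses = [60.0, 67.5, 75.0, 82.5, 90.0, 100.0, 110.0]
noise = [1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00]  # hypothetical scatter
men_lifts = [4.0 * m ** 0.67 * e for m, e in zip(masses, noise)]
women_lifts = [0.6 * v for v in men_lifts]  # same slope, lower intercept
z, pooled = common_exponent((masses, men_lifts), (masses, women_lifts))
print(f"z = {z:.3f}, pooled exponent = {pooled:.3f}")
```

The pooled exponent is a weighted average of the two slopes (weights Sxx), so it always lies between the men’s and women’s estimates.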

The close agreement of this exponent with that predicted by the aforementioned dimensional analysis is perhaps more fortuitous than it might appear, given the range of exponents by gender and lift. Furthermore, applying the 0.67 body mass exponent to each of these three lifts to create a new index, $L \cdot M^{-0.67}$, and plotting its value as a function of body mass also revealed a pattern of bias favoring the intermediate weight classes (Fig. 5).

## DISCUSSION

The IPF has been using the Wilks formula since January of 1997 for all competitions to select the winner of the “Champion of Champions” award, probably the most prestigious award given at a powerlifting competition. As stated previously, this award is given to the overall strongest lifter when taking into account body mass differences. Presently, this award is given only in the TOT and BP events (the latter only at IPF Bench Press competitions). Nonetheless, according to the official IPF Internet site, the Wilks formula may be applied to any of the four events for either gender.

The prestige of the “Champion of Champions” award underscores the importance of selecting the body mass adjustment technique. Body mass adjustment, although sometimes a complex statistical exercise, can be understood simply as developing a curve that is superimposed over the L versus M data points. The competitor with the largest vertical distance above the curve is the winner of the “Champion of Champions” award. Obviously, then, the shape of the curve is critical. A statistician can superimpose virtually any shape over a set of data to reward, fairly or unfairly, any competitor. This necessitates some theoretical reason as to why the curve is shaped as it is.
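Because a multiplicative adjustment ranks lifters by how far they sit above the chosen curve in proportional terms (lift divided by the curve’s value at their body mass), changing the curve’s shape can change the winner. The sketch below uses two hypothetical lifters and power-law curves of the form M^b: a ratio curve (b = 1) and a 0.67-exponent allometric curve select different champions. The function name and data are illustrative only; any constant scale factor on the curve cancels out of the ranking.

```python
def champion(lifters, exponent):
    """Name of the lifter with the highest lift / M**exponent score.

    lifters: list of (name, body_mass_kg, lift_kg). exponent=1.0 gives the
    simple ratio score; exponent=0.67 gives the allometric index.
    """
    return max(lifters, key=lambda t: t[2] / t[1] ** exponent)[0]


lifters = [("lighter", 60.0, 150.0), ("heavier", 110.0, 250.0)]  # hypothetical
print(champion(lifters, 1.0))   # lighter
print(champion(lifters, 0.67))  # heavier
```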

The shape of the Wilks “curve,” although not based on real data, does appear to be sound from at least one perspective: it adjusts clearly biased data (i.e., the L vs M raw data) to a set of randomly distributed scores that show no bias for three of the four lifts (SQ, BP, and TOT but not DL) in men and two of the four (BP and TOT but not SQ or DL) in women. The bias evident in the women’s SQ is similar to that found with the present allometric approach and to that reported by Batterham and George (^{3}). The cause of this particular bias is not known but may be related to three possible sources: 1) differences in muscle distribution in women and their relationship with the SQ event, 2) illegal but nondetectable drug use (e.g., human growth hormone) that is more prevalent in one gender than in the other, and 3) an artifact resulting from women’s powerlifting being much more recent in its inception than men’s.

Although the DL scores, when adjusted by the Wilks index, show a pattern of negative bias toward heavier men and women lifters, this apparently has no immediate consequence to IPF competitions which presently do not use the index to adjust DL scores. Nevertheless, the present data indicate that use of the Wilks index to adjust for DL scores does not eliminate body mass bias.

Close examination of the DL does reveal, however, some possible reasons why it does not conform to the same adjustment formula as the other lifts. First, the DL is the only one of the three lifts that relies heavily on grip strength for success. The possibility exists, then, that grip strength may be one of the limiting factors for DL success, especially in the higher weight classes. If this were the case, then the Wilks formula’s ability to partition out the independent effect of body mass in the DL might indeed be influenced by grip strength. This contention is supported by Batterham and George’s (^{3}) weightlifting data (heaviest weight class included), which indicated a gender-common exponent of 0.47. This type of weightlifting (the sum of the clean and jerk and the snatch lifts) is also clearly dependent to some extent on grip strength.

Grip strength has also been reported to be proportional to body mass raised to the 0.51 power for men and women (^{14}), although the subjects upon which those findings were based were college-age active men and women, not world-class powerlifters. Nevertheless, this exponent is precisely the same as the DL common exponent in the present data. The congruence of the present data with other findings (^{3,14}) supports the contention that, for grip-strength-dependent lifts, the body mass exponent may be substantially smaller than for other lifts. Even for nonallometric models, this indicates that the relationship between body mass and weight lifted plateaus more markedly for such lifts. This finding has important implications for body mass adjustment in world-class strength competitions.

Interestingly, the Wilks-adjusted TOT score for women does not indicate body mass bias despite the finding that heavier women lifters are penalized in both the SQ and DL (two of the three events that make up the TOT). Importantly, though, the world’s best women SQ and DL lifters are often not among the best TOT lifters. Therefore, the top TOT lifts often do not include the best individual SQ, BP, or DL lifts. Accordingly, the most elite women TOT lifters may exhibit a body mass-to-strength relationship different from that of the TOT lifts of women who excel in only one lift.

Examination of the allometric models reveals a rather accurate overall fit (all correlations between actual and predicted values exceed 0.93), yet with a systematic trend to penalize those in the lightest and heaviest weight categories, similar to that reported by Batterham and George (^{3}). This is evident in that, for each of the four powerlifting events, the data points for the intermediate weight class competitors (men or women) generally lie above the best-fit allometric curve and those in the extreme weight classes lie below it. This phenomenon conflicts with the theoretical soundness of the allometric model (^{3,5,12}) but may be explained as follows. Cursory examination of all IPF contests listed on its official Internet site (over 20, covering events from 1995 to the present) reveals that the number of contestants in each of the two lightest and two heaviest weight classes is approximately one-half the number per weight class in the remaining divisions. Stated differently, the intermediate weight class lifters face more competition, and the top lifters in these classes are perhaps more likely to achieve a higher body-mass-adjusted level than those at the extremes. Even in the general population, where body mass is roughly normally distributed, there would clearly be more individuals in the intermediate than in the extreme weight classes. One could argue, then, that achievement in the intermediate classes should be higher, as reflected by their data points lying above the allometric curve. In turn, the Wilks formula may provide too much “reward” to lifters in the lightest and heaviest weight classes (the polynomial curve can “bend down” at either extreme to accommodate their proportionally lower values). In short, the allometric model may fail to statistically adjust for the independent effects of body mass, but its use may still be theoretically more defensible than the Wilks formula.

The choice to adopt the allometric model for body mass adjustment, however, is not without practical and even theoretical shortcomings. For example, one could adopt the 0.67 body mass exponent for both events that the IPF adjusts by body mass, i.e., BP and TOT. This requires simply dividing one’s lift by $M^{0.67}$ and comparing the resulting scores (high score wins). This convention has two problems. First, as shown in Figure 5, within the elite population of powerlifters, the intermediate weight class lifters will tend to win nearly all the “Champion of Champions” awards, as their data points consistently have the highest $L \cdot M^{-0.67}$ scores. If one accepts the premise that those in these weight classes are competing against a larger population (perhaps twice as large as that in the lightest and heaviest weight classes), then this may be appropriate. Second, dividing by one’s body mass raised to a fractional power requires a calculator; computation by hand is nearly impossible. The Wilks index, which simply multiplies one’s lift by a factor from a table, can be applied by hand and is simpler.

In conclusion, the present data suggest that the Wilks formula does remove body mass bias in the men’s and women’s BP and TOT, although a systematic bias is seen in the women’s SQ and the men’s and women’s DL. Given that the IPF uses the Wilks formula only for the BP and TOT, its use as a body mass adjustment index appears validated. The allometric models, whether individual “best exponents” or the one common exponent for the BP, SQ, and TOT, present a favorable bias toward those in the intermediate weight classes, although this has some theoretical defense. Because the DL body mass exponent was significantly smaller, the common exponent of 0.67 could be applied only to the BP, SQ, and TOT. We conclude that the Wilks formula, as presently used for men and women, is an appropriate “handicapping” technique to select the “Champion of Champions” in IPF events. Furthermore, allometric modeling of elite powerlifting can also be used to adjust men’s and women’s scores by body mass but should only be done with a clear understanding of its potential biases.

## REFERENCES

**Keywords:**

POWERLIFTING; ALLOMETRIC SCALING; POLYNOMIALS