Resistance training has become more popular in recent years, especially for elderly subjects (5-8,17), children, and teenagers (2,3,14). This implies that strength has to be evaluated in populations that, in contrast with athlete populations, are not familiar with strength testing. One of the problems encountered in evaluating strength in these populations is the large variability of measurements. The variability observed in strength testing can originate from various sources such as psychological and environmental factors as well as methodologic influences. Variability in strength measurement can influence research results as well as exercise recommendations for specific resistance training interventions.
To improve the level of precision of strength measurement, some authors have suggested that the number of testing sessions should be increased (10,11). As many as 9 testing sessions have been proposed to achieve reliable and consistent results in older women (within 1 kg) (11). Increasing the number of testing sessions can help to reduce the variability of strength measurement that is associated with a possible learning effect. However, the feasibility of administering 9 testing sessions using several testing exercises in older individuals is questionable. The accumulated intensity and the time requirements (approximately 3 wk when tests are 48 hr apart) can be overwhelming for some individuals in addition to being costly, both in time and finances, in a research setting. It is also possible that increasing the number of testing sessions can induce a training effect, limiting the ability to obtain a real baseline measure of strength before an intervention. The first objective of this study was to determine whether a stable measure of strength, without any familiarization session, can be obtained within a maximum of 3 sessions in postmenopausal women who are overweight or obese.
Moreover, physical activity energy expenditure (PAEE) of an individual could influence the number of sessions required to achieve a stable measure of strength by impairing recovery. Yet, to our knowledge, a clear relation between PAEE and strength measurement has not yet been established (1,7,12,16). Thus, the second objective of this study was to evaluate the potential effect of usual PAEE on the variability of strength measurement. Finally, we investigated whether the selection of the best lift during the 3 tests in each exercise was different from the results achieved at the baseline testing session to evaluate the potential benefit of adding more sessions to strength measurement in postmenopausal women who are overweight or obese.
Experimental Approach to the Problem
To evaluate the impact of the number of testing sessions on the variability of strength measurements and the influence of PAEE, subjects were submitted to 3 similar exercise testing sessions every other day for 1 week. A reliable test can be repeated without any subsequent significant variation in results within a limited time frame. An unreliable 1 repetition maximum (1RM) test would be characterized by a significant variation from day to day. A significant positive increase in strength over successive tests could be caused by a learning effect that would not be necessarily related to a physiologic strength gain. This would imply that the use of a single strength test would not represent the actual strength of a subject. A significant decrease in strength could be associated with an improper amount of time to recover between tests and would require standardization for recovery periods between testing days.
Subjects were recruited by newspaper advertisement in the University of Montreal area (Montreal, Canada), and data were collected between 2003 and 2006. All subjects signed a consent form before initiation of the study, and the study was approved by the Research Ethics Board at the University of Montreal. Sedentary postmenopausal women who were overweight or obese were recruited for a 6-month weight loss program. Participants were randomly assigned to receive a caloric restriction diet intervention (n = 89) or a caloric restriction diet with resistance training intervention (n = 48). The inclusion and exclusion criteria for women in this study were published earlier (4). Women volunteers were included in the study if they a) were overweight or obese, with a BMI of 27 kg/m2 or greater, b) were postmenopausal, c) were nonsmokers, d) consumed less than 2 alcoholic drinks/day, and e) took part in less than 2 hours per week of structured exercise. The exclusion criteria were a) diabetes (fasting glucose >7.1 mmol/L or 2-hr plasma glucose of >11.1 mmol/L after a 75 g oral glucose tolerance test), b) untreated hypothyroidism, c) chronic liver or renal disease, d) asthma requiring therapy, e) known cardiovascular or peripheral vascular disease, f) previous 3-month use of drugs affecting metabolism, including hormone replacement therapy, g) dyslipidemia or hypertension requiring immediate medical intervention, h) history of alcohol or drug abuse, i) abnormal creatinine or hematocrit values, j) orthopedic limitation, k) body weight fluctuation of 2 kg in the last 3 months, and l) known history of inflammatory disease or cancer.
The results presented in this article represent post hoc analysis of baseline characteristics of a subgroup of 30 women from the 89 women recruited for the hypocaloric diet. The selection criteria for the 30 women included in this strength substudy were as follows: a) completion of all 3 testing sessions and b) completion of all tests on each apparatus. All measurements were completed before the beginning of the larger weight loss study. Other reasons for exclusion were range of motion under 85% of the warm-up range of motion on any of the 3 exercises or missing data.
Body composition was measured by dual energy X-ray absorptiometry (version 6.10.019, General Electric Lunar Corporation, Madison, WI, USA). Subjects did not present any physical abnormalities that could prevent them from completing a maximal strength test. Subjects had no history of training for the last 2 months before testing and were not familiar with the use of resistance training equipment (Atlantis, Laval, Quebec, Canada). Resting metabolic rate (RMR) was measured continuously for 40 minutes by indirect calorimetry using the ventilated hood technique (Sensormedics, Delta Trac II, Datex-Ohmeda, Finland), and only the last 30 minutes were used as measurements for RMR. Total energy expenditure (TEE) was measured with the doubly labelled water (DLW) method using stable isotopes during a 10-day period before the strength testing sequence. A more thorough description of the procedure has been previously presented (15). PAEE was calculated as follows:
PAEE = (TEE × 0.9) − RMR (13).
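The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name and the input values are hypothetical, and the units (MJ/d) are assumed from the PAEE range reported in the results. The 0.9 factor is taken directly from the equation; it presumably removes the thermic effect of food, but that interpretation is an assumption.

```python
def physical_activity_energy_expenditure(tee_mj_per_day: float,
                                         rmr_mj_per_day: float) -> float:
    """Return PAEE = (TEE x 0.9) - RMR, with all terms in MJ/d."""
    if tee_mj_per_day <= 0 or rmr_mj_per_day <= 0:
        raise ValueError("TEE and RMR must be positive")
    # 0.9 is the factor given in the equation in the text.
    return tee_mj_per_day * 0.9 - rmr_mj_per_day


# Hypothetical subject (not study data): TEE = 10.0 MJ/d, RMR = 5.5 MJ/d.
paee = physical_activity_energy_expenditure(10.0, 5.5)
print(f"PAEE = {paee:.1f} MJ/d")
```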
Strength tests were conducted on 3 separate apparatus: a seated leg press (Atlantis, model C-203), a seated horizontal chest press (Atlantis, model P-140), and a lat pull down for which the bar was pulled in front of the head (Atlantis, model D-123). Subjects did not benefit from any familiarization period before testing nor were they familiar with the use of resistance training equipment (Atlantis, Laval, Quebec, Canada). The 3 exercises covered all major muscle groups and were thought to be representative of overall strength. Subjects were informed about the procedures to follow at the beginning of each testing session. After a warm-up session (12-15 min), which consisted of brisk walking on a treadmill, subjects were properly adjusted on the first testing exercise, the leg press. After completing a specific warm-up on the leg press with a weight that easily allowed 10 repetitions, the participant took a rest period of 3 to 4 minutes while standing or walking slowly. Range of motion for the specific warm-up set was recorded, and subjects were instructed to maintain the same range of motion for all subsequent trials. A maximum of 6 trials were allowed with a resting period of 3 to 4 minutes between each attempt. The load was increased gradually according to the ease of motion of the participant after each attempt that yielded more than 1 repetition. After completing the final leg press attempt and a short rest of approximately 2 minutes, subjects moved on to the chest press and followed the same procedures before moving on to the lat pull down exercise. The weight lifted, the range of motion, as well as the number of attempts required to achieve the 1RM were recorded. The same procedure was conducted 48 and 96 hours after the first session. Subjects did not have any physical abnormalities that could have prevented them from completing the maximal strength test.
Maximal strength was defined as the highest weight lifted properly, expressed in kilograms. The criteria for the 1RM selection were a) 1RM repetition range of motion had to be at least 85% of the warm-up range of motion; b) the lowest number of repetitions if more than 1 repetition was completed on the last trial, and c) the highest weight properly lifted. Failure to achieve a suitable range of motion was considered as a failed attempt. If, after a maximum of 6 trials, more than 1 repetition was completed and the 1RM was not achieved, the Wathen equation (9) was used to extrapolate the predicted maximal weight for the exercise. Maximal work (Joules) was defined as the maximal weight lifted multiplied by the distance displaced in meters and by gravity (9.81 m/s2). The best result of the testing week for each exercise was selected according to the criteria for the 1RM selection. The best lift for each exercise for the whole week was considered as the best result for the week.
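As a rough illustration of these definitions, the sketch below computes maximal work, checks the 85% range-of-motion criterion, and extrapolates a 1RM from a multiple-repetition set. All function names and input values are hypothetical. The extrapolation uses the commonly cited form of the Wathen equation, which is not stated in the text and should be read as an assumption rather than the exact expression used in the study.

```python
import math

GRAVITY_M_S2 = 9.81  # gravitational acceleration used in the text


def maximal_work_joules(weight_kg: float, distance_m: float) -> float:
    """Work (J) = weight lifted (kg) x distance displaced (m) x gravity."""
    return weight_kg * distance_m * GRAVITY_M_S2


def meets_range_of_motion(trial_rom_m: float, warmup_rom_m: float,
                          threshold: float = 0.85) -> bool:
    """True if the trial covers at least 85% of the warm-up range of motion."""
    return trial_rom_m >= threshold * warmup_rom_m


def wathen_predicted_1rm(weight_kg: float, repetitions: int) -> float:
    """Commonly cited Wathen form (assumption, not given in the text):
    1RM = 100 * w / (48.8 + 53.8 * exp(-0.075 * reps))."""
    return 100.0 * weight_kg / (48.8 + 53.8 * math.exp(-0.075 * repetitions))


# Hypothetical leg press trial (not study data): 100 kg moved over 0.30 m.
print(f"Work: {maximal_work_joules(100.0, 0.30):.1f} J")
print("Valid ROM:", meets_range_of_motion(0.26, 0.30))
print(f"Predicted 1RM from 3 reps at 60 kg: {wathen_predicted_1rm(60.0, 3):.1f} kg")
```

A trial that fails the range-of-motion check would be treated as a failed attempt, in line with the criteria above.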
Data are presented as mean and SD. Repeated measures analyses of variance (ANOVA) were completed to examine a possible time effect in strength testing. Bonferroni post hoc analyses were used to distinguish potential differences between each of the testing sessions. Also, comparisons of the mean difference between the first and second sessions and between the second and third sessions were made using paired t-tests. Those comparisons were made to evaluate the presence of any potential systematic bias. To assess the level of precision of the strength measurement, we compared the random error values. Random error, or root mean square error, is the result of any factor that randomly affects measurement of the variable across the sample. A small random error is generally associated with a good level of precision with minimal “noise.” Paired-sample t-tests were used to compare the best results of the week with the results obtained at the first session. Intraclass correlation coefficients (ICC; two-way, random effect) were also used to assess the reliability of the testing sequence. To estimate effect size, we used eta-squared values to determine the clinical relevance of our findings. To evaluate the potential effect of PAEE on strength variation, repeated measures ANOVA was used with PAEE as a covariate. Finally, because 5 evaluators were assigned to the strength testing sequence, we examined whether there was an evaluator effect on the measure of strength by two-way repeated measures ANOVA with the evaluator as a covariate. All analyses were completed on SPSS for Windows (Version 13.0, SPSS, Chicago, IL, USA). Statistical significance was set at p < 0.05, with a power of 0.80.
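To make the time effect and its effect size concrete, the following sketch runs a one-way repeated measures ANOVA by hand and reports the F ratio and partial eta-squared for the session effect. It is a minimal illustration under stated assumptions: the data are invented, the function name is hypothetical, and it assumes the eta-squared values reported here correspond to the partial eta-squared that SPSS typically prints, which the text does not specify.

```python
def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA on a subjects x sessions table.

    scores: list of rows, one row per subject, one value per session.
    Returns the F ratio for the session (time) effect and partial eta-squared.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions
    grand_mean = sum(x for row in scores for x in row) / (n * k)
    session_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    # Partition the total sum of squares into time, subject, and error parts.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_time = n * sum((m - grand_mean) ** 2 for m in session_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_error = ss_total - ss_time - ss_subjects

    f_time = (ss_time / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))
    partial_eta_squared = ss_time / (ss_time + ss_error)
    return {"F": f_time, "partial_eta_squared": partial_eta_squared}


# Hypothetical 1RM values (kg) for 4 subjects over 3 sessions; not study data.
example = [
    [50.0, 52.0, 53.0],
    [60.0, 60.0, 62.0],
    [45.0, 47.0, 46.0],
    [70.0, 71.0, 73.0],
]
print(repeated_measures_anova(example))
```

A partial eta-squared of 0.20, for instance, would be read as 20% of the variance (excluding between-subject variance) being attributable to the session effect.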
Physical characteristics and energy expenditure results are presented in Table 1. Although our subjects were overweight or obese and had no more than 2 hours of reported planned physical activity per week, they still had a relatively high level of PAEE (1.5 to 6.6 MJ/d). Results of strength for the 3 exercises for the 3 testing sessions are presented in Table 2. Of note, there was no evaluator effect on the relation between strength and testing session (data not shown). The leg press did not show any significant difference in strength or work among the 3 testing sessions, although a significant linear progression was observed with the ANOVA (p = 0.02). No systematic bias was observed for strength or work in the leg press among the 3 testing sessions (Table 3). The number of attempts required to achieve the 1RM with the leg press was significantly different between sessions 2 and 1 (mean difference session 2 − 1: −0.7 attempt; SD: 1 attempt; 95% confidence intervals: −1.3 to −0.2 attempt) but reverted to a nonsignificant difference between the third and second sessions (mean difference session 3 − 2: 0.4 attempt; SD: 1.4 attempt; 95% confidence intervals: −0.2 to 1.1 attempt). The eta-squared for the leg press revealed that only 9% of the variance could be explained by the effect of time for strength and 4% for work (Table 4).
The chest press exercise showed similar results for strength and work on all 3 sessions and was the most stable exercise (mean difference session 2 − 1: 0.2 kg; SD: 5 kg; 95% confidence intervals: −1.4 to 1.9 kg; mean difference session 3 − 2: 0.3 kg; SD: 3 kg; 95% confidence intervals: −1 to 1.5 kg). A significant decrease in the number of attempts was observed between session 1 and the subsequent sessions (mean attempts difference session 2 − 1: −0.6 attempt; SD: 1; 95% confidence intervals: −1 to −0.2; mean attempts difference session 3 − 1: −0.7 attempt; SD: 1; 95% confidence intervals: −1 to −0.4). No significant improvement was observed for the number of attempts after the second testing session (mean attempts difference session 3 − 2: −0.1 attempt; SD: 1; 95% confidence intervals: −0.4 to 0.2). The eta-squared for the chest press showed that 1% of the variance between the tests could be explained by a time effect for strength and 3% for work (Table 4).
Results analyzed with the repeated measures ANOVA using the post hoc Bonferroni procedure revealed that the lat pull down exercise presented a significant difference (systematic bias) for strength between tests 1 and 3 (mean difference test 3 − 1: 2.2 kg; SD: 3 kg; 95% confidence intervals: 1.1 to 3.3 kg) but not for work (mean difference test 3 − 1: 19 J; SD: 64 J; 95% confidence intervals: −5 to 43 J). A significant linear increase was observed throughout sessions 1 to 3 (p < 0.01) for strength with an eta-squared of 0.20, suggesting that 20% of the variance was explained by the effect of time. Results for work differed from those observed for strength in that no effect of time was measured, and the eta-squared value suggested that only 8% of the variance was explained by the number of sessions (Table 4). No significant difference was found between tests 2 and 3. No difference for work was observed between any of the testing sessions (Table 3). No difference was observed for the number of attempts with the lat pull down exercise.
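The confidence intervals reported for the mean differences can be approximately reproduced from the summary statistics alone. The sketch below assumes that the reported SD is the SD of the paired differences, that n = 30, and that an unadjusted two-sided 95% interval based on the t distribution (critical value 2.045 for 29 degrees of freedom) was used; the function name is hypothetical, and rounded SDs or a Bonferroni adjustment would shift the result slightly.

```python
import math


def ci_from_summary(mean_diff: float, sd_diff: float, n: int,
                    t_critical: float) -> tuple:
    """Confidence interval for a mean paired difference from summary data."""
    standard_error = sd_diff / math.sqrt(n)
    margin = t_critical * standard_error
    return mean_diff - margin, mean_diff + margin


# Two-sided 95% critical t value for 29 degrees of freedom (n = 30).
T_CRIT_DF29 = 2.045

# Lat pull down strength, test 3 - test 1: mean 2.2 kg, SD 3 kg (from the text).
low, high = ci_from_summary(2.2, 3.0, 30, T_CRIT_DF29)
print(f"95% CI: {low:.1f} to {high:.1f} kg")
```

Under these assumptions the interval comes out at roughly 1.1 to 3.3 kg, in line with the reported values, and it excludes zero, which is consistent with the systematic bias described for this exercise.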
The random error for each exercise for the 3 sessions (Table 4) did not reveal any statistical difference between the 2 pairs of tests (test 2 - test 1 and test 3 - test 2). Albeit nonsignificant, a general tendency toward an improvement in precision can be observed in all exercises (i.e., reduction in random error values). The ICCs presented in Table 4 for all exercises revealed a high level of agreement between the testing sessions for both strength and work results.
Analyses with repeated measures ANOVA did not demonstrate any significant influence of PAEE on any of the 3 exercises in the testing sequence (Table 5). Finally, there was no significant improvement between the results for strength and work achieved on the first session and the best lift for the leg press and the chest press during the whole week of testing (Table 6). However, we found a significant improvement between the first session and the best attempt for work results in the lat pull down exercise (mean difference best lift - session 1: 25 J; SD: 57 J; 95% confidence intervals: −47 to −4 J).
The first objective of this study was to determine whether a stable measurement of strength using the 1RM method could be obtained on 3 separate exercises without any prior familiarization period. Ploutz-Snyder and Giamis (11) suggested that older women (n = 7, mean age: 66 yr; SD: 5 yr) require as many as 9 testing sessions before reaching a consistent value for maximal strength (within 1 kg). An arbitrary mean difference of 1 kg (smallest increment on their equipment) between consecutive trials was selected in their study as a stable measurement of strength on a small sample of older women (n = 6). This difference represents 1.3% of the mean strength results of the last attempt in the older subjects of their sample, which is a very strict level of precision. Phillips et al. (10) used a more thorough statistical procedure to assess potential bias in consecutive strength testing measurements, using random and systematic error change as criteria of precision and reliability on a larger sample of older women (n = 31, mean age: 75.2; SD: 17.3 yr). They concluded that, in older adults, 3 familiarization sessions followed by 2 to 3 testing sessions were sufficient to achieve highly reliable results (5-6% random error). They found that adding more familiarization sessions did not significantly improve the stability of the results, thus adding little clinical or research benefit. Our results question the need for a familiarization session before strength testing to improve the level of precision (reduction in random error) because we achieved a similar level of precision (3.5 and 6.7%) as Phillips et al. (10) on a relatively similar population of older women without any familiarization session. The ICC results were high for all exercises, but this might be in part caused by the relatively large variation between subjects. As mentioned by Phillips et al. (10), ICCs are subject to inflation when applied to a sample that has a high intersubject variation. The change in random error and the small eta-squared values observed are probably a better assessment of the reliability of the procedure in our study.
The measurement of work performed when testing for strength proves to be a valuable addition, allowing a quantifiable control of the range of motion and the weight lifted. The assessment of the work completed during a maximal lift proved to be useful to better understand the change in the load carried. This was especially true for the leg press exercise, which demonstrated an increase in the weight lifted of 6.9 kg between the first and the last session but presented a reduction in work of 12 J. The same analyses apply, but to a lesser extent, to the lat pull down exercise between the first and the last testing session. The measurement of work allows for a better overall assessment of the strength results that should be evaluated when measuring strength.
We found no significant change in leg press among all 3 tests, although there was a significant linear progression between the sessions in the strength results. This suggests that with additional sessions of testing, there may be a statistical difference in strength between the first and subsequent sessions. However, the small eta-squared (0.09) suggests that this effect would be small and possibly of little clinical value. Moreover, we did not find a similar increase with the work measurement because the increase in weight was accompanied by a change (reduction) in range of motion. The discrepancy between the linear increase observed in strength and the nonlinear increase in work precludes us from suggesting that increasing the number of sessions would provide better results for strength when range of motion is controlled or fixed. Also, in the context of strength testing, it is possible that more testing sessions will eventually provide a sufficient training effect to generate a strength gain adaptation. The relevant question at this point is whether it is possible to have an initial stable measurement of strength and work to have a valid baseline measurement. Systematic and random errors were constant during the testing week, revealing no significant improvement or deterioration in precision even without prior familiarization session using a relatively complex exercise (multijointed exercise and heavy weight involved). Taken together, these results suggest that the first testing session could be sufficient to provide a valid baseline measurement for strength and work with a precision ranging between 5.8% and 6.7% and that the addition of more sessions would provide little improvement for the leg press exercise in postmenopausal women who are overweight or obese.
The chest press was the most stable and reliable of the 3 exercises. Little variation was found for strength and work between the tests. The significant reduction in the number of attempts suggests an improvement in the precision of the 1RM procedure, allowing a quicker achievement of the maximal weight lifted. Variations in the random error were slightly better (3.5%-4.5%) than the results (6.8%) reported by Phillips et al. in their study (10) for a similar exercise yielding a high level of precision (1.5 kg). On the basis of these results, there is no indication of a learning effect in the chest press exercise that has influenced strength or work measurements.
The lat pull down exercise was the least reliable. The last testing session produced significantly higher strength values than the first. We also observed a systematic bias between the first and the second testing session. The presence of such a bias requires the addition of other sessions to abolish the difference between the means. Adding a third session appears sufficient to nullify the statistical difference between the means and to achieve a plateau in strength. Work measurement resulted in a similar trend, reinforcing the necessity of a third session to achieve more stable results. Unfortunately, neither Ploutz-Snyder and Giamis (11) nor Phillips et al. (10) reported testing results for a similar exercise, preventing us from making any comparison in systematic or random error. For this last exercise, the completion of more than 1 testing session would be necessary. A possible explanation might reside in the exercise itself. The range of motion was established by an imaginary line in front of the nose of the participant, whereas a physical limitation was present for the other 2 exercises. When we observed the work results, there was no linear progression between the tests, suggesting that a maximal workload was achieved and that subsequent increments in load would result in a decrease in range of motion. The significant bias in the lat pull down exercise also raises the question of whether there is a difference between exercises or muscle groups for the number of sessions required to achieve a stable measurement. Another question that could not be addressed in the present study is the testing sequence. It is possible that the order in which the muscle groups are tested and the presence of an overall fatigue might influence the variability and the precision if more than 1 muscle group is tested in the same session.
Measurements of TEE, RMR, and PAEE were gathered before the strength testing week and can be defined as the subjects' usual level of energy expenditure in free-living conditions. We hypothesized that PAEE would influence strength assessment in postmenopausal women. Of note, no measurement of energy expenditure was made during the strength testing week. However, all subjects were weight stable and were instructed not to change their daily food intake during the testing week. Because body weight did not fluctuate throughout the testing week, it is possible to infer that our subjects were in energy equilibrium and maintained a similar level of physical activity during the testing week. No influence of PAEE was observed on the testing sessions for any of the 3 exercises. This suggests that the usual level of physical activity of postmenopausal women who are overweight or obese is not sufficient to alter the recovery pattern or to generate strength gain during a week of testing. Taken together, our results indicate that PAEE did not influence the precision of the strength testing procedure in our sample of older women.
The selection of the best lift was completed according to the 1RM criteria previously described. We sought to determine whether this procedure improves the validity of the strength and work measurements by including only the attempt that showed the best range of motion, the lowest number of repetitions, and the highest lifted weight. The difference observed between the best lift and the first test was not larger than the respective random error for each exercise. The only statistical difference found was a small improvement in the work performed on the lat pull down that is of little clinical importance. The selection of the best lift for each exercise through the week did not improve significantly the results for strength from those achieved in the first session. This demonstrates no advantage of additional testing sessions to achieve a better, or more representative, strength result.
In conclusion, our results do not support the need for familiarization sessions to improve the stability of strength measurements in postmenopausal women who are overweight or obese. Furthermore, given that our subjects were able to achieve a stable measurement of strength with 1 (leg press, chest press) or 3 testing sessions (lat pull down), the addition of more than 3 testing sessions is not necessary to achieve an acceptable level of precision on similar exercises. This can reduce the volume of training and the time requirements to prepare older women for strength testing. Finally, PAEE did not play a significant role in the interindividual variations in strength and work measurements in our population. Further research should be conducted to evaluate any potential influence of the muscle group tested or the order of exercise when multiple devices are used in the same testing session.
In light of our results, we are confident that it is possible to achieve a reliable strength measurement in postmenopausal women who are overweight or obese with the use of only 3 testing sessions on selected apparatus (machine or mechanically guided devices). Moreover, the use of prior familiarization sessions does not appear as important as reported in other studies to reach a plateau in strength testing in this population. The use of a single testing session, as opposed to 3 testing sessions, can yield a strength difference of several kilograms (1-12 kg) according to the exercise selected for testing. Although the weight difference might appear high, most of this variation is related to the strength testing procedure (1RM testing) or the subjects themselves, as revealed by the small eta-squared results, and not the number of testing sessions by itself.
If strength testing is used to determine the weight required for training, it could be beneficial to complete more than a single testing session to slightly reduce the margin of error. Unfortunately, the strength test itself and the subject will provide a greater challenge in reaching any precise results. Proper use of an adequate strength testing protocol and adequate supervision can increase the precision of the intensity in resistance training, therefore reducing the risk of injury and increasing the potential benefits of exercise training.
The habitual level of physical activity does not appear to contribute to the fluctuation in strength measurements. Participants in strength testing should be encouraged to maintain their usual activity pattern because it should not interfere with the reliability of the strength results.
This work was supported by grants from the Canadian Institutes of Health Research (T 0602 145.02). May Faraj is a recipient of the CIHR New Investigator Award. The authors declare no conflicts of interest. The results of the present study do not constitute endorsement by the NSCA. The authors thank Diane Mignault and Lyne Messier for their contribution to this project and all the subjects who participated in this study.
1. Ades, PA, Savage, PD, Brochu, M, Tischler, MD, Lee, NM, and Poehlman, ET. Resistance training increases total daily energy expenditure in disabled older women with coronary heart disease. J Appl Physiol 98: 1280-1285, 2005.
2. Benson, AC, Torode, ME, and Fiatarone Singh, MA. A rationale and method for high-intensity progressive resistance training with children and adolescents. Contemp Clin Trials 28: 442-450, 2007.
3. Falk, B, Sadres, E, Constantini, N, Zigel, L, Lidor, R, and Eliakim, A. The association between adiposity and the response to resistance training among pre- and early-pubertal boys. J Pediatr Endocrinol Metab 15: 597-606, 2002.
4. Faraj, M, Messier, L, Bastard, JP, Tardif, A, Godbout, A, Prud'homme, D, and Rabasa-Lhoret, R. Apolipoprotein B: a predictor of inflammatory status in postmenopausal overweight and obese women. Diabetologia 49: 1637-1646, 2006.
5. Fatouros, IG, Tournis, S, Leontsini, D, Jamurtas, AZ, Sxina, M, Thomakos, P, Manousaki, M, Douroudos, I, Taxildaris, K, and Mitrakou, A. Leptin and adiponectin responses in overweight inactive elderly following resistance training and detraining are intensity related. J Clin Endocrinol Metab 90: 5970-5977, 2005.
6. Hartman, MJ, Fields, DA, Byrne, NM, and Hunter, GR. Resistance training improves metabolic economy during functional tasks in older adults. J Strength Cond Res 21: 91-95, 2007.
7. Hunter, GR, Wetzstein, CJ, Fields, DA, Brown, A, and Bamman, MM. Resistance training increases total energy expenditure and free-living physical activity in older adults. J Appl Physiol 89: 977-984, 2000.
8. Hurley, BF and Roth, SM. Strength training in the elderly: effects on risk factors for age-related diseases. Sports Med 30: 249-268, 2000.
9. Knutzen, KM, Brilla, L, and Caine, D. Validity of 1RM prediction equations for older adults. J Strength Cond Res 13: 242-246, 1999.
10. Phillips, WT, Batterham, AM, Valenzuela, JE, and Burkett, LN. Reliability of maximal strength testing in older adults. Arch Phys Med Rehabil 85: 329-334, 2004.
11. Ploutz-Snyder, LL and Giamis, EL. Orientation and familiarization to 1RM strength testing in old and young women. J Strength Cond Res 15: 519-523, 2001.
12. Poehlman, ET, Denino, WF, Beckett, T, Kinaman, KA, Dionne, IJ, Dvorak, R, and Ades, PA. Effects of endurance and resistance training on total daily energy expenditure in young women: a controlled randomized trial. J Clin Endocrinol Metab 87: 1004-1009, 2002.
13. Reed, GW and Hill, JO. Measuring the thermic effect of food. Am J Clin Nutr 63: 164-169, 1996.
14. Shaibi, GQ, Cruz, ML, Ball, GD, Weigensberg, MJ, Salem, GJ, Crespo, NC, and Goran, MI. Effects of resistance training on insulin sensitivity in overweight Latino adolescent males. Med Sci Sports Exerc 38: 1208-1215, 2006.
15. St-Onge, M, Mignault, D, Allison, DB, and Rabasa-Lhoret, R. Evaluation of a portable device to measure daily energy expenditure in free-living adults. Am J Clin Nutr 85: 742-749, 2007.
16. Treuth, MS, Hunter, GR, Pichon, C, Figueroa-Colon, R, and Goran, MI. Fitness and energy expenditure after strength training in obese prepubertal girls. Med Sci Sports Exerc 30: 1130-1136, 1998.
17. Wieser, M and Haber, P. The effects of systematic resistance training in the elderly. Int J Sports Med 28: 59-65, 2007.
Keywords: testing reliability; resistance training; learning effect; 1 repetition maximum; physical activity
© 2009 National Strength and Conditioning Association