Original Research

Reliability of Alternative Trunk Endurance Testing Procedures Using Clinician Stabilization Vs. Traditional Methods

Reiman, Michael P1,2; Krier, Amber D1; Nelson, Julie A1; Rogers, Michael A1; Stuke, Zachariah O1; Smith, Barbara S1

Journal of Strength and Conditioning Research: March 2010 - Volume 24 - Issue 3 - p 730-736
doi: 10.1519/JSC.0b013e3181c06c56



The core is often defined as the lumbo-pelvic-hip complex. This complex is required during functional activities, specifically to perform acceleration, deceleration, and stabilization maneuvers (9). Although core/trunk stabilization is important in many respects, it remains difficult to quantify because of its complex functional requirements.

New testing that would more accurately assess trunk stability has been recommended (21). Expensive isokinetic testing has been used to assess strength and work of the trunk (11,18). Less expensive testing in the form of isometric trunk endurance testing (22,24) has been more popular and practical. Endurance testing has also been especially relevant because reduced extensor endurance was found in workers who reported low back troubles (1). Some researchers have suggested that although isometric strength was not associated with the onset of back troubles, poor static trunk endurance scores were (4,22). Potentially more important is the concept of trunk muscle endurance imbalances. McGill et al. (27) have suggested that a history of low back troubles seems to be associated with an imbalanced flexion to extension endurance ratio, with the extensors having less endurance than the flexors. This lends credibility to the idea that neuromuscular imbalance of trunk endurance is a major component to examine when assessing trunk stability.
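The flexion to extension endurance ratio described above is simply the quotient of the two hold times. The following is a minimal Python sketch; the function name and the example hold times are hypothetical, and no cutoff for an "imbalanced" ratio is implied because none is stated here.

```python
def flexion_extension_ratio(flexion_s: float, extension_s: float) -> float:
    """Return the flexion hold time divided by the extension hold time (seconds)."""
    if flexion_s < 0 or extension_s <= 0:
        raise ValueError("flexion time must be >= 0 and extension time must be > 0")
    return flexion_s / extension_s


# Hypothetical hold times for one subject (not data from this study).
ratio = flexion_extension_ratio(flexion_s=120.0, extension_s=150.0)
print(f"flexion/extension ratio = {ratio:.2f}")
```

A ratio above 1 would indicate that the extensors held for less time than the flexors, which is the direction of imbalance described by McGill et al. (27).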

Recent arguments, though, have favored field tests that measure the strength or power component of trunk stability because they may be more useful (20) and may better mimic the demands imposed by sport (8). Although strength and power are more likely representative of explosive athletic demands, strength seems to have little, or only a very weak, relationship with low back health (25). Trunk endurance testing remains warranted for several reasons, not the least of which is demonstrated isolated multifidus atrophy. The multifidus is composed predominantly of type I muscle fibers (23,37) and appears to take on more anaerobic characteristics as a result of deconditioning (25,27), thereby limiting its ability to function aerobically. Isolated muscle atrophy (15) with fatty infiltration in the multifidus at the same level and side (19) has also been demonstrated in subjects with low back pain (LBP), suggesting further dysfunction of endurance capacity. Potentially even more important is the concept of trunk endurance balance, given the recent finding of a direct relationship between LBP and neuromuscular imbalance of trunk muscles, whereas maximum isometric strength had no relationship to the presence of LBP in athletes (33). Field tests measuring trunk muscle endurance ratios would therefore seem to be an important component of trunk stability testing and a necessary assessment not only for those with a history of LBP but also for the athletic population with or without LBP.

Field testing makes trunk endurance assessment more widely applicable. Although testing trunk endurance appears necessary from both a rehabilitation and a screening/prevention standpoint, a lack of sufficient equipment may limit its implementation. Until now, appropriate field tests for trunk endurance have required suitable tables and multiple straps. These requirements may hinder the use of trunk endurance field testing to determine normative values in various populations and in multiple types of settings.

Normative values have been developed for these trunk endurance assessments among college-aged students with no history of LBP (26) and college-aged male rowers (16) using the same endurance tests for trunk flexion and extension. These testing methods have demonstrated excellent reliability (26) and have been shown to differentiate between workers without LBP and those with back disorders (23). Therefore, the purpose of this research was to compare the reliability of a modified trunk flexion and extension testing setup with that of previously established testing procedures (26) and to determine the plausibility of these techniques, initially for the normal population, but with the potential for use as a screening mechanism for injury prevention.


Experimental Approach to the Problem

Trunk endurance was assessed using previously established methods of testing (ST) and a modified procedure (MOD), with the 2 testing sessions separated by 1 week. The subjects were randomly assigned the order of testing. Before all testing, subjects were encouraged to maintain the static position for as long as possible. No verbal encouragement was given during the actual testing.


Fifty subjects (34 women and 16 men) aged 22 to 38 years, with no history of LBP within the past 6 months and no history of lumbar surgery, participated. Written information and oral instructions were given before each test, and each subject gave written and oral consent to participate. The subjects were a sample of convenience from physical therapy students who had not previously performed, and were not aware of, the testing procedures. Descriptive information of the subjects is shown in Table 1. All subjects were physically active, participating in aerobic exercise, strength training, or both at least once and at most 4 times per week. Subjects were informed of the experimental risks and signed an informed consent form before the investigation. Approval for the investigation was provided by the Wichita State University Institutional Review Board for the Protection of Human Subjects.

Table 1:
Mean (±SD) and minimum-maximum values for age, height, and body weight of subjects (n = 50).


Each subject was tested with either the standard flexion and extension method (ST) or the modified testing procedure (MOD). Each subject was then retested 1 week later with the method not used in the previous week. The order of the testing methods was randomly determined by having the subject draw a concealed number. The order of flexion and extension testing was then also randomly assigned by the drawing of another concealed number. Each subject was allowed to rest 5 minutes between flexion and extension testing at both testing sessions.
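The two-step random assignment described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a pseudo-random generator stands in for the drawing of concealed numbers, the function and field names are hypothetical, and the sketch assumes the same flexion/extension order is used at both sessions, which is not specified here.

```python
import random

METHODS = ("ST", "MOD")
DIRECTIONS = ("flexion", "extension")


def assign_orders(subject_ids, seed=None):
    """Randomly assign method order and flexion/extension order per subject."""
    rng = random.Random(seed)  # stands in for drawing concealed numbers
    schedule = {}
    for sid in subject_ids:
        # First draw: which method is used in week 1; the other is used in week 2.
        first_method = rng.choice(METHODS)
        second_method = "MOD" if first_method == "ST" else "ST"
        # Second draw: which direction is tested first within a session.
        first_direction = rng.choice(DIRECTIONS)
        second_direction = "extension" if first_direction == "flexion" else "flexion"
        schedule[sid] = {
            "week1_method": first_method,
            "week2_method": second_method,
            # Assumption: the same direction order applies to both sessions.
            "direction_order": (first_direction, second_direction),
            "rest_between_tests_min": 5,
        }
    return schedule


schedule = assign_orders(range(1, 51), seed=0)
print(schedule[1])
```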

ST trunk endurance testing was performed for trunk flexion and extension according to previously published methods (26). The trunk extensor endurance test as performed by McGill et al. (26) was a modification of the original version (4). This original testing method is a reliable measure of back extensor endurance (1). The subjects lay prone with the lower body fixed to the testing surface/plinth via straps at the ankles, knees, and hips. The upper body was off the testing surface. Before testing, the subjects supported their upper body, which was cantilevered off the end of the table, by pushing with their extended arms on a chair directly below them. At the beginning of the exertion, the upper limbs were lifted off the chair and crossed over the chest with the hands resting on the opposite shoulders in a manner comfortable to the subject. Before and at the start of the testing procedure, subjects were instructed to maintain the horizontal position for as long as possible. The extension endurance time was manually timed from the moment the upper limbs were lifted off the chair and crossed over the chest, as described above, until the subject broke the horizontal plane.

ST trunk flexion endurance testing was performed in supine, with the subject's hips and knees placed at a 90° angle. The trunk was inclined at an angle of 60° by means of a prefabricated wedge. The subjects' feet were stabilized with a belt around the table and over the dorsum of the distal foot. Subjects were instructed to cross their arms across their chest and place their hands on opposite shoulders, again in a manner comfortable to the subject. The prefabricated wedge was moved back 10 cm to begin the test. Before and at the very start of the test, subjects were instructed to maintain their body position at 60° for as long as possible. The test was manually timed from the moment the prefabricated wedge was moved back 10 cm until the subject broke the 60° angle test position.

The MOD testing procedures included using a clinician to hold the subject's lower extremities down rather than using straps during the flexion and extension endurance tests. During extension testing, the subjects were positioned exactly as with the ST technique, except a clinician lay across the back of the subject's lower extremities so that the middle of the clinician's trunk was over the middle of the subject's knees (see Figure 1). No subjects complained of pain or discomfort while having the clinician lie over their lower extremities. The extension endurance time was manually timed from the moment the upper limbs were lifted off the chair and crossed over the chest as described for ST extension testing until the subject broke the horizontal plane. With flexion testing, the clinician sat over the subject's feet (with shoes on) where the strap would have been placed (see Figure 2). Again, no subjects complained of pain or discomfort while having their lower extremities stabilized in this manner. The test was manually timed from the moment the prefabricated wedge was moved back 10 cm until the subject broke the 60° angle test position. Before testing, it was determined that the clinician providing stabilization would weigh more than the subject being tested. This general rule was used throughout the testing procedure.
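The general rule that the stabilizing clinician should weigh more than the subject can be checked before a session. The following Python sketch is illustrative only: the function name, subject labels, and body weights are hypothetical, and a strict "heavier than" comparison is assumed.

```python
def subjects_needing_heavier_stabilizer(clinician_kg, subject_weights_kg):
    """Return the IDs of subjects who are not lighter than the clinician."""
    return [
        subject_id
        for subject_id, weight_kg in subject_weights_kg.items()
        if clinician_kg <= weight_kg  # rule: clinician must weigh more
    ]


# Hypothetical body weights in kilograms (not data from this study).
weights = {"S01": 58.0, "S02": 74.5, "S03": 91.0}
flagged = subjects_needing_heavier_stabilizer(clinician_kg=85.0, subject_weights_kg=weights)
print(flagged)
```

Any subject returned by the function would, under this rule, need to be stabilized by a heavier clinician.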

Figure 1:
MOD testing method of extension.
Figure 2:
MOD testing method of flexion.

Before all testing, subjects were encouraged to maintain the testing position as long as possible. Subjects were not encouraged during testing, and the instructions were kept standardized throughout the entire testing procedure, regardless of testing method. Subjects were not informed of their scores until the entire testing was completed. Subjects were also encouraged to not change their current activity level between testing sessions, specifically in regard to trunk muscle activity.

Statistical Analyses

Concurrent validity of the MOD tests with the ST tests was determined by the correlation between the times the subject could hold the extension and flexion positions, respectively, under each method. The extent of validity was classified according to the levels described by Meyer (28): correlation coefficients greater than 0.80 indicated high validity, values between 0.60 and 0.80 indicated good validity, values between 0.40 and 0.59 indicated moderate validity, and values less than 0.40 indicated poor validity. Interrater reliability was evaluated on 15 subjects using the intraclass correlation coefficient (ICC 3,2): ICC = 0.97 for extension and ICC = 0.93 for flexion.
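The validity bands attributed to Meyer (28) can be expressed as a small lookup. The Python sketch below is one possible reading, not a definitive implementation: the function name is hypothetical, and because the stated bands leave values between 0.59 and 0.60 unassigned, the sketch assumes that anything from 0.40 up to (but not including) 0.60 counts as moderate.

```python
def classify_validity(r: float) -> str:
    """Map a correlation coefficient to a validity band (thresholds from the text)."""
    if r > 0.80:
        return "high"
    if r >= 0.60:
        return "good"
    if r >= 0.40:  # assumption: 0.40 up to 0.60 treated as moderate
        return "moderate"
    return "poor"


# Hypothetical coefficients chosen to fall in each band.
for r in (0.85, 0.65, 0.45, 0.20):
    print(r, classify_validity(r))
```

Note that under these bands a coefficient of exactly 0.80 is classified as good rather than high.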


In this comparison of our modified testing procedures (MOD) to previous standardized testing procedures (ST), correlation analysis revealed a Pearson r = 0.84 between the times for holding the flexion postures and r = 0.90 for holding the extension postures. The mean times subjects could hold each position are shown in Table 2. Individual data are illustrated in Figures 3 and 4.
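For readers who want to reproduce this kind of analysis on their own data, the Pearson correlation between paired hold times can be computed as in the Python sketch below. The function name and the hold times are hypothetical and are not the study data; the formula is the standard product-moment correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired hold times in seconds (same subjects, two methods).
st_times = [95.0, 140.0, 210.0, 60.0, 180.0]
mod_times = [100.0, 150.0, 190.0, 75.0, 170.0]
r = pearson_r(st_times, mod_times)
print(f"r = {r:.2f}")
```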

Table 2:
Mean (±SD) time (seconds) and minimum/maximum values (time in seconds) of the 2 testing methods (n = 50).*
Figure 3:
Scatterplot comparison of standard (ST) flexion testing vs. modified (MOD) flexion in seconds (n = 50).
Figure 4:
Scatterplot comparison of standard (ST) extension testing vs. modified (MOD) extension in seconds (n = 50).


A high degree of correlation was found between the times on the ST and MOD methods when the 2 tests were concurrently applied to normal college-aged subjects. No other correlation values have been reported for the ST against any type of modification. Using such a test over a wide range of age groups and subjects has provided valuable clinical information and helped to establish normal trunk endurance ranges that can continue to be used for comparison in different populations of subjects with LBP.

The use of static endurance testing seems to be cost effective, easy, and quick to perform and requires no special equipment in the clinic, so clinicians could choose it for measuring trunk muscle endurance (30). Trunk flexion (12,26) and extension (4,22,26) endurance testing have proven to be highly reliable methods. The use of these methods, as originally described, does require a table that allows a belt to be looped around its width. Weight and athletic training rooms may not necessarily have these types of tables. Determining the reliability of a MOD testing method was necessary to allow more convenient and widespread use of ST testing procedures that have clearly shown benefit (26). The fact that the MOD testing method compared well with the ST method is of benefit not only to clinicians with these types of facilities but also to other clinicians implementing these testing strategies under time constraints (avoiding the time necessary to belt and loosen each subject) and for research purposes. However, it cannot be stated that the MOD method has the same reliability as the ST method because of the body type of the individual restraining the subject's lower extremities. Although a clinician providing stability is not the same as a static belt, the clinician was not required to apply any specific resistive force to stabilize the subjects. The clinician simply lay over the subject's lower extremities (or sat on their feet in the case of flexion) and remained stationary throughout the testing procedure. Although it is most likely that the clinician was required to perform some type of isometric muscular contraction(s) to keep the subject's lower extremities from moving, the clinician never reported fatigue or related difficulty. None of the subjects felt uncomfortable during the MOD testing procedure, and they all felt that they were stabilized equally during both testing methods.

Although the MOD method seems to be an acceptable alternative to the ST testing procedure in a normal asymptomatic population, its suitability remains to be determined in a population of LBP subjects. Endurance times of the trunk extensor (3,14,17,35) and trunk flexor muscles (7,16,24,29,31) in subjects with LBP are shorter than those in the normal healthy population. Because an apparent loss of muscle control after trunk muscle fatigue could be considered one of the important causes of LBP (32), it could be postulated that assessing trunk muscle endurance is important as a potential predictor of future LBP. The ST methods have been implemented in different LBP populations (2,34,36) and after single-level microdiscectomy (13). Although the reasons for termination of the trunk extension endurance test seem to be a combination of fatigue (2,10,30,34), motivation (10,30), and LBP (34) (at least for symptomatic individuals), trunk endurance testing has become a tool of reference for evaluating muscle performance in patients with LBP, most notably before and after rehabilitation programs (10). It remains to be determined whether our MOD procedure, while comparing favorably with ST trunk endurance testing methods in normal subjects, will also compare as favorably in a symptomatic population.

The large SD for both flexion testing methods could be of concern, especially because it was higher than in previous studies (5,26). The fact that the SD was equal in both testing methods suggests that the population studied, although all were college students, varied considerably in athletic activity level. The range of scores (Table 2) demonstrates wide variation for both testing methods. Because the range of scores varied greatly and both testing methods compared favorably, it could reasonably be concluded that the large SD was most likely due to this variation in the level of physical activity. Although this is not ideal, especially for research analysis, this variation in physical activity level may actually be more representative of the general population than that noted in previous study populations (5,26).

The flexion time was notably longer in our study compared with previous studies (5,26) for both testing methods. Although the parameter for termination of the flexion test was previously established as breaking of the 60° angle (26), that description did not clearly define the exact criterion for test termination (whether the test was terminated if the subject broke the plane at all or when the subject contacted the prefabricated wedge that was 10 cm behind them). Pilot testing before this study determined that reliability was less favorable when breaking the 60° plane in any manner was the criterion for test termination than when the criterion was breaking the plane of testing and contacting the prefabricated wedge. Although this was a different criterion for test termination than previously used (26), it was, in our opinion, more clearly defined: it was often difficult to ascertain whether the subject broke the 60° plane throughout the entire spine, as many subjects would lose lumbar lordosis yet appear to maintain the 60° angle as measured through the midline of the spine and not contact the wedge. Because of this potential complication, we decided on the criterion of the subject contacting the prefabricated wedge. Our test-retest reliability was lower compared with the study of initial standardization (26), and although the different criterion for test termination may be a factor, it is our opinion that the reliability results would have compared even less favorably had we used the previously established criterion, because our pilot reliability with that criterion was poorer than with the criterion we implemented. Contacting the wedge was a criterion used in Chan's study (5). Although our mean time scores and SD were still greater than those of Chan (5), our times were more comparable to that study than to the study of McGill et al. (26). Chan (5) also used a more specific subject base (college-aged rowers) compared with our subjects. The variability of our subject base could have accounted for the differences in times, especially the SD. Also, more recently, alternative testing positions for abdominal endurance have been advocated (6). Testing in the same manner as McGill et al. (26), except with an initial starting position (with a wedge) of 45° and in a trunk curl-up position (with bilateral scapulae clearing the table), was more time effective and showed less variance than the 60° trunk starting position (6). The curl-up exercise was considered easy, convenient, and representative of trunk flexor effort; it was considered a preferable alternative to the 60° flexor exercise for healthy women (n = 28) of mean age 23.8 ± 2.4 years (6). This would seem to concur with feedback from our subjects, many of whom felt that the limiting factor in their test termination was hip flexor muscle pain or fatigue rather than the abdominal muscles.

The method of test-retest reliability may also be a consideration. Reliability of the standard method (26) was previously established both before and after testing, with 5 subjects tested consecutively. Our testing involved many more subjects and fewer days of testing, which is more likely representative of clinical situations. The fact that all our subjects were included in reliability testing and that our subject pool seemed variable in weight and testing duration could not only account for the lower level of reliability but also be more representative of the actual clinical situation.

Another factor that may warrant consideration is the age range of subjects in the various studies. Although our subjects were all college students, our age range was wide (22-38 years), and our mean age was greater than in previous studies: 20.52 ± 1.16 years for Chan (5) and 23 ± 2.9 years for McGill et al. (26). Although the age ranges were not listed for the other studies, our wide age range could also have contributed to variable fitness levels and test tolerance and, therefore, to the greater variability in the flexion endurance scores and SD.

With our testing, the individual providing stabilization was consistent. This individual weighed 90.72 kg and was heavier than every subject tested except one (who weighed the same). Pilot testing to determine the relationship between the individual providing stabilization and the subjects being tested was implemented with a second individual providing stabilization. This second individual weighed 70.3 kg, which was greater than the mean weight of 68.4 kg in the subset of 15 subjects used in this pilot study but less than the weight of 6 individuals in this subset (range was 47.7-90.72 kg). Correlation analysis for this pilot study revealed a Pearson r = 0.84 between the times for holding the flexion postures and r = 0.61 for holding the extension postures, and r = 0.85 for the flexion postures and r = 0.62 for the extension postures when comparing the first and second individuals providing stabilization. Therefore, flexion compared very favorably with prior testing methods with this second tester, but the correlation between the various extension methods was only good, rather than high as for the other correlations. Additional testing is needed to define this specific relationship before such stabilization is implemented.

Practical Applications

The use of these MOD testing procedures for endurance of trunk flexion and extension has proven to be a reliable alternative to the ST method of testing. These MOD procedures will allow clinicians in settings with less than ideal testing tables to assess trunk muscular endurance accurately according to accepted standards. Because another individual provides the necessary stabilization, the MOD procedures also allow large groups of individuals, such as athletic teams, to be assessed much more efficiently than with the standard testing procedures, in which straps would have to be readjusted for each individual. Further acceptance and use of this MOD testing procedure will allow for continued study of its reliability and applicability in subjects with LBP.


References

1. Alaranta, H, Hurri, H, Heliovaara, M, Soukka, A, and Harju, R. Nondynamic trunk performance tests: Reliability and normative data. Scand J Rehabil Med 26: 211-215, 1994.
2. Arab, AM, Salavati, M, Ebrahimi, I, and Ebrahim Mousavi, M. Sensitivity, specificity and predictive value of the clinical trunk muscle endurance tests in low back pain. Clin Rehabil 21: 640-647, 2007.
3. Ashmen, KJ, Swanik, CB, and Lephart, SM. Strength and flexibility characteristics of athletes with chronic low back pain. J Sport Rehabil 5: 275-286, 1996.
4. Biering-Sorensen, F. Physical measurements as risk indicators for low-back trouble over a one-year period. Spine 9: 106-119, 1984.
5. Chan, RH. Endurance times of trunk muscles in male intercollegiate rowers in Hong Kong. Arch Phys Med Rehabil 86: 2009-2012, 2005.
6. Chen, L-W, Bih, L-I, Ho, C-C, Huang, M-H, Chen, C-T, and Wei, T-S. Endurance times for trunk-stabilization exercises in healthy women: Comparing 3 kinds of trunk-flexor exercises. J Sport Rehabil 12: 199-207, 2003.
7. Corin, G, Strutton, PH, and McGregor, AH. Establishment of a protocol to test fatigue of the trunk muscles. Br J Sports Med 39: 731-735, 2005.
8. Cowley, PM and Swensen, TC. Development and reliability of two core stability field tests. J Strength Cond Res 22: 619-624, 2008.
9. Crisco, JJ III and Panjabi, MM. The intersegmental and multisegmental muscles of the lumbar spine. A biomechanical model comparing lateral stabilizing potential. Spine 16: 793-799, 1991.
10. Demoulin, C, Vanderthommen, M, Duysens, C, and Crielaard, JM. Spinal muscle evaluation using the Sorensen test: A critical appraisal of the literature. Joint Bone Spine 73: 43-50, 2006.
11. Delitto, A, Rose, SJ, Crandell, CE, and Strube, MJ. Reliability of isokinetic measurements of trunk muscle performance. Spine 16: 800-803, 1991.
12. Evans, K, Refshauge, KM, Adams, R, and Aliprandi, L. Predictors of low back pain in young elite golfers: A preliminary study. Phys Ther Sport 6: 122-130, 2005.
13. Flanagan, SP and Kulig, K. Assessing musculoskeletal performance of the back extensors following a single-level microdiscectomy. J Orthop Sports Phys Ther 37: 356-363, 2007.
14. Hultman, G, Nordin, M, Saraste, H, and Ohlsen, H. Body composition, endurance, strength, cross-sectional area and bone density of erector spinae in men with and without low back pain. J Spinal Disord 6: 114-123, 1993.
15. Hyun, JK, Lee, JY, Lee, SJ, and Jeon, JY. Asymmetric atrophy of multifidus muscle in patients with unilateral lumbosacral radiculopathy. Spine 32: E598-E602, 2007.
16. Ito, T, Shirado, O, Suzuki, H, and Takahashi, M. Lumbar trunk muscle endurance testing: An inexpensive alternative to a machine for evaluation. Arch Phys Med Rehabil 77: 75-79, 1996.
17. Jorgensen, K and Nicholaisen, T. Trunk extensor endurance: Determination and relation to low back trouble. Ergonomics 30: 259-267, 1987.
18. Keller, A, Hellesnes, J, and Brox, JI. Reliability of the isokinetic trunk extensor test, Biering-Sorensen test, and Astrand bicycle test. Spine 26: 771-777, 2001.
19. Kjaer, P, Bendix, T, Sorensen, JS, Korsholm, L, and Leboeuf-Yde, C. Are MRI-defined fat infiltrations in the multifidus muscles associated with low back pain? BMC Med 25: 2-11, 2007.
20. Leetun, DT, Ireland, ML, Willson, JD, Ballantyne, BT, and Davis, IM. Core stability measures as risk factors for lower extremity injury in athletes. Med Sci Sports Exerc 36: 926-934, 2004.
21. Liemohn, WP, Baumgartner, TA, and Gagnon, LH. Measuring core stability. J Strength Cond Res 19: 583-586, 2005.
22. Luoto, S, Heliovaara, M, Hurri, H, and Alaranta, H. Static back endurance and the risk of low-back pain. Clin Biomech 10: 323-324, 1995.
23. MacDonald, DA, Moseley, GL, and Hodges, PW. The lumbar multifidus: Does the evidence support clinical beliefs? Man Ther 11: 254-263, 2006.
24. Malliou, P, Gioftsidou, A, Beneka, A, and Godolias, G. Measurement and evaluations in low back pain patients. Scand J Med Sci Sports 16: 219-230, 2006.
25. McGill, SM. Low Back Disorders: Evidence-Based Prevention and Rehabilitation. Champaign, IL: Human Kinetics, 2007.
26. McGill, SM, Childs, A, and Liebenson, C. Endurance times for low back stabilization exercises: Clinical targets for testing and training from a normal database. Arch Phys Med Rehabil 80: 941-944, 1999.
27. McGill, SM, Grenier, S, Bluhm, M, Preuss, R, Brown, S, and Russell, C. Previous history of LBP with work loss is related to lingering effects in biomechanical, physiological, personal, and psychological characteristics. Ergonomics 46: 731-746, 2003.
28. Meyer, CR. Measurement in Physical Education. New York, NY: Ronald Press Co, 1974.
29. Moffroid, MT. Endurance of trunk muscles in persons with chronic low back pain: Assessment, performance, training. J Rehabil Res Dev 34: 440-447, 1997.
30. Moreau, CE, Green, BN, Johnson, CD, and Moreau, SR. Isometric back extension endurance tests: A review of the literature. J Manipulative Physiol Ther 24: 110-122, 2001.
31. O'Sullivan, PB, Mitchell, T, Bulich, P, Waller, R, and Holte, J. The relationship between posture and back muscle endurance in industrial workers with flexion-related low back pain. Man Ther 11: 264-271, 2006.
32. Parnianpour, M, Nordin, M, Kahanovitz, N, and Frankel, V. The triaxial coupling of torque generation of trunk muscles during isometric exertion and the effect of fatiguing isoinertial movements on the motor output and movement patterns. Spine 13: 982-992, 1988.
33. Renkawitz, T, Boluki, D, and Grifka, J. The association of low back pain, neuromuscular imbalance, and trunk extension strength in athletes. Spine J 6: 673-683, 2006.
34. Ropponen, A, Gibbons, LE, Videman, T, and Battie, MC. Isometric back extension endurance testing: Reasons for test termination. J Orthop Sports Phys Ther 35: 437-442, 2005.
35. Roy, SH, Deluca, CJ, and Casavant, DA. Lumbar muscle fatigue and chronic low back pain. Spine 14: 992-1001, 1989.
36. Stewart, M, Latimer, J, and Jamieson, M. Back extensor muscle endurance test scores in coal miners in Australia. J Occup Rehabil 13: 79-89, 2003.
37. Thorstensson, A and Carlson, H. Fiber types in human lumbar back muscles. Acta Physiol Scand 131: 185-202, 1987.

Key Words: upper torso; trunk flexion; trunk extension

© 2010 National Strength and Conditioning Association