
Measurement of Moderate Physical Activity: Advances in Assessment Techniques

A comparative evaluation of three accelerometry-based physical activity monitors

WELK, GREGORY J.; BLAIR, STEVEN N.; WOOD, KHERRIN; JONES, SHELBY; THOMPSON, RAYMOND W.

Medicine & Science in Sports & Exercise: September 2000 - Volume 32 - Issue 9 - p S489-S497

Abstract

Improving techniques for the assessment of physical activity is one of the top research priorities for the exercise science field. An accurate assessment of activity is necessary to more objectively evaluate the health benefits of physical activity and the effectiveness of behavioral interventions designed to promote physical activity. With recent public health guidelines currently endorsing walking and other forms of “lifestyle” physical activity (22), efforts are specifically needed to improve assessments of moderate intensity physical activity. This need was explicitly stated as a primary research priority in the Surgeon General’s report on Physical Activity and Health (25, p. 201).

There are several difficulties inherent in assessing moderate-intensity lifestyle physical activity. Because lifestyle activity is generally less structured than more vigorous bouts of exercise, it may be harder to code and recall on self-report instruments. Moderate intensity or lifestyle activities can also encompass a wide range of intensities. For example, one person may perform a task such as raking leaves at a light intensity, whereas another may perform it much more vigorously. The Surgeon General’s report (25) uses the general classification of approximately four to six multiples of resting metabolic rate (RMR) (4–6 METs) (1 MET = average rate of energy expenditure at rest, or 3.5 mL·kg⁻¹·min⁻¹) as a description of “moderate” intensity. However, this range varies greatly by age—from as low as 2.3 METs for individuals over 80 to as high as 7.1 METs for younger adults. Another challenge is that moderate intensity activity, as it is often promoted, can be accumulated throughout the day. This fact makes it especially difficult to obtain an accurate measure of duration or volume of activity. Thus, there are difficulties associated with assessing frequency, intensity, and duration of lifestyle activity.
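
The MET arithmetic above amounts to a simple ratio against resting oxygen uptake. As a minimal illustration only (the function names and the intensity cut-points other than the 4–6 MET band are hypothetical, not from the paper):

```python
# 1 MET = average rate of energy expenditure at rest, about 3.5 mL O2
# per kg per minute, as defined in the text above.
RESTING_VO2 = 3.5  # mL·kg⁻¹·min⁻¹

def vo2_to_mets(vo2_ml_kg_min):
    """Express a measured VO2 value as a multiple of resting metabolic rate."""
    return vo2_ml_kg_min / RESTING_VO2

def intensity_band(mets):
    """Apply the approximate 4-6 MET band for 'moderate' intensity;
    the 'light'/'vigorous' labels here are illustrative assumptions."""
    if mets < 4.0:
        return "light"
    if mets <= 6.0:
        return "moderate"
    return "vigorous"
```

For example, a measured VO2 of 14.0 mL·kg⁻¹·min⁻¹ corresponds to 4.0 METs, which falls at the lower edge of the "moderate" band.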

There has been some consensus that a new generation of activity monitors (e.g., Computer Science and Applications monitor [CSA], the Tritrac-R3D monitor [Tritrac], and the Biotrainer activity monitor [Biotrainer]) may offer the most promise for assessing free-living physical activity. These instruments are small, are easy to use, and provide objective measures of activity across the full range of the activity continuum. The instruments also feature solid-state construction, interval-based time sampling, and computer downloading capabilities. These features make them well suited for field-based research applications or for validating self-report measures of physical activity.

Previous studies generally indicate that these monitors provide a valid measure of physical activity but less accurate estimates of energy expenditure (EE) (21). Some studies on the Caltrac, for example, suggested that it underestimates EE at lower intensities but overestimates EE at higher intensities (15,18). Other studies noted consistent overestimations across a full range of speeds (3,7,20). When used to assess treadmill walking on a grade, monitors were found to significantly underpredict EE (9). Based on these findings, most researchers recommend caution when interpreting EE estimates from these monitors. Although some error can be attributed to inherent limitations of the monitors, the results may also be due to a lack of valid population-specific regression equations for each of the monitors.

Because equations developed under laboratory conditions may not be generalizable to field-based applications, a number of studies have examined the utility of these monitors under normal free-living conditions. Studies have compared activity monitors against heart rate monitors (13,26), self-report instruments (16), direct observation techniques (27), and doubly labeled water (6,10). Although these comparisons provide useful information, the inherent limitations of each individual method preclude a definitive evaluation of validity (2). Even the use of doubly labeled water (an apparent gold standard measure of EE) is limited by the inability to segment activity by time.

Studies have also compared devices based on similar or competing accelerometry-based technologies. For example, studies have compared the Caltrac and Tritrac (26), the CSA and the Caltrac (17), and the Tritrac with the Actigraph/CSA monitor (14). Moderate to high correlations (range: r = 0.77–0.88) have generally been reported among these monitors. Although these findings support the convergent validity of the monitors, they cannot address criterion validity without a criterion measure. Moreover, unless the activities are coded or recorded during the monitoring, it is not possible to identify probable sources of discrepancy in the results. To further advance research on the assessment of moderate physical activity, it is necessary to begin systematically evaluating the absolute and concurrent validity of these instruments under a variety of conditions. This can be accomplished only by comparing multiple instruments under the same conditions and against a more suitable “gold standard.”

In the present study, we sought to address some of these gaps in the literature. The primary purpose of the study was to evaluate three different activity monitors under both laboratory and field conditions against measures from indirect calorimetry. The study was designed to examine both absolute (comparisons against EE) and concurrent validity (comparisons among monitors) for a variety of activities and to document potential differences in validity between treadmill and lifestyle conditions.

A secondary purpose was to provide estimates of the EE required for various forms of “lifestyle” physical activity. The Compendium of Physical Activities (1) includes estimates for a variety of classes and intensities of physical activities; however, because the Compendium was designed to provide a comprehensive database, some of the MET values were based on estimates from textbooks or approximations based on similar activities. To increase our understanding of the potential health benefits of lifestyle physical activity, it is important to have more accurate estimates of the energy costs of these activities.

METHODS

Participants.

Participants for the study were 52 young to middle-aged adults (21 men, 31 women), with a mean age of 29 yr. The mean BMI of the participants was 21.7 kg·m⁻² and 24.3 kg·m⁻² for the men and women, respectively. Thus, participants were slightly leaner (and probably more active and fit) than the average adult. The overall study protocol was approved by the Institutional Review Board at the Cooper Institute, and written informed consent was obtained from all participants.

Instruments.

Three contemporary activity monitors were compared in the present study. Each is described below:

CSA monitor (Computer Science and Applications, Inc., Shalimar, FL [www.csa-ucc.com]).

The CSA monitor (also called the Actigraph) is a one-dimensional accelerometer that uses a cantilevered piezoelectric plate to sense acceleration. An analog-to-digital filter generates a linear output from the acceleration signal, and this output is then summed over a specified time interval set by the user. The CSA is currently the smallest, but most expensive, monitor available. Recently, several studies have been conducted to establish the calibration of the CSA for the prediction of EE in adults (17) and children (24).

Tritrac monitor (Hemokinetics Inc., Madison, WI [www.reining.com]).

The Tritrac monitor is a three-dimensional accelerometer that is based on the same technology as the more frequently studied Caltrac monitor (3,20,23). Like the Caltrac, it has demonstrated reasonable validity as a measure of physical activity but questionable validity as a measure of EE (16). The Tritrac reports activity in raw movement counts for each plane, a three-dimensional vector magnitude output, as well as estimates of activity calories and total calories that factor in body weight and basal metabolism. An independent calibration equation was recently developed to provide an alternative means of quantifying EE (19). Theoretically, the three-dimensional nature of the Tritrac would make it better suited to assess more sporadic lifestyle activity; however, studies have been equivocal. Welk and Corbin (26) reported correlations of r = 0.88 between concurrent recordings from a Caltrac and Tritrac, suggesting that the two instruments provide similar information. Others have demonstrated stronger validation criteria for three-dimensional monitors when compared with doubly labeled water (5) and indirect calorimetry (8).

Biotrainer monitor (IM Systems, Baltimore, MD [www.imsystems.net]).

The Biotrainer monitor is a one-dimensional accelerometer that is similar in design to the CSA monitor. It uses a piezoelectric bender accelerometer similar to those in the Tritrac (and CSA) but relies on a high-speed sampling method rather than an “integrative” approach in which counts are accumulated over a specified interval. It offers the same time-sampling capabilities as the CSA and the Tritrac but does not require computer initialization; instead, time is tracked retrospectively from the time the data are downloaded. This feature makes it especially useful for field monitoring. No studies to date have been published on the validity or reliability of the Biotrainer; however, the instrument is currently being used as the primary monitoring device for a National Institutes of Health–funded physical activity intervention being conducted in our laboratory (Project PRIME).

Pilot study.

Because the present study required that participants wear three different monitors at the same time, it was important to determine whether the position of the monitors would influence the results. In a technical analysis of accelerometer positioning, the hip position yielded the best prediction of EE (5); however, it was not clear whether the position around the hip (from front to back) would affect accelerometer output. To check the effect of monitor positioning on accelerometer output, we had a sample of the participants (N = 42) complete three 6-min bouts of walking at 3 mph (80.5 m·min⁻¹) with the monitor positioned in three different places along the right side of the hip. The positions were numbered 1, 2, and 3 from front to back. Position 1 was at the anterior axillary line (iliac crest), position 2 was in line with the mid-axillary line, and position 3 was an equal distance further posterior. The order of the positions was randomized across participants. Analysis of variance (ANOVA) indicated no significant differences in scores by position for the Biotrainer and the Tritrac, but significant differences were found for the CSA [F(2,123) = 18.3, P < 0.001]. Position 2 yielded the highest output, followed by position 3 and position 1 (see Table 1). Post hoc analyses confirmed that all pairwise comparisons were significantly different. The results of this pilot study suggest that monitor position influences the CSA but not the Tritrac or the Biotrainer.

Table 1:
Comparison of output from three accelerometers at three monitoring positions.

Procedures.

Participants in the main study completed two choreographed routines in a randomized and counterbalanced design. Each routine included six activities of 6 min duration (total of 36 min) that were designed to simulate “lifestyle” activities commonly recommended in public health guidelines. Three of the activities—walking at 3 mph (80.5 m·min⁻¹), brisk walking at 4 mph (107.3 m·min⁻¹), and jogging at 6 mph (160.9 m·min⁻¹)—were performed in both routines to evaluate the reliability of the assessments and to cross-validate existing calibration equations developed for these instruments (11,19). Six of the participants completed only one trial, and three others chose not to complete the jogging pace.

The three additional activities in routine 1 took place outdoors and included activities that are commonly performed in gardening or lawn care (mowing, raking, and shoveling). The additional activities in routine 2 took place indoors and included activities used in general housework (vacuuming, sweeping, and stacking groceries). For these lifestyle activities, participants were given some guidelines on the work task to complete but were given freedom to complete these at their normal or preferred speed. For example, in the shoveling task, participants were given the same amount of dirt and asked to shovel it into an adjacent pile, but no restrictions were made on the size of the shovel load or the rate of shoveling.

During both routines, the activity levels of the participants were assessed concurrently with all three activity monitors. Each instrument was initialized according to manufacturer specifications and set to record activity every 2 min. The instruments were inserted into padded pouches and attached to a waist belt that the participants wore on their hip. The Tritrac was positioned at the anterior axillary line, the CSA at the mid-axillary line, and the Biotrainer at the posterior axillary line. Based on the pilot study, we assumed that the Tritrac and Biotrainer would provide similar output regardless of hip position; therefore, output from all three monitors can be treated as if recorded at the mid-axillary position.

During both routines, the EE of the participants was directly assessed with indirect calorimetry. The Sensormedics 2900 metabolic cart (Yorba Linda, CA) was used during the indoor treadmill (laboratory) activities. The Aerosport KB1-C portable metabolic cart (Ann Arbor, MI) was used for both the indoor and the outdoor lifestyle (field) activities. To ensure accuracy of the indirect calorimetry, both systems were calibrated between each trial. Data from each of the monitors and from the metabolic cart were synchronized by initializing each of the instruments to begin recording activity at the beginning of a new minute.

Data processing.

Data from all of the monitors were downloaded and processed before data analysis. The data files from each monitoring device were first standardized and synchronized so that they could be processed collectively. Each of the data files was examined visually to look for malfunctioning units and to check the synchronization of the units. Once a usable database was established, we examined the distribution of the data from each monitor for the presence of outliers. Data points greater than 2 SD from the mean were removed from the analyses because the presence of outliers can significantly impact the results of correlation and regression analyses. Outliers were found for the Aerosport KB1-C (N = 9), Sensormedics 2900 cart (N = 5), CSA (N = 1), Tritrac (N = 2), and Biotrainer (N = 1). The outliers for the indirect calorimetry measurements were all low and were probably due to ineffective seals around the mouth or improper calibration. The cause of the outliers for the activity monitors cannot be determined but could be due to errors in synchronization of the units. These appeared to be isolated incidents, as no pattern was associated with these cases. To maximize sample sizes for the data analyses, we employed listwise deletion procedures. Sample sizes ranged from N = 33 to N = 44 for the various analyses.
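
The outlier screen and case-deletion steps described above can be sketched as follows. This is a simplified illustration, not the authors' actual processing code; in practice the screen would be applied separately per instrument and activity:

```python
import statistics

def flag_outliers(values, k=2.0):
    """Return indices of points lying more than k sample standard
    deviations from the mean (k = 2 per the criterion in the text)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * sd]

def listwise_delete(rows):
    """Keep only cases with complete data on every instrument
    (None marks a missing or removed value)."""
    return [row for row in rows if all(v is not None for v in row)]
```

For a score list with one extreme value, `flag_outliers` identifies that point, and `listwise_delete` then drops any participant-activity case with a missing value on any monitor.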

Data analyses.

The study was designed to provide a comparative evaluation (i.e., both absolute and relative assessments) of the reliability and validity of these instruments under both laboratory and field conditions. The descriptive statistics for each monitor and activity were first computed to permit comparisons of the results with other published studies in the literature. Test-retest reliability was evaluated with intraclass correlation coefficients computed between the two treadmill trials. To provide an overall indicator of correspondence among the monitors, we computed Pearson product moment correlations using raw output scores from each of the measures. These analyses were performed separately for the treadmill and lifestyle conditions to compare the relationships for these two settings. Because the use of repeated measurements from the same participant could bias the results of correlation and regression analyses, we randomly assigned each participant to one of the three treadmill paces and one of the six lifestyle activities. With this assignment, each participant was represented only once in the correlation and regression analyses.
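
For reference, the Pearson product-moment correlation used for the between-monitor comparisons can be computed directly from paired raw output scores. A minimal sketch (not the authors' analysis code):

```python
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length
    lists of paired scores (e.g., output from two monitors)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    # Sample covariance divided by the product of sample SDs.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))
```

Perfectly proportional outputs yield r = 1.0; perfectly inverse outputs yield r = -1.0.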

The subsequent data analyses examined the predictive validity of the instruments for estimating EE. For these analyses, we tested the accuracy of currently available calibration equations for both the CSA (11) and the Tritrac (19). We also tested manufacturer-based algorithms for predicting EE for both the Tritrac and the Biotrainer. The average of data from minutes 5 and 6 were used to represent the metabolic cost of each activity. For the monitors, the number from the corresponding 2-min interval was divided by two to represent the activity level per minute. Because we used two different criterion measures to evaluate the monitors under both laboratory and field conditions, the subsequent analyses were conducted in two phases.
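
The time-alignment arithmetic described here is straightforward; the sketch below assumes the 2-min recording epoch and 6-min activity bouts stated above (function names are illustrative):

```python
def counts_per_minute(epoch_count, epoch_minutes=2):
    """The monitors logged one total per 2-min epoch; divide by the
    epoch length for an activity level per minute."""
    return epoch_count / epoch_minutes

def steady_state_value(minute_values):
    """Average minutes 5 and 6 of a 6-min bout (list is 0-indexed)
    to represent the metabolic cost of the activity."""
    return (minute_values[4] + minute_values[5]) / 2.0
```

For example, a 2-min epoch total of 3000 counts corresponds to 1500 counts·min⁻¹, and the criterion value for a bout is the mean of its final two minutes.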

In phase I (laboratory condition), we compared the accuracy of the estimations against the data from the Sensormedics 2900 metabolic cart. Data from trial 2 were selected for these analyses, because the number of participants with complete data from all monitors was greater. A two-way (monitor × activity) repeated-measures ANOVA was performed with contrasts specified to compare each of the monitors against the criterion measure. Simple effects by pace were tested for each significant contrast to determine which paces had significantly different estimations of EE.

In phase II (field condition), we compared the accuracy of the estimations against the data from the Aerosport KB1-C portable metabolic cart. Because each of the activities was deemed independent, we performed a separate repeated-measures ANOVA for each of the six lifestyle activities. Univariate contrasts were specified to compare each of the monitors against the criterion measure.

RESULTS

Descriptive results.

The descriptive statistics for the treadmill and lifestyle conditions are shown in Table 2. Using the data from the two treadmill trials, we evaluated the test-retest reliability of each of the monitors. The intraclass correlation coefficients for the CSA, Tritrac, and Biotrainer monitors were R = 0.85, 0.96, and 0.89, respectively. A comparison value for measured V̇O2 (mL·kg⁻¹·min⁻¹) was R = 0.94.

Table 2:
Descriptive statistics for activity monitors under treadmill and field conditions.

Correlations among the monitors were computed for both the treadmill and the field-based conditions to examine the utility of these monitors for assessing physical activity. These correlations are shown in Table 3. The mean correlations among the three monitors were r = 0.86 for the treadmill activities and r = 0.70 for the lifestyle activities. The high correlations under both conditions suggest that the monitors provide similar information about physical activity under both laboratory and field conditions. The monitors were highly correlated with measured V̇O2 for the treadmill conditions (mean r = 0.86) and more weakly correlated for the lifestyle activities (mean r = 0.55).

Table 3:
Correlations among monitors for treadmill and lifestyle conditions.

Because the sample sizes for these correlation analyses were small, we also checked the correlations using pooled data from all of the participants. The sample sizes ranged from N = 230 to N = 235 for the treadmill condition and from N = 207 to N = 246 for the lifestyle condition. Similar to the previous results, we found high correlations with V̇O2 for the treadmill condition (r = 0.84) but lower correlations for the lifestyle condition (r = 0.44). When correlations among the monitors were examined, the correlations were similar for both conditions (r = 0.84 and 0.82 for the treadmill and lifestyle conditions, respectively).

Results for treadmill activities.

The predictive utility of the monitors for assessing EE was evaluated by comparing the predicted MET values with measured METs obtained through indirect calorimetry. Descriptive data for these estimations are shown in Table 4. The Sensormedics 2900 system yielded higher MET scores than those predicted from the Compendium of Physical Activities (1). The estimated values from the Compendium for the 3, 4, and 6 mph paces were 3.5, 4.0, and 10.0 METs, respectively. The values recorded in the present study were 3.68, 5.36, and 10.14 METs (values averaged across the two trials).

Table 4:
Measured and predicted MET values for three treadmill activities.

Overall, the three activity monitors yielded reasonably close predictions of METs for the walking paces but became progressively less accurate at higher intensities. Repeated-measures ANOVA indicated that the estimated METs from the CSA monitor were not significantly different from measured METs [trial 1: F(1,104) = 3.91, P > 0.05; trial 2: F(1,99) = 0.04, P > 0.05]. The estimated values were significantly different from the measured values for both the Tritrac [trial 1: F(1,104) = 24.5, P < 0.001; trial 2: F(1,99) = 23.0, P < 0.001] and the Biotrainer [trial 1: F(1,104) = 26.5, P < 0.001; trial 2: F(1,99) = 51.2, P < 0.001]. Post hoc analysis with Tukey tests revealed that the Tritrac estimate was significantly higher (P < 0.01) for the two lower-intensity paces, whereas the Biotrainer estimate was significantly higher for all three paces. The results for all three monitors were consistent across both trials.

Because the results of these EE estimations are dependent on the accuracy of the prediction equations, we performed a cross-validation of the existing calibration equations. The summary data on the results of these cross-validation analyses are provided in Table 5. Overall, the Tritrac possessed the strongest validation criteria (average R2 = 0.86, average SEE [standard error of the estimate] = 1.25), followed by the CSA (average R2 = 0.65, average SEE = 2.19) and the Biotrainer (average R2 = 0.52, average SEE = 2.65).
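
The cross-validation statistics reported here (R² and SEE) can be computed from measured and predicted MET values as sketched below. Note that SEE conventions differ across sources (some divide the residual sum of squares by n − 2 rather than n), so the denominator used here is an assumption:

```python
import math
import statistics

def r_squared(measured, predicted):
    """Proportion of variance in the criterion (measured METs)
    explained by the predicted values."""
    mean_m = statistics.mean(measured)
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return 1.0 - ss_res / ss_tot

def standard_error_of_estimate(measured, predicted):
    """Root mean squared residual; one common SEE convention
    (denominator n, an assumption here)."""
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(ss_res / len(measured))
```

Perfect predictions give R² = 1.0 and SEE = 0; a constant 1-MET error gives an SEE of 1.0 under this convention.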

Table 5:
Cross-validation of energy expenditure prediction equations for each monitor.

Bland-Altman plots (4) were generated to reveal the errors in prediction across a range of scores (Fig. 1). Examination of the plots indicates that there were smaller errors in prediction for the Tritrac across the range of possible scores. All errors for the Tritrac were within 4 METs, whereas a significant number of difference scores for the CSA were considerably larger. A wider spread of scores at the higher intensities was evident for all of the monitors. This indicates that prediction errors were consistently larger during higher-intensity activity. The tendency for the Tritrac and Biotrainer to overestimate EE can be seen with a greater number of difference scores (measured minus estimated) less than zero. This type of prediction bias was less apparent for the CSA monitor; however, the range of errors was considerably larger at all intensities for the CSA. This most likely accounts for the lower R2 values.
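
The quantities underlying a Bland-Altman analysis (difference scores, mean bias, and 95% limits of agreement) can be computed as follows. This is a sketch of the standard method (4), not the authors' plotting code:

```python
import statistics

def bland_altman(measured, predicted):
    """Return the mean difference (bias) and 95% limits of agreement
    (bias ± 1.96 SD) for measured-minus-predicted scores."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

A negative bias indicates that the monitor tends to overestimate EE (predicted exceeds measured), matching the pattern described above for the Tritrac and Biotrainer.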

Figure 1:
Bland-Altman plots showing the accuracy of energy expenditure estimations across the range of scores for treadmill activities.

Results for lifestyle activities.

The descriptive data from the field-based activities are shown in Table 6. The estimated METs from the Compendium ranged from 2.5 for sweeping and vacuuming to 5.5 for lawn mowing. The observed MET values from the Aerosport KB1-C were consistently higher than the Compendium values for the three indoor household tasks and lower than the Compendium values for two of the three outdoor lawn/gardening tasks. The largest discrepancies were observed for the sweeping and vacuuming tasks, with values 1.37–1.4 METs higher than estimated. The discrepancies for the other four activities ranged from 0.07 to 0.52 (mean = 0.33).

Table 6:
Measured and predicted MET values for six different lifestyle activities.

The estimated METs from each monitor were significantly different from the METs measured with the Aerosport (P < 0.001). Post hoc analyses with Tukey tests confirmed that the estimated MET levels were significantly lower than the measured METs for each of the monitors and for each of the lifestyle activities. When they were averaged across the six lifestyle activities, the degree of underprediction was 53%, 57%, and 52% for the CSA, Tritrac (Nichols et al. equation), and Biotrainer, respectively. When the activities were examined individually, the closest correspondence was found for lawn mowing, with estimations ranging from 67% to 74% of the measured EE.
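
The underprediction percentages quoted above follow from a simple ratio against the measured value; a minimal sketch:

```python
def pct_underprediction(measured_mets, estimated_mets):
    """Underprediction expressed as a percentage of the measured value."""
    return 100.0 * (measured_mets - estimated_mets) / measured_mets
```

For example, an estimate of 2.0 METs against a measured 4.0 METs is a 50% underprediction; the ~52-57% figures above are these percentages averaged across the six lifestyle activities.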

DISCUSSION

We investigated the comparative validity of three different activity monitors under both treadmill and lifestyle conditions. Although numerous studies have tested activity monitors in the laboratory, this is one of the first to systematically evaluate monitors under both laboratory and field conditions. This is also one of the first studies to report relationships among three different but comparable monitors. The three monitors selected are based on similar accelerometry principles, but they differ in their sensitivity to movement as well as in how movement is accumulated and processed within the monitor. For these reasons, it is reasonable to expect that there may be measurable differences in the validity, reliability, and utility of these competing products.

Consistent with previous studies (11,24), strong correlations were found between the activity monitors and V̇O2 (range: r = 0.85–0.92) under laboratory conditions. Lower correlations were found between the activity monitors and these indirect calorimetry measures under lifestyle conditions (range: r = 0.48–0.59). A more compressed range of scores in the lifestyle activities may contribute to the lower correlations; however, we found even lower correlations (range: r = 0.40–0.47) when we examined correlations on the pooled data. Because correlations would typically improve if these data were autocorrelated, it is likely that the low correlations are due more to the nature of lifestyle activities.

Also consistent with previous studies are the strong and consistent correlations reported among the three different monitors. The most notable difference in technology among the devices is the three-dimensional nature of the Tritrac compared with the single-plane assessments provided by the Biotrainer and the CSA. A previous study (26) reported high correlations (r = 0.88) between the Tritrac and its unidimensional predecessor, the Caltrac. The similar correlations reported here among the Tritrac and the CSA and Biotrainer suggest that the various accelerometry-based devices provide similar information despite different technologies and sensitivities.

Most studies have reported poor predictive validity when data from activity monitors are used to predict EE values. In the present study, we found no significant differences in EE estimates from the CSA monitor across all three treadmill speeds. When averaged across all three speeds, the estimates from the CSA were within 3.3% of the measured EE. The estimates from the Tritrac and the Biotrainer were less accurate, averaging 112% and 128% of the measured EE, respectively. The tendency for the monitors to overestimate EE at higher intensities is consistent with previous literature on other monitors (3,7,19,20).

Although the CSA yielded more accurate estimations than the other monitors, this effect may be due to differences in the accuracy of prediction equations rather than differences in technology. To examine the results in more detail, we cross-validated the equations using data in the present study. If the validation criteria obtained from this independent sample are close to the original data, this would suggest that the equation is more generalizable for the broader population. The results for the CSA were weaker than those reported from the original equation (11) as well as those from a similar study on children (24). It is not clear how much shrinkage should be expected from the equation, but the results here indicate some loss of predictive accuracy in the calibration equation. The results for the Tritrac were similar to those reported from the original equation (19). Eston and colleagues (8) also observed stronger relationships with the Tritrac (R2 = 0.82) than with the CSA monitor when predicting V̇O2. It is not clear whether the better results for the Tritrac are due to its three-dimensional nature or the prediction equations used to estimate EE.

The results for the Biotrainer were not strong; however, these estimates required back extrapolation of raw movement counts from the internal calorie estimate and may be biased. The output of the Biotrainer also has a more limited scale, so the monitor may be less sensitive at picking up small differences in activity. This may lead to considerable reductions in the accuracy of the assessment, particularly when assessing lower-intensity lifestyle physical activity. Comparison data are not available for the Biotrainer, but additional testing is warranted on the newer units that can record raw movement counts rather than integer estimates of calories.

The finding that the CSA yielded more accurate estimations of EE despite weaker cross-validation criteria can be explained using the Bland-Altman plots (Fig. 1). The plots for the CSA reveal an even distribution about the zero point on the y-axis, indicating that the CSA is equally likely to overestimate as to underestimate. The Tritrac, on the other hand, had a much tighter distribution of difference scores (yielding a higher R2 and lower SEE) but a tendency to overestimate EE, as the majority of scores fell below the zero line on the y-axis. The results with the Nichols equation (19) were clearly better than those from the manufacturer-based equation (see Table 4). This observation, combined with the accurate predictions from the Freedson equation (11), suggests that laboratory-based calibration studies are more accurate than typical manufacturer-based algorithms. This fact likely contributes to the weaker validation criteria observed for the Biotrainer. Because high correlations were found between the Biotrainer and V̇O2, and also with the other monitors, it is likely that the estimations from this device would be improved if more accurate calibration equations were available.

It is not clear whether the more sporadic patterns observed for the CSA are typical results for this device or are reflective of problems with the particular units tested in our study. In the pilot study, we observed that the CSA yielded significantly different results when positioned at different points around the hip. No differences were observed for the Tritrac and Biotrainer monitors during the same testing. To our knowledge, this is the first study indicating that position on the hip can lead to differences in output. This observation merits further study, since differences owing to position may make it difficult to compare results across studies.

Although there was a general tendency for the monitors to overestimate EE during the treadmill condition, there was a clear tendency for the monitors to underpredict the measured EE values during field activities. When estimates were averaged across all six conditions, the magnitude of error ranged from 38% to 48% for the three monitors. The magnitude of underprediction was similar between monitors but varied slightly by activity. The fact that the monitors underpredicted EE during these lifestyle activities is not surprising, because most of the lifestyle activities included considerable upper body movement, which cannot be assessed with a monitor worn on the hip. The closest estimation was found for lawn mowing (21–29% underprediction), which was the activity that involved the most locomotor activity. The use of field-based prediction equations may improve these estimations somewhat; however, it would be difficult to create a generalizable equation since free-living activity is inherently variable.

Underprediction appears to be common for these lifestyle activities, but it is possible that errors in estimations of different activities could average out over a whole day. When whole-body calorimeters are used to assess the validity of the monitors, higher correlations and better estimations have been observed. For example, Bouten et al. (6) observed correlations of 0.73 with measured physical activity level (PAL = average metabolic rate divided by the sleeping metabolic rate). Similarly, Gretebeck et al. (12) observed tight relationships between EE estimates from a Caltrac and the doubly labeled water (D2O) method over a 7-d span. Average daily EE estimates from the Caltrac were only 4.1% lower than the D2O values. Thus, despite the limitations of these devices for measuring lifestyle activities, long-term relationships during free-living activity appear to be better than expected.
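Both comparison metrics cited in this paragraph are simple ratios. The sketch below uses hypothetical daily EE figures (the 4.1% shortfall is reproduced deliberately, for illustration; these are not the published data from refs. 6 or 12):

```python
# PAL = average daily metabolic rate / sleeping metabolic rate (6).
def physical_activity_level(avg_mr_kcal_day, sleeping_mr_kcal_day):
    return avg_mr_kcal_day / sleeping_mr_kcal_day

pal = physical_activity_level(2600.0, 1600.0)  # hypothetical subject

# Percent difference of a monitor-based daily EE estimate against a
# doubly labeled water (D2O) criterion, as compared in (12).
def percent_difference(estimate, criterion):
    return (estimate - criterion) / criterion * 100.0

# Estimate chosen so the shortfall mirrors the 4.1% figure cited above.
diff = percent_difference(2493.4, 2600.0)
```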

In some of the earliest work with prototypes of the original Caltrac, Montoye and colleagues (18) described some of the limitations of accelerometry-based assessments of physical activity. In a comparison of 14 different movement tasks, they concluded that, in general, monitors overestimate EE for activities with a small force:displacement ratio (e.g., jumping, running) and underestimate EE for activities with a large force:displacement ratio (e.g., stair climbing, knee bends). These relations were clearly evident in the present data, because we observed slight overestimations for fast treadmill activities and pronounced underprediction for the various lifestyle activities. The extent to which these errors balance out in the course of a given day may influence the resulting accuracy of the prediction estimates for total daily EE.

Overall, this study reveals that the validity of activity monitors varies by setting (laboratory vs field) and by type of activity. The results agree with previous studies indicating that contemporary, accelerometry-based activity monitors provide a useful measure of physical activity but less accurate predictions of EE. Although the devices vary in a number of ways (e.g., cost, technological sophistication, sensitivity, and output measure), the results of this study generally indicate that each provides similar information. This was evidenced most strongly by the high correlations among the various monitors for both treadmill and lifestyle activities, as well as by the similar underprediction of EE during the field-based assessments. Although the errors in some of the estimations are considerable, these monitors still provide the most objective and detailed record of physical activity for behavioral and epidemiologic research. Further work is needed to clarify the sources of error and causes of intraindividual variability with accelerometry-based monitors, and to better understand how these monitors can be used most effectively to assess free-living physical activity.

This study was completed while the lead author was employed at the Cooper Institute. The authors would like to acknowledge a number of student interns who assisted with the data collection in this project, particularly Natalie Cannon and Marius Maianu. We also thank the staff members of the Cooper Institute and the Cooper Fitness Center for their willingness to serve as participants for the project. Special thanks go to Carolyn E. Barlow for helping set up the database, Matt Mahar for technical suggestions, Melba Morrow for her editorial comments, Stephanie Parker for proofing the manuscript, and the two equipment manufacturers (IM Systems Inc. and Computer Science and Applications, Inc.) for providing instruments to use in the study. This work was supported by the International Life Sciences Institute Center for Health Promotion (ILSI CHP). The use of trade names and commercial sources in this document is for purposes of identification only and does not imply endorsement by ILSI CHP. In addition, the views expressed herein are those of the individual authors and/or their organizations and do not necessarily reflect those of ILSI CHP.

REFERENCES

1. Ainsworth, B. E., W. L. Haskell, A. S. Leon, et al. Compendium of Physical Activities: classification of energy costs of human physical activities. Med. Sci. Sports Exerc. 25: 71–80, 1993.
2. Ainsworth, B. E., H. J. Montoye, and A. S. Leon. Methods of assessing physical activity during leisure and work. In:Physical Activity, Fitness and Health: International Proceedings and Consensus Statement. C. Bouchard, T. Stephens, and R. J. Shephard (Eds.). Champaign, IL: Human Kinetics, 1994, pp. 146–159.
3. Balogun, J. A., D. A. Martin, and M. A. Clendenin. Calorimetric validation of the Caltrac accelerometer during level walking. Phys. Ther. 69: 501–509, 1989.
4. Bland, J. M., and D. G. Altman. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1: 307–310, 1986.
5. Bouten, C. V. C., A. A. H. J. Sauren, M. Verduin, and J. D. Janssen. Effects of placement and orientation of body-fixed accelerometers on the assessment of energy expenditure during walking. Med. Biol. Eng. Comput. 35: 50–56, 1997.
6. Bouten, C. V. C., W. P. H. G. Verboeket-Van De Veene, K. R. Westerterp, M. Verduin, and J. D. Janssen. Daily physical activity assessment: comparison between movement registration and doubly labeled water. J. Appl. Physiol. 81: 1019–1026, 1996.
7. Bray, M. S., J. R. Morrow, N. F. Butte, and J. M. Pivarnik. Caltrac versus calorimeter determination of 24-h energy expenditure in female children and adolescents. Med. Sci. Sports Exerc. 26: 1524–1530, 1994.
8. Eston, R. G., A. V. Rowlands, and D. K. Ingledew. Validity of heart rate, pedometry, and accelerometry for predicting the energy cost of children’s activities. J. Appl. Physiol. 84: 362–371, 1998.
9. Fehling, P. C., D. L. Smith, S. E. Warner, and G. P. Dalsky. Comparison of accelerometers with oxygen consumption in older adults during exercise. Med. Sci. Sports Exerc. 31: 171–175, 1999.
10. Fogelholm, M., H. Hilloskorpi, P. O. Laukkanen, W. V. M. Lichtenbelt, and K. Westerterp. Assessment of energy expenditure in overweight women. Med. Sci. Sports Exerc. 30: 1191–1197, 1998.
11. Freedson, P. S., E. Melanson, and J. Sirard. Calibration of the Computer Science and Applications, Inc. accelerometer. Med. Sci. Sports Exerc. 30: 777–781, 1998.
12. Gretebeck, R., H. J. Montoye, and W. Porter. Comparison of the doubly labelled water method for measuring energy expenditure with Caltrac accelerometer readings (Abstract). Med. Sci. Sports Exerc. 29: S60, 1997.
13. Janz, K. F. Validation of the CSA accelerometer for assessing children’s physical activity. Med. Sci. Sports Exerc. 26: 369–375, 1994.
14. Kochersberger, G., E. McConnell, M. N. Kuchibhatla, and C. Pieper. The reliability, validity, and stability of a measure of physical activity in the elderly. Arch. Phys. Med. Rehabil. 77: 793–795, 1996.
15. Maliszewski, A. F., P. S. Freedson, C. J. Ebbeling, J. Crussemeyer, and K. B. Kastango. Validity of the Caltrac accelerometer. Pediatr. Exerc. Sci. 3: 141–151, 1991.
16. Matthews, C. E., and P. S. Freedson. Field trial of a three-dimensional activity monitor: comparison with self report. Med. Sci. Sports Exerc. 27: 1071–1078, 1995.
17. Melanson, E. L. Jr., and P. S. Freedson. Validity of the Computer Science and Applications, Inc. (CSA) activity monitor. Med. Sci. Sports Exerc. 27: 934–940, 1995.
18. Montoye, H. J., R. A. Washburn, S. Servais, A. Ertl, J. G. Webster, and F. J. Nagle. Estimation of energy expenditure by a portable accelerometer. Med. Sci. Sports Exerc. 15: 403–407, 1983.
19. Nichols, J. F., C. G. Morgan, J. A. Sarkin, J. F. Sallis, and K. J. Calfas. Validity, reliability, and calibration of the Tritrac accelerometer as a measure of physical activity. Med. Sci. Sports Exerc. 31: 908–912, 1999.
20. Pambianco, G., R. R. Wing, and R. Robertson. Accuracy and reliability of the Caltrac accelerometer for estimating energy expenditure. Med. Sci. Sports Exerc. 22: 858–862, 1990.
21. Pate, R. R. Physical activity assessment in children and adolescents. Crit. Rev. Food Sci. Nutr. 33: 321–326, 1993.
22. Pate, R. R., M. Pratt, S. N. Blair, et al. Physical activity and public health: a recommendation from the Centers for Disease Control and Prevention and the American College of Sports Medicine. JAMA 273: 402–407, 1995.
23. Sallis, J. F., M. J. Buono, J. J. Roby, D. Carlson, and J. A. Nelson. The Caltrac accelerometer as a physical activity monitor for school-age children. Med. Sci. Sports Exerc. 22: 698–703, 1990.
24. Trost, S. G., D. S. Ward, S. M. Moorehead, P. D. Watson, W. Riner, and J. R. Burke. Validity of the Computer Science and Application (CSA) activity monitor in children. Med. Sci. Sports Exerc. 30: 629–633, 1998.
25. U.S. Department of Health and Human Services. Physical Activity and Health: A Report of the Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, 1996.
26. Welk, G. J., and C. B. Corbin. The validity of the Tritrac-R3D activity monitor for the assessment of physical activity in children. Res. Q. Exerc. Sport 66: 202–209, 1995.
27. Welk, G. J., and C. B. Corbin. The validity of the Tritrac-R3D activity monitor for the assessment of physical activity, II: Temporal relationships among objective assessments. Res. Q. Exerc. Sport 69: 395–399, 1998.
Keywords:

PHYSICAL ACTIVITY ASSESSMENT; MOTION SENSORS; CSA; TRITRAC; BIOTRAINER

© 2000 Lippincott Williams & Wilkins, Inc.