The use of physical activity questionnaires (PAQ) to quickly assess physical activity (PA) levels in a target population is important for research and public health surveillance. Current methods of validating such questionnaires typically involve comparing accelerometer data with questionnaire responses (1-3,8,13). Correlation (r) or agreement (κ) coefficients between PAQ and accelerometer data have ranged from 0.0 to 0.4 (1-3,8,13). Reasons typically cited for these weak to moderate associations include study participants' poor understanding of PA concepts, recall and social desirability biases, digit preference (e.g., responses rounded to the nearest 5 or 10 min), and the inability of accelerometers to measure some types of movement (1-3,8,13). To date, no study has explored the usefulness of accelerometers in quantifying levels of the specific physical activities asked about in PAQ.
Accelerometers are small devices that provide objective measurements of PA via minute-by-minute recording of body acceleration (20). Although various accelerometer models measure single- or multiple-axis movement, the basic technology is the same for all of them: piezoelectric or piezoresistive materials produce electric signals when the device is subjected to acceleration. Acceleration data are reported as counts per minute and are typically used to estimate energy expenditure, expressed as metabolic equivalents (METs; 1 MET corresponds to resting metabolic rate, conventionally 3.5 mL O2·kg−1·min−1) (4), or minutes per day spent in light-, moderate-, and vigorous-intensity activities (15). The devices have been shown to provide an optimum combination of ease of use and measurement accuracy when a single unit is worn on the waist (17). In general, accelerometers underestimate energy expenditure for lifestyle activities such as raking, shoveling, and sweeping, as well as for static activities (4,10,20). When accelerometers are used to evaluate the validity of PAQ results, several cut-point methods are available for classifying minutes of PA by intensity (15). Some cut points were derived from studies comparing subjects' measured energy expenditure with their counts per minute during treadmill activity (9), whereas others were derived similarly while participants engaged in sports, exercise, household chores, or walking (10,17). Comparisons of PA estimation methods using a variety of cut points have shown that they produce varying results and that none of the methods seems to be ideal (2,15).
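To make concrete what a cut-point method does, the hypothetical Python function below labels a single 60-s epoch by intensity. The moderate ranges are the ones quoted for each method later in this report (Freedson 1952-5724, Swartz 574-4944, Hendelman 192-7525 counts per minute). Treating every count above the moderate range as vigorous and everything below it as light is a simplifying assumption of this sketch, not a reproduction of Table 1.

```python
# Hypothetical cut-point table: moderate-intensity ranges (counts per minute)
# as quoted in this report. Counts above the range are ASSUMED vigorous and
# counts below it light (sedentary minutes are not separated out here).
CUT_POINTS = {
    "Freedson": (1952, 5724),
    "Swartz": (574, 4944),
    "Hendelman": (192, 7525),
}


def classify_minute(counts: int, method: str) -> str:
    """Label one 60-s ActiGraph epoch as 'light', 'moderate', or 'vigorous'."""
    lower, upper = CUT_POINTS[method]
    if counts < lower:
        return "light"
    if counts <= upper:
        return "moderate"
    return "vigorous"


if __name__ == "__main__":
    # The same epoch can fall into different categories depending on the method.
    for cpm in (500, 2500, 6000):
        print(cpm, {name: classify_minute(cpm, name) for name in CUT_POINTS})
```

With these ranges, an epoch of 500 counts per minute is light under the Freedson and Swartz cut points but moderate under Hendelman's, and an epoch of 6000 counts per minute is vigorous under Freedson and Swartz but moderate under Hendelman. This is the kind of between-method divergence noted above.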
When PAQ are being validated, the scoring methodology applied to the validation criteria should match the way in which activity is assessed by the questionnaire (11). The U.S. Centers for Disease Control and Prevention/American College of Sports Medicine guidelines define moderate- and vigorous-intensity physical activities in terms of absolute intensity (e.g., METs), and accelerometers can be used to estimate time spent in PA at intensities defined by MET levels (4,10,17,20). However, these units of measure are not practical for the evaluation of PAQ. Survey questions typically ask respondents to report the frequency, intensity, and duration of their leisure-time or lifestyle activities (5-7). For example, the 2003 Behavioral Risk Factor Surveillance System survey asked respondents to report the total frequency and daily duration of 10-min bouts of moderate and vigorous PA (5), which were defined in terms of the estimated increase in breathing or heart rate and by examples of leisure-time, transportation, and household activities. Two characteristics of validation study designs have not been investigated as potential causes of weak associations in PAQ validation studies. First, researchers compare survey responses based on relative intensity, which may be influenced by respondents' fitness levels (e.g., perceived changes in breathing and heart rate), with measures of absolute movement (e.g., accelerometer data). Second, PAQ quantify the time spent in an activity, which could include warm-up, rest periods, and cool-down, whereas accelerometers measure only the time that people are physically active. It is not yet known how the weak associations between PAQ responses and accelerometer data relate to study design and to PAQ and accelerometer characteristics.
In this study, we simulated a validation study of a generic PAQ that defined physical activity intensity by changes in heart rate, in order to identify limitations in accelerometer measurement and data analysis methods. In place of the subjective estimates of heart rate change that a PAQ would elicit, we used actual heart rate monitor (HRM) data as an indicator of activity levels (16). We used ActiGraph accelerometer data to identify PA bouts, which we compared with HRM data. We used intensity cut points previously determined for the ActiGraph accelerometer (9,10,17) to relate the intensity and duration of our participants' movement to their percent heart rate reserve (%HRR). We hypothesized that the minutes of moderate- and vigorous-intensity activity during free-living movement estimated from HRM data would differ from those estimated from accelerometer-derived PA bouts scored with existing accelerometer cut-point equations.
Participants and study design.
A convenience sample of 25 individuals aged 22-56 yr was recruited through word of mouth and from posted announcements at a university. Participants were required to be willing to comply with the study protocol and to have no physical limitations that would restrict daily physical activity. The informed consent form was approved by the institutional review board at the university. Participants were enrolled in the study for 1 wk, during which time they made two visits to an exercise physiology laboratory at the university. During the first visit, participants read and signed the informed consent form, completed a health history, had their height measured with a stadiometer (Seca Corp., Columbia, MD), were weighed on a physician's scale (Health-O-Meter, Bridgeview, IL), and were instructed to wear an ActiGraph motion detector and an HRM for seven consecutive days during their waking hours, except when in contact with water. During the second laboratory visit, participants returned the motion detectors and HRM. Body mass index (BMI; kg·m−2) was computed from the height and weight data. Of the original 25 individuals, 12 (seven men and five women) were included in the present analysis. Thirteen individuals were excluded because they had few or no monitored 8- to 10-min bouts of moderate- or vigorous-intensity activity as assessed by either the ActiGraph or the HRM.
We used the ActiGraph accelerometer (model 7164, ActiGraph, LLC, Fort Walton Beach, FL) to monitor participants' PA during the 7-d monitoring period. The ActiGraph was placed in a pouch that participants wore on the waistband or belt over the right anterior axillary line. ActiGraphs were initialized the day before the 7-d monitoring period began, programmed to record data in 60-s epochs (time segments), and synchronized with the HRM so that both devices recorded concurrently. After the monitoring period, accelerometer data were downloaded and imported into a digital file. Accelerometer calibration was checked at the beginning and end of the study and consistently met the manufacturer's specifications (± 5% of the reference value).
We used the Polar Vantage NV heart watch (Polar Electro, Inc., Lake Success, NY) to measure participants' heart rate (HR) during the monitoring period. Participants wore the HRM chest strap at the level of the fifth intercostal space from the time they awoke in the morning until they went to bed at night. The HRM was programmed to record HR in 60-s intervals. After the 7-d monitoring period, HR data were downloaded and imported into a digital file for analysis.
Physical activity scoring.
Activity data from the ActiGraphs were imported into Microsoft Excel for bout scoring, and a file was created for each import. HR data were first downloaded into a digital file and then merged with the ActiGraph data in an Excel file for each participant, which was imported into SAS version 9.1. For ActiGraph data to be eligible for analysis, the ActiGraph had to be worn for a minimum of 600 min·d−1 (10 h·d−1) on at least 4 d·wk−1, including one weekend day. Periods of 60 or more consecutive minutes of zero counts were removed from analysis. We excluded minutes in which accelerometer counts exceeded 25,000 counts per minute. Six minutes in the dataset had 20,000-25,000 counts per minute, and all of these fell within PA bouts that every scoring method classified as vigorous intensity. We computed %HRR with the Karvonen method (12), %HRR = (HR − resting HR)/(maximum HR − resting HR) × 100, estimating each person's maximum heart rate as 220 minus age. Heart rate data are sometimes subject to interference between the chest strap and the receiver, typically discernible as HR values of 0 bpm or greater than 220 bpm. Abnormal values meeting these criteria were replaced by the average of the preceding and subsequent values, but if abnormal readings persisted for 5 min or more, the data were excluded from analysis.
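The screening rules above can be expressed as a short sketch. The Python functions below are hypothetical (the authors worked in Excel and SAS) and make several assumptions not stated in the text: daily wear minutes are supplied as input rather than derived here, an abnormal HR run that touches the start or end of the record is treated as excluded because it lacks a neighbor on one side, and resting HR is passed in because the text does not say how it was measured.

```python
from typing import List, Optional, Sequence

# Thresholds taken from the screening rules in the text.
ZERO_RUN_MIN = 60      # >= 60 consecutive zero-count minutes are removed
MAX_COUNTS = 25_000    # minutes above this many counts are excluded
MIN_WEAR_MIN = 600     # a valid day needs >= 600 min (10 h) of wear
MIN_VALID_DAYS = 4     # at least 4 valid days, one of them a weekend day
ABNORMAL_RUN_MIN = 5   # abnormal HR runs of 5 min or more are excluded


def screen_counts(counts: Sequence[int]) -> List[Optional[int]]:
    """Return per-minute counts with removed minutes set to None."""
    out: List[Optional[int]] = [c if c <= MAX_COUNTS else None for c in counts]
    i = 0
    while i < len(counts):
        if counts[i] != 0:
            i += 1
            continue
        j = i
        while j < len(counts) and counts[j] == 0:
            j += 1
        if j - i >= ZERO_RUN_MIN:  # long run of zeros: drop it
            out[i:j] = [None] * (j - i)
        i = j
    return out


def week_is_eligible(wear_min_by_day: Sequence[int], is_weekend: Sequence[bool]) -> bool:
    """At least MIN_VALID_DAYS days of >= 600 min wear, including a weekend day."""
    valid = [w >= MIN_WEAR_MIN for w in wear_min_by_day]
    has_weekend = any(v and we for v, we in zip(valid, is_weekend))
    return sum(valid) >= MIN_VALID_DAYS and has_weekend


def _is_abnormal(bpm: float) -> bool:
    return bpm <= 0 or bpm > 220


def clean_hr(hr: Sequence[float]) -> List[Optional[float]]:
    """Replace short runs of abnormal HR with the mean of the neighboring valid
    readings; runs of >= 5 min (or runs touching either end) become None."""
    out: List[Optional[float]] = list(hr)
    i = 0
    while i < len(hr):
        if not _is_abnormal(hr[i]):
            i += 1
            continue
        j = i
        while j < len(hr) and _is_abnormal(hr[j]):
            j += 1
        if j - i >= ABNORMAL_RUN_MIN or i == 0 or j == len(hr):
            fill = None
        else:
            fill = (hr[i - 1] + hr[j]) / 2
        out[i:j] = [fill] * (j - i)
        i = j
    return out


def percent_hrr(hr: float, resting_hr: float, age: float) -> float:
    """Karvonen %HRR with maximum HR estimated as 220 - age."""
    max_hr = 220 - age
    return 100.0 * (hr - resting_hr) / (max_hr - resting_hr)
```

For example, a 40-yr-old with a resting HR of 60 bpm and a measured HR of 120 bpm is at 50 %HRR, which falls in the moderate range used in this report.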
In this step of our simulated validation study, we summarized the data with several scoring methods to compare the PA captured by each. Three sets of moderate- and vigorous-intensity bouts of activity were defined as periods in which at least 8 of 10 consecutive minutes were of a single intensity (moderate or vigorous); up to 2 min of a different intensity (light, moderate, or vigorous) were allowed within every 10-min period. Bouts were separated by at least three consecutive minutes of a different intensity. Each set of bouts was derived using one set of ActiGraph cut points commonly considered for validating PAQ (Freedson (9), Swartz (17), or Hendelman (10); Table 1). We then classified each bout by a second criterion, the participant's mean %HRR during the bout, as very light, light, moderate, or vigorous (Table 1) (18). Thus, we analyzed three sets of PA bouts (Freedson, Swartz, and Hendelman), in which each bout had two intensity classifications (one from the counts per minute cut points and one from mean %HRR). We computed the duration of each moderate- and vigorous-intensity bout.
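The bout rule leaves some room for interpretation, so the following is one plausible reading written as a hypothetical Python sketch, not the authors' Excel procedure. A minute is flagged if it lies in any 10-min window containing at least 8 target-intensity minutes. Flagged target minutes are then chained into one bout unless 3 or more minutes separate them. Each bout is classified by the mean %HRR over its whole span, and the sketch assumes that interspersed minutes are included in that mean.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Bout = Tuple[int, int]  # (first minute, last minute), inclusive


def find_bouts(labels: Sequence[str], target: str, window: int = 10,
               min_hits: int = 8, min_gap: int = 3) -> List[Bout]:
    """Flag minutes covered by any window with >= min_hits target minutes,
    then join flagged target minutes separated by < min_gap minutes."""
    n = len(labels)
    # 1. Flag every minute covered by a qualifying 10-min window.
    flagged = [False] * n
    for i in range(n - window + 1):
        if sum(lab == target for lab in labels[i:i + window]) >= min_hits:
            for j in range(i, i + window):
                flagged[j] = True
    # 2. Chain flagged target minutes; a gap of >= min_gap minutes starts a new bout.
    bouts: List[Bout] = []
    for k in range(n):
        if not (flagged[k] and labels[k] == target):
            continue
        if bouts and k - bouts[-1][1] - 1 < min_gap:
            bouts[-1] = (bouts[-1][0], k)
        else:
            bouts.append((k, k))
    return bouts


def hrr_category(pct_hrr: float) -> str:
    """The four HRM intensity levels used in this report."""
    if pct_hrr < 25:
        return "very light"
    if pct_hrr < 45:
        return "light"
    if pct_hrr < 60:
        return "moderate"
    return "vigorous"


def classify_bouts(bouts: Sequence[Bout], hrr: Sequence[Optional[float]]):
    """Attach duration (min) and mean-%HRR category to each bout."""
    result = []
    for start, end in bouts:
        values = [v for v in hrr[start:end + 1] if v is not None]
        category = hrr_category(mean(values)) if values else None
        result.append((start, end, end - start + 1, category))
    return result
```

Under this reading, two 10-min stretches of moderate activity separated by 2 light minutes form one bout, whereas a 3-min gap splits them into two. Because any qualifying window contains at most 2 off-target minutes, every bout found this way contains at least 8 target minutes.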
We scored PA bouts from the HRM data using two sets of %HRR cut points (USDHHS (18); Table 1) and the same bout criteria as for the ActiGraph data, which required at least 8 of 10 consecutive minutes to be of the intensity of interest. The two classifications were 25-59%HRR (moderate) and ≥ 60%HRR (vigorous); and 45-59%HRR (moderate) and ≥ 60%HRR (vigorous). We computed the duration of each bout.
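The two %HRR schemes differ only in the lower bound for moderate intensity. The hypothetical helper below labels single minutes under each scheme, and those labels would then pass through the same 8-of-10-minute bout rule as the ActiGraph labels. Naming the schemes "25/60" and "45/60" and labeling everything below the moderate bound "light" are assumptions of this sketch.

```python
# The two %HRR scoring schemes described above:
# (moderate lower bound, vigorous lower bound), in %HRR.
HRR_SCHEMES = {
    "25/60": (25.0, 60.0),
    "45/60": (45.0, 60.0),
}


def label_hrr_minute(pct_hrr: float, scheme: str) -> str:
    """Label one minute as 'light', 'moderate', or 'vigorous' by %HRR."""
    moderate_lo, vigorous_lo = HRR_SCHEMES[scheme]
    if pct_hrr >= vigorous_lo:
        return "vigorous"
    if pct_hrr >= moderate_lo:
        return "moderate"
    return "light"  # anything below the moderate bound in this sketch


if __name__ == "__main__":
    minutes = [20.0, 30.0, 40.0, 50.0, 65.0]
    for scheme in HRR_SCHEMES:
        print(scheme, [label_hrr_minute(p, scheme) for p in minutes])
```

Minutes at 30-40 %HRR count toward moderate-intensity bouts under the 25/60 scheme but not under 45/60. This difference is why the choice of lower bound can change moderate-intensity frequency and duration so markedly.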
We computed means, standard deviations (SD), and ranges for participants' age, BMI, total ActiGraph counts per day, and monitored time. We also plotted the distributions of minutes by accelerometer counts per minute as smoothed histograms for each of four HRM intensity levels: < 25%HRR (very light), 25.0-44.9%HRR (light), 45.0-59.9%HRR (moderate), and ≥ 60%HRR (vigorous). As described above, we estimated the numbers of moderate and vigorous PA bouts using three sets of ActiGraph cut points (those of Freedson (9), Swartz (17), and Hendelman (10)). To examine the HRM intensity of activity that ActiGraph data analysts might classify as moderate or vigorous, we cross-classified moderate- and vigorous-intensity PA bouts by the same four HRM intensity levels and computed chi-square tests for each cut-point method. To examine weekly activity totals that might be compared in a PAQ validation study, we report the means and SD of activity frequency (d·wk−1) and of total daily duration in bouts (min·d−1) on the days that participants engaged in PA, for the three ActiGraph and two %HRR (25 and 60%HRR; 45 and 60%HRR) scoring methods. We used SAS 9.1 (SAS Institute Inc., Cary, NC) for all analyses.
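For one cut-point method, the cross-classification can be sketched in plain Python as a 2 × 4 table: accelerometer-moderate and accelerometer-vigorous bouts crossed with the four %HRR levels. The Pearson chi-square statistic is then computed on that table. The function names and the (accelerometer level, %HRR level) input format are hypothetical, and this is not the authors' SAS code.

```python
from typing import Iterable, List, Sequence, Tuple

HRR_LEVELS = ["very light", "light", "moderate", "vigorous"]
ACC_LEVELS = ["moderate", "vigorous"]


def cross_tab(bouts: Iterable[Tuple[str, str]]) -> List[List[int]]:
    """Count bouts by (accelerometer intensity, mean-%HRR level)."""
    table = [[0] * len(HRR_LEVELS) for _ in ACC_LEVELS]
    for acc_level, hrr_level in bouts:
        table[ACC_LEVELS.index(acc_level)][HRR_LEVELS.index(hrr_level)] += 1
    return table


def chi_square(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table.
    All row and column totals must be positive (drop empty columns first)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof
```

The P value is the upper tail of the chi-square distribution, e.g., `scipy.stats.chi2.sf(stat, dof)` if SciPy is available. `scipy.stats.chi2_contingency` performs the whole test on the table directly, and by default it applies Yates' continuity correction only when there is 1 degree of freedom.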
Study participants had a mean age of 31.0 ± 14.3 yr and a mean BMI of 25.2 ± 5.3 kg·m−2 (Table 2). Their mean accelerometer total was 339,390 ± 106,827 counts per day, with a threefold difference between the least and most active participants. During minutes of moderate-intensity %HRR (45-59%), accelerometer counts ranged from 0 to more than 16,000 counts per minute, and for approximately 40% of these minutes, counts were less than 1000 counts per minute (Fig. 1). Accelerometer counts from 2000 to 6000 counts per minute had corresponding %HRR in the light, moderate, and vigorous ranges. Most minutes above 7000 counts per minute had a corresponding vigorous-intensity %HRR.
Our estimates of the number of 10-min bouts of moderate-intensity PA ranged from 46 with the Freedson cut points (1952-5724 counts per minute) to 658 with the Hendelman cut points (192-7525 counts per minute) (Table 3). During most of the moderate-intensity PA bouts identified with the Freedson (78.3%), Swartz (88.0%), and Hendelman (94.7%) cut points, participants' mean %HRR was less than 45%, indicating PA of very light or light intensity. Most of the moderate-intensity bouts fell in the range of 25-44%HRR. Fewer vigorous-intensity bouts were identified, and most of them had a mean %HRR in the vigorous category (≥ 60%HRR). Chi-square tests for each ActiGraph method were significant (P < 0.0001), indicating that the intensity classifications differed between the accelerometer and HRM scoring methods.
We found significant differences in the frequency and daily amounts of activity in PA bouts as indicated by the HRM and ActiGraph (Table 4). The mean frequency of moderate-intensity PA ranged from 1.1 d·wk−1 with %HRR cut points of 45 and 60% to 7.0 d·wk−1 with Hendelman's cut points. Total daily duration of moderate-intensity PA on active days ranged from a mean of 17.9 min·d−1 with cut points of 45 and 60%HRR to 139.2 min·d−1 with Hendelman's cut points. Vigorous-intensity PA frequency ranged from 0.7 d·wk−1 with Hendelman's cut points to 1.5 d·wk−1 with Swartz's and the HRM cut points. Frequency and mean duration of moderate-intensity PA were higher with the 25%HRR cut point (5.8 d·wk−1 and 111.2 min·d−1, respectively) than with the 45%HRR cut point (1.1 d·wk−1 and 17.9 min·d−1).
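The two weekly summaries compared here, frequency of active days and mean bout minutes per active day, can be sketched as below. This is a hypothetical illustration with several assumptions. The input is each participant's total bout minutes for each monitored day. Frequency is counted over the monitored days without rescaling to a 7-d week. Participants with no active days are left out of the duration summary.

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def participant_summary(bout_min_by_day: Sequence[float]) -> Tuple[int, Optional[float]]:
    """(active days, mean bout minutes per active day) for one participant.
    A day is 'active' if it contains any bout minutes; None if no active days."""
    active = [m for m in bout_min_by_day if m > 0]
    return len(active), (mean(active) if active else None)


def group_summary(weeks: Sequence[Sequence[float]]):
    """Mean and SD across participants of frequency and active-day duration.
    Participants without active days are left out of the duration summary;
    stdev needs at least two values in each list."""
    per_person = [participant_summary(week) for week in weeks]
    freqs = [f for f, _ in per_person]
    durations = [d for _, d in per_person if d is not None]
    return (mean(freqs), stdev(freqs)), (mean(durations), stdev(durations))
```

Running the same summaries on bout minutes produced by each scoring method is what makes the method-to-method spread in frequency and duration visible.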
The results of this simulation study suggest that using the heart rate response to determine cut points for moderate- and vigorous-intensity activity yields PA estimates that differ significantly from those produced with existing accelerometer cut-point equations. Thus, the scoring methodology applied to accelerometer validation criteria would not match the way in which activity is assessed by PAQ that ask respondents to define PA in terms of changes in heart rate (e.g., Behavioral Risk Factor Surveillance System) (5). The estimated intensity, frequency, and duration of PA varied with the scoring method, and bouts of moderate-intensity PA identified from accelerometer data frequently included periods in which participants' heart rates did not increase to levels indicative of moderate-intensity PA. For most PA bouts classified as moderate intensity by each of the ActiGraph cut points, participants had a mean %HRR indicative of only light- or very-light-intensity PA. The mean frequency with which our participants engaged in moderate-intensity PA varied substantially by analytic method, from 1.1 d·wk−1 (45-59%HRR) to 7.0 d·wk−1 (192-7526 counts per minute), as did the mean duration, from 17.9 min·d−1 (45-59%HRR) to 139.2 min·d−1 (192-7526 counts per minute). Vigorous-intensity PA bouts had fewer differences in intensity classification and less variability in frequency and mean total duration than moderate-intensity bouts. We also found that ActiGraph counts of 2000 to 6000 counts per minute had corresponding %HRR of light, moderate, and vigorous intensities.
In two previous studies, researchers found significant differences in estimates of the mean and total daily minutes of moderate- and vigorous-intensity PA produced with different ActiGraph cut-point methods (2,15). We found similar results and also showed differences in the number of days per week of moderate- and vigorous-intensity PA estimated by each analytic method. Vigorous-intensity PA had smaller differences in frequency and duration than moderate-intensity PA.
Studies in which accelerometers were used to measure moderate-intensity PA showed that household activities such as sweeping and mopping floors had the same measured MET values as slow walking but were associated with lower mean accelerometer counts (500-700 counts per minute) than walking (2000 counts per minute) (10,20). Our study provides additional evidence that survey respondents could experience physiologic changes indicative of moderate intensity (such as increased heart rate and breathing) while engaging in activities that produce mean accelerometer counts as low as 500 counts per minute. Conversely, our results show that bouts of PA defined as 192-7526 counts per minute (Hendelman cut points), 574-4944 counts per minute (Swartz cut points), or 1952-5724 counts per minute (Freedson cut points) may not be intense enough to elicit a change in heart rate indicative of moderate-intensity PA. For example, in 78% of the moderate-intensity PA bouts identified with the Freedson cut points, participants' mean %HRR was below the moderate-intensity range (Table 3). Consequently, we suggest that moderate-intensity PA as reported in PAQ responses may not be accurately detected in accelerometer data because of limitations of the accelerometers, in addition to the known limitations of PAQ. Much of the accelerometer scoring error may reflect overestimation of minutes of moderate-intensity PA. Because many activities can be performed at either light or moderate intensity, it is important that PA assessment tools differentiate between light- and moderate-intensity activity. Our results suggest that more research is needed on distinguishing PA intensities with objective measurement devices and scoring algorithms.
Our finding that much of the time thought to be spent in moderate-intensity PA may, in fact, be light- or very-light-intensity PA (Table 3) raises an additional research question: can our data suggest a more appropriate cut point for distinguishing light- from moderate-intensity minutes than the cut points we used? Figure 1 helps address this question. If an appropriate cut point existed, we might expect to see a natural break in the data. Such a break appears between 6000 and 7000 counts per minute in Figure 1, where the percentage of minutes of moderate intensity (by %HRR) declined and the percentage of minutes of vigorous intensity increased. Accelerometer counts exceeding 7000 counts per minute had corresponding HRM intensities that were primarily vigorous. Consequently, our data suggest that a cut point in the range of 6000-7000 counts per minute would be appropriate for discriminating between moderate- and vigorous-intensity PA in our data. In the range of 0-3000 counts per minute, we see no similar pattern; most minutes in this range were of light or very light intensity. Therefore, we conclude that no ActiGraph cut point could distinguish between light- and moderate-intensity minutes in our study. Future research is needed to determine whether these observations hold in other samples and populations.
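A simple way to look for the kind of break described here is to tabulate, for each band of counts per minute, the share of minutes falling in each %HRR level. The hypothetical sketch below does this with fixed 1000-count bins (an assumed bin width). It is a binned tabulation, not the smoothed histograms of Figure 1. A break would show up as the bin at which the vigorous share overtakes the moderate share.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence


def hrr_category(pct_hrr: float) -> str:
    """The four HRM intensity levels used in this report."""
    if pct_hrr < 25:
        return "very light"
    if pct_hrr < 45:
        return "light"
    if pct_hrr < 60:
        return "moderate"
    return "vigorous"


def category_share_by_bin(counts: Sequence[Optional[int]],
                          hrr: Sequence[Optional[float]],
                          bin_width: int = 1000) -> Dict[int, Dict[str, float]]:
    """Map each bin's lower edge (counts per minute) to the share of its
    minutes in each %HRR level; minutes missing either value are skipped."""
    tallies: Dict[int, Counter] = defaultdict(Counter)
    for c, p in zip(counts, hrr):
        if c is None or p is None:
            continue
        tallies[(c // bin_width) * bin_width][hrr_category(p)] += 1
    shares: Dict[int, Dict[str, float]] = {}
    for lower in sorted(tallies):
        total = sum(tallies[lower].values())
        shares[lower] = {cat: k / total for cat, k in tallies[lower].items()}
    return shares
```

Narrower bins resolve the location of a break more finely but leave fewer minutes per bin, which is one reason smoothing, as used for Figure 1, can be preferable with a small sample.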
These findings have implications for PAQ validation studies in which the limitations of ActiGraph results were not considered (1-3,8,13), and they suggest possible reasons for the weak associations between accelerometer results and PAQ results based on respondents' self-reports of increased breathing or heart rate. Figure 1 illustrates the wide variation in hip movement (accelerometer counts) at any given heart rate intensity and the range of %HRR for any given range of movement intensity. This variation offers an additional explanation for the relatively low correlations or agreement between the perceived intensity of moderate and vigorous PA reported by PAQ respondents and the amount of moderate- and vigorous-intensity PA estimated from ActiGraph results with standard cut points (1-3,8,13). Our results suggest that a participant's activity classification (e.g., active, insufficiently active, inactive) from objective measurement could differ substantially according to the analytic method used. Kappa and other statistics used to quantify association assume that perfect agreement (association = 1.0) between the measures is possible. Our study demonstrates that this assumption may be false and that the best possible agreement in those studies was likely less than 1.0. Thus, the associations found in those validation studies might not reflect the true validity of the PAQ for assessing PA levels.
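One standard way to make "the best possible agreement was likely less than 1.0" concrete for κ is the maximum kappa attainable given the observed marginal totals. When two methods classify different proportions of people into a category, not every case can fall on the diagonal, so κ cannot reach 1. The sketch below computes Cohen's κ and this maximum for a square agreement table. It is an illustration, not an analysis performed in this study, and it captures only marginal mismatch, which is narrower than the construct differences argued above.

```python
from typing import Sequence, Tuple


def kappa_and_max(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Cohen's kappa for a square agreement table (rows: measure A, columns:
    measure B, same category order) and the maximum kappa attainable with
    the same row and column totals."""
    k = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(row) / n for row in table]
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n
    p_exp = sum(r * c for r, c in zip(rows, cols))
    # Best case: as many cases on the diagonal as the marginals allow.
    p_max = sum(min(r, c) for r, c in zip(rows, cols))
    kappa = (p_obs - p_exp) / (1 - p_exp)
    kappa_max = (p_max - p_exp) / (1 - p_exp)
    return kappa, kappa_max
```

For a hypothetical 2 × 2 table [[20, 10], [5, 15]], κ is 0.4 but the maximum attainable κ is 0.8. On that scale, the observed agreement is halfway to the achievable ceiling rather than 40% of the way to perfect agreement.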
Our findings have implications for future validation studies of PAQ that ask respondents to recall PA bouts defined by the survey as causing small or large physiologic changes, such as increases in heart rate. Aadahl and Jorgensen (1) have suggested that accelerometers may not be suitable for PAQ validation. The weak association between accelerometer results and %HRR findings suggests the need to develop and validate objective methods of assessing PA participation that are more comparable with those used in PAQ. Objective measures of changes in breathing and heart rate, and of participation in specific activities, would be appropriate criteria for validating PAQ that are used for surveillance (5-7).
The limitations of accelerometers in validating PAQ do not necessarily pertain to other applications of accelerometers in measuring PA, such as comparisons of the PA levels of different populations or provision of adjunct information in combination with data from other instruments. For example, the errors described in this study would not be relevant for evaluating an intervention study in which the measure of interest was total minutes of activity above threshold levels of accelerometer counts per minute, because the errors would be the same for all participants. However, when comparing accelerometer data with measures of moderate- and vigorous-intensity PA, the errors described here would affect the study results. We did not evaluate the usefulness of accelerometers in classifying subjects by measures such as total estimated energy expenditure.
This study is subject to several limitations. First, the sample size was small, and the activity patterns of the sample may not be representative of those of the U.S. adult population; the study was not designed to be generalized to a population. However, because of the wide range in activity levels among study participants, we believe that we have illustrated some of the limitations associated with using accelerometers to measure adult activity levels. Second, we used only the ActiGraph accelerometer; results from other monitors may differ. However, results from the ActiGraph accelerometer have been shown to be consistent with those from two other commonly used monitors (20). Third, HRM results can be influenced by factors such as heat, humidity, the subjects' emotional state, and excessive sweating, which may interfere with transmission of the heart rate signal to the recording device (16). However, few moderate-intensity PA bouts had a corresponding %HRR in the vigorous range, suggesting that such factors rarely inflated heart rate into the vigorous range.
In conclusion, we conducted a study comparing PA estimates derived from motion detectors with estimates derived from HRM. We designed the study to simulate the validation of a PAQ that asked adults to report their participation in moderate- and vigorous-intensity PA on the basis of changes in their breathing and heart rate, and we developed a protocol for scoring intermittent PA bouts. Not surprisingly, our estimates of the amount of activity in which study participants engaged varied with the criteria we used to define levels of activity. Although most of the participants' activity was in the light range as judged by their HRM results, the intensity assigned to this activity from accelerometer results interpreted with published criteria was often quite different. Thus, the methods used to objectively measure and analyze activity are important and can have large effects on estimates of the time that people spend in moderate- to vigorous-intensity activity. These findings help to explain the inconsistent and low associations between PA estimates based on accelerometer results and those based on other methods of measuring physical activity, such as questionnaires.
The findings and conclusions in this report are those of the authors and do not necessarily represent the views of CDC.
1. Aadahl, M., and T. Jorgensen. Validation of a new self-report instrument for measuring physical activity. Med. Sci. Sports Exerc.
2. Ainsworth, B. E., D. R. Bassett Jr., S. J. Strath, et al. Comparison of three methods for measuring the time spent in physical activity. Med. Sci. Sports Exerc. 32(9 Suppl):S457-S464, 2000.
3. Anderson, C. B., M. Hagstromer, and A. Yngve. Validation of the PDPAR as an adolescent diary: effect of accelerometer cut points. Med. Sci. Sports Exerc.
4. Bassett, D. R., B. E. Ainsworth, A. M. Swartz, et al. Validity of four motion sensors in measuring moderate intensity physical activity. Med. Sci. Sports Exerc. 32(9 Suppl):S471-S480, 2000.
6. Centers for Disease Control and Prevention. National health and nutrition examination survey questionnaire 2001-2002: physical activity and physical fitness. http://www.cdc.gov/nchs/nhanes.htm
8. Craig, C. L., A. L. Marshall, M. Sjostrom, et al. International physical activity questionnaire: 12-country reliability and validity. Med. Sci. Sports Exerc.
9. Freedson, P. S., E. Melanson, and J. Sirard. Calibration of the Computer Science and Applications, Inc. accelerometer. Med. Sci. Sports Exerc.
10. Hendelman, D., K. Miller, C. Baggett, E. Debold, and P. Freedson. Validity of accelerometry for the assessment of moderate intensity physical activity in the field. Med. Sci. Sports Exerc. 32(9 Suppl):S442-S449, 2000.
11. Jacobs, D. R. Jr., B. E. Ainsworth, T. J. Hartman, and A. S. Leon. A simultaneous evaluation of ten commonly used physical activity questionnaires. Med. Sci. Sports Exerc.
12. Karvonen, M. J., E. Kentala, and O. Mustala. The effects of training on heart rate. Acta Medica. Exp. Fenn.
13. Matthews, C. E., B. E. Ainsworth, C. Hanby, et al. Development and testing of a short physical activity recall questionnaire. Med. Sci. Sports Exerc.
15. Strath, S. J., D. R. Bassett Jr., and A. M. Swartz. Comparison of accelerometer cut-points for predicting time spent in physical activity. Int. J. Sports Med.
16. Strath, S. J., A. M. Swartz, D. R. Bassett Jr., W. L. O'Brien, G. A. King, and B. E. Ainsworth. Evaluation of heart rate as a method for assessing moderate intensity physical activity. Med. Sci. Sports Exerc. 32(9 Suppl):S465-S470, 2000.
17. Swartz, A. M., S. J. Strath, D. R. Bassett Jr., W. L. O'Brien, G. A. King, and B. E. Ainsworth. Estimation of energy expenditure using CSA accelerometers at hip and wrist sites. Med. Sci. Sports Exerc. 32(9 Suppl):S450-S456, 2000.
18. U.S. Department of Health and Human Services. Physical Activity and Health: A Report of the Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, 1996.
20. Welk, G. J., S. N. Blair, K. Wood, S. Jones, and R. W. Thompson. A comparative evaluation of three accelerometry-based physical activity monitors. Med. Sci. Sports Exerc. 32(9 Suppl):S489-S497, 2000.