Basic Sciences: Epidemiology

Development and Testing of a Short Physical Activity Recall Questionnaire

MATTHEWS, CHARLES E.1; AINSWORTH, BARBARA E.2; HANBY, CARA1; PATE, RUSSELL R.3; ADDY, CHERYL4; FREEDSON, PATTY S.5; JONES, DEBORAH ARRIAZA6; MACERA, CAROLINE A.7

Medicine & Science in Sports & Exercise: June 2005 - Volume 37 - Issue 6 - p 986-994
doi: 10.1249/01.mss.0000171615.76521.69

Abstract

Surveillance efforts to assess the full range of physical activity (PA) behaviors encountered in daily living (e.g., household, occupational, and leisure-time activities) are designed to identify population subgroups at risk for adverse health outcomes, because their current activity levels are lower than the public health recommendations (5,16). Identification of at-risk populations enables communities and state and local governments to target PA promotion programs effectively and monitor long-term trends. The telephone-administered Behavioral Risk Factor Surveillance System (BRFSS) is the primary source for these surveillance data in most states (6). However, not all states have resources to administer the current seven-item BRFSS PA module (6). A valid short telephone-administered PA surveillance instrument may therefore be useful to classify individuals into one of the three PA categories evaluated in current surveillance efforts: 1) physically inactive (<1 d·wk−1 and <10 min·d−1 of moderate-vigorous PA); 2) meets current recommendations for regular moderate-vigorous PA; and 3) insufficiently active (not 1 or 2). We operationally define meeting current recommendations as participating in moderate activity (3–6 metabolic equivalents (METs)) on at least 5 d·wk−1 for 30 min·d−1, or vigorous activity (≥6.1 METs) on at least 3 d·wk−1 for 20 min·d−1. In this definition, time spent in moderate-vigorous activity is derived only from activity bouts lasting at least 10 min.
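The three-category definition above can be sketched as a small function. This is only an illustration: the function name, argument names, and category labels are hypothetical, and the reading of "inactive" (no intensity reaching 1 d·wk−1 with at least 10 min·d−1) is an assumption about how the definition would be applied to questionnaire responses.

```python
def classify_pa(mod_days, mod_min, vig_days, vig_min):
    """Classify usual-week activity into the three surveillance categories.

    mod_days / vig_days: days per week of moderate / vigorous activity.
    mod_min / vig_min: minutes per day on those days, from bouts of >= 10 min.
    """
    # Moderate: at least 5 d/wk for 30 min/d; vigorous: at least 3 d/wk for 20 min/d.
    meets_moderate = mod_days >= 5 and mod_min >= 30
    meets_vigorous = vig_days >= 3 and vig_min >= 20
    if meets_moderate or meets_vigorous:
        return "meets recommendations"

    # Assumption: "inactive" means neither intensity reaches 1 d/wk with >= 10 min/d.
    any_moderate = mod_days >= 1 and mod_min >= 10
    any_vigorous = vig_days >= 1 and vig_min >= 10
    if not (any_moderate or any_vigorous):
        return "inactive"

    # Everything else is some activity that falls short of the recommendations.
    return "insufficiently active"


print(classify_pa(5, 30, 0, 0))   # moderate recommendation met
print(classify_pa(2, 20, 0, 0))   # some activity, below the recommendation
print(classify_pa(0, 0, 0, 0))    # no qualifying activity
```

Checking the recommendation first keeps the "insufficiently active" category as the residual, which mirrors its definition in the text (not 1 or 2).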

In this research, we developed and cognitively tested two different three-item PA instruments in an ethnically diverse sample of adults. Both candidate instruments were examined for reliability, as well as classification accuracy against a series of 24-h PA recalls, and objectively measured activity levels during a 28-d study period.

METHODS

Study objectives and design.

This research had three components: 1) the development of two short telephone-administered activity instruments, 2) reliability testing of these instruments, and 3) validity testing of the instruments by comparing them with a series of 24-h PA recalls (24PAR) and an objective measure of daily activity (by Actigraph).

Participant recruitment.

Participants were recruited from the greater Columbia metro area and Sumter, SC through public service announcements in area newspapers; a liaison to the African American community from Sumter, SC; flyers posted at the University of South Carolina (USC); church-based health fairs; and from a group of past participants in research at USC. We designed and monitored our recruitment for the project to ensure capturing a wide age distribution, similar numbers of men and women, and a 40% sample of African Americans. In an effort to capture individuals who were more likely to be physically inactive at work and in their leisure time, we also tracked indicators of physical inactivity in the form of available BRFSS questions. These questions asked about usual nonoccupational participation in “any physical activities or exercise such as running, calisthenics, golf, gardening, or walking for exercise,” as well as information about the best description of participants’ activity at work (i.e., mostly sitting or standing, mostly walking, or mostly heavy labor or physically demanding work) (6). Roughly 25% of participants enrolled in the study reported no leisure-time activity, and about 15% reported no leisure-time activity and mostly sitting or standing at work. Based on these figures, we felt that a modest number of physically inactive individuals were entering the study. The USC institutional review board approved all participant recruitment and data collection procedures, and each participant read and signed an informed consent. Participants in the study received $25 and a detailed PA report after completing the study.

Short telephone-administered activity recall (STAR) questionnaire development.

To develop the questionnaires, we identified potential candidate items in commonly used PA assessment instruments by reviewing the collection of physical activity questionnaires (8), searching MEDLINE through the end of 2000, and consulting published reports in our personal files. From this review, our investigative group developed two three-item instruments that could be used to estimate the prevalence of both occupational and nonoccupational activity behaviors and that were likely to provide information consistent with current telephone-administered surveillance instruments (i.e., BRFSS). To be consistent with existing surveillance methods, the time frame selected for the questionnaires was a “usual week,” and the mode of intended administration was by telephone. The two instruments were revised in two rounds of cognitive testing using established protocols (29).

Cognitive interviews of the candidate instruments were conducted using a retrospective interview script in combination with spontaneous probes following National Center for Health Statistics cognitive interview guidelines (29). Participants in these interviews (N = 20) were on average 38 yr of age (range 20–48 yr), 55% women, 65% Caucasian, 20% African American, and 15% were from other racial/ethnic groups. About half (55%) reported meeting current physical activity guidelines, and most had at least a college education (70%).

The primary objective of the interviews was to refine the initial survey questions by evaluating their ability to capture: 1) the activity domains of interest, 2) the appropriate intensity of activity, and 3) a minimal threshold for activity bouts (e.g., 10 min). Following the administration of the surveys, participants were interviewed and asked to report the specific type of activity they were thinking about when they reported activity participation on the survey. These activities were coded for activity intensity using the compendium of physical activities of Ainsworth and colleagues (2). We also obtained qualitative information about the strategies respondents used in the formulation of their answers. Results suggested that the survey items provided adequate description to enable respondents to report activities done at home, at work, and during leisure time, and that roughly 75% of the activities reported were of the appropriate intensity. However, when activity intensity was not reported accurately, intensity tended to be overreported. In response to this information, we modified the description, type, and order of activities listed to describe both moderate and vigorous intensity in an effort to provide more effective intensity cues. Most respondents (>70%) felt confident that their reports of activity duration were derived from bouts of activity lasting at least 10 min rather than a simple report of accumulated activity. However, individuals that felt less confident about reporting only 10-min bouts reported difficulties recalling their activities, were unsure about the intensity of the activities they did (i.e., unsure about which activities to report), or had difficulty estimating a daily average because of the variability of their activities from one day to the next. The majority of respondents reported using estimation strategies to develop their responses to the survey questions, rather than recollection and enumeration of specific activity events. 
The instruments resulting from this process that were examined in reliability and validity testing used either closed-ended (CLOSED) or open-ended (OPEN) responses to capture activity frequency (Table 1).

TABLE 1: Short telephone activity recall (STAR) questionnaires

Reliability study.

Participants were randomized at study entry and were administered either the OPEN or the CLOSED version twice, approximately 3 d apart. This split-sample approach was taken to reduce the number of overall administrations of the instruments to each participant. Approximately 3 d before the participant’s first visit to the study site, the first instrument was administered by phone. The second administration was completed by phone during the first study visit.

Validity study.

To evaluate the validity of each short questionnaire, we compared the results of each with a series of 24PAR and an objective measure of activity (by Actigraph) over the 28-d reference period. The 24PAR were completed on 14 randomly selected days, and participants were asked to wear the Actigraph for the entire period. At the end of the study period (day 28), participants completed a phone-administered interview using one of the candidate questionnaires. Seven days later the other candidate questionnaire was administered by telephone. Administration order was determined at random. This data collection schedule was employed to allow the surveillance instruments to assess physical activity patterns in the recent past. The 7-d interval between administrations of the instruments was employed to minimize response contamination from one administration to the next. To evaluate the potential influence of the intensive 24PAR assessment protocol on activity behavior during the study period (reactivity), as well as its potential influence on reporting accuracy in the short instruments, we randomly split the study sample into an Actigraph-only group and an Actigraph + 24PAR group (at about a 1:4 ratio).

MEASUREMENTS

Demographics.

Demographic data (e.g., age, sex, marital status, education, household income, employment status), the number of children under 18 in the household, and work status were collected at the first visit (baseline) by using the demographics module of the 2001 BRFSS questionnaire that is available at http://www.cdc.gov/brfss/questionnaires/english.htm.

Actigraph.

Daily PA was objectively assessed by the Manufacturing Technology, Inc., Actigraph model 71256 (formerly the Computer Science and Applications, Inc. (CSA), accelerometer) (27). This version of the accelerometer collects both activity count data (counts·min−1) and “cycle counts” that approximate the number of steps taken in the sampling interval (steps·min−1). Participants were instructed to wear the monitor on the right hip during waking hours and to record both the amount of time they wore the monitor each day and time spent in activities not well captured by the monitor (e.g., swimming, cycling, lifting weights). Monitors were checked for calibration after each field administration and were maintained within the manufacturer’s specifications (±5%).

To translate activity count data to duration values, we employed the following activity cut points and intensity labels: inactivity (0–259 counts·min−1); light (260–759 counts·min−1); moderate (760–5724 counts·min−1); and vigorous (≥5725 counts·min−1). The development and testing of the moderate cut point are described elsewhere (10). We also calculated moderate and vigorous duration from bouts lasting at least 10 min to estimate relevant frequency data. In the process of extracting bouts, we allowed up to 2-min intervals below the moderate or vigorous thresholds in the calculation of time spent in the bouts. Duration estimates were corrected for reports of moderate and vigorous activity not captured by the monitor. To classify these activities by intensity we used the reported description of the activities and compendium-derived MET values. The frequency (d·wk−1) of participation in moderate activity for 30 min·d−1 and vigorous activity for 20 min·d−1 was estimated as a proportion of days sampled, and normalized to 7 d. The Actigraph step counts appear to be highly correlated (R > 0.80) (28) with the previously validated YAMAX Digi-walker (3), but the Actigraph provides higher daily step counts. To adjust our step counts, we censored steps recorded when activity counts were below 260 counts·min−1 (i.e., inactive). We have found this censoring approach to bring the Actigraph values to within 3% of the Digi-walker in 13 middle-aged women that wore both devices for an average of 11 d (unpublished observations, 2005).
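The count-processing steps described above can be sketched roughly as follows. The cut points, the 10-min bout minimum, the 2-min tolerance, the 7-d normalization, and the 260 counts·min−1 step censoring come from the text; the function names are hypothetical, and two details are assumptions rather than the study's documented algorithm: tolerated below-threshold minutes inside a bout are counted toward its length, and trailing below-threshold minutes are not.

```python
def intensity(counts_per_min):
    """Label one minute of Actigraph data using the cut points in the text."""
    if counts_per_min >= 5725:
        return "vigorous"
    if counts_per_min >= 760:
        return "moderate"
    if counts_per_min >= 260:
        return "light"
    return "inactivity"


def bout_minutes(counts, threshold=760, min_bout=10, max_gap=2):
    """Minutes accumulated in bouts of at least min_bout minutes.

    A bout runs from its first to its last minute at or above threshold and
    may contain runs of up to max_gap consecutive minutes below it (assumed
    to count toward the bout length).
    """
    total = 0
    start = None       # index of the first minute of the candidate bout
    last_above = None  # index of the latest minute at or above threshold

    def close_bout():
        length = last_above - start + 1
        return length if length >= min_bout else 0

    for i, c in enumerate(counts):
        if c >= threshold:
            if start is None:
                start = i
            last_above = i
        elif start is not None and i - last_above > max_gap:
            total += close_bout()
            start = None
    if start is not None:
        total += close_bout()
    return total


def weekly_frequency(daily_bout_minutes, min_minutes=30):
    """Days per week meeting min_minutes, as a proportion of days sampled x 7."""
    days_meeting = sum(1 for m in daily_bout_minutes if m >= min_minutes)
    return 7 * days_meeting / len(daily_bout_minutes)


def censored_steps(counts, steps):
    """Sum steps only for minutes at or above 260 counts per minute."""
    return sum(s for c, s in zip(counts, steps) if c >= 260)


# Hypothetical minute-by-minute record: 5 active, 2 below threshold, 5 active.
minutes = [800] * 5 + [100] * 2 + [800] * 5
print(bout_minutes(minutes))               # one 12-min bout under these assumptions
print(weekly_frequency([30, 0, 45, 10]))   # 2 of 4 sampled days, scaled to 7 d
```

If the tolerated minutes were instead excluded from bout time, the only change would be to subtract them when a bout is closed; the text does not say which convention was used.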

24PAR.

The 24PAR we employed was an update of an established 24PAR instrument (13,15). The updated 24PAR enabled the collection of detailed open-ended information about the specific types and intensities of activities reported via linkage with the compendium of physical activities of Ainsworth and colleagues (2). Interviewers trained to conduct PA interviews and use the 24PAR computer interface systematically led each subject back through his or her previous day (midnight to midnight) using a structured interview based on established methods (19,20). Interviewers elicited reports of time spent sleeping in the 24-h period. Waking time spent sitting quietly and reports of the type and duration of physical activity for each segment of the day (i.e., morning, afternoon, evening) were assessed. Individual activities lasting at least 5 min at a time in a given segment of a day were recorded. Interviews typically took about 15–20 min to complete. Intensity of activity was determined from the MET values assigned by the Compendium (2). A weighted sum of daily PA energy expenditure (MET·h·d−1) was calculated using MET values and reported activity duration (h·d−1) (2). The 24PAR were conducted on 50% of the randomly selected weekdays and weekends in the study period, and the frequency of moderate and vigorous activity of specified duration was estimated. We have found the 24PAR approach to provide useful estimates of PA in comparison to PA logs (12), accelerometers (12,15), and PA surveys (15).
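As a rough illustration of the weighted sum described above, daily PA energy expenditure in MET·h·d−1 is the sum over reported activities of the MET value times the duration in hours. The function name and the example MET values below are hypothetical and are not taken from the Compendium or from the study data.

```python
def met_hours_per_day(activities):
    """Weighted sum of MET values and durations for one recalled day.

    activities: iterable of (met_value, duration_minutes) pairs.
    Returns energy expenditure in MET-hours for that day.
    """
    return sum(met * minutes / 60 for met, minutes in activities)


# Hypothetical recalled day: 30 min at 3.5 METs and 20 min at 7.0 METs.
day = [(3.5, 30), (7.0, 20)]
print(round(met_hours_per_day(day), 2))
```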

Statistical methods.

Our primary analytic goal was to evaluate the classification accuracy of the two questionnaires from one administration to the next (reliability) and relative to our two criterion measures (validity). On the basis of their questionnaire responses, subjects were classified according to the method used for the 2001 BRFSS PA module: meeting recommendations (moderate intensity for 5 d·wk−1 and 30 min·d−1 (5 × 30) or vigorous intensity for 3 d·wk−1 and 20 min·d−1 (3 × 20)), insufficient (some moderate- or vigorous-intensity activity but not of sufficient duration or frequency to meet the recommendations), and inactive (reporting no moderate- or vigorous-intensity activity). Similar classification was completed for the 24PAR and Actigraph measures. Classification accuracy was assessed by evaluating the proportion of participants classified correctly and the proportion of extreme misclassifications (more than one category beyond the reference measure). We also calculated sensitivity and specificity (18) and the kappa statistic (7). To describe the classification accuracy with the use of kappa values, we followed the guidelines of Landis and Koch (9) for describing the strength of agreement: kappa of 0.39 or less indicates “poor” agreement, values between 0.40 and 0.75 indicate “fair to good,” and values greater than 0.75 indicate “excellent” agreement. In the reliability study, we evaluated group means from one administration of the instrument to the next, and calculated intraclass correlation coefficients. In the validity study, we also evaluated Spearman correlations for moderate activity reported on the questionnaires and our criterion measures, as well as mean differences in frequency and duration using analysis of variance.
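The agreement statistics named above can be sketched in a few lines. This uses the standard unweighted Cohen's kappa, which may differ in detail from the software the study used, and the function names and example classifications are hypothetical, not study data. The labels follow the thresholds quoted in the text.

```python
from collections import Counter


def cohen_kappa(test, reference):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    n = len(test)
    observed = sum(t == r for t, r in zip(test, reference)) / n
    test_counts, ref_counts = Counter(test), Counter(reference)
    categories = set(test) | set(reference)
    # Chance agreement from the marginal category proportions of each measure.
    expected = sum(test_counts[c] * ref_counts[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)  # undefined when expected == 1


def agreement_label(kappa):
    """Strength-of-agreement label using the thresholds given in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


def sensitivity_specificity(test, reference):
    """Sensitivity and specificity of a binary test against a reference."""
    pairs = list(zip(test, reference))
    tp = sum(1 for t, r in pairs if t and r)
    fn = sum(1 for t, r in pairs if not t and r)
    tn = sum(1 for t, r in pairs if not t and not r)
    fp = sum(1 for t, r in pairs if t and not r)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical "meets recommendations" flags for ten participants.
questionnaire = [True, True, False, False, True, False, True, False, False, False]
recall = [True, True, True, False, False, False, True, False, False, False]

kappa = cohen_kappa(questionnaire, recall)
print(round(kappa, 2), agreement_label(kappa))
print(sensitivity_specificity(questionnaire, recall))
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which is why it can be modest even when most participants are classified identically by both measures.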

RESULTS

Recruitment, randomization, and retention.

Overall, 108 men and women consented to participate in the study. Of this group, 104 (96%) completed the reliability study. For the reliability analyses, 48 (46%) participants were randomized to the CLOSED group and 56 (54%) to the OPEN group. Of the original 108 participants, 29 (27%) were randomized to the Actigraph-only group and 79 (73%) to the Actigraph + 24PAR group. There were seven participants who did not complete the study, all in the Actigraph + 24PAR group. Among the Actigraph + 24PAR participants completing the study (N = 72), the number of completed 24PAR was 13.6 (0.9) recalls (mean (SD), range 10–14 recalls). In addition to the seven who did not complete the study, four participants were excluded because of technical problems with the monitor (malfunction, N = 3; immersion in ocean, N = 1). Five other participants were excluded because of noncompliance with the requirement to wear the monitor at least 12 h·d−1 for at least 7 d. In this population, 12 h·d−1 is roughly 75% of waking hours. The average number of days of Actigraph wear was 21.8 (5.6) d, for an average of 919 (74) min·d−1 (values are mean (SD)). Finally, four more participants were excluded from the validation analyses as a result of incomplete short questionnaire data. Thus, complete data were available for 104 participants for the reliability study and for 88 participants for the validation study.
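The participant flow reported above can be checked with simple arithmetic. The variable names are hypothetical; all counts are taken from the paragraph.

```python
consented = 108

# Randomization groups reported in the text.
reliability = {"CLOSED": 48, "OPEN": 56}
validity_groups = {"Actigraph-only": 29, "Actigraph + 24PAR": 79}

# Exclusions from the validation analyses, in the order reported.
exclusions = {
    "did not complete the study": 7,
    "technical problems with the monitor": 4,
    "insufficient monitor wear": 5,
    "incomplete short questionnaire data": 4,
}

reliability_n = sum(reliability.values())
validation_n = consented - sum(exclusions.values())

print(reliability_n, validation_n)
print({k: round(100 * v / reliability_n) for k, v in reliability.items()})
print({k: round(100 * v / consented) for k, v in validity_groups.items()})
```

The totals agree with the 104 participants in the reliability study and the 88 in the validation study, and the rounded percentages match those reported.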

Descriptive information.

The average age and BMI of participants were 46 yr and 30 kg·m−2. Roughly 55% of participants were women, 38% were nonwhite (35% African American), most were married and college educated, and more than half had BMI values greater than 25 kg·m−2. There were no significant demographic differences (P > 0.05) between the Actigraph-only and Actigraph + 24PAR groups, except that more individuals who were married or cohabitating were in the Actigraph + 24PAR group (67 vs 52%). There were no significant differences (P > 0.05) between the Actigraph-only and Actigraph + 24PAR groups on any of the Actigraph summary measures, which suggests that the intensive 24PAR protocol did not induce higher activity levels (data not shown).

Descriptive duration data (min·d−1) from the 24PAR were sitting 583 (102), light 243 (87), moderate 68 (51), and vigorous 11 (17) min·d−1 (values are mean (SD)). For nonsitting PA energy expenditure (MET·h·d−1), values were light 8.8 (3.2), moderate 4.3 (3.3), and vigorous 1.6 (2.5) MET·h·d−1. Descriptive values from the Actigraph were average counts (ct), 308 (130) ct·min·d−1; steps, 7782 (2885) steps·d−1; inactivity, 683 (94) min·d−1; light, 129 (36) min·d−1; moderate, 115 (48) min·d−1; and vigorous, 4 (9) min·d−1. The median duration of moderate-intensity activities derived from bouts of activity lasting at least 10 min as recorded by the Actigraph was 48 min·d−1, and from the 24PAR it was 54 min·d−1.

Reliability.

Compared with the OPEN version, the CLOSED questionnaire provided a higher prevalence of physical inactivity (13 vs 2%), a lower prevalence of meeting the moderate 5 × 30 recommendation (25 vs 38%), and a lower prevalence of meeting the recommendations (43 vs 54%, Table 2). For both questionnaires, overall correct classification from one administration to the next was reasonably high (65–92%), and the level of extreme misclassification was low (0–7%, Table 2). Kappa values for meeting the moderate, vigorous, and overall activity recommendations were between 0.46 and 0.81 (Table 2). We found no significant differences for reports of moderate activity duration between administrations of either the OPEN or CLOSED versions (Table 3). Intraclass correlation coefficient (ICC) values initially suggested that reliability was lower for the CLOSED (ICC = 55%) than for the OPEN questionnaire (ICC = 77%). On close inspection, however, exclusion of one female participant with highly discordant responses increased women’s overall ICC to 91% for the CLOSED questionnaire. Exclusion of highly discordant responses among men did not appreciably increase ICC values on the CLOSED questionnaire. In detailed analyses, we found no striking differences in our reliability indicators by age, race, or BMI (data not shown).

TABLE 2: Test-retest classification accuracy from the STAR questionnaire study, Columbia, SC 2001–2002.

TABLE 3: Moderate-intensity activity duration (min·d−1) and reliability coefficients (ICC) in test-retest analyses of the STAR questionnaire study, Columbia, SC 2001–2002.

Validity.

Few participants were inactive in any of the measures according to our operational definition of less than 1 d·wk−1 of moderate activity lasting at least 10 min at a time, although the CLOSED questionnaire provided the highest inactivity prevalence (8 vs 0–2%) (Fig. 1). No participants were classified as inactive by the 24PAR, and the Actigraph classified only 2% of participants as inactive. The prevalence of meeting the recommendations on the 24PAR (45%) was intermediate between the two questionnaires (40 and 50%), and the value from the Actigraph was lower (30%, Fig. 1).

FIGURE 1—Prevalence of overall physical activity, by method of assessment, the Short Telephone Activity Recall (STAR) questionnaire study, Columbia, SC, 2001–2002.

We also examined the individual frequency and duration question responses assessed on the questionnaires relative to our criterion measures, comparing moderate-intensity frequency (d·wk−1), duration (min·d−1), and overall duration (min·wk−1) between the Actigraph and 24PAR and the two short questionnaires (Table 4). In general, levels reported on the short questionnaires for both frequency and duration were lower than the two criterion measures, whereas the two questionnaires had similar results (Table 4). Correlations between reports of moderate-intensity duration on the two instruments and the criterion indices were in the range of 0.30–0.40, and there were no striking differences between the two instruments (data not shown). Short questionnaire reports of moderate activity tended to be inversely related to physical inactivity and unrelated to light activity or steps per day on the criterion measures. We also examined prevalence estimates for the frequency of participation in vigorous physical activity for each assessment method. Both short questionnaires elicited higher prevalence estimates for participating in at least 3 d·wk−1 of vigorous activity than did the criterion measures (>25 vs < 15%), but the closed-ended questionnaire provided an estimate that was lower than the open-ended instrument (26 vs 33%, data not shown).

TABLE 4: Moderate-intensity frequency and duration by assessment method, the STAR questionnaire study, Columbia, SC 2001–2002.

In terms of classification into the inactive, insufficient, and meets-recommendation categories, the correct classification between the questionnaires and the 24PAR was about 60–70%, and extreme misclassification was rare (0–3%, data not shown). However, the low rates of physical inactivity in the population limited our ability to carefully evaluate extreme misclassification. Because of the low rates of inactivity on all measures, we evaluated overall activity from the two questionnaires in terms of meeting the moderate or vigorous recommendation, or not meeting the recommendations (Table 5). For the 24PAR comparisons, kappa values were generally higher for women than men, but only of modest strength (kappa ∼ 0.40). Kappa values for the Actigraph comparisons were low for comparisons of overall activity (kappa < 0.20). Sensitivity values for the 24PAR comparisons for all participants and men and women separately ranged between 50 and 90%, and specificity values were between 63 and 84%. As for overall classification into meeting or not meeting the recommendations, both the OPEN and CLOSED questionnaires were of similar validity (kappa = 0.36–0.43). However, there appeared to be intensity-specific differences between the instruments. The OPEN moderate-intensity question appeared to have higher values (kappa = 0.46 vs 0.33). In contrast, the CLOSED vigorous-intensity question appeared to have higher kappa values (0.53 vs 0.32, Table 5). Still, given the width of the confidence intervals, these apparent differences should be interpreted cautiously.

TABLE 5: Validation classification results for all participants and by sex, the STAR questionnaire study, Columbia, SC 2001–2002.

We also examined the validation results for effects in the administration order and, using only the Actigraph data, for differences between the two experimental groups (Actigraph-only vs Actigraph + 24PAR). No effect of survey administration order was evident. Participants in the Actigraph + 24PAR group did appear to have slightly higher validity coefficients than the Actigraph-only group (N = 25), but the small sample size of this group makes definitive conclusions difficult. For overall classification into meeting or not meeting the recommendations, kappa values for the OPEN and CLOSED instruments were 0.17 and 0.22 in the Actigraph + 24PAR group, and −0.04 and 0.04 in the Actigraph-only group.

Using the short instruments, we also examined mean differences in our criterion measures by grouping them as meeting the moderate 5 × 30, vigorous 3 × 20, and overall classification (data not shown). Participants who reported meeting the moderate recommendation on the short questionnaires tended to have higher levels of 24PAR moderate activity (P = 0.08) and total 24PAR activity (i.e., light, moderate, and vigorous, P ≤ 0.05). Participants who reported meeting the vigorous recommendation had higher levels of 24PAR and Actigraph vigorous activity (P ≤ 0.05) and tended to have higher total 24PAR activity and ct·min·d−1 from the Actigraph. Overall classification by the short questionnaires into meeting the recommendations was associated with higher levels of total 24PAR activity (P ≤ 0.01), as well as greater steps per day and activity counts (ct·min·d−1) from the Actigraph (P ≤ 0.08).

DISCUSSION

In this research we developed and conducted initial testing of two short physical activity questionnaires that assessed overall moderate-vigorous activity done at home, work, and during leisure time. The main difference between the two instruments was in the response option for reports of activity frequency (i.e., open- or closed-ended). In general, both instruments were found to have reasonable reliability. Further, they elicited reports of less frequent moderate-intensity activity, but more frequent participation in vigorous activity than did either criterion measure. In terms of meeting the current PA recommendations, both instruments demonstrated classification accuracy of 60 to 70% and low rates of extreme misclassification (<5%). Both were found to stratify this population into groups with significantly different levels of total activity relative to the 24PAR and Actigraph. Taken together, results from this validation work suggest that both of the short surveillance instruments developed in this research are able to characterize important differences in overall activity at the population level.

At the same time, our evaluation of the validity of the questionnaires using methods that were more sensitive to the level of individual errors in reporting (i.e., kappa values, correlations) not only provides additional evidence of validity, but also indicates that there is substantial reporting error at the individual level in these questionnaires. The validity coefficients (Spearman r) were modest for the 24PAR comparisons (e.g., 0.3–0.4) and lower still for the Actigraph (0.1–0.3). Given the elementary nature of the questionnaires (three items to capture all moderate and vigorous activity done at home, work, and during leisure time), it is not surprising that individual errors were relatively large. Although minimizing errors in any assessment instrument is a high priority, the objective of measurements in the setting of public health surveillance is to take a reliable snapshot of the overall activity level of a population. Physical activity assessment for this purpose may be able to tolerate higher levels of individual error than assessments that seek to carefully characterize individual activity patterns (e.g., clinical or correlative studies); however, the magnitude and direction of the errors must remain constant over time. Clearly the short instruments developed in this research would not be suitable for use in assessing the activity patterns of individuals.

Several studies have examined the reliability of PA measures used in surveillance to estimate PA behaviors, but currently little is known about the validity of these questionnaires. Our reliability results, indicating kappa values for test-retest reliability between 0.46 and 0.81, were generally consistent with values from Brownson and colleagues (kappa 0.26–0.44) (4), Stein and colleagues (kappa 0.45–0.56) (22,23), and Shea and colleagues (kappa 0.65) (21), all of whom evaluated surveillance questions for leisure-time PA. Our validation results examining the correlation between reports of the duration of moderate-intensity activity are consistent with several other PA assessment methods, most of which report correlations of low to moderate strength (<0.50) as compared with PA logs and accelerometers (1,17,24,26).

Strath and colleagues (24) recently completed a careful evaluation of the current BRFSS PA survey for overall walking and nonoccupational moderate and vigorous activity among 25 healthy adults. The criterion measure against which the survey items were compared was 7 d of combined heart rate and accelerometry (25). Fifty-six percent of participants reported meeting the recommendations on the BRFSS survey, whereas only 44% were recorded as meeting the recommendations. Investigators observed kappa values of 0.40 for meeting the moderate recommendation, 0.58 for meeting the vigorous recommendation, and 0.61 for overall activity (24). Our validity coefficients were all slightly lower and in the range of 0.30–0.50, depending on the survey examined (Table 5). In contrast to our results for both the 24PAR and Actigraph, Strath and colleagues found no correlation between their objective measures and reports of moderate activity duration.

A central objective of this work was to evaluate two different short questionnaires for reliability and validity and to use this testing information in developing a single short questionnaire for field testing in a small national questionnaire. Results from this investigation revealed only minor differences between the open- and closed-ended moderate-intensity questions. The open-ended moderate-intensity item appeared to be more reliable, and the prevalence estimates for inactivity were more consistent with our criterion measures. In contrast, the closed-ended vigorous-intensity item provided a less biased estimate of the frequency of vigorous activity, and the validity coefficients for both the 24PAR and Actigraph were higher on the closed-ended vigorous-intensity item than the open-ended version. Thus, in subsequent work associated with this project, we elected to use the moderate-intensity open-ended items in combination with the closed-ended vigorous-intensity item.

Some limitations should be considered when evaluating the development and testing of these questionnaires. First, the very low prevalence of assessed physical inactivity in our study population limited our ability to evaluate the utility of the instrument for capturing very low levels of activity. The low prevalence of inactivity may have resulted from our definition of the behavior, which, in contrast to the current BRFSS classification of inactivity, included both occupational and nonoccupational activity. Second, although the convenience sample recruited for this project spanned a wide age range (19–85 yr) and was ethnically diverse (38% nonwhite), most participants were married and college educated. Thus, our results may be less generalizable to individuals who are unmarried, have less education, or are of different ethnic backgrounds. Future work to extend this initial development and testing effort is needed to determine the utility of these instruments in other study populations. Third, a gold-standard criterion measure of relevant PA behaviors (type, duration, frequency) is lacking. We therefore elected to employ both self-report (24PAR) and objective measures of activity in parallel analyses. In general, both criterion measures supported the validity of the reported duration of moderate-intensity activity on the questionnaires, and both questionnaires captured significant differences in indicators of total activity on both criterion measures. For classification accuracy, kappa values from the 24PAR comparisons indicated “fair to good” agreement; however, Actigraph results did not support questionnaire validity. There may be limits in the precision of waist-mounted accelerometers at the individual level, particularly for estimating moderate-intensity activity (1,24). Combined heart rate and accelerometry methods have recently been shown to provide validity results for questionnaire classification that are slightly stronger than our 24PAR results (24). Future studies should consider implementing these more intensive objective measures of PA behaviors. However, developing complex individual prediction equations and additional efforts to capture both heart rate and motion become less feasible as the study size and the length of the measurement period increase.
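
For readers less familiar with the kappa statistic used for classification agreement (7), the following is a minimal sketch of an unweighted Cohen's kappa computed from a cross-classification table of questionnaire category by criterion category. The function name and the counts are hypothetical and serve only to illustrate the calculation; they are not data from this study.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Unweighted Cohen's kappa for a square agreement table.

    Rows hold the questionnaire category, columns the criterion category.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of people on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of row share times column share.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical counts for illustration only (NOT study data).
# Category order: inactive, insufficiently active, meets recommendations.
example_table = [
    [2, 3, 0],
    [1, 20, 9],
    [0, 12, 53],
]
kappa = cohens_kappa(example_table)
print(f"kappa = {kappa:.2f}")
```

Because kappa subtracts the agreement expected by chance from the marginal totals, it is lower than the simple percentage of participants classified identically by both methods.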

This research has several strengths. First, we employed both self-report and objective measures of PA, and in general both provided some evidence supporting the validity of the two questionnaires tested in population-level comparisons. Second, the length of our study period (28 d) would be predicted to minimize the influence of intraindividual variability in daily and weekly activity patterns. We have previously found that 10–14 d of assessment are required to capture 80% of the interindividual variability when using either 24PAR or objective measures (11,14).
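
One common way to reason about how many days of assessment are sufficient is to consider the reliability of a D-day average under a simple one-way random-effects model, in which reliability equals the between-person variance divided by the sum of the between-person variance and the within-person variance over D. The sketch below solves that relation for D. It is an illustration under this assumed model and is not necessarily the method used in references 11 and 14; the function name and the variance components are hypothetical.

```python
import math


def days_needed(within_var: float, between_var: float,
                target_reliability: float = 0.80) -> int:
    """Days of assessment needed for a D-day mean to reach the target reliability.

    Assumes R_D = between_var / (between_var + within_var / D), solved for D.
    """
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must be strictly between 0 and 1")
    if within_var < 0 or between_var <= 0:
        raise ValueError("variance components must be positive")
    days = target_reliability / (1.0 - target_reliability) * (within_var / between_var)
    # Round away floating-point noise before taking the ceiling,
    # and require at least one day of assessment.
    return max(1, math.ceil(round(days, 9)))


# Hypothetical variance components (NOT study data): within-person variance
# three times the between-person variance.
print(days_needed(within_var=3.0, between_var=1.0, target_reliability=0.80))
```

Under this model, the required number of days grows in proportion to the ratio of within-person to between-person variance, which is why behaviors that vary greatly from day to day need longer assessment periods.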

The limitations of very brief PA assessments, such as the ones developed in this study, should be noted. First, attempting to capture the complexity of human PA patterns that are the composite of multiple types, frequencies, intensities, and durations of activity done for many different purposes (e.g., occupation, leisure time) in only a few short questions will not capture important contextual details associated with the activity. Information about specific activities of interest, such as strength training, walking, or transportation, cannot be assessed by these questionnaires, because all activity domains are reported in aggregate. Moreover, short, all-encompassing questions are likely to be the most cognitively challenging for respondents to accurately comprehend and formulate a reasonable response to. Our finding that participants tended to overreport the frequency of vigorous-intensity activity may reflect difficulties in understanding our intensity definitions. Another possibility is that participants were unsure how to formulate a single response to cover daily activities that ranged from moderate to vigorous intensity.

In conclusion, results from this investigation suggest that both the OPEN and CLOSED questionnaires yield reliable responses from one administration to the next. In addition, in comparison to our criterion measures, both instruments had relatively high classification accuracy, low rates of extreme misclassification, and validity coefficients for the 24PAR indicating “fair to good” agreement. Results were stronger in women than men. To further establish the generalizability of these questionnaires, they should be evaluated in more diverse ethnic groups and populations from a more heterogeneous socioeconomic background.
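
As an illustration of the two summary measures named above, the sketch below computes classification accuracy (the proportion of participants placed in the same category by questionnaire and criterion) and the rate of extreme misclassification from a table of ordered categories. It assumes that extreme misclassification means placement in the lowest category by one method and the highest category by the other; that definition, the function name, and the counts are assumptions made for illustration and are not data from this study.

```python
from typing import Sequence, Tuple


def classification_summary(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (accuracy, extreme misclassification rate) for ordered categories.

    Rows hold the questionnaire category, columns the criterion category,
    both ordered from least to most active.
    """
    n = sum(sum(row) for row in table)
    last = len(table) - 1
    # Accuracy: share of participants on the diagonal of the table.
    exact = sum(table[i][i] for i in range(len(table)))
    # Extreme misclassification (assumed definition): the two opposite corners.
    extreme = table[0][last] + table[last][0]
    return exact / n, extreme / n


# Hypothetical counts for illustration only (NOT study data).
# Category order: inactive, insufficiently active, meets recommendations.
example_counts = [
    [2, 3, 1],
    [1, 20, 9],
    [1, 12, 51],
]
accuracy, extreme_rate = classification_summary(example_counts)
print(f"accuracy = {accuracy:.2f}, extreme misclassification = {extreme_rate:.2f}")
```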

The authors would like to acknowledge the important contribution to this report by Gordon Willis, Ph.D., in the design, analysis, and writing phases of the project. Dr. Willis is a scientist in the Applied Research Program, Division of Cancer Control and Population Sciences, at the National Cancer Institute, Bethesda, MD.

This work was funded by the Centers for Disease Control and Prevention under a cooperative agreement with the Prevention Research Center at the University of South Carolina, Columbia (Special Interest Project #13-2000, U48/CCU409664).

REFERENCES

1. Ainsworth, B. E., D. R. Bassett, Jr., S. J. Strath. Comparison of three methods for measuring the time spent in physical activity. Med. Sci. Sports Exerc. 32:457–464, 2000.
2. Ainsworth, B. E., W. L. Haskell, M. C. Whitt. Compendium of physical activities: an update of activity codes and MET intensities. Med. Sci. Sports Exerc. 32:S498–S516, 2000.
3. Bassett, Jr., D. R., B. E. Ainsworth, A. M. Swartz, S. J. Strath, W. L. O’Brien, and G. A. King. Validity of four motion sensors in measuring moderate intensity physical activity. Med. Sci. Sports Exerc. 32:471–480, 2000.
4. Brownson, R. C., A. A. Eyler, A. C. King, Y. L. Shyu, D. R. Brown, and S. M. Homan. Reliability of information on physical activity and other chronic disease risk factors among US women aged 40 years or older. Am. J. Epidemiol. 149:379–391, 1999.
5. Caspersen, C. J. A collection of physical activity questionnaires for health-related research: behavioral risk factor surveillance system. Med. Sci. Sports Exerc. 29:S1–S203, 1997.
6. Centers for Disease Control. Prevalence of physical activity, including lifestyle activities among adults—United States 2000–2001. MMWR Morb. Mortal Wkly. Rep. 52:764–769, 2003.
7. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20:37–46, 1960.
8. Kriska, A. M., and C. J. Caspersen. Introduction to a collection of physical activity questionnaires. Med. Sci. Sports Exerc. 29:S5–S9, 1997.
9. Landis, J. R., and G. G. Koch. The measurement of observer agreement for categorical data. Biometrics 33:159–174, 1977.
10. Matthews, C. E. Calibration of accelerometer output for adults. Med. Sci. Sports Exerc. (in press), 2005.
11. Matthews, C. E., B. E. Ainsworth, R. W. Thompson, and D. R. Bassett, Jr. Sources of variance in daily physical activity levels as measured by an accelerometer. Med. Sci. Sports Exerc. 34:1376–1381, 2002.
12. Matthews, C. E., K. D. DuBose, M. LaMonte, C. Tudor-Locke, and B. E. Ainsworth. Evaluation of a computerized 24-hour physical activity recall (24PAR) [abstract]. Med. Sci. Sports Exerc. 34:S41, 2002.
13. Matthews, C. E., P. S. Freedson, E. J. Stanek III. Seasonal variation of household, occupational, and leisure-time physical activity: longitudinal analyses from the seasonal variation of blood cholesterol study. Am. J. Epidemiol. 153:172–183, 2001.
14. Matthews, C. E., J. R. Hebert, P. S. Freedson. Sources of variance in daily physical activity levels in the seasonal variation of blood cholesterol study. Am. J. Epidemiol. 153:987–995, 2001.
15. Matthews, C. E., J. R. Hebert, P. S. Freedson, E. J. Stanek, I. S. Ockene, and P. A. Merriam. Comparing physical activity assessment methods in the seasonal variation of blood cholesterol levels study. Med. Sci. Sports Exerc. 32:976–984, 2000.
16. Pate, R. R., M. Pratt, S. N. Blair. Physical activity and public health. A recommendation from the Centers for Disease Control and Prevention and the American College of Sports Medicine. JAMA 273:402–407, 1995.
17. Rauh, M. J. D., M. F. Hovell, C. R. Hofstetter, J. F. Sallis, and A. Gleghorn. Reliability and validity of self-reported physical activity in Latinos. Int. J. Epidemiol. 21:966–971, 1992.
18. Romaguera, R. A., R. R. German, and D. N. Klaucke. Evaluating Public Health Surveillance. In: Principles and Practice of Public Health Surveillance, edited by S. M. Teutsch and R. E. Churchill. New York: Oxford. 176–193, 2000.
19. Sallis, J. F. A collection of physical activity questionnaires for health-related research: Seven-day physical activity recall. Med. Sci. Sports Exerc. 29:S89–S103, 1997.
20. Sallis, J. F., W. L. Haskell, P. D. Wood, S. P. Fortmann, T. Rodgers, S. N. Blair, and R. S. Paffenbarger. Physical activity assessment methodology in the Five-City Project. Am. J. Epidemiol. 121:91–106, 1985.
21. Shea, S., A. D. Stein, R. Lantigua, and C. E. Basch. Reliability of the behavioral risk factor survey in a triethnic population. Am. J. Epidemiol. 133:489–500, 1991.
22. Stein, A. D., J. M. Courval, R. I. Lederman, and S. Shea. Reproducibility of responses to telephone interviews: demographic predictors of discordance in risk factor status. Am. J. Epidemiol. 141:1097–1105, 1995.
23. Stein, A. D., R. I. Lederman, and S. Shea. The Behavioral Risk Factor Surveillance System questionnaire: its reliability in a statewide sample. Am. J. Public Health 83:1768–1772, 1993.
24. Strath, S. J., D. R. Bassett, Jr., S. Ham, and A. M. Swartz. Assessment of physical activity by telephone interview versus objective monitoring. Med. Sci. Sports Exerc. 35:2112–2118, 2003.
25. Strath, S. J., D. R. Bassett, Jr., D. L. Thompson, and A. M. Swartz. Validity of the simultaneous heart rate-motion sensor technique for measuring energy expenditure. Med. Sci. Sports Exerc. 34:888–894, 2002.
26. Taylor, C. B., T. Coffey, K. Berra, R. Iaffaldo, K. Casey, and W. L. Haskell. Seven-day activity and self-report compared to a direct measure of physical activity. Am. J. Epidemiol. 120:818–824, 1984.
27. Tryon, W. W., and R. Williams. Fully proportional actigraphy: A new instrument. Behavior Research Methods, Instruments & Computers. 28:392–403, 1996.
28. Tudor-Locke, C., B. E. Ainsworth, R. W. Thompson, and C. E. Matthews. Comparison of pedometer and accelerometer measures of free-living physical activity. Med. Sci. Sports Exerc. 34:2045–2051, 2002.
29. Willis, G. B. Cognitive Interviewing and Questionnaire Design: A Training Manual (Working Paper Series, No. 7). Working Paper 7:1994.

Keywords: EXERCISE; ACCELEROMETER; 24-H RECALL; SURVEILLANCE; RELIABILITY; VALIDITY

©2005 The American College of Sports Medicine