Evaluation of Physical Activity Measures Used in Middle-Aged Women


Medicine & Science in Sports & Exercise: July 2009 - Volume 41 - Issue 7 - pp 1403-1412
doi: 10.1249/MSS.0b013e31819b2482
Basic Sciences

Purpose: To evaluate the reliability and validity of five commonly used physical activity questionnaires (PAQ) in women aged 45-65 yr with varying physical activity (PA) levels.

Methods: Data were obtained from the Evaluation of Physical Activity Measures in Middle-aged Women (PAW) Study and included 66 women (aged 52.6 ± 5.4 yr). PAQ evaluated include Modifiable Activity Questionnaire (past week and past month version), Nurses' Health Study PAQ, Active Australia Survey, and Women's Health Initiative PAQ. Intraclass correlation coefficients (ICC) between administrations of the PAQ were used to assess test-retest reliability. Spearman rank-order correlation coefficients were used to examine the associations of PA and physical fitness data with PAQ summary estimates.

Results: Accelerometer-determined median (25th, 75th percentiles) times (min·d−1) spent in moderate-lifestyle [760-1951 counts (ct)], moderate-walk (1952-5724 ct), vigorous (≥5725 ct), and combined moderate and vigorous PA (MVPA ≥ 1952 ct) during the 35 d of observation were 66.0 (51.2, 81.3), 23.1 (14.1, 34.6), 0.4 (0.0, 2.3), and 24.3 (15.9, 41.6) min, respectively. The PAQ were shown to be reproducible and relatively stable over time (ICC = 0.32 to 0.91) and were associated with total counts per day (ct·d−1, 0.46 to 0.60, all P < 0.001), and most were associated with many facets of physical fitness, including cardiorespiratory fitness (0.36 to 0.46, P < 0.01), body composition (−0.27 to −0.34, P < 0.05), and muscular fatigue (−0.25 to −0.44, P < 0.05).

Conclusions: The PAQ evaluated in this study were shown to be reliable and associated with PA and physical fitness measures. Current findings support the utility of these PAQ for PA assessment in research studies of middle-aged women.

1Department of Health Promotion, Social & Behavioral Health, University of Nebraska Medical Center, Omaha, NE; 2Cancer Prevention Fellowship Program, National Cancer Institute, Bethesda, MD; 3Healthy Lifestyles Research Center, Program in Exercise and Wellness, Arizona State University, College of Nursing and Health Innovation, Mesa, AZ; and 4Wellness, Health, Nutrition, and Physical Education Division, Chandler Gilbert Community College, Mesa, AZ

Address for correspondence: Kelley Pettee Gabriel, Ph.D., College of Public Health, Department of Health Promotion, Social & Behavioral Health, University of Nebraska Medical Center, 986075 Nebraska Medical Center, Omaha, NE 68198-6075; E-mail:

Submitted for publication October 2008.

Accepted for publication January 2009.

Physical activity (PA) is a complex behavior, and selecting the proper assessment tool is a challenging task for researchers. In some settings, objective measures, such as accelerometers, are optimal when assessing PA levels. However, in research settings where the use of objective measures may not be practical, PA questionnaires (PAQ) may be more appropriate to measure participant PA levels. PAQ have been developed for use in different population groups, each with varying levels of reliability and validity (17). However, PAQ developed and validated for use in one specific population subgroup or study design (e.g., surveillance vs intervention) are often used in completely different populations or settings to assess PA levels (2,16). When using or altering an existing survey measure for use in a different population, it is imperative to consider the reliability and validity of the assessment tool for the population or setting of interest (22). Often, a measure deemed accurate in one population or setting may not be appropriate to measure PA in another (e.g., using questionnaires developed for adults in studies with children or adolescents).

Historically, PAQ were designed for research studies involving men and primarily measured occupational or sport/leisure activities (25,28). Within the past decade, PAQ have been developed or modified for use in women (34). Similarly, PAQ have been designed (24,36) or adopted (5,19) specifically for use in studies of middle-aged women. However, the methods used to evaluate these questionnaires have differed for each measure and may have included comparisons with a study-defined criterion measure (i.e., PA diary or recall, motion detection, or fitness parameter). Newer and more sophisticated objective devices (i.e., accelerometers) that provide an improved measure for validation currently exist. However, the simultaneous comparison of PAQ used in middle-aged women has not been previously evaluated, which would allow for more direct comparisons of questionnaires that were specifically designed to assess PA levels in this population.

With increasing interest regarding the role of PA in reducing morbidity and mortality among middle-aged and older women, it is imperative for researchers to have access to information regarding the survey instrument's reproducibility in controlled settings and convergent validity against meaningful objective measures of PA and physical fitness. Failure to expose a PAQ to a rigorous examination of the instrument's psychometric properties within the population of interest or setting may result in an inaccurate assessment of PA and increased risk for nondifferential misclassification. In most instances, nondifferential misclassification will result in a reduction in the overall strength of association between physical activity and the health outcome of interest, leading to spurious conclusions, including missed associations. Given the interest in assessing PA levels in middle-aged women, there is a need to examine the accuracy of PA measures used in health studies using similar evaluation measures. Therefore, the purposes of the current report are to evaluate the test-retest reliability and convergent validity of five PAQ that are commonly used in larger health studies involving middle-aged women.


The Evaluation of Physical Activity Measures in Middle-Aged Women (PAW) study was designed to evaluate the psychometric properties of six PA measures (i.e., five questionnaires and a walking-based performance measure) used in epidemiological studies of PA and health. The protocol used in the PAW Study was based on the successful design of an earlier study, the Study of Activity, Fitness, and Exercise (SAFE) (17), and is summarized in Table 1. Women between the ages of 45 and 65 yr were recruited for the PAW Study from the Greater Phoenix, Arizona metropolitan area, with most participants obtained from the Arizona State University (Polytechnic) campus or from newspaper advertisements. Eligibility criteria included the ability to walk at least one block without the use of an assistive device (e.g., cane, walker, or wheelchair), ability to participate in a 6-wk study with no plans to move from the Phoenix area during the study protocol, ability to communicate in English (verbal and written), competence to understand and sign the written informed consent, willingness to wear activity monitors throughout the study duration, and absence of chronic or acute conditions that would affect the ability to be physically active. All participants provided written informed consent, and all protocols were approved by the institutional review board at Arizona State University. Data were collected from August 2007 (mid month) to May 2008 (early month) to avoid possible changes in PA due to extreme summer heat. Participants completed six consecutive weekly visits, with each visit lasting approximately 30-60 min.

Seventy-seven women were screened, and 66 (85.7%) enrolled into the study. Among those who were not enrolled (n = 11), reasons included lack of time (n = 9), family obligations (n = 1), and preexisting health condition (n = 1) that precluded participation in the study. Among the 66 women enrolled in the study, 39 (59%) were classified as having a moderate risk for cardiovascular disease on the basis of the American College of Sports Medicine (ACSM) and the American Heart Association (AHA) criteria (13), whereas the remaining 27 (41%) were considered low risk. Finally, because the study was designed to evaluate PAQ in middle-aged women, subjects were classified according to their menopausal status. According to the Stages of Reproductive Aging Workshop (STRAW) criteria (33), 83.3% of study participants were either perimenopausal (21.2%) or postmenopausal (62.1%).

Physical Activity Questionnaires

Table 2 presents the PA questionnaires that were evaluated in the PAW Study. PAQ were included in this study if they met two main criteria: 1) widely used and 2) minimal available information relating to the instrument's reproducibility or convergent validity with objective measures of physical activity.

Modifiable Activity Questionnaire.

The Modifiable Activity Questionnaire (MAQ) assesses leisure physical activities during the past month (PMMAQ) and the past week (PWMAQ) and extreme levels of inactivity due to disability. The PMMAQ also measures historical competitive sports participation. Physical activity levels are calculated as the product of the duration and frequency of each activity (h·wk−1), weighted by an estimate of the metabolic equivalent (MET) of that activity (1), and summed for all activities performed. Data are expressed as MET hours per week (MET·h·wk−1) (19) (Table 2).
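
The scoring rule described above (frequency times duration, weighted by MET, summed over activities) can be sketched in a few lines. This is only an illustration: the `ReportedActivity` type, the function name, and the MET values in the example are hypothetical and are not taken from the MAQ scoring materials.

```python
from dataclasses import dataclass


@dataclass
class ReportedActivity:
    """One activity reported on a questionnaire (hypothetical structure)."""
    name: str
    sessions_per_week: float
    hours_per_session: float
    met: float  # MET weight; the values used below are illustrative only


def met_hours_per_week(activities):
    """Sum of frequency x duration x MET across all reported activities."""
    return sum(
        a.sessions_per_week * a.hours_per_session * a.met for a in activities
    )


# Illustrative report: two activities with assumed MET weights.
example_report = [
    ReportedActivity("brisk walking", sessions_per_week=5, hours_per_session=0.5, met=4.0),
    ReportedActivity("gardening", sessions_per_week=2, hours_per_session=1.0, met=3.5),
]
example_total = met_hours_per_week(example_report)  # 5*0.5*4.0 + 2*1.0*3.5
```

In this example the walking contributes 10 MET·h·wk−1 and the gardening 7 MET·h·wk−1, for a summary estimate of 17 MET·h·wk−1.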

Nurses' Health Study II Physical Activity Questionnaire.

The Nurses' Health Study II (NHS) PAQ, a past-week recall, included information on 10 moderate and vigorous leisure time physical activities, usual walking pace, number of flights of stairs climbed daily, and four sedentary activities. Physical activity levels are calculated as the product of the duration and frequency of each leisure time physical activity (h·wk−1), weighted by an estimate of the MET of the activity (1), and summed to give a total activity score in MET hours per week (36) (Table 2).

Active Australia Survey.

The Active Australia Survey measures leisure physical activities (i.e., brisk walking, moderate and vigorous leisure activity, and vigorous housework or gardening) during the past week and sitting time during a usual week and weekend day. Physical activity levels are calculated as the duration of leisure physical activities during the past week (min·wk−1) (5,29) (Table 2).

Women's Health Initiative Physical Activity Questionnaire.

The Women's Health Initiative (WHI) PAQ is composed of questions that measure the frequency and duration of four walking speeds (i.e., <2 mph = 2.5 METs, 2-3 mph = 3.0 METs, 3-4 mph = 4.0 METs, and >4 mph = 4.5 METs) and three types of activity classified by intensity (i.e., mild, moderate, strenuous, or very hard) during the past week. Physical activity levels are calculated as the product of the duration and frequency of each activity level (h·wk−1), weighted by a standard estimate of the MET level for each activity intensity level (i.e., mild = 3 METs, moderate = 4 METs, and vigorous = 7 METs), and summed across walking and all activity intensity levels (15,24) (Table 2).
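
As a rough illustration of the WHI weighting scheme, the sketch below applies the MET weights quoted above to reported frequency and duration. The dictionary keys, the function name, and the (times per week, minutes per time) input format are assumptions made for this example, not the WHI's own scoring code.

```python
# MET weights as quoted in the text; category labels are hypothetical.
WALKING_METS = {
    "slowest": 2.5,      # under 2 mph
    "average": 3.0,      # 2-3 mph
    "fairly_fast": 4.0,  # 3-4 mph
    "fastest": 4.5,      # fastest walking category
}
INTENSITY_METS = {"mild": 3.0, "moderate": 4.0, "vigorous": 7.0}


def whi_met_hours_per_week(walking, activities):
    """Each argument maps a category to (times per week, minutes per time)."""
    total = 0.0
    for weights, reports in ((WALKING_METS, walking), (INTENSITY_METS, activities)):
        for category, (times_per_week, minutes_per_time) in reports.items():
            hours_per_week = times_per_week * minutes_per_time / 60.0
            total += hours_per_week * weights[category]
    return total


# Example: three 30-min walks at 3-4 mph plus two 45-min moderate sessions.
whi_example = whi_met_hours_per_week(
    walking={"fairly_fast": (3, 30)},
    activities={"moderate": (2, 45)},
)
```

Here the walking contributes 1.5 h × 4.0 METs and the moderate activity 1.5 h × 4.0 METs, giving 12 MET·h·wk−1.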

Each questionnaire was administered twice, and the test-retest reliability was established during 1-wk (i.e., PWMAQ, NHS, Active Australia, and WHI PAQ) to 1-month (i.e., PMMAQ) intervals depending on the recall time frame of the instrument.

Objective Measures of Physical Activity

The objective PA measures used in the PAW Study are summarized in Table 3 and included the ActiGraph GT1M accelerometer (Pensacola, FL) and the Yamax Digiwalker SW-200 pedometer (Yamax USA, San Antonio, TX). The ActiGraph is a small (3.8 × 3.7 × 1.8 cm), uniaxial piezoelectric accelerometer that is typically worn at the waist, which measures acceleration in the vertical plane. Data output from the ActiGraph accelerometer are activity counts, which quantify the amplitude and frequency of detected accelerations. Activity counts are summed over a user-specified time interval (i.e., epoch). The sum of the activity counts in a given epoch is related to activity intensity and can be categorized by intensity (e.g., light, moderate, vigorous) on the basis of validated activity count cut points (23). The Digiwalker is a small (5.2 × 3.9 × 1.9 cm) electronic pedometer that is worn at the waist on the midline of the thigh. The Yamax has a horizontal, spring-suspended lever arm mechanism that moves up and down with vertical accelerations of the hip (6). When accelerations are ≥0.35g, the lever arm makes an electrical contact and one event (i.e., step) is recorded and displayed on the digital display screen (6). Technical specifications, as well as the reliability and validity of the ActiGraph (23,27) and Digiwalker (6), have been described previously.

The participants wore the ActiGraph (dominant hip) and Digiwalker (nondominant hip) every day and were asked to record the time at which they put on the monitors in the morning, the time they took off the monitors at night, and the total number of accumulated steps (pedometer only) in a PA diary provided by study staff. The participant was instructed to reset the pedometer to zero every morning. At the end of each week, the participant returned the PA diary to study staff and was given another diary to complete during the following week. The number of pedometer steps recorded in the PA diary each week was averaged to obtain a 7-d daily average. Data from the accelerometer were downloaded and processed weekly. Downloaded data were screened for wear time using previously reported methods consistent with publicly available SAS code developed to process the 2003-2004 NHANES accelerometer-determined PA data (26). Each day, a minimum of 10 h of wear time was required for data to be considered for further use in calculating reported accelerometer-determined outcome variables. Average total activity counts per day were calculated using summed daily counts detected during wear periods and were compared with the PAQ summary estimate as a measure of total PA. Time (i.e., min) spent in moderate- and vigorous-intensity PA was estimated using the cut-points by Freedson et al. (11), and moderate-lifestyle intensity activities were estimated using cut-points proposed by Matthews (23). The resulting activity count ranges for light [100-759 counts per minute (ct·min−1)], moderate-lifestyle (760-1951 ct·min−1), moderate-walk (1952-5724 ct·min−1), and vigorous intensity (≥5725 ct·min−1) were computed for each day with ≥10 h of wear time. Two distinct moderate- to vigorous-PA (MVPA) categories were computed including lifestyle-MVPA and walk-MVPA, with thresholds of ≥760 and ≥1952 ct·min−1, respectively (11,23).
Weekly summary accelerometer- and pedometer-determined physical activity estimates were compiled for all participants with at least four valid days of 10 h or more of wear time.
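
The cut-points and validity rules described above can be illustrated with a short sketch. It assumes 1-min epochs and that non-wear minutes have already been removed (the NHANES-style wear-time screening itself is not reproduced here). The function names and the `below_light` label for counts under 100 are hypothetical.

```python
MIN_WEAR_MINUTES = 10 * 60  # a valid day needs at least 10 h of wear time
MIN_VALID_DAYS = 4          # a weekly summary needs at least four valid days


def classify_minute(counts):
    """Map one minute's activity counts to an intensity category."""
    if counts >= 5725:
        return "vigorous"
    if counts >= 1952:
        return "moderate_walk"
    if counts >= 760:
        return "moderate_lifestyle"
    if counts >= 100:
        return "light"
    return "below_light"  # label assumed; the text does not name this range


def summarize_day(wear_counts):
    """Minutes per category and total counts for one day of wear-time minutes."""
    day = {"below_light": 0, "light": 0, "moderate_lifestyle": 0,
           "moderate_walk": 0, "vigorous": 0}
    for counts in wear_counts:
        day[classify_minute(counts)] += 1
    day["walk_mvpa"] = day["moderate_walk"] + day["vigorous"]               # >= 1952
    day["lifestyle_mvpa"] = day["moderate_lifestyle"] + day["walk_mvpa"]    # >= 760
    day["total_counts"] = sum(wear_counts)
    return day


def summarize_week(days):
    """Average daily values over valid days, or None if too few valid days."""
    valid = [summarize_day(d) for d in days if len(d) >= MIN_WEAR_MINUTES]
    if len(valid) < MIN_VALID_DAYS:
        return None
    return {key: sum(day[key] for day in valid) / len(valid) for key in valid[0]}


# Synthetic 600-min day: 500 low-count minutes plus bouts at three intensities.
sample_day = [50] * 500 + [800] * 60 + [2000] * 30 + [6000] * 10
sample_week = summarize_week([sample_day] * 4)
```

With the synthetic day above, the weekly summary gives 100 min·d−1 of lifestyle-MVPA and 40 min·d−1 of walk-MVPA; a week with only three valid days returns `None`.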

Objective Measures of Physical Fitness

The objective measures of physical fitness that were included are summarized in Table 3. Cardiorespiratory fitness, body composition, flexibility, balance, and muscular strength and endurance measures were included in the current report to examine the sensitivity of the PAQ to components of physical fitness. All women completed a single-stage submaximal treadmill walking test established by Ebbeling et al. (9) to estimate V˙O2max (R2 = 0.86, SEE = 4.85 mL·kg−1·min−1). Briefly, the submaximal protocol consisted of two 5-min stages. After completing a 2- to 3-min warm-up, the treadmill speed was gradually increased until the participant selected a brisk but maintainable speed at a pace that corresponded to 50%-70% of her age-predicted max HR. During stage 1, the women walked at the selected walking pace at 0% grade. After the completion of the first stage, walking speed was maintained while the treadmill incline increased to 5%. The prediction equation used to estimate V˙O2 was based on age, steady state HR, and walking speed (mph). Steady state HR was defined as a HR within ±5 bpm during the final 2 min of stage 2.

In women classified as low cardiovascular health risk (n=27), maximal oxygen uptake (V˙O2max, mL·kg−1·min−1) was measured using open circuit indirect calorimetry (AEI Technologies, Inc., Naperville, IL) during a treadmill maximal-graded exercise test (i.e., modified Balke protocol [32]). V˙O2max values were excluded from the analyses if the participant reported use of blood pressure medication, as they affect HR response to exercise (n = 10 for the submaximal treadmill test and n = 1 for the maximal treadmill test), and those who completed the maximal-graded exercise test, but did not meet ACSM criteria (13,14) for the achievement of V˙O2max (n = 4).

Anthropometric measures included body composition, average waist circumference, and height and weight. Body mass index (BMI, kg·m−2) was calculated from height (m) and weight (kg) measured with a stadiometer and a calibrated balance beam scale, respectively. Body composition was measured using percent (%) body fatness that was obtained through bioelectrical impedance (Tanita Body Composition Analyzer, TBF-300A; Tanita, Arlington Heights, IL). Lower back and hamstring flexibility was assessed using the YMCA sit-and-reach test protocol (3), lower extremity performance (i.e., a composite score that includes static balance, gait, and timed chair stands) was assessed using the short physical performance battery (SPPB) (12), and static balance was measured with the Frailty and Injuries: Cooperative Studies of Intervention Techniques (FICSIT-4) (30). Quadriceps and hamstring muscular fitness, including both strength and endurance, was measured bilaterally using an isokinetic dynamometer (HUMAC NORM; Computer Sports Medicine, Inc., Stoughton, MA). Muscular strength was assessed at 60°·s−1, and endurance was obtained using the fatigue index score at 240°·s−1, expressed as peak power minus minimum power, divided by the time interval in seconds between peak power and minimum power (i.e., inverse relationship between fatigue index score and muscular endurance) (4).
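
The two derived quantities defined in this paragraph, BMI and the fatigue index, are simple ratios. A minimal sketch follows; the function names are hypothetical, and the example inputs are invented values, not study data.

```python
def body_mass_index(weight_kg, height_m):
    """BMI in kg per square metre."""
    return weight_kg / height_m ** 2


def fatigue_index(peak_power, min_power, seconds_between):
    """(Peak power - minimum power) / time between them, as described in the text.

    A higher score means power dropped off faster, i.e., lower muscular endurance.
    """
    return (peak_power - min_power) / seconds_between


# Invented inputs for illustration only.
example_bmi = body_mass_index(weight_kg=70.0, height_m=1.75)
example_fatigue = fatigue_index(peak_power=200.0, min_power=150.0, seconds_between=25.0)
```

The inverse relationship noted above follows directly from the formula: for a fixed time interval, a larger drop from peak to minimum power yields a larger fatigue index.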

Statistical Methods

Univariate analyses were conducted on measured parameters including demographics, anthropometric measures, subjective and objective PA levels, body composition, flexibility, balance, and cardiorespiratory and muscular fitness. All variables were assessed for normality. Normally distributed variables were reported as mean and SD, non-normally distributed variables were reported as medians with 25th and 75th percentiles, and proportions were noted for categorical variables. The test-retest reliability of the PAQ was assessed using intraclass correlation coefficients (ICC). The strength of agreement for the ICC ranges was interpreted as follows: <0.00, poor; 0.00-0.20, slight; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, almost perfect (20). Spearman rank-order correlation coefficients were used to determine the association between the estimates from the first administration of the PA survey and the direct measures of PA and fitness. Partial correlations were used to further explore whether the results warranted further adjustment by age and BMI. All correlations between the PAQ and objective physical activity measures were based on comparable reporting and data collection intervals (i.e., accelerometer data were matched to fit the recall period of each individual PAQ). For all PAQ but the PMMAQ, the first administration of the survey was used when examining correlations with objective physical activity and physical fitness measures. ICC were estimated using SPSS 15.0 (Chicago, IL), and all remaining statistical analyses were conducted using SAS 9.1 (Cary, NC).
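
To make the two analytic steps above concrete, the sketch below encodes the strength-of-agreement bands quoted for the ICC and computes a Spearman rank-order correlation from scratch (ranks with ties averaged, then a Pearson correlation on the ranks). It is a plain-Python illustration, not the SPSS or SAS procedures used in the study, and all function names are hypothetical.

```python
import math


def interpret_icc(icc):
    """Label an ICC using the strength-of-agreement bands quoted in the text."""
    if icc < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if icc <= upper:
            return label
    return "almost perfect"


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: questionnaire scores vs. an objective measure for five people.
example_rho = spearman([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
```

Under these bands, the lowest and highest reliability coefficients reported in the abstract (0.32 and 0.91) would be labeled "fair" and "almost perfect", respectively.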


Descriptive Statistics

The characteristics of the study participants are presented in Table 4. The mean age of the participants was 52.6 ± 5.4 yr (13.6% were ≥60 yr), 81.8% were non-Hispanic white, 51.5% reported having a 4-yr college degree, and 9.1% were current smokers. The mean BMI was 26.8 ± 5.1 and the prevalence of overweight (BMI 25-29.9 kg·m−2) and obese (BMI ≥ 30 kg·m−2) was 30.3% and 24.2%, respectively. Average waist circumference was 85.3 ± 12.6 cm and the prevalence of abdominal obesity (waist circumference ≥88 cm) was 37.9% (Table 4).

Table 4 also provides a description of the subjective and objective PA measures and physical fitness variables. Although the PAQ assessed activity performed during different weeks, the summary estimates in PA levels between measures were similar, with the exception of the WHI PAQ, which yielded the lowest PA estimate. Accelerometer-determined median time spent at varying intensity levels was highest for combined lifestyle MVPA and lowest for vigorous intensity PA during the 35 d of observation. The average number of days (≥4 d) that the participants wore the accelerometer was 6.3 ± 0.7 d·wk−1 (or 30.7 ± 4.8 d during 35 d of observation) and 14.4 ± 1.1 h·d−1 (≥10 h). The average number of daily pedometer steps taken by participants was 9178.1 ± 4046.9.

The PAW Study protocol incorporated measures representing all five domains of physical fitness to examine whether reported physical activity was associated with objective measures of physical fitness. When using age-based criteria reported by the ACSM, estimated V˙O2max levels (n = 56) suggest that 18.2% of women aged 45-49 yr (ACSM defined category 40-49 yr; n = 22), 7.4% aged 50-59 yr (n = 27), and 0% of women 60 yr and older (n = 7) had below average cardiorespiratory fitness (3). The percentage of participants with below average cardiorespiratory fitness increased when measured V˙O2max levels (n = 22) were examined [i.e., 66.7% of women between the ages of 45 and 49 yr (n = 15) and 71.4% of women between 50 and 55 yr (n = 7)] (3). On the basis of percent body fat from bioelectrical impedance, 78.3% of PAW Study participants aged 45-49 yr (n = 23), 51.5% aged 50-59 yr (n = 33), and 0% aged 60 yr or older (n = 9) were considered below average or overly fat (3). Regarding flexibility, 22.2% of women between the ages of 46 and 55 yr (n = 18) and 34.1% aged 56-65 yr (n = 44) were considered having below average low back and hamstring flexibility (3). Eighty percent of the study participants obtained the maximum score possible on the SPPB and 33.3% achieved the highest score on the FICSIT-4, with 98.5% of participants having a score of 20 or higher. Lower extremity muscle strength and endurance was greater on the right side when compared with the left.


The test-retest ICC with 95% confidence limits for the PAQ are shown in Table 5. The test-retest reliability for the PMMAQ was 0.64, which suggests substantial reproducibility during the 4-wk interval on the basis of the strength-of-agreement ranges given in the Statistical Methods (20). Test-retest reliability coefficients ranged from 0.74 to 0.91 in past-week questionnaires that were either interviewer- or self-administered (i.e., PWMAQ and WHI PAQ, respectively) during the visit, suggesting considerable reproducibility from 1 wk to the next. PAQ that were administered through the mail and completed at home before the next visit (i.e., NHS PAQ and Active Australia) yielded lower correlations when compared with those that were administered in-person during the visit (Table 5).


PAQ versus objective PA measures.

All PAQ were significantly correlated with average daily pedometer steps (0.43-0.59, all P ≤ 0.001). All PAQ were related with total accelerometer counts per day (ct·d−1; 0.46 to 0.60, all P < 0.001), with the strongest associations observed with the interviewer-administered PAQ (i.e., PMMAQ and PWMAQ). Although all PAQ were associated with moderate-walk activity minutes per day (0.40 to 0.58, all P < 0.001), the strongest associations were shown with the interviewer-administered PAQ when compared with those that are self-administered. Further, only the interviewer-administered PAQ were related to moderate-lifestyle activity minutes per day with observed correlation coefficients of 0.26 and 0.25 for the PMMAQ and PWMAQ, respectively (both P < 0.05). All PAQ, except the Active Australia, were associated with vigorous activity (0.34 to 0.45, all P < 0.001), with the strongest association observed with WHI PAQ. All PAQ were related with combined lifestyle- and walk-MVPA minutes per day (0.31 to 0.55, all P < 0.01, and 0.45 to 0.60, all P < 0.001, respectively); however, the highest correlations were observed with the interviewer-administered PAQ.

PAQ versus physical fitness measures.

For cardiorespiratory fitness, only the NHS PAQ was significantly related to measured V˙O2max (0.54, P < 0.05), and all PAQ, except the Active Australia Survey, were significantly correlated with estimated V˙O2max (0.33 to 0.49, all P < 0.05). All PAQ were inversely related to percent body fat (−0.25 to −0.35, all P < 0.05) and, except for the Active Australia Survey, were positively associated with flexibility (0.27 to 0.39, all P < 0.05). None of the PAQ summary estimates were significantly related to lower leg performance or static balance, as measured by the SPPB and FICSIT-4, or to quadriceps and hamstring muscle strength. However, all PAQ were inversely related to right leg muscle fatigue (−0.25 to −0.43, all P < 0.05), with the strongest association observed with the PMMAQ. No PAQ were associated with left quadriceps and hamstring strength, and only the WHI PAQ was significantly associated with left leg muscle fatigue (−0.24, P < 0.05). In general, after adjustment for age and BMI, the relationships between the estimates from the first administration of the PA survey and the direct measures of PA and fitness did not differ (Table 6).


In the current investigation, the test-retest reliability and convergent validity of PAQ that are commonly used in research studies of middle-aged women were examined. Findings suggest that although all PAQ evaluated in the current report were reliable, the reproducibility of PAQ administered through the mail yielded lower correlations than those administered in-person (including self- and interviewer-administered). This is an important observation that could potentially have several implications for health research studies that use mailed PAQ to measure participant PA levels. Limited reproducibility of a PA instrument may lead to reduced precision when measuring change in physical activity levels (i.e., prospective studies) or adherence to physical activity goals (i.e., intervention studies). The PAQ evaluated in the current report were all associated with raw (i.e., total and average counts per day) and derived moderate- to vigorous-intensity PA (i.e., min·d−1), which demonstrates the convergent validity of the PAQ with objective PA measures. The lack of association that was shown between PAQ summary estimates and accelerometer-determined light-intensity activity is not surprising given that most PAQ evaluated in the current report did not require participants to recall lower-intensity activities. To our knowledge, no previous study has concurrently evaluated PAQ that were developed or modified for use in middle-aged women and no prior work has evaluated multiple, population-specific PAQ using the same criterion for reliability and validity.

In the current study, PAQ summary estimates were significantly related to cardiorespiratory fitness and muscle endurance. This finding is not surprising given that participation in many activities included in the surveys (i.e., walking) corresponds to these facets of physical fitness. Further, the significant correlations with percent body fat are consistent with the literature supporting an inverse relationship between physical activity and obesity (10). The relationship with flexibility is a novel finding but not unexpected because of the nature of the physical activities that were included or used as cues during recall (i.e., calisthenics, yoga). The PAQ evaluated in the current report were not significantly associated with static balance and muscle strength. One possible explanation for the lack of a significant association is that the SPPB and FICSIT-4 are used in older adults to assess functional status. In the PAW Study, a ceiling effect was observed with both measures, which may have contributed to the null findings. The nonsignificant correlations that were shown with muscular strength suggest that PAQ developed for use in middle-aged women are not sensitive enough to differentiate lower extremity muscular strength. Regardless, these findings have important implications for researchers who may be interested in examining balance or muscular strength outcomes. Future studies that include age-appropriate measures that are responsive to differences in balance and lower extremity muscle strength are needed.

Modifiable Activity Questionnaire.

The MAQ was originally developed using a past-year recall time frame and has been shown to be both reliable (19) and valid (19,31) in a wide variety of populations. Before the current report, the accuracy of MAQ versions using different recall time frames had not been evaluated. Much like the past-year version, findings from the current report suggest that both the PWMAQ and PMMAQ demonstrate good stability and convergent validity with accelerometer-determined PA levels.

Nurses' Health Study Physical Activity Questionnaire.

The NHS PAQ was developed for use in the NHS study cohort and was also used in the Women's Health Study. In a previous evaluation study, the 2-yr test-retest reproducibility coefficient for the NHS PAQ was 0.59 (36). In the current report, the 1-wk test-retest reliability was slightly lower than in past evaluations, which may be because of differences in statistical procedures used to determine reliability (i.e., Pearson correlations adjusted for within-person measurement vs intraclass correlation coefficients that were used in the current report). The previous evaluation study also examined the convergent validity of the NHS PAQ with a PA diary showing correlations of 0.62 (36), whereas comparisons with accelerometer data in the current study yielded lower correlations. These findings are not surprising given that the previous report compared the NHS PAQ summary estimates with a PA diary, which may be subject to bias and participant reactivity.

Active Australia Survey.

Women's Health Australia used the Active Australia Survey, which was designed to measure participation in leisure time PA in Australians (7). In a separate evaluation study, which was not limited to middle-aged women, the 1-d test-retest reliability of the Active Australia Survey was 0.64 (5). This is higher than the 1-wk test-retest ICC of 0.32 observed in the current report, which may be explained by the fact that participants in the earlier study recalled activities on at least 6 d that were common to both surveys. To our knowledge, before the current investigation, the convergent validity of the Active Australia Survey against objective measures of PA and physical fitness had not been examined.

Women's Health Initiative Physical Activity Questionnaire.

The WHI used a PA questionnaire that was designed specifically for use in the WHI study, including both the observational study and randomized clinical trial component. In a prior study (21), the test-retest reliability of the WHI PAQ, administered approximately 3 months apart, ranged from 0.53 to 0.72 using a weighted kappa statistic, and the ICC for total physical activity was 0.77, slightly lower than what was observed in the current study. However, the findings of the evaluation reported by McTiernan et al. (24) were from a substudy that was designed for the purpose of evaluating the psychometric properties of the PAQ. Also, in the current report, the first and second administrations of the WHI PAQ were conducted 1 wk apart. Similar to the MAQ and Active Australia Survey, to our knowledge, the validity of the WHI PAQ had not been examined before the current report (24).

Researchers are often faced with the challenging task of weighing the PAQ's relevance to the population versus overall accuracy when designing research studies. As shown by the PAQ that were evaluated in the current report, subjective measures may differ by the PA components (i.e., sports and leisure, walking, housework, and gardening) that are surveyed. Whereas some PAQ are activity-specific (i.e., MAQ, NHS), others use intensity categories with representative activities (i.e., WHI, Active Australia) as cues during recall. PA survey measures also differ by mode of administration (i.e., interviewer- vs self-administered). For PAQ that are designed to measure a variety of activities during a longer recall time frame, more precise estimates may be obtained using instruments that are interviewer-administered; however, interviewer-administered questionnaires may also lead to greater staff burden. In contrast, mailed surveys may be more prone to participant-level mistakes (e.g., skipped questions and incomplete data), which could have led to the lower reliability correlations that were observed in the current report. Researchers using mailed surveys should consider including strategies to reduce participant-level errors (e.g., immediate and thorough review by a trained staff member when the PAQ is returned). Researchers should select a PAQ that elicits information on the PA components or activities that are most relevant to the desired outcome of interest. It has been suggested that middle-aged women primarily engage in lower-intensity activities. However, it is unclear whether participation in lower-intensity PA is related to health-related outcomes that are prevalent among middle-aged women. Even if such a relationship is beneficial, lower-intensity activities are generally harder for respondents to accurately recall (8); therefore, innovative strategies are needed to accurately measure activities across the PA intensity spectrum (18).

When interpreting the findings, several limitations need to be considered. The PAQ evaluated in the present study are not inclusive of all PAQ that are used in middle-aged women; therefore, other instruments that are used in this population should be considered. Also, depending on which day of the week the mailed survey was completed by the participant, the objective PA data may not have directly corresponded to the 7-d reference period of the survey. However, it is important to note that PAW Study participants were asked not to change their physical activity levels during the study, and when correlations with mailed surveys were reanalyzed using accelerometer-determined PA averaged across the 6 wk of the study (i.e., 35 d), findings were similar (data not shown). Reproducibility established during a short period may also be subject to a learning effect; however, on the basis of the range of observed reliability correlations (i.e., ICC = 0.32 to 0.91), it is difficult to determine to what extent this affected the results. Finally, it is difficult to make direct comparisons between the current report and previous findings because this study was the first to simultaneously evaluate these PAQ using the same criteria for reliability and validity.

The study population for the current study was a convenience sample of mostly highly educated, non-Hispanic white women, which may limit the generalizability of the findings to more diverse populations of women. Further, 98.5% of PAW Study participants were compliant with accelerometer wear time rules (i.e., ≥10 h on ≥4 d). This high compliance rate suggests that PAW Study participants were highly motivated, which may also limit overall generalizability; however, it provided an ideal setting with which to evaluate the convergent validity of the PAQ. Furthermore, the average number of days (>30 d) and hours per day (>14 h) that PAW Study participants wore the accelerometer indicates that any intraindividual variability in the objective measures due to the intensive assessment was minimized. Less than half (40.9%) of the total study population were classified as low risk according to ACSM risk stratification criteria (3), limiting our ability to perform maximal treadmill-graded exercise tests on all women in the study because of safety considerations. Further, only 23 individuals (1 additional participant was excluded for blood pressure medication use) of the 27 women who took the V˙O2max test met the end point criteria needed to classify true V˙O2max (i.e., plateau in V˙O2 with an increase in workload; respiratory exchange ratio ≥1.12; maximum HR ± 10 beats of age-predicted maximal HR; and ratings of perceived exertion >17) (3). As such, the study was limited in the ability to examine associations between PAQ and measured V˙O2max. Nevertheless, significant relationships were observed between the NHS PAQ and measured V˙O2max, and all PAQ, except the Active Australia Survey, were associated with estimated V˙O2max. Finally, the use of an accelerometer to validate PAQ is limited in that waist-worn, uniaxial accelerometers are less accurate when assessing nonambulatory movements (i.e., cycling, weightlifting) and water activities (i.e., swimming, water aerobics) (35).
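
The wear time rule cited above (≥10 h on ≥4 d) can be sketched as a simple check. This is a minimal illustration, not the study's processing pipeline: the function names, the participant identifiers, and the daily wear hours are all hypothetical, and the sketch assumes the rule is applied to one monitoring window per participant.

```python
def is_compliant(daily_wear_hours, min_hours=10.0, min_days=4):
    """Return True if at least `min_days` days have >= `min_hours` of wear."""
    valid_days = sum(1 for hours in daily_wear_hours if hours >= min_hours)
    return valid_days >= min_days


def compliance_rate(cohort):
    """Percentage of participants in `cohort` who meet the wear time rule."""
    flags = [is_compliant(days) for days in cohort.values()]
    return 100.0 * sum(flags) / len(flags)


# Hypothetical wear hours for one 7-d monitoring window per participant.
cohort = {
    "p01": [14.5, 13.0, 15.2, 9.0, 12.1, 14.0, 11.5],   # 6 valid days
    "p02": [8.0, 9.5, 10.0, 11.0, 7.5, 6.0, 12.0],      # 3 valid days
    "p03": [10.0, 10.0, 10.0, 10.0, 0.0, 0.0, 0.0],     # exactly 4 valid days
    "p04": [15.0, 14.0, 16.0, 15.0, 14.0, 13.0, 15.0],  # 7 valid days
}

rate = compliance_rate(cohort)
print(f"Compliant participants: {rate:.1f}%")
```

Both thresholds are inclusive in this sketch, matching the "≥" in the rule as stated, so a participant with exactly 4 days of exactly 10 h counts as compliant.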

Recent research efforts have been designed to investigate the relationship between PA and health outcomes that specifically target middle-aged women. This study provides new information regarding the test-retest reliability and convergent validity of multiple, consecutively administered PAQ in a population of healthy, middle-aged women. The findings from the current report support the utility of these PAQ for assessing PA in middle-aged women and place emphasis on the need for researchers to consider the psychometric properties of the PAQ, characteristics of the study population, and associated study-related logistical concerns when selecting PA assessment tool(s).

The authors thank the 66 dedicated PAW Study participants and acknowledge the contributions of Rebecca Rankin and Justin Leonard. This research was funded by the American College of Sports Medicine, 2007 Paffenbarger-Blair Endowment for Epidemiological Research on Physical Activity that was awarded to Dr. Kelley Pettee Gabriel while working as a postdoctoral research associate at Arizona State University. The results of the present study do not constitute endorsement by the American College of Sports Medicine.


1. Ainsworth BE, Haskell WL, Whitt MC, et al. Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc. 2000;32(9 suppl):S498-516.
2. Ainsworth BE, Macera CA, Jones DA, et al. Comparison of the 2001 BRFSS and the IPAQ Physical Activity Questionnaires. Med Sci Sports Exerc. 2006;38(9):1584-92.
3. Armstrong L, Balady GJ, Berry MJ, et al. ACSM's Guidelines for Exercise Testing and Prescription. 7th ed. Philadelphia (PA): Lippincott Williams & Wilkins; 2006. p. 72; 115-29.
4. Brown LE, Weir JP. ASEP Procedures Recommendation I: Accurate assessment of muscular strength and power. J Exerc Physiol Online. 2001;4(3):1-21.
5. Brown WJ, Trost SG, Bauman A, et al. Test-retest reliability of four physical activity measures used in population surveys. J Sci Med Sport. 2004;7(2):205-15.
6. Crouter SE, Schneider PL, Bassett DR Jr. Spring-levered versus piezo-electric pedometer accuracy in overweight and obese adults. Med Sci Sports Exerc. 2005;37(10):1673-9.
7. Dixon T, Searles A. The Active Australia Survey: A Guide and Manual for Implementation, Analyses, and Reporting [Internet]; [cited 2003 Sep 4]. Available from:
8. Durante R, Ainsworth BE. The recall of physical activity: using a cognitive model of the question-answering process. Med Sci Sports Exerc. 1996;28(10):1282-91.
9. Ebbeling CB, Ward A, Puleo EM, et al. Development of a single-stage submaximal treadmill walking test. Med Sci Sports Exerc. 1991;23(8):966-73.
10. Fogelholm M, Kukkonen-Harjula K. Does physical activity prevent weight gain-a systematic review. Obes Rev. 2000;1(2):95-111.
11. Freedson PS, Melanson E, Sirard J. Calibration of the Computer Science and Applications, Inc. accelerometer. Med Sci Sports Exerc. 1998;30(5):777-81.
12. Guralnik JM, Simonsick EM, Ferrucci L, et al. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol. 1994;49(2):M85-94.
13. Haskell WL, Lee IM, Pate RR, et al. Physical activity and public health: updated recommendation for adults from the American College of Sports Medicine and the American Heart Association. Med Sci Sports Exerc. 2007;39(8):1423-34.
14. Howley ET, Bassett DR Jr, Welch HG. Criteria for maximal oxygen uptake: review and commentary. Med Sci Sports Exerc. 1995;27(9):1292-301.
15. Hsia J, Wu L, Allen C, et al. Physical activity and diabetes risk in postmenopausal women. Am J Prev Med. 2005;28(1):19-25.
16. IPAQ Core Group. International Physical Activity Questionnaire: Cultural Adaptation [Internet]; [cited Sep 15]. Available from: Accessed September 15, 2008.
17. Jacobs DR Jr, Ainsworth BE, Hartman TJ, et al. A simultaneous evaluation of 10 commonly used physical activity questionnaires. Med Sci Sports Exerc. 1993;25(1):81-91.
18. Kriska AM, Caspersen CJ. Introduction to a collection of physical activity questionnaires. Med Sci Sports Exerc. 1997;29(6):S5-9.
19. Kriska AM, Knowler WC, LaPorte RE, et al. Development of questionnaire to examine relationship of physical activity and diabetes in Pima Indians. Diabetes Care. 1990;13(4):401-11.
20. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159-74.
21. Langer RD, White E, Lewis CE, et al. The Women's Health Initiative Observational Study: baseline characteristics of participants and reliability of baseline measures. Ann Epidemiol. 2003;13(9 Suppl):S107-21.
22. Masse LC, Ainsworth BE, Tortolero S, et al. Measuring physical activity in midlife, older, and minority women: issues from an expert panel. J Womens Health. 1998;7(1):57-67.
23. Matthews CE. Calibration of accelerometer output for adults. Med Sci Sports Exerc. 2005;37(11 suppl):S512-22.
24. McTiernan A, Kooperberg C, White E, et al. Recreational physical activity and the risk of breast cancer in postmenopausal women: the Women's Health Initiative Cohort Study. JAMA. 2003;290(10):1331-6.
25. Morris J, Heady JA, Raffle PAB, et al. Coronary heart disease and physical activity of work. Lancet. 1953;265:1053-7, 111-20.
26. National Cancer Institute [Web site]. Risk Factor Monitoring and Methods; [cited Aug 28]. Available from: Accessed in 2008.
27. Nichols JF, Morgan CG, Chabot LE, et al. Assessment of physical activity with the Computer Science and Applications, Inc., accelerometer: laboratory versus field validation. Res Q Exerc Sport. 2000;71(1):36-43.
28. Paffenbarger RS, Hale WE. Work activity and coronary heart disease mortality. N Engl J Med. 1975;292:545-50.
29. Government of South Australia. Physical Activity Survey Methodology; [cited 2005 Jan 18]. Available from: Accessed January 18, 2008.
30. Rossiter-Fornoff JE, Wolf SL, Wolfson LI, et al. A cross-sectional validation study of the FICSIT common data base static balance measures. Frailty and Injuries: Cooperative Studies of Intervention Techniques. J Gerontol A Biol Sci Med Sci. 1995;50(6):M291-7.
31. Schulz LO, Harper IT, Smith CJ, Kriska AM, Ravussin E. Energy intake and physical activity in Pima Indians: comparison with energy expenditure measured by doubly-labeled water. Obes Res. 1994;2:541-8.
32. Sidney S, Haskell WL, Crow R, et al. Symptom-limited graded treadmill exercise testing in young adults in the CARDIA study. Med Sci Sports Exerc. 1992;24(2):177-83.
33. Soules MR, Sherman S, Parrott E, et al. Executive summary: Stages of Reproductive Aging Workshop (STRAW). Climacteric. 2001;4(4):267-72.
34. Sternfeld B, Ainsworth BE, Quesenberry CP. Physical activity patterns in a diverse population of women. Prev Med. 1999;28(3):313-23.
35. Troiano RP. Translating accelerometer counts into energy expenditure: advancing the quest. J Appl Physiol. 2006;100(4):1107-8.
36. Wolf AM, Hunter DJ, Colditz GA, et al. Reproducibility and validity of a self-administered physical activity questionnaire. Int J Epidemiol. 1994;23(5):991-9.


© 2009 The American College of Sports Medicine