Understanding physical activity (PA) and its impact on health is an important public health challenge (35). Nearly half of the American population does not engage in enough PA to prevent disease or benefit health (9). Compared with men, women participate in less vigorous leisure activity (7,8) and may engage in more sedentary behaviors (8,24). Furthermore, previous studies have indicated that minority women report even less leisure time PA than white women (7,8,25,35). Additional research on PA behaviors in women and minority populations would help guide public health policy and interventions.
Previous research demonstrates that women engage in different types and patterns of PA than men (2,3,34). Women may have a different interpretation or understanding of what PA means to them (2,14,15,21,30,34). Because many PA questionnaires used in epidemiologic research were originally designed for white male populations, they may not accurately measure PA in women (2,21,33). This makes accurate and reliable measurement of PA in women and in minority populations especially challenging. Additionally, few PA questionnaires have detailed measurement properties reported among minority or other special populations of women. Furthermore, the validity and the reliability of PA questionnaires may be impacted by other attributes such as age, length of time between test and retest, or level of PA. These attributes may affect the ability of individuals to remember, comprehend, and answer questions.
One study that has attempted to address issues of PA measurement in women is the Women's Health Initiative (WHI) Observational Study. The WHI is a long-term, multicenter, racially and ethnically diverse national cohort study of 161,808 women. The WHI enables scientists to study relationships between lifestyle, health risk factors, and specific disease outcomes. To date, over 230 articles have been published using WHI data. Several of these articles have explored the associations of PA with major diseases (10,16,20,22,23). To adequately study risk factors, like PA, it is important for researchers to understand the questionnaires' measurement properties. The objective of this article is to examine the test-retest reliability of the WHI PA questionnaire in a random sample of the WHI participants overall and by race/ethnicity, age, time between test and retest, and level of PA.
Between 1994 and 1998, 93,676 women aged 50 to 79 yr were enrolled at one of 40 clinic centers across the United States into the WHI Observational Study (19). Eligibility criteria included the intention to reside in the area for at least 3 yr, freedom from any major medical condition that would impact survival within 3 yr of study entry, and no reported mental illness, dementia, alcoholism, or drug dependency. Full details on the study cohort and design are available elsewhere (19).
Between October 1996 and June 1997, a subsample of the women enrolled in the WHI Observational Study was selected to participate in the Measurement and Precision Study. Participants (n = 1092) were randomly recruited within the 40 clinic centers and were stratified by age and race/ethnicity (American Indian/Alaskan Native, Asian or Pacific Islander, black or African American, Hispanic/Latino, and white).
The purpose of the Measurement and Precision Study was to assess test-retest reliability of several self-administered questionnaires. Each clinic center was randomly assigned to repeat a set of baseline questionnaires (19). At approximately 12-wk intervals (range = 8-15 wk), half of the women (n = 567) repeated questions on exercise/recreational activities (Form 34) and the other half (n = 512) repeated questions related to household, yard, and sedentary activities (Form 42). The two questionnaires were distributed between the samples to reduce the time burden on the participants. The 12-wk time interval (8-15 wk) was chosen to minimize a "learned" response to the instrument so that participants would not recall their previous answers. Institutional review board approval was obtained by each participating WHI center before data collection, and participants provided their written informed consent.
The PA questionnaire was self-administered at enrollment. The questionnaire was intentionally worded without reference to a specific time frame (e.g., last week, last month, last year) to collect "usual" activity or patterns of activity. It was designed to collect different types of activities by grouping them together by intensity. This was done to reduce the burden and the time needed to complete the questionnaire. The questionnaire was divided into two forms to collect information on usual PA. On the first form, participants reported their usual exercise or recreational activity (mild, moderate, strenuous, and walking activities). On the second form, participants were asked about heavy indoor household activities and yard activities. Both forms were completed at the same time, either at the clinics or mailed to the participant, and then returned to the clinic for review.
The questionnaire grouped exercise or recreational activities into three separate intensities (mild, moderate, and strenuous) based on a range of MET values associated with the type of activities described. The participants then reported the usual frequency (six categories, from 0 to 5+ d·wk−1) and duration (four categories, from <20 min to ≥60 min) of activities performed at each intensity level. Episodes of walking outside of the home (10 min or more) were reported separately through frequency (six levels, 0-7 d·wk−1), duration (four levels, <20 min to ≥60 min), and usual speed (four levels, 2-5 mph). Questions on household activities were assessed as hours per week (five categories, from <1 to ≥10 h). Yard activities included the number of months per year (five categories, <1 month to ≥10 months) and hours per week (five categories, <1 to ≥10 h) the activities were performed. Participants were also asked to report number of hours spent sitting and lying down, including sleep, each day (eight categories, <4 to ≥16 h). In addition, the women were asked to recall whether or not they engaged in strenuous activity (yes or no) at 18, 35, and 50 yr. The questionnaire and the scoring protocol can be found in Appendix 1.
The WHI PA measures were designed to be summarized into continuous variables estimating weekly energy expenditure (MET·h·wk−1) from each type of activity (mild, moderate, strenuous, walking, household, and yard). An estimated MET level for each type of activity was assigned from a compendium of activities (1) (Appendix 2), where the MET level is kilocalories per kilogram of body weight expended each hour during a specific activity. Summary variables were created by combining frequency, duration, and MET-estimated intensity in the following equation: [(frequency of activity per week × minutes per session × MET for that activity) / (60 min·h−1)]. These summary variables in "MET-hours" quantify the total kilocalories expended per kilogram per week. MET units are independent of body weight.
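As a rough illustration of the summary equation above, the following Python sketch computes MET·h·wk−1 for one activity type and sums across types. The function name, the example respondent, and the MET values are assumptions made for illustration only; the actual scoring of response categories and the assigned MET levels are those given in Appendices 1 and 2.

```python
def met_hours_per_week(sessions_per_week, minutes_per_session, met_value):
    """Weekly energy expenditure (MET-h/wk) for one activity type.

    Implements: (frequency per week x minutes per session x MET) / 60 min per h.
    """
    return sessions_per_week * minutes_per_session * met_value / 60.0


# Hypothetical respondent. These frequencies, durations, and MET values are
# illustrative assumptions, not the values used in the WHI scoring protocol.
activities = [
    (3, 30, 4.0),  # moderate activity: 3 sessions/wk, 30 min each, 4.0 METs
    (5, 20, 3.0),  # walking: 5 sessions/wk, 20 min each, 3.0 METs
]

total = sum(met_hours_per_week(f, m, met) for f, m, met in activities)
print(total)  # 6.0 + 5.0 = 11.0 MET-h/wk
```

Because the questionnaire records frequency and duration as categories, a real scoring routine would first need to map each category to a numeric value; that mapping is not shown here.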
Participants answered questions on several important health behaviors and demographic attributes. Race/ethnicity (American Indian/Alaskan Native, Asian or Pacific Islander, black or African American, Hispanic/Latino, and white), education (10 levels), main occupation (professional/managerial, technical/sales/administrative, service/labor, and homemaker), retirement status, marital status, smoking status, and general health were all self-reported at the first clinic visit. Additionally, height and weight for each individual were measured at this visit and were used to calculate body mass index (BMI; weight in kilograms divided by height in meters squared), which was categorized as underweight (<18.5 kg·m−2), normal weight (18.5 to <25 kg·m−2), overweight (25 to <30 kg·m−2), and obese (≥30 kg·m−2) (37).
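The BMI calculation and cut points described above can be sketched in a few lines of Python. The function names and the example weight and height are hypothetical; only the formula and the category boundaries come from the text.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Classify a BMI value (kg/m^2) using the cut points stated in the text."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# Hypothetical participant: 70 kg and 1.65 m tall (BMI of about 25.7).
print(bmi_category(bmi(70, 1.65)))  # overweight
```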
Two-level kappa and weighted kappa (three to eight levels) statistics were used to assess the test-retest reliability of each individual question or corresponding component (e.g., frequency, duration). Weighting for the kappa statistics was applied using the default in SAS, the Cicchetti-Allison form, taking into account the degree of nonagreement between the test and the retest. Agreement between the test and the retest was classified into five categories: poor (0 to <0.2), fair (0.2 to <0.4), moderate (0.4 to <0.6), substantial (0.6 to <0.8), and almost perfect (0.8-1.0) (18). Test-retest reliability of the continuous variables was assessed with the Shrout and Fleiss (29) intraclass correlation coefficient (ICC1,1). This ICC1,1 uses test and retest measures to estimate single trial reliability instead of the average of repeated measures. More specifically, we calculated the ICC1,1 and the 95% confidence intervals (CI) using a one-way ANOVA model (29,31,32) and then assessed the proportion of the total variance (true variability and measurement error) that was attributable to participant variability.
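To make these statistics concrete, the sketch below shows one way they could be computed in plain Python. It is not the SAS code used in the analysis. It assumes categorical responses coded as integers 0 to k−1 with equally spaced scores, uses linear (Cicchetti-Allison) agreement weights, and computes ICC1,1 from the one-way ANOVA mean squares for two administrations per participant. All function names and example data are hypothetical, and confidence intervals are omitted.

```python
def weighted_kappa(test, retest, n_levels):
    """Weighted kappa with linear (Cicchetti-Allison) agreement weights.

    Assumes responses are integer codes 0..n_levels-1 with equally spaced
    scores. With n_levels == 2 this reduces to the simple (unweighted) kappa.
    """
    n = len(test)
    # Cell proportions of the test-by-retest contingency table.
    table = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(test, retest):
        table[a][b] += 1.0 / n
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(n_levels)) for j in range(n_levels)]
    observed = expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = 1.0 - abs(i - j) / (n_levels - 1)
            observed += weight * table[i][j]
            expected += weight * row[i] * col[j]
    return (observed - expected) / (1.0 - expected)


def icc_1_1(test, retest):
    """Shrout-Fleiss ICC(1,1) from a one-way ANOVA with k = 2 measurements."""
    n, k = len(test), 2
    means = [(a + b) / 2.0 for a, b in zip(test, retest)]
    grand = sum(means) / n
    # Between-participant and within-participant mean squares.
    bms = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    wms = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(test, retest, means)) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)


def agreement_label(kappa):
    """Map a kappa value onto the five agreement categories given in the text."""
    for cutoff, label in [(0.2, "poor"), (0.4, "fair"),
                          (0.6, "moderate"), (0.8, "substantial")]:
        if kappa < cutoff:
            return label
    return "almost perfect"


# Hypothetical data: four-level duration item and MET-h/wk totals.
duration_test = [0, 1, 2, 3, 1, 2]
duration_retest = [0, 1, 3, 3, 0, 2]
print(weighted_kappa(duration_test, duration_retest, n_levels=4))
print(icc_1_1([2.0, 9.0, 15.0], [3.0, 8.0, 16.0]))
print(agreement_label(0.61))  # substantial
```

The ICC here is the ratio of between-participant variance to total variance, which is why it can be read as the proportion of total variance attributable to participant variability.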
Stratified analyses were performed by race/ethnicity, time between test and retest (≤3 months vs >3 months), age (50 to ≤65 yr, >65 to 79 yr), and level of recreational activity (one or more episodes vs none). Lastly, because the participants were not randomized to the type of activity form (exercise/recreation form vs household/yard form), differences between the two samples were also examined.
The majority of the sample (n = 1092) reported good, very good, or excellent health (90%), and the average age was 64 yr old (Table 1). The population was predominantly white (66%) followed by Hispanic (14%), African American (13%), and Asian/Pacific Islander (7%). Only 1% of the women identified themselves as American Indian/Alaskan Natives. These women were excluded from the racially stratified analysis only due to inadequate sample size. Most women had completed high school (93%) and reported an occupation (current or former) other than being a homemaker (90%). More than half of them (55%) were retired. Approximately half of the sample (51%) reported never smoking and more than half (57%) were overweight or obese. The majority of the women were married, whereas one third were either widowed, separated, or divorced.
Although participants were randomly chosen from within each center, each center was assigned to only one of the two PA forms (exercise/recreational activity vs yard/household). Several differences in the populations were found between the two forms. Differences of 5% or more were observed between the two samples for the following variables: race/ethnicity, education, and BMI. A greater proportion of the participants who answered the questionnaire on exercise/recreation activities were normal weight (43% vs 36%), white (69% vs 63%), and college graduates (40% vs 34%) compared with the sample that answered the questions on household/yard activities. Differences were not observed between general health, occupational status, marital status, and smoking.
At baseline, 73% of the women were not strenuously active, and more than half had not participated in regular strenuous activity in their earlier adulthood (aged 18, 35, and 50 yr; data not shown). At least 80% of the women reported some walking. However, when all exercise was combined, more than half of the women reported fewer than 10 MET·h·wk−1 (median = 9.0 MET·h·wk−1; SD = 14.3). Whites and Asian/Pacific Islanders had higher median levels of total recreational activity than Hispanics and African Americans (9.8, 8.7, 7.5, and 7.5 MET·h·wk−1, respectively). A similar pattern was observed for strenuous recreational activity and moderate to strenuous recreational activity by race/ethnicity (data not shown). More women reported at least one episode of moderate recreational activity (e.g., easy swimming, biking, or dancing) than mild recreational activity (e.g., bowling or golf; Table 2).
Within the entire sample, substantial test-retest reliability was demonstrated in most summary measures, with the exception of mild recreational activity, which had lower reliability (Table 3). The continuous estimate of total PA (ICC1,1) was 0.76 (95% CI = 0.71-0.79), and the categorical estimate of total PA (weighted kappa) was 0.61 (95% CI = 0.56-0.66; Tables 3 and 4).
Reliability was similar when the sample was reduced to only those women who reported at least one episode of exercise or recreational activity (Table 3). Stratifying by race/ethnicity resulted in a loss of precision, but the associations were similar (Table 5). The exception was mild recreational activity, which consistently demonstrated the lowest reliability, especially in nonwhite participants. When stratified by age, women who were ≤65 yr demonstrated higher reliability than women >65 yr (Table 6). However, the magnitude of these differences was small, as the measures in both strata remained similar to the reliability of the entire sample. Additionally, women who repeated the tests within 3 months tended to have slightly higher reliability than women for whom more than 3 months had passed at retest (Table 6).
In general, the reliability of the individual questions on the components of frequency and duration of exercise (strenuous, moderate, mild, and walking) was between 0.36 and 0.62 for the entire sample (Table 4). Better reliability was observed for the strenuous and walking components than moderate or mild components. The reliability estimates of hours spent sitting and lying down as well as yard and indoor household activities ranged from 0.60 to 0.71 for the entire sample (Table 3).
History of strenuous activity at the ages of 18, 35, and 50 yr, measured by kappa statistics, ranged between 0.53 and 0.55 overall (Table 4). Similar to the summary measures, reliability was not meaningfully influenced by restricting the analysis to only women who reported at least one episode of exercise or recreational activity. When stratified by the other relevant covariates (age, race/ethnicity, and time between tests), the reliability of moderate, strenuous, and walking PA was fair to moderate in every stratum.
The WHI PA questionnaire demonstrated moderate to substantial test-retest reliability in a racially diverse sample of postmenopausal women. The reliability estimates observed in this sample are similar to reliability measures from other self-reported questionnaires designed for women (6) and for older adults (36). Additionally, the PA in this population generally paralleled activity patterns observed in the US population of adults (7,8,35).
The most consistent difference in the test-retest reliability estimates was the lower reliability of the mild exercise or activity measures. Although the lower reliability observed in the mild intensity questions may be an artifact of reduced precision, it is consistent with other research (27,36). Activities of mild intensity are less memorable and less likely to be recalled and are consequently less well captured by self-report questionnaires. Another potential explanation for the weaker performance of the mild activity measures is the questionnaire design. Mild walking, a popular recreational activity in this population, was assessed separately from other mild-intensity activities and showed higher reliability than mild activity. Therefore, if walking had been included in the mild activity measure, instead of assessed separately, mild activity might have shown higher reliability.
Differences in test-retest reliability were not observed when reducing the sample to only women who reported at least one episode of any exercise or recreational activity. Interestingly, there were also no meaningful differences in reliability observed across race/ethnic groups. Previous studies have been mixed in their reporting of differences in reliability by race/ethnicity (5,12,28). However, it is also important to consider the wide CI in the race/ethnicity estimates because stratifying the data resulted in a loss of precision.
Although we did not observe differences in reliability between the different race/ethnic groups or by level of activity, some patterns were observed by age and length of time between test and retest. Women who were 65 yr or younger demonstrated better test-retest reliability than women who were older. Variability of PA in older women may be influenced by many factors, such as changing health status (e.g., fatigue, injury, disease progression), retirement, or loss of a spouse (4,11,13). Any of these changes within the study period could impact questionnaire reliability as women's activity patterns are affected. Additionally, aging is associated with cognitive decline that can impact memory and could in turn affect reliability (26).
Not surprisingly, reliability was slightly higher in some measures among women who repeated the tests within 3 months than among women for whom more than 3 months elapsed between the tests. One explanation could be that tests repeated within a shorter time frame are more likely to be given in the same season or at a comparable time of year with regard to weather. Furthermore, a change in activity (either increase or decrease) could have occurred after the administration of the first questionnaire, such that the reliability estimates would be lower.
Although reliability could be explored with these data, validation of the WHI PA questionnaire could not be assessed. However, the questionnaire's validity was recently explored among 74 women enrolled in the Women's Healthy Eating and Living Study (17). In this convenience sample of women, the WHI PA questionnaire was correlated with both the accelerometer (Actigraph 7164) and the 7-d PA recall (r = 0.73 and 0.88, respectively). Although the WHI questionnaire had 100% sensitivity for identifying women who met the PA guidelines, the specificity was only 60%. The questionnaire tended to underestimate moderate activities and overestimate vigorous activities.
Despite the diverse and large sample, this study had several limitations. The WHI sample was not population based and may not be representative of a specific source population. White women made up a larger share of the sample than other racial/ethnic groups. Due to the small sample sizes representing Hispanic, African American, and Asian/Pacific Islander women, the lower bounds of the CI fell below zero in several of the stratified analyses. Additionally, the level of education in our sample was very high, and we were unable to examine variation in test-retest reliability by education. Another limitation of this study was that participants were not randomized to the two forms, and some differences were observed between the two groups.
Several other considerations should be made when using the questionnaire. Although the WHI PA assessment included a measure of yard and household activity, it was not a comprehensive measure of women's potential activities. Several domains of activity such as nonmotorized transportation (active travel), child or elder care activity, and work or occupational PA were not included in the WHI PA questionnaire.
Reliable and valid questionnaires are a cost-effective and useful method for collecting PA information in large cohort studies, such as in the WHI Observational Study. However, measurement of PA is challenging because many questionnaires do not collect detailed information on types of activities and use terminology many women do not identify with (2,21,33,34). The WHI PA questionnaire is one of the first questionnaires to examine different types of PA in a large, multiethnic sample of women. This analysis shows that the different domains of PA behavior, such as recreational, yard, and household activity, can be reliably estimated in an ethnically diverse sample of postmenopausal women.
The WHI study was funded by the National Institutes of Health (NIH)/National Heart, Lung, and Blood Institute (NHLBI). This work was supported in part by the NIH/NHLBI #5-T32-HL007055. The results of the present study do not constitute endorsement by the NIH or the ACSM.
The authors are also indebted to Dr. David Couper, Dr. Gerardo Heiss, Dr. Steve Marshall, and Dr. June Stevens for their valuable feedback on the analysis and manuscript.
WHI Program Office. National Heart, Lung, and Blood Institute, Bethesda, MD: Elizabeth Nabel, Jacques Rossouw, Shari Ludlam, Joan McGowan, Leslie Ford, and Nancy Geller.
WHI Clinical Coordinating Center. Fred Hutchinson Cancer Research Center, Seattle, WA: Ross Prentice, Garnet Anderson, Andrea LaCroix, Charles L. Kooperberg, Ruth E. Patterson, Anne McTiernan; Medical Research Labs, Highland Heights, KY: Evan Stein; University of California at San Francisco, San Francisco, CA: Steven Cummings.
WHI Clinical Centers. Albert Einstein College of Medicine, Bronx, NY: Sylvia Wassertheil-Smoller; Baylor College of Medicine, Houston, TX: Aleksandar Rajkovic; Brigham and Women's Hospital, Harvard Medical School, Boston, MA: JoAnn E. Manson; Brown University, Providence, RI: Charles B. Eaton; Emory University, Atlanta, GA: Lawrence Phillips; Fred Hutchinson Cancer Research Center, Seattle, WA: Shirley Beresford; George Washington University Medical Center, Washington, DC: Lisa Martin; Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA: Rowan Chlebowski; Kaiser Permanente Center for Health Research, Portland, OR: Yvonne Michael; Kaiser Permanente Division of Research, Oakland, CA: Bette Caan; Medical College of Wisconsin, Milwaukee, WI: Jane Morley Kotchen; MedStar Research Institute/Howard University, Washington, DC: Barbara V. Howard; Northwestern University, Chicago/Evanston, IL: Linda Van Horn; Rush Medical Center, Chicago, IL: Henry Black; Stanford Prevention Research Center, Stanford, CA: Marcia L. Stefanick; State University of New York at Stony Brook, Stony Brook, NY: Dorothy Lane; The Ohio State University, Columbus, OH: Rebecca Jackson; University of Alabama at Birmingham, Birmingham, AL: Cora E. Lewis; University of Arizona, Tucson/Phoenix, AZ: Cynthia A. Thomson; University at Buffalo, Buffalo, NY: Jean Wactawski-Wende; University of California at Davis, Sacramento, CA: John Robbins; University of California at Irvine, CA: F. Allan Hubbell; University of California at Los Angeles, Los Angeles, CA: Lauren Nathan; University of California at San Diego, LaJolla/Chula Vista, CA: Robert D. Langer; University of Cincinnati, Cincinnati, OH: Margery Gass; University of Florida, Gainesville/Jacksonville, FL: Marian Limacher; University of Hawaii, Honolulu, HI: J. David Curb; University of Iowa, Iowa City/Davenport, IA: Robert Wallace; University of Massachusetts/Fallon Clinic, Worcester, MA: Judith Ockene; University of Medicine and Dentistry of New Jersey, Newark, NJ: Norman Lasser; University of Miami, Miami, FL: Mary Jo O'Sullivan; University of Minnesota, Minneapolis, MN: Karen Margolis; University of Nevada, Reno, NV: Robert Brunner; University of North Carolina, Chapel Hill, NC: Gerardo Heiss; University of Pittsburgh, Pittsburgh, PA: Lewis Kuller; University of Tennessee Health Science Center, Memphis, TN: Karen C. Johnson; University of Texas Health Science Center, San Antonio, TX: Robert Brzyski; University of Wisconsin, Madison, WI: Gloria E. Sarto; Wake Forest University School of Medicine, Winston-Salem, NC: Mara Vitolins; Wayne State University School of Medicine/Hutzel Hospital, Detroit, MI: Michael Simon.
Conflict of interest: Dr. Morimoto works for Exponent in Menlo Park, CA.
1. Ainsworth B, Haskell W, Whitt M, et al. Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc. 2000;32(9 suppl):S498-516.
2. Ainsworth BE. Issues in the assessment of physical activity in women. Res Q Exerc Sport
3. Ainsworth BE, Irwin ML, Addy CL, Whitt MC, Stolarczyk LM. Moderate physical activity patterns of minority women: the Cross-Cultural Activity Participation Study. J Womens Health Gend Based Med
4. Brown WJ, Trost SG. Life transitions and changing physical activity patterns in young women. Am J Prev Med
5. Brownson RC, Eyler AA, King AC, Shyu YL, Brown DR, Homan SM. Reliability of information on physical activity and other chronic disease risk factors among US women aged 40 years or older. Am J Epidemiol
6. Cauley JA, LaPorte RE, Sandler RB, Schramm MM, Kriska AM. Comparison of methods to measure physical activity in postmenopausal women. Am J Clin Nutr
7. CDC. Physical activity trends-United States, 1990-1998. MMWR Morb Mortal Wkly Rep
8. CDC. Prevalence of no leisure-time physical activity-35 States and the District of Columbia, 1988-2002. MMWR Morb Mortal Wkly Rep
9. CDC. Prevalence of regular physical activity among adults-United States, 2001 and 2005. MMWR Morb Mortal Wkly Rep
10. Chlebowski RT, Pettinger M, Stefanick ML, Howard BV, Mossavar-Rahmani Y, McTiernan A. Insulin, physical activity, and caloric intake in postmenopausal women: breast cancer implications. J Clin Oncol
11. Evenson K, Rosamond W, Cai J, Diez-Rioux A, Brancati F. The influence of retirement on leisure-time physical activity: the Atherosclerosis Risk in Communities Study. Am J Epidemiol
12. Evenson KR, McGinn AP. Test-retest reliability of adult surveillance measures for physical activity and inactivity. Am J Prev Med
13. Eyler AA. Correlates of physical activity: who's active and who's not? Arthritis Rheum
14. Henderson K, Ainsworth B. Researching leisure and physical activity with women of color: issues and emerging questions. Leis Sci
15. Henderson KA, Ainsworth BE. A synthesis of perceptions about physical activity among older African American and American Indian women. Am J Public Health
16. Hsia J, Wu L, Allen C, et al. Physical activity and diabetes risk in postmenopausal women. Am J Prev Med
17. Johnson-Kozlow M, Rock CL, Gilpin EA, Hollenbach KA, Pierce JP. Validation of the WHI brief physical activity questionnaire among women diagnosed with breast cancer. Am J Health Behav
18. Landis JR, Koch GG. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics
19. Langer RD, White E, Lewis CE, Kotchen JM, Hendrix SL, Trevisan M. The Women's Health Initiative Observational Study: baseline characteristics of participants and reliability of baseline measures. Ann Epidemiol
20. Manson JE, Greenland P, LaCroix AZ, et al. Walking compared with vigorous exercise for the prevention of cardiovascular events in women. N Engl J Med
21. Masse L, Ainsworth B, Tortolero S, et al. Measuring physical activity in midlife, older, and minority women: issues from an expert panel. J Women's Health
22. McTiernan A, Kooperberg C, White E, et al. Recreational physical activity and the risk of breast cancer in postmenopausal women: the Women's Health Initiative Cohort Study. JAMA
23. McTiernan A, Wu L, Chen C, et al. Relation of BMI and physical activity to sex hormones in postmenopausal women. Obesity (Silver Spring)
24. Nielsen Media Research. Nielsen Media Research Reports Television's Popularity is Still Growing [press release]. New York (NY): Nielsen Media Research; 2006.
25. Ransdell L, Wells C. Physical activity in urban white, African-American, and Mexican-American women. Med Sci Sports Exerc
26. Rikli RE. Reliability, validity, and methodological issues in assessing physical activity in older adults. Res Q Exerc Sport
27. Sallis JF, Haskell WL, Wood PD, et al. Physical activity assessment methodology in the Five-City Project. Am J Epidemiol
28. Shea S, Stein AD, Lantigua R, Basch CE. Reliability of the behavioral risk factor survey in a triethnic population. Am J Epidemiol
29. Shrout PE, Fleiss JL. Intraclass correlations-uses in assessing rater reliability. Psychol Bull
30. Sternfeld B, Ainsworth B, Quesenberry C Jr. Physical activity patterns in a diverse population of women. Prev Med
31. Streiner D, Norman G. Health Measurement Scales: A Practical Guide to Their Development and Use. Oxford: Oxford Medical Publications. 1995. p. 111-2.
32. Streiner DL. Learning how to differ: agreement and reliability statistics in psychiatry. Can J Psychiatry
33. Tudor-Locke C, Henderson KA, Wilcox S, Cooper RS, Durstine JL, Ainsworth BE. In their own voices: definitions and interpretations of physical activity. Womens Health Issues
34. Tudor-Locke CE, Myers AM. Challenges and opportunities for measuring physical activity in sedentary adults. Sports Med
35. US Surgeon General's report on physical activity and health. From the Centers for Disease Control and Prevention. JAMA
36. Washburn RA. Assessment of physical activity in older adults. Res Q Exerc Sport
37. World Health Organization. Physical status: the use and interpretation of anthropometry. In: Report of a WHO Expert Committee. World Health Organization, editor. Geneva (Switzerland): WHO Technical Report Series. 1995. p. 854.