Physical inactivity is now considered a global health concern, but no standardized approaches to measurement exist, and international comparisons and global surveillance are difficult (3,10,11). In 1996, one of the authors (MLB) initiated an international effort to develop comparable measures, and this was followed by the development of an International Consensus Group, which met in Geneva in 1998. The objective was to develop a self-reported measure of physical activity suitable for assessing population levels of physical activity across countries.
Initial pilot testing occurred during 1998–1999, and eight versions of the International Physical Activity Questionnaire (IPAQ) were developed, with four short and four long versions of the questionnaire. These could be administered by telephone interview or self-administration. There were two different reference periods under investigation, either the “last 7 d” or a “usual week” (see Appendix A, Table 1). To determine the measurement properties of these questionnaires, a reliability and validity study was carried out in 14 centers in 12 countries during 2000 (see Appendix A, Table 2 for country codes). This paper reports on the international reliability and validity study of the IPAQ instruments, in an effort to determine the suitability of different forms of the IPAQ instrument for international physical activity surveillance.
Related short and long physical activity questionnaires were pilot tested and selected for international evaluation. The questionnaires were designed to be used by adults aged 18–65 yr. The short version (9 items) provided information on the time spent walking, in vigorous- and moderate-intensity activity and in sedentary activity. Participants were instructed to refer to all domains of physical activity. The long version (31 items) was designed to collect detailed information within the domains of household and yard work activities, occupational activity, self-powered transport, and leisure-time physical activity as well as sedentary activity. An additional question asked about the pace of walking and cycling. Standard methods were used to translate and adapt the questionnaires to the study centers in different countries (8,15). In each country, clinical research or institutional ethics committees had approved the study, and informed consent (“written” where literacy allowed this) was obtained from participants at each study site.
The reliability study was conducted over a 3- to 7-d period, requiring two participant contacts. During the first visit, the selected version of IPAQ was completed, and demographic data were obtained. Participants also read and signed institutional human subject consent forms. Up to 1 wk later, participants completed the same IPAQ version(s). In the validity studies, participants completed the same assessments but also wore a CSA motion detector (now MTI) for 1 wk between visits 1 and 2, and had height and weight measured. Centers that administered both the reliability and validity protocols also had participants complete a third study visit, 3 d after the second visit to complete the reliability component.
Validation of reported activity levels used objective data recorded on the Computer Science and Application’s Inc. (Shalimar, FL) accelerometer (CSA model 7164). The technical specification and performance properties of the CSA activity monitor have been described elsewhere (4–7,9,12,16). Participants wore the monitors during the 7 d of the validity study, and data were summed and stored in 1-min intervals.
Protocol and data management.
All centers used a standardized protocol for reliability and validity assessment, overseen by a protocol coordinator (BEA). Adherence to the study protocols was well maintained, with few variations to the procedures reported. In South Africa and Guatemala, the telephone versions of the questionnaires were administered via personal interview. In general, samples were drawn from specific populations and were usually convenience samples, but collectively, the participants represented a wide range of age, education, income, and activity levels (Table 2). Additional qualitative input from each data collection site was also received to assist the understanding of issues surrounding the administration and interpretation of IPAQ, across developed and developing countries. The Data Management Center (Sydney, Australia: AEB, ALM) and two members of the IPAQ Executive [BEA and CLC] developed a standardized approach to cleaning, scoring, and analyzing the data.
Scoring and data reduction.
After cleaning the data for missing and out-of-range values, the data collected from the long IPAQ questionnaires were summed within each physical activity domain to estimate the total time spent in occupational, transport, household, and leisure related physical activity, as well as total time reported sitting per week. Data from the short IPAQ questionnaires were summarized according to the physical activities recorded (walking, moderate, and vigorous activities) and estimated time spent sitting per week. Note that the sitting questions were developed as separate indicators and not as part of the summed physical activity score. Both the short and long form data were then used to estimate total weekly physical activity by weighting the reported minutes per week within each activity category by a MET energy expenditure estimate assigned to each category of activity (Table 1). MET levels were obtained from the 2000 compendium of physical activities (1) to include moderate-intensity activities between 3 and 6 METs and vigorous-intensity activities as >6 METs (1). The weighted MET-minutes per week (MET·min·wk−1) were calculated as duration × frequency per week × MET intensity, which were summed across activity domains to produce a weighted estimate of total physical activity from all reported activities per week (MET·min·wk−1). For brevity and clarity of presentation, only the total physical activity MET-minutes per week and total minutes per week in sitting activities are reported here.
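The weighting step described above (duration × frequency per week × MET intensity, summed across activities) can be sketched in a few lines. This is only an illustration: the MET weights below are placeholder values assumed for the example, not the values in Table 1, and the names `ASSUMED_MET`, `ActivityReport`, and `met_minutes_per_week` are hypothetical.

```python
from dataclasses import dataclass

# Placeholder MET weights, assumed for illustration only.
# The study's actual weights are listed in its Table 1.
ASSUMED_MET = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}


@dataclass
class ActivityReport:
    category: str           # key into the MET table
    days_per_week: int      # reported frequency
    minutes_per_day: float  # reported duration on those days


def met_minutes_per_week(reports, met_table=ASSUMED_MET):
    """Sum duration x frequency x MET over all reported activities."""
    return sum(
        r.minutes_per_day * r.days_per_week * met_table[r.category]
        for r in reports
    )


example = [
    ActivityReport("walking", days_per_week=5, minutes_per_day=30),
    ActivityReport("moderate", days_per_week=3, minutes_per_day=40),
    ActivityReport("vigorous", days_per_week=2, minutes_per_day=20),
]
total = met_minutes_per_week(example)
```

Sitting time is deliberately left out of the sum, in line with the note above that the sitting questions are separate indicators.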
Additional data collected on self-reported walking and cycling pace were evaluated separately from the main IPAQ analysis to determine whether these questions improved the data quality. Further additional analysis excluded the walking time reported to be at a slow pace. The MET estimates used to weight the different intensities of reported walking and cycling according to pace are shown in Table 1.
Total physical activity, expressed as minutes per week, was also categorized to determine the proportion of each sample that met the CDC-ACSM physical activity guideline, which is often interpreted as “at least 150 min·wk−1 of at least moderate-intensity physical activity” (13,17). To represent time spent in vigorous activity at the recommended volume more appropriately (defined as three 20-min sessions per week for a total of 60 min·wk−1), time spent in vigorous activity was multiplied by two, consistent with the method used by Armstrong et al. (2).
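As a rough sketch of this categorization, the function below doubles vigorous minutes before comparing the weekly total with the 150-min threshold. The function names are hypothetical, and treating all non-vigorous activity (including walking) as a single "moderate" total is an assumption of the example rather than something the text specifies.

```python
GUIDELINE_MINUTES = 150  # min per week, as quoted in the text


def guideline_minutes(moderate_min, vigorous_min):
    """Weekly minutes with vigorous time counted twice."""
    return moderate_min + 2 * vigorous_min


def is_sufficiently_active(moderate_min, vigorous_min,
                           threshold=GUIDELINE_MINUTES):
    """True when the weighted weekly minutes reach the threshold."""
    return guideline_minutes(moderate_min, vigorous_min) >= threshold


# 90 min moderate + 30 min vigorous -> 90 + 2*30 = 150 weighted minutes
meets = is_sufficiently_active(90, 30)
# 100 min moderate + 20 min vigorous -> 100 + 2*20 = 140 weighted minutes
falls_short = is_sufficiently_active(100, 20)
```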
The raw CSA data were prepared for analysis using software written in Microsoft Access by the Swedish study center (UE, YA, MS). For the CSA data to be included in the analyses, at least 600 min of registered time was required each day for at least 5 d (one of which had to be a weekend day). The total amount of physical activity recorded by the CSA, expressed as total counts per registered time (counts·min−1), was used as the criterion measure in the validity analyses. Time spent in moderate-intensity activity was identified as CSA counts from 1952 to 5724 counts·min−1, and time spent in vigorous-intensity activity as CSA counts of 5725 counts·min−1 or more (6). These data were compared with the 150-min·wk−1 criterion of adequate physical activity described previously, with the time spent in moderate- and vigorous-intensity activity recorded by the CSA being treated in the same way as for the self-report IPAQ data.
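The inclusion rule and the criterion measure can be illustrated with a short sketch. It is not the Access software used in the study; the function names are hypothetical, and treating Saturday and Sunday as the weekend is an assumption that may not hold in every participating country.

```python
from datetime import date, timedelta

MIN_WEAR_MINUTES = 600  # registered minutes required per day
MIN_VALID_DAYS = 5      # valid days required per participant


def valid_days(wear_minutes_by_day):
    """Return the dates with at least 600 min of registered time."""
    return [d for d, m in wear_minutes_by_day.items()
            if m >= MIN_WEAR_MINUTES]


def meets_inclusion_rule(wear_minutes_by_day):
    """At least 5 valid days, one of which is a weekend day.

    Assumption: weekend means Saturday or Sunday.
    """
    days = valid_days(wear_minutes_by_day)
    has_weekend = any(d.weekday() >= 5 for d in days)
    return len(days) >= MIN_VALID_DAYS and has_weekend


def mean_counts_per_minute(total_counts, registered_minutes):
    """Criterion measure: total counts per registered minute."""
    return total_counts / registered_minutes


monday = date(2000, 5, 1)  # a Monday
week = [monday + timedelta(days=i) for i in range(7)]
# Five valid weekdays but neither weekend day reaches 600 min.
no_weekend = dict(zip(week, [700, 700, 700, 700, 700, 300, 200]))
# Four valid weekdays plus a valid Saturday.
with_weekend = dict(zip(week, [700, 700, 700, 700, 500, 650, 100]))
```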
To validate the IPAQ sitting data, a pragmatic CSA cut-point of <100 counts·min−1 was used to define time spent being sedentary. This cut-point was determined from several studies that have appraised the activity counts recorded by the CSAs during a variety of different activities (5,16).
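Taken together, the cut-points quoted above assign each 1-min epoch to an intensity band. The sketch below shows one way to do this; how a count exactly equal to a cut-point is handled is an assumption of the example, and the "light" label for counts between the sedentary and moderate bands is a hypothetical name not used in the text.

```python
from collections import Counter

SEDENTARY_BELOW = 100  # counts per minute
MODERATE_FROM = 1952
VIGOROUS_FROM = 5725


def classify_minute(counts):
    """Assign one 1-min epoch to an intensity band."""
    if counts < SEDENTARY_BELOW:
        return "sedentary"
    if counts >= VIGOROUS_FROM:
        return "vigorous"
    if counts >= MODERATE_FROM:
        return "moderate"
    return "light"  # hypothetical label for the remaining range


def minutes_by_band(epoch_counts):
    """Count the minutes that fall in each band."""
    return dict(Counter(classify_minute(c) for c in epoch_counts))


summary = minutes_by_band([0, 99, 150, 2500, 2500, 7000])
```

The moderate and vigorous minute totals can then be handled in the same way as the self-reported minutes when applying the 150-min criterion.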
Three separate types of analyses were planned, depending on the IPAQ study protocol adopted by each data collection site:
- Reliability assessment: the test-retest repeatability of the same IPAQ forms administered at two different times not more than 8 d apart for the “last 7 d” recall forms and not more than 10 d apart for the “usual week” forms.
- Concurrent (inter-method) validity: compared the self-report data from two different IPAQ forms that were administered during the same day, e.g., comparing the agreement between the long and short IPAQ forms. A comparison between telephone and self-administered modes of data collection was also conducted by Canada.
- Criterion validity: compared the physical activity and sitting data from the self-report IPAQ forms with the CSA measure of physical activity recorded over 7 d.
As the self-reported IPAQ data were not normally distributed, nonparametric Spearman correlation coefficients (ρ) were calculated as a primary measure of agreement: between visits (reliability), between forms (concurrent validity), and between the IPAQ data and CSA counts (criterion validity). All valid data were included, without excluding outliers, as this did not affect the results to any substantive degree. The categorical data were analyzed by calculating the percent agreement (average correct classification, ACC), to determine how many participants in each sample were classified in the same category by the method being compared (either reliability across visits, concurrent validity within visits and/or criterion validity between IPAQ forms and the CSA data).
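For readers unfamiliar with these two statistics, a minimal, dependency-free sketch follows. Spearman's ρ is computed here as the Pearson correlation of average ranks, which is the standard definition when ties are present; the percent agreement is shown as the simple proportion of participants placed in the same category by both methods, which is an assumption about how the ACC was calculated. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def percent_agreement(first, second):
    """Proportion of participants classified identically by both methods."""
    return sum(a == b for a, b in zip(first, second)) / len(first)
```

Because ρ depends only on ranks, it is unaffected by the skewed distribution of MET-minutes, which is the reason given above for preferring it.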
The samples were reasonably similar across data collection sites, with comparable gender proportions and each sample being predominantly middle-aged (Table 2). Comparisons of education levels showed the urban samples were slightly more educated than the rural samples from SA and GU, and the UK1, USA2, and NET (see Appendix A, Table 2 for country abbreviations) samples were more educated than other IPAQ samples. The majority of subjects were employed and reported working similar hours per week across all samples, except for one of the rural SA samples. Most samples were from residents of large cities, but adults from smaller communities were included in the UK2, CAN, SA, and SW samples. Overall, the samples used in all sites tended to be well educated, and the only representative population samples used were from CAN and SW.
Long and Short Questionnaire Reliability
Test-retest reliability data for the long IPAQ questionnaires are presented in Table 3. These data show Spearman correlation coefficients ranging from 0.96 (USA2) to 0.46 (SA Ru), but most were around 0.8, indicating very good repeatability. Overall, long form data were completed by 1880 adults, with a median of 3699 MET-min reported weekly, and 89% meeting the 150 min·wk−1 threshold. The long forms’ pooled data showed a repeatability coefficient of ρ = 0.81 (95% CI 0.79–0.82). The various long forms showed similar levels of repeatability, with L7T (N = 200, ρ = 0.79), L7S (N = 294, ρ = 0.77), LUT (N = 482, ρ = 0.76), and LUS (N = 904, ρ = 0.83). Specific analyses for activity by intensity levels (not shown here) indicated that the repeated recall of vigorous physical activities was generally better than that of moderate physical activities. However, further analyses using pace specific MET-minutes per week did not substantially change the correlation coefficients for total physical activity (Table 3). The repeatability coefficients based on data using specific pace related MET-minutes either decreased or stayed the same in 21 of the 22 separate correlations. Furthermore, excluding the data that reported “slow walking pace” resulted in 19 of 21 correlation coefficients decreasing or staying the same. Additional analyses based on including or excluding occupational (job-related) physical activity in the total estimate did not influence their repeatability (Table 3).
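The text does not state how the 95% confidence intervals for the pooled coefficients were obtained. One common approach is the Fisher z-transformation, sketched below as an assumption; applying the Pearson standard error 1/√(N − 3) to a Spearman coefficient is itself an approximation, and the function name `fisher_ci` is hypothetical.

```python
from math import atanh, sqrt, tanh


def fisher_ci(rho, n, z_crit=1.96):
    """Approximate CI for a correlation via the Fisher z-transformation.

    Assumes the Pearson standard error 1/sqrt(n - 3), which is only
    an approximation for Spearman's rho.
    """
    z = atanh(rho)
    se = 1 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Pooled long-form reliability as reported in the text.
low, high = fisher_ci(0.81, 1880)
```

Under these assumptions, the interval for ρ = 0.81 and N = 1880 comes out close to the reported 0.79–0.82.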
The test-retest reliability of sitting recall between visits in the IPAQ long forms was generally good with more than four-fifths of the coefficients above 0.70. The lowest reliability coefficients were observed in the rural GU and SA samples. The ability of the IPAQ long forms to reliably classify respondents using the categorical estimate of “sufficient physical activity” was very good, with percent agreements ranging from 1.00 (NET and USA2) to 0.84 (SW) (Table 3). There were no major differences in repeatability coefficients between the “last 7 d” and the “usual week” reference periods or the telephone/personal interview and self-administered modes of administration.
Reliability data for the IPAQ short questionnaires are presented in Table 4. Repeatability was again at an acceptable level, with 75% of the correlation coefficients observed above 0.65 and ranging from 0.88 (USA2 and GU Ub) to 0.32 (rural SA). Overall, the short questionnaires were completed by 1974 people, with a median of 2514 MET·min reported across all studies and 82% achieving the 150-min cut-point for “sufficiently active.” The pooled ρ was 0.76 (95% CI 0.73–0.77). The various short forms were similar in their estimated repeatability, with the S7T (N = 300, ρ = 0.74), S7S (N = 292, ρ = 0.75), and SUS (N = 906, ρ = 0.79). Only the SUT questionnaire was slightly less repeatable (ρ = 0.64, 95% CI 0.58–0.69).
Including reported walking and cycling “pace” did not greatly influence the reliability of the data: 20 of the 23 reliability correlations decreased or stayed the same (Table 4). Similarly, using pace to exclude the “slow-paced walking” data again had little influence on reliability (17 of the 23 reliability coefficients decreased or stayed the same). The sitting time data were quite repeatable, with two-thirds of all repeatability coefficients above 0.70. The categorical estimates of sufficient physical activity between visits were very repeatable, with percent agreement ranging from 100% (USA2) to 77% (JAP).
The observed concurrent validity (inter-method) coefficients between IPAQ forms suggested that the short and long forms showed reasonable agreement (Table 5). The pooled ρ, for comparisons between long and short forms was 0.67 (95% CI 0.64–0.70) and for comparisons of different short instruments was 0.58 (0.51–0.64). Over half the correlation coefficients calculated between the data collected at visit 1 were above 0.65, whereas with the data collected at visit 2, over 60% of the correlations were over 0.65, and by visit 3 all correlations were above 0.65.
The amounts of time reported sitting in the long and short IPAQ forms were in agreement, with correlations ranging from 0.96 (USA2 and FIN) to 0.57 (SW) at visit 1. The correlation coefficients for sitting did not appear to be influenced by either the reference period (“last 7 d” or “usual week”) or the mode of administration (telephone or self), as indicated by all the data but specifically by the two CAN samples (Table 5).
The correlation coefficients for the short and long form total MET-minutes per week also did not appear to be influenced by reference period (“last 7 d” or “usual week”). These correlation coefficients were reasonable for both modes of administration (telephone or self-administered).
The criterion validity of the self-report IPAQ data against CSA accelerometers is shown in Table 6 for both the long and short forms. Overall, there was fair to moderate agreement between the two measures, with a total of 744 adults testing the long forms against the CSA (pooled ρ = 0.33, 95% CI 0.26–0.39), and for the short forms and CSA (N = 781, ρ = 0.30, 95% CI 0.23–0.36).
The criterion validities of the long and short forms were almost equivalent, but there appeared to be a wider range of correlation values associated with the long form. Higher associations were observed between the data produced by the categorical estimates of sufficient physical activity, with about 80% of the estimates showing agreement coefficients of at least 70% and around four-fifths of all individuals being similarly classified by both the IPAQ forms and CSA data. This may in part be due to the fact that the majority of participants were already meeting the requirement for sufficient physical activity. Again using walking and cycling pace to more specifically estimate MET-minutes per week scores did not improve the criterion validity correlations, and in most instances excluding slow-paced activity only worsened the relationship.
The correlation between the IPAQ sitting data and an estimate of sitting accounted for by the CSAs showed similar correlations to the physical activity data indicating moderate agreement between subjective and objective measures of this sedentary behavior (Table 6).
Process (qualitative) feedback.
A summary of the qualitative reports submitted by the data collection centers identified some research issues among the data collection sites. The more frequent issues reported were: (i) technical problems with the CSA monitors; (ii) not having enough CSAs on hand, which was a particular problem for developing countries (as they are expensive); (iii) difficulty in conducting follow-up interviews at the exact protocol-defined time; (iv) interpretation of a “usual week” was sometimes problematic as participants were not able to identify “what is usual?” and participants deferred to recall of the “last 7 d” as a “usual week”; (v) difficulty in distinguishing vigorous and moderate physical activity; (vi) pace was not consistently defined in all cultures; (vii) the inability of participants to accurately estimate the number of 10-min bouts they had participated in; and finally, (viii) the examples of vigorous- and moderate-intensity activities used were not always locally relevant despite the IPAQ protocol allowing the use of culturally applicable examples.
Some developing-country sites also reported a preference for using the self-administered mode of data collection, as telephones were not sufficiently available. To overcome this issue, the Guatemalan and South African investigators used the telephone script to administer the forms as a personal interview instead of using the telephone, which appears to be a viable option in developing countries. Administering the questionnaire this way may be the preferred option, as some participants completing the questionnaire by self-administration skipped some questions.
The short form was generally better received in sites that administered both the long and short forms. The long form was reported as being “too boring and repetitive,” and too long, and therefore expensive, for routine surveillance. Nine data collection sites reported a preference for using the “last 7 d” over the “usual week” reference period and this preference was expressed by both developed and developing countries.
The burgeoning global problem of physical inactivity (10,13,17), and the need for population surveillance and inter-country comparisons, has led to the development of the IPAQ measure. These IPAQ instruments underwent several stages of development and testing, culminating in this large multi-country reliability and validity study. The results of the IPAQ reliability and validity study show that IPAQ exhibited measurement properties that are at least as good as other established self-report physical activity measures. For comparison purposes, a recent review (14) summarized reliability and criterion validity results for seven self-report physical activity measures evaluated in adults. The review reported that reliability correlations ranged from 0.34 to 0.89, with a median of about 0.80, and that criterion validity correlations ranged from 0.14 to 0.53, with a median of about 0.30. Typical IPAQ correlations were about 0.80 for reliability and 0.30 for validity. Considering the diversity of the samples and countries present in this study, compared with the usual developed country samples, these results support the acceptability of the psychometric performance of the IPAQ questionnaires.
Given the minimal contribution that walking and cycling pace made to reliability and validity, these pace questions have been removed from both the long and short versions of IPAQ. One further question was the issue of occupational activity, collected in detail in the long form, which may have contributed to the absolute differences between long and short forms. Excluding the job-related physical activity did not substantially influence correlations between long and short forms, suggesting that the short form questionnaires provided a global estimate of total physical activity, including a similar amount of job related activity.
Additional analyses conducted on the reliability and validity of the sitting questions led to the decision that only weekday sitting time needed to be included on the short form. This was based on reasoning that 5 weekdays tends to be more representative of sitting time than two weekend days and may permit better tracking of societal transitions in emerging economies as they adopt sedentary lifestyle patterns of the industrialized nations. However, the IPAQ long form retains both questions on weekday and weekend sitting time.
The reliabilities of the long and short forms were comparable, as were the “usual week” and “last 7 d” reference periods. The reliability of telephone administration was not very different to a self-administered method of data collection. Both the long and short form reliability testing showed evidence of a “learning effect” over time, where subjects who were administered the same IPAQ forms over serial visits showed improvements over time in reliability and inter-method agreement.
More IPAQ countries expressed a qualitative preference for using the short form, as it seemed to be more acceptable to both investigators and survey respondents. However, it is clear that although some respondents found the long questionnaires difficult to answer, the data are reproducible and can provide reliable estimates for a range of physical activity domains. Furthermore, after testing, most sites indicated a preference for the “last 7 d” reference period for population prevalence studies.
A strength of IPAQ is that it was tested in both developed and developing countries, and demonstrated acceptable reliability and validity properties across both, especially in the urban samples. Limitations of this study include the generally volunteer samples from urban settings, albeit from diverse cultures. Only two centers, Sweden and Canada, used representative samples. Overall, the samples were highly active, given the selection effects in the samples used and also given the number of physical activity domains that the IPAQ measures consider. Compared with usual physical activity surveillance tools, such as the BRFSS, which measures mostly leisure time physical activity (LTPA), the IPAQ instrument assesses multiple domains of activity in addition to LTPA. This is needed for an internationally acceptable physical activity measure, especially in developing countries, but will lead to higher prevalence rates, as multiple domains are reported. Given this higher prevalence, new cut-points for “health” may need to be explored. The urban samples did show better reliability than the two rural samples. This may have been due to educational differences and less experience in completing surveys, or to greater daily variability in physical activity patterns or variation in types of activity carried out among rural populations. There were also some differences in reported interpretation of the questions in different cultures, but this did not influence the observed measurement properties of IPAQ. Further work is also required to examine the absolute validity, especially between CSA and self-reported IPAQ data.
Thus, IPAQ can be used with confidence in developed countries or in urban samples in developing countries, but with some caution in rural or low literacy samples from developing countries. Further research is recommended to examine possible cultural or population differences in validity and reliability of IPAQ, as well as further exploration of any regional, gender, age, or socioeconomic differences. It is important to note that the primary target group for IPAQ was middle-aged adults and that IPAQ measurement properties in older adults or adolescents are not known.
The results of the IPAQ study are broadly relevant to a wide range of countries. The content validity of IPAQ is high, because frequency, intensity, and duration of physical activity are assessed, as well as sedentary behavior, which is an emerging concern. The long form evaluates four domains of physical activity (occupational, transport, household, and leisure) that are relevant for intervention planning. IPAQ is suitable for any mode of administration, and examples can be culturally adapted for local populations.
Based on these qualitative and quantitative results, the IPAQ Executive concluded that: first, the IPAQ short “last 7 d” measure could be used for national and regional prevalence studies. To have internationally comparable prevalence studies, one measurement instrument should be used, and the short form IPAQ “last 7 d” is recommended based on participating country preference. The short form is feasible to administer, and there was no difference between the reliability and validity of the short and long IPAQ forms. Second, the long version of IPAQ could be used for research purposes or studies requiring more detail on the separate domains or dimensions of physical activity. Third, questions related to walking and bicycling pace should be excluded from the questionnaire, and fourth, time spent sitting should continue to be an integral part of IPAQ. Finally, caution should be used when comparing population prevalence rates between the long and short versions, because the long version appears to produce higher estimates of physical activity.
This international study has demonstrated that reliable and valid physical activity data can be collected by the IPAQ instruments in many countries. These initial results are promising and suggest that these instruments are ready for use to compare population estimates of physical activity. The World Health Organization (WHO), WHO Pan-American Health Organization, the WHO Mega Country Project, and the European Union are developing international health monitoring projects and are likely to adopt short versions of IPAQ for use in these surveillance systems.
Note: The IPAQ short form “last 7 d” questionnaire is available for download at http://www.ipaq.ki.se; the current (August 2002) telephone administered version is provided in Appendix B.
We recognize the participation of the following persons in the planning and implementation of this project:
Members of the International Consensus Group for the Development of an International Physical Activity Questionnaire: Barbara Ainsworth, Adrian E Bauman, Hamadi Benaziza, Stephen Blair, Michael L. Booth, Cora L. Craig, Alana Diamond, W. Drygas, Ulf Ekelund, Peter Fentem, Shigeru Inoue, Deborah Jones, Toshihito Katsumura, Ilona Kickbusch, Vicki Lambert, Brian Martin, Victor Matsudo, Willem van Mechelen, Pekka Oja, Rimma Potemkina, Michael Pratt, Michael Sjöström, James F. Sallis, Ilkka Vuori, Alexander Woll, and Agneta Yngve.
IPAQ Reliability and Validity Study Group coordinators: Australia: Fiona Bull; Brazil: Victor Matsudo, Sandra Matsudo; Canada: Cora L. Craig, Storm J. Russell; Finland: Pekka Oja; Guatemala: Manuel Ramirez Zea; The Netherlands: Willem van Mechelen; Japan: Toshihito Katsumura; Portugal: Jorge Mota, Luis Sardinha; South Africa: Vicki Lambert; Sweden: Michael Sjöström, Agneta Yngve, Ulf Ekelund; United States, San Diego: James F. Sallis, Jeanne Nichols; United States, South Carolina: Barbara Ainsworth; United Kingdom, Bristol: Mark Davis, Angie Page, Ashley Cooper; United Kingdom, Cambridge: Nicholas J. Wareham.