In epidemiological studies, the issue of interest is the relationship between exposure to a risk factor and the occurrence of disease. Assessing risk factor status for individual subjects is not always straightforward. Variables such as sex and age rarely cause problems, but individual dietary habits are very difficult to measure. Physical activity is another aspect of human behavior that is hard to assess precisely. The various methods used to assess activity or energy expenditure differ both in their ability to estimate this variable and in their suitability for epidemiological studies. The most valid methods, such as the doubly labeled water method or calorimetry, are not feasible in large populations. In large studies a questionnaire is frequently used, because it is a relatively easy and inexpensive method. However, the validity and repeatability of questionnaires can vary, not only between questionnaires but also between populations (11). Therefore, in pilot studies of smaller groups of subjects, questionnaires must be compared with other methods of assessing physical activity that are presumed to be of better quality but are less suitable for large populations.
The European Prospective Investigation into Cancer and Nutrition (EPIC) is a prospective cohort study conducted in seven European countries (16). The Dutch cohort consists of two subcohorts: a group of men and women from the cities of Amsterdam, Doetinchem, and Maastricht, and a group of women recruited from a breast cancer screening project in Utrecht. By means of a questionnaire, subjects report on their usual food intake, their medical history, and lifestyle factors such as smoking and physical activity. In a pilot study, 134 men and women aged 20-70 yr from all four participating cities completed two questionnaires about habitual physical activity in the preceding year. In addition, a 3-day activity diary was completed once a month for 4 months, and energy intake was assessed by means of twelve 24-h dietary recalls. The diary served as the main reference for assessing the relative validity of the activity questionnaires. Results for the total pilot study population will be reported elsewhere (14,15). A subpopulation of 33 elderly women (aged 51-71 yr) from Utrecht also wore a Caltrac accelerometer for 24 h.
In this paper, the results of the repeatability and relative validity analyses are described for the group of elderly women, because the Utrecht subcohort differs in structure from the other subcohort, consisting only of women, and will supply half of the Dutch subjects (approximately 20,000 women) for the EPIC study. Furthermore, the Caltrac measurements were taken only in this subpopulation. Table 1 lists the assessment methods used, the periods of time they refer to, and the months of administration. The pilot study lasted 13 months in total.
Thirty-five women aged 51-71 yr, recruited from a breast cancer screening project in Utrecht, the Netherlands, participated in the pilot study. All subjects signed an informed consent form. Two women whose activity pattern changed markedly during the study period because of retirement were excluded from the analyses, because this genuine change in activity would have artificially lowered the repeatability estimates. Table 2 gives the characteristics of the remaining 33 women.
Physical Activity Questionnaires
Subjects completed two physical activity questionnaires: a modification of the Baecke questionnaire (2) (called the Baecke questionnaire) and a newly developed one (called the pre-EPIC questionnaire).
The modified Baecke questionnaire (14) consisted of 19 questions in the categories work, sports, and leisure time. Three questions, concerning gardening, do-it-yourself activities, and sleeping, were added to the original questionnaire. Most questions were coded on a five-point scale; the question about the subject's job had only three categories. The questionnaire yielded three scores, one each for work activity, sports, and leisure-time activity, which were summed to obtain an overall activity score.
The pre-EPIC questionnaire was more detailed, consisting of 28 questions about rest, transport to and from work, activity at work, household activities, sports, and other activities. Subjects estimated the average amount of time they spent on the different activities within these categories, and energy expenditure was computed from activity-specific energy costs taken from the literature (8). The questionnaire was designed to cover all aspects of an average day, so the time reported should have added up to 24 h. In fact, most subjects (71%) accounted for only 15-19 h. Energy expenditure was therefore corrected by multiplying it by (24 h/time covered). The assumption underlying this correction was that the unreported time was distributed over the activity categories in the same proportions as the reported time.
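The time-coverage correction can be made concrete with a short sketch. It is a minimal illustration, assuming energy expenditure has already been computed per activity category in kcal; the category names, kcal values, and the function name `scale_to_24h` are hypothetical. Because every category is multiplied by the same factor (24 h/time covered), the sketch encodes the stated assumption that underreporting is proportionally the same in all categories.

```python
def scale_to_24h(kcal_by_category: dict[str, float], hours_covered: float) -> dict[str, float]:
    """Scale reported energy expenditure per category to a full 24-h day.

    Hypothetical helper: applies the single correction factor
    24 h / time covered to every category, i.e., it assumes that
    underreporting is proportionally the same in all categories.
    """
    if not 0 < hours_covered <= 24:
        raise ValueError("hours_covered must be in (0, 24]")
    factor = 24.0 / hours_covered
    return {category: kcal * factor for category, kcal in kcal_by_category.items()}


# Illustrative (made-up) questionnaire result covering only 16 h of the day.
reported = {"rest": 600.0, "household": 500.0, "transport": 100.0}
corrected = scale_to_24h(reported, hours_covered=16.0)
print(corrected)                # {'rest': 900.0, 'household': 750.0, 'transport': 150.0}
print(sum(corrected.values()))  # 1800.0
```

With this correction, a subject who accounted for 16 h is scaled up by a factor of 1.5, and one who accounted for 19 h by about 1.26.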
Both questionnaires were administered three times during the 13-month study period: at baseline and at 5 and 11 months. This made it possible to evaluate whether reporting of “usual” activity is influenced by season.
An additional question asked whether the subject had sweated or been out of breath while active during the past week, because relationships have been described between self-reported sweating and physical activity (18), and between sweating and a lower incidence of coronary heart disease (12). The answer was coded as 0 (never during the past week) or 1 (1 or more times).
A 3-day activity diary was administered once in each of four consecutive months. On average, the recorded days were evenly distributed over all 7 days of the week. The diary was based on the one developed by Bouchard et al. (4). The 3 days were divided into 5-min periods, and for every period subjects assigned one of 10 letter codes corresponding to specific activities (e.g., W for walking or S for sitting). The letter T was used for training or sporting activities, and X for activities that did not correspond to any of the other codes; when T or X was used, subjects made a note identifying the activity. The aim of the diary was to estimate total daily energy expenditure, which was computed per 24 h using tables of activity-specific energy costs from the literature (8). Bouchard et al. (4) validated the original instrument and found it suitable for estimating energy expenditure in population studies. In a review article (9), LaPorte et al. state: “Although diary procedures using published values of activity intensity may not provide accurate estimates of caloric expenditure, they seem to be adequate to rank-order individuals according to overall activity levels.”
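To illustrate how a diary of 5-min periods turns into a daily energy expenditure, the sketch below sums an energy cost for each coded period and averages over the recorded days. It is only a sketch: the per-minute costs are hypothetical placeholders, not the values from the tables cited in (8), and the code "L" for lying or sleeping is an assumed code (the text names only W, S, T, and X). Periods coded X (and, in a real analysis, T) would need the noted activity looked up separately; here an unknown code simply raises an error.

```python
PERIOD_MIN = 5                           # each diary period covers 5 min
PERIODS_PER_DAY = 24 * 60 // PERIOD_MIN  # 288 periods per day

# Hypothetical energy costs in kcal/min per letter code (placeholders only).
HYPOTHETICAL_KCAL_PER_MIN = {
    "L": 1.0,  # assumed code: lying/sleeping
    "S": 1.5,  # sitting
    "W": 3.5,  # walking
    "T": 6.0,  # training/sports (a real analysis would use the noted activity)
}


def mean_daily_kcal(days: list[str], kcal_per_min: dict[str, float]) -> float:
    """Average energy expenditure per 24 h over the recorded diary days."""
    if not days:
        raise ValueError("no diary days given")
    total = 0.0
    for day in days:
        if len(day) != PERIODS_PER_DAY:
            raise ValueError(f"expected {PERIODS_PER_DAY} periods, got {len(day)}")
        for code in day:
            if code not in kcal_per_min:
                raise KeyError(f"no energy cost for code {code!r}; look up the noted activity")
            # energy for one period = cost per minute x period length
            total += kcal_per_min[code] * PERIOD_MIN
    return total / len(days)


# One illustrative day: 8 h lying, 12 h sitting, 3 h walking, 1 h training.
day = "L" * 96 + "S" * 144 + "W" * 36 + "T" * 12
print(mean_daily_kcal([day, day, day], HYPOTHETICAL_KCAL_PER_MIN))  # 2550.0
```

Keeping the costs in a lookup table separates the recording step (letter codes) from the valuation step (energy costs), so the same diaries can be re-scored if a different cost table is used.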
For the whole pilot study, the activity diary was chosen as the reference method because it is a recording method, whereas the questionnaires are recall methods covering the past year. The errors of the two methods were therefore assumed to be relatively uncorrelated.
The Caltrac (Hemokinetics, Madison, WI) is a portable accelerometer that is worn on the waist belt. It measures accelerations of the body's center of gravity. After height, weight, age, and sex are entered into the device, body accelerations are converted into kilocalories. The Caltrac is programmed to calculate the resting metabolic rate (RMR) in kcal·min⁻¹ using the following formula for women: RMR = (331 × weight (pounds) + 351 × height (inches) - 352 × age + 49854)/100,000 (formula copyrighted by Hemokinetics, Inc.). In this study, the Caltracs were programmed with each subject's height, weight, age, and sex, thus yielding the total number of kilocalories spent in 24 h. The kilocalorie score divided by RMR can be interpreted as activity counts (1); these are the values presented in the tables.
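The RMR formula quoted above can be written as a small function. This is a sketch for illustration only: the function names are hypothetical, the metric-to-imperial conversion is added on the assumption that weight and height were recorded in kg and cm, and the activity-count helper simply divides the 24-h kilocalorie score by RMR as the text describes, without assuming anything further about how the device scales the result.

```python
KG_PER_POUND = 0.45359237  # exact definition of the avoirdupois pound
CM_PER_INCH = 2.54         # exact definition of the inch


def caltrac_rmr_women(weight_lb: float, height_in: float, age_yr: float) -> float:
    """RMR in kcal/min from the women's formula quoted in the text."""
    return (331 * weight_lb + 351 * height_in - 352 * age_yr + 49854) / 100_000


def caltrac_rmr_women_metric(weight_kg: float, height_cm: float, age_yr: float) -> float:
    """Same formula, converting kg and cm to pounds and inches first (assumed input units)."""
    return caltrac_rmr_women(weight_kg / KG_PER_POUND, height_cm / CM_PER_INCH, age_yr)


def activity_counts(kcal_24h: float, rmr_kcal_per_min: float) -> float:
    """Kilocalorie score divided by RMR, interpreted as activity counts (1)."""
    return kcal_24h / rmr_kcal_per_min


# Illustrative woman: 150 lb (68.0388555 kg), 64 in (162.56 cm), 60 yr.
print(round(caltrac_rmr_women(150, 64, 60), 5))                    # 1.00848
print(round(caltrac_rmr_women_metric(68.0388555, 162.56, 60), 5))  # 1.00848
```

For this illustrative woman the formula gives about 1.01 kcal·min⁻¹, or roughly 1,450 kcal per 24 h at rest.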
Several studies have been conducted to validate the Caltrac. It has been shown to be valid for detecting interindividual differences in energy expenditure, but results for estimating individual activity are inconclusive (3,13). In this study, a single 24-h Caltrac reading was used.
Different participants wore the Caltracs on different days of the week, because the activity diary also covered all days of the week; in this way, energy expenditure from the two methods would be comparable at the group level. At night the Caltracs were detached from the waist, but they continued to count resting metabolism while they were not worn. The study design did not allow the Caltracs to be worn on Sundays or for more than 1 day.
Because the Caltrac is a mechanical device, it provides an objective measurement; its error therefore presumably does not correlate with that of the other two activity measurements.
Mean daily energy intake was assessed using 12 monthly 24-h dietary recalls, distributed equally over all days of the week. If body weight remains stable throughout the relevant year, mean energy intake should be approximately equal to mean total energy expenditure. In this study, body weight was measured five times, and there was no trend toward either a decrease or an increase in mean body weight.
Means and standard deviations were calculated for all measurements. For the assessment of reproducibility and relative validity, Pearson's correlation coefficients were calculated to evaluate the strength of linear relationships between variables. For each measurement, subjects were classified as active or inactive, using the mean score as the cutoff. The percentage of agreement and kappa (the percentage of agreement corrected for chance) were computed for every combination of methods.
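The statistics described above can be sketched in plain Python. This is a minimal illustration, not the software used in the study; the function names are hypothetical, and scores exactly equal to the mean are assumed to count as inactive, since the text does not say how ties at the cutoff were handled. Kappa is computed as Cohen's kappa for two dichotomous classifications.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists with at least 2 subjects")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def classify_by_mean(scores: list[float]) -> list[bool]:
    """True = active (score above the mean); ties count as inactive (assumption)."""
    mean = sum(scores) / len(scores)
    return [s > mean for s in scores]


def agreement_and_kappa(x: list[float], y: list[float]) -> tuple[float, float]:
    """Proportion of agreement and Cohen's kappa for mean-split classifications."""
    a, b = classify_by_mean(x), classify_by_mean(y)
    n = len(a)
    p_observed = sum(u == v for u, v in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    # Agreement expected by chance from the marginal proportions of each method.
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        raise ValueError("kappa is undefined when all subjects fall in one class")
    return p_observed, (p_observed - p_expected) / (1 - p_expected)


# Illustrative scores for four subjects on two occasions (made-up numbers).
first, second = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]
print(pearson_r(first, second))            # 0.8
print(agreement_and_kappa(first, second))  # (0.5, 0.0)
```

The example shows why both statistics are reported: the scores correlate fairly strongly (r = 0.8), yet the mean-split classifications agree no better than chance (kappa = 0).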
Because physical activity is a complex behavioral entity, factor analysis was carried out to investigate whether the various measurement methods estimated the same facet of activity or different aspects. The outcomes of subjects who reported sweating or being out of breath were compared with those of subjects who did not, after adjusting for age differences.
Means and standard deviations for the activity measurement methods are presented in Table 1. Energy expenditure estimated from the activity diary was significantly higher than energy expenditure computed from the pre-EPIC questionnaire and energy intake assessed by the dietary recalls (paired t-test, P < 0.05). For this group of women, repeatability (Table 3) was higher for the Baecke than for the pre-EPIC questionnaire.
Table 4 presents correlation coefficients among the various measurement methods. Correlations varied widely, from -0.43 to 0.64. The two questionnaires and the diary intercorrelated reasonably well (0.45-0.64), whereas the Caltrac showed low-to-moderate correlations with the other methods. The pre-EPIC questionnaire and, to a lesser extent, the Baecke questionnaire and the diary correlated negatively with energy intake. The same table presents percentages of agreement and Cohen's kappa between the methods, which show roughly the same pattern. However, although the Caltrac correlated rather poorly with the questionnaires and the diary, its agreement with these methods was better.
To find out whether these activity measures represented different facets of physical activity or measured common aspects, a factor analysis was performed using the five measurement methods as variables. Subjects with any of the five measurements missing were excluded, leaving 27 women. Two factors were extracted by principal-components analysis: factor 1 accounted for 47.4% of the total variance and factor 2 for 23.6%, so together they explained 71% of the total variance. Table 5 shows the rotated factor matrix; its coefficients can be interpreted as correlations between the variables and the factors. Factor 1 correlated strongly with the questionnaires and the diary, and moderately with the Caltrac and energy intake. Factor 2 correlated strongly with the Caltrac reading and energy intake.
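The reported percentages can be related to the eigenvalues of the components. Assuming, as is usual for principal-components analysis, that the analysis was run on the correlation matrix of the p = 5 standardized measures (so that the total variance equals 5) and that the percentages refer to the unrotated components, the share of variance of component k is its eigenvalue divided by p. The eigenvalues below are back-calculated for illustration and are not reported in the text.

```latex
\frac{\lambda_k}{p} = \text{share of total variance of component } k, \qquad p = 5
\lambda_1 \approx 0.474 \times 5 = 2.37, \qquad
\lambda_2 \approx 0.236 \times 5 = 1.18, \qquad
\frac{\lambda_1 + \lambda_2}{p} = \frac{3.55}{5} = 0.710
```

An orthogonal rotation redistributes variance between the two factors but leaves their combined share (71%) unchanged.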
Women who reported perspiring or being out of breath had higher Baecke scores than women who did not (Table 6). The difference between the two groups was concentrated in the work score of this questionnaire; differences in the sports and leisure-time scores were not significant. The other measurement methods showed no significant differences.
Physical (in)activity is an important issue in studies of nutrition, lifestyle, and the occurrence of chronic diseases, whether it is regarded as a confounder in the relationship between diet and disease or studied as a risk factor. Cardiovascular disease (12), several forms of cancer (17), diabetes (6), and osteoporosis (5) are examples of diseases that are probably related to physical inactivity.
Depending on the goal of the study, a physical activity assessment method should measure individual energy expenditure or at least rank subjects correctly according to their physical activity. Most large-scale studies use a questionnaire to estimate physical activity because it is inexpensive and relatively simple.
In the Dutch EPIC pilot study, two questionnaires proposed for use in the cohort study were tested for repeatability and validity in 134 men and women from both subcohorts. Reproducibility of the modified Baecke questionnaire was 0.85 at 5 months and 0.80 at 11 months for men, and 0.83 and 0.77, respectively, for women. Pearson's correlations with the activity diary were 0.56 for men and 0.44 for women (14). Reproducibility of the pre-EPIC questionnaire was 0.86 (5 months) and 0.80 (11 months) in men, and 0.62 and 0.68, respectively, in women. Correlations with the activity diary were 0.70 for men and 0.60 for women (15). Here, the results for the women from the Utrecht subcohort are described, because this subcohort is an older group consisting only of women, and the validity of the questionnaires may differ between populations. Moreover, the Caltrac was used only in this subgroup.
Repeatability of the Baecke questionnaire was good (0.82 and 0.73 at 5 and 11 months, respectively), whereas that of the pre-EPIC questionnaire was poor to moderate (0.42 and 0.60). Answers on the Baecke questionnaire are given on a five-point scale, and such answers may be more easily (unintentionally) reproduced than those on the pre-EPIC questionnaire, which are estimates of time spent on different activities. For the pre-EPIC questionnaire, 11-month repeatability was much better than 5-month repeatability, and it improved to 0.80 after exclusion of one outlier in the October 1992 pre-EPIC questionnaire. The very low 5-month repeatability may partly be caused by seasonal influences. This drop in the 5-month reproducibility of the pre-EPIC questionnaire was less pronounced in the total female pilot study population; seasonal activities (walking, cycling) may be more important for older women. On the other hand, since no such effect was seen in men or for the other questionnaire, the low correlation may have been due to chance in such a small population.
Jacobs et al. (7) tested 10 questionnaires for repeatability in 78 men and women aged 20-59 yr. Most 1-month test-retest correlation coefficients exceeded 0.75, whereas the long-term (1-yr) repeatability of two questionnaires was lower, with correlations of 0.30-0.71. Wolf et al. (19) evaluated the repeatability of the physical activity and inactivity questionnaire used in the Nurses' Health Study II in a representative (white) sample and an African-American sample; 2-yr repeatability coefficients were 0.59 and 0.39, respectively.
Both the Baecke and the pre-EPIC questionnaires correlated reasonably well with the activity diary (r = 0.51 and 0.64, respectively). In contrast to the repeatability results, the pre-EPIC questionnaire had the higher correlation, which can partly be explained by the use of the same literature source to calculate energy expenditure per activity for both the pre-EPIC questionnaire and the diary. Both questionnaires correlated poorly with the Caltrac reading (r = 0.2). A single 24-h measurement is not sufficient to estimate an individual's usual activity (11), but the already heavy study schedule did not permit additional Caltrac recordings. A better relationship was expected when the Caltrac reading was compared with the activity diary for the same 24-h period; in fact, the correlation improved only to 0.45 (P < 0.05). Only the classification into high and low activity levels showed agreement between the questionnaires and the Caltrac. These poor results are more likely caused by the imperfect single-day Caltrac measurement than by the fact that the questionnaires measure physical activity subjectively whereas the Caltrac measures it objectively. Energy intake correlated negatively with most of the other activity measurements. Possibly, women who are more concerned about their health eat less and are more active, or underreport their energy intake more than others.
Correlations between reference methods and activity questionnaires reported in the literature range from low to high. Jacobs et al. (7) used several reference methods, including wearing the Caltrac for 28 days, to assess the validity of 10 physical activity questionnaires; most correlations between the questionnaires and the reference methods were low to moderate. Miller et al. (10) recently compared five activity questionnaires with a 7-day Caltrac measurement and found correlations of 0.25-0.79, the highest (0.79) being for a 7-day activity recall covering the same days the Caltrac was worn. Wolf et al. (19) assessed the validity of the Nurses' Health Study II physical activity and inactivity questionnaire against four 7-day recalls and diaries; correlations ranged from 0.41 to 0.83.
The factor analysis extracted two factors. Factor 1 was mainly related to the Baecke questionnaire, the pre-EPIC questionnaire, and the diary, and to a lesser extent to the Caltrac reading and (negatively) to energy intake. This factor accounted for almost 50% of the total variance and probably represents usual physical activity. The interpretation of the second factor is less clear. It shows rather strong correlations with the Caltrac reading (short-term physical activity) and with energy intake (12 recalls over a year). If three factors are extracted instead of two, this second factor splits into two factors, one correlating very strongly with the Caltrac reading (0.94) and one with energy intake (0.93). The second factor therefore probably does not represent a separate aspect of physical activity but rather a residual category. The factor analysis results suggest that the three measurements with the highest intercorrelations, i.e., the two questionnaires and the diary, indeed measure a common entity, namely usual physical activity. The other two methods are more weakly related to usual activity and probably reflect other aspects of energy expenditure.
The Baecke questionnaire was the only method that showed a significant difference between women who reported perspiring or being out of breath in the past week and those who did not. The other measurements showed only small differences between the two groups. Perhaps the reporting of perspiration or breathlessness is more subjective than it seems and therefore correlates more strongly with the Baecke questionnaire, which is also rather subjective. However, it is not clear why the difference was seen only in the work section of the questionnaire.
The coefficients of repeatability and relative validity found in this study were within the range reported by other authors. The relatively low correlations found in all these studies illustrate the difficulty of measuring physical activity by questionnaire. A further problem is the absence of a true gold standard that is applicable in large populations, and comparisons between studies are hampered by differences in methodology and time frame.
The results of this study indicate that these questionnaires can be used to rank older women into categories of relatively high and low physical activity. The Baecke questionnaire is quick and easy to complete, whereas the pre-EPIC questionnaire is more detailed and probably gives a better estimate of energy expenditure.
1. Ainsworth, B. E., D. R. Jacobs, Jr., and A. S. Leon. Validity and reliability of self-reported physical activity status: the Lipid Research Clinics questionnaire. Med. Sci. Sports Exerc.
2. Baecke, J. A. H., J. Burema, and J. E. R. Frijters. A short questionnaire for the measurement of habitual physical activity in epidemiological studies. Am. J. Clin. Nutr.
3. Balogun, J. A., D. A. Martin, and M. A. Clendenin. Calorimetric validation of the Caltrac accelerometer during level walking. Phys. Ther.
4. Bouchard, C., A. Tremblay, C. Leblanc, G. Lortie, R. Savard, and G. Theriault. A method to assess energy expenditure in children and adults. Am. J. Clin. Nutr.
5. Gutin, B. and M. J. Kasper. Can vigorous exercise play a role in osteoporosis prevention? A review. Osteoporos. Int.
6. Helmrich, S. P., D. R. Ragland, and R. S. Paffenbarger, Jr. Prevention of non-insulin-dependent diabetes mellitus with physical activity. Med. Sci. Sports Exerc.
7. Jacobs, D. R., Jr., B. E. Ainsworth, T. J. Hartman, and A. S. Leon. A simultaneous evaluation of 10 commonly used physical activity questionnaires. Med. Sci. Sports Exerc.
8. James, W. P. T. and E. C. Schofield. Appendix 4.2: Energy cost of physical activity classified in alphabetical order. In: Human Energy Requirements. Oxford: Oxford University Press, 1990, pp. 133-135.
9. LaPorte, R. E., H. J. Montoye, and C. J. Caspersen. Assessment of physical activity in epidemiologic research: problems and prospects. Public Health Rep.
10. Miller, D. J., P. S. Freedson, and G. M. Kline. Comparison of activity levels using the Caltrac accelerometer and five questionnaires. Med. Sci. Sports Exerc.
11. Montoye, H. J., H. C. G. Kemper, W. H. M. Saris, and R. A. Washburn (Eds). Measuring Physical Activity and Energy Expenditure. Champaign, IL: Human Kinetics, 1995, pp. 43-62, 72-96.
12. Morris, J. N. Exercise in the prevention of coronary heart disease: today's best buy in public health. Med. Sci. Sports Exerc.
13. Pambianco, G., R. R. Wing, and R. Robertson. Accuracy and reliability of the Caltrac accelerometer for estimating energy expenditure. Med. Sci. Sports Exerc.
14. Pols, M. A., P. H. M. Peeters, H. B. Bueno de Mesquita, et al. Validity and repeatability of a modified Baecke questionnaire on physical activity. Int. J. Epidemiol.
15. Pols, M. A., P. H. M. Peeters, M. C. Ocke, et al. Relative validity and repeatability of a new questionnaire on physical activity. Prev. Med.
16. Riboli, E. Nutrition and cancer: background and rationale of the European Prospective Investigation into Cancer and Nutrition (EPIC). Ann. Oncol.
17. Sternfeld, B. Cancer and the protective effect of physical activity: the epidemiological evidence. Med. Sci. Sports Exerc.
18. Washburn, R. A., S. R. W. Goldfield, K. W. Smith, and J. B. McKinlay. The validity of self-reported exercise-induced sweating as a measure of physical activity. Am. J. Epidemiol.
19. Wolf, A. M., D. J. Hunter, G. A. Colditz, et al. Reproducibility and validity of a self-administered physical activity questionnaire. Int. J. Epidemiol.