Improving Self-Reports of Active and Sedentary Behaviors in Large Epidemiologic Studies : Exercise and Sport Sciences Reviews


Perspective for Progress

Improving Self-Reports of Active and Sedentary Behaviors in Large Epidemiologic Studies

Matthews, Charles E.1; Moore, Steven C.1; George, Stephanie M.2; Sampson, Joshua3; Bowles, Heather R.2

Exercise and Sport Sciences Reviews: July 2012 - Volume 40 - Issue 3 - p 118-126
doi: 10.1097/JES.0b013e31825b34a0


Passmore and Durnin (34) painstakingly developed a methodology to estimate energy expenditure in free-living humans, involving direct observation, time diaries, and metabolic measures. This approach was simplified in the course of developing physical activity questionnaires that were designed to examine the relation between usual physical activity levels and disease in large-scale epidemiologic studies (29). These questionnaires, which typically rely on long-term recall to estimate usual levels of exposure, have been invaluable in demonstrating the numerous health benefits of physical activity (36) and, more recently, the adverse effects of sedentary behaviors (33). Yet, the questionnaires used in these studies are likely to contain substantial measurement error and, in terms of physical activity, at best, only capture 50% of the variation in objectively measured activity energy expenditure (31).

Measurement errors in prospective epidemiologic studies usually attenuate or reduce the magnitude of observed behavior-disease associations, resulting in a loss of statistical power for the hypothesis being tested (39). Furthermore, quantitative measures of the amount (or dose) of exposure associated with either benefit (physical activity) or risk (sedentary time) may be biased because of these errors (41). If the errors are sufficiently large (15), measurement error could pose considerable challenges for translating results from epidemiologic studies to physical activity guidelines that inform health promotion efforts and public policy.

In this article, we focus on the measurement error problem for self-reports of “usual” levels of active and sedentary behaviors in studies designed to provide quantitative estimates of health risks associated with a given level of these exposures. We use the term usual to indicate a long-term average dose or volume of these behaviors (e.g., >1 yr), and make a distinction between self-report methods that use long-term recall and averaging to estimate usual behavior (i.e., questionnaires) and methods that use short-term recall of behavior to estimate usual levels of activity or sedentary behavior. Given their limited ability to evaluate dose-response relationships, we do not consider questionnaires that were designed only to classify individuals into broad categories of activity (e.g., instruments such as the Lipid Research Clinics and Stanford Usual Activity Questionnaires).

In the first section of the article, we review the strengths and limitations of existing questionnaires that commonly assess usual levels of active and sedentary behaviors. Next, we describe the consequences of measurement errors in these questionnaires in epidemiologic studies and then consider the available options for minimizing these consequences and/or reducing the level of error in the exposures by using better measures. In the final section, we discuss the potential use for short-term recalls to provide less error-prone estimates of usual levels of exposure in large-scale epidemiologic studies.

Strengths and Weaknesses of Physical Activity and Sedentary Behavior Questionnaires

There is ample evidence from observational studies that questionnaire-based physical activity measures are associated with reduced risk for many chronic diseases such as diabetes, cardiovascular disease, and osteoporosis, as well as certain cancers (e.g., colon, breast, and endometrial) (30,36). In addition, relative to a broad range of biological (e.g., fitness, fatness), objective (doubly labeled water, accelerometers), and other self-report (e.g., diaries) comparison measures, there is evidence that many physical activity questionnaires are able to capture valuable information (45). Results from these studies suggest that many questionnaires can provide a useful ranking of active or sedentary behaviors, but their major limitation is that the level of error in quantifying dose or absolute volume is large.

Reporting errors in assessments of active and sedentary behaviors emanate from misreporting of two basic elements of dose: 1) the usual duration of the behaviors reported, or 2) the intensity of the activities reported (34) in relation to relevant exposure metrics (e.g., metabolic equivalents (METs), bone loading) (3,14). For the sake of simplicity, we will consider errors in duration and intensity separately, although we recognize that errors in determining intensity can affect the errors in duration. In general, the approach to assessing the usual amount of time spent engaged in specific types of behavior has been to directly ask about the usual duration (per week or per day) of the activity or to use a decomposition strategy that asks for information about activity frequency (i.e., number of months, days per week) and duration (average time per occasion) separately. Reporting errors in one or both of these decomposed elements can result in large errors in the estimate of usual duration. Interestingly, Passmore and Durnin (34) were keenly aware of the importance of obtaining accurate duration estimates in their measures: “In estimating the expenditure of any individual, it is our experience that larger errors are likely to arise from the failure to determine correctly the length of time spent in any activity rather than in any assessment of the metabolic cost of that activity.” Doing more to reduce the magnitude of the errors in reported duration in active and sedentary behaviors may be one opportunity to substantially reduce the errors in our measurements.
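The decomposition strategy described above can be sketched in a few lines. This is a minimal illustration, not a scoring rule from any specific questionnaire: the function name, its parameters, and the example values are all hypothetical. It shows how modest misreports in frequency and per-occasion duration multiply into a large error in usual duration.

```python
def usual_hours_per_week(months_per_year, days_per_week, minutes_per_occasion):
    """Usual weekly duration from decomposed frequency and duration reports.

    Hypothetical scoring rule: the fraction of the year the activity is done,
    times days per week, times minutes per occasion, converted to hours.
    """
    return (months_per_year / 12) * days_per_week * minutes_per_occasion / 60


# Hypothetical respondent: truly active 6 months/yr, 3 days/wk, 40 min/occasion.
true_hours = usual_hours_per_week(6, 3, 40)

# The same respondent overreports by one day per week and 20 min per occasion.
reported_hours = usual_hours_per_week(6, 4, 60)

print(true_hours, reported_hours)  # 1.0 2.0
```

In this made-up case, two individually plausible misreports double the estimate of usual duration.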

To consider the influence of activity intensity on health, reports of usual activity duration typically are combined with standard intensity values, such as METs or bone loading units (14), to estimate a duration-intensity weighted metric for the activities reported (e.g., MET-hours per day). It is recognized that standard intensity values may not reflect the relative intensity of the activity performed and that, for many activities, there can be large interindividual and intraindividual variation in the physiological effects of a given activity (2,41). This latter caveat may be exacerbated for questionnaire items that ask about a broad range of activities (e.g., household chores) or that use physiological cues to help classify the energy cost of the activity (e.g., increased heart rate, sweating). Analytic errors in the intensity component arise when a fixed tabular value (e.g., a MET value) is applied to an individual’s report of an activity, whereas reporting errors in intensity arise when respondents place a behavior in the wrong intensity category (e.g., reporting a light activity as moderate). Reducing intensity-reporting errors also may be an important approach to reducing overall measurement errors in self-report instruments.
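The duration-intensity weighting described above amounts to a weighted sum. The sketch below uses made-up MET values chosen only for illustration; a real study would take them from a published compendium, and the names `ILLUSTRATIVE_METS` and `met_hours_per_day` are hypothetical.

```python
# Illustrative MET values only (assumptions, not taken from the article).
ILLUSTRATIVE_METS = {
    "walking": 3.5,
    "household chores": 2.5,
    "watching television": 1.0,
}


def met_hours_per_day(reported_hours):
    """Sum of reported hours weighted by a fixed tabular MET value per activity."""
    return sum(ILLUSTRATIVE_METS[activity] * hours
               for activity, hours in reported_hours.items())


# Hypothetical daily report: hours per activity.
day = {"walking": 1.0, "household chores": 2.0, "watching television": 3.0}
print(met_hours_per_day(day))  # 11.5
```

Because every respondent receives the same tabular value for an activity, any individual difference in true energy cost shows up as the analytic error discussed in the text.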

The Cognitive Demands Involved in Reporting Long-Term Averages are Extraordinary

Reporting autobiographical information on a questionnaire about usual participation in active and sedentary behaviors forces respondents to retrieve and organize a great deal of information to formulate a response (27). It long has been known that vigorous activities (often, more structured exercise) tend to be more reliably reported than moderate-intensity activities (37,45) and that other lower intensity daily activities (e.g., nonexercise activity), often performed in several short bouts within a day, are the least reliably reported. Indeed, questions about household activities were dropped from early questionnaires because of the difficulties associated with reporting them (29). A striking example of the challenges associated with reliably assessing common daily activity was observed by DiPietro et al. (12) in their examination of the test-retest reliability of the Yale Physical Activity Survey. Figure 1 illustrates that test-retest reproducibility (i.e., reliability), indicating the ability of respondents to provide consistent answers for specific activities on the questionnaire, is best for less frequent activities done in specific episodes and worst for the most prevalent daily activities (27). Instruments to assess sedentary behaviors are starting to appear and, consistent with physical activity, more structured sedentary behaviors seem to be more reliably reported (17).

Figure 1:
Reproducibility and prevalence of reporting specific activities on the Yale Physical Activity Survey. Adapted from data presented in (12).

Studies using advanced activity monitors provide insight into the magnitude of the cognitive demands associated with reporting usual levels of activity, particularly common daily activities. Levine et al. (24) recently reported that adults engaged in an average of 47 bouts of active and sedentary behaviors each day and that the average amount of time spent upright and ambulatory was about 6.5 h per day, mostly accumulated in short bouts of activity. Assuming these estimates are representative for adults, to report literally what they usually did over 1 month, a respondent would have to cognitively process information about 1400 bouts of activity and nearly 200 h of active time. Clearly, the cognitive demands are staggering, and, thus, it is not surprising that errors in reporting physical activity by questionnaire, particularly common daily activities, are common.
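The arithmetic behind these figures is simple to reproduce. The sketch below assumes a 30-day month, which the text does not state; the daily values are the ones quoted above.

```python
# Daily figures quoted in the text (Levine et al.).
BOUTS_PER_DAY = 47
ACTIVE_HOURS_PER_DAY = 6.5


def recall_burden(days):
    """Return (bouts, active hours) a respondent would have to summarize."""
    return BOUTS_PER_DAY * days, ACTIVE_HOURS_PER_DAY * days


bouts, hours = recall_burden(30)  # assumption: a 30-day month
print(bouts, hours)  # 1410 195.0
```

These totals round to the "1400 bouts" and "nearly 200 h" cited in the paragraph.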

Measurement Error in Questionnaires Attenuates Behavior-Disease Associations

Studies that concurrently have evaluated risk for mortality associated with low levels of objectively measured physical activity energy expenditure and activity reported by questionnaire have indicated that associations with measured activity energy expenditure are much stronger than those obtained by self-report. Manini et al. (26) examined mortality outcomes in relation to physical activity energy expenditure measured by doubly labeled water (DLW) among older adults and noted nearly a 70% reduction in risk among the most active participants as measured by DLW but no association with self-reported activity. In addition, studies that have measured cardiorespiratory fitness as well as physical activity reported using a questionnaire have indicated that associations with objectively measured fitness are consistently stronger than those with self-reported physical activity (8). Collectively, these data are consistent with the notion that measurement errors in physical activity questionnaires attenuate the strength of associations and indicate that the impact of the errors may be substantial. Although we know less about the potential measurement error in reported sedentary behaviors, it is likely that attenuation due to error may obscure these associations as well.

Although attenuation of the true associations between active and sedentary behaviors and disease often is discussed as a limitation of etiologic studies, the actual level of attenuation is unknown. Measurement error models can quantify these effects. Here, we introduce a simple model to describe these errors and use information derived from the model to assess the impact of random errors on epidemiologic associations (i.e., attenuation). To quantify these parameters, and the magnitude of the attenuation, consider the simple model

$$Q_i = T_i + \varepsilon_i \tag{1}$$

where $Q_i$ is an unbiased estimate of the true value ($T_i$) for individual $i$. The additional term ($\varepsilon_i$) is random error with a mean of 0 and variance $\sigma_\varepsilon^2$.

For example, a study might be interested in testing the hypothesis that time spent sitting and watching television is associated with increased risk for endometrial cancer. Investigators would use a questionnaire to estimate the true amount of exposure (Ti) but with some level of random error. The questionnaire-based estimate of television viewing (Qi) would then be used to quantify any association with this health outcome. If the level of random error in questionnaire is small, then Qi is a good approximation of Ti, or the true amount of sitting and watching television and any real signal between television and endometrial cancer would be observable. However, if the amount of random error on the questionnaire was large, say 100% of the true value, then the questionnaire would provide a poorer approximation of Ti, and the signal between television watching as measured by questionnaire and the outcome would be obscured by the “noise” associated with random errors. In this simple model, the amount of attenuation of the true behavior-disease association that is due to measurement error in the questionnaire can be quantified as an attenuation factor (4,22). Specifically, the attenuation factor (λ) is defined as follows:

$$\lambda = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_\varepsilon^2} \tag{2}$$

where the variance of the true measure is $\sigma_T^2$ (4). When the measurement errors are very small, the attenuation factor is close to 1.0, but as these errors increase, the attenuation factor typically gets smaller, as does the strength of the associations that can be observed.
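A small simulation can make the attenuation factor concrete. The sketch below assumes the classical-error form, λ = σ_T² / (σ_T² + σ_ε²), for a questionnaire value equal to the true value plus independent random error. The outcome model, sample size, and function names are hypothetical choices made for illustration. It compares the slope of an outcome regressed on the error-prone measure with the slope regressed on the true exposure.

```python
import random


def attenuation_factor(var_true, var_error):
    """Attenuation factor under the classical model Q = T + error (assumed form)."""
    return var_true / (var_true + var_error)


def _slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulated_attenuation(var_true, var_error, n=20000, seed=0):
    """Ratio of the slope on Q to the slope on T in simulated data."""
    rng = random.Random(seed)
    t = [rng.gauss(0.0, var_true ** 0.5) for _ in range(n)]
    q = [ti + rng.gauss(0.0, var_error ** 0.5) for ti in t]
    # Hypothetical continuous outcome that depends linearly on true exposure.
    y = [2.0 * ti + rng.gauss(0.0, 1.0) for ti in t]
    return _slope(q, y) / _slope(t, y)


print(attenuation_factor(1.0, 1.0))     # 0.5
print(simulated_attenuation(1.0, 1.0))  # close to 0.5; exact value depends on the seed
```

When the error variance equals the true variance, the slope estimated from the questionnaire is about half the slope that the true exposure would give.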

As an approximation, if we let the relative risk (RR), or the risk for disease comparing high with low levels of an exposure, denote the strength of the underlying association between the true exposure ($T_i$) and the outcome, then the magnitude of the RR that is observable with the questionnaire can be estimated as $RR^{\lambda}$ (4). Therefore, if the attenuation factor is 0.5, and the true RR for endometrial cancer is increased 1.20 times for each additional hour of television viewing, we would observe a RR of only 1.10 using the questionnaire (i.e., $RR = 1.20^{0.5} \approx 1.10$). Similarly, if the true RR for television viewing and heart disease is 4.0, we would observe a RR of only 2.0 using the questionnaire (i.e., $RR = 4.0^{0.5} = 2.0$).
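The two worked examples above follow directly from raising the true RR to the power of the attenuation factor. A minimal sketch, with a hypothetical function name:

```python
def observed_rr(true_rr, attenuation):
    """Approximate RR observable with an error-prone measure: true_rr ** lambda."""
    return true_rr ** attenuation


print(round(observed_rr(1.20, 0.5), 2))  # 1.1
print(round(observed_rr(4.0, 0.5), 2))   # 2.0
```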

In addition to random error, self-reports also can include systematic errors or biased reports of active and sedentary behavior, and these errors can further decrease the attenuation factor and can quickly reduce the magnitude of the relative risks that are observable in etiologic studies to an undetectable value.

Improving Self-Report Measures and Obtaining More Accurate Behavior-Disease Associations

In Figure 2, we present three basic options for reducing the impact of measurement errors in epidemiologic research on physical activity and health. The first uses statistical methods to quantify and correct for errors in questionnaires, whereas the latter options reflect exposure assessment methods that simply are less error prone. The options are as follows: 1) use measurement error correction methods to minimize the impact of reporting errors on questionnaires (42), 2) use objective indicators of active and sedentary behaviors to eliminate reporting errors, or 3) use short-term recalls to reduce the magnitude of the reporting errors in estimates of usual levels of behavior. Hybrids of these basic options also are possible. For example, a calibration study outlined in Option 1 (see next section) also could be applied to Option 3 to adjust for random and systematic errors present in short-term recalls (32), and measurement error correction approaches also could be applied to minimize intraindividual error in activity monitor data (46). In the remainder of the report, we discuss the problems and prospects associated with the three basic options outlined in Figure 2.

Figure 2:
Options for improving measures of activity-related behaviors and obtaining better estimates of true behavior-disease associations

Option 1. Use Measurement Error Correction to Minimize Impact of Errors in Questionnaires

The first option is to evaluate the measurement error in questionnaires that assess usual levels of active and sedentary behaviors through a calibration study and then adjust the strength of the associations observed using measurement error correction methods (e.g., 21,32,42).

The calibration study measures the level of relevant behaviors on a small subset of study participants with a reference instrument, which is presumed to be more accurate than the questionnaire used in the larger study. With this information, we can reconstruct an estimate of the true effect size from our study. In the simplest case described earlier (equation 1), we could estimate the true RR by exponentiation of the naive RR using the inverse attenuation factor ($1/\lambda$). However, usually, such reconstruction requires more complex measurement error models. Here, we expand equation 1 to accommodate this complexity. General “activity-related” bias, or systematic errors that are expressed over the range of the exposure, can be accounted for by including an intercept $\beta_0$ and a slope $\beta_1$ term to describe the relation between the questionnaire ($Q_i$) and the true value derived from a reference measure ($T_i$). The model then becomes

$$Q_i = \beta_0 + \beta_1 T_i + \varepsilon_i \tag{3}$$

Although each individual, by definition, has only a single true value of usual exposure, he or she might complete a questionnaire at multiple time points. Therefore, we require an additional subscript and let $Q_{ij}$ be the questionnaire value reported for individual $i$ at time $j$. When multiple measurements are taken on each individual, it is possible to estimate systematic reporting errors within the same individual over time (i.e., “person-specific” bias, $r_i$) (22,32). For example, individual $i$ may consistently underestimate her true time sitting and watching television on the questionnaire. We now relate the questionnaire value(s) and the true value for individual $i$ by the following (22):

$$Q_{ij} = \beta_0 + \beta_1 T_i + r_i + \varepsilon_{ij} \tag{4}$$

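We generally assume that $r_i$ follows a normal distribution with mean 0 and variance $\sigma_r^2$. The attenuation factor resulting from the above model would be

$$\lambda = \frac{\beta_1 \sigma_T^2}{\beta_1^2 \sigma_T^2 + \sigma_r^2 + \sigma_\varepsilon^2} \tag{5}$$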

Close inspection of the model in equation 4 reveals that the quantities derived for two of the three error terms estimated for Q (i.e., activity-related and person-specific biases) are dependent on the value of the reference measure, which is taken to be an unbiased estimate of the true value ($T_i$). Although the reference measures commonly used in physical activity studies, such as physical activity monitors and DLW, can provide insight into the ability of self-report instruments to rank-order individuals, greater scrutiny of these methods, and of the questionnaires against which they are compared, is necessary in the context of estimating the bias terms in measurement error models.
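For a model with an intercept, a slope β1, person-specific bias with variance σ_r², and random error with variance σ_ε², the attenuation factor can be written as the ratio of the covariance between the questionnaire and the true value to the variance of the questionnaire. The sketch below assumes that standard covariance-ratio form, with all error terms independent of the true value and of each other. The function name and example values are hypothetical.

```python
def attenuation_with_bias(beta1, var_true, var_person, var_error):
    """Attenuation factor cov(Q, T) / var(Q) for Q = b0 + b1*T + r + error.

    Assumes r and error are independent of T and of each other; the
    intercept b0 does not affect the attenuation factor.
    """
    cov_qt = beta1 * var_true
    var_q = beta1 ** 2 * var_true + var_person + var_error
    return cov_qt / var_q


# With beta1 = 1 and no person-specific bias, this is the simple model again.
print(attenuation_with_bias(1.0, 1.0, 0.0, 1.0))  # 0.5

# Adding person-specific bias (hypothetical variance 0.5) attenuates further.
print(attenuation_with_bias(1.0, 1.0, 0.5, 1.0))  # 0.4
```

Under these assumptions, any person-specific bias adds to the questionnaire variance without adding to its covariance with the truth, so it can only lower the attenuation factor.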

If systematic errors are present in the reference measures, then the instrument may not provide accurate estimates of the bias terms in the model and, thus, may not provide accurate estimates of validity of the instrument or the attenuation factors derived from the results. For example, the first-generation physical activity monitors that used 1-min epoch data and linear regression calibrations to estimate energy expenditure performed well in laboratory studies of walking and running, but they clearly underestimated the energy cost of many common daily activities requiring less ambulation, such as household chores (28). Consistent with this finding, recent comparisons against DLW indicate that this class of accelerometers may underestimate physical activity energy expenditure by at least 10% (e.g., (10,20)). Results from studies that use this class of activity monitors should be interpreted accordingly. Considerable progress is being made in the assessment of common daily activities by accelerometer (e.g., (16,43)), and we are hopeful that studies in free-living subjects will demonstrate that the accuracy of these devices will improve sufficiently to meet the requirements of a valid reference measure in this context. New devices that measure body position and sedentary behavior with better accuracy seem to be promising options and should be evaluated for this purpose (e.g.,(17,21)).

After accounting for resting metabolism and dietary thermogenesis, DLW can be used to estimate the average level of physical activity energy expenditure, and many consider this method to be the best available reference measure of overall physical activity energy expenditure. However, there is an important caveat for using this method in the context of measurement error modeling from questionnaires of usual physical activity levels. DLW is an integrated measure of the energy expenditure resulting from all of the different activity behaviors that participants engage in during the measurement period. In contrast, most questionnaires assess only a select subset of activities generally believed to contribute most to overall physical activity energy expenditure. Neilson et al. (31) recently showed that most questionnaires substantially underestimated activity energy expenditure in comparison to DLW, most likely because they fail to assess common daily activities that contribute to overall energy expenditure. Thus, potential differences in the scope of the activities assessed by questionnaires and DLW estimates of overall physical activity energy expenditure warrant careful consideration when using DLW as a reference measure to quantify the error structure in the self-reports of physical activity.

The recent focus on the adverse health effects of sedentary behaviors has highlighted the need to measure sedentary behaviors in etiologic studies (33). Although time spent sitting is associated with reduced physical activity energy expenditure (25), the inability of DLW to quantify time spent in sedentary behaviors directly suggests that a measure of energy expenditure may not be a suitable reference measure in calibration studies designed to determine the error structure of sedentary behavior questionnaires. The next generation of physical activity monitors, which assess body position directly, may be required for this purpose (e.g., (18,23)).

In summary, implementation of calibration studies and measurement error correction methods to estimate the error structure of questionnaire-based estimates of usual behavior and adjust risk estimates for attenuation may be a valuable approach for future epidemiologic investigations. When the assumptions of the method are met, they offer an opportunity to more accurately estimate the true magnitude of association between physical activity and the health outcomes of interest.
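The correction step for the simplest case can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited study: it estimates the attenuation factor as the slope from regressing reference values on questionnaire values in a calibration subsample (a common regression-calibration shortcut), then raises the naive RR to the power 1/λ as described in this section. The data and function names are hypothetical.

```python
def estimate_attenuation(questionnaire, reference):
    """Slope of the reference measure regressed on the questionnaire.

    Assumes the reference measure is an unbiased estimate of the true value.
    """
    n = len(questionnaire)
    mq = sum(questionnaire) / n
    mr = sum(reference) / n
    sxy = sum((q - mq) * (r - mr) for q, r in zip(questionnaire, reference))
    sxx = sum((q - mq) ** 2 for q in questionnaire)
    return sxy / sxx


def corrected_rr(naive_rr, attenuation):
    """De-attenuated RR: naive RR raised to the inverse attenuation factor."""
    return naive_rr ** (1.0 / attenuation)


# Hypothetical calibration subsample (hours per day of television viewing).
q_values = [1.0, 2.0, 3.0, 4.0]
ref_values = [1.5, 2.0, 2.5, 3.0]

lam = estimate_attenuation(q_values, ref_values)
print(lam)                               # 0.5
print(round(corrected_rr(1.10, lam), 2)) # 1.21
```

The corrected value is only as good as the reference measure: systematic error in the reference carries straight through to the estimated attenuation factor.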

Option 2. Use Objective Indicators of Behavior to Eliminate Reporting Errors

One attractive option for dealing with errors associated with self-report would be to completely eliminate this source of error by opting to use objective indicators of behavior rather than self-report instruments. We use the phrase “objective indicators of behavior” to describe measurements derived from physical activity monitors, which measure body motion and/or position to make inferences about behavior, and DLW, which can measure physical activity energy expenditure resulting from time spent in different behaviors (11,26). The major strength of these measures, of course, is that errors associated with self-report are completely removed, the analytic errors inherent in the measures are relatively low (e.g., laboratory error for DLW, technical reliability of accelerometers), and, accordingly, the level of attenuation in the associations observed would be expected to be greatly reduced (11) (Fig. 2). However, as noted previously, accelerometer data also can contain systematic and random measurement error, and a single DLW assessment is subject to error associated with intraindividual variation. An additional limitation of using objective indicators of behavior alone in large studies is the general absence of contextual information provided by the measures. Contextual information may include insight about the type of activity (e.g., aerobic vs strengthening activities), as well as information about the behavioral setting within which participants engage in a given behavior (e.g., at home or work, sitting in a car). Key scientific questions of public health importance relate as much to the context within which a behavior occurs as to the amount of the behavior. The value of contextual information should not be underestimated because this data element facilitates translation of the evidence for specific behavior-disease associations to health interventions and to public policy.

The relatively higher cost and logistical demands associated with implementing objective measures in large-scale studies also can limit the use of these methods. Objective measures have been extremely valuable in providing new insights into physical activity and health in small to moderate-sized studies (e.g., (24,26)), but in very large studies designed to examine rare health outcomes such as cancer (40), cost and feasibility often remain limiting factors. For these reasons, reliance on objective indicators of behavior alone is not always the best measurement option, particularly in studies that seek to understand the context in which active and sedentary behaviors occur and in very large studies where costs associated with activity monitoring are more difficult to manage.

Option 3. Use Short-Term Recalls and Reduce Reporting Errors in Behaviors

This approach to improving self-reported measures of active and sedentary behaviors is to use a more accurate and detailed self-report instrument that is capable of reducing the magnitude of the errors in the information reported (Fig. 2). The application of measurement error correction models can minimize further the impact of random error, as well as systematic errors if a calibration study is conducted with valid reference measures (32) (i.e., a hybrid combining Options 1 and 3; Fig. 2). Given the cognitive demands associated with reporting usual activity levels via questionnaires, significant advances in reducing reporting errors in these questionnaires seem unlikely. The question is whether there are other more accurate self-report methods that might be considered.

Following the lead of nutritionists (39) and time use researchers (7), multiple 24-h recalls could be used to improve assessment of active and sedentary behaviors. Because they generally have been assumed to be less error prone and more detailed, short-term recalls have been used commonly in energy requirement studies (34) and to examine the measurement properties of physical activity questionnaires (e.g., (19)). An important advantage of short-term recalls is that they rely more extensively on the recollection of specific behaviors/events using episodic memories, whereas questionnaires of usual behaviors often force respondents to rely on generic memories of past events and to use estimation strategies to report past behavior (27). Among time use researchers, there is some consensus that short-term recalls are a preferred method of capturing information about the kinds of unstructured common daily behaviors (e.g., housework) that traditionally have proven the most difficult for physical activity researchers to measure (7).

In particular, short-term recalls have the potential to reduce errors in the duration of the activities reported as compared with estimates derived from questionnaires of usual levels of exposure. For example, by reducing the recall interval on the previous day to specific segments within the day (e.g., morning, afternoon, and evening), short-term recalls begin to limit the scope of allowable reporting errors (5). If a respondent is allowed to report more specifically the duration of the individual bouts of active or sedentary behavior they engage in, rather than daily totals, then the information provided can be tallied by the data collection system, which should further reduce mathematical errors in the reporting process. Thus, a major advantage of short-term recalls may be their ability to rein in errors in estimating the duration of active and sedentary behavior on days for which the reports are provided.

Use of Short-Term Recalls of Active and Sedentary Behaviors in Epidemiologic Studies

Over the last decade, 24-h physical activity recalls (24PARs) have been administered by phone in a number of studies, the results of which provide insight into their potential utility in etiologic studies. A study among middle-aged adults found that 24PARs were correlated with accelerometer measures of physical activity and that only two to three 24PARs were required to achieve reasonable correlations (32) with a questionnaire that previously had been found to explain 45% of the variance of physical activity energy expenditure as measured by DLW (35). In a study of postmenopausal women that compared seven 24PARs with DLW measures over 14 days, no significant differences in physical activity energy expenditure between measures were found, and reporting errors were not associated with body mass index or social desirability (1). Calabro et al. (9) compared estimates of total energy expenditure (kcal/d) and time spent in moderate-vigorous activity from two different pattern recognition activity monitors to similar metrics derived from the 24PAR. The 24PAR-based estimates of total energy expenditure were not significantly different from, and were highly correlated with (r ∼ 0.9), expenditure from the monitors. Correlations for moderate-vigorous activity duration were lower but still relatively high (r ∼ 0.6). Results from a recent study that used the 24PAR are consistent with objective monitoring studies indicating that adults spend little time in moderate-vigorous activity, the majority of their time in sedentary behaviors, and a considerable amount of time in light activity, suggesting that short-term recalls may be particularly useful in gathering information about sedentary behaviors and common daily activities (Fig. 3). Collectively, this series of studies and other recent reports (44) using similar methods suggest that there may be considerable utility in using short-term recalls of active and sedentary behaviors in epidemiologic studies.

Figure 3:
Allocation of active and sedentary time during waking hours in adults via short-term recalls. Adapted from data presented in (Matthews CE, Ainsworth BE, Hanby C, et al. Development and testing of a short physical activity recall questionnaire. Med Sci Sports Exerc. 2005;37(6):986–94).

Obstacles to Using Short-Term Recalls in Large Epidemiologic Studies

Although short-term recalls, such as diaries or previous day recalls, generally are considered to be less error prone, they have been used rarely as a primary assessment of activity behaviors because of the costs of obtaining a sufficient number of repeated measures to estimate usual activity levels, the high participant burden, and coding and data entry costs associated with diaries. Furthermore, study participants may not comply with protocols for completing diaries, thereby potentially introducing reporting errors. For example, a diary protocol may require participants to record their activities at set intervals over a day to minimize forgetting, but participants may put off recording for a more personally convenient time. Recall errors may be introduced by delaying the recording of activities beyond specified windows of recall and report. Computer-assisted interviews by phone can reduce costs associated with coding and data entry and may limit the participant burden, but the expense of conducting the interviews can be high. However, mobile devices (e.g., phones, tablets) and computers linked to internet-based data collection methods for short-term recalls may resolve these problems because self-administration by participants and automated data collection processes have the potential to obviate the need for interviewers (39).

The other major obstacle associated with using short-term recalls concerns how effective only a few days of observation may be in providing useful estimates of usual levels of active and sedentary behaviors. This error, considered intraindividual variation in behavior (or within-person error), is captured by the ε term in the models. For our discussion, we shall assume that all εij are distributed normally and independent of each other, but repeat measurements often may not satisfy these assumptions (6). For example, measurements recorded within the same week can be correlated because of weather, work, or health. Similarly, measurements recorded on the same day of the week may be correlated because of work schedules and exercise and television viewing habits. However, if we intelligently design our collection of replicate measures, we can obtain a relatively accurate and unbiased estimate of usual activity levels. In fact, when our assumptions of normality and independence are met for our εij term, only a few repeat measures over time can be extremely useful in reducing the impact of intraindividual variation in behavior on our measures. The Table describes how the attenuation factor and statistical power increase with the number of replicates under these simplifying assumptions, as a function of the percentage of the total variation attributable to the intraindividual variation in behavior. In this example, we estimate the effect on statistical power for a 100-subject study for each effect size at an alpha level of 0.01. When intraindividual variation associated with a single replicate recall is greatest (i.e., 80%), the addition of two recalls (three total replicates) results in an approximate doubling of the attenuation factor (from 0.20 to 0.43), an increase in the strength of the observable association, and an approximate doubling of the statistical power available.
The Table also shows that as the total number of replicates increases, the benefit of additional replicate measures begins to diminish, particularly beyond three to four recalls. This is consistent with results from nutritional epidemiology demonstrating that four replicate 24-h dietary recalls can substantially reduce random measurement errors (38). We have presented this simple scenario to highlight the idea that a modest number of replicate measures can substantially reduce measurement error associated with intraindividual variation in behavior. However, because daily variation can follow specific patterns over time (e.g., seasonality, day of the week effects), real-life scenarios are more complicated, and the optimal method for quantifying intraindividual variation and the schedule for collecting replicates require careful thought (6).
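
The attenuation values quoted above can be reproduced with a short sketch. It assumes the classical measurement error model described in the text (independent, normally distributed within-person errors), under which the attenuation factor for the mean of k replicates is the between-person variance divided by the sum of the between-person variance and the within-person variance over k. The function names are illustrative, and the mapping from attenuation to observed relative risk (true relative risk raised to the power of the attenuation factor) is a common simplifying assumption for log-linear risk models, not a calculation taken from the Table.

```python
def attenuation_factor(within_fraction: float, n_replicates: int) -> float:
    """Attenuation factor for the mean of n_replicates recalls.

    within_fraction is the share of total variance in a single recall
    that is due to intraindividual (within-person) variation.
    Assumes independent within-person errors, as in the text.
    """
    if not 0.0 <= within_fraction < 1.0:
        raise ValueError("within_fraction must be in [0, 1)")
    if n_replicates < 1:
        raise ValueError("n_replicates must be at least 1")
    between = 1.0 - within_fraction
    # Averaging k independent replicates divides the error variance by k.
    return between / (between + within_fraction / n_replicates)


def observed_relative_risk(true_rr: float, attenuation: float) -> float:
    """Assumed approximation: the log relative risk shrinks by the attenuation factor."""
    return true_rr ** attenuation


if __name__ == "__main__":
    true_rr = 2.0  # hypothetical true relative risk, for illustration only
    for within in (0.2, 0.5, 0.8):
        for k in (1, 2, 3, 4, 7):
            lam = attenuation_factor(within, k)
            rr = observed_relative_risk(true_rr, lam)
            print(f"within={within:.0%} k={k}: attenuation={lam:.2f}, observed RR={rr:.2f}")
```

With 80% within-person variation, the factor rises from 0.20 for one recall to 0.43 for three, and each further replicate adds less than the one before, which matches the diminishing returns described above.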

Influence of intraindividual variation in behavior and number of replicates on attenuation, bias in observed relative risks, and on statistical power.

There are, of course, some limitations to using short-term recalls in epidemiologic studies. First, this approach may reduce but does not eliminate measurement error, and it only assesses current behavioral exposures during a given measurement period (e.g., a 12-month period at study baseline). Information about historical activity patterns, which could be important for some health outcomes, cannot be measured directly and questionnaire-based approaches would be required to capture this information. Short-term recalls also may be less adept at estimating levels of less frequent behaviors, such as exercise participation or more seasonal activities. However, statistical methods are being developed that may be able to translate a few discrete observations of less frequent behaviors into meaningful estimates of usual levels of behaviors (e.g., (13)).

A New Direction in Assessment of Activity and Sedentary Behavior in Epidemiologic Studies

The Activities Completed Over Time in 24 Hours system is a self-administered Web-based physical activity assessment tool that has been developed by investigators at the National Cancer Institute. It asks respondents to report how they spent their time in the previous 24 h including time sleeping and in active and sedentary behaviors. The program leads respondents through four 6-h periods, asking them to record their activities on a timeline. They browse and select from more than 100 individual activities listed and can search for an additional 110 exercise and sports activities. Follow-up questions determine time spent in each activity, as well as selected activity-specific questions (e.g., body posture, rating of perceived exertion during exercise). Respondents typically report 20 to 30 distinct active/sedentary behaviors in each recall day. Summary values for time spent sleeping and in active and sedentary behaviors, as well as energy expenditure (MET-hours per day) are derived from the information reported. The goal is to have Activities Completed Over Time in 24 Hours available to interested researchers, providing a Web site to register studies and to provide access to the system for respondents to complete recalls. A demonstration version of the current instrument is available for review (
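
As a rough illustration of how summary values might be derived from a single recall day, the sketch below sums reported durations into hours of sleep, sedentary, light, and moderate-vigorous activity, and into MET-hours per day. It is not the instrument's actual scoring algorithm. The class and function names, the example activities, and the MET values are all hypothetical, and the intensity cut points (1.5 METs or less for sedentary, 3 METs or more for moderate-vigorous) are commonly used conventions applied here as an assumption, ignoring posture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedActivity:
    name: str
    minutes: float
    mets: float            # illustrative intensity value, not from the instrument
    is_sleep: bool = False


def summarize_recall(activities):
    """Summarize one recall day into hours by category and total MET-hours."""
    summary = {"sleep_h": 0.0, "sedentary_h": 0.0, "light_h": 0.0,
               "mvpa_h": 0.0, "met_hours": 0.0, "total_h": 0.0}
    for act in activities:
        hours = act.minutes / 60.0
        summary["total_h"] += hours
        summary["met_hours"] += hours * act.mets
        if act.is_sleep:
            summary["sleep_h"] += hours
        elif act.mets <= 1.5:      # assumed sedentary cut point
            summary["sedentary_h"] += hours
        elif act.mets < 3.0:       # assumed light-intensity range
            summary["light_h"] += hours
        else:                      # assumed moderate-vigorous range
            summary["mvpa_h"] += hours
    return summary


# Hypothetical recall day covering a full 24 h (1,440 min).
example_day = [
    ReportedActivity("sleeping", 480, 1.0, is_sleep=True),
    ReportedActivity("desk work", 480, 1.5),
    ReportedActivity("watching television", 180, 1.0),
    ReportedActivity("household tasks", 240, 2.0),
    ReportedActivity("brisk walking", 60, 4.0),
]

if __name__ == "__main__":
    for key, value in summarize_recall(example_day).items():
        print(f"{key}: {value:.1f}")
```

A real scoring routine would also need to handle gaps or overlaps in the reported timeline and use the posture information collected in the follow-up questions.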


Existing self-report questionnaires of active and sedentary behaviors that are suitable for use in large-scale epidemiologic studies are known to contain substantial errors. For future large-scale epidemiologic studies of physically active and sedentary behaviors and health, we present three options for improving our assessment of these important behavioral exposures: 1) correcting errors in self-report questionnaires of usual behaviors analytically using calibration studies and measurement error correction models, 2) eliminating reporting errors by using objective indicators of behavior, or 3) reducing the magnitude of the reporting errors through the use of short-term recalls. Given that short-term recalls may reduce the magnitude of reporting errors, and because they also offer the opportunity for gathering salient contextual information about the behaviors reported, we highlight the potential for short-term recalls to be used in future epidemiologic studies and discuss how we might overcome obstacles to their use.

The authors have no funding disclosures or conflicts of interest to declare.


1. Adams SA, Matthews CE, Moore CG, Cunningham JE, Fulton J, Hebert JR. The effect of social desirability and social approval on self-reports of physical activity. Am. J. Epidemiol. 2005; 161: 389–98.
2. Ainsworth B, Haskell W, Leon A, et al. Compendium of physical activities: classification of energy costs of human physical activities. Med. Sci. Sports Exerc. 1993; 25: 71–80.
3. Ainsworth B, Haskell W, Whitt M, et al. Compendium of physical activities: an update of activity codes and MET intensities. Med. Sci. Sports Exerc. 2000; 32: S498–S516.
4. Armstrong BG. The effects of measurement errors on relative risk regressions. Am. J. Epidemiol. 1990; 132: 1176–84.
5. Baranowski T. Validity and reliability of self-report measures of physical activity: an information-processing perspective. Res. Q. Exerc. Sport 1988; 59: 314–27.
6. Baranowski T, Masse LC, Ragan BWG. How many days was that? We’re still not sure, but we’re asking the question better! Med. Sci. Sports Exerc. 2008; 40: S544–9.
7. Bianchi SM, Milkie MA, Sayer JP, Robinson JP. Is anyone doing the housework? Trends in the gender division of household labor. Soc. Forces 2000; 79: 191–228.
8. Blair SN, Ching Y, Holder SJ. Is physical activity or physical fitness more important in defining health benefits? Med. Sci. Sports Exerc. 2001; 33: S379–99.
9. Calabro MA, Welk GJ, Carriquiry AL, Nusser SM, Beyler NK, Matthews CE. Validation of a Computerized 24-Hour Physical Activity Recall (24PAR) instrument with pattern-recognition activity monitors. J. Phys. Act. Health 2009; 6: 211–20.
10. Colbert LH, Matthews CE, Havighurst TC, Kim K, Schoeller DA. Comparative validity of physical activity measures in older adults. Med. Sci. Sports Exerc. 2011; 43: 867–76.
11. Colbert LH, Schoeller DA. Expending our physical activity (measurement) budget wisely. J. Appl. Physiol. 2011; 111: 606–7.
12. DiPietro L, Caspersen CJ, Ostfeld AM, Nadel ER. A survey for assessing physical activity among older adults. Med. Sci. Sports Exerc. 1993; 25: 628–42.
13. Dodd KW, Guenther PM, Freedman LS, et al. Statistical methods for estimating usual intake of nutrients and foods: a review of the theory. J. Am. Diet. Assoc. 2006; 106: 1640–50.
14. Dolan SH, Williams DP, Ainsworth BE, Shaw JM. Development and reproducibility of the bone loading history questionnaire. Med. Sci. Sports Exerc. 2006; 38: 1121–31.
15. Ferrari P, Friedenreich C, Matthews CE. The role of measurement error in estimating levels of physical activity. Am. J. Epidemiol. 2007; 166: 832–40.
16. Freedson PS, Lyden K, Kozey-Keadle S, Staudenmayer J. Evaluation of artificial neural network algorithms for predicting METs and activity type from accelerometer data: validation on an independent sample. J. Appl. Physiol. 2011; 111: 1804–12.
17. Healy GN, Clark B, Winkler EAH, Gardiner PA, Brown WJ, Matthews CE. Measurement of adults’ sedentary time in population-based studies. Am. J. Prev. Med. 2011; 41: 216–27.
18. Hustvedt BE, Christophersen A, Johnsen LR, et al. Description and validation of the ActiReg: a novel instrument to measure physical activity and energy expenditure. Br. J. Nutr. 2004; 92: 1001–8.
19. Jacobs D, Ainsworth B, Hartman T, Leon A. A simultaneous evaluation of 10 commonly used physical activity questionnaires. Med. Sci. Sports Exerc. 1993; 25: 81–91.
20. Johannsen DL, Calabro MA, Stewart J, Franke W, Rood JC, Welk GJ. Accuracy of armband monitors for measuring daily energy expenditure in healthy adults. Med. Sci. Sports Exerc. 2010; 42: 2134–40.
21. Kipnis V, Midthune D, Freedman LS, et al. Empirical evidence of correlated biases in dietary assessment instruments and its implications. Am. J. Epidemiol. 2001; 153: 394–403.
22. Kipnis V, Subar AF, Midthune D, et al. Structure of dietary measurement error: results of the OPEN Biomarker Study. Am. J. Epidemiol. 2003; 158: 14–21.
23. Kozey-Keadle S, Libertine A, Lyden K, Staudenmayer J, Freedson P. Validation of wearable monitors for assessing sedentary behavior. Med. Sci. Sports Exerc. 2011; 43: 1561–7.
24. Levine JA, McCrady SK, Lanningham-Foster L, Kane PH, Foster RC, Manohar CU. The role of free-living daily walking in human weight gain and obesity. Diabetes 2008; 57: 548–54.
25. Levine JA, Lanningham-Foster LM, McCrady SK, et al. Interindividual variation in posture allocation: possible role in human obesity. Science 2005; 307: 584–6.
26. Manini TM, Everhart JE, Patel KV, et al. Daily activity energy expenditure and mortality among older adults. J.A.M.A. 2006; 296: 171–9.
27. Matthews CE. Techniques for physical activity assessment: self-report instruments. In: Welk G, Dale D, editors. Physical Activity Assessments for Health-Related Research. Champaign (IL): Human Kinetics; 2002. pp. 107–23.
28. Matthews CE. Calibration of accelerometer output for adults. Med. Sci. Sports Exerc. 2005; 37: S512–22.
29. Montoye HJ. Introduction: evaluation of some measurements of physical activity and energy expenditure. Med. Sci. Sports Exerc. 2000; 32: S439–41.
30. Moore SC, Gierach GL, Schatzkin A, Matthews CE. Physical activity, sedentary behaviours, and the prevention of endometrial cancer. Br. J. Cancer 2010; 103: 933–8.
31. Neilson HK, Robson PJ, Friedenreich CM, Csizmadi I. Estimating activity energy expenditure: how valid are physical activity questionnaires? Am. J. Clin. Nutr. 2008; 87: 279–91.
32. Nusser SM, Beyler NK, Welk GJ, Carriquiry AL, Fuller WA, King BMN. Modeling errors in physical activity recall data. J. Phys. Act. Health 2012; 9: S56–S67.
33. Owen N, Healy GN, Matthews CE, Dunstan DW. Too much sitting: the population health-science of sedentary behavior. Exerc. Sport Sci. Rev. 2010; 38: 105–13.
34. Passmore R, Durnin JVG. Human energy expenditure. Physiol. Rev. 1955; 35: 801–40.
35. Philippaerts RM, Westerterp KR, Lefevre J. Doubly labelled water validation of three physical activity questionnaires. Int. J. Sports Med. 1999; 20: 284–9.
36. Physical Activity Guidelines Advisory Committee. Physical Activity Guidelines Advisory Committee Report. Washington (DC): U.S. Department of Health and Human Services; 2008.
37. Sallis JF, Saelens BE. Assessment of physical activity by self-report: limitations and future directions. Res. Q. Exerc. Sport 2000; 71: S1–4.
38. Schatzkin A, Kipnis V, Carroll RJ, et al. A comparison of a food frequency questionnaire with a 24-hour recall for use in an epidemiological cohort study: results from the biomarker-based Observing Protein and Energy Nutrition (OPEN) study. Int. J. Epidemiol. 2003; 32: 1054–62.
39. Schatzkin A, Subar AF, Moore S, et al. Observational epidemiologic studies of nutrition and cancer: the next generation (with better observation). Cancer Epidemiol. Biomark. Prev. 2009; 18: 1026–32.
40. Schatzkin A, Subar AF, Thompson FE, et al. Design and serendipity in establishing a large cohort with wide dietary intake distributions: The National Institutes of Health-American Association of Retired Persons Diet and Health Study. Am. J. Epidemiol. 2001; 154: 1119–25.
41. Shephard RJ. Limits to the measurement of habitual physical activity by questionnaires. Br. J. Sports Med. 2003; 37: 197–206.
42. Spiegelman D, Schneeweiss S, McDermott A. Measurement error correction for logistic regression models with an “alloyed gold standard”. Am. J. Epidemiol. 1997; 145: 184–96.
43. Staudenmayer J, Pober D, Crouter SE, Bassett DR, Freedson P. An artificial neural network to estimate physical activity energy expenditure and identify physical activity type from an accelerometer. J. Appl. Physiol. 2009; 107: 1300–7.
44. van der Ploeg HP, Merom D, Chau JY, Bittman M, Trost SG, Bauman AE. Advances in population surveillance for physical activity and sedentary behavior: reliability and validity of time use surveys. Am. J. Epidemiol. 2010; 172: 1199–206.
45. van Poppel MNM, Chinapaw MJM, Mokkink LB, van Mechelen W, Terwee CB. Physical activity questionnaires for adults: a systematic review of measurement properties. Sports Med. 2010; 40: 565–600.
46. Wong MY, Day NE, Wareham NJ. Measurement error in epidemiology: the design of validation studies II: bivariate situation. Stat. Med. 1999; 18: 2831–45.

exposure assessment; exercise; sitting; measurement error; disease prevention

©2012 The American College of Sports Medicine