
APPLIED SCIENCES

Reproducibility of Accelerometer and Posture-derived Measures of Physical Activity

SAINT-MAURICE, PEDRO F.1; SAMPSON, JOSHUA N.1; KEADLE, SARAH KOZEY2; WILLIS, ERIK A.1,3; TROIANO, RICHARD P.4; MATTHEWS, CHARLES E.1

Medicine & Science in Sports & Exercise: April 2020 - Volume 52 - Issue 4 - p 876-883
doi: 10.1249/MSS.0000000000002206

Abstract

Integrating better measures of physical activity (PA) into epidemiological studies is essential to quantify and understand the relationship between habitual PA and disease more completely (1), and such measures have recently been incorporated into large observational cohorts (e.g., UK Biobank, German National Cohort) to assess activity–disease associations (2,3). The goal of these studies is to test the hypothesis that long-term PA participation (e.g., over 12 months) alters disease processes occurring over many years and ultimately is associated with reduced risk of developing chronic diseases. However, PA is typically monitored using only a 7-d administration protocol under the assumption that seven consecutive days of measurement provides sufficient information on an individual’s long-term activity or “habitual PA” levels (e.g., 6–12 months) (4–8). Although many studies have evaluated the optimal number of days of observation needed to achieve more stable estimates of PA for a single administration (9–14), less is known about the variability of accelerometer-based measures of PA from one 7-d administration to the next, or whether a single administration can provide an accurate representation of an individual’s habitual PA. The Women’s Health Study (N = 209; 70.6 yr) used repeated annual assessments over 2–3 yr and reported intraclass correlation coefficients (ICC) of 0.70–0.75 for time spent in sedentary, light, and moderate-to-vigorous PA (MVPA) (15), suggesting that these metrics were reproducible from one administration to the next. However, this study was limited to older women, and these findings may not be generalizable to younger adults or men. The assessment of reproducibility is particularly important for prospective studies because low reproducibility, as shown by lower ICC, is indicative of increased random measurement error. 
Increased random error in PA measurements can reduce statistical power and result in underestimations of the activity–disease associations (16–19).

To our knowledge, the amount of variation in PA from one 7-d accelerometer-based administration to the next or the extent to which a single administration can reflect habitual PA is not well understood, nor do we know how much this random error may affect statistical results derived from a single 7-d monitor administration. In this study, we specifically describe: 1) between- and within-subject variability in two 7-d accelerometry administrations measured over 6 months and 2) the estimated effect of multiple accelerometer-based administrations on PA variability, statistical power, and bias or attenuation in regression coefficients. Results are provided for various activity metrics obtained from two different devices: sitting, standing, and stepping measured by the activPAL, and sedentary behavior, light, and MVPA measured by the ActiGraph GT3X.

METHODS

This study used data from the Interactive Diet and Activity Tracking in the AARP study, a measurement cohort designed to test the properties of various PA and diet measures and understand the implications of measurement error for epidemiological studies involving these exposures. The Interactive Diet and Activity Tracking in AARP enrolled 1082 active members (50–74 yr old) of the AARP residing in Pittsburgh, Pennsylvania. Inclusion criteria were as follows: having Internet access, not being on a weight loss diet, a body mass index (BMI) <40 kg·m−2, and being free of major medical conditions and mobility limitations. Participants were asked to visit the study center three times during a 12-month period to complete diet and PA monitoring protocols and received $450 upon completion of the study. PA was assessed twice using accelerometers, and participants were not asked to change their behavior as part of the study. The study was approved by the National Cancer Institute Special Studies Institutional Review Board, and all participants provided consent for participation. Details about the study are available at https://biometry.nci.nih.gov/cdas/learn/idata/study-summary/.

PA monitoring

Participants completed two 7-d PA monitoring administrations that were 6 months apart. Participants were fitted with an ActiGraph and an activPAL and were given instructions to wear the monitors for seven consecutive days except during water-based activities. Participants were instructed to wear the monitors while awake (from the time they got out of bed until they went to bed), and waking day was determined from monitor wear logs and reports of time out of/into bed. Assessments were spread over all seasons and took place in December–February (n = 180; 22.9%), March–May (n = 359; 45.7%), June–August (n = 166; 21.1%), or September–November (n = 80; 10.2%). The second administration took place approximately 6 months later.

The ActiGraph (model GT3X; Pensacola, FL) is a triaxial accelerometer (i.e., measures vertical, horizontal, and perpendicular acceleration) and is among the most popular activity monitoring devices used in epidemiological research involving measures of PA (20,21). The monitor was worn on the right hip and was initialized to record total acceleration (i.e., three-axis acceleration) using 1-s epochs and with the low-frequency extension enabled. The ActiGraph records acceleration continuously, and we used the Choi algorithm to identify periods of nonwear during the waking day as determined from the monitor wear logs (22,23). In this study, we used the Sojourn 3x algorithm, which was developed in adults and uses second-by-second counts generated from three-dimensional acceleration features from the ActiGraph device. This method uses a hybrid machine-learning and neural network approach to classify time spent in distinct sedentary, light PA, and MVPA events that we summed over each valid day (24). Data from the ActiGraph were processed to generate time (h·d−1) spent in sedentary, light PA, and MVPA.

The activPAL3™ (PAL Technologies Ltd., Glasgow, United Kingdom) uses proprietary algorithms based on an inclinometer and acceleration to identify sitting/lying versus upright positions and distinguish standing still from stepping. The activPAL was placed over the right thigh using an adhesive patch, and participants were instructed to wear the monitor at all times, except during water-based activities. This device is particularly accurate when measuring time spent sitting and is among the most popular devices for epidemiological research focused on sedentary behaviors and health (25). activPAL data were downloaded in 15-s epochs and processed to generate time (h·d−1) spent in sitting/lying, standing, and stepping. Nonwear time was estimated using the activPAL software during the waking day as determined from the monitor wear logs.

Days when the monitor was worn for fewer than 10 h were deemed incomplete and not considered in the aggregate 1-wk estimate computed for each participant. Participants with at least 1 d of monitor data from each device were included in the analyses (Supplemental Table S1, Supplemental Digital Content 1, intraclass correlations when data were restricted to participants with 1 vs 4 or more days of valid data, https://links.lww.com/MSS/B809). Waking nonwear time for the two monitors differed on average by 1.1 h·d−1 (1.8 ± 1.0 h·d−1 for ActiGraph and 0.7 ± 0.6 h·d−1 for activPAL). To minimize differences and enhance comparison between ActiGraph and activPAL outcomes, we imputed nonwear time in proportion to the average time/day spent in the various intensities/posture (i.e., sitting/lying, standing, and stepping for activPAL, and sedentary, light PA, and MVPA for ActiGraph). For example, if a participant had a total of 60 min of missing data on a valid day, this amount of nonwear time was first multiplied by the participant’s proportion of time spent in sedentary, light PA, and MVPA on that day, and the proportional duration values were added to total estimated time/day spent in various intensities. This imputation procedure was applied individually to all participants as needed. Our primary results are also presented without imputed nonwear time to examine if this approach affected our results (Supplemental Table S2, Supplemental Digital Content 2, intraclass correlations for data with imputed vs nonimputed estimates, https://links.lww.com/MSS/B810).
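The proportional imputation described above can be illustrated with a minimal Python sketch. The function name `impute_nonwear`, the category labels, and the example hours are hypothetical and are not taken from the study's processing code; the sketch only assumes that nonwear time on a valid day is split across categories in proportion to the observed wear time in each category.

```python
def impute_nonwear(wear_hours, nonwear_hours):
    """Spread waking nonwear time across categories in proportion to wear time.

    wear_hours: dict of category -> hours observed in that category on one valid day
    nonwear_hours: waking nonwear time (hours) on the same day
    Returns a dict of category -> hours after imputation.
    """
    total_wear = sum(wear_hours.values())
    if total_wear <= 0:
        raise ValueError("wear time must be positive to compute proportions")
    return {
        category: hours + nonwear_hours * (hours / total_wear)
        for category, hours in wear_hours.items()
    }


# Hypothetical day: 15 h of wear time plus 60 min (1 h) of nonwear time.
example_day = {"sedentary": 9.0, "light": 4.5, "mvpa": 1.5}
imputed_day = impute_nonwear(example_day, nonwear_hours=1.0)
```

In this hypothetical day, 60% of wear time is sedentary, so 0.6 h of the missing hour is added to sedentary time, and the imputed categories sum to the full 16-h waking day.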

Statistical methods

We first examined the reliability of the average weekly estimate for each 7-d administration and summary metric by computing ICC values and their respective 95% confidence intervals (CI). Intraclass correlations were computed as:

$$\mathrm{ICC} = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2} \qquad (1)$$

where $\sigma_B^2$ represents the between-subject variance and $\sigma_W^2$ represents the within-subject variance from one administration to the other. The ICC reflects the percentage of the total variability in these 7-d PA administrations that can be attributed to between-subject differences. Hence, ICC values range from 0 to 1.0, with lower ICC values indicating higher amounts of within-subject variability and vice versa. ICC values and respective CI values were computed using a freely available macro (https://www.hsph.harvard.edu/donna-spiegelman/software/icc9/) that uses the Proc Mixed procedure in SAS and that allows for adjustment for fixed effects (26). In this study, we computed both unadjusted ICC and ICC adjusted for sex, age group (50–59, 60–69, or ≥70 yr), BMI (normal weight if BMI < 25.0 kg·m−2, overweight if 25.0 kg·m−2 ≤ BMI < 30.0 kg·m−2, or obese if BMI ≥ 30.0 kg·m−2), and month of first data collection (December–February, March–May, June–August, or September–November).
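As an illustration of how between- and within-subject variances yield an ICC, the sketch below uses a simple one-way random-effects ANOVA estimator for balanced data. This is a swapped-in textbook estimator, not the SAS Proc Mixed macro used in the study, it makes no adjustment for fixed effects, and the function name `icc_oneway` and the example values are hypothetical.

```python
def icc_oneway(measurements):
    """Unadjusted ICC from balanced repeated measures (one-way random effects).

    measurements: list with one inner list per subject, each holding the same
    number k >= 2 of administration-level averages (e.g., h/d of sitting).
    Returns (between_variance, within_variance, icc).
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(m) != k for m in measurements):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of administrations")

    grand_mean = sum(sum(m) for m in measurements) / (n * k)
    subject_means = [sum(m) / k for m in measurements]

    # Mean squares between and within subjects.
    ms_between = k * sum((sm - grand_mean) ** 2 for sm in subject_means) / (n - 1)
    ms_within = sum(
        (x - sm) ** 2 for m, sm in zip(measurements, subject_means) for x in m
    ) / (n * (k - 1))

    within_var = ms_within
    between_var = max((ms_between - ms_within) / k, 0.0)  # truncate negative estimates
    total_var = between_var + within_var
    if total_var == 0:
        raise ValueError("no variability in the data")
    return between_var, within_var, between_var / total_var


# Hypothetical sitting time (h/d) for four participants at two administrations.
sitting = [[9.0, 9.4], [10.2, 9.8], [8.1, 8.5], [11.0, 10.4]]
between_var, within_var, icc = icc_oneway(sitting)
```

When every participant has identical values at both administrations the within-subject variance is zero and the ICC equals 1; when participants differ only from one administration to the next, the ICC falls toward 0.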

Studies can improve their measures of PA by repeating 7-d administrations multiple times during the year. Here, we use the $\sigma_B^2$ (i.e., between-subject variance), $\sigma_W^2$ (i.e., within-subject variance), and ICC obtained from equation 1 (Supplemental Table S3, Supplemental Digital Content 3, between- and within-subject variances by accelerometer device and respective PA metrics, https://links.lww.com/MSS/B811) to assess the theoretical effect of multiple administrations on the 1) ICC, 2) statistical power for detecting activity–disease associations, and 3) attenuation in the estimates of the activity–disease associations. For these calculations, we effectively assume that the selected weeks represent random weeks within the year. In other words, an individual’s relative behavior (i.e., as compared with their long-term habitual behavior) during one administration is uncorrelated with their relative behavior during another administration.

  • 1. We evaluated ICC values for PA metrics when, instead of having the average of the PA metric from a single week (i.e., equation 1), we have the average of the PA metric from N administrations (i.e., equation 2):

$$\mathrm{ICC}_N = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2 / N} \qquad (2)$$

where $\sigma_B^2$ and $\sigma_W^2$ are the between- and within-subject variances. Increasing the number of administrations (N) reduces the contribution of within-subject variation in the denominator, resulting in an increase in the ICC.

  • 2. We evaluated the power to detect an activity–disease association when we have the average of N administrations. We consider a study that uses a t-test to compare the PA metric measured in nA cases and nU controls. Letting μ1 and μ0 be the average of the true value (e.g., lifetime average) in cases and controls, $\sigma_B^2$ be the variance of the true values among individuals in a group, and Δ be the effect size defined as:

$$\Delta = \frac{\mu_1 - \mu_0}{\sigma_B}$$

Then the power of the test will be

$$\Phi\left(-z_{1-\alpha/2} + \Delta \sqrt{\frac{\mathrm{ICC}_N}{1/n_A + 1/n_U}}\right)$$

where Φ is the cumulative distribution function for a standard normal distribution, $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of that distribution, and $\mathrm{ICC}_N$ is the ICC for the average of N administrations. For our power calculations, we fix nA = 100, nU = 1000, ICC1 = 0.6, and Δ = 0.3. Parameters were chosen to reflect a typical study size and detectable effect. We note that increasing the number, N, of administrations reduces the within-subject variability (i.e., higher ICC) and results in increased statistical power. Estimates for ICC ranging from 0.4 to 0.8 are also presented in supplemental material (Figure S1, Supplemental Digital Content 4, statistical power by number of administrations for ICC values ranging from 0.4 to 0.8, https://links.lww.com/MSS/B812).

  • 3. We evaluated the attenuation in the estimate of the activity–disease association when we have N administrations. We consider a study that assesses the exposure/disease association by logistic regression. It has been shown that if the actual relationship is described by log(OR) = β0 + β1X, where X is the true value (i.e., lifetime average), then the expected value of the estimated coefficient using an imperfect measure (averaged over N administrations) is $\mathrm{ICC}_N \times \beta_1$ (27). We define the percent attenuation by:

$$\text{percent attenuation} = 100 \times (1 - \mathrm{ICC}_N)$$

and the estimated OR by:

$$\mathrm{OR}_{\text{estimated}} = \exp(\mathrm{ICC}_N \times \beta_1)$$

Greater within-subject variability indicated by lower ICC values can attenuate the magnitude of the activity–disease association ($\beta_1$). The percent attenuation reflects the proportion of the actual association that is lost for a given ICC. Increasing the number of administrations reduces within-subject variability (i.e., higher ICC) and thus reduces the amount of attenuation in the observed regression coefficients. The ICC used to compute the estimated attenuation in coefficients was assumed to be 0.6, and estimates for ICC ranging from 0.4 to 0.8 are also presented in supplemental material (Figure S2, Supplemental Digital Content 5, percentage attenuation in the expected estimate of the regression coefficient as the number of administrations increases for ICC values ranging from 0.4 to 0.8, https://links.lww.com/MSS/B813).

RESULTS

At least one averaged week of activPAL/ActiGraph data was available from 450 men and 464 women, who were on average 63.2 ± 5.9 yr at the first visit. Table 1 shows that participants reported an average waking day of 15.8 ± 1.1 h·d−1 and had an average of 4.7 ± 1.7 d·wk−1 and 1.7 ± 0.5 wk of valid data. On average (over the two administrations), participants spent approximately 9–10 h·d−1 in sitting/sedentary, 4–5 h·d−1 in standing/light activities, and approximately 2 h·d−1 in stepping/MVPA. These various PA metrics remained similar from one administration to the other (Supplemental Table S4, Supplemental Digital Content 6, mean and SD for PA/sedentary estimates at each measurement period and stratified by sex, https://links.lww.com/MSS/B814), whereas the distributions of time in each metric averaged across administrations varied across sex, age, and BMI, but remained stable across season. Overall, male participants, younger participants, and participants with normal weight were more active (i.e., spent more time in standing/stepping or in light PA/MVPA) and spent less time in sitting/sedentary.

TABLE 1: Mean (SD) for PA/sedentary hours/week obtained from activPAL and ActiGraph, Interactive Diet and Activity Tracking in AARP, 2012–2013.

The unadjusted ICC values for the activPAL reached 0.60 and were similar for sitting, standing, and stepping (Table 2). This indicates that 60% of the total variance was attributed to between-subject differences, and that 40% of total variance of the metrics was attributed to within-subject variation, or random error. After adjusting for sex, age group, BMI, and season, the ICC values were slightly attenuated: 0.58 (95% CI, 0.53–0.63) for sitting, 0.62 (0.57–0.67) for standing, and 0.57 (0.51–0.62) for stepping. ICC values ranged from 0.37 to 0.74 when stratified by sex, age group, BMI, and season, and were generally higher for male, younger, and normal-weight participants. The highest ICC for stepping was for data initially collected in June–August (0.74 (0.65–0.81)), whereas the lowest ICC was noted for obese participants (0.37 (0.14–0.68)).

TABLE 2: Intraclass correlations for activPAL by intensity, stratified by sex, age, BMI, and season, Interactive Diet and Activity Tracking in AARP, 2012–2013.

Results for ActiGraph also showed that ICC values were approximately 0.60 for sedentary time, light PA, and MVPA, again indicating that approximately 40% of the total variance was attributed to within-subject variation. ICC values were similar across ActiGraph metrics and slightly attenuated after adjustments for sex, age group, BMI, and season. Intraclass correlations ranged from 0.42 to 0.67 when stratified by the different demographic factors and were also generally higher among the younger age group and normal-weight participants. Women had higher ICC values compared with men. The highest ICC was for data initially collected in June–August for time in light PA (0.68 (0.59–0.76)) and the lowest ICC was for obese participants for sedentary time (0.42 (0.20–0.68)) and light PA (0.42 (0.21–0.66); Table 3). The ICC values for activPAL and ActiGraph remained similar when analyses were restricted to nonimputed data (Supplemental Table S2, Supplemental Digital Content 2, ICC for data with imputed vs nonimputed estimates, https://links.lww.com/MSS/B810).

TABLE 3: Intraclass correlations for ActiGraph by intensity, stratified by sex, age, BMI, and season, Interactive Diet and Activity Tracking in AARP, 2012–2013.

Change in ICC values with replicate monitor administrations

When we modeled the effect of reducing within-subject variability by increasing the number of replicate administrations, we found that the most noticeable increases in ICC values resulted from one or two additional administrations (i.e., a total of 2–3 replicates), and this trend was similar in results for activPAL and ActiGraph. For example, administering the activPAL twice reduces the within-subject variance by half and results in an increase in the ICC associated with sitting time from 0.58 to 0.74 (Fig. 1, left panel). A similar comparison for the ActiGraph measure of sedentary time would result in an increase in ICC from 0.56 to 0.72 (Fig. 1, right panel). Further increases in ICC values were of smaller magnitude and tended to plateau as the number of administrations increased.
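The plateau described above follows directly from the arithmetic of averaging. The minimal Python sketch below assumes the relationship ICC_N = σ²B / (σ²B + σ²W / N), rewritten in terms of the single-administration ICC; the function name `icc_for_n` is illustrative and does not come from the study.

```python
def icc_for_n(icc_single, n_admin):
    """ICC of the average of n_admin administrations.

    Uses ICC_N = var_B / (var_B + var_W / N). Scaling total variance to 1
    gives var_B = icc_single and var_W = 1 - icc_single, so
    ICC_N = N * icc_single / (1 + (N - 1) * icc_single).
    """
    if not 0.0 < icc_single <= 1.0:
        raise ValueError("icc_single must be in (0, 1]")
    if n_admin < 1:
        raise ValueError("n_admin must be at least 1")
    return n_admin * icc_single / (1.0 + (n_admin - 1) * icc_single)


# Gains shrink quickly as administrations are added (single-administration ICC of 0.6).
icc_by_n = {n: round(icc_for_n(0.6, n), 2) for n in range(1, 7)}
```

With the rounded adjusted ICC values of 0.58 and 0.56, two administrations give about 0.73 and 0.72 under this formula; the small difference from the reported 0.74 for sitting time presumably reflects rounding of the inputs.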

FIGURE 1: Intraclass correlations for activPAL and ActiGraph by number of administrations (in weeks).

Effect of replicate administrations on statistical power

Our evaluation of the effect of conducting replicate administrations on the power to detect an activity–disease association showed that statistical power could be notably increased by averaging two administrations and modestly increased by using additional administrations (Fig. 2). However, the benefit of additional administrations was diminished when the ICC values approached 0.8 or larger (Supplemental Figure S1, Supplemental Digital Content 4, statistical power by number of administrations for ICC ranging from 0.4 to 0.8, https://links.lww.com/MSS/B812). Finally, we note that increasing sample size as opposed to the number of replicate administrations will result in larger increases in statistical power (Supplemental Table S5, Supplemental Digital Content 7, statistical power for N subjects and N replicate administrations, https://links.lww.com/MSS/B815), but will often be more costly.

FIGURE 2: Statistical power (y-axis) to detect an exposure/outcome association as the number of administrations (x-axis) increases. The ICC of a single administration was assumed to be 0.6. Other parameters were nA = 100, nU = 1000, Δ = 0.3.
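The shape of the power curve can be approximated with a short Python sketch. It rests on several assumptions that are not stated explicitly in the text: a two-sided test at α = 0.05, a normal approximation to the t-test, an effect size Δ standardized by the between-subject SD, and measurement error that inflates the observed variance by a factor of 1/ICC_N. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def icc_for_n(icc_single, n_admin):
    """ICC of the average of n_admin administrations."""
    return n_admin * icc_single / (1.0 + (n_admin - 1) * icc_single)


def approx_power(icc_single, n_admin, n_cases=100, n_controls=1000,
                 delta=0.3, alpha=0.05):
    """Approximate power of a two-group comparison with an error-prone exposure.

    Assumption: the standardized difference in the observed metric is
    delta * sqrt(ICC_N), because random error adds to the observed variance.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    icc_n = icc_for_n(icc_single, n_admin)
    signal = delta * math.sqrt(icc_n) / math.sqrt(1.0 / n_cases + 1.0 / n_controls)
    return std_normal.cdf(signal - z_crit)


# Power for one to six administrations with a single-administration ICC of 0.6.
power_by_n = [approx_power(0.6, n) for n in range(1, 7)]
```

Under these assumptions, power is roughly 0.60 with one administration and roughly 0.70 with two, with smaller gains thereafter; exact values in the published figure may differ if the authors used different conventions.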

Effect of within-subject variation and replicate administrations on attenuation in regression coefficients

Figure 3 shows the amount of attenuation in regression coefficients resulting from within-subject variability for one to six 7-d monitor administrations for an ICC of 0.6, which approximates what we observed in the overall population for most metrics. A single 7-d administration resulted in an attenuation in the observed regression coefficients of 40%. Increasing the number of replicate administrations to two reduced the attenuation to 25%, with modest decrements by further increasing the number of administrations. The lower the ICC associated with a measure, the greater the benefit in reducing attenuation by increasing the number of replicate administrations (Supplemental Figure S2, Supplemental Digital Content 5, percentage attenuation in the expected estimate of the regression coefficient as the number of administrations increases for ICC ranging from 0.4 to 0.8, https://links.lww.com/MSS/B813).

FIGURE 3: The percentage attenuation (y-axis) in the expected estimate of the regression coefficient as the number of administrations (x-axis) increases. The ICC of a single administration was assumed to be 0.6.
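The attenuation values reported above (40% for one administration and 25% for two) can be reproduced with a minimal Python sketch. It assumes that the expected regression coefficient equals the true coefficient multiplied by ICC_N, so that the percent attenuation is 100 × (1 − ICC_N); the function names and the example odds ratio of 2.0 are hypothetical.

```python
import math


def icc_for_n(icc_single, n_admin):
    """ICC of the average of n_admin administrations."""
    return n_admin * icc_single / (1.0 + (n_admin - 1) * icc_single)


def attenuation_percent(icc_single, n_admin):
    """Percent of the true regression coefficient lost to random error."""
    return 100.0 * (1.0 - icc_for_n(icc_single, n_admin))


def expected_observed_or(true_or, icc_single, n_admin):
    """Expected odds ratio when the log(OR) is shrunk by a factor of ICC_N."""
    return math.exp(icc_for_n(icc_single, n_admin) * math.log(true_or))


# Attenuation for one to six administrations with a single-administration ICC of 0.6.
attenuation_by_n = [round(attenuation_percent(0.6, n), 1) for n in range(1, 7)]

# Hypothetical true OR of 2.0 measured with a single administration.
observed = expected_observed_or(2.0, 0.6, 1)
```

In this sketch a hypothetical true OR of 2.0 would be expected to appear as about 1.5 with a single administration at an ICC of 0.6.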

DISCUSSION

This study examined the reproducibility of accelerometer-based administrations over a 6-month period in a large sample of adults. Estimates from the activPAL and ActiGraph revealed an ICC of ~0.60 indicating fair–good reproducibility, with just over half of the total variability in these measures being attributed to between-subject variance. On the other hand, results also indicated that as much as 40% of the variance in one 7-d monitor administration may be random error due to behavioral variability for the metrics studied in this investigation. Our theoretical examinations suggested that this amount of random error can attenuate statistical associations by as much as 40%. In recent years, there has been substantial investment (e.g., logistic, financial) in incorporating accelerometer-based measures of PA into large epidemiologic studies to improve the measurement of PA, largely under the assumption that one 7-d administration is adequate to minimize within-subject variability. However, our results suggest that random measurement error associated with this administration protocol may be greater than anticipated. These findings highlight the need to implement strategies to quantify and minimize the adverse effects of this underappreciated source of measurement error.

Many studies have examined the number of days of monitoring needed to achieve a reliable estimate of PA over short periods of time (e.g., 1 wk or a month) (9–14,28). Adequate stability (i.e., ICC ≥0.8) is generally believed to be achieved if a PA administration protocol includes four or more consecutive days (9–14,28), and the most common administration approach uses seven consecutive days (21). However, for etiological studies, researchers are usually interested in characterizing usual PA over longer periods of time (e.g., months or years), and we still lack descriptions of variability from one 7-d administration period in relation to longer assessment periods. One study assessed the reproducibility of time spent in sedentary, light PA, and MVPA obtained from 7-d ActiGraph administrations using repeated annual assessments over 2–3 yr (15). Keadle et al. found that random error accounted for only 25%–30% of the total variance in each administration period (ICC of 0.70–0.75). The higher ICC in Keadle et al. when compared with our results may be explained in part by differences in the age and sex distributions of the two studies. For example, we found somewhat higher ICC in women (i.e., ICC ranged from 0.61 to 0.66), but not in older adults (i.e., ≥70 yr; ICC ranged from 0.55 to 0.64), an age stratum that would be most similar with the sample assessed in the study by Keadle et al. The PA administrations in the study by Keadle et al. were conducted 13–15 months apart, a study design that better accounted for seasonal variation. In our study, the monitor administrations were only 6 months apart, but our detailed analysis by season did not show large differences in ICC values when seasonal variation would be expected to be greatest (e.g., summer/winter administrations), which would minimize the differences between studies. 
Therefore, the reason for the differences between these two reports remains unclear, and additional studies in large heterogeneous populations are needed to provide further insight.

Our study extends the work of Keadle et al. and explores the stability of 7-d PA administrations on a large sample of male and female adults (n = 914), with PA measured using two well-established devices. We showed that our accelerometer-based measures had acceptable reproducibility, but there was still a considerable amount of random error (i.e., ~30%–40%) that is likely to penalize studies interested in activity–disease associations. For example, we found that the ICC for sedentary time obtained from the ActiGraph for one 7-d administration was only 0.56, which was low enough to result in a significant attenuation of the sedentary–disease association. This suggests that the magnitude of the association between mortality and sedentary time of 1.3, previously documented using one 7-d ActiGraph administration in the National Health and Nutrition Examination Survey, could be as high as 1.3^(1/0.56) = 1.6 when attenuation due to random error is accounted for (29). Such attenuation distorts the quantification of the dose–response for sedentary time. These sources of error and implications have been addressed in studies of diet (30–33) but have received much less attention in population-based studies using devices. Our study explores this gap by providing estimates of random error associated with metrics obtained from devices and describes the implications of such error for epidemiological studies interested in activity–disease associations.
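The correction applied to the previously reported association can be written out step by step. The worked arithmetic below assumes that the observed log relative risk equals the true log relative risk multiplied by the ICC of the exposure measure; the symbols are introduced here for illustration only.

```latex
\begin{align*}
\log(\mathrm{RR}_{\text{observed}}) &\approx \mathrm{ICC} \times \log(\mathrm{RR}_{\text{true}}) \\
\mathrm{RR}_{\text{true}} &\approx \mathrm{RR}_{\text{observed}}^{\,1/\mathrm{ICC}}
  = 1.3^{1/0.56}
  = \exp\!\left(\frac{\ln 1.3}{0.56}\right)
  \approx \exp(0.469)
  \approx 1.6
\end{align*}
```

Because the correction acts on the log scale, the adjusted estimate is obtained by raising the observed value to the power 1/ICC rather than by dividing it by the ICC.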

Our finding that estimates of activity–outcome association can be attenuated by as much as 40% reinforces the need to adopt strategies to reduce this source of error. The amounts of random error described in our study will result in loss of statistical power and lead to substantial attenuation in regression coefficients, hence justifying including measurement error reduction strategies in epidemiological studies. We explored whether compliance criteria could affect variability; however, we found that using 4 or more valid days per week had only a modest effect on ICC across the various PA metrics compared with ≥1 d, and using this approach also reduced our sample size by 20% (n = 728). This suggests that more restrictive compliance criteria might not be the ideal strategy to maximize the stability of PA metrics. In prospective studies, one common strategy to reduce random error is to obtain multiple/replicate measures of an exposure. The average or sum of the exposure scores is then used to obtain a better representation of the exposure (34). However, in larger studies, it might be impractical to obtain replicate measures for all individuals to account for random error. Instead, one strategy has been to conduct an internal calibration study in a subset of the cohort who complete replicate assessments of the exposure to quantify the amount of random error in the exposure of interest and then implement measurement error corrections in the main study (33). Intraclass correlations can then be determined and used to adjust for random error in future measures of the exposure in the full cohort (35). The large degree of random error (i.e., 40% attenuation) reported in our study suggests that future studies integrating adjustments for random error will also result in improved estimates of activity–disease associations. Our supplemental figures (S1 and S2) include results that illustrate the implications of higher ICC values on statistical power and attenuation in regression coefficients.

This study provides novel information on the reproducibility of PA measures, but there are some limitations worth mentioning. Our estimates of reliability are limited to the devices and body sites of monitor attachment examined, only to behaviors evaluated in the waking day, and to the PA metrics investigated. For example, more commonly, devices are now being used at the wrist (20,21), and therefore, future studies should examine the reproducibility of PA with wrist-worn devices so that the implications for epidemiological studies are also well understood. However, our choice to examine reproducibility using the ActiGraph was logical considering that this device is the most commonly used among the scientific community (21). For example, the ActiGraph has been used in the National Health and Nutrition Examination Survey (2003–2006 cycles; nearly 15,000 individuals were assessed with the ActiGraph) (20). We also chose to examine reproducibility of posture using the activPAL, a monitor that has been shown to provide accurate estimates of sitting time (25). This exposure has also been shown to be predictive of various health conditions, as well as all-cause and cause-specific mortality, and hence has sparked interest among epidemiologists (36–38). Therefore, the monitors used in our study are likely of great relevance. We also imputed nonwear time during the waking day to minimize potential differences between monitors because the mean differences in nonwear time were >1 h·d−1. This approach assumes that the distribution of time spent in each activity intensity was the same during nonwear and the wear time periods. This assumption can bias the mean values for each activity metric; however, it is not clear how such bias, if present, would affect results from the variability analysis. To assess this, we computed ICC using only nonimputed data and found that ICC remained similar. 
Thus, our approach standardized wear time across monitors, facilitating direct comparisons between PA metrics, without affecting our estimated ICC values. We also did not distinguish true variability in behavior from variability due to technical error in the monitors. However, studies have demonstrated that variability due to technical error associated with these monitors is likely very small (<2%) (39,40). We also limited our consideration of random error in PA to an independent/exposure variable. Random error has a fundamentally different effect on statistical results when the PA measures are used as dependent variables. Random error in the dependent variable does not result in attenuation of associations but instead leads only to wider CI (i.e., loss of precision) (16). Thus, our evaluation of PA as an exposure is more relevant considering that epidemiological studies are often designed to detect activity–outcome associations. Our examinations of within-subject variability were also limited to two administrations, 6 months apart. Therefore, the implications of a single administration for epidemiological studies described here are anchored to the measurement timeline used in this study and are rather theoretical. Having two administrations also limited our ability to test the assumption of statistical independence for our two measures. If the assumption of statistical independence for our measures was violated, we could have introduced potential bias into our estimates of the ICC. Consider the example where administrations were collected on consecutive weeks, and further consider a person who only exercises in the summer. In this example, the variability in activity levels during two consecutive winter weeks would underestimate the true variability of an individual’s activity levels over the year. In turn, this would result in overestimating the ICC and underestimating the benefit of repeat observations. 
However, our measures were collected 6 months apart, and we actually detected little seasonal variability, suggesting that our assumption of independence was reasonable. Additional studies are needed to examine PA variability over multiple administrations and longer periods of time to provide additional insights into the implications of random error for epidemiological studies. Finally, our study also included a relatively homogeneous sample of adults based in Pittsburgh, Pennsylvania, and therefore, our findings may be limited to adults of similar age and educational backgrounds. Future studies should explore the reproducibility of PA metrics using more diverse samples (e.g., younger adults, ethnically diverse, from various economic backgrounds) and over longer periods of time.

In conclusion, we found that the level of reproducibility associated with different PA metrics over 6 months was acceptable but that there is still a considerable amount of random error from one 7-d monitor administration to the next that could attenuate statistical associations in accelerometer-based studies. Reproducibility varied by age, sex, and BMI, and therefore, future studies should implement strategies to minimize the adverse effect of random measurement error in accelerometer-based investigations.

This research was supported in part by the National Institutes of Health/National Cancer Institute Intramural Research Program.

The work of P. F. S.-M. was partially funded by an individual fellowship grant awarded by the Fundacao para a Ciencia e Tecnologia (Portugal; SFRH/BI/114330/2016) under the Programa Operacional Potencial Humano/Fundo Social Europeu program.

All authors in this study report no conflicts of interest.

The results of the present study do not constitute endorsement by the American College of Sports Medicine.

The results of the study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation.

REFERENCES

1. Matthews CE, Moore SC, George SM, Sampson J, Bowles HR. Improving self-reports of active and sedentary behaviors in large epidemiologic studies. Exerc Sport Sci Rev. 2012;40(3):118–26.
2. Doherty A, Jackson D, Hammerla N, et al. Large scale population assessment of physical activity using wrist worn accelerometers: the UK Biobank Study. PLoS One. 2017;12(2):e0169649.
3. German National Cohort (GNC) Consortium. The German National Cohort: aims, study design and organization. Eur J Epidemiol. 2014;29(5):371–82.
4. Tudor-Locke C, Camhi SM, Troiano RP. A catalog of rules, variables, and definitions applied to accelerometer data in the National Health and Nutrition Examination Survey, 2003–2006. Prev Chronic Dis. 2012;9:E113.
5. Lee IM, Shiroma EJ. Using accelerometers to measure physical activity in large-scale epidemiological studies: issues and challenges. Br J Sports Med. 2014;48(3):197–201.
6. Trost SG, McIver KL, Pate RR. Conducting accelerometer-based activity assessments in field-based research. Med Sci Sports Exerc. 2005;37(11 Suppl):S531–43.
7. Troiano RP, Berrigan D, Dodd KW, Masse LC, Tilert T, McDowell M. Physical activity in the United States measured by accelerometer. Med Sci Sports Exerc. 2008;40(1):181–8.
8. Atkin AJ, Gorely T, Clemes SA, et al. Methods of measurement in epidemiology: sedentary behaviour. Int J Epidemiol. 2012;41(5):1460–71.
9. Sasaki JE, Junior JH, Meneguci J, et al. Number of days required for reliably estimating physical activity and sedentary behaviour from accelerometer data in older adults. J Sports Sci. 2018;36(14):1572–7.
10. Aadland E, Ylvisaker E. Reliability of objectively measured sedentary time and physical activity in adults. PLoS One. 2015;10(7):e0133296.
11. Hart TL, Swartz AM, Cashin SE, Strath SJ. How many days of monitoring predict physical activity and sedentary behaviour in older adults? Int J Behav Nutr Phys Act. 2011;8:62.
12. Jerome GJ, Young DR, Laferriere D, Chen C, Vollmer WM. Reliability of RT3 accelerometers among overweight and obese adults. Med Sci Sports Exerc. 2009;41(1):110–4.
13. Dontje ML, Dall PM, Skelton DA, Gill JMR, Chastin SFM, Seniors USP Team. Reliability, minimal detectable change and responsiveness to change: indicators to select the best method to measure sedentary behaviour in older adults in different study designs. PLoS One. 2018;13(4):e0195424.
14. Jaeschke L, Steinbrecher A, Jeran S, Konigorski S, Pischon T. Variability and reliability study of overall physical activity and activity intensity levels using 24 h-accelerometry-assessed data. BMC Public Health. 2018;18(1):530.
15. Keadle SK, Shiroma EJ, Kamada M, Matthews CE, Harris TB, Lee IM. Reproducibility of accelerometer-assessed physical activity and sedentary time. Am J Prev Med. 2017;52(4):541–8.
16. Hutcheon JA, Chiolero A, Hanley JA. Random measurement error and regression dilution bias. BMJ. 2010;340:c2289.
17. Knuiman MW, Divitini ML, Buzas JS, Fitzgerald PE. Adjustment for regression dilution in epidemiological regression analyses. Ann Epidemiol. 1998;8(1):56–63.
18. Kelly P, Fitzsimons C, Baker G. Should we reframe how we think about physical activity and sedentary behaviour measurement? Validity and reliability reconsidered. Int J Behav Nutr Phys Act. 2016;13:32.
19. Freedman LS, Schatzkin A, Midthune D, Kipnis V. Dealing with dietary measurement error in nutritional cohort studies. J Natl Cancer Inst. 2011;103(14):1086–92.
20. Troiano RP, McClain JJ, Brychta RJ, Chen KY. Evolution of accelerometer methods for physical activity research. Br J Sports Med. 2014;48(13):1019–23.
21. Migueles JH, Cadenas-Sanchez C, Ekelund U, et al. Accelerometer data collection and processing criteria to assess physical activity and other outcomes: a systematic review and practical considerations. Sports Med. 2017;47(9):1821–45.
22. Choi L, Ward SC, Schnelle JF, Buchowski MS. Assessment of wear/nonwear time classification algorithms for triaxial accelerometer. Med Sci Sports Exerc. 2012;44(10):2009–16.
23. Choi L, Liu Z, Matthews CE, Buchowski MS. Validation of accelerometer wear and nonwear time classification algorithm. Med Sci Sports Exerc. 2011;43(2):357–64.
24. Lyden K, Keadle SK, Staudenmayer J, Freedson PS. A method to estimate free-living active and sedentary behavior from an accelerometer. Med Sci Sports Exerc. 2014;46(2):386–97.
25. Edwardson CL, Winkler EAH, Bodicoat DH, et al. Considerations when using the activPAL monitor in field-based research with adult populations. J Sport Health Sci. 2017;6:162–78.
26. Hertzmark E, Spiegelman D. The SAS ICC9 Macro. 2010 [cited 15 May 2018]. https://www.hsph.harvard.edu/donna-spiegelman/software/icc9/.
27. Spiegelman D, Schneeweiss S, McDermott A. Measurement error correction for logistic regression models with an “alloyed gold standard.” Am J Epidemiol. 1997;145(2):184–96.
28. Donaldson SC, Montoye AH, Tuttle MS, Kaminsky LA. Variability of objectively measured sedentary behavior. Med Sci Sports Exerc. 2016;48(4):755–61.
29. Matthews CE, Keadle SK, Troiano RP, et al. Accelerometer-measured dose-response for physical activity, sedentary time, and mortality in US adults. Am J Clin Nutr. 2016;104(5):1424–32.
30. Prentice RL, Willett WC, Greenwald P, et al. Nutrition and physical activity and chronic disease prevention: research strategies and recommendations. J Natl Cancer Inst. 2004;96(17):1276–87.
31. Schatzkin A, Subar AF, Moore S, et al. Observational epidemiologic studies of nutrition and cancer: the next generation (with better observation). Cancer Epidemiol Biomarkers Prev. 2009;18(4):1026–32.
32. Kipnis V, Subar AF, Midthune D, et al. Structure of dietary measurement error: results of the OPEN biomarker study. Am J Epidemiol. 2003;158(1):14–21; discussion 2-6.
33. Bennett DA, Landry D, Little J, Minelli C. Systematic review of statistical approaches to quantify, or correct for, measurement error in a continuous exposure in nutritional epidemiology. BMC Med Res Methodol. 2017;17(1):146.
34. White E, Hunt JR, Casso D. Exposure measurement in cohort studies: the challenges of prospective data collection. Epidemiol Rev. 1998;20(1):43–56.
35. Horn-Ross PL, Lee VS, Collins CN, et al. Dietary assessment in the California Teachers Study: reproducibility and validity. Cancer Causes Control. 2008;19(6):595–603.
36. Katzmarzyk PT. Physical activity, sedentary behavior, and health: paradigm paralysis or paradigm shift? Diabetes. 2010;59(11):2717–25.
37. Thorp AA, Owen N, Neuhaus M, Dunstan DW. Sedentary behaviors and subsequent health outcomes in adults a systematic review of longitudinal studies, 1996–2011. Am J Prev Med. 2011;41(2):207–15.
38. Patel AV, Maliniak ML, Rees-Punia E, Matthews CE, Gapstur SM. Prolonged leisure-time spent sitting in relation to cause-specific mortality in a large US cohort. Am J Epidemiol. 2018;187(10):2151–8.
39. Welk GJ, Schaben JA, Morrow JR Jr. Reliability of accelerometry-based activity monitors: a generalizability study. Med Sci Sports Exerc. 2004;36(9):1637–45.
40. Silva P, Esliger DW, Mota J, Welk G. Technical reliability assessment of the ActiGraph GT1M accelerometer. Meas Phys Educ Exerc Sci. 2010;14(2):79–91.

Keywords: SEDENTARY; MODERATE-TO-VIGOROUS PHYSICAL ACTIVITY; STEPPING; RELIABILITY; ATTENUATION FACTOR

Copyright © 2019 by the American College of Sports Medicine