The US Department of Health and Human Services (DHHS) recommends that pregnant women participate in 150 min of moderate-intensity aerobic activity per week, and the American College of Obstetricians and Gynecologists recommends 20 to 30 min of moderate exercise per day, most or all days of the week throughout pregnancy (1,2). However, studies have shown that only 13.8% of pregnant women meet the DHHS physical activity (PA) guidelines (3). Benefits from PA participation during pregnancy include a reduced risk of preeclampsia, gestational diabetes, excessive weight gain, and cesarean delivery, whereas no negative effects on the maternal–fetal dyad have been established (1,4–7). Although PA has been shown to benefit most women, additional research is needed to determine the optimal intensity and frequency of PA for women throughout pregnancy and the effects of this PA on maternal and birth outcomes (1). To achieve this and improve the PA guidelines for pregnant women, reliable and valid data collection techniques are required.
The validity of various devices, including the SenseWear armband (SenseWear), Omron pedometer (Omron), and ActiGraph accelerometer (ActiGraph), has been examined in pregnant women, but these studies only assessed their validity at one or two time points during pregnancy or used a second PA measurement device as the criterion measure, rather than using indirect calorimetry (8–11). It is important to consider how the reliability and validity of these devices can be affected by anatomical and physiological changes that occur in women during pregnancy. For example, Crouter et al. (12) determined that a pedometer’s validity was lower in an obese population because of the pedometer tilt angle at the hip. The tilt angle of any PA device worn at the hip would likewise be expected to change as pregnancy progresses.
In addition to limited validity research, we found no published studies that examined the reliability of PA monitors when worn during pregnancy. This is an important omission because poor reliability significantly affects a device’s potential validity. In nonpregnant individuals, low reliability and validity results have been described when devices are worn while subjects walk at slow speeds (13,14). This finding has relevance to pregnant women who have been shown to lower their PA intensity as gestation progresses (15). There is also a lack of published research examining the reliability and validity of these devices prospectively from pregnancy through postpartum. This is significant because researchers often compare postpartum PA levels with those during pregnancy.
Taken together, the limited research to date indicates that more information is needed on the reliability and validity of device-based PA assessment at multiple time points during pregnancy and postpartum. Therefore, the purpose of this study was to determine the reliability and validity of three popular PA monitors (SenseWear, Omron, and ActiGraph) when worn in two consecutive weeks during the second and third trimesters and at 12 wk postpartum.
METHODS
Study population and recruitment
The sample consisted of 33 women (age, 29.6 ± 3.5 yr) recruited and enrolled before 20 wk of gestation from obstetrical care clinics, local health clubs, and word of mouth. Inclusion criteria included maternal age of ≥18 yr, nonsmoker, ability to read and speak English, and pregnancy considered low risk by the health care provider. Women provided written informed consent to complete six laboratory visits and allowed us to gather obstetric and neonatal records from the delivery hospital including gestational week of delivery, birth weight, 1- and 5-min Apgar scores, mode of delivery, and gestational weight gain. The study protocol was approved by the university’s institutional review board.
Equipment
Laboratory visits occurred in the Human Energy Research Laboratory at approximately 21 and 32 wk of pregnancy, and 12 wk postpartum. These time points were chosen because 21 and 32 wk are close to the middle of the second and third trimesters, and because 12 wk postpartum is needed for women to return to their prepartum physiological state. There were two visits at each time period, 1 wk apart. Height and weight were measured on a calibrated wall stadiometer and electronic scale to the nearest centimeter and 0.1 kg, respectively.
The criterion instrument used for energy expenditure estimation was the Oxycon Mobile portable metabolic analyzer (CareFusion, Hoechberg, Germany), which measures breath-by-breath expired gasses, allowing for calculation of oxygen consumption (V˙O2) and carbon dioxide production. This system has been shown to be reliable and valid at various exercise intensities and is an appropriate criterion for the testing done in this study (16). Other devices worn included the ActiGraph (ActiGraph, LLC, Fort Walton Beach, FL; Model: GT3X+) and Omron (Omron Healthcare, Inc, Bannockburn, IL; Model: Hj-720it), which were placed on the anterior axillary line of the right hip and secured with an elastic belt. A second ActiGraph was placed on the lateral side of the right ankle. Finally, the SenseWear (BodyMedia, Inc, Pittsburgh, PA; Model: MF-SW) was worn over the left triceps.
Laboratory tasks
During each laboratory visit, participants completed seven different activity tasks for 5 min each, for a total of 35 min per session. These tasks were a mixture of activities of daily living and locomotor activities and were always completed in the same order, from the estimated lowest to highest intensity. With the exception of the treadmill walk, the women were instructed to complete the tasks at any speed or intensity they preferred. Fewer than 5% of participants took more than 1 min of rest between tasks. The tasks included the following:
- Laundry: filled empty laundry basket with four towels at a table, walked to a second table 3 m away, folded the towels into thirds. Repeated process.
- Dusting: dusted counters and shelves. The subject had the choice to move objects.
- Sweeping: with a broom, subject swept confetti from one cone to another, 3 m apart. Repeated process.
- Child care: the subject picked up toys from the ground as a research assistant tossed them within cones 3 m away.
- Hallway walking: two cones were placed 31 m apart. The subject walked from one cone to another and back at a self-selected speed. Repeated process.
- Treadmill walking: Walked on treadmill at 3 mph. Option was given to hold railings.
- Aerobics: Pregnancy aerobic warm-up video was completed. Examples of tasks include squats, lunges, and stretches.
Data collection and reduction
Minutes 2.5–4 of each task were used to estimate Oxycon (mL·kg−1·min−1) and ActiGraph steady-state data (counts per minute). Oxycon-measured breath-by-breath expired respiratory gasses were collected, and 30-s time periods were averaged in relative terms (mL·kg−1·min−1). ActiGraph raw data were collected at a sampling rate of 30 Hz, and data were reintegrated to 1-s epochs. Omron steps were calculated manually from the difference of steps between the end and start of each 5-min task, which were recorded on a datasheet during each laboratory visit. SenseWear kilocalorie and step data were analyzed for the entire 35-min visit at each time point and not for individual tasks because of proprietary limitations of the SenseWear program.
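The steady-state reduction described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' processing pipeline: the function names, the choice to keep only epochs lying fully inside the 2.5–4 min window, and all sample values are assumptions made for the example.

```python
def steady_state_mean(values, epoch_s, start_min=2.5, end_min=4.0):
    """Average the epochs that lie fully inside the steady-state window.

    values   -- one value per epoch, in order, for a single 5-min task
    epoch_s  -- epoch length in seconds (30 for averaged VO2, 1 for counts)
    """
    start_s, end_s = start_min * 60, end_min * 60
    window = [
        v for i, v in enumerate(values)
        if i * epoch_s >= start_s and (i + 1) * epoch_s <= end_s
    ]
    if not window:
        raise ValueError("no epochs fall inside the steady-state window")
    return sum(window) / len(window)


def counts_per_minute(counts_1s, start_min=2.5, end_min=4.0):
    """Convert 1-s epoch counts in the window to counts per minute."""
    return steady_state_mean(counts_1s, 1, start_min, end_min) * 60


def task_steps(reading_at_start, reading_at_end):
    """Steps for one task, from the pedometer display before and after it."""
    return reading_at_end - reading_at_start


# Hypothetical values for one participant and one task.
vo2_30s = [8, 9, 10, 11, 12, 12, 12, 12, 13, 13]  # mL/kg/min, ten 30-s epochs
counts_1s = [2] * 300                              # 5 min of 1-s epochs

print(steady_state_mean(vo2_30s, epoch_s=30))
print(counts_per_minute(counts_1s))
print(task_steps(1040, 1515))
```

With 30-s epochs, the window covers the three epochs starting at 2.5, 3.0, and 3.5 min; a different convention for partial epochs would change which values are averaged.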
Statistical analysis
Reliability and validity analyses, as illustrated in the following text, were completed for all laboratory visits and devices (visits 1 and 2, visits 3 and 4, visits 5 and 6). However, for ease of understanding, reliability and validity analyses will be explained using the Omron data collected during the first and second laboratory visits. Also, reliability and validity analyses are presented for the entire 35-min visit at each time point because we were most interested in results of the PA measurement devices worn for a variety of activities that women might be completing in a day.
Reliability
Total steps taken in visit 1 were compared with the total steps taken in visit 2 as determined by the Omron. Interclass reliability (Rxx) was calculated via Pearson correlation. Both multitrial and single-trial intraclass reliabilities were estimated using ANOVA, where Rxx = (MSs − MSe)/MSs for two trials and (MSs − MSe)/(MSs + MSe) for single trial. MSs was the mean square for subjects, MSe was the mean square for error, and Rxx was the reliability coefficient for the measures. SEM values were calculated as SEM = Sx × SQRT(1 − Rxx), where Sx was the SD of the measures. SEM values were calculated in relative (%) terms for the total visits.
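As a rough illustration of the formulas above, the following Python sketch computes the multitrial and single-trial reliabilities and the SEM for two visits. Several details are assumptions rather than the authors' stated procedure: a one-way ANOVA with subjects as the only factor is used to obtain MSs and MSe, Sx is taken as the SD of all measures pooled across both visits, and the step totals are invented.

```python
from math import sqrt
from statistics import mean, stdev


def mean_squares(visit1, visit2):
    """Return (MSs, MSe) from a one-way ANOVA with subjects as the factor."""
    if len(visit1) != len(visit2) or len(visit1) < 2:
        raise ValueError("need paired totals for at least two subjects")
    pairs = list(zip(visit1, visit2))
    n = len(pairs)
    grand = mean(list(visit1) + list(visit2))
    # Between-subjects sum of squares (two trials per subject).
    ss_subjects = 2 * sum((mean(p) - grand) ** 2 for p in pairs)
    # Within-subjects (error) sum of squares; df = n * (2 - 1) = n.
    ss_error = sum((x - mean(p)) ** 2 for p in pairs for x in p)
    return ss_subjects / (n - 1), ss_error / n


def multitrial_reliability(ms_s, ms_e):
    """Rxx for the two-trial mean: (MSs - MSe) / MSs."""
    return (ms_s - ms_e) / ms_s


def single_trial_reliability(ms_s, ms_e):
    """Rxx for a single trial out of two: (MSs - MSe) / (MSs + MSe)."""
    return (ms_s - ms_e) / (ms_s + ms_e)


def standard_error_of_measurement(sd, rxx):
    """SEM = Sx * sqrt(1 - Rxx)."""
    return sd * sqrt(1 - rxx)


# Hypothetical total steps for three participants at visits 1 and 2.
visit1 = [1000, 3000, 5000]
visit2 = [2000, 4000, 6000]

ms_s, ms_e = mean_squares(visit1, visit2)
r_multi = multitrial_reliability(ms_s, ms_e)
r_single = single_trial_reliability(ms_s, ms_e)
sem = standard_error_of_measurement(stdev(visit1 + visit2), r_multi)
sem_percent = 100 * sem / mean(visit1 + visit2)  # SEM in relative (%) terms

print(round(r_multi, 3), round(r_single, 3), round(sem, 1), round(sem_percent, 1))
```

A two-way model that removes a systematic visit effect from the error term would give a different MSe, so the choice of ANOVA model should be checked against the analysis software actually used.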
Validity
Each individual’s total steps taken for the seven tasks were averaged between visits 1 and 2. Total step values were then compared with the criterion measure Oxycon V˙O2 results (which were calculated the same way as described for the Omron results) in relative (mL·kg−1·min−1) terms using Pearson correlation.
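The validity calculation can be sketched in the same way: average each participant's totals across the two visits, then correlate the device values with the criterion values. The helper names and the data below are hypothetical, and the Pearson coefficient is written out by hand only to keep the example self-contained.

```python
from math import sqrt


def visit_average(first_visit, second_visit):
    """Per-participant mean of the two visits at one time point."""
    return [(a + b) / 2 for a, b in zip(first_visit, second_visit)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data for four participants.
steps_v1 = [1500, 2100, 1800, 2600]   # device total steps, visit 1
steps_v2 = [1600, 2000, 1900, 2500]   # device total steps, visit 2
vo2_v1 = [10.2, 12.5, 11.0, 13.1]     # criterion VO2, mL/kg/min, visit 1
vo2_v2 = [10.6, 12.1, 11.4, 13.5]     # criterion VO2, mL/kg/min, visit 2

validity = pearson_r(visit_average(steps_v1, steps_v2),
                     visit_average(vo2_v1, vo2_v2))
print(round(validity, 2))
```

The same `pearson_r` helper applied to visit 1 versus visit 2 totals for a single device would give the interclass reliability described in the Reliability section.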
RESULTS
Participants were an average of 167.1 cm tall, and 72.8, 78.3, and 69.6 kg at 21 and 32 wk of gestation and 12 wk postpartum, respectively. Because of missed appointments or device malfunction, a sample size between 23 and 27 was used for the reliability analysis for each device for visits 1 through 4, but a sample size of 19–25 was used for visits 5 and 6. For the validity analysis of each device, a sample size of 32–33 was used for visits 1 and 2, 30–31 was used for visits 3 and 4, and 26–28 was used for visits 5 and 6. Table 1 shows the means and SD of each device’s recordings at each time point.
TABLE 1: Means and SD at each time point, for each device.
Reliability
Interclass correlation reliability results for each device are presented in Table 2. The reliability of the devices was moderate to strong because 66% (n = 12/18) of the Pearson correlations were between 0.6 and 1.0 (17). Only two of six correlation coefficients examining the relationship between visits 5 and 6 were greater than 0.6.
TABLE 2: Interclass reliability coefficients (via Pearson correlation) for the entire 35-min visit, at each time point, for each device.
Multitrial intraclass correlation coefficients are presented in Table 3. Values were generally higher in comparison with the interclass correlations and largely in the moderate to strong range because 38% (n = 7/18) of the intraclass reliabilities (ICC) were between 0.6 and 0.79 and 50% (n = 9/18) were between 0.8 and 1.0 (88% were greater than 0.6) (17). Devices had the lowest reliability when worn at visits 5 and 6 because 83% (n = 5/6) of the correlations were less than 0.8. As expected, single-trial reliability coefficients were slightly lower for all devices and study time points compared with multitrial values. Forty-four percent (n = 8/18) of the single-trial coefficients were between 0.6 and 0.79; however, only 27% (n = 5/18) were between 0.8 and 1.0 (Table 4). The hip ActiGraph and both the steps and kilocalories calculated by the SenseWear had their lowest single-trial reliability coefficients at visits 5 and 6, whereas the Omron had its lowest coefficient at visits 3 and 4. The ankle ActiGraph had its lowest single-trial reliability at visits 1 and 2. SEM values represent how repeated measures on the same instrument vary from the theoretical “true” value. Table 5 depicts the SEM values for each device between visits, which ranged from 7% to 23% of the mean values. For four of the six devices, SEM values were highest when worn at visits 5 and 6.
TABLE 3: Multitrial intraclass reliability coefficients (via ANOVA) for the entire 35-min visit, at each time point, for each device.
TABLE 4: Single trial reliability coefficients for the entire 35-min visit, at each time point, for each device.
TABLE 5: SE of measurement expressed in units and percent of the mean units (%) for the entire 35-min visit, at each time point, for each device.
Validity
For the validity analysis, each device was compared with the criterion Oxycon V˙O2 results in relative (mL·kg−1·min−1) terms. Comparison between relative V˙O2 and devices showed that 40% (n = 6/15) and 46% (n = 7/15) of the validity coefficients were between 0.4 and 0.59 and between 0.6 and 0.79, respectively (Table 6).
TABLE 6: Validity coefficients (via Pearson correlation) for the entire visit, for each time point, for each device when compared with relative V˙O2.
DISCUSSION
The purpose of this study was to determine the reliability and validity of three popular PA measurement devices worn at multiple time points during pregnancy and postpartum using a variety of lifestyle and locomotor activities. It has been well over a decade since reliability and validity of PA measurement devices have been evaluated systematically during pregnancy, and the technology has changed significantly since then (18). It was not our purpose to predict a particular outcome measure (V˙O2, counts, steps, etc) but rather determine the relationship between different devices measuring a conglomeration of different movements that are similar to activities of daily living.
Reliability
Compared with validity, fewer studies have been conducted on the reliability of the PA devices evaluated in this investigation, and we found no published reports of their reliability when worn during pregnancy. Thus, direct comparison with previous studies performed throughout gestation is not possible. In the present study, the Omron had a large range of ICC values across visit time points (r = 0.4–1.0). The triaxial ActiGraph, worn at the ankle and hip during our study, produced overall high ICC values (r = 0.6–0.9). The ActiGraph worn at the ankle was almost as reliable as indirect calorimetry, as determined by low SEM values (average percent of the mean, 9.8 (ActiGraph) and 7.7 (relative V˙O2)). The SenseWear consistently had lower reliability at visits 5 and 6 (12 wk postpartum), as shown by the single and multitrial reliability coefficients and SEM values. However, it produced high ICC values (r = 0.8–0.9) at the other visit time points. This implies that the SenseWear may not be the ideal choice of device if data collection can only occur postpartum.
Overall, results from the current study showed that three commonly used PA measurement devices have similar reliabilities at all study time points during pregnancy and postpartum when completing various lifestyle and locomotor activities. When the activity and intensity are the same for all women, differences in reliability are largely a function of biological variability or device measurement error. Our study participants were instructed to complete the tasks at any intensity they preferred, with the exception of walking on the treadmill where speed was set at 4.8 kph (3 mph). Although this research focused on the reliability of the devices for a variety of activities, we calculated SEM for the activity that was standardized for everyone (treadmill walking). It would be expected that SEM would be lowest for the treadmill walking, compared with the activities overall, and less influenced by differences in the participants’ effort from visit to visit. This was indeed the case, except for the ActiGraph worn at the hip, which had the lowest SEM percent mean for walking in the hallway; however, there was only a 2.5% difference between the treadmill and hallway walk tasks (13.2% and 10.7%, respectively). The average SEM percent mean for the devices was 8.3%, 10.8%, 20.3%, 33.3%, 52.1%, 74.7%, and 96.4% for the treadmill, hallway walk, aerobic video, child care, laundry, sweeping, and dusting, respectively. Also, although it had a large range of ICC values, the Omron had the highest reliability of the devices for the treadmill as determined by its SEM (percent of the mean, 4.6%). The significance of these findings is that although the devices seem to be reliable for a set intensity, even a very reliable device may be affected by a woman’s chosen activity intensity on a given day.
In addition, it is not surprising that overall, intraclass reliabilities were determined to be higher than interclass. At moderate activity intensity, variability of counts, steps, or kilocalories among women was fairly small. Thus, the limited range of responses tends to result in lower interclass reliability determined via Pearson correlations.
Single-trial reliability coefficients were lower than the multitrial intraclass coefficients for all devices and time points (Table 4). The ankle ActiGraph had its lowest single-trial reliability at 21 wk of gestation, the Omron at 32 wk of gestation, and the hip ActiGraph and SenseWear kilocalories and steps at 12 wk postpartum. This is important for researchers to consider if they are only able to collect data on participants one time, during one trimester. However, 72% (n = 13/18) of the coefficients were greater than 0.6, indicating that these devices have moderate to strong single-trial reliability overall and that data could potentially be collected only once at each time point for each participant (17).
The devices seem to be most reliable when worn during the third trimester, when the women’s abdominal area is most solid. We hypothesize that this is because the devices would be less likely to move around the belt when pressed against a firmer abdominal area late in pregnancy than when the women have much less weight gain in the second trimester and postpartum. This would allow for very little variation between the two visits and therefore high reliability.
Validity
The SenseWear seems to be a valid instrument when compared with indirect calorimetry in both pregnant and nonpregnant subjects (11,19). However, similar to reliability results, some studies have found the SenseWear to be less valid at slower, compared with faster, walking speeds (13,20). Because of limitations of the SenseWear program, individual tasks could not be compared between indirect calorimetry and the SenseWear in our study. Overall, the SenseWear was the least valid device when kilocalorie data were compared with relative V˙O2 (average r = 0.39), but had comparable validity to the ActiGraph placed at the ankle and hip when step data were used (r = 0.66).
Omron validity has been shown to be high when worn by pregnant and nonpregnant populations when compared with manual counting as a criterion measure (8,21). One study showed lower correlations for slower treadmill walking speeds when the Omron was worn in the pants pocket, but not if it was worn as a necklace or in a carrier bag, when compared with manual counting (14). In the current study, the Omron was compared with indirect calorimetry, not manual counting, but showed correlations between r = 0.40 and 0.59 for the total visit, which was similar to the other devices examined.
We were not able to locate any published studies examining the validity of the ActiGraph compared with indirect calorimetry when worn by pregnant women. However, level of agreement for step counting between the uniaxial ActiGraph and pedometers or manual counting has been shown to be moderate to strong when worn during pregnancy (8–10). This agrees with the current study because both the ActiGraph and Omron produced similar correlation results compared with indirect calorimetry.
Crouter et al. (22) examined the relationships between a variety of uniaxial ActiGraph regression equations and energy expenditure determined by indirect calorimetry when nonpregnant subjects completed an assortment of activities. The authors concluded that the equations are only valid for the activities and populations for which they were developed and do not work well for a wide range of intensities. A pregnancy-specific calibration equation relating ActiGraph counts per minute to energy expenditure has not been published, and this must be considered when interpreting ActiGraph results.
The validity of the SenseWear and ActiGraph has been examined when nonpregnant participants completed household or lifestyle activities when compared with indirect calorimetry. During 120 min of household and sport activities, the SenseWear had a strong relationship to indirect calorimetry (ICC = 0.73), whereas the correlation between the uniaxial ActiGraph and indirect calorimetry was 0.55 (19). These results agree with validity results reported by others between the uniaxial ActiGraph and indirect calorimetry during lifestyle activities (23,24). Our study shows that the triaxial accelerometer showed similar validity to the other PA devices examined when compared with the criterion indirect calorimetry. Overall, it is difficult to make direct comparisons with previously published studies because a variety of data analyses have been used, and no studies have been published comparing indirect calorimetry results with those of ActiGraphs when worn during pregnancy.
A previous study stated that the accuracy of a device is negatively affected by increasing tilt angle (12). It is assumed that the tilt angle of the devices worn at the hip would be most affected at the 32-wk time period compared with the other two data collection points. Although hip circumference was not measured in this study, the potential change in tilt angle did not seem to affect the validity of the devices worn at the hip because the validity coefficients calculated during pregnancy were similar to those calculated postpartum.
Our study is unique in that multiple PA measurement devices were worn by women performing various lifestyle and locomotor activities, at multiple time points during pregnancy and postpartum. Indirect calorimetry was also used as the criterion measure, rather than a second PA device. No previously published studies have examined the reliability of the devices when worn during pregnancy, and only a few have studied the validity. However, there were limitations to this study. Although participants completed a variety of everyday activities, they were performed in a laboratory environment. Although this improves study internal validity, caution must be taken when comparing with free-living results. Also, because of proprietary limitations with the SenseWear program, results of individual tasks could not be compared directly with indirect calorimetry results. Finally, activities performed by our study participants were mostly performed at a moderate intensity, so we are not sure how well the devices would perform for women completing more vigorous PA during pregnancy. However, from a public health perspective, researchers are more likely to have an interest in measuring moderate PA when performed during pregnancy because most women are not meeting the DHHS and American College of Obstetricians and Gynecologists PA recommendations.
CONCLUSIONS
Overall, the PA measurement devices examined in our study showed moderate to strong reliability and validity during pregnancy and postpartum. Intraclass reliability showed very similar values among the devices. The SenseWear had slightly higher interclass reliability during pregnancy, but much lower when worn postpartum. However, it is important to note that the SenseWear is no longer available for purchase. The hip and ankle ActiGraphs had consistently moderate to strong reliability at all test time points, and the Omron had slightly lower reliability at 32 wk of gestation compared with 21 wk of gestation and 12 wk postpartum. The ActiGraphs also had slightly higher validity results than the other devices. Taken together, these results support the use of any of these devices when conducting research on moderate PA during pregnancy, and we believe that pregnant women’s PA levels can be compared across studies that have used the different devices evaluated here. However, the reliability and potentially the validity of the devices may be affected if higher activity intensity occurs. For researchers interested in validating PA questionnaires in the future, the devices evaluated in the present study provided similar validity under our study conditions.
No acknowledgements or funding sources to report.
The results of this study do not constitute endorsement by the American College of Sports Medicine and are presented without fabrication, falsification, or inappropriate data manipulation. All authors declare that they have no professional relationship with companies or manufacturers who will benefit from the results of the present study.
REFERENCES
1. ACOG Committee Opinion No. 650: Physical activity and exercise during pregnancy and the postpartum period.
Obstet Gynecol. 2015;126(6):e135–42.
2. United States Department of Health and Human Services.
2008 Physical Activity Guidelines for Americans. Washington (DC): U.S. Department of Health and Human Services; 2008. p. 76. Available from: U.S. GPO, Washington.
3. Evenson KR, Wen F. National trends in self-reported physical activity and sedentary behaviors among pregnant women: NHANES 1999–2006.
Prev Med. 2010;50(3):123–8.
4. Aune D, Saugstad OD, Henriksen T, Tonstad S. Physical activity and the risk of preeclampsia: a systematic review and meta-analysis.
Epidemiology. 2014;25(3):331–43.
5. Aune D, Sen A, Henriksen T, Saugstad O, Tonstad S. Physical activity and the risk of gestational diabetes mellitus: a systematic review and dose-response meta-analysis of epidemiological studies.
Eur J Epidemiol. 2016;31:967–97.
6. Da Silva SG, Ricardo LI, Evenson KR, Hallal PC. Leisure-time physical activity in pregnancy and maternal-child health: a systematic review and meta-analysis of randomized controlled trials and cohort studies.
Sports Med. 2017;47(2):295–317.
7. Domenjoz I, Kayser B, Boulvain M. Effect of physical activity during pregnancy on mode of delivery.
Am J Obstet Gynecol. 2014;211(4):401.e1–11.
8. Connolly CP, Coe DP, Kendrick JM, Bassett DR Jr, Thompson DL. Accuracy of physical activity monitors in pregnant women.
Med Sci Sports Exerc. 2011;43(6):1100–5.
9. Harrison CL, Thompson RG, Teede HJ, Lombard CB. Measuring physical activity during pregnancy.
Int J Behav Nutr Phys Act. 2011;8:19.
10. Kinnunen TI, Tennant PW, McParlin C, Poston L, Robson SC, Bell R. Agreement between pedometer and accelerometer in measuring physical activity in overweight and obese pregnant women.
BMC Public Health. 2011;11:501.
11. Smith KM, Lanningham-Foster LM, Welk GJ, Campbell CG. Validity of the SenseWear® Armband to predict energy expenditure in pregnant women.
Med Sci Sports Exerc. 2012;44(10):2001–8.
12. Crouter SE, Schneider PL, Bassett DR Jr. Spring-levered versus piezo-electric pedometer accuracy in overweight and obese adults.
Med Sci Sports Exerc. 2005;37(10):1673–9.
13. Brazeau AS, Beaudoin N, Belisle V, Messier V, Karelis AD, Rabasa-Lhoret R. Validation and reliability of two activity monitors for energy expenditure assessment.
J Sci Med Sport. 2016;19(1):46–50.
14. De Cocker KA, De Meyer J, De Bourdeaudhuij IM, Cardon GM. Non-traditional wearing positions of pedometers: validity and reliability of the Omron HJ-203-ED pedometer under controlled and free-living conditions.
J Sci Med Sport. 2012;15(5):418–24.
15. Borodulin KM, Evenson KR, Wen F, Herring AH, Benson AM. Physical activity patterns during pregnancy.
Med Sci Sports Exerc. 2008;40(11):1901–8.
16. Carter J, Jeukendrup AE. Validity and reliability of three commercially available breath-by-breath respiratory systems.
Eur J Appl Physiol. 2002;86(5):435–41.
17. Jackson SL.
Research Methods and Statistics: A Critical Thinking Approach. 5th ed. Belmont (CA): Wadsworth Cengage Learning; 2016. pp. 508.
18. Stein AD, Rivera JM, Pivarnik JM. Measuring energy expenditure in habitually active and sedentary pregnant women.
Med Sci Sports Exerc. 2003;35(8):1441–6.
19. Berntsen S, Hageberg R, Aandstad A, et al. Validity of physical activity monitors in adults participating in free-living activities.
Br J Sports Med. 2010;44(9):657–64.
20. Machač S, Procházka M, Radvanský J, Slabý K. Validation of physical activity monitors in individuals with diabetes: energy expenditure estimation by the multisensor SenseWear Armband Pro3 and the step counter Omron HJ-720 against indirect calorimetry during walking.
Diabetes Technol Ther. 2013;15(5):413–8.
21. Lee JA, Williams SM, Brown DD, Laurson KR. Concurrent validation of the Actigraph GT3X+, Polar Active accelerometer, Omron HJ-720 and Yamax Digiwalker SW-701 pedometer step counts in lab-based and free-living settings.
J Sports Sci. 2015;33(10):991–1000.
22. Crouter SE, Churilla JR, Bassett DR Jr. Estimating energy expenditure using accelerometers.
Eur J Appl Physiol. 2006;98(6):601–12.
23. Swartz AM, Strath SJ, Bassett DR Jr, O’Brien WL, King GA, Ainsworth BE. Estimation of energy expenditure using CSA accelerometers at hip and wrist sites.
Med Sci Sports Exerc. 2000;32(9 Suppl):S450–6.
24. Welk GJ, Blair SN, Wood K, Jones S, Thompson RW. A comparative evaluation of three accelerometry-based physical activity monitors.
Med Sci Sports Exerc. 2000;32(9 Suppl):S489–97.