Accurate and uninterrupted measurement of various human movement behaviors across complete (24 h) days is essential in understanding the interactions between these behaviors and their impact on health and well-being (1). Over the last decade, accelerometers have been the most preferred device-based measure to assess these behaviors (2,3). However, traditional methods for processing raw accelerometer data have several limitations including ambiguous intensity threshold decisions, proprietary algorithms, and lack of activity type measures (4). Furthermore, these approaches may not be suitable to assess 24-h movement patterns (5). To overcome these challenges and facilitate accurate estimates of physical activity, researchers have moved toward advanced processing methods involving the combined application of raw accelerometer data and various machine-learning algorithms (6).
Rapidly growing interest in this field has spurred researchers to evaluate the performance of several machine-learning algorithms for predicting physical activity components (activity type and intensity) under various study conditions (e.g., accelerometer placement positions, and the number of accelerometers used concurrently). One of the key opportunities of machine learning is the ability to use multiple sensors to improve the detection of human movement. Traditional processing methods do not allow for integration of raw accelerometer data from multiple units. Several machine-learning studies have evaluated the efficacy of more than one accelerometer (up to nine sensors (7)), and various accelerometer placement combinations (e.g., wrist, waist, back, and thigh) for classification of physical activity behaviors. However, increasing the number of sensors may affect compliance due to increased participant burden. Single wrist-worn devices are becoming popular due to improvements in device wear time (8), yet the optimal placement site (or combination of placement sites) that offers high compliance and can effectively discern various movement behaviors is currently unknown.
Although machine learning techniques offer considerable promise in detecting various physical activity behaviors, their application is currently limited in free-living conditions. Most machine-learning studies have been conducted in laboratory settings (6), which are controlled environments and may not be sensitive to the intricacies of movement in free-living settings. In fact, several studies have revealed that machine-learning models developed in laboratory conditions demonstrate poor performance when tested outside the laboratory (9,10). A recent validation study conducted in a controlled laboratory environment used a random forest machine learning classifier to achieve exceptional accuracy (>99%) in classifying six physical activity types in both adults and children using a thigh and back accelerometer (11). Although these results are promising, their validity in settings outside of the laboratory remains unknown. Therefore, the purpose of this study is 1) to investigate the criterion validity of this dual-accelerometer system (back–thigh) in semi free-living conditions, and 2) to examine the efficacy of other accelerometer placement combinations (e.g., back–wrist, thigh–wrist) for classifying physical activity and sedentary behaviors in children and adults.
Children (age, 6–15 yr) and their parents were invited to participate in this study through advertisements at a local school and on the university campus. The children’s parents contacted the research team if they were interested in participating. Participants were deemed eligible if they were free from disability and were able to perform a range of physical activities in their free-living environment. Before participation, each parent and child gave their written informed consent and assent (respectively). All participants received a gift voucher to reimburse them for their time. Ethical approval was obtained from the AUT University Ethics Committee (18/99).
Data collection initially involved a visit to the university campus for approximately 10 to 15 min. Upon arrival, the study protocol was explained to each participant, before they were equipped with three Axivity AX3 accelerometers (Axivity, York, UK) and a wearable camera (SnapCam Lite; iON Ltd., UK). One accelerometer was positioned on the anterior aspect of their thigh (midway between knee and hip), one was positioned on their lower back, and the third one on their dominant wrist. These were placed on the same side as the participant’s handedness (left or right). Both the back and thigh sensors were attached using purpose-made hypoallergenic adhesive foam pouches (Herpa Tech, Stockholm, Sweden) (12), whereas the wrist sensor was attached using an Axivity silicon wrist band. Finally, the wearable camera was clipped to the participant’s clothing lapel. After being equipped with the instruments, participants left the facility and were encouraged to perform a variety of physical activities in their free-living environment for a period of 2 h (duration limited by the battery life of the wearable camera). To obtain a variety of human movement behaviors within a limited timeframe, participants were provided with a list of activities (Table 1) to guide them. However, these were not strictly enforced, and participants were generally encouraged to carry out their everyday free-living activities. Participants later returned to the university campus where the instruments were collected.
The Axivity AX3 is a small (23 × 32.5 × 8.9 mm; 11 g) waterproof triaxial accelerometer with a configurable sampling frequency between 12.5 and 3200 Hz, and a configurable dynamic range between ±2g and ±16g. The accelerometer has an internal memory of 512 MB that can store 14 d of continuous acceleration data sampled at 100 Hz. It also incorporates a real-time quartz clock and a skin temperature sensor (range, 0°–40°C), which can be used for accurate wear time detection (12). The accelerometers used in this study were configured to record at 100 Hz with a dynamic range of ±8g. A total of 12 individual sensors were used in this study, of which three (for the back, thigh, and wrist placements) were randomly assigned to each participant. All sensors were configured and downloaded using the OmGui software (Open Movement, Newcastle University, UK).
The SnapCam Lite is a small (42 × 42 × 13.4 mm, 25.6 g) wearable clip-camera that can record both photos (in intervals of 30 s) and videos at 720p (30 frames per second). The camera has a MicroSD storage (up to 32 GB) and battery capacity to record continuous video for approximately 2 h. In this study, the cameras were configured to record videos of the free-living environment as the direct observation criterion measure. Video recordings were then used to generate ground truth activity labels used in the model training process.
Data preprocessing and feature generation
The accelerometer data and the concurrent video footage were time synchronized using a marker in the sensor data. This was achieved by identifying a clear postural transition (e.g., sit to stand) in every participant’s recorded video, and visually inspecting the concurrent accelerometer data for a resultant change in signal. This process enabled the exact alignment of sensor timestamp with video frame during data processing.
The raw data from the AX3s were downloaded and imported into MATLAB (release 2017b; The MathWorks, Inc., Natick, MA). The sensor data were resampled to 100 Hz using cubic interpolation, as the sensor sample rate is known to fluctuate (13). To ensure measurement reliability, the sensors were calibrated, and the data were passed through a 25-Hz Butterworth low-pass filter to eliminate skin and clothing artefacts. A detailed description of this process can be found in our previously published work (11).
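As an illustration of this preprocessing step, the resampling and filtering can be sketched as follows (in Python rather than the MATLAB used in the study; the fourth-order, zero-phase filter implementation is an assumption, as only the 25-Hz cutoff is stated above):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import butter, filtfilt

def preprocess(timestamps, acc, fs=100, cutoff=25):
    """Resample raw accelerometer data to a uniform rate and low-pass filter it.

    timestamps: (n,) sample times in seconds (non-uniform, as the AX3
    sample rate fluctuates); acc: (n, 3) x/y/z acceleration in g.
    """
    # Cubic interpolation onto a uniform 100-Hz grid
    t_uniform = np.arange(timestamps[0], timestamps[-1], 1.0 / fs)
    resampled = CubicSpline(timestamps, acc, axis=0)(t_uniform)
    # Butterworth low-pass at 25 Hz to suppress skin/clothing artefacts.
    # The 4th-order, zero-phase (filtfilt) choice is an assumption.
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    return t_uniform, filtfilt(b, a, resampled, axis=0)
```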
To generate ground truth activity labels, each participant’s video recording was annotated using the “Simple Video Coder” annotation software. First, a configuration file was generated where each activity label (to be annotated) was assigned to a hotkey on the keyboard (e.g., 1, sitting; 2, standing). Activities were then annotated by watching the video footage and pressing the corresponding hotkey at the start of the activity, and again at the end of the activity. This annotation process was repeated for every participant’s video, and the start and stop times of all activities were then exported to a spreadsheet. A more detailed description of the software is available elsewhere (14). To ensure each 5-s epoch of accelerometer data was matched to a valid ground truth label, activities that were less than 5 s in duration were excluded (e.g., the first and/or last epoch of each activity). This also meant transitional activities (e.g., sit-to-stand transitions) were excluded from being annotated. This was considered beyond the scope of our study design, as it would require a different data processing strategy (e.g., segmentation of data into windows shorter than 5 s). Furthermore, annotating transitional activities can be challenging depending on the transition type (e.g., labeling a walk-to-run transition is more difficult than a sit-to-stand transition).
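The epoch-labeling logic described above — assigning a ground truth label only to 5-s epochs that fall wholly within a single annotated activity — can be sketched as follows (a simplified illustration in Python; the function name and interval format are hypothetical, not taken from the study’s processing pipeline):

```python
import math

def label_epochs(annotations, epoch_len=5.0):
    """Assign a ground-truth label to each 5-s epoch that falls entirely
    within one annotated activity; partial epochs at activity boundaries
    are dropped, mirroring the exclusion of sub-5-s segments.

    annotations: list of (start_s, stop_s, label) tuples, sorted in time.
    Returns list of (epoch_start_s, label).
    """
    labelled = []
    for start, stop, label in annotations:
        # first epoch boundary at or after the activity start
        t = math.ceil(start / epoch_len) * epoch_len
        while t + epoch_len <= stop:       # epoch must end before activity ends
            labelled.append((t, label))
            t += epoch_len
    return labelled
```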
Activities performed by adults and children were grouped into seven distinct activity classes that occur over a 24-h day. All sedentary activities were annotated as either sitting, lying, or standing, whereas ambulatory activities were annotated as either walking, running, or cycling. Any standing activity that occurred with slight movement (e.g., household tasks, vacuuming, washing) was annotated as “dynamic standing.” All ambulatory activities performed by children that were not running, walking, or cycling (e.g., trampoline jumping, playing in a park, swinging) were grouped into a “dynamic movement” activity class, resulting in eight different activity labels for children.
Feature generation is an important phase in machine learning where several predictive properties (features) of the raw accelerometer signal are extracted. The data were first segmented into 5-s nonoverlapping epochs (windows). Various time- and frequency-domain features were calculated over each epoch for each accelerometer pair (i.e., thigh–back, back–wrist, and thigh–wrist) individually. In line with our past work (11), a total of 142 features were generated, comprising both time- and frequency-domain components of the signal. The time-domain features included the mean, median, standard deviation, magnitude, coefficient of variation, minimum, maximum, 25th and 75th percentiles, skewness, kurtosis, axis correlations (between-axis and between-sensor), and roll, pitch, and yaw, whereas the frequency-domain features included the dominant frequency and signal power (calculated using a fast Fourier transform). These features were computed for each sensor (across three axes) and between sensors (where applicable).
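A minimal sketch of this feature-generation step for a single sensor is shown below (Python; an illustrative subset of the 142 features, with assumed implementation details such as computing the dominant frequency from the mean-removed magnitude signal):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def epoch_features(epoch, fs=100):
    """Compute a subset of the time/frequency features for one 5-s epoch.

    epoch: (n_samples, 3) array of one sensor's x/y/z acceleration.
    Returns a flat feature vector (an illustrative subset, not all 142).
    """
    feats = []
    mag = np.linalg.norm(epoch, axis=1)          # vector magnitude
    for sig in (*epoch.T, mag):                  # per-axis signals + magnitude
        feats += [sig.mean(), np.median(sig), sig.std(),
                  sig.min(), sig.max(),
                  np.percentile(sig, 25), np.percentile(sig, 75),
                  skew(sig), kurtosis(sig)]
    # between-axis correlations (x-y, x-z, y-z)
    c = np.corrcoef(epoch.T)
    feats += [c[0, 1], c[0, 2], c[1, 2]]
    # frequency domain: dominant frequency and mean signal power of magnitude
    spec = np.abs(np.fft.rfft(mag - mag.mean())) ** 2
    freqs = np.fft.rfftfreq(len(mag), d=1 / fs)
    feats += [freqs[np.argmax(spec)], spec.sum() / len(mag)]
    return np.array(feats)
```

In the study, these per-sensor features were concatenated across the two sensors of each placement pair (together with between-sensor correlations) to form the final feature vector for each epoch.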
The machine-learning algorithm used in this study was an ensemble learner called the random forest, which is a collection of many individual decision trees (15). Each decision tree is generated using a bootstrap sample of the training data. To increase diversity among the trees, a random subset of features (m) is selected from the full data set at each node split in each tree. The feature which maximizes information gain is selected for the split. This random feature selection also prevents overfitting the training data (16). Each tree outputs a class (activity type) prediction for each observation, which are tallied across all trees to select the final class prediction by majority vote.
Model building and analyses were performed separately for the adult and child samples (for each accelerometer combination), resulting in six different models. All classifiers were trained, tuned, and validated in R version 3.5.1 (17) using the “randomForest” package (18). The optimal random forest tuning parameter (mtry), which is the number of randomly selected features eligible for each node split was identified by evaluating model performance with several mtry values; mtry = 3 was selected as it maximized classification performance. Similarly, the number of trees in each forest (ntree) was set at 350, as there was no improvement in model performance beyond this number.
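For reference, an equivalent classifier configuration can be expressed in Python’s scikit-learn (shown only as an illustration — the study’s models were built with the R “randomForest” package), where mtry maps to max_features and ntree to n_estimators:

```python
from sklearn.ensemble import RandomForestClassifier

def build_classifier(mtry=3, ntree=350, seed=0):
    """Random forest mirroring the tuning used in the study:
    mtry = 3 features tried at each node split, ntree = 350 trees."""
    return RandomForestClassifier(
        n_estimators=ntree,       # ntree in R's randomForest
        max_features=mtry,        # mtry in R's randomForest
        random_state=seed,        # bootstrap sampling is randomized; fix for reproducibility
        n_jobs=-1,                # parallelize tree building
    )
```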
The predictive performance of each model was evaluated using leave-one-out cross-validation (LOOCV). This is a type of cross-validation where the model is trained on all participants’ data except one, which is left out and considered as the test set. Overall model performance is estimated by repeating this process for each participant in the data set, averaging the results. This validation method was chosen as it determines model performance based on independent data, and hence may be less biased. For each of the activity-class predictions, the sensitivity, specificity, and balanced prediction accuracy were calculated. Sensitivity refers to the ability of the model to correctly classify the activity when the activity is present (i.e., true positive). Specificity refers to the ability of the classifier to reject the activity when it is not present (i.e., true negative). The balanced prediction accuracy for each activity is calculated as the mean of sensitivity and specificity.
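The per-class metrics defined above can be computed from the pooled LOOCV predictions as follows (a sketch in Python; function and variable names are illustrative):

```python
import numpy as np

def balanced_accuracies(y_true, y_pred, classes):
    """Per-class sensitivity, specificity, and balanced accuracy,
    computed one-vs-rest over pooled cross-validation predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    out = {}
    for c in classes:
        tp = np.sum((y_true == c) & (y_pred == c))   # true positives
        fn = np.sum((y_true == c) & (y_pred != c))   # false negatives
        tn = np.sum((y_true != c) & (y_pred != c))   # true negatives
        fp = np.sum((y_true != c) & (y_pred == c))   # false positives
        sens = tp / (tp + fn)                        # correctly detect activity
        spec = tn / (tn + fp)                        # correctly reject activity
        out[c] = {"sensitivity": sens, "specificity": spec,
                  "balanced_accuracy": (sens + spec) / 2}
    return out
```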
A sample of 15 children (mean age, 10.0 ± 2.6 yr; 66.6% male) and 15 adults (mean age, 31.5 ± 10.8 yr; 33.3% male) successfully completed the study. In total, there were 18,239 5-s epochs coded with activity class obtained from the adult sample, whereas 15,256 were obtained from the child sample. Three different machine-learning models were developed using different placement combinations (thigh–back, thigh–wrist, and back–wrist) for both children and adults (six models in total). The random forest training and validation process for each model took, on average, 12.3 and 11.7 min to complete for the adult and child samples, respectively. Model training took place on a computer system with an Intel Xeon E5-1620 v3 CPU, and 32 GB of RAM.
Tables 2 and 3 illustrate the accuracy metrics of each activity class for the adult and child sample (respectively) when three different placement combinations were evaluated (thigh–back, thigh–wrist, and back–wrist).
Figure 1 compares the balanced accuracies achieved by the machine-learning models in detection of each activity in both the adult and child samples. Overall, the back–thigh model achieved the highest LOOCV accuracy (across all activity classes) of 95.6% (95% CI, 95.3–95.9) for the adult sample, and 92% (95% CI, 91.6–92.4) for the child sample (Fig. 2). The lowest performance was observed for the model generated using the back–wrist combination in the adult sample (75.4%; 95% CI, 74.9–76.2).
Table 4 presents the confusion matrices of model performance from the back–thigh combination. The confusion matrix for this sensor combination is presented given it performed the best. Confusion matrices are available as supplemental content for the back–wrist sensors (see Table, Supplemental Digital Content 1, confusion matrices for back–wrist, http://links.lww.com/MSS/B696) and the thigh–wrist sensors (see Table, Supplemental Digital Content 2, confusion matrices for thigh–wrist, http://links.lww.com/MSS/B697). These matrices present the number of 5-s epochs that are correctly classified or misclassified for each activity class. Standing and dynamic standing were the two main areas of confusion (more than 300 epochs) in the adult sample, whereas dynamic movement had the highest number of misclassifications and was confused with most other activities in the child sample. In contrast, running and lying activities had the least number of misclassifications in both the adult and child samples.
This study investigated the validity of a thigh–back dual-accelerometer system for classifying human movement behaviors in children and adults in semi free-living conditions and evaluated the efficacy of other accelerometer placement combinations. This study builds upon previous work which illustrated exceptional classification performance in laboratory conditions (11). Our results indicate that the machine-learning model developed using the thigh and back accelerometer achieved the highest overall accuracy (at least 11% higher than other tested dual-accelerometer systems) and was able to discern seven distinct activity classes with 95.6% accuracy in the adult sample, and eight distinct activity classes with 92% accuracy in the child sample. The other placement combinations achieved an overall balanced accuracy ranging between 75% and 84.5%. The back–thigh combination clearly outperformed other combinations when classifying sedentary activities, such as sitting and lying, in both samples. This is probably because these placement sites simultaneously capture the orientation of the upper and lower bodies and, hence, can effectively discriminate various upright and nonupright postures (e.g., sitting vs standing). In contrast, all placement combinations performed well in classifying ambulatory activities (cycling, running, and walking). The thigh–wrist combination performed marginally better for classifying running and cycling activities in the child sample. Dynamic standing was also slightly better classified with this combination in both samples. This is somewhat expected, as standing with movement (e.g., doing household tasks, such as washing dishes) may also involve subtle hand or arm movements which are effectively captured by the wrist sensor. Lastly, all three combinations performed similarly for classifying dynamic movement in children.
Although several studies in the past have used various machine-learning algorithms to classify accelerometer data into activity types, only a few have been conducted in free-living conditions (6). Furthermore, most of these free-living studies have used data from single or many (3+) accelerometers. Single accelerometers may be less intrusive for participants, improving compliance; however, there may be a performance trade-off. Ellis et al. (19) classified four distinct activities (sitting, vehicle time, standing, walking/running) in free-living conditions using a random forest classifier (coupled with a hidden Markov model) with a performance accuracy of 84.6% using a single wrist-worn accelerometer. Similarly, another free-living study in 132 adults achieved an overall accuracy of 87% in classifying six distinct activity classes (sleep, vehicle time, sitting/standing, bicycling, walking, mixed activity) from a single wrist-worn sensor (20). Other free-living studies that used a single hip-worn accelerometer demonstrated moderate performance (~80%) in classifying five to six activity classes (sitting, riding in vehicle, walking, cycling, standing still, standing moving) using a random forest classifier (21,22). The accuracies observed in the present study are higher than those reported in previous single-accelerometer studies. This is likely because the machine-learning models developed in this study were trained with features extracted from two accelerometers worn simultaneously, whereas those studies trained their models using features generated from a single accelerometer (worn on the hip, wrist, or back).
Several studies that have used data from multiple accelerometers have exhibited very high classification accuracy. For instance, Fullerton et al. (7) used nine body-worn accelerometers to achieve an accuracy of 97.6% in classifying eight different activities of daily living, and Gao et al. (23) used four sensors to classify five free-living activities with an accuracy of 96.4%. Although multiple sensors demonstrate high performance, these protocols are likely to be impractical in larger studies. Evidently, the similarly high accuracy achieved in the present study using dual sensors represents a promising step, and when combined with previous wear time compliance results (12,24), this approach may provide an optimal balance between compliance and model performance in monitoring and understanding 24-h time-use behaviors. Even so, there are several other factors which contribute to the feasibility of this dual-accelerometer system. The cost of equipment per participant is essentially doubled, and generating features from multiple accelerometers (as opposed to one) requires more computation time and resources.
A strength of the present study is the inclusion of both children and adults. Most previous machine-learning models developed from free-living data are specific to adults (inclusive of older adults). Children tend to have varied movement patterns when compared with adults (25), hence, it may be essential to train individualized machine-learning models. Nonetheless, the current study sample was confined to healthy adults and children of specific age ranges and did not include clinical populations. The generalizability and interchangeability of machine-learning models across different population groups (e.g., young children, older adults, clinical groups) is not well understood and is an area for future work.
Although most free-living studies have classified distinct sedentary and ambulatory activities, not many have identified light intensity activities (standing with movement or dynamic standing) that occur during household tasks. The classification of these behaviors is another strength of the present study. We were able to capture these light intensity activities due to our novel approach of obtaining ground truth from continuous video captured by wearable cameras. Most free-living studies have obtained ground truth labels by annotating images captured in intervals (20 or 30 s). Although static images are a form of direct observation, they may be captured too infrequently to distinguish activities such as dynamic standing. Furthermore, they may miss exact transitions between activities and can introduce error into the activity labels. However, the limited battery life of small and portable wearable cameras prevents longer periods of video recording. The shorter period of video recording also limited the scope of the present study for capturing free-living patterns of time use and prevented the application of some machine learning algorithms. For example, hidden Markov models have been used to improve prediction accuracy by learning the probabilities of transitioning from one activity to another (19), but these methods are only applicable with longer measurement durations where patterns of time use can be learned. Future advancements in wearable camera technology may enable longer periods of recording that will allow researchers to better understand and estimate free-living movement patterns.
Although we demonstrated high accuracy in classifying various activity types with two sensors in free-living conditions, our study design was limited to activity types. This meant that the intensity component of physical activity was not examined. For example, different speeds of walking, running, and cycling yield different levels of energy expenditure and can be highly variable between individuals. Therefore, it is essential that future work explores an integrated measurement system that can concurrently capture both the intensity and type components of activity. However, obtaining reliable ground truth criterion measures for intensity is challenging in free-living conditions. A further limitation of our study design was the intentional omission of behavioral and postural transitions (e.g., sit-to-stand). The first and/or last epochs of each activity were removed from the training data to ensure that every 5-s epoch did not contain two distinct activity types, and therefore had a valid ground truth label. Accordingly, the results presented in this study represent the accuracy of single-class epochs; yet, transitions between different activities are a fundamental part of free-living behavior. Identifying precise transition events will require a different data processing strategy than what we used (e.g., segmentation of data into windows shorter than 5 s, or a two-pass approach). Labeling transition periods in accelerometer data can also be problematic depending on the transition type. Many transitions are instantaneous (such as a slow walk to a brisk walk, or standing to dynamic standing), meaning there is no observable or measurable period of transition. On the other hand, sit-to-stand transitions occur over a period, and methods to quantify their duration are an active area of research (26).
Although our approach does not identify transitions per se, we can infer that a transition occurred when the predicted activity changes across consecutive epochs (e.g., an epoch of sitting followed by an epoch of standing can be considered as a sit-to-stand transition). Nevertheless, the accuracy of our models when predicting epochs that contain multiple activity types is unknown, and therefore, the overall accuracy may be reduced when applied to new free-living data. Finally, accurate estimation of sleep (as opposed to lying) is another crucial element in developing 24-h behavioral profiles, and there is scope for future studies in this regard.
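The inference described here — treating a change in predicted class across consecutive epochs as an implied transition — can be expressed as a simple sketch (Python; the function name is illustrative):

```python
def infer_transitions(pred):
    """Infer transition events from a sequence of per-epoch predictions.

    A change in predicted class across consecutive 5-s epochs is taken as
    evidence that a transition (e.g., sit-to-stand) occurred between them.
    Returns (epoch_index, from_class, to_class) tuples.
    """
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(pred, pred[1:]))
            if a != b]
```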
To progress the time-use epidemiology field of research, it is vital to accurately capture 24-h movement profiles in free-living conditions. Previous work with a dual-sensor system in a controlled environment showed great potential for capturing a broad range of physical activity behaviors. When validated in semi free-living conditions, the same dual-sensor system demonstrated high accuracy in classifying various human movement behaviors. Considering these findings with recent wear-time compliance results, a dual-sensor protocol may offer the optimal trade-off between participant compliance and model classification performance. Although our results represent a promising step toward building accurate time-use behavior profiles, further work is needed to expand the scope of this measurement system to detect other components of behavior (e.g., activity intensity, sleep, behavioral transitions) that are related to health.
The authors would like to thank all participants for their involvement in this study. AN received the 2018 AUT Human Potential Centre Fees Scholarship Award which funded the study. Participant incentives were covered using internal department funds. The results of this study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. The results of the present study do not constitute endorsement by ACSM.
The authors declare that they have no competing interests.
A. N. was responsible for study design and recruitment, and managed data collection with assistance from L. M. and T. S. Data cleaning and processing were performed by A. N. and T. S. A. N. performed the analysis and drafted the manuscript, with critical feedback provided by all authors.
1. Pedišić Ž, Dumuid D, Olds T. Integrating sleep, sedentary behaviour, and physical activity research in the emerging field of time-use epidemiology: definitions, concepts, statistical methods, theoretical framework, and future directions. Kinesiol Int J Fund Appl Kinesiol
2. Troiano RP. A timely meeting: objective measurement of physical activity. Med Sci Sports Exerc
3. Troiano RP, McClain JJ, Brychta RJ, Chen KY. Evolution of accelerometer methods for physical activity research. Br J Sports Med
4. Mâsse LC, Fuemmeler BF, Anderson CB, et al. Accelerometer data reduction: a comparison of four reduction algorithms on select outcome variables. Med Sci Sports Exerc
5. Rosenberger ME, Buman MP, Haskell WL, McConnell MV, Carstensen LL. 24 Hours of sleep, sedentary behavior, and physical activity with nine wearable devices. Med Sci Sports Exerc
6. de Almeida Mendes M, da Silva ICM, Ramires VV, Reichert FF, Martins RC, Tomasi E. Calibration of raw accelerometer data to measure physical activity: a systematic review. Gait Posture
7. Fullerton E, Heller B, Munoz-Organero M. Recognizing human activity in free-living using multiple body-worn accelerometers. IEEE Sens J
8. Troiano R, McClain J, editors. Objective measures of physical activity, sleep, and strength in US National Health and Nutrition Examination Survey (NHANES) 2011–2014. In: Proceedings of the 8th International Conference on Diet and Activity Methods; Rome, Italy; 2012.
9. Gyllensten IC, Bonomi AG. Identifying types of physical activity with a single accelerometer: evaluating laboratory-trained algorithms in daily life. IEEE Trans Biomed Eng
10. Van Hees VT, Golubic R, Ekelund U, Brage S. Impact of study design on development and evaluation of an activity-type classifier. J Appl Physiol
11. Stewart T, Narayanan A, Hedayatrad L, Neville J, Mackay L, Duncan S. A dual-accelerometer system for classifying physical activity in children and adults. Med Sci Sports Exerc
12. Duncan S, Stewart T, Mackay L, et al. Wear-time compliance with a dual-accelerometer system for capturing 24-h behavioural profiles in children and adults. Int J Environ Res Public Health
13. Doherty A, Jackson D, Hammerla N, et al. Large scale population assessment of physical activity using wrist worn accelerometers: the UK Biobank study. PLoS One
14. Barto D, Bird CW, Hamilton DA, Fink BC. The simple video coder: a free tool for efficiently coding social video data. Behav Res Methods
15. Breiman L. Random forests. Machine Learning
16. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. 2nd ed. New York: Springer; 2008. p. 745.
17. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2018.
18. Liaw A, Wiener M. Classification and regression by randomForest. R news
19. Ellis K, Kerr J, Godbole S, Staudenmayer J, Lanckriet G. Hip and wrist accelerometer algorithms for free-living behavior classification. Med Sci Sports Exerc
20. Willetts M, Hollowell S, Aslett L, Holmes C, Doherty A. Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. Sci Rep
21. Kerr J, Patterson RE, Ellis K, et al. Objective assessment of physical activity: classifiers for public health. Med Sci Sports Exerc
22. Rosenberg D, Godbole S, Ellis K, et al. Classifiers for accelerometer-measured behaviors in older women. Med Sci Sports Exerc
23. Gao L, Bourke AK, Nelson J. Evaluation of accelerometer based multi-sensor versus single-sensor activity recognition systems. Med Eng Phys
24. Schneller MB, Bentsen P, Nielsen G, et al. Measuring children’s physical activity: compliance using skin-taped accelerometers. Med Sci Sports Exerc
25. Oba N, Sasagawa S, Yamamoto A, Nakazawa K. Difference in postural control during quiet standing between young children and adults: assessment with center of mass acceleration. PLoS One
26. Godfrey A, Barry G, Mathers JC, Rochester L. A comparison of methods to detect postural transitions using a single tri-axial accelerometer. In: 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2014 Aug 26–30; 2014. pp. 6234–7.