The accurate measurement of habitual physical activity is important for physical activity surveillance, evaluating the efficacy of behavior change initiatives, and furthering our understanding of the link between activity patterns and health (1). Since the early 2000s, wearable motion sensors (accelerometers) have become the preferred method of objectively estimating free-living movement in all populations (2). These methods are usually based on mapping accelerometer output to several intensity categories via a set of thresholds or “cut points” (2). This has been useful for evaluating adherence to global and national activity guidelines, which have historically focused almost exclusively on moderate- to vigorous-intensity physical activity (MVPA) (3). However, important contributions over the last several years suggest that a broader, unified approach to understanding and promoting human movement is required to tackle public health challenges related to inactivity (3). Considering all movement behaviors within a 24-h period, inclusive of sleep, sedentary (sitting) time, light-intensity movement, and MVPA, is the basis of a fast-emerging field in health research called time-use epidemiology (4). As this paradigm gains traction, it presents new challenges for physical activity researchers: traditional physical activity measurement procedures were not designed to capture and evaluate 24-h movement patterns.
A core premise of time-use epidemiology is the compositional nature of movement behaviors. All behaviors within a 24-h period are mutually exclusive; an individual can only engage in one behavior at a time, and the summation of time allotted to all behaviors is always 24 h (4). Clearly, two important considerations for objectively monitoring these behaviors are 24-h wear time compliance, and the ability to discern different activities and postures that comprise a 24-h day, including sleep. Traditional accelerometer studies usually require 8 to 12 h of wear time per day (5), which threatens the validity of 24-h time-use data. There has been a recent push toward 24-h monitoring protocols for accelerometers located at the hip (6) and on the wrist (7); however, sufficient wear time alone does not preclude erroneous classification of movement behaviors. Data collected from nine wearable devices positioned on the thigh, hip, back, or wrist revealed that no single device accurately captures all movement behaviors over a 24-h period (using the respective manufacturer’s algorithms) (8).
The classification of activity type has seen significant progress in recent years, as machine learning techniques become more accessible. Machine learning algorithms generate a predictive model by learning complex patterns among raw accelerometer data, before predicting activity type on unseen data. These models overcome the much-publicized limitations of cut point methodology (9); in particular, the inability to identify activity type and posture. Within a time-use framework, it is particularly important to differentiate between nonambulatory movements, such as lying (sleep), standing, and sitting, given these may have discrete pathways to health, and form distinct components of a 24-h time-use profile (10). Recent work has illustrated the feasibility of machine learning applied to hip-mounted or wrist-worn accelerometers for classifying various activity types and postures (11–16). The accuracy of these predictive models generally ranges from 70% to 90%, but varies depending on the activity types under study, the machine learning techniques used, and how the raw data are treated before modeling.
One strategy to overcome wear time difficulties and improve activity recognition models may be a dual-accelerometer measurement system, in which accelerometers are affixed directly to the skin. This strategy, recently trialed in Denmark and New Zealand, uses two small accelerometers attached to the thigh and lower back. This protocol has demonstrated improvements in 24-h wear time compliance in adults and children (17,18), with many achieving 5 to 7 complete days of uninterrupted wear time. Although this system displays high wear time compliance, its validity for accurate activity and posture detection in adults and children remains unknown, and the efficacy of wearing two sensors compared to a single sensor has not yet been formally established. Therefore, the aims of this study are 1) to demonstrate the validity of a dual-accelerometer system for classifying physical activity and sedentary behaviors in children and adults, and 2) to examine the efficacy of using two sensors relative to each sensor individually.
Children from a local school (age 7–15 yr) were invited to participate in this study. Their parents were also invited to participate alongside their child. Children took home an information pack about the study, and their parents contacted the research team if they were interested in participating. Participants were eligible if they were free from disability and were able to perform physical activity freely. Written informed consent and assent were obtained from each parent and child (respectively) prior to participation. Each participant received a mall voucher to reimburse them for their time. Ethical approval was obtained from the AUT University Ethics Committee (17/220).
Data collection involved a single visit to the AUT Millennium Research Laboratory for approximately 1 h. Upon arrival, the testing protocol was explained to each participant, before they were equipped with two Axivity AX3 accelerometers (Axivity, York, UK). One accelerometer was positioned on the anterior aspect of their thigh at the midpoint between the knee and the hip, whereas the second was positioned on their lower back, offset from the spine. These were placed on either the left or right side of the body to align with the participant's handedness. Each sensor was attached using purpose-made hypoallergenic adhesive foam pouches (Herpa Tech, Stockholm, Sweden), or medical tape, as detailed elsewhere (17,18).
Ten semistructured activity trials were performed by each participant. These trials were designed to mimic real-world activities and encompass the primary postural positions that are occupied during a typical 24-h day. Table 1 provides a full description of activities, which include various types of sitting, lying, standing, walking and running. Multiple sitting and lying positions were included as previous work has shown lower classification accuracy for sedentary activities compared to walking and running (11–13,16). During nonambulatory activities, participants were given various reading, drawing, coloring, or puzzle-related tasks to keep them occupied. Participants were told they did not have to remain perfectly still (e.g., able to fidget and cross legs while sitting) to mimic free-living conditions as much as possible (19). All walking and running trials were performed on a Bertec instrumented treadmill (Bertec Corp., Columbus, OH). This is an in-floor, split-belt treadmill commonly used for gait analysis, as walking kinematics are approximate to over-ground walking (20). Each activity was 6 min in duration, although lying and treadmill activities were organized into three intervals of 2 min.
The Axivity AX3 is a triaxial accelerometer that is small (23 × 32.5 × 8.9 mm) and lightweight (11 g). The sample rate is configurable between 12.5 and 3200 Hz, and the measurement range between ±2g and ±16g. With 512 MB of internal memory, the sensor can store 14 d of continuous data sampled at 100 Hz. Unlike other accelerometers used in this field, the AX3 is waterproof and has a temperature sensor (range, 0°C–40°C) that can be used for accurate wear time detection (17,18). In this study, the accelerometers were initialized to record at 100 Hz with a range of ±8g. All sensors were set up and downloaded using OmGui (version 126.96.36.199; Open Movement, Newcastle University, UK). A total of 31 individual sensors were used in this study, and the back and thigh sensors for each participant were randomly selected from this pool.
Fifth-generation iPad cameras (model A1823; Apple, Inc.) were used to record all activity trials, which would later be used to generate ground truth activity labels. Video was recorded at 720p (30 frames per second) using three separate cameras that were positioned to capture all angles of the testing environment. The camera footage and accelerometer data were time synchronized before participants arrived at the laboratory by creating a data marker in the sensor record. This was achieved by grouping all sensors together in a bag before striking it with a short impulse force while in view of all the cameras. This resulted in a clear spike in the sensor data which enabled the alignment of sensor timestamp with video frame during data processing.
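The alignment step described above can be illustrated with a minimal sketch. It assumes the impulse appears as the first sample whose acceleration magnitude exceeds a threshold; the 3 g threshold, the function names, and the example frame number are hypothetical and are not taken from the study.

```python
def find_sync_spike(magnitudes, threshold_g=3.0):
    """Return the index of the first sample above threshold_g, or None.

    threshold_g is an illustrative assumption, not a value from the study.
    """
    for i, m in enumerate(magnitudes):
        if m > threshold_g:
            return i
    return None


def sample_to_video_frame(sample_idx, spike_sample, spike_frame,
                          sensor_hz=100.0, video_fps=30.0):
    """Map a sensor sample index to the nearest video frame.

    spike_sample is where the impulse appears in the sensor record and
    spike_frame is the frame in which the strike is visible on video.
    """
    seconds_from_spike = (sample_idx - spike_sample) / sensor_hz
    return spike_frame + round(seconds_from_spike * video_fps)


# Static signal near 1 g with a short impulse at sample 250.
signal = [1.0] * 250 + [6.5, 4.0] + [1.0] * 100
spike = find_sync_spike(signal)
frame = sample_to_video_frame(750, spike, spike_frame=90)
```

Here a sample recorded 5 s after the impulse maps to the frame 150 frames after the one showing the strike, given the 100-Hz sensor rate and 30-frames-per-second video reported in the text.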
Raw AX3 data were downloaded and imported into MATLAB (release 2017b, The MathWorks, Inc., MA). Although sensors were initialized to sample at 100 Hz, the data were resampled to a uniform 100 Hz using a cubic interpolation, as the actual sample rate is known to fluctuate (21). To ensure sensors performed similarly under the same conditions, all sensor axes were calibrated by applying an offset and scale factor, which were generated from a calibration trial (18,22,23). Each sensor was left completely static for 30 s in six different orientations. These six positions place each of the three axes in positive and negative gravity, where the measured acceleration would ideally equal ±1g. The center point of each positive and negative axis pair was adjusted to reside at 0g (i.e., the offset) before being multiplied by a scale, so the minima and maxima resided at ±1g (22,23). When the original accelerometers are no longer available, other methods of calibration can be applied, such as the method proposed by van Hees et al. (24) where correction factors are derived from a nonmovement period in the sensor record. This process is important as uncalibrated data may exacerbate the misclassification of activity type; we have noticed some sensor axes can be up to 20% out of alignment (i.e., measure 0.8g when aligned with gravity). When averaged across the three axes, the average offset among sensors (mean ± SD) was 0.043g ± 0.039g (range, −0.024g to 0.161g), whereas the average scaling factor was 1.015 ± 0.013 (range, 0.992–1.052). All calibrated data were then passed through a 25-Hz fourth-order Butterworth low pass filter to remove skin and clothing artefact. Next, each participant’s video record was annotated second-by-second and used to document the precise start and end times of each activity trial. All sitting and reclining activities were combined, and all lying activities were combined, resulting in a total of six activity classes.
These activities were combined because (in most cases) they represent the same component of a 24-h time-use composition. Accelerometer data within these time periods were extracted in preparation for feature generation.
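The six-orientation calibration described above can be sketched for a single axis as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the static readings are invented for the example, and the mean of each static recording is assumed to represent the measured value at +1g and −1g.

```python
from statistics import mean


def axis_calibration(pos_samples, neg_samples):
    """Derive (offset, scale) for one axis from two static recordings.

    pos_samples: readings (in g) with the axis pointing along +1 g.
    neg_samples: readings (in g) with the axis pointing along -1 g.
    """
    hi = mean(pos_samples)
    lo = mean(neg_samples)
    offset = (hi + lo) / 2.0   # centre of the +/- pair, ideally 0 g
    scale = 2.0 / (hi - lo)    # stretches the extremes to +/-1 g
    return offset, scale


def apply_calibration(raw, offset, scale):
    """Subtract the offset, then multiply by the scale factor."""
    return [(x - offset) * scale for x in raw]


# Invented example: an axis that reads 1.03 g and -0.93 g when static.
offset, scale = axis_calibration([1.03] * 10, [-0.93] * 10)
corrected = apply_calibration([1.03, -0.93], offset, scale)
```

In this invented case the offset is 0.05g and the scale is roughly 1.02, and the corrected extremes land at ±1g. Repeating the procedure for each of the three axes uses all six static orientations.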
Feature generation is a process where useful properties of the raw accelerometer data are extracted. This is a crucial step in the machine learning workflow as features that contain strong predictive signal will almost certainly lead to better classification performance. Using MATLAB, data were first partitioned into 5-s nonoverlapping windows, and several time- and frequency-domain features were computed over each window. Time-domain features capture how the signal changes over time, whereas frequency-domain features illustrate how much of the signal lies within different frequency bands. In line with past work (13,25), the mean (for each axis, and across sensors), standard deviation, coefficient of variation, median, 25th and 75th percentiles, minimum, maximum, skewness, kurtosis, axis correlations (between-axis and between-sensor), and roll, pitch and yaw were computed. For some features (e.g., mean, standard deviation, percentiles), the data were passed through a second filtering step. Specifically, a 1-Hz fourth-order Butterworth low-pass filter was implemented to extract the gravity component of the signal, so orientation vectors could be observed. For all other features, the gravitational components were subtracted from the base signal, removing orientation effects that could influence classification accuracy. The dominant frequency, its magnitude, as well as the total signal power, were also calculated using a Fast Fourier Transform. These were computed for each of the three axes and the vector magnitude where applicable, resulting in 142 features for each 5-s window.
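To make the windowing step concrete, the sketch below partitions one axis of 100-Hz data into 5-s nonoverlapping windows and computes a subset of the time-domain features listed above. It is illustrative only: the function names are hypothetical, the population standard deviation and linear-interpolation percentiles are assumptions (the text does not state which definitions were used), and the filtering, frequency-domain, and cross-sensor features are omitted.

```python
from statistics import mean, median, pstdev


def split_windows(signal, hz=100, seconds=5):
    """Split one axis into nonoverlapping windows; a trailing partial window is dropped."""
    size = hz * seconds
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def percentile(sorted_vals, p):
    """Percentile by linear interpolation between the two nearest ranks."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def time_domain_features(window):
    """Compute a handful of per-window summary statistics for one axis."""
    mu, sd = mean(window), pstdev(window)
    ordered = sorted(window)

    def central_moment(order):
        return mean([(x - mu) ** order for x in window])

    return {
        "mean": mu,
        "sd": sd,
        "cv": sd / mu if mu != 0 else 0.0,
        "median": median(window),
        "p25": percentile(ordered, 25),
        "p75": percentile(ordered, 75),
        "min": ordered[0],
        "max": ordered[-1],
        # Moment-based skewness and kurtosis; zero for a constant window.
        "skewness": central_moment(3) / sd ** 3 if sd > 0 else 0.0,
        "kurtosis": central_moment(4) / sd ** 4 if sd > 0 else 0.0,
    }


# 12 s of data at 100 Hz gives two complete 5-s windows.
two_windows = split_windows([0.0] * 1200)
features = time_domain_features([1, 2, 3, 4, 5])
```

Applying the same function to every axis of both sensors, and to the vector magnitude, is how a per-window feature vector is assembled.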
Machine learning algorithms are able to recognize and learn subtle patterns among data without being explicitly programmed, and show great promise for advancing physical activity measurement. This has spurred a recent increase in the number of activity recognition studies employing machine learning techniques. The machine learning model used in the present study was the random forest, which is an ensemble learner, or a collection of many individual decision trees (26). Each decision tree is generated using a bootstrap sample of the training data. At each node split in each tree, m features are randomly selected from the full feature set, and the feature and split point which minimizes a cost function (in this case the Gini index) is selected for the split. The Gini index is a measure of node purity: small values indicate that a node mainly contains observations from a single class (27). This random feature selection at each tree node reduces the correlation among trees and helps to prevent overfitting the training data (27). Each tree produces a class (activity type) prediction for each observation, and these predictions are tallied across all trees to select the final class prediction by majority vote.
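Two of the ideas above, the Gini index and the majority vote, are simple enough to show directly. The sketch below is not a random forest implementation; it only computes the Gini index of a node, the size-weighted Gini index of a candidate split, and the vote across trees. Function names and the example labels are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a node: 0 when every observation shares one class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Gini index of a split, weighting each child node by its size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)


def majority_vote(tree_predictions):
    """Final class for one observation: the most common prediction across trees.

    Ties resolve to the class encountered first, an arbitrary choice here.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


pure = gini_index(["sit"] * 4)                       # single class
mixed = gini_index(["sit", "sit", "stand", "stand"])  # evenly mixed
clean_split = split_gini(["sit", "sit"], ["stand", "stand"])
vote = majority_vote(["sit", "lie", "sit"])
```

A node containing only sitting epochs has a Gini index of 0, an even two-class mix has 0.5, and a split that separates the two classes perfectly returns to 0, which is why such a split would be preferred.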
Model building and analyses were performed separately for adult and child samples. All classifiers were trained, tuned, and validated in R version 3.4.1 (28) using the ‘caret’ package (29). The train function within this package was used to identify the optimal random forest tuning parameter (mtry), which is the number of randomly selected features eligible for each node split. Several candidate mtry values were evaluated using 10-fold cross validation. The proportion of correct predictions for both the child and adult samples was maximized using mtry = 2, so this value was selected for the final models. In the interest of reducing computational time, the number of trees in each forest (ntree) was set at 100, as visualizing the error convergence of initial models with 500 trees revealed no improvement beyond this figure. To assess the efficacy of the dual-sensor protocol relative to using a single sensor, this model building process was repeated twice more, only using features derived from the back sensor, or thigh sensor, respectively. All cross-sensor features (such as axis correlations and means across sensors) were also excluded, meaning these single-sensor models were trained using a 62-feature data set.
The predictive performance of each model was evaluated using leave-one-out cross-validation (LOOCV). This is a form of cross validation where the model is trained on all participants’ data except one, which is held out and treated as the test set. Performance is estimated by repeating this process for each participant in the dataset and averaging the results. Leave-one-out cross-validation has less bias compared to other cross validation methods, although it is computationally expensive to perform, as the number of models fit is equal to the number of participants in the dataset. For each of the six activity classes, the sensitivity, specificity, and balanced accuracy were computed. Sensitivity refers to the proportion of positive cases that are correctly identified (e.g., proportion of sitting observations identified as sitting), while specificity refers to the proportion of negative cases correctly identified (e.g., proportion of nonsitting observations identified as nonsitting). The balanced accuracy is the mean of the sensitivity and specificity, and is preferred over standard accuracy as it accounts for class imbalance. The importance of each accelerometer feature for improving model prediction accuracy was estimated using the Mean Decrease Gini metric, available in the “randomForest” R package. This is a measure specific to tree-based models, and estimates how each feature contributes to the homogeneity (i.e., purity) of nodes in each tree. It is calculated during the model training process by summing the decrease in Gini index produced by every split made on a given feature and averaging this total over all trees (30). Features that result in nodes with higher purity have a higher Mean Decrease Gini coefficient, which is indicative of higher feature importance.
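The validation scheme and accuracy metrics above can be sketched as follows. The first function generates the participant-wise folds (every epoch from one participant is held out together); the second computes sensitivity, specificity, and balanced accuracy for one class against all others. Function names and the toy labels are hypothetical, and the sketch assumes each class appears at least once among both the positive and negative cases.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out_id, train_indices, test_indices) for each participant."""
    for held_out in sorted(set(participant_ids)):
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, and balanced accuracy for one activity class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Five epochs from three participants produce three folds.
folds = list(leave_one_participant_out(["a", "a", "b", "c", "c"]))

y_true = ["sit", "sit", "sit", "sit", "stand", "stand"]
y_pred = ["sit", "sit", "sit", "stand", "stand", "sit"]
sens, spec, balanced = one_vs_rest_metrics(y_true, y_pred, positive="sit")
```

In the toy example, three of four sitting epochs are recovered (sensitivity 0.75) and one of two nonsitting epochs is correctly rejected (specificity 0.5), giving a balanced accuracy of 0.625.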
One child participant was excluded due to a thigh sensor malfunction resulting in a final sample size of 41 children (mean age = 11.0 ± 4.80 yr; 46.5% male) and 33 adults (mean age = 42.4 ± 9.89 yr; 48.5% male). In total, 14,526 5-s epochs coded with activity class were obtained from the adult sample, while 17,722 were obtained from the child sample. The random forest training and validation process took 12.1 and 30.2 min to complete for the adult and child samples, respectively. Model training took place on a computer system with an Intel Xeon E3-1505M v6 CPU, and 32 GB of RAM.
When averaged over all six activity classes, the accuracy of dual-sensor models using LOOCV was 99.1% (95% confidence interval, 98.9–99.2) for the adult sample, and 97.3% (95% confidence interval, 97.1–97.5) for the child sample. Table 2 illustrates the accuracy metrics for each activity class for both the adult and child samples. All activities achieved a balanced accuracy greater than 99% in the adult sample, whereas balanced accuracy ranged from 96.8% to 99.3% in the child sample.
Table 3 presents the confusion matrices for both adult and child samples. These matrices represent the number of 5-s epochs correctly classified or misclassified for each activity class. The main areas of confusion were between lying/sitting epochs in both samples, and between slow walk/fast walk epochs in the child sample.
The feature importance for each of these models is illustrated in Figure 1. Examining feature importance can help explain what components of the accelerometer signal contributed to model performance. The top 15 accelerometer features (out of 142) are displayed, ranked by descending importance. Time-domain features using the x axis or z axis are prominent in both models, as well as axis means computed across both sensors (i.e., TB_mean). Features derived from the thigh sensor appear more prominent than those derived from the back sensor.
The efficacy of the dual-sensor protocol relative to a single sensor is depicted in Figure 2. For both samples, there was a clear drop in accuracy for nonambulatory activities when a single sensor was used. Using only the back sensor, the balanced accuracy for sitting dropped to 85.4% and 82.7% in the adult and child samples, respectively, whereas standing accuracy was reduced to 76.9% and 72.9%, respectively. When using only the thigh sensor, lying showed the most pronounced decline, which dropped to 80.3% in the adult sample, and to 78.2% in the child sample. Similarly, the balanced accuracy for sitting was reduced to 90.0% and 87.8% in child and adult samples (respectively) when using the thigh sensor. A complete record of single sensor results is available as an appendix (see Supplemental Table, Supplementary Digital Content 1, LOOCV performance when single sensors are used, http://links.lww.com/MSS/B344).
This study aimed to validate a dual-accelerometer system for classifying physical activity and sedentary behavior in children and adults, and to evaluate the efficacy of wearing two sensors compared with one sensor. This builds on previous work which has shown good wear time compliance with this measurement protocol (17,18). Our results indicate that machine learning techniques were able to differentiate between six distinct activity classes with exceptionally high accuracy in both adult (>99%) and child (>97%) samples. When a single thigh or back accelerometer was used, there was a pronounced drop in accuracy for nonambulatory activities (up to a 26.4% decline), while the classification accuracy of walking and running activities remained comparatively stable. Taken together with previous wear time compliance results, our findings represent a promising step forward for objectively monitoring and understanding 24-h time-use behaviors.
Several previous studies have used machine learning techniques to classify accelerometer data into activity types. Using wrist-worn accelerometers, Chowdhury et al. (16) compared several individual classifiers with ensemble methods across three separate datasets. Activity type categories varied among the three datasets, but overall LOOCV accuracy (presented as F1 scores) ranged from 65.1% to 86%, with random forests averaging 79.7% to 85.0%. Pavey et al. (11) used a random forest to classify wrist-derived data collected from 21 adults into four activity classes (sedentary, stationary plus, walking, and running). Overall LOOCV accuracy was 92.7%, and ranged from 80.1% to 95.7% among individual activity classes. Hip-worn accelerometers were used by Hagenbuchner et al. (14), who compared an artificial neural network with a deep learning ensemble for classifying five activity classes (sedentary, light, MVPA, walking, and running) among 11 preschool children. The neural network exhibited lower LOOCV accuracy (69.7%) compared with the deep learning ensemble (82.6%). Two studies have compared both hip and wrist placement: Ellis et al. (13) used a random forest to classify four activity classes (household, stairs, walking, and running) in a sample of 40 adults. The LOOCV accuracy of the hip and wrist-based models were 92.3% and 87.5%, respectively. Lastly, Trost et al. (12) used regularized logistic regression to discern seven activity classes (lying, sitting, standing, walking, running, basketball, and dancing) in 52 children and adolescents. When evaluated on a separate test set, the hip and wrist-based models achieved 91.0% and 88.4% accuracy, respectively.
Evidently, our classification results in the present study are considerably higher than previous work. This is most likely because previous studies have trained machine learning models using features engineered from a single accelerometer worn on the hip or wrist, while our training data were generated from two accelerometers worn simultaneously. Examining the importance of features used to train the random forest provides further insight: of the 142 features used to train each model, the top 15 features (Fig. 1) are similar for both adult and child models. Features derived from the back and thigh sensor are both represented, and those that accounted for the orientation of both sensors concurrently (e.g., TB_mean_TzBx) were prominent in both models. The differences in magnitude of importance between the adult and child models are possibly due to the different sitting tasks performed by each group, kinematic differences in movement, and the random variability among trees introduced by the random forest algorithm.
The importance of dual sensors is further evidenced by our single-sensor results (Fig. 2), where the accuracy of nonambulatory activities dropped to levels comparable with previous work. Given the orientation of the torso and lower limbs, it is logical that the back sensor cannot accurately discriminate between sitting and standing, and the thigh sensor cannot discriminate between sitting and lying. However, recent work has illustrated a novel approach to develop angle thresholds for separating these three postures with a single hip-worn accelerometer (31), which is promising. Another important observation was that walking and running activities still maintained high accuracy with a single sensor. This is probably because accelerometers can measure different levels of intensity irrespective of sensor placement, and explains why previous single-sensor studies have exhibited higher accuracy for these types of activities (11–13,16). Our activity testing protocol also integrated multiple forms of lying (i.e., side, prone, supine) and sitting (i.e., chair, stool, floor, and reclining) into the same activity class. This may have contributed to higher accuracy for these activities as it has been common to only monitor chair sitting and lying supine (12,31). Our sample size was also larger than most previous studies, which is advantageous for machine learning models as they tend to perform better with more training data.
It must be noted, however, that the number of participants is not the only sampling consideration for machine learning studies. As these predictive models learn to map a set of input features to an output, there needs to be enough data and variation among the data to adequately capture the relationships between the input and output that exist in the population. Therefore, the optimal study sample size is dependent upon several factors, such as the quantity of training data collected per participant, the volume of data collected in each activity class, the quality of features generated, and the complexity of the problem and learning algorithm of choice (32). These complications probably explain why a priori sample size is rarely calculated and reported in activity recognition studies (11–16), with sample size commonly based on previous work.
Although we demonstrated exceptionally high accuracy when both sensors were used, these data were collected under semicontrolled conditions in a laboratory setting. It is likely that the accuracy of these models will drop when used to predict data collected under free-living conditions (11,19); in fact, models trained using free-living data have been shown to perform better when classifying free-living activities in older adults (33). Our study design also meant that postural transitions and other specific activities, such as walking up/down stairs, cycling, swimming, and sitting in a motor vehicle, were not considered. More complex patterns of movement can also occur as part of daily living, such as playing sport, or performing household chores. Although many of these more complex patterns may simply be a combination of the different movements observed in this study, it is unclear how our classification models will perform when applied to these types of data. Another factor which may impact the generalizability of these results is participant body size. Past work has shown that the angle of hip-worn accelerometers can be affected by abdominal obesity (34). It is likely that sensors placed on the back or thigh are less prone to this tilt problem, but we are unable to draw these conclusions as body composition and fat distribution were not assessed in this study.
For these results to be truly generalizable to free-living settings, future work should consider a free-living validation trial; however, obtaining valid and reliable ground truth data in free-living populations is challenging. Direct observation and a secondary accelerometer have been used as the criterion measure in past work (11,33), but wearable cameras with adequate memory and battery capacity offer more promise for maintaining data integrity over a longer period. With the collection of free-living data, researchers can also explore other avenues of analysis. Most activity recognition work to date has treated individual windows of movement data as independent entities, yet there are temporal dependencies among these data that could be exploited by certain learning algorithms. For example, a 5-s window classified as sitting is more likely to be followed by another window of sitting than by another activity type. Hidden Markov Models and recurrent neural networks may be more suitable for these time-series data problems (35). Lastly, identifying periods of sleep (rather than simply lying) is an important component of a 24-h behavior profile. Although most time-use work has relied on self-reported sleep times, several studies have attempted to estimate sleep with accelerometry (36). These studies are normally limited to sleep duration, but it may be possible to integrate machine learning techniques to derive more complex measures of sleep quality. This dual-sensor protocol holds promise in this regard, and represents an additional line of future work.
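The temporal dependency mentioned above can be illustrated by estimating first-order transition probabilities from a sequence of window labels. This is not a Hidden Markov Model, only the kind of transition table such a model would rely on. The function name and the label sequence are hypothetical and do not come from the study data.

```python
from collections import Counter, defaultdict


def transition_probabilities(labels):
    """Estimate P(next label | current label) from consecutive windows."""
    counts = defaultdict(Counter)
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


# Invented sequence of 5-s window labels for one participant.
sequence = ["sit", "sit", "sit", "stand", "sit", "sit"]
probs = transition_probabilities(sequence)
```

In this invented sequence, a sitting window is followed by another sitting window three times out of four, which is the kind of regularity a sequence model could exploit and a window-by-window classifier ignores.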
Within a time-use epidemiology framework, the ability to accurately monitor 24-h movement profiles is a vital step for progressing this important field of research. Past accelerometer-based measurement protocols are either hindered by lack of wear time compliance, or the inability to accurately discern activities and postures from raw sensor data. The dual-sensor protocol we evaluated has exhibited high 24-h uninterrupted wear time compliance in recent work, and our results build on this by demonstrating almost perfect classification of activity and posture in adults and children. This type of information can be used to build robust time-use behavior profiles, including detailed patterns of sitting, and how components of time-use are related to each other. The next step will be to examine the generalizability of these findings in a free-living setting.
The authors would like to thank all participants for their involvement in this study, and Roselinde Van Nee for her assistance during data collection.
The authors declare that they have no competing interests.
No external funding was received for this study. Participant incentives were covered using internal department funds. The results of this study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. The results of the present study do not constitute endorsement by ACSM.
T. S. was responsible for study design and recruitment, and managed data collection with assistance from L. H. and A. N. Data cleaning and processing were performed by L. H., J. N., T. S., and A. N. T. S. performed the analysis and drafted the manuscript, with critical feedback provided by all authors.
1. Sallis JF, Owen N, Fotheringham MJ. Behavioral epidemiology: a systematic framework to classify phases of research on health promotion and disease prevention. Ann Behav Med
2. Trost SG, McIver KL, Pate RR. Conducting accelerometer-based activity assessments in field-based research. Med Sci Sports Exerc. 2005;37(11 Suppl):S531–43.
3. Chaput JP, Carson V, Gray CE, Tremblay MS. Importance of all movement behaviors in a 24 hour period for overall health. Int J Environ Res Public Health
4. Pedišić Ž, Dumuid D, Olds T. Integrating sleep, sedentary behaviour, and physical activity research in the emerging field of time-use epidemiology: definitions, concepts, statistical methods, theoretical framework, and future directions. Kinesiology: International Journal of Fundamental and Applied Kinesiology
5. Cain KL, Sallis JF, Conway TL, Van Dyck DV, Calhoon L. Using accelerometers in youth physical activity studies: a review of methods. J Phys Act Health
6. Tudor-Locke C, Barreira TV, Schuna JM, et al. Improving wear time compliance with a 24-hour waist-worn accelerometer protocol in the International Study of Childhood Obesity, Lifestyle and the Environment (ISCOLE). Int J Behav Nutr Phys Act
7. Troiano R, Mc Clain J, editors. Objective measures of physical activity, sleep, and strength in US National Health and Nutrition Examination Survey (NHANES) 2011–2014. Proceedings of the 8th International Conference on Diet and Activity Methods 2012; Rome, Italy
8. Rosenberger ME, Buman MP, Haskell WL, McConnell MV, Carstensen LL. Twenty-four Hours of Sleep, Sedentary Behavior, and Physical Activity with Nine Wearable Devices. Med Sci Sports Exerc
9. Trost SG, Loprinzi PD, Moore R, Pfeiffer KA. Comparison of accelerometer cut points for predicting activity intensity in youth. Med Sci Sports Exerc
10. Tremblay MS, Aubert S, Barnes JD, et al. Sedentary Behavior Research Network (SBRN) - Terminology Consensus Project process and outcome. Int J Behav Nutr Phys Act
11. Pavey TG, Gilson ND, Gomersall SR, Clark B, Trost SG. Field evaluation of a random forest activity classifier for wrist-worn accelerometer data. J Sci Med Sport
12. Trost SG, Zheng Y, Wong W-K. Machine learning for activity recognition: hip versus wrist data. Physiol Meas
13. Ellis K, Kerr J, Godbole S, Lanckriet G, Wing D, Marshall S. A random forest classifier for the prediction of energy expenditure and type of physical activity from wrist and hip accelerometers. Physiol Meas
14. Hagenbuchner M, Cliff DP, Trost SG, Van Tuc N, Peoples GE. Prediction of activity type in preschool children using machine learning techniques. J Sci Med Sport
15. Ren X, Ding W, Crouter SE, Mu Y, Xie R. Activity recognition and intensity estimation in youth from accelerometer data aided by machine learning. Applied Intelligence
16. Chowdhury AK, Tjondronegoro D, Chandran V, Trost SG. Ensemble methods for classification of physical activities from wrist accelerometry. Med Sci Sports Exerc
17. Schneller MB, Bentsen P, Nielsen G, et al. Measuring children’s physical activity: compliance using skin-taped accelerometers. Med Sci Sports Exerc
18. Duncan S, Stewart T, Mackay L, et al. Wear-time compliance with a dual-accelerometer system for capturing 24-hour behavioural profiles in children and adults. Int J Environ Res Public Health
19. van Hees VT, Golubic R, Ekelund U, Brage S. Impact of study design on development and evaluation of an activity-type classifier. J Appl Physiol
20. Lee SJ, Hidler J. Biomechanics of overground vs. treadmill walking in healthy individuals. J Appl Physiol
21. Doherty A, Jackson D, Hammerla N, et al. Large scale population assessment of physical activity using wrist worn accelerometers: the UK Biobank Study. PLoS One
22. Bouten CV, Koekkoek KT, Verduin M, Kodde R, Janssen JD. A triaxial accelerometer and portable data processing unit for the assessment of daily physical activity. IEEE Transactions on Biomedical Engineering
23. Lötters JC, Schipper J, Veltink PH, Olthuis W, Bergveld P. Procedure for in-use calibration of triaxial accelerometers in medical applications. Sens Act A Phys
24. van Hees VT, Fang Z, Langford J, et al. Autocalibration of accelerometer data for free-living physical activity assessment using local gravity and temperature: an evaluation on four continents. J Appl Physiol (1985)
25. Liu S, Gao RX, Freedson PS. Computational methods for estimating energy expenditure in human physical activities. Med Sci Sports Exerc
26. Breiman L. Random Forests. Machine Learning
27. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. Springer Series in Statistics. New York (NY): Springer-Verlag New York; 2008. 745 p.
28. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2017.
29. Kuhn M. Building Predictive Models in R Using the caret Package. Journal of Statistical Software
30. Calle ML, Urrea V. Letter to the editor: Stability of Random Forest importance measures. Brief Bioinform
31. Vähä-Ypyä H, Husu P, Suni J, Vasankari T, Sievänen H. Reliable recognition of lying, sitting, and standing with a hip-worn accelerometer. Scand J Med Sci Sports
32. Raudys SJ, Jain AK. Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Trans Pattern Anal Mach Intell
33. Sasaki JE, Hickey AM, Staudenmayer JW, John D, Kent JA, Freedson PS. Performance of activity classification algorithms in free-living older adults. Med Sci Sports Exerc
34. Feito Y, Bassett DR, Tyo B, Thompson DL. Effects of body mass index and tilt angle on output of two wearable activity monitors. Med Sci Sports Exerc
35. Ellis K, Kerr J, Godbole S, Staudenmayer J, Lanckriet G. Hip and wrist accelerometer algorithms for free-living behavior classification. Med Sci Sports Exerc
36. Tudor-Locke C, Barreira TV, Schuna JM Jr, Mire EF, Katzmarzyk PT. Fully automated waist-worn accelerometer algorithm for detecting children’s sleep-period time separate from 24-h physical activity or sedentary behaviors. Appl Physiol Nutr Metab