Accelerometer-based motion sensors are viewed as best practice methodology for measuring physical activity in children 0 to 5 yr of age (1,2). However, to date, the research potential of wearable motion sensors has been underused, with data analysis restricted to the use of intensity-based “cut-points” or regression-based prediction models with significant measurement error (3–5). Pattern recognition methodologies, such as machine learning approaches, provide an opportunity to substantially improve accelerometer-based assessments of physical activity in children younger than 5 yr. However, the adoption of machine learning methods by movement scientists has been slow because they are not as easily implemented as cut-point methods.
To date, only three studies have developed and tested machine learning activity recognition models for children younger than 5 yr. Zhao et al. (6) evaluated a series of logistic regression and support vector machine (SVM) classifiers for recognition of five activity classes in preschool-age children (rest, quiet play, low active play, moderately active play, and very active play). Using proprietary outputs (60-s epoch) from a hip-mounted ActiGraph GT3X+ accelerometer as features, the best-performing model achieved an overall 10-fold cross-validation accuracy of 79.8%. Nam and Park (7) developed a prototype activity recognition system for infants and toddlers using data from a single waist-mounted accelerometer. A range of time and frequency domain features were inputted into seven different learning algorithms, including naïve Bayes classifier, Bayesian Network, SVM, decision tree, k-nearest neighbor, multilayer perceptron, and logistic regression. Tenfold cross-validation accuracy for 11 different activities, including crawling, climbing up, climbing down, and walking, ranged from 73.0% to 88.3%. Most recently, Hagenbuchner and colleagues (8) developed and tested a Deep Learning Ensemble Network (DLEN) for recognition of five basic activity classes in preschool-age children (sedentary (SED), light activities and games, moderate- to vigorous-intensity activities and games, walking, and running). Using simple statistical features in ActiGraph (hip-mounted) proprietary counts as inputs (10th, 25th, 50th, 75th, and 90th percentiles and lag-one autocorrelation), the DLEN achieved an overall classification accuracy of 82.6%. In comparison, a standard feed-forward multilayer perceptron achieved an overall accuracy of 69.7%.
Although the aforementioned studies support the utility of activity recognition using machine learning methods in young children, it is important to note that the classifiers developed in all three studies were trained using features from a single accelerometer worn on the hip or waist. No previous study involving preschool-age children has developed and tested activity recognition algorithms for wrist-worn accelerometer data. Validated activity recognition algorithms for the wrist are needed because wrist-mounted accelerometers are more convenient to wear, thus reducing the likelihood of missing data due to nonwear (9). More importantly, the wrist placement allows researchers and clinicians to monitor all movement behaviors (sleep, SED behaviors, light activity, and moderate- to vigorous-intensity physical activities (MVPA)) over a complete 24-h cycle (10). In addition, studies conducted in school-age children (11) and in adults (12–14) suggest that activity recognition algorithms trained on accelerometer data from multiple body locations (e.g., the combination of the wrist and ankle) achieve greater accuracy compared with those based on a single accelerometer. However, to date, the performance of activity classifiers on the basis of multiple sensing locations has not been investigated in preschool-age children.
To address these gaps in the research literature, the purpose of this study was to develop, test, and compare activity class recognition algorithms trained on raw accelerometer signal from the wrist, hip, and the combination of the wrist and hip in preschool-age children. To examine the utility of machine learning approaches relative to conventional cut-point methods, we derived count cut points for the classification of physical activity intensity (SED, light, and MVPA) and compared their performance with that of the newly developed activity class recognition models.
Eleven children 3 to 6 yr of age (mean age, 4.8 ± 0.87 yr; 55% girls; mean body mass index, 15.9 ± 1.0 kg·m−2; 9.1% overweight) participated in the study. Parent consent was obtained before participation. The study was approved by the University Research Ethics Committee.
Participants completed 12 semistructured activity trials during two laboratory visits scheduled within a 3-wk period. Participants undertook the following six trials at visit 1: watching television, sitting on floor being read to, standing making a collage on a wall, walking, playing an active game against an instructor, and completing an obstacle course. The remaining six trials were completed at visit 2: sitting on a chair playing a computer tablet game, sitting on floor playing quietly with toys, treasure hunt, cleaning up toys, bicycle riding, and running. Each trial was completed for 4–5 min. A detailed description of the activity trials can be found elsewhere (8). On the basis of energy cost and movement pattern (8,15), activity trials were categorized into five distinct physical activity classes—SED activities (TV, reading, tablet, and quiet play), light activities and games (art, treasure hunt, and clean-up), moderate to vigorous activities (active game, obstacle course, and bicycle), walking, and running. The five activity classes and the average MET level of the 12 activity trials are displayed in Table 1.
During each trial, participants wore an ActiGraph GT3X+ (ActiGraph Corporation, Pensacola, FL) on the right hip and nondominant wrist. Data were collected at 100 Hz. ActiLife software (Version 6.8.1) was used to construct date–time stamped files comprising raw acceleration signal in the vertical (axis 1), medial–lateral (axis 2), and anterior–posterior (axis 3) planes. Research comparing accelerometer output from the dominant and nondominant wrist has shown that the choice to wear the accelerometer on the nondominant or dominant wrist has no effect on results (16). The current study adopted the nondominant wrist location to be consistent with the approach used by sleep researchers who use accelerometers placed on the nondominant wrist to monitor sleep duration and quality (17).
Data processing and feature extraction
For each sensor location, accelerometer signal from each axis was transformed into a single-dimension vector magnitude (VM) using the equation $\mathrm{VM} = \sqrt{x^2 + y^2 + z^2}$. VM instances recorded during minutes 2 to 4 were parsed and segmented into nonoverlapping 15-s windows. For each window, 18 time and frequency domain features were extracted. Features were selected on the basis of previous studies (18,19) and included mean, SD, minimum, maximum, interquartile range, percentiles (10th, 25th, 50th, 75th, 95th), coefficient of variation, signal sum, signal power, peak-to-peak amplitude, median crossings, dominant frequency between 0.25 and 5.0 Hz, magnitude of dominant frequency between 0.25 and 5.0 Hz, and signal entropy between 0.25 and 5.0 Hz.
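The processing steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study (which was written in R). The function names are hypothetical, only a subset of the time domain features is shown, and the definitions of signal power (sum of squared values) and median crossings (sign changes of the median-centred signal) are assumptions, because the text does not define them. Selection of minutes 2 to 4 and the frequency domain features are omitted.

```python
import math
import statistics

SAMPLE_RATE_HZ = 100   # sampling rate stated in the text
WINDOW_SECONDS = 15    # window length stated in the text


def vector_magnitude(x, y, z):
    """Collapse three axes into one signal: sqrt(x^2 + y^2 + z^2) per sample."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def nonoverlapping_windows(signal, size=SAMPLE_RATE_HZ * WINDOW_SECONDS):
    """Split a signal into consecutive full windows; a trailing partial window is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def time_domain_features(window):
    """Compute a subset of the time domain features named in the text."""
    mean = statistics.fmean(window)
    sd = statistics.stdev(window)
    median = statistics.median(window)
    # Assumed definition: count sign changes of the median-centred signal,
    # ignoring samples that sit exactly on the median.
    centred = [v - median for v in window if v != median]
    crossings = sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)
    return {
        "mean": mean,
        "sd": sd,
        "min": min(window),
        "max": max(window),
        "peak_to_peak": max(window) - min(window),
        "coef_variation": sd / mean if mean else float("nan"),
        "signal_sum": sum(window),
        "signal_power": sum(v * v for v in window),  # assumed: sum of squares
        "median_crossings": crossings,
    }
```

At 100 Hz, each 15-s window holds 1500 VM samples, so a 3-min segment yields 12 windows per sensor location.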
Model training and evaluation
Two widely implemented supervised learning algorithms were used to construct the classifiers—random forest (RF) and SVM. An RF is an ensemble of decision tree models. Each tree is learned on a bootstrap sample of training data, and each node in the tree is split using the best among a randomly selected sample of features. The decisions from each tree are aggregated, and a final model prediction is based on majority vote. SVM performs classification tasks by mapping training instances to points in a multidimensional space of features and constructing decision boundaries, called hyperplanes, which maximize the distance or margin between instances of different classes. New observations are mapped to the multidimensional feature space and assigned a class prediction on the basis of which side of the hyperplane it lies. For each supervised learning approach, classification models were trained using features from accelerometer signal collected at the hip, wrist, and combined hip and wrist, thus providing a total of six classifiers for evaluation. We chose to implement RF and SVM classifiers because these algorithms have been shown to perform well in activity recognition studies involving other study populations (20–22) and are readily implemented using open-source platforms such as R and WEKA.
Classification models were trained, tuned, and cross-validated using the “kernlab,” “randomForest,” and “caret” packages within R (Version 3.2.2) (23). The R code and data, which include final trained models, are available on request. The train function within caret was used to implement the SVM and RF algorithms, optimize tuning parameters, and evaluate performance using leave-one-subject-out (LOSO) cross-validation. In LOSO cross-validation, the classification model is trained on data from all of the participants except one, which is “held out” and used as the test data set. The process is repeated until all participants have served as the test data, and the performance evaluation results are averaged.
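The LOSO procedure can be sketched in a few lines of Python. This is an illustration only, not the caret implementation used in the study. The names `loso_splits` and `loso_accuracy` and the `fit`/`predict` callables are hypothetical, and averaging the per-fold accuracies (rather than pooling all windows) is an assumption about how the results were combined.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each unique subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def loso_accuracy(features, labels, subject_ids, fit, predict):
    """Mean per-fold accuracy; `fit` and `predict` are caller-supplied callables."""
    fold_accuracies = []
    for _, train, test in loso_splits(subject_ids):
        # Train on every participant except the held-out one.
        model = fit([features[i] for i in train], [labels[i] for i in train])
        # Evaluate only on windows from the held-out participant.
        predicted = predict(model, [features[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        fold_accuracies.append(correct / len(test))
    return sum(fold_accuracies) / len(fold_accuracies)
```

Because no window from the held-out child is seen during training, each fold approximates performance on a new, unseen participant.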
In the RF models, the number of trees was set at 500. On the basis of the training data, the number of features randomly sampled at each split was optimized at three. SVM models were configured using a radial basis kernel function, automatic sigma estimation, with the soft margin or cost parameter optimized at 4.0. The cost parameter is a regularization parameter that adjusts the width of the hyperplane margin and controls the trade-off between overfitting and underfitting the data.
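For reference, the radial basis kernel and the role of the cost parameter can be written out as follows. This is the standard textbook soft-margin formulation for a two-class problem, shown here as an illustration rather than as a description of the exact optimization performed by the software. In the models above, $C = 4.0$ and $\sigma$ was estimated automatically.

```latex
k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\sigma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)

\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0
```

Here $\phi$ is the feature map implied by the kernel and $\xi_i$ are slack variables that permit margin violations. A larger $C$ penalizes violations more heavily, which narrows the margin and increases the risk of overfitting; a smaller $C$ widens the margin at the cost of more training errors.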
Model performance was evaluated in terms of overall recognition accuracy, calculated as the percentage of 15-s time windows correctly classified. Agreement between predicted and observed class labels was evaluated by calculating weighted kappa (κ) coefficients. κ is a more robust measure than simple percent agreement, because it takes into account the possibility of the agreement occurring by chance. Weighted κ also has the advantage of being applicable to multiclass classification scenarios (24). For the interpretation of the κ coefficients, we followed the ratings suggested by Landis and Koch (25): poor (0–0.2), fair (0.2–0.4), moderate (0.4–0.6), substantial (0.6–0.8), and almost perfect (0.8–1.0). In addition, for each classification model, confusion matrices were generated to summarize classification accuracy in each activity class.
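The two performance metrics can be computed from a confusion matrix as in the following Python sketch. It is illustrative only: the function names are hypothetical, and the linear weighting scheme is an assumption, because the text does not state whether linear or quadratic weights were used. Weighted κ presumes that the categories are listed in a meaningful order (e.g., SED, LPA, MVPA).

```python
def weighted_kappa(confusion, weights="linear"):
    """Cohen's weighted kappa from a square matrix (rows observed, columns predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2  # any other value gives quadratic weights
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the distance between ordered categories.
            w = (abs(i - j) / (k - 1)) ** power
            observed += w * confusion[i][j] / total
            expected += w * row_totals[i] * col_totals[j] / total ** 2
    return 1.0 - observed / expected


def overall_accuracy(confusion):
    """Proportion of windows on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total
```

With only two categories, the weighted and unweighted statistics coincide, which provides a convenient check on the implementation.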
Comparison to cut-point methods
To compare the performance of the machine learning classifiers with that of traditional cut-point methods, receiver operating characteristic (ROC) curves were used to identify the ActiGraph proprietary count thresholds (vertical axis and VM) providing the highest sensitivity (Se) and specificity (Sp) for differentiating: 1) SED from light-intensity physical activity (LPA) and MVPA, and 2) MVPA from SED and LPA. Replicating the methods used to evaluate the machine learning classifiers, performance was evaluated using LOSO cross-validation. One participant was iteratively excluded from each ROC curve analysis. The resultant cut point was then applied to the hold-out participant’s data. The process was repeated until all participants had served as hold outs, and the results were aggregated.
For direct comparison, the five activity classes predicted by the SVM and RF classifiers during LOSO cross-validation were mapped onto the traditional three intensity categories (SED, LPA, and MVPA). The moderate-to-vigorous games, walking, and running activity classes were collapsed into a single MVPA category. The SED and light activities and games activity classes were mapped to the SED and LPA categories, respectively. Performance differences between the two methods were evaluated by comparing weighted κ coefficients achieved in the respective hold-out samples.
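The mapping amounts to a simple lookup, sketched below in Python. The class label strings are hypothetical; only the grouping itself is taken from the text.

```python
# Five predicted activity classes collapsed to three intensity categories.
CLASS_TO_INTENSITY = {
    "sedentary": "SED",
    "light_activities_games": "LPA",
    "moderate_vigorous_games": "MVPA",
    "walking": "MVPA",
    "running": "MVPA",
}


def to_intensity(predicted_classes):
    """Map a sequence of activity class labels to intensity categories."""
    return [CLASS_TO_INTENSITY[c] for c in predicted_classes]
```

One consequence of this mapping is that confusions among the three MVPA classes (e.g., running predicted as moderate-to-vigorous games) no longer count as errors at the intensity level.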
For these analyses, the intensity level assigned to each activity trial was based on two criteria: 1) the measured average energy cost of the activities as reported by Großek et al. (15) and 2) an activity rating based on the Children’s Activity Rating Scale (CARS) (26). There is currently a lack of agreement on the definitions of SED, LPA, and MVPA in preschool-age children (27). Therefore, activity trials were classified as SED, LPA, or MVPA on the basis of the preschool-specific MET thresholds reported by Butte and colleagues (28) (SED/LPA, 1.5 METs; LPA/moderate physical activity, 2.8 METs; moderate physical activity/vigorous physical activity, 3.5 METs) and/or previously reported methods for classifying CARS direct observation scores (1–2, SED; 3, LPA; 4–5, MVPA) (27) (see Table 1).
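The two classification rules can be expressed as in the following Python sketch. The function names are hypothetical, and the handling of values falling exactly on a threshold (assigned to the higher category) is an assumption, because the text does not specify it. How the two criteria were reconciled when they disagreed is also not specified and is not modeled here.

```python
SED_LPA_METS = 1.5  # SED/LPA threshold quoted in the text
LPA_MPA_METS = 2.8  # LPA/moderate threshold quoted in the text


def intensity_from_mets(mets):
    """Classify an average MET value as SED, LPA, or MVPA."""
    if mets < SED_LPA_METS:
        return "SED"
    if mets < LPA_MPA_METS:
        return "LPA"
    return "MVPA"  # moderate and vigorous are pooled, so the 3.5-MET threshold is unused


def intensity_from_cars(score):
    """Classify a CARS direct observation score (1 to 5) as SED, LPA, or MVPA."""
    if score in (1, 2):
        return "SED"
    if score == 3:
        return "LPA"
    if score in (4, 5):
        return "MVPA"
    raise ValueError("CARS scores range from 1 to 5")
```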
Cross-validation performance accuracy for the RF and SVM models is displayed in Figure 1. For the RF models, mean overall accuracy for the hip, wrist, and combined hip and wrist was 80.2% (95% confidence interval (CI), 78.7%–81.6%), 78.1% (95% CI, 76.6%–79.6%), and 81.8% (95% CI, 80.4%–83.2%), respectively. For the SVM models, overall accuracy for the hip, wrist, and combined hip and wrist models was 81.3% (95% CI, 79.9%–82.8%), 80.4% (95% CI, 78.9%–81.9%), and 85.2% (95% CI, 83.8%–86.5%), respectively. For the hip, wrist, and combined hip and wrist, SVM achieved consistently higher activity recognition accuracy compared with RF.
Weighted κ coefficients for the RF and SVM models are displayed in Figure 2. Applying the rubric of Landis and Koch (25), the hip, wrist, and combined hip and wrist RF models exhibited substantial agreement (κ = 0.70–0.75). For the SVM models, agreement for the combined hip and wrist model bordered on almost perfect (κ = 0.80), whereas the individual hip and wrist models exhibited substantial agreement (κ = 0.73–0.74).
Confusion matrices for the RF and SVM models are presented in Tables 2 and 3, respectively. For the RF models, recognition accuracy was good to excellent for SED activities (≥89%); moderate for light-intensity games, moderate- to vigorous-intensity games, and running (69%–81%); and modest for walking (61%–63%). A similar pattern of results emerged for the SVM models. Recognition accuracy was excellent for SED activities (≥90%); moderate to high for light-intensity games, moderate-to-vigorous games, and running (70%–82%); and modest for walking (64%–71%). For both RF and SVM, walking was consistently misclassified as light-intensity games, whereas running was consistently misclassified as moderate-to-vigorous games. Across the five activity classes, the combined hip and wrist model provided higher recognition accuracy compared with the single-location hip and wrist models. This increase was most notable for running, for which the combined hip and wrist SVM model increased recognition accuracy by just over 10%.
For vertical axis counts recorded at the hip, the optimal cut points for differentiating SED and MVPA from other intensity levels were 27 and 350 counts per 15 s, respectively. ROC area under the curve (ROC-AUC) ranged from 0.93 to 0.94. Se ranged from 88.4 to 89.4. Sp ranged from 85.1 to 85.8. For the VM of counts recorded at the hip, the optimal cut points for differentiating SED and MVPA from other intensity levels were 263 and 674 counts per 15 s, respectively. ROC-AUC ranged from 0.95 to 0.96. Se ranged from 88.1 to 90.9. Sp ranged from 85.1 to 86.2.
For vertical axis counts recorded at the wrist, the optimal cut points for differentiating SED and MVPA from other intensity levels were 349 and 1284 counts per 15 s, respectively. ROC-AUC ranged from 0.85 to 0.95. Se ranged from 61.3 to 80.2. Sp ranged from 90.1 to 93.9. For the VM of counts recorded at the wrist, the optimal cut points for differentiating SED and MVPA from other intensity levels were 625 and 2103 counts per 15 s, respectively. ROC-AUC ranged from 0.84 to 0.93. Se ranged from 60.3 to 77.0. Sp ranged from 91.2 to 93.5.
Figure 3 displays weighted κ coefficients for the RF and SVM models and the newly derived count cut points for this sample. For classification of physical activity intensity, the RF and SVM models for the hip and the combined hip and wrist exhibited almost perfect agreement (0.81–0.84), whereas the RF and SVM models for the wrist exhibited substantial agreement (0.76–0.78). In comparison, the count cut points derived for this sample exhibited only moderate to substantial agreement, with weighted κ statistics ranging from 0.49 to 0.65.
The current study developed and tested new machine learning models for the automatic identification of physical activity class in preschool-age children. RF and SVM activity classifiers trained on acceleration signal from the hip, nondominant wrist, and the combination of the hip and wrist achieved acceptable recognition accuracy for a range of physical activity classes routinely performed by young children at home and in early childhood education and care settings. Importantly, our classifiers, trained on time and frequency domain features extracted from the raw signal VM over 15-s sliding windows, provided comparable or higher classification accuracy compared with previously published preschooler activity recognition algorithms trained on ActiGraph proprietary counts over 60 s (6,8). Moreover, when the activity classes were mapped onto traditional physical activity intensity categories, our machine learning models exhibited significantly higher classification accuracy compared with traditional cut-point methods.
Classifiers trained on hip accelerometer data exhibited marginally higher but comparable overall recognition accuracy compared with those trained on wrist data. This finding is consistent with the results of previous investigations comparing the performance of activity classifiers trained on hip and wrist accelerometer data. Trost et al. (19) compared the activity recognition rates achieved by hip and wrist logistic regression classifiers among children and adolescents 7 to 17 yr of age. Overall classification accuracy for the hip (91.0%) was only marginally higher than that achieved by the wrist (88.4%). Among healthy and overweight middle-age women, RF classifiers trained on hip accelerometer data provided higher overall activity recognition than did those trained on wrist-worn data (29). However, the magnitude of the differences was small (<5%) and unlikely to be of practical significance in field-based studies. That machine learning classifiers for wrist-worn accelerometer data consistently exhibit comparable performance with that of classifiers trained on hip data strongly supports the use of wrist-mounted accelerometers in epidemiological and intervention studies, where compliance with the monitoring protocol is critical. Moreover, in light of recent public health recommendations addressing movement behaviors, including sleep, over a complete 24-h cycle (30), the utility of the wrist placement has added significance.
The combined hip and wrist classifiers provided higher recognition accuracy compared with the single sensor models. This finding is consistent with the results of Ruch and colleagues (11), who developed and tested a custom-ensemble model (k-nearest neighbor, normal density discriminant function, customized decision tree) for identifying children’s physical activity type from ActiGraph proprietary counts (GT1M). In that study, the addition of wrist activity counts to a model trained on hip accelerometer data improved the classification accuracy by 23 percentage points from 44% to 67%. Our results are also consistent with adult studies in which small but statistically significant improvements in recognition accuracy were achieved when accelerometer features from multiple body locations were fused (13). In the current study, the increase in accuracy achieved by the combined hip and wrist classifier was comparatively modest (2%–5%). However, when examined at a class level, it was notable that the fusion of features from the hip and wrist locations improved the recognition of running by 10 to 12 percentage points. Inspection of the confusion matrices indicated that the improvement in performance was achieved through 1) a reduction in the misclassification of running as a SED activity, as was the case for the wrist classifiers, and 2) a reduction in the misclassification of running as moderate-to-vigorous activities and games, as was the case for the hip classifiers. In light of such findings, the extent to which multiple sensing locations improve physical activity recognition in preschool-age children warrants further investigation.
The newly developed RF and SVM classifiers significantly outperformed traditional cut-point methods for classifying physical activity intensity. After mapping the five activity classes to standard physical activity intensity categories, agreement for the machine learning algorithms ranged from 0.76 to 0.84. In contrast, agreement for cut points developed for this sample ranged from 0.49 to 0.65. The poorer performance of the cut points was primarily attributable to the misclassification of LPA as SED activity or MVPA, and/or the misclassification of MVPA as LPA. Confusion matrices for the classification of physical activity intensity are provided in the Supplemental Digital Content (see Table, Supplemental Digital Content, Confusion matrices for prediction of SED, LPA, and MVPA, https://links.lww.com/MSS/B88). Cut-point methods continue to be widely used because they are easy to implement and the results are readily interpretable. However, studies evaluating the performance of previously derived cut points in independent samples of preschool-age children indicate that the true intensity of physical activity is misclassified 35% to 45% of the time (3). In the current study, cut points developed for this sample misclassified the intensity of physical activity 29.8% to 38.5% of the time. Such misclassification arises because the relationship between activity counts and physical activity intensity varies considerably from activity to activity and between individuals completing the same activity (31). Moreover, the cut points are highly dependent on the activities included in the calibration study, the analytical methods used to determine thresholds, and the physical characteristics of the participants completing the calibration study (2,32). Accordingly, the development of user-friendly software tools to apply machine learning approaches should be prioritized to support the translation of these approaches into measurement practice.
For both the RF and SVM classifiers, recognition accuracy for walking was lower than that for the other activity modes, ranging from 61% to 71%. This result was largely a function of walking being misclassified as light activities and games. Reassuringly, only a very small proportion of walking instances were misclassified as running. This finding is consistent with the results from our previous study involving preschoolers in which a DLEN, trained on ActiGraph proprietary counts over a 60-s window, achieved 72.7% recognition accuracy for walking (8). Similar to the pattern observed in the current study, just over 18% of the walking instances were misclassified as light activities and games. That walking was repeatedly confused with light activities and games in both studies was perhaps not overly surprising, considering that the light activities and games class consisted of activities featuring significant periods of walking (cleaning up toys and treasure hunt). In the future, it may be more useful to develop classifiers that only recognize the postures and basic movements that are foundational to the daily activities and play behaviors of young children: lying down, sitting, standing, walking, and running. An alternative approach would be to apply clustering methods to identify natural groupings of physical activities performed by young children and develop classifiers to recognize these groupings (6). Although the activity targets for prediction will always depend on the end user’s needs, more research is needed to identify the physical activity metrics that are most relevant to healthy development in children 0 to 5 yr of age.
The current study had several strengths. It is the first study to develop and test machine learning activity classifiers for preschoolers using features in the raw acceleration signal collected at the wrist, hip, and combined hip and wrist. Second, classification accuracy was evaluated using a wide variety of free play and daily activity classes performed by preschool-age children. Third, the classifiers were trained using a relatively small feature set that enhanced practicality and reduced processing time. Fourth, the performance of our classifiers was evaluated using LOSO cross-validation, which more closely simulates model performance when deployed as an “off the shelf” classifier in independent samples of preschoolers.
There were, however, several limitations that warrant consideration. First, the activity classifiers were trained and tested using data from controlled activity trials, which may not fully replicate the activity performances of young children in free-living contexts. Consequently, additional research is needed to evaluate performance of our classifiers under free-living conditions. The uptake of machine learning methods by movement scientists and public health researchers will continue to be low in the absence of empirical research demonstrating that such methods perform well in independent samples performing activity under real-world conditions. Second, features were extracted from sliding windows of 15 s, which may not provide sufficient resolution to capture the sporadic and pulsatile activity patterns of preschool-age children. Accordingly, future studies should explore the utility of simplified activity recognition algorithms that accurately identify a small number of activity classes over shorter time windows (e.g., 5-s windows). Third, to assess the performance of the machine learning classifiers relative to traditional cut-point methods, each activity class was assigned a physical activity intensity rating based on the average energy expenditure measured for each activity trial. It is acknowledged that, for some children, the energy cost of performing the individual activities included in each class may have differed from the average level. We are currently undertaking studies to develop and test machine learning models for prediction of energy expenditure in preschool-age children. Fourth, although our data set of greater than 2800 fully annotated observations was sufficient to evaluate and compare different machine learning models, the relatively small sample (N = 11) may limit the generalizability of the findings.
In summary, RF and SVM classifiers trained on accelerometer features from the hip, nondominant wrist, and combined hip and wrist can be used to predict physical activity class in preschool-age children. Although classifiers trained on hip or wrist data provided comparable recognition accuracy, the combination of hip and wrist accelerometer features yielded marginally better performance, particularly for recognition of running. Compared with sample-specific cut points for the hip or wrist, the machine learning algorithms provided higher classification accuracy for absolute physical activity intensity. Our findings add to a growing evidence base supporting the feasibility and accuracy of machine learning activity recognition algorithms in young children.
Funding for this project was provided by an Australian Research Council Discovery Project Grant: DP150100116, Modelling active play in preschool children using machine learning. Stewart Trost is a member of the ActiGraph Scientific Advisory Board. The authors declare no conflict of interest. The results of the study are presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. The results from the present study do not constitute endorsement by the American College of Sports Medicine.
1. Cliff DP, Reilly JJ, Okely AD. Methodological considerations in using accelerometers to assess habitual physical activity in children aged 0–5 years. J Sci Med Sport
2. Trost SG. Measurement of physical activity in children and adolescents. Am J Lifestyle Med
3. Janssen X, Cliff DP, Reilly JJ, et al. Predictive validity and classification accuracy of actigraph energy expenditure equations and cut-points in young children. PLoS One
4. Trost SG, Way R, Okely AD. Predictive validity of three ActiGraph energy expenditure equations for children. Med Sci Sports Exerc
5. Trost SG, Loprinzi PD, Moore R, Pfeiffer KA. Comparison of accelerometer cut points for predicting activity intensity in youth. Med Sci Sports Exerc
6. Zhao W, Adolph AL, Puyau MR, Vohra FA, Butte NF, Zakeri IF. Support vector machines classifiers of physical activities in preschoolers. Physiol Rep
7. Nam Y, Park JW. Child activity recognition based on cooperative fusion model of a triaxial accelerometer and a barometric pressure sensor. IEEE J Biomed Health Inform
8. Hagenbuchner M, Cliff DP, Trost SG, Van Tuc N, Peoples GE. Prediction of activity type in preschool children using machine learning techniques. J Sci Med Sport
9. Fairclough SJ, Noonan R, Rowlands AV, Van Hees V, Knowles Z, Boddy LM. Wear compliance and activity in children wearing wrist- and hip-mounted accelerometers. Med Sci Sports Exerc
10. Freedson PS, John D. Comment on “estimating activity and sedentary behavior from an accelerometer on the hip and wrist.” Med Sci Sports Exerc
11. Ruch N, Rumo M, Mäder U. Recognition of activities in children by two uniaxial accelerometers in free-living conditions. Eur J Appl Physiol
12. Ellis K, Kerr J, Godbole S, Lanckriet G, Wing D, Marshall S. A random forest classifier for the prediction of energy expenditure and type of physical activity from wrist and hip accelerometers. Physiol Meas
13. Cleland I, Kikhia B, Nugent C, et al. Optimal placement of accelerometers for the detection of everyday activities. Sensors (Basel)
14. Strath SJ, Kate RJ, Keenan KG, Welch WA, Swartz AM. Ngram time series model to predict activity type and energy cost from wrist, hip and ankle accelerometers: implications of age. Physiol Meas
15. Großek A, van Loo C, Peoples GE, Hagenbuchner M, Jones R, Cliff DP. Energy cost of physical activities and sedentary behaviors in young children. J Phys Act Heal. 2016;13(6 Suppl 1):S7–10.
16. Dieu O, Mikulovic J, Fardy PS, Bui-Xuan G, Béghin L, Vanhelst J. Physical activity using wrist-worn accelerometers: comparison of dominant and non-dominant wrist. Clin Physiol Funct Imaging
17. Ancoli-Israel S, Cole R, Alessi C, Chambers M, Moorcroft W, Pollak CP. The role of actigraphy in the study of sleep and circadian rhythms. Sleep
18. Liu S, Gao RX, Freedson PS. Computational methods for estimating energy expenditure in human physical activities. Med Sci Sports Exerc
19. Trost SG, Zheng Y, Wong WK. Machine learning for activity recognition: hip versus wrist data. Physiol Meas
20. Mannini A, Intille SS, Rosenberger M, Sabatini AM, Haskell W. Activity recognition using a single accelerometer placed at the wrist or ankle. Med Sci Sports Exerc
21. Kerr J, Patterson RE, Ellis K, et al. Objective assessment of physical activity: classifiers for public health. Med Sci Sports Exerc
22. Pavey TG, Gilson ND, Gomersall SR, Clark B, Trost SG. Field evaluation of a random forest activity classifier for wrist-worn accelerometer data. J Sci Med Sport
23. Kuhn M. Building predictive models in R using the Caret package. J Stat Softw
24. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull
25. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics
26. Puhl J, Greaves K, Hoyt M, Baranowski T. Children’s Activity Rating Scale (CARS): description and calibration. Res Q Exerc Sport
27. Hislop JF, Bulley C, Mercer TH, Reilly JJ. Comparison of accelerometry cut points for physical activity and sedentary behavior in preschool children: a validation study. Pediatr Exerc Sci
28. Butte NF, Wong WW, Lee JS, Adolph AL, Puyau MR, Zakeri I. Prediction of energy expenditure and physical activity in preschoolers. Med Sci Sports Exerc
29. Ellis K, Kerr J, Godbole S, Staudenmayer J, Lanckriet G. Hip and wrist accelerometer algorithms for free-living behavior classification. Med Sci Sports Exerc
30. Tremblay MS, Carson V, Chaput JP, et al. Canadian 24-hour movement guidelines for children and youth: an integration of physical activity, sedentary behaviour, and sleep. Appl Physiol Nutr Metab. 2016;41(6 Suppl 3):S311–27.
31. Trost SG, O’Neil M. Clinical use of objective measures of physical activity. Br J Sports Med
32. Lyden K, Kozey SL, Staudenmeyer JW, Freedson PS. A comprehensive evaluation of commonly used accelerometer energy expenditure and MET prediction equations. Eur J Appl Physiol