G-coefficients for interrater reliability of the MLSRI total score were 0.81 for the usual interventions and 0.28 for the VR interventions. The standard error of measurement was 2.8% for usual interventions and 4.7% for VR interventions. Bland-Altman graphs suggest no systematic biases between raters in MLSRI scores for either intervention. G-coefficients, shown in Table 4 for the 6 categories of the MLSRI, varied from 0.36 to 0.65 for usual interventions and from 0.17 to 0.72 for VR interventions.
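As a rough illustration of the Bland-Altman check referred to above, the following Python sketch computes the mean between-rater difference (bias) and 95% limits of agreement for paired total scores. The scores and the function name `bland_altman` are hypothetical and chosen only for illustration; they are not the study data or the authors' analysis code.

```python
from statistics import mean, stdev


def bland_altman(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired scores.

    bias is the mean of the paired differences (rater_a - rater_b);
    the limits of agreement are bias +/- 1.96 * SD of the differences.
    """
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical MLSRI total scores (%), NOT taken from the study
rater_1 = [40.0, 45.0, 50.0, 55.0, 60.0]
rater_2 = [42.0, 44.0, 51.0, 54.0, 62.0]

bias, lower, upper = bland_altman(rater_1, rater_2)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

A bias close to zero, with limits of agreement that span zero, is the pattern usually read as an absence of systematic difference between raters.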
Feasibility of MLSRI Rating
Table 5 provides descriptive results of the feasibility outcomes summarized by rater and intervention. The repeated-measures analysis of variance showed no significant differences by rater, intervention, or occasions in difficulty completing the MLSRI. Rater 1 was more confident than rater 2 in completing the MLSRI (P = .05), and this effect was dependent on the intervention, such that it occurred with the VR ratings (P = .036). There was a significant main effect for difference between raters for time to complete the MLSRI, with rater 2 taking more time than rater 1 (P < .001), and also for the difference between interventions with VR taking less time to rate than usual interventions (P = .003). Usual interventions were an average of 34.5 minutes (SD, 7.5) long, whereas the length of VR interventions was an average of 14.5 minutes (SD, 5.6).
Comparison of the use of MLSs between usual and VR-based interventions requires a reliable and valid measurement instrument. This study builds upon initial intra- and interrater reliability investigations15 of the newly developed MLSRI, and evaluates and compares the interrater reliability and feasibility of the revised MLSRI between usual and VR-based interventions.
The excellent interrater reliability for usual sessions for the total score demonstrates that it is possible for different raters to consistently use the MLSRI to differentiate between observed MLS use within videotaped intervention sessions. The interrater reliability for the total score (g-coefficient, 0.81) is improved from previous testing with the original MLSRI with this population in which the ICC was 0.50.15 This improvement may be due to greater clarity in item definitions and/or more rater experience with the instrument. High total score reliability may support MLSRI use in research studies in which the goal is to understand differences in MLS application between different therapists, intervention approaches, or children. However, category score interrater reliability g-coefficients were only adequate, although they were improved in 3 of the 5 categories (instructions, practice, and conduct) as compared with the ICCs achieved in the previous investigation.15
The 2 categorical items demonstrated varying degrees of agreement across sessions. It was not always possible for the raters to identify whether a person in the videotape was a caregiver (item 30), and the nature of recommendations for task practice outside of therapy (item 31) may not have been defined clearly enough within the rater training materials. This latter item represents the motor learning variables of amount of practice and transfer/generalization to real-world tasks. In addition to revisiting rater training materials for these items, it will be important to consider whether they could be changed to the 5-point scale, as opposed to capturing presence or absence with a categorical “yes or no” response.
In comparison with usual interventions, interrater reliability for the VR intervention total score was poor. Because the VR intervention videotapes were of shorter duration than the usual components, there was less time for raters to observe therapists interacting with clients, undertaking different tasks, or demonstrating any of the actions or verbalizations relevant to the MLSRI. This difference in intervention length meant that raters had to make decisions based on fewer data points than were available for usual interventions. If VR was used in similar ways with all children at both time points, this would reduce both the intrasubject and the intersubject variance and contribute to the poor reliability findings. Understanding whether these data points themselves were homogeneous as compared with usual interventions requires an additional task analysis through review of the intervention videotapes.
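The link between reduced variance and a lower coefficient can be shown with a minimal Python sketch. It assumes a simplified single-facet form of the g-coefficient (between-session variance divided by between-session variance plus rater-related error variance), which is not necessarily the exact design analyzed in this study, and the variance values are hypothetical.

```python
def g_coefficient(var_subjects, var_error, n_raters=1):
    """Relative g-coefficient for a simplified sessions-by-raters design.

    var_subjects: variance attributable to true differences between sessions
    var_error:    rater-related error variance
    n_raters:     number of raters whose scores are averaged
    """
    return var_subjects / (var_subjects + var_error / n_raters)


# Hypothetical variance components, NOT estimates from the study.
# The error variance is held constant; only the between-session variance differs.
heterogeneous = g_coefficient(var_subjects=40.0, var_error=10.0)
homogeneous = g_coefficient(var_subjects=4.0, var_error=10.0)

print(f"heterogeneous sessions: G = {heterogeneous:.2f}")
print(f"homogeneous sessions:   G = {homogeneous:.2f}")
```

With identical rater error, the coefficient falls sharply when the sessions being rated differ little from one another, which is why homogeneous VR sessions could yield poor reliability even if raters disagreed no more than they did for usual interventions.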
A systematic pattern of differences was found between raters for VR interventions (ie, the mean scores demonstrate that rater 1 consistently awarded lower scores than did rater 2, except for the “guidance” category for both interventions). Differences between raters' mean scores (Table 3) illustrate that the most problematic categories of the MLSRI for VR interventions were “practice,” “conduct,” and “VR.” Despite the provision of VR-specific instructions with respect to item rating, several items on the MLSRI may be more challenging to rate for VR than usual interventions. For example, items within the “practice” category including “repetitive,” “whole (rather than part),” “variable (rather than constant),” and “progressive” may have been more difficult to rate when raters did not have a good understanding of which game was being played, when games were changed or when difficulty levels were progressed, or whether different games were of differing challenge levels. Videotapes only captured the child and therapist, not the television screen. The question of whether a single trial of a VR game could in itself represent variable practice (given the potential for needing to react in different ways to unexpected and changing stimuli) was not addressed during rater training. Items within the conduct category may also have been more challenging to rate for VR interventions where differences in games were difficult to capture or where fewer environmental resources may have been used.
The 3 “VR” category items are among the most subjective on the instrument, requiring raters to judge both child and therapist intent behind observed verbalizations, actions, or facial expressions. These are the only items that endeavor to capture how the therapist capitalizes on the purported motor learning attributes of the Wii system itself (ie, its visual or auditory information and the motivation that it provides). Although capturing whether or not the Wii features may be offering motor learning benefits is important, the results demonstrate that these items require clarification.
Implications for Clinical and Research Use of the MLSRI
Ultimately, with further refinements, the clinical potential of the MLSRI is in description and measurement of the motor learning content of interventions for children and youth with ABI and comparison of MLS use between different types of interventions. This can enhance our understanding of clinical practice in this area and allow exploration of whether using MLSs can influence the motor learning outcomes important to physical therapists, such as retention and transfer of skills learned in therapy to daily life activities.
The high interrater reliability for usual interventions suggests that the MLSRI could be used by trained PT raters in clinical practice. However, the raters in this study required a time-intensive training process, which may limit clinical feasibility and uptake. Whether therapists would require the same extent of training as the student raters in this study remains to be determined. In research, the instrument could be used to compare use of MLSs between different rehabilitation settings or between therapists. Regardless of context, the MLSRI requires responsiveness evaluation if it is to be used to measure change over time, for example, before and after a therapist takes part in a knowledge translation initiative about MLS use.
Issues with the reliability for rating VR interventions limit recommendations for use in this area. Further work with the instrument for these interventions is needed to address whether these issues relate to rater training or experience with VR, issues with the instrument, logistical issues with the videotapes, or the ways in which the therapists used the VR in this study.
Although the 2 raters had each achieved excellent intrarater reliability in a previous study,15 they were PT students who did not have a great deal of experience either providing or observing therapy sessions. Although a lack of preconceived opinions may have been a positive in terms of augmenting the influence of rater training, the lack of experience may have affected their ability to recognize some MLSs and to judge therapeutic intent of observed interventions if they were more subtle, perhaps causing them to question their judgment and leading to more variable ratings.
It is not possible to estimate reliability confidence intervals when using g-coefficients, and this prevents interpretation of precision of the reliability estimates.
The study's small sample size likely limited variance in the use of MLSs between therapists and between videotapes, which was compounded for VR interventions by their decreased duration. Capturing each client on 2 occasions during their rehabilitation was a strategy to address this, but therapists may adopt a particular approach or style with an individual child, or they may have a general approach or style that does not vary regardless of the child. It is also possible that the time between the 2 occasions was not long enough to capture any changes in the child that would cause the therapists to use MLSs differently.
Finally, as new commercially available VR video games are developed and integrated into practice, it will be important to understand whether the VR-specific items are relevant to these new technologies.
To enhance rater training materials, results from this study will be used to revise item wording and definitions, and the study videotapes will provide additional examples illustrating range of MLS application of each item. Subsequent reliability investigations require larger sample sizes and greater diversity in therapists and practice settings. With further evidence of its psychometric properties, the MLSRI may be used to determine if MLS application within usual interventions is related to intervention outcomes or to child characteristics. Using the MLSRI to determine which MLSs are used most frequently in practice may inform research to specifically evaluate the effectiveness of those MLSs. As described earlier, further instrument refinement and evaluation will be undertaken for subsequent reliability investigations relating to MLSRI use to rate VR-based interventions. In general, little is known about how using VR influences therapist behavior and decision making. A greater understanding of this, including qualitative description of therapists' perspectives on VR use in practice,20 will contribute to refining the VR-specific items.
Exploring the use of MLSs is an important perspective to describe PT interventions for children and youth with ABI. The MLSRI, a newly developed instrument to quantify the use of MLSs in practice, demonstrated excellent reliability for usual interventions. Issues with rater training, instrument item relevance to VR interventions, and characteristics of the videotaped sessions had an effect on the reliability of the MLSRI to rate VR-based interventions. With further development of rater training materials and ongoing psychometric property testing, the MLSRI could be a useful tool to measure the MLS content of usual and VR-based interventions for children and youth with ABI.
The authors thank the therapists and children for participation and the 2 physiotherapy student raters, Priyanka Banerjee-Guenette and Megan Pfeiffer, for assistance. The authors also thank Susan Cohen (research assistant) and Dr Steven Hanna and Professor Paul Stratford for their assistance with data analysis.
1. Taylor HG. Research on outcomes of pediatric traumatic brain injury: current advances and future directions. Dev Neuropsych. 2004;25(1–2):199–225.
2. Dumas HM, Haley SM, Carey TM, Shen NP. The relationship between functional mobility and the intensity of physical therapy intervention in children with traumatic brain injury. Pediatr Phys Ther. 2004;16:157–164.
3. Haley SM, Baryza MJ, Webster HC. Pediatric rehabilitation and recovery of children with traumatic brain injuries. Pediatr Phys Ther. 1992;4:24–30.
4. Beaulieu CL. Rehabilitation and outcome following pediatric traumatic brain injury. Surg Clin North Am. 2002;82(2):393–408.
5. Teplicky R, Law M, Rosenbaum P, Stewart D, DeMatteo C, Rumney P. Effective rehabilitation for children and adolescents with brain injury: evaluating and disseminating the evidence. Arch Phys Med Rehabil. 2005;86(5):924–931.
6. Nudo RJ, Plautz EJ, Frost SB. Role of adaptive plasticity in recovery of function after damage to motor cortex. Muscle Nerve. 2001;24(8):1000–1019.
7. Whyte J, Hart H. It's more than a black box; It's a Russian doll: defining rehabilitation treatments. Am J Phys Med Rehabil. 2003;82(8):639.
8. Giza CC, Kolb B, Harris NG, Asarnow RF, Prins ML. Hitting a moving target: basic mechanisms of recovery from acquired developmental brain injury. Dev Neurorehabil. 2009;12(5):255–268.
9. Schmidt RA, Lee TD. Motor Control and Learning: A Behavioral Emphasis. 4th ed. Champaign, IL: Human Kinetics; 2005.
10. Schmidt RA. Motor learning principles for physical therapy. In: Lister MJ, ed. Contemporary Management of Motor Control Problems: Proceedings of the II-Step Conference. Fredericksberg, VA: Foundation for Physical Therapy; 1991:49–62.
11. Levac D, Missiuna C, Wishart L, DeMatteo C, Wright V. Documenting the content of physical therapy for children with acquired brain injury: development and validation of the Motor Learning Strategy Rating Instrument. Phys Ther. 2011;91(5):689–699.
12. Babikian T, Asarnow R. Neurocognitive outcomes and recovery after pediatric TBI: meta-analytic review of the literature. Neuropsychol. 2009;23(3):283–296.
13. Deutsch JE, Borbely M, Filler J, Huhn K, Guarrera-Bowlby P. Use of a low-cost, commercially available gaming console (Wii) for rehabilitation of an adolescent with cerebral palsy. Phys Ther. 2008;88(10):1–12.
14. Saposnik G, Mamdani M, Bayley M, et al. Effectiveness of virtual reality exercises in stroke rehabilitation (EVREST): rationale, design and protocol of a pilot randomized controlled trial assessing the Wii gaming system. Int J Stroke. 2010;5:47–51.
15. Kamath T, Pfeifer M, Banerjee-Guenette P, et al. Reliability of the motor learning strategy rating instrument (MLSRI) for children and youth with acquired brain injury (ABI). Phys Occup Ther Pediatr. 2012;32(3):288–305.
16. Cronbach LJ, Nanda H, Rajaratnam N. The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. New York, NY: Wiley; 1972.
17. Mushquash C, O'Connor BP. SPSS and SAS programs for generalizability theory analyses. Behav Res Meth. 2006;38(3):542–547.
18. Boodoo GM, O'Sullivan P. Obtaining generalizability coefficients for clinical evaluations. Eval Health Prof. 1982;5:345–358.
19. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
20. Levac D, Miller P, Missiuna C. Usual and virtual reality video game–based physiotherapy interventions for children and youth with acquired brain injuries. Phys Occup Ther Pediatr. 2012;32(2):180–195; doi: 10.3109/01942638.2011.616266.
adolescent; brain injuries/rehabilitation; child; clinical decision making; disabled children/rehabilitation; motor skills; physical therapy modalities; psychomotor performance; reproducibility of results; sensory feedback; therapy/computer assisted methods
© 2013 Lippincott Williams & Wilkins, Inc.