INTRODUCTION AND PURPOSE
Cerebral palsy (CP) causes disability in childhood, and population-based studies report a global prevalence of approximately 2 per 1000 live births.1 Cerebral palsy is a group of movement and posture disorders, due to nonprogressive damage to a developing or immature brain, that is often associated with activity limitation.2 The term “activity limitation,” included in the disability concept of the World Health Organization's International Classification of Functioning, Disability and Health: Children and Youth version (ICF-CY),3 refers to “difficulties an individual may experience in executing activities.” The principal activity limitations in everyday life associated with CP involve problems in motor function. These limit the performance of motor skills such as walking, climbing stairs, or running and the development of other daily activities such as eating, dressing, and grooming. There is a relationship between functional independence and motor impairment.2,4 Therapeutic approaches often focus on a child's motor capacity to carry out tasks and the performance of functional activities. The term “capacity” is defined as what a child can do in a standardized and controlled environment, and “performance” as what a child actually does in his/her daily environment.5 Both concepts are essential to consider in clinical practice and in everyday activity.
Research and clinical management of children and adolescents with CP are hindered by the heterogeneity of the disorder. The use of outcome measures is essential to detect significant changes. These measures help both physical and occupational therapists plan treatment, monitor progress, evaluate the effectiveness of an intervention program, compare and discriminate between individuals, and provide objective information to families.6 To support the design of effective intervention plans adapted to individual motor function development, the Gross Motor Function Classification System (GMFCS)7 was developed. The GMFCS is a standardized, valid, reliable, and stable system for classifying children with CP according to their functional abilities and limitations.
In addition, a large number of standardized outcome measures are available to assess motor and functional skills in children with CP. Choosing appropriately among them is a challenge for pediatric rehabilitation professionals. The selection must be based on whether the goal of the assessment is to discriminate between individuals, to evaluate changes over time, or to predict outcomes or prognosis. Moreover, as Kirshner and Guyatt8 described in their methodological framework, these measures have different implications for validity, reliability, and responsiveness depending on the purpose. More specifically, an evaluative measure must be responsive to change, a discriminative measure must be reliable between assessors, and a predictive measure must agree with a gold standard for criterion validity. Therefore, an outcome measure should fulfill the purpose for which it was created and be applied to a population for which it was developed and validated. It must also contain relevant items and be feasible to use.
There are several reviews of outcome measures in CP.6,9–11 However, these reviews did not account for the clinical heterogeneity of CP by restricting inclusion to studies whose samples represented children and adolescents at all levels of the GMFCS. In methodological terms, these reviews were narrative, did not use contemporary search strategies or critical evaluation procedures, and did not systematically appraise the quality of the included articles. It is therefore necessary to review the quality of the psychometric properties of these measures and their use in research as well as in clinical management. This review provides up-to-date information to guide pediatric physical rehabilitation professionals in choosing the most appropriate instrument to measure significant changes in children with CP.
The primary aims of this systematic review were to (1) examine validity, reliability, responsiveness, and clinical utility of outcome measures to assess changes in motor or functional skills in children and adolescents with CP and (2) evaluate both the quality and the results of the studies of the measurement properties.
This review was reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analysis guideline (PRISMA)12 and prospectively registered in the International Prospective Register of Systematic Reviews (PROSPERO) at the Centre for Reviews and Dissemination (University of York, United Kingdom): CRD42018095108.
Search Strategy and Study Selection
Seven electronic bibliographic databases were systematically searched: PubMed/MEDLINE, ISI Web of Science, Science Direct, CINAHL (through EBSCOhost), PEDro, and Biblioteca Virtual de la Salud (BVS). BVS provides free access, through the IBECS and LILACS databases, to Iberoamerican scientific literature (Spain and Portugal, as well as Latin American and Caribbean regions).
The first search aimed to identify the available outcome measures used to assess motor or functional skills in children and adolescents with CP up to December 1, 2018. Key search terms were identified from key papers and matched to the Medical Subject Headings (MeSH) index, then searched as keywords combined with the Boolean operators “AND”/“OR”. The search strategy had 3 elements: the construct (“motor skills”) combined with terms for the target population/diagnosis (“cerebral palsy”) and the measurement instrument (“assessment” OR “measure” OR “tool”) (see Supplemental Digital Content 1, available at: http://links.lww.com/PPT/A274).
Outcome measures identified during the first search and their title (“measure name”) were used as terms for further searches of the 7 electronic databases to December 22, 2018, which aimed to review the psychometric properties of the outcome measures selected (see Supplemental Digital Content 1, available at: http://links.lww.com/PPT/A274).
Reference lists were searched manually to obtain articles not previously identified and minimize selection bias. Manuals and background articles were found to complete the search and obtain exhaustive information on the measures.
Articles were included in our review if they met the following criteria: (1) clinimetric studies of outcome measures for children and adolescents with CP; (2) the sample comprised children and adolescents with CP aged 0 to 18 years at all GMFCS levels; (3) the outcome measures were dedicated, in whole or in part, to assessing motor or functional skills with results on “capacity” or “performance”; (4) validity, reliability, and responsiveness data were available for children and adolescents with CP; and (5) the study was published in a peer-reviewed journal in any language. Articles were excluded if (1) the outcome measure was a classification measure or assessed quality of life, upper limb function, or fine motor skills; (2) the administration format was a semistructured interview or questionnaire; or (3) only an isolated dimension/domain of a measure was evaluated.
To confirm eligibility, titles and abstracts of retrieved articles were screened independently by the first and second authors. Articles that met the inclusion criteria were retained for full-length review. Cases of disagreement or conflicting views were resolved through discussion until consensus was achieved between authors. The full texts of the selected studies, manuals, and background articles were collected and included in a data extraction form.
Data Extraction and Quality Assessment/Risk of Bias
To structure this review, the PRISMA statement12 and “COSMIN method for systematic reviews of Patient‐Reported Outcome Measures (PROMs)”13 were used.
Descriptive information and psychometric properties (validity, reliability, responsiveness, and clinical utility) of the assessment tools were extracted using an adapted version of the CanChild Outcome Measures Rating Form.14 This form incorporates the ICF-CY framework3 and is considered the most appropriate and widely accepted scale for evaluating outcome measures for CP.
To evaluate the quality of the included studies, the first and second authors rated studies independently according to the COSMIN Risk of Bias checklist15; the third author made the final decision in case of disagreement. Four response options were defined for each COSMIN item: “very good,” “adequate,” “doubtful,” and “inadequate” rating (adapted from the score “excellent,” “good,” “fair,” or “poor” described in the previous version). Subsequently, an overall quality score of each study was assigned according to the score obtained by each psychometric property; the final qualification is the lowest score for any item in a box (“worst score counts”).
There was a degree of subjective judgment in the process of using the COSMIN Risk of Bias checklist because, in some articles, the terms and definitions used to describe measurement properties differed from one another. For this reason, and to attain maximum homogeneity, we used the international consensus on taxonomy, terminology, and definitions of measurement properties developed by the COSMIN group.16
The updated criteria for good measurement properties17 were applied to rate the results of the psychometric properties from each study. The first and second authors rated these results as sufficient (+), indeterminate (?), or insufficient (−), resorting to the third author in case of disagreement.
Data Analysis and Best Evidence Synthesis
To ascertain the degree of evidence for each psychometric property, we combined the number of studies, the consistency of results rated according to the updated criteria for good measurement properties, and the quality of the studies according to COSMIN. In this way, we followed the recommendations of the Cochrane Back Review Group18 used in other reviews of psychometric properties.19–21 The level of overall evidence was rated as “strong,” “moderate,” “limited,” “conflicting,” or “unknown.” Following the recommendations of the current method, the best evidence synthesis included results from studies rated as “very good,” “adequate,” “doubtful,” and “inadequate” on the COSMIN Risk of Bias checklist. To adapt the criteria for sample size, we rated evidence as “strong” when the total sample size of the included studies was 100 or more, “moderate” for a total sample size between 50 and 99, “limited” for a total sample size between 25 and 49, and “unknown” when the sample size was fewer than 25.19–21
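The adapted sample-size rule above is mechanical enough to express as a short function (an illustrative sketch only; the full best evidence synthesis also weighs COSMIN study quality and the consistency of findings):

```python
def rate_sample_evidence(total_sample_size: int) -> str:
    """Map the pooled sample size of the included studies to an evidence
    level, following the adapted criteria described in the text.
    Note: the complete synthesis also considers study quality (COSMIN)
    and the consistency of results across studies."""
    if total_sample_size >= 100:
        return "strong"
    if total_sample_size >= 50:
        return "moderate"
    if total_sample_size >= 25:
        return "limited"
    return "unknown"
```

For example, 2 studies with 60 and 45 participants pool to 105 and would be rated “strong” on the sample-size criterion alone.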
The process of identifying potential articles is depicted in the Figure. In total, 12 articles covering 2 motor skills outcome measures and 2 functional skills outcome measures were included for data extraction and quality assessment (Table 1).
The 4 assessments that met the inclusion criteria were the Gross Motor Function Measure (GMFM), in its 66-item22 and 88-item23 versions as well as the GMFM-66-Item Sets (GMFM-66-IS)24 and GMFM-66 Basal & Ceiling (GMFM-66-B&C)25 modalities; the Gross Motor Performance Measure (GMPM)26; the Pediatric Evaluation of Disability Inventory (PEDI)27; and the Functional Independence Measure for Children (WeeFIM).28
Many of the most commonly used motor or functional skills outcome measures were excluded for several reasons detailed in Supplemental Digital Content 2 (available at: http://links.lww.com/PPT/A275). For the PEDI and WeeFIM, we accepted those studies administered by clinical observation, excluding those that used semistructured interviews or parent reports. Other studies that used an isolated dimension of the evaluation were excluded because there was evidence that the reliability and validity of the separate dimension scores were not as strong as for the measure as a whole.29
Characteristics, Content, and Clinical Utility of Selected Measures
The characteristics and content of the included measures are summarized in Table 2. All instruments have an evaluative purpose, except the GMFM, which can be evaluative (to measure the magnitude of longitudinal change), discriminative (between individuals on the GMFCS), or predictive (when motor curves are used), and the PEDI and WeeFIM, which are both evaluative and discriminative. Both the GMFM-88 and the GMPM measure capacity through 5 dimensions related to the execution of motor skills (GMFM) or their quality (GMPM). In contrast, the PEDI and WeeFIM focus on activities relevant to daily function in both activity and participation domains, measuring capacity or performance depending on whether they are administered in a standardized environment or in the child's daily environment.
Details on the clinical utility are summarized in Table 3. The administration time depended on the number of items assessed, the skill of the assessor, and the child's level of cooperation and understanding. All instruments use an ordinal point scoring scale, and the GMFM and PEDI have software, developed through Rasch analysis, to improve the interpretation of total and change scores.
Methodological Quality of the Studies
The results of the quality assessment of the psychometric properties rated by the COSMIN Risk of Bias checklist are presented in Table 4. Across the 12 selected articles, 7 validity properties (structural and construct validity), 18 reliability properties (internal consistency, inter/intrarater and test-retest reliability, and measurement error), and 6 responsiveness properties were assessed.
The quality of the validity properties was rated as “very good” (n = 2), “adequate” (n = 1), and “doubtful” (n = 4). The first and second authors agreed on the rating of all studies. Erroneous statistical methods or the lack of information about study design were the reasons for low scores.22,25,30–32
Reliability properties were rated as “very good” (n = 3), “adequate” (n = 5), “doubtful” (n = 4), and “inadequate” (n = 6). The first and second authors agreed on all ratings except 2, for which the third author made the final decision.32,33 The main reasons for low scores were the lack of evidence about participants' stability between administrations25,31,32 and about whether the test conditions were similar in both measurements (eg, environment, instructions).25,31–33
Responsiveness properties were rated as “adequate” (n = 1), “doubtful” (n = 3), and “inadequate” (n = 1). The first and second authors agreed on most ratings; the third author made the final decision for 1 study.34 Reasons for low scores were important flaws in study design.22,24,29,30,34,35
Two studies addressed the cross-cultural validity of an instrument (the Korean version of the GMFM-88 and the Hebrew version of the PEDI). Both were rated “inadequate” because of absent or erroneous information on aspects such as the use of similar samples for comparing relevant characteristics, the expertise of the translators, whether the translation was reviewed by a committee, and the approach used to analyze the data (confirmatory factor analysis or regression analyses).32,36
One study adapted the GMFM-88 for children with CP and Cerebral Visual Impairment (CVI) using a Delphi method.33 Reliability (test-retest and interrater) and internal consistency were assessed (Table 4).
Results of the Studies
Following the updated criteria for good measurement properties, the scores of the results of the studies are presented in Table 4. The results for validity (n = 7) were rated as “sufficient” (n = 5) and “indeterminate” (n = 2); for reliability (n = 19), as “sufficient” (n = 14) and “indeterminate” (n = 5); and for responsiveness (n = 6), as “sufficient” (n = 5) and “indeterminate” (n = 1). The updated criteria for evaluating results on internal consistency allow Cronbach α values of more than 0.95 to be classified as “positive.” In these cases, the third author made the final decision.
Data Analysis and Best Evidence Synthesis
Gross Motor Function Measure
The GMFM versions (GMFM-88, GMFM-66, GMFM-66-IS, GMFM-66-B&C, K-GMFM-88, and GMFM-CVI) were the most investigated and had the strongest evidence regarding psychometric properties.
The studies report strong evidence for construct validity24,30 and moderate evidence for responsiveness in the -66 version.24,29,30,35
Strong evidence was found for construct validity of the GMFM-66-Item Sets (GMFM-66-IS) version,22,24,25 whereas the GMFM-66 Basal & Ceiling (GMFM-66-B&C) approach provided moderate evidence for construct validity.22,25 Both the GMFM-66-IS and the GMFM-66-B&C reported moderate evidence for responsiveness22,24 and limited evidence for test-retest reliability and measurement error, owing to the reduced sample size.25
For the -88 version, validity properties have not yet been studied in children with CP classified across all GMFCS levels. Other studies reported moderate evidence for internal consistency33 as well as limited evidence for construct validity31,36 and measurement error,33,34 owing to the limited sample size and the lack of information about study design or statistical methods, respectively. Conflicting evidence was found for interrater and intrarater reliability and for responsiveness because of conflicting findings across multiple studies.29,31,33–36 Furthermore, evidence for test-retest reliability was unknown because of important flaws related to the lack of information about the assessors' level of experience and whether administrations were independent.33
Gross Motor Performance Measure
With regard to the GMPM, only construct validity and interrater reliability have been studied in children with CP across all GMFCS levels, with limited evidence for both as a consequence of limited sample sizes.36
Pediatric Evaluation of Disability Inventory
For the Hebrew version of the PEDI,32 only 1 study examined reliability, internal consistency, and construct validity. Because of limitations in sample size, we found moderate evidence for internal consistency and limited evidence for reliability and construct validity.
Functional Independence Measure for Children
There was no evidence for psychometric properties other than structural validity and internal consistency; for both, strong evidence was reported by 1 study.37
In this systematic review, 4 assessments met the inclusion criteria. The quality of their psychometric properties, rated with the COSMIN Risk of Bias checklist, spanned a wide range for validity, reliability, and responsiveness. Although some results under the updated criteria for good measurement properties for validity, reliability, and responsiveness were rated “sufficient,” most were rated “adequate,” “doubtful,” or “indeterminate” because of missing information or inappropriate statistical methods.
In contrast to other reviews,6,9–11 this systematic review performed a critical appraisal of the quality of the psychometric properties of the outcome measures. It also provided a summary of the characteristics of the measures, listing the target group, purpose, type, and the psychometric properties studied.
Given that CP is a neurodevelopmental disorder characterized by clinical heterogeneity in both type and distribution, it is difficult to generalize the results of clinimetric studies to the population with CP unless the sample represents all GMFCS levels. The 5 GMFCS levels include participants with different types of CP at various degrees of severity, who may have different profiles of motor function.7,33
We selected articles whose samples included all GMFCS levels to address the heterogeneity of this population; no previous systematic review has considered this aspect. Some measurement properties, such as reliability, together with their results and statistical methods, depend on the variation in scores in the study population: the value of the intraclass correlation coefficient (ICC) is usually higher in a heterogeneous population.38 These aspects are relevant and determine whether the results can be generalized.
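The dependence of the ICC on score variation follows directly from its variance-components form, ICC = σ²between / (σ²between + σ²error): with identical measurement error, a more heterogeneous sample (larger between-subject variance) yields a higher coefficient. A minimal illustration, using hypothetical variance values rather than data from the reviewed studies:

```python
def icc_from_variances(between_subject_var: float, error_var: float) -> float:
    """Variance-components view of the intraclass correlation coefficient:
    the proportion of total variance attributable to true differences
    between subjects (hypothetical inputs, for illustration only)."""
    return between_subject_var / (between_subject_var + error_var)

# Same measurement error (1.0), different sample heterogeneity:
homogeneous = icc_from_variances(1.0, 1.0)    # low between-subject spread
heterogeneous = icc_from_variances(9.0, 1.0)  # high between-subject spread
```

Here the heterogeneous sample yields ICC = 0.90 versus 0.50 for the homogeneous one, even though the measurement itself is equally precise in both, which is why generalizing reliability estimates across populations with different GMFCS-level compositions is hazardous.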
The results for the GMPM, PEDI, and WeeFIM rest on a limited number of studies of validity, reliability, and responsiveness in children and adolescents with CP across all GMFCS levels (Table 4). These 3 instruments provided some evidence about validity and reliability (mainly interrater reliability and internal consistency), but the major drawback was responsiveness, which was not addressed in any of the studies. This is a serious flaw because an instrument used in an evaluative application should be responsive; together with reliability and measurement error, responsiveness is the most critical measurement property for an evaluative measure.8,19 By contrast, the studies on the psychometric properties of the GMFM, in all its versions, included representative samples of children with CP and reported results for all the psychometric properties reviewed in this article; responsiveness was analyzed in 6 studies.
A key determinant of the clinical utility of a measure is the time needed for administration. Statistical methods such as Rasch analysis (GMFM-66, Pediatric Evaluation of Disability Inventory Computer Adaptive Test [PEDI-CAT]) or an algorithmic approach (GMFM-66-IS, GMFM-66-B&C) yield shorter forms that reduce the number of items and thus the time needed to administer the measure. The training process and the costs (manual, software, and courses), as well as the required space and materials, may also determine the choice of measure.
Improvements and changes to some of the selected measures may explain the lack of evidence for the initial versions. This is true of the PEDI-CAT and the Quality Function Measure (a revision of the GMPM), which were not included in this review because their studies did not use samples including children classified across all GMFCS levels. In clinical and research applications, it should therefore be taken into account that their psychometric properties have been studied in only part of the population of children with CP.
The second aim of this review was to evaluate both the quality and the results of the studies of the measurement properties. Three studies of validity were rated “very good.” The remainder of the studies were rated “adequate” or “doubtful,” and 4 studies were rated “inadequate.” Errors in statistical methods in validity, reliability, and responsiveness studies of the GMFM-88/-66, such as P values for testing hypotheses30 and the Spearman rank coefficient for inter/intrarater reliability,31 and flaws in study design (eg, no information on whether the ICC was calculated as absolute or relative agreement, whether test conditions were similar, or an inappropriate total number of assessments and time interval between them)24,25,31,34,35 contributed to low ratings. Responsiveness studies of the GMFM formed the initial evidence base for this property; some errors or inconsistencies are therefore understandable, given that no results on this psychometric property had been reported previously.
The most investigated measure, reporting results on the different psychometric properties, was the GMFM in all its versions. In terms of quality of evidence, strong evidence for construct validity was found in several studies.22,24,25,30 However, the results for reliability and responsiveness were markedly heterogeneous, and the evidence should be improved, especially for reliability; the lack of information on study design and small samples reduced the ratings. There was strong evidence for construct validity and internal consistency for the WeeFIM, but the results came from only 1 study.37 Moderate evidence was reported for the PEDI from 1 study that assessed reliability, internal consistency, and construct validity.32 The evidence for the GMPM was limited for interrater reliability and construct validity.36
Limitations and Recommendations
There are several limitations to this review. The absence of evidence and information regarding some methodological aspects (eg, the independence of administrations, the time interval, participants' stability in the interim period, and the test conditions in the measurements for intrarater and test-retest reliability) made it difficult to determine and judge the quality of the studies.
Although the COSMIN Risk of Bias checklist is the most suitable and commonly used standardized method to assess the quality of measurement properties and to identify strengths and weaknesses in study designs, we observed that its scoring method frequently produced a floor effect: studies initially rated as “very good” were later downgraded, even to “inadequate,” on the basis of a single item, because the worst score counts. Additionally, we found it difficult to score reliability aspects because COSMIN uses the same items to assess interrater, intrarater, and test-retest reliability.
Regarding the application of the updated criteria for good measurement properties, some changes relative to the previous version published by Terwee et al have generated limitations when evaluating results. In particular, the latest version no longer specifies that Cronbach α values of more than 0.95 for internal consistency should be considered “negative,” even though such values indicate item redundancy.
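Why a very high α signals redundancy can be seen from its definition, α = k/(k−1)·(1 − Σσ²item/σ²total): as items duplicate one another, the variance of the total score grows relative to the summed item variances and α approaches 1. A minimal sketch of the computation, with made-up scores and not part of the review's methods:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach alpha for k items, each given as a list of scores
    (one per respondent): alpha = k/(k-1) * (1 - sum of item variances /
    variance of the respondents' total scores)."""
    k = len(item_scores)
    sum_item_vars = sum(pvariance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    return k / (k - 1) * (1 - sum_item_vars / pvariance(totals))

# Three perfectly redundant items (identical scores) give alpha = 1.0,
# the extreme case of the item redundancy discussed above.
redundant = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
```

Under the earlier criteria of Terwee et al, an α above 0.95 would accordingly be flagged as problematic rather than treated as evidence of excellent internal consistency.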
The lack of consensus in taxonomy and statistical score standards complicated the interpretation of information from some articles because the terms and definitions used to describe measurement properties differed from one another. To address this, we reviewed background papers and used the COSMIN terminology and taxonomy.16 Subjective judgment in the assessment of quality was minimized through the independent reviews by the first and second authors and subsequent consensus with a third person as necessary.
Small sample sizes were a limitation in several studies,25,29,35,36 although the high estimates of construct validity and reliability, and the narrow 95% confidence intervals, suggested that the sample sizes were adequate.25 Ko and Kim36 suggested that their results should not be generalized to all children with CP because of limitations in sample and age range (10 months to 9 years 9 months). Wang and Yang35 suggested that their small sample may explain the wide confidence intervals for specificity and sensitivity in the receiver operating characteristic curves used to assess responsiveness. Lundkvist Josenby et al29 found longitudinal construct validity for children in GMFCS levels I-V using the GMFM-88 in a long-term follow-up study but affirmed that a larger sample might yield more severity-dependent differences in responsiveness results. In this type of study, authors must consider that the required sample size depends on the psychometric property assessed and the chosen method: factor analyses and item response theory (IRT) models require a large sample size (n = 100-500), whereas for classical test theory (CTT) a smaller sample size is adequate (n = 50-100).39
Recommendations for further research include the importance of considering and reviewing the construct and of matching the sample to the characteristics of the population. In such studies, uniform criteria, such as a sample including children across all GMFCS levels, may improve the quality of the studies and facilitate generalization of the results to the population with CP. In this way, “capacity” and “performance” measures may be combined to obtain global information about significant changes in both the motor and functional skills of children or adolescents with CP, which would be useful in the clinic for monitoring progress. Moreover, to improve the degree of evidence for psychometric properties, authors should provide a complete description of the design and methods used in their studies.40
Four measures to assess motor or functional skills in children with CP were identified in this review. The GMFM, in all its versions, was the most widely investigated and provided the best results, with the strongest evidence for validity and responsiveness in studies with samples of children and adolescents with CP across all GMFCS levels. However, reliability evidence should be improved to establish stability. Although other measures, such as the GMPM, PEDI, and WeeFIM, have reported interesting results, further studies, especially of responsiveness, are needed to provide evidence in a heterogeneous sample.
1. Stavsky M, Mor O, Mastrolia SA, Greenbaum S, Than NG, Erez O. Cerebral palsy
-trends in epidemiology and recent development in prenatal mechanisms of disease, treatment, and prevention. Front Pediatr. 2017;5:21.
2. Rosenbaum P, Paneth N, Leviton A, et al A report: the definition and classification of cerebral palsy
April 2006. Dev Med Child Neurol Suppl. 2007;109:8–14.
3. World Health Organization. International Classification of Functioning, Disability and Health: Children & Youth Version. https://apps.who.int/iris/bitstream/10665/43737/1/9789241547321_eng.pdf?ua=1
. Accessed January 28, 2018.
4. Ostensjø S, Carlberg EB, Vøllestad NK. Everyday functioning in young children with cerebral palsy
: functional skills
, caregiver assistance, and modifications of the environment. Dev Med Child Neurol. 2003;45(9):603–612.
5. Holsbeeke L, Ketelaar M, Schoemaker MM, Gorter JW. Capacity, capability, and performance: different constructs or three of a kind? Arch Phys Med Rehabil. 2009;90(5):849–855.
6. Ketelaar M, Vermeer A, Helders PJ. Functional motor abilities of children with cerebral palsy
: a systematic literature review of assessment measures. Clin Rehabil. 1998;12(5):369–380.
7. Palisano R, Rosenbaum P, Walter S, Russell D, Wood E, Galuppi B. Development and reliability of a system to classify gross motor function in children with cerebral palsy
. Dev Med Child Neurol. 1997;39(4):214–223.
8. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis. 1985;38(1):27–36.
9. James S, Ziviani J, Boyd R. A systematic review of activities of daily living measures for children and adolescents with cerebral palsy
. Dev Med Child Neurol. 2014;56(3):233–244.
10. Harvey A, Robin J, Morris ME, Graham HK, Baker R. A systematic review of measures of activity limitation for children with cerebral palsy
. Dev Med Child Neurol. 2008;50(3):190–198.
11. Debuse D, Brace H. Outcome measures of activity for children with cerebral palsy
: a systematic review. Pediatr Phys Ther. 2011;23(3):221–231.
12. Moher D, Liberati A, Tetzlaff J, Altman DG, Group TP. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.
13. Terwee CB, Prinsen CAC, Chiarotto A, et al COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Qual Life Res. 2018;27(5):1159–1170.
14. CanChild Centre for Disability Research. Outcome Measures Rating Form. https://www.canchild.ca/system/tenon/assets/attachments/000/000/372/original/measrate.pdf
. Published 2004. Accessed December 19, 2018.
15. Mokkink LB, de Vet HCW, Prinsen CAC, et al COSMIN Risk of Bias checklist for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1171–1179.
16. Mokkink LB, Terwee CB, Patrick DL, et al The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737–745.
17. Prinsen CAC, Mokkink LB, Bouter LM, et al COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1147–1157.
18. van Tulder M, Furlan A, Bombardier C, Bouter L, Editorial Board of the Cochrane Collaboration Back Review Group. Updated method guidelines for systematic reviews in the Cochrane collaboration back review group. Spine. 2003;28(12):1290–1299. doi:10.1097/01.BRS.0000065484.95996.AF.
19. Ammann-Reiffer C, Bastiaenen CH, de Bie RA, van Hedel HJ. Measurement properties of gait-related outcomes in youth with neuromuscular diagnoses: a systematic review. Phys Ther. 2014;94(8):1067–1082.
20. Benfer KA, Weir KA, Boyd RN. Clinimetrics of measures of oropharyngeal dysphagia for preschool children with cerebral palsy and neurodevelopmental disabilities: a systematic review. Dev Med Child Neurol. 2012;54(9):784–795.
21. Gerber CN, Labruyère R, van Hedel HJA. Reliability and responsiveness of upper limb motor assessments for children with central neuromotor disorders: a systematic review. Neurorehabil Neural Repair. 2016;30(1):19–39.
22. Avery LM, Russell DJ, Rosenbaum PL. Criterion validity of the GMFM-66 item set and the GMFM-66 basal and ceiling approaches for estimating GMFM-66 scores. Dev Med Child Neurol. 2013;55(6):534–538.
23. Russell DJ, Rosenbaum PL, Cadman DT, Gowland C, Hardy S, Jarvis S. The gross motor function measure: a means to evaluate the effects of physical therapy. Dev Med Child Neurol. 1989;31(3):341–352.
24. Russell DJ, Avery LM, Walter SD, et al. Development and validation of item sets to improve efficiency of administration of the 66-item Gross Motor Function Measure in children with cerebral palsy. Dev Med Child Neurol. 2010;52(2):e48–e54.
25. Brunton LK, Bartlett DJ. Validity and reliability of two abbreviated versions of the Gross Motor Function Measure. Phys Ther. 2011;91(4):577–588.
26. Boyce WF, Gowland C, Hardy S, et al. Development of a quality-of-movement measure for children with cerebral palsy. Phys Ther. 1991;71(11):820–828; discussion 828–832.
27. Haley SM. Pediatric Evaluation of Disability Inventory (PEDI): Development, Standardization and Administration Manual. Boston, MA: New England Medical Center; 1992.
28. Msall ME, DiGaudio K, Rogers BT, et al. The Functional Independence Measure for Children (WeeFIM): conceptual basis and pilot use in children with developmental disabilities. Clin Pediatr (Phila). 1994;33(7):421–430.
29. Lundkvist Josenby A, Jarnlo GB, Gummesson C, Nordmark E. Longitudinal construct validity of the GMFM-88 total score and goal total score and the GMFM-66 score in a 5-year follow-up study. Phys Ther. 2009;89(4):342–350.
30. Russell DJ, Avery LM, Rosenbaum PL, Raina PS, Walter SD, Palisano RJ. Improved scaling of the gross motor function measure for children with cerebral palsy: evidence of reliability and validity. Phys Ther. 2000;80(9):873–885.
31. Beckung E, Carlsson G, Carlsdotter S, Uvebrant P. The natural history of gross motor development in children with cerebral palsy aged 1 to 15 years. Dev Med Child Neurol. 2007;49(10):751–756.
32. Elad D, Barak S, Eisenstein E, Bar O, Herzberg O, Brezner A. Reliability and validity of Hebrew Pediatric Evaluation of Disability Inventory (PEDI) in children with cerebral palsy: health care professionals vs. mothers. J Pediatr Rehabil Med. 2012;5(2):107–115.
33. Salavati M, Krijnen WP, Rameckers EAA, et al. Reliability of the modified Gross Motor Function Measure-88 (GMFM-88) for children with both Spastic Cerebral Palsy and Cerebral Visual Impairment: a preliminary study. Res Dev Disabil. 2015;45–46:32–48.
34. Ko J, Kim M. Reliability and responsiveness of the gross motor function measure-88 in children with cerebral palsy. Phys Ther. 2013;93(3):393–400.
35. Wang H-Y, Yang YH. Evaluating the responsiveness of 2 versions of the gross motor function measure for children with cerebral palsy. Arch Phys Med Rehabil. 2006;87(1):51–56.
36. Ko J, Kim M. Inter-rater reliability of the K-GMFM-88 and the GMPM for children with cerebral palsy. Ann Rehabil Med. 2012;36(2):233–239.
37. Park EY, Kim WH, Choi YI. Factor analysis of the WeeFIM in children with spastic cerebral palsy. Disabil Rehabil. 2013;35(17):1466–1471.
38. Terwee CB, Bot SDM, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.
39. Terwee CB, Mokkink LB, Knol DL, Ostelo RWJG, Bouter LM, de Vet HCW. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21(4):651–657.
40. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996;1(1):30–46.
41. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–428.
42. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19(1):231–240.