Measures of Motor and Functional Skills for Children With Cerebral Palsy: A Systematic Review : Pediatric Physical Therapy


SYSTEMATIC REVIEWS

Measures of Motor and Functional Skills for Children With Cerebral Palsy: A Systematic Review

Ferre-Fernández, Marina PT, OT, MSc; Murcia-González, María Antonia PT, PhD; Barnuevo Espinosa, María Dolores PhD; Ríos-Díaz, José PT, PhD, MSc, BS

Pediatric Physical Therapy 32(1):p 12-25, January 2020. | DOI: 10.1097/PEP.0000000000000661

Purpose: 

To review the level of evidence of the psychometric properties of outcome measures for motor or functional skills for children with cerebral palsy classified across I to V levels of the Gross Motor Function Classification System.

Methods: 

A systematic search was completed in PubMed/MEDLINE, ISI Web of Science, CINAHL, and 4 complementary databases. The COSMIN Risk of Bias checklist and the updated criteria for good measurement properties were applied to assess the quality.

Results: 

Four outcome measures were identified from 12 articles: Gross Motor Function Measure, Gross Motor Performance Measure, Pediatric Evaluation of Disability Inventory, and Functional Independence Measure for Children. Evidence levels for validity, reliability, and responsiveness varied among measures.

Conclusions: 

Gross Motor Function Measure in all versions was the most investigated measure providing the best results, with the strongest evidence for validity and responsiveness properties. Reliability evidence should be improved to determine stability.

INTRODUCTION AND PURPOSE

Cerebral palsy (CP) causes disability in childhood, and population-based studies report a global prevalence of approximately 2 per 1000 live births.1 Cerebral palsy is a group of movement and posture disorders due to nonprogressive damage to a developing or immature brain, often associated with activity limitation.2 The term “activity limitation,” included in the disability concept of the World Health Organization's International Classification of Functioning, Disability and Health: Children and Youth version (ICF-CY),3 refers to “difficulties an individual may experience in executing activities.” The principal activity limitations in everyday life associated with CP involve problems in motor function. These limit the performance of motor skills such as walking, climbing stairs, or running and the development of other daily activities such as eating, dressing, and grooming. There is a relationship between functional independence and motor impairment.2,4 Therapeutic approaches often focus on a child's motor capacity to carry out tasks and the performance of functional activities. The term “capacity” is defined as what a child can do in a standardized and controlled environment, and “performance” is defined as what a child actually does in his/her daily environment.5 Both concepts are essential to consider in clinical practice and in everyday activity.

Research and clinical management of children and adolescents with CP are hindered by the heterogeneity of the disorder. The use of outcome measures is essential to determine significant changes. These measures help both physical and occupational therapists plan treatment, monitor progress, evaluate the effectiveness of an intervention program, compare and discriminate between individuals, and provide objective information to families.6 The Gross Motor Function Classification System (GMFCS)7 was developed to support the design of effective intervention plans adapted to individual motor function development. The GMFCS is a standardized, valid, reliable, and stable system for classifying children with CP based on functional abilities and limitations.

In addition, a large number of standardized outcome measures are available to assess motor and functional skills in children with CP. Choosing appropriately among them is a challenge for pediatric rehabilitation professionals. Professionals must make the selection based on whether the goal of the assessment is to discriminate between individuals, to evaluate changes over time, or to predict outcomes or prognosis of the children. Moreover, as Kirshner and Guyatt8 described in their methodological framework, these measures have different implications for validity, reliability, and responsiveness depending on the purpose. More specifically, an evaluative measure must be responsive to change, a discriminative measure must be reliable between assessors, and a predictive measure must provide the same results as a gold standard for criterion validity. Therefore, the outcome measure should fulfill the purpose for which it was created and be applied to a population for which it was developed and validated. It must contain relevant items and be feasible to use.

There are several reviews of outcome measures in CP.6,9–11 However, these reviews did not consider the clinical heterogeneity of CP by reviewing studies that used a representative sample including children and adolescents at all levels of the GMFCS. In methodological terms, these reviews were narrative, did not use contemporary search strategies or critical evaluation procedures, and did not systematically appraise the quality of the included articles. It is therefore necessary to review information on the quality of the psychometric properties and the use of these measures in research as well as in clinical management. This review provides up-to-date information that can guide and assist pediatric physical rehabilitation professionals in choosing the most appropriate instrument to measure significant changes in children with CP.

The primary aims of this systematic review were to (1) examine validity, reliability, responsiveness, and clinical utility of outcome measures to assess changes in motor or functional skills in children and adolescents with CP and (2) evaluate both the quality and the results of the studies of the measurement properties.

METHODS

This review was reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline12 and prospectively registered in the International Prospective Register of Systematic Reviews (PROSPERO) at the Centre for Reviews and Dissemination (University of York, United Kingdom): CRD42018095108.

Search Strategy and Study Selection

Seven electronic bibliographic databases were systematically searched: PubMed/MEDLINE, ISI Web of Science, Science Direct, CINAHL (through EbscoHost), PEDro, and Biblioteca Virtual de la Salud (BVS). BVS allows free access, through the IBECS and LILACS databases, to Ibero-American scientific literature (Spain and Portugal, as well as Latin America and the Caribbean).

The first search aimed to identify the available outcome measures used to assess motor or functional skills in children and adolescents with CP up to December 1, 2018. Key search terms were identified from key papers and matched to the Medical Subject Headings (MeSH) index; they were subsequently searched as keywords and combined in the search strategy with the Boolean operators “AND”/“OR.” This search strategy had 3 elements: the construct (“motor skills”) combined with terms related to the target population/diagnosis (“cerebral palsy”) and the measurement instrument (“assessment” OR “measure” OR “tool”) (see Supplemental Digital Content 1, available at: https://links.lww.com/PPT/A274).
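
As an illustration of how the 3 elements of the strategy combine, the following minimal Python sketch assembles a Boolean query string from the terms quoted above. The function name `build_query` and the quoting style are assumptions made for illustration only; the database-specific strategies actually used are those given in Supplemental Digital Content 1.

```python
def build_query(construct, population, instrument_terms):
    """Join the 3 search elements with AND; synonyms within one element use OR.

    Hypothetical helper for illustration, not the authors' search tool.
    """
    def quote(term):
        return f'"{term}"'

    instrument = " OR ".join(quote(t) for t in instrument_terms)
    return f"{quote(construct)} AND {quote(population)} AND ({instrument})"


query = build_query(
    "motor skills",                       # construct
    "cerebral palsy",                     # target population/diagnosis
    ["assessment", "measure", "tool"],    # measurement instrument synonyms
)
print(query)
```

Grouping the instrument synonyms in parentheses keeps the OR terms from being absorbed by the surrounding AND operators.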

The names of the outcome measures identified during the first search (“measure name”) were used as terms for further searches of the 7 electronic databases up to December 22, 2018, which aimed to review the psychometric properties of the selected outcome measures (see Supplemental Digital Content 1, available at: https://links.lww.com/PPT/A274).

Reference lists were searched manually to obtain articles not previously identified and minimize selection bias. Manuals and background articles were found to complete the search and obtain exhaustive information on the measures.

Articles were included in our review if they met the following criteria: (1) they were clinimetric studies of outcome measures for children and adolescents with CP; (2) they evaluated a sample of children and adolescents with CP aged 0 to 18 years at all GMFCS levels; (3) the outcome measures were dedicated, in whole or in part, to assessing motor or functional skills with results on “capacity” or “performance”; (4) validity, reliability, and responsiveness data were available for children and adolescents with CP; and (5) they were published in peer-reviewed journals in any language. Articles were excluded if (1) the outcome measure used was a classification measure or assessed quality of life, upper limb, or fine motor skills; (2) the administration format was a semistructured interview or questionnaire; or (3) only an isolated dimension/domain of the measure was evaluated.

To confirm eligibility, titles and abstracts of retrieved articles were screened independently by the first and second authors. Articles that met the inclusion criteria were retained for full-length review. Cases of disagreement or conflicting views were resolved through discussion until consensus was achieved between authors. The full texts of the selected studies, manuals, and background articles were collected and included in a data extraction form.

Data Extraction and Quality Assessment/Risk of Bias

To structure this review, the PRISMA statement12 and “COSMIN method for systematic reviews of Patient‐Reported Outcome Measures (PROMs)”13 were used.

Descriptive information and psychometric properties (validity, reliability, responsiveness, and clinical utility) of the assessment tools were extracted using an adapted version of the CanChild Outcome Measures Rating Form.14 This form incorporated the ICF-CY framework3 and is considered the most appropriate and accepted scale to evaluate outcome measures for CP.

To evaluate the quality of the included studies, the first and second authors rated studies independently according to the COSMIN Risk of Bias checklist15; the third author made the final decision in case of disagreement. Four response options were defined for each COSMIN item: “very good,” “adequate,” “doubtful,” and “inadequate” (adapted from the ratings “excellent,” “good,” “fair,” and “poor” described in the previous version). Subsequently, an overall quality rating was assigned to each study for each psychometric property; this overall rating is the lowest rating given to any item in the corresponding box (“worst score counts”).
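
The “worst score counts” rule can be sketched in a few lines of Python. This is a minimal illustration under the assumption that each box is represented as a list of item ratings; the names `RATING_ORDER` and `worst_score_counts` are hypothetical and do not come from the COSMIN materials.

```python
# Ratings ordered from best to worst, as listed for the COSMIN Risk of Bias checklist.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(item_ratings):
    """Return the overall rating of a box: the lowest rating among its items."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    # list.index raises ValueError for a rating outside the 4-point scale.
    return max(item_ratings, key=RATING_ORDER.index)


# Example: one doubtful item is enough to make the whole box doubtful.
box = ["very good", "adequate", "doubtful", "very good"]
print(worst_score_counts(box))  # doubtful
```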

There was a degree of subjective judgment in the process of using the COSMIN Risk of Bias checklist because, in some articles, the terms and definitions used to describe measurement properties differed from one another. For this reason, and to attain maximum homogeneity, we used the international consensus on taxonomy, terminology, and definitions of measurement properties developed by the COSMIN group.16

The updated criteria for good measurement properties17 were applied to rate the results of the psychometric properties from each study. The first and second authors rated these results as sufficient (+), indeterminate (?), or insufficient (−), resorting to the third author in case of disagreement.

Data Analysis and Best Evidence Synthesis

To ascertain the degree of evidence for each psychometric property, we combined the number of studies, the consistency of the ratings of results according to the updated criteria for good measurement properties, and the quality of studies according to COSMIN. In this way, we followed the recommendations of the Cochrane Back Review Group18 used in other reviews of psychometric properties.19–21 The level of overall evidence was rated as “strong,” “moderate,” “limited,” “conflicting,” or “unknown.” Following the recommendations of the current method, the best evidence synthesis included results from studies rated as “very good,” “adequate,” and “doubtful” as well as “inadequate” on the COSMIN Risk of Bias checklist. To adapt the criteria for sample size, we rated the evidence as “strong” when the total sample size of included studies was 100 or more, “moderate” for a total sample size between 50 and 99, “limited” for a total sample size between 25 and 49, and “unknown” when the total sample size was fewer than 25.19–21
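
The sample-size thresholds can be written as a small lookup. The sketch below covers only the sample-size component described above; it does not model the consistency of results or study quality, which also enter the synthesis (a “conflicting” level, for instance, cannot arise from sample size alone). The function name is hypothetical.

```python
def evidence_level_by_sample_size(total_n):
    """Map the pooled sample size of the included studies to an evidence level.

    Covers only the sample-size thresholds; consistency of results and
    COSMIN quality ratings are deliberately left out of this sketch.
    """
    if total_n < 0:
        raise ValueError("sample size cannot be negative")
    if total_n >= 100:
        return "strong"
    if total_n >= 50:
        return "moderate"
    if total_n >= 25:
        return "limited"
    return "unknown"


for n in (228, 65, 26, 3):
    print(n, evidence_level_by_sample_size(n))
```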

RESULTS

Study Selection

The process of identifying potential articles is depicted in the Figure. In total, 12 articles covering 2 motor skills outcome measures and 2 functional skills outcome measures were included for data extraction and quality assessment (Table 1).

Figure. Flow diagram of the selection process for studies.
TABLE 1 - Studies of the Psychometric Properties of Outcome Measures
| Measure | Study | Psychometric Property Evaluated | Sample: Total | Sample: Type of CP | Sample: Distribution | Sample: GMFCS (n) | Sample: Age, Mean (Range)a |
|---|---|---|---|---|---|---|---|
| GMFM-66 | Russell et al30 | Construct validity; Responsiveness | 228 | 173 SP, 17 DT/ATT, 5 ATX, 15 HYP, 18 MX | 42 HEMI, 82 QUA/TET, 74 DI, 30 TRI | I (61), II (35), III (49), IV (48), V (35)b | 6 y 6 mo (1 y 7 mo-11 y 10 mo), SD = 2 y 10 mo |
| GMFM-66/-66-IS | Russell et al24 | Construct validity | 227 | 204 SP, 6 DT, 4 ATX, 13 MX | 110 BIL, 94 UNI | I (71), II (25), III (27), IV (26), V (23), I-III (55)c | 7 (1 y 4 mo-13 y 8 mo), SD = 4 y 6 mo |
| | | Responsiveness | 110 | N/S | N/S | N/S | N/S |
| GMFM-66-IS/-66-B&C | Brunton and Bartlett25 | Construct validity; Reliability (test-retest); Measurement error | 26 | N/S | 5 HEMI, 8 DI, 2 TRI, 11 QUA | I (5), II (4), III (6), IV (7), V (4) | 4 y 1 mo (1 y 11 mo-6 y 5 mo), SD = 1 y 2 mo |
| | Avery et al22 | Construct validity | 227 | 204 SP, 6 DT, 4 ATX, 13 MX | 110 BIL, 94 UNI | I (71), II (24), III (27), IV (26), V (21), I-III (55) | 6 y 11 mo (1-13 y), SD = 4 y 6 mo |
| | | Responsiveness | 109 | 108 SP | 54 BIL, 54 UNI | N/S | N/S |
| GMFM-88/-66 | Wang and Yang35 | Responsiveness | 65 | 53 SP, 4 ATT, 8 HYP | 11 HEMI, 23 QUA, 31 DI | I (16), II (8), III (12), IV (18), V (10) | 3 y 9 mo (6 mo-9 y 5 mo), SD = 1 y 11 mo |
| | | Reliability (intrarater, interrater) | 3 | N/S | N/S | N/S | N/S |
| | Lundkvist Josenby et al29 | Responsiveness | 41 | 41 SP | 41 DI | I (1), II (9), III (13), IV (17), V (1) | 4 y 5 mo (2 y 6 mo-6 y 7 mo), SD = 1 y 1 mo |
| GMFM-88 | Beckung et al31 | Reliability (intrarater, interrater); Construct validity | 317 | 269 SP, 38 DT, 10 ATX | 101 HEMI, 157 DI, 11 TET | I (133), II (65), III (34), IV (49), V (36) | (1-15 y) |
| | Ko and Kim34 | Reliability (intrarater, interrater); Measurement error | 84 | N/S | N/S | I (14), II (9), III (22), IV (19), V (20) | 3 y 9 mo (10 mo-9 y 9 mo), SD = 1 y 11 mo |
| | | Responsiveness | 60 | N/S | N/S | N/S | N/S |
| GMFM-88-CVI | Salavati et al33 | Reliability (intrarater, interrater, test-retest); Internal consistency; Measurement error | 77 | 74 SP, 2 DT, 1 ATX | 71 BIL, 6 UNI | I (23), II (6), III (9), IV (19), V (20) | 9 y 6 mo (4 y 2 mo-12 y), SD = 2 y 5 mo |
| K-GMFM-88/GMPM | Ko and Kim36 | Cross-cultural validity; Reliability (interrater); Construct validity | 39 | N/S | N/S | I-II (12), III-V (27) | 3 y 6 mo (2-7 y), SD = 1 y 3 mo |
| PEDI-H | Elad et al32 | Cross-cultural validity; Internal consistency; Reliability; Construct validity | 73 | 11 SP, 10 ATT, 3 HYP, 1 ATX, 48 MX | 22 QUA, 21 DI, 20 HEMI, 10 TRI | I (6), II (26), III (15), IV (16), V (10) | 8 y 10 mo (6-12 y), SD = 2 y 1 mo |
| WeeFIM | Park et al37 | Structural validity; Internal consistency; Reliability (interrater) | 207 | 207 SP | 57 QUA, 105 DI, 45 HEMI | I (49), II (32), III (31), IV (19), V (76) | 9 y 1 mo, SD = 2 y 9 mo |
Abbreviations: ATX, ataxic; BIL, bilateral; CP, cerebral palsy; DI, diplegia; DT/ATT, dystonic/athetotic; GMFCS, Gross Motor Function Classification System; GMFM-66, Gross Motor Function Measure-66; GMFM-66-IS/-B&C, Gross Motor Function Measure-66-item set/basal & ceiling approach; GMFM-88, Gross Motor Function Measure-88; GMFM-88-CVI, Gross Motor Function Measure adapted for Cerebral Visual Impairment; GMPM, Gross Motor Performance Measure; HEMI, hemiplegia; HYP, hypotonic; MON, monoplegia; MX, mixed; N/S, not specified; PEDI, Pediatric Evaluation of Disability Inventory; QUA/TET, quadriplegia/tetraplegia; SD, standard deviation; TRI, triplegia; TRT, test-retest reliability; SP, spastic; UNI, unilateral; WeeFIM, Functional Independence Measure for Children.
aAges listed in years and months.
bGMFCS levels were not measured in one of the studies contributing to the validation sample; however, all children in that study ranged in function from level I to level III.24

The 4 assessments that met inclusion criteria were the Gross Motor Function Measure (GMFM) in its 66-item (GMFM-66)22 and 88-item (GMFM-88)23 versions, as well as the GMFM-66 Item Sets (GMFM-66-IS)24 and GMFM-66 Basal & Ceiling (GMFM-66-B&C)25 approaches; the Gross Motor Performance Measure (GMPM)26; the Pediatric Evaluation of Disability Inventory (PEDI)27; and the Functional Independence Measure for Children (WeeFIM).28

Many of the most commonly used motor or functional skills outcome measures were excluded for several reasons detailed in Supplemental Digital Content 2 (available at: https://links.lww.com/PPT/A275). For the PEDI and WeeFIM, we accepted those studies administered by clinical observation, excluding those that used semistructured interviews or parent reports. Other studies that used an isolated dimension of the evaluation were excluded because there was evidence that the reliability and validity of the separate dimension scores were not as strong as for the measure as a whole.29

Characteristics, Content, and Clinical Utility of Selected Measures

The characteristics and content of the included measures are summarized in Table 2. All instruments have an evaluative purpose. In addition, the GMFM can be discriminative (between individuals on the GMFCS) or predictive (when motor curves are used), besides evaluative (to measure the magnitude of longitudinal change), and the PEDI and WeeFIM are both evaluative and discriminative. Both the GMFM-88 and the GMPM measure capacity, rated through 5 dimensions related to the execution of motor skills (GMFM) or the quality of movement (GMPM). In contrast, the PEDI and WeeFIM focus on activities relevant to daily function in both the activity and participation domains, and on capacity or performance depending on whether they are administered in a standardized environment or in the child's daily environment.

TABLE 2 - Characteristics and Content of Selected Measures
| Measure | Purpose | Items | Focus | Criterion/Norm | Dimensions/Domains | Target Population: Age Rangea | Target Population: Diagnostic Group | Capacity/Performance | ICF Domain |
|---|---|---|---|---|---|---|---|---|---|
| GMFM23 | E, PR, D | 88/66 | Changes in motor skills | C | 1. Lying and rolling; 2. Sitting; 3. Crawling and kneeling; 4. Standing; 5. Walking, running, and jumping | 5 mo-16 y | CP | Capacity | A |
| GMPM26 | E | 20 | Quality of movement of motor skills | C | 1. Alignment; 2. Coordination; 3. Dissociated movement; 4. Stability; 5. Weight shift | 0-12 y | CP | Capacity | A |
| PEDI27 | D, E | 197; 20; 20 | Level of functional skills; Amount of caregiver assistance; Environmental modification required | C, N | 1. Self-care; 2. Mobility; 3. Social function | 6 mo-7.5 y (up to 20 y) | Children with chronic illness or disabilities | Capacity (FSS); Performance (LCA) | A, P |
| WeeFIM28 | D, E | 18 | Level of independence in ADL | C | 1. Self-care; 2. Mobility; 3. Cognition | 0-7 y (up to 18 y) | Developmental, genetic, or acquired disabilities | Capacity; Performance | A, P |
Abbreviations: A, activity; ADL, activities of daily living; C, criterion-referenced; CP, cerebral palsy; D, discriminative; E, evaluative; FSS, functional skills scale; GMFM, Gross Motor Function Measure; GMPM, Gross Motor Performance Measure; ICF, International Classification of Functioning, Disability and Health; LCA, level of caregiver assistance; N, norm-referenced; P, participation; PEDI, Pediatric Evaluation of Disability Inventory; PR, predictive; WeeFIM, Functional Independence Measure for Children.
aAges listed in years and months.

Details on the clinical utility are summarized in Table 3. The administration time depended on the number of items assessed, the skill of the assessor, and the child's level of cooperation and understanding. All instruments used an ordinal scoring scale, and the GMFM and PEDI have software developed through Rasch analysis to improve the interpretation of total and change scores.

TABLE 3 - Clinical Utility of Selected Measures
| Measure | Format of Administration | Administration Time, min | Examiner Qualifications | Required Materials (Manual/Material/Space) | Scoring | Interpretability |
|---|---|---|---|---|---|---|
| GMFM23 | Clinical observation | 45-60 (less for GMFM-66); 28-24, IS/B&C | N/R, but familiarization with GMFM guidelines and score sheet recommended | GMFM manual ($94.50), software, score sheet, common equipment in physiotherapy gym and stairs (with at least 5 steps) | 88/66 items in 5 dimensions, rated on an ordinal 4-point scale: 0 (the child cannot initiate the item) to 3 (the child can complete the item); “NT” only for -66 version | Sum of dimension scores from ordinal score expressed as a percentage (0-100%) total score; Rasch analysis software (GMAE-2) to obtain interval-level score (GMFM-66) |
| GMPM26 | Clinical observation | N/E | N/R, but familiarization with GMFM-GMPM guidelines and score sheet recommended | GMFM manual, score sheet, common equipment in physiotherapy gym and stairs (with at least 5 steps) | 20 items from GMFM assessed through 3 attributes, each rated on an ordinal 5-point scale: 1 (severely abnormal) to 5 (consistently normal) | The mean score for each attribute converted into a percentage (0-100%) |
| PEDI27 | Observation by therapist, semistructured interview or report by parent/caregiver | 30-60; 15 (PEDI-CAT) | N/R | PEDI manual ($125.95) and score forms ($45.60) | FSS: 197 items rated as 1 (capable) or 0 (not capable); CAS: 20 items rated on a 6-level rank-ordered scale: 0 (totally dependent) to 5 (independent); MS: rated on the level of modification required | FSS and CAS raw scores transformed into normative scores or scaled scores; Rasch analysis software (PEDI-CAT) ($89) calculates summary scores, develops an individualized scoring profile, and provides item maps |
| WeeFIM28 | Observation by therapist, semistructured interview by parent/caregiver | 20-30 | Formal training required ($242) | WeeFIM score form, facilities and materials to observe task performance | 18 items, 3 domains, rated on a 7-level ordinal scale: 6-7 (independence), 1-5 (dependence) | Sum of subdomain scores from domain score; Total score (minimum = 18, maximum = 126) |
Abbreviations: CAS, caregiver assistance scale; FSS, functional skills scale; GMAE-2, Gross Motor Ability Estimator; GMFM, Gross Motor Function Measure; GMPM, Gross Motor Performance Measure; PEDI, Pediatric Evaluation of Disability Inventory; PEDI-CAT, Pediatric Evaluation of Disability Inventory Computer Adaptive Test; MS, modification scale; N/E, not specified; N/R, not required; WeeFIM, Functional Independence Measure for Children.
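
As a worked example of the simplest scoring scheme in Table 3, the sketch below sums a WeeFIM record using only the figures stated in the table (18 items, 7-level scale, levels 6-7 indicating independence, total score from 18 to 126). The function names are hypothetical, and the sketch is not a substitute for the official score form or the formal training the measure requires.

```python
N_ITEMS = 18               # WeeFIM items, per Table 3
MIN_LEVEL, MAX_LEVEL = 1, 7


def weefim_total(item_scores):
    """Sum the 18 item levels after checking they lie on the 7-level scale."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(not MIN_LEVEL <= s <= MAX_LEVEL for s in item_scores):
        raise ValueError("each item must be rated from 1 to 7")
    return sum(item_scores)


def is_independent(level):
    """Levels 6-7 denote independence; levels 1-5 denote dependence."""
    return level >= 6


print(weefim_total([1] * 18))  # 18, the minimum total
print(weefim_total([7] * 18))  # 126, the maximum total
```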

Methodological Quality of the Studies

The results of the quality assessment of the psychometric properties, rated with the COSMIN Risk of Bias checklist, are shown in Table 4. Across the 12 articles selected, 7 validity properties (structural and construct validity), 18 reliability properties (internal consistency, inter/intrarater, test-retest, and measurement error), and 6 responsiveness properties were assessed.

TABLE 4 - Psychometric Properties of the Outcome Measures
Measure Study Psychometric Property Evaluated Method/Resultsa Quality Rating
Criteria for GMP COSMIN Risk of Bias
GMFM-66 Russell et al30 Construct validity
Hypothesis testing

Responsiveness
Mean differences in ICC = 0.0013 (95% CI = −0.082 to 0.0109)
3-way ANOVA to assess the interaction between change in time × severity × age (F = 116.3; df = 1222; P < .001)

GMFM-66 is more sensitive to change with children <5 y of age than over 5 y
+

+
D

D
GMFM-66/-IS Russell et al24 Construct validity
Agreement between scores

Hypothesis testing
Regression intercept (β0 = −0.54, 95% CI = −1.45 to 0.37), the estimated slope (β 1 = 1.01, 95%CI = 0.99 to 1.02)
ICC(A,1) = 0.994 (95% CI = 0.993 to 0.996) for absolute agreement and 2-way mixed-effects between GMFM-66-IS and GMFM-66 scores and not related with age (r = 0.015)
1-way ANOVA to test scores in sets of items. 2-way ANOVA to test item sets and full score
+ VG
Responsiveness ICC(A,1) = 0.92 (95% CI = 0.89 to 0.95), between change in GMFM-66-IS and GMFM-66 scores over 12 mo + D
GMFM-66-IS/-B&C Brunton and Bartlett25 Construct validity (convergent validity)
Reliability (test-retest)

Comparability

Measurement error

Minimal detectable change
ICC(2,1) = 0.994 (95% CI = 0.981 to 0.997) GMFM-66-IS vs GMFM-66
ICC(2,1) = 0.987 (95% CI = 0.972 to 0.994) GMFM-66-B&C vs GMFM-66
ICC(2,1) = 0.986 (95% CI = 0.969 to 0.994) for GMFM-66-IS
ICC(2,1) = 0.994 (95% CI = 0.987 to 0.997) for GMFM-66-B&C
ICC(2,1) = 0.984 (95% CI = 0.965 to 0.993) GMFM-66-IS vs GMFM-66-B&C for first session
ICC(2,1) = 0.970 (95% CI = 0.932 to 0.986) GMFM-66-IS vs GMFM-66-B&C for second session
SEM = 1.91 for GMFM-66-IS
SEM = 1.31 for GMFM-66-B&C
MDC = 5.29 for GMFM-66-IS
MDC = 3.63 for GMFM-66-B&C
+

+

?
D

A

A
Interpretability No differences in time to completion in 2 abbreviated versions (F = 0.26; df = 1; P = .61)
Avery et al22 Construct validity (convergent validity) ICC(2,1) = 0.994 (95% CI = 0.993 to 0.996) GMFM-66-IS vs GMFM-66. Similar ICCs 1 y later
ICC(2,1) = 0.998 (95% CI = 0.997 to 0.998) GMFM-66-B&C vs GMFM-66. Similar ICCs 1 y later
+ A
Responsiveness ICC(2,1) = 0.942 (95% CI = 0.903 to 0.966) for GMFM-66-IS in children with BI CP over 1 y
ICC(2,1) = 0.925 (95% CI = 0.874 to 0.956) for GMFM-66-B&C in children with BI CP over 1 y
ICC(2,1) = 0.889 (95% CI = 0.816 to 0.934) for GMFM-66-IS in children with UNI CP over 1 y
ICC(2,1) = 0.584 (95% CI = 0.377 to 0.735) for GMFM-66-B&C in children with UNI CP over 1 y
Significant differences in agreement between -IS and -B&C in children with UNI CP. -B&C is less sensitive to change. Differences in number of items (-IS 29.8 average number of items vs -B&C 21.6 average number of items; t test = 14.1; P < .001)
+ A
GMFM-88/-66 Wang and Yang35 Reliability (intrarater, interrater)

Responsiveness
Intrarater (17 therapists): ICC(3,1) ranged from 0.88 to 0.90 (GMFM-88)
Interrater (17 therapists, with 1 more experienced therapist as reference): ICC(2,1) ranged from 0.81 to 0.90
Testing with ROC responsiveness of GMFM-88 and -66 vs therapist perception (2 cutoff) over 3.5 mo
+

+
D

D
AUC GMFM-88 ranged from 0.826 to 0.758. Sensitivity = 0.800 (95% CI = 0.643 to 0.957) and 0.852 (95% CI = 0.757 to 0.947). Specificity = 0.725 (95% CI = 0.587 to 0.863) and 0.636 (95% CI = 0.352 to 0.921)
AUC GMFM-66 ranged from 0.896 to 0.891. Sensitivity = 0.702 (95% CI = 0.544 to 0.896) and 0.722 (95% CI = 0.803 to 0.842). Specificity = 0.925 (95% CI = 0.843 to 1.00) and 0.818 (95% CI = 0.590 to 1.00). Better performance for GMFM-66
Lundkvist Josenby et al29 Responsiveness ES and SRM values were higher, and showed early large changes, for GMFM-88 total and goal total scores compared with GMFM-66, 12-18 mo after selective dorsal rhizotomy (SDR)
3 y and 5 y after SDR, all 3 GMFM scoring options showed large changes
+ D
GMFM-88 Beckung et al31 Reliability (intrarater, interrater)
Construct validity
Interrater ρ-Spearman coefficient = 0.91. Interrater ρ-Spearman coefficient = 0.99
Proportion of the maximum GMFM score reached (by GMFCS level): I 90% at 7 y, II 90% at 5 y, III 80% at 7 y, IV 30% at 5 y
In GMFCS V, the median score was 20%. The CP subtype alone was not sufficient to predict gross motor development
?

?
I

D
Ko and Kim34 Reliability (intrarater, interrater) Intrarater for 2 therapists (experienced and novel) ICC(1,1) ranged 0.988 to 1.0 for different subgroups (age, GMFCS)
Interrater 10 therapists ICC(2,1) ranged 0.952 to 0.997 for different subgroups (age, GMFCS)
+ A
Measurement error (intrarater, interrater) Intrarater SEM ranged 0.0 to 0.76; SRD ranged 1.19 to 2.11 (subgroups age and GMFCS)
Interrater SEM ranged 1.31 to 5.19; SRD range 2.57 to 10.2 (subgroups age and GMFCS)
? D
Responsiveness Baseline: Total MC = 4.5 (SD = 3.2); ES = 0.3; SRM = 1.4
3 mo: Total MC = 8.3 (SD = 6.3); ES = 0.5; SRM = 1.3
6 mo: Total MC = 10.1 (SD = 6.8); ES = 0.6; SRM = 1.5
MC and ES increase gradually in dimension goal scale and in total scale in the follow-up assessments
? I
GMFM-88-CVI Salavati et al33 Internal consistency Cronbach α = 0.97-1.00 inside 5 dimensions (A-E) + VG
Reliability (intrarater, interrater, test-retest) ICC(2,1) absolute agreement intrarater (therapist familiar with children) = 1.00 (95% CI = 0.99 to 1.00)
ICC(2,1) absolute agreement intrarater (therapist not familiar with children) = 0.99 (95% CI = 0.99 to 1.00)
ICC(2,1) absolute agreement interrater in test = 1.00 (95% CI = 1.0 to 1.0) and retest = 1.0 (95% CI = 1.0 to 1.0)
+ I
Measurement error (intrarater, interrater, test-retest) LoA (Bland-Altman) intrarater (familiar with children) = −0.77 (±5.91)
LoA (Bland-Altman) intrarater (not familiar with children) = −0.68 (±6.63)
LoA (Bland-Altman) interrater in test = 0.23 (±1.82) and retest = 0.32 (±2.43)
? I
K-GMFM/GMPM Ko and Kim36 Cross-cultural validity Translation forward-backward-forward method ? I
Reliability (interrater) Interrater ICC(3,1) = 0.995 (95% CI = 0.991-0.998) and ranged from 0.978 to 0.992 in 5 dimensions (K-GMFM) + A
Interrater ICC(3,1) = 0.929 (95% CI = 0.864-0.963) and ranged from 0.863 to 0.923 in 5 dimensions (GMPM) + A
Construct validity Correlation between 5 dimensions and total score Spearman's ρ ranged 0.916 to 0.997 (P ≤ .01) ? VG
Construct validity (convergent validity) Correlation between domains K-GMFM vs GMPM. The correlations were positive in all pairs, Spearman's ρ ranged from 0.762 to 0.884 (all p ≤ 0.01)
PEDI-H Elad et al32 Cross-cultural validity Translation to Hebrew: (1) Four translators forward translation English PEDI; (2) Consensus 4 translators and head researcher; (3) Backward translation by a fifth translator (native English speaker therapist) (4) Metric units were converted into metric system; (5) No items were added or removed ? I
Construct validity (discriminative validity) One-way ANOVA to test PEDI-H scores with GMFCS (mild, moderate, severe CP). Significant differences between various severity groups in all domains + D
ROC (mild-moderate vs severe impairment)
AUCs of Mobility and Self-care domains were moderate-high, ranging from 0.892 to 0.967
Sensitivity ranged 0.92 to 0.96 (HCP) and 0.77 to 0.97 (parents); specificity ranged 0.81 to 0.92
AUCs of Social Function in the Functional Skills and Caregiver Assistance subscales were low, 0.686 and 0.676
Sensitivity ranged 0.50 to 0.81 (HCP); specificity ranged 0.53 to 0.71
Internal consistency Internal consistency of subscales: Functional Skills and Caregiver Assistance, with 3 domains each (Self-care, Mobility, Social function)
Functional Skills Cronbach α ranged 0.940 to 0.967; Caregiver Assistance Cronbach α ranged 0.962 to 0.970
+ VG
Reliability (intrarater) ICC(2,1) absolute agreement for HCP ranged 0.940 to 0.967
ICC(2,1) absolute agreement ≥0.94 in both younger and older children, all severity (GMFCS I-V) and distribution groups, except in right hemiparesis
+ D
WeeFIM Park et al37 Structural validity Exploratory factor analysis with principal component analysis with oblique rotation. Factor selection with eigenvalues >1 and factor loading >0.30
Confirmatory factor analysis for 3 factors: χ2 = 514.45; df = 123; Normed Fit Index = 0.92 (good); Tucker-Lewis Index = 0.92 (good), Comparative Fit Index = 0.94 (good), RMSEA = 0.12 (mediocre); Akaike Information Criterion = 646.49 (lower than for 1 or 2 factors)
3-factor WeeFIM is more acceptable than the original 2-factor WeeFIM (explaining 87% of the variance).
Self-care, motor, and cognitive domains should be treated as separate scales
+ VG
Internal consistency Cronbach α = 0.93 in motor subscale and 0.98 in cognitive subscale
Global Cronbach α = 0.98 (95% CI = 0.97 to 0.98)
+ VG
Reliability (interrater) ICC 0.98 motor subscale and 0.93 cognitive subscale. + I
Abbreviations: +, sufficient rating; ?, indeterminate rating; −, insufficient rating; A, adequate; ANOVA, analysis of variance; AR, absolute reliability; AUC, area under the curve; BI, bilateral cerebral palsy; CI, confidence interval; CP, cerebral palsy; criteria for GMP, criteria for good measurement properties; D, doubtful; EFA, exploratory factor analysis; ES, effect size; GMFCS, Gross Motor Function Classification System; GMFM-88/66, Gross Motor Function Measure 88/66; GMFM-66-IS, Gross Motor Function Measure Item Set; GMFM-66-B&C, Gross Motor Function Measure basal & ceiling approach; GMFM-88-CVI, Gross Motor Function Measure adapted for Cerebral Visual Impairment; GMPM, Gross Motor Performance Measure; HCP, health care professionals; I, inadequate; ICC, intraclass correlation coefficient; LoA, limits of agreement (Bland-Altman method); MC, mean change; MDC, minimal detectable change; PEDI-H, Pediatric Evaluation of Disability Inventory in Hebrew; ROC, receiver operating characteristic; RMSEA, root mean square error of approximation; SD, standard deviation; SEM, standard error of measurement; SRD, smallest real difference; SRM, standardized response mean; UNI, unilateral cerebral palsy; VG, very good; WeeFIM, Functional Independence Measure for Children.
a ICC nomenclature: There are 2 main ways of classifying the ICC. McGraw and Wong40 specify whether the ICC was calculated by absolute (A) or relative (C) agreement, followed by a number that indicates the number of repetitions. Shrout and Fleiss41 instead detail the ANOVA model (1 to 3), followed by a second number with the number of repetitions. From a methodological point of view, it is more important to detail whether it is an ICC of absolute or relative agreement than the ANOVA model because the consideration of the systematic error is different. More details can be consulted in Weir.42
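The difference between absolute and relative agreement noted in this footnote can be illustrated with a short sketch. The Python function below is not taken from any of the reviewed studies. It computes the single-measure, 2-way ICC in both forms described by McGraw and Wong from the usual ANOVA mean squares. The function name and the scores are hypothetical, chosen so that the second rater scores every child exactly 10 points higher than the first.

```python
def icc_two_way(scores):
    """Return (ICC(C,1), ICC(A,1)) for a subjects x raters table of scores."""
    n = len(scores)        # number of subjects (rows)
    k = len(scores[0])     # number of raters or sessions (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares of the 2-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    # Relative agreement (consistency) ignores the rater effect
    icc_c = (msr - mse) / (msr + (k - 1) * mse)
    # Absolute agreement penalizes systematic differences between raters
    icc_a = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return icc_c, icc_a


# Hypothetical scores: rater B is always 10 points above rater A
example_scores = [[20, 30], [35, 45], [50, 60], [65, 75], [80, 90]]
icc_c, icc_a = icc_two_way(example_scores)
```

With these illustrative scores the relative-agreement ICC is perfect, while the absolute-agreement ICC is lower because the systematic 10-point difference between raters counts as error. This is why a study that does not state which form it used is hard to interpret.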

The quality of the validity properties was rated as “very good” (n = 2), “adequate” (n = 1), and “doubtful” (n = 4). The first and second authors agreed on the rating of all studies. Erroneous statistical methods or the lack of information about study design were the reasons for low scores.22,25,30–32

Reliability properties were rated as “very good” (n = 3), “adequate” (n = 5), “doubtful” (n = 4), and “inadequate” (n = 6). The first and second authors agreed on all ratings except 2, for which the third author made the final decision.32,33 The main reasons for low scores were the lack of evidence about the participants' stability between administrations25,31,32 and about whether the test conditions were similar in both measurements (eg, environment, instructions).25,31–33

Responsiveness properties were rated as “adequate” (n = 1), “doubtful” (n = 3), and “inadequate” (n = 1). The first and second authors agreed on the ratings (n = 6); the third author made the final decision for one study.34 Reasons for low scores were important flaws in study design.22,24,29,30,34,35

Two studies addressed the cross-cultural validity of an instrument (Korean version of the GMFM-88, Hebrew version of the PEDI). Both were rated as “inadequate” because information was missing or erroneous on aspects such as the use of similar samples to compare relevant characteristics, the expertise of the translators, whether the translation was reviewed by a committee, and the approach used to analyze the data (confirmatory factor analysis or regression analyses).32,36

One study adapted the GMFM-88 for children with CP and Cerebral Visual Impairment (CVI) using a Delphi method.33 Reliability (test-retest and interrater) and internal consistency were assessed (Table 4).

Results of the Studies

Following the updated criteria for good measurement properties, the ratings of the results of the studies are shown in Table 4. The studies for validity (n = 7) were rated as “sufficient” (n = 5) and “indeterminate” (n = 2). The results of the studies for reliability (n = 19) were rated as “sufficient” (n = 14) and “indeterminate” (n = 5). For responsiveness (n = 6), the results were rated as “sufficient” (n = 5) and “indeterminate” (n = 1). Under the updated criteria for evaluating the results on internal consistency, Cronbach α values of more than 0.95 are classified as “positive.” In these cases, the third author made the final decision.

Data Analysis and Best Evidence Synthesis

Gross Motor Function Measure

The GMFM versions (GMFM-88, GMFM-66, GMFM-IS, GMFM-B&C, K-GMFM-88, and GMFM-CVI) were the most investigated and were strongest in terms of evidence regarding psychometric properties.

The studies report strong evidence for construct validity24,30 and moderate evidence for responsiveness in the -66 version.24,29,30,35

Strong evidence was found for construct validity of the GMFM-66-Item Sets (GMFM-66-IS) version.22,24,25 However, the GMFM-66-Basal & ceiling approach (GMFM-66-B&C) provided moderate evidence for construct validity.22,25 Both GMFM-66-IS and GMFM-66-B&C reported moderate evidence for responsiveness22,24 and limited evidence for test-retest reliability and measurement error due to the reduced sample size.25

For the -88 version, validity properties have not yet been studied in children with CP classified across all GMFCS levels. By contrast, other studies reported moderate evidence for internal consistency33 as well as limited evidence both for construct validity31,36 and measurement error33,34 due to the limited sample size and the lack of information about the design or statistical methods of the study, respectively. Conflicting evidence was found for interrater and intrarater reliability and for responsiveness due to conflicting findings in multiple studies.29,31,33–36 Furthermore, there was unknown evidence for test-retest reliability due to important flaws related to the lack of information about the level of experience of the assessors and about whether administrations were independent.33

Gross Motor Performance Measure

With regard to the GMPM, only construct validity and interrater reliability have been studied in children with CP across all GMFCS levels, with limited evidence for both as a consequence of limited sample sizes.36

Pediatric Evaluation of Disability Inventory

For the Hebrew version of PEDI,32 there was only 1 study on the reliability, internal consistency, and construct validity. Limitations in sample size meant we found moderate evidence for internal consistency as well as limited evidence for reliability and construct validity.

Functional Independence Measure for Children

There was no evidence for psychometric properties other than structural validity and internal consistency. For both, strong evidence was reported by 1 study.37

DISCUSSION

In this systematic review, 4 assessments met the inclusion criteria. The quality of their psychometric properties, rated with the COSMIN Risk of Bias checklist, showed a wide range of ratings in validity, reliability, and responsiveness. Although some results of the updated criteria for good measurement properties for validity, reliability, and responsiveness were rated as “sufficient,” most were rated as “adequate,” “doubtful,” or “indeterminate” due to the lack of information or inappropriate statistical methods.

In contrast to other reviews,6,9–11 this systematic review critically appraised the quality of the psychometric properties of the outcome measures. It also summarized the characteristics of the measures, listing the target group, purpose, type, and psychometric properties studied.

Because CP is a neurodevelopmental disorder characterized by clinical heterogeneity in both type and distribution, it is difficult to generalize the results of clinimetric studies to the population with CP if the sample does not represent all GMFCS levels. The 5 GMFCS levels include participants with different types of CP in various degrees of severity, who might have different profiles of motor function.7,33

We selected those articles that considered all GMFCS levels in their sample to address the heterogeneity of this population. No previous systematic review has considered this aspect. Some measurement properties, such as reliability, and the results of the statistical methods used to estimate them, depend on the variation in scores in the study population. The value of the intraclass correlation coefficient (ICC) is usually higher in a heterogeneous population.38 These aspects are relevant and determine whether the results can be generalized.
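The dependence of the ICC on sample heterogeneity follows from its definition as the share of total variance that is due to true differences between children. The sketch below is only an illustration under that textbook definition. The function name and the standard deviations are hypothetical values, not data from the reviewed studies.

```python
def expected_icc(between_sd, error_sd):
    """ICC as between-subject variance over total (between + error) variance."""
    between_var = between_sd ** 2
    error_var = error_sd ** 2
    return between_var / (between_var + error_var)


# Same measurement error in both cases, only the spread of true scores changes.
# Hypothetical narrow sample (for example, children from a single GMFCS level)
homogeneous = expected_icc(between_sd=5.0, error_sd=3.0)
# Hypothetical wide sample (for example, children across GMFCS levels I to V)
heterogeneous = expected_icc(between_sd=20.0, error_sd=3.0)
```

Although the measurement error is identical in the 2 scenarios, the coefficient is clearly higher for the wider sample. An ICC obtained across all GMFCS levels should therefore not be assumed to hold within a single level.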

The GMPM, PEDI, and WeeFIM results depend on a limited number of studies regarding the psychometric properties of validity, reliability, and responsiveness in children and adolescents with CP across all GMFCS levels (Table 4). The 3 instruments provided some evidence about validity and reliability (interrater and internal consistency mainly), but the major drawback was responsiveness, which was not addressed in any of the studies. This is a serious flaw because an instrument used in an evaluative application should be responsive. Together with reliability and measurement error, responsiveness is the most critical measurement property for an evaluative measure.8,19 By contrast, the studies on the psychometric properties of the GMFM in all versions took into account the importance of including a representative sample of children with CP and reported results on all psychometric properties reviewed in this article. Responsiveness was analyzed in 6 studies.

The main aspect that may determine the clinical utility of a measure is the time spent in administration. The use of statistical methods, such as Rasch analysis (GMFM-66, Pediatric Evaluation of Disability Inventory Computer Adaptive Test [PEDI-CAT]) or an algorithmic approach (GMFM-66-IS, GMFM-66-B&C), provides shorter forms that reduce the number of items and thus the time needed for administration. The training process and the costs (manual, software, and courses), as well as the required space and materials, may also determine the choice of the measure.

The improvement and changes in some of the selected measures may explain the lack of evidence for the initial versions. This is true of the PEDI-CAT and the Quality Function Measure (a revision of the GMPM), which were not included in this review because their studies did not use samples that included children classified across all GMFCS levels. Therefore, in their clinical and research application, it should be taken into account that their psychometric properties have only been studied in part of the population of children with CP.

The second aim of this review was to evaluate both the quality and the results of the studies of the measurement properties. Three studies of validity were rated “very good.” The remainder of the studies were rated “adequate” or “doubtful,” and 4 studies were rated “inadequate.” The contributors to low ratings in the validity, reliability, and responsiveness studies of the GMFM-88/-66 were errors in statistical methods, such as P values for testing hypotheses30 and Spearman's rank coefficient for inter/intrarater reliability,31 and flaws in the design of the studies (eg, lack of information on whether the ICC was calculated as absolute or relative agreement, on whether test conditions were similar, or an inappropriate total number of assessments or time interval between them).24,25,31,34,35 Responsiveness studies of the GMFM were the base of evidence about this property; some errors or inconsistencies are therefore understandable, since to date no other results have been reported on this psychometric property.

The most investigated measure reporting results on the different psychometric properties was the GMFM in all versions. In terms of quality of evidence, strong construct validity was found in several studies.22,24,25,30 However, the results of reliability and responsiveness properties were considerably heterogeneous. Evidence should be improved, especially for reliability. The lack of information on study design and small samples reduced ratings. There was strong evidence for construct validity and internal consistency for the WeeFIM, but the results came from only 1 study.37 Moderate evidence was reported for the PEDI from 1 study that assessed reliability, internal consistency, and construct validity.32 The evidence for the GMPM was limited for interrater reliability and construct validity.36

Limitations and Recommendations

There are several limitations to this review. The absence of evidence and information regarding some aspects of the psychometric studies (eg, the independence of administrations, time interval, participants' stability in the interim period, and the test conditions in the measurements for intrarater and test-retest reliability) makes it difficult to judge the quality of the studies.

Although the COSMIN Risk of Bias checklist is the most suitable and commonly used standardized method to assess the quality of the measurement properties and to identify strengths and weaknesses in study designs, we observed that floor effects were frequent. Studies rated as “very good” initially were rated later with a lower score, even as “inadequate,” for only 1 item, due to the method used to score them (“the lower score counts”). Additionally, we found difficulties in scoring reliability aspects because COSMIN uses the same items to assess the properties of interrater, intrarater, and test-retest reliability.
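The scoring rule behind this floor effect can be sketched in a few lines. The Python example below is a minimal illustration of the principle that the lowest item rating determines the overall rating of a study. The function name, the constant, and the item ratings are hypothetical, and the sketch does not reproduce the actual checklist items.

```python
# The 4 rating levels, ordered from worst to best
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def overall_quality(item_ratings):
    """Return the lowest rating among the checklist items of one study."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # An unknown label raises ValueError through RATING_ORDER.index
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical study: every item is rated "very good" except one
study_items = ["very good", "very good", "very good", "inadequate"]
study_rating = overall_quality(study_items)
```

In this hypothetical case the study is rated “inadequate” overall, even though 3 of its 4 items were rated “very good,” which mirrors the pattern described above.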

Regarding the application of the updated criteria for good measurement properties, some changes relative to the previous version published by Terwee et al generated certain limitations when evaluating the results. The latest version does not specify that Cronbach α values of more than 0.95 for internal consistency should be considered “negative” because they indicate item redundancy.
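The link between very high Cronbach α values and item redundancy can be shown with a small worked example. The sketch below uses the standard formula for Cronbach α with sample variances from the Python standard library. The function name and the item scores are hypothetical, built so that the 3 items carry almost the same information.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                                   # number of items
    items = [[row[j] for row in rows] for j in range(k)]
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])   # variance of sum scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores of 5 children on 3 nearly identical items
redundant_items = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 4],
    [4, 4, 4],
    [5, 5, 6],
]
alpha = cronbach_alpha(redundant_items)
```

For these made-up scores α is above the 0.95 threshold discussed in the text, not because the scale is unusually precise but because the items largely repeat one another.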

The lack of consensus in taxonomy and statistical score standards complicated the interpretation of information from some articles because the terms and definitions used to describe measurement properties differed from one another. To address this, we reviewed background papers and used the COSMIN terminology and taxonomy.16 Subjective judgment in the assessment of quality was minimized through the independent reviews by the first and second authors and subsequent consensus with a third person as necessary.

Small sample sizes were a limitation in several studies,25,29,35,36 although the high estimates of construct validity and reliability, and the narrow 95% confidence intervals, suggested that the sample sizes were adequate.25 Ko and Kim36 suggested that their results should not be generalized to all children with CP due to the limitations in sample and age range (10 months to 9 years 9 months). Wang and Yang35 suggested that this aspect may be related to their wide confidence intervals for specificity and sensitivity in the receiver operating characteristic curves to assess responsiveness. Lundkvist Josenby et al29 found longitudinal construct validity for children in GMFCS I-V levels using GMFM-88 in a long-term follow-up study but affirmed that a larger sample might yield more severity-dependent differences in responsiveness results. In this type of study, the authors must consider that the sample size depends on the psychometric property assessed and the chosen method. Factor analyses and item response theory (IRT) methods require a large sample size (n = 100-500), while for classical test theory (CTT) methods, a smaller sample size is adequate (n = 50-100).39

Recommendations for further research include the importance of considering and reviewing the construct and matching the sample to the characteristics of the population. In such studies, uniform criteria, such as the use of a sample including children across all GMFCS levels, may improve the quality of the studies and facilitate generalization of the results to the population with CP. In this way, “capacity” and “performance” measures may be combined to obtain global information about significant changes in both motor and functional skills of children or adolescents with CP. This would be useful in the clinic and for monitoring whether a child is progressing satisfactorily. Moreover, to improve the degree of evidence of psychometric properties, authors should provide a complete description of the design and the method used in their study.40

CONCLUSIONS

Four measures to assess motor or functional skills in children with CP were identified in this review. The GMFM in all its versions was the most widely investigated measure and provided the best results, with the strongest evidence for validity and responsiveness properties in studies with a sample of children and adolescents with CP across all GMFCS levels. However, reliability evidence should be improved to determine stability. Although other measures, such as the GMPM, PEDI, and WeeFIM, have reported interesting results, further studies, especially of responsiveness, are needed to provide evidence in a heterogeneous sample.

REFERENCES

1. Stavsky M, Mor O, Mastrolia SA, Greenbaum S, Than NG, Erez O. Cerebral palsy-trends in epidemiology and recent development in prenatal mechanisms of disease, treatment, and prevention. Front Pediatr. 2017;5:21.
2. Rosenbaum P, Paneth N, Leviton A, et al. A report: the definition and classification of cerebral palsy April 2006. Dev Med Child Neurol Suppl. 2007;109:8–14.
3. World Health Organization. International Classification of Functioning, Disability and Health: Children & Youth Version. https://apps.who.int/iris/bitstream/10665/43737/1/9789241547321_eng.pdf?ua=1. Accessed January 28, 2018.
4. Ostensjø S, Carlberg EB, Vøllestad NK. Everyday functioning in young children with cerebral palsy: functional skills, caregiver assistance, and modifications of the environment. Dev Med Child Neurol. 2003;45(9):603–612.
5. Holsbeeke L, Ketelaar M, Schoemaker MM, Gorter JW. Capacity, capability, and performance: different constructs or three of a kind? Arch Phys Med Rehabil. 2009;90(5):849–855.
6. Ketelaar M, Vermeer A, Helders PJ. Functional motor abilities of children with cerebral palsy: a systematic literature review of assessment measures. Clin Rehabil. 1998;12(5):369–380.
7. Palisano R, Rosenbaum P, Walter S, Russell D, Wood E, Galuppi B. Development and reliability of a system to classify gross motor function in children with cerebral palsy. Dev Med Child Neurol. 1997;39(4):214–223.
8. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis. 1985;38(1):27–36.
9. James S, Ziviani J, Boyd R. A systematic review of activities of daily living measures for children and adolescents with cerebral palsy. Dev Med Child Neurol. 2014;56(3):233–244.
10. Harvey A, Robin J, Morris ME, Graham HK, Baker R. A systematic review of measures of activity limitation for children with cerebral palsy. Dev Med Child Neurol. 2008;50(3):190–198.
11. Debuse D, Brace H. Outcome measures of activity for children with cerebral palsy: a systematic review. Pediatr Phys Ther. 2011;23(3):221–231.
12. Moher D, Liberati A, Tetzlaff J, Altman DG, Group TP. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.
13. Terwee CB, Prinsen CAC, Chiarotto A, et al. COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Qual Life Res. 2018;27(5):1159–1170.
14. CanChild Centre for Disability Research. Outcome Measures Rating Form. https://www.canchild.ca/system/tenon/assets/attachments/000/000/372/original/measrate.pdf. Published 2004. Accessed December 19, 2018.
15. Mokkink LB, de Vet HCW, Prinsen CAC, et al. COSMIN Risk of Bias checklist for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1171–1179.
16. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737–745.
17. Prinsen CAC, Mokkink LB, Bouter LM, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1147–1157.
18. van Tulder M, Furlan A, Bombardier C, Bouter L, Editorial Board of the Cochrane Collaboration Back Review Group. Updated method guidelines for systematic reviews in the Cochrane collaboration back review group. Spine. 2003;28(12):1290–1299. doi:10.1097/01.BRS.0000065484.95996.AF.
19. Ammann-Reiffer C, Bastiaenen CH, de Bie RA, van Hedel HJ. Measurement properties of gait-related outcomes in youth with neuromuscular diagnoses: a systematic review. Phys Ther. 2014;94(8):1067–1082.
20. Benfer KA, Weir KA, Boyd RN. Clinimetrics of measures of oropharyngeal dysphagia for preschool children with cerebral palsy and neurodevelopmental disabilities: a systematic review. Dev Med Child Neurol. 2012;54(9):784–795.
21. Gerber CN, Labruyère R, van Hedel HJA. Reliability and responsiveness of upper limb motor assessments for children with central neuromotor disorders: a systematic review. Neurorehabil Neural Repair. 2016;30(1):19–39.
22. Avery LM, Russell DJ, Rosenbaum PL. Criterion validity of the GMFM-66 item set and the GMFM-66 basal and ceiling approaches for estimating GMFM-66 scores. Dev Med Child Neurol. 2013;55(6):534–538.
23. Russell DJ, Rosenbaum PL, Cadman DT, Gowland C, Hardy S, Jarvis S. The gross motor function measure: a means to evaluate the effects of physical therapy. Dev Med Child Neurol. 1989;31(3):341–352.
24. Russell DJ, Avery LM, Walter SD, et al. Development and validation of item sets to improve efficiency of administration of the 66-item Gross Motor Function Measure in children with cerebral palsy. Dev Med Child Neurol. 2010;52(2):e48–e54.
25. Brunton LK, Bartlett DJ. Validity and reliability of two abbreviated versions of the Gross Motor Function Measure. Phys Ther. 2011;91(4):577–588.
26. Boyce WF, Gowland C, Hardy S, et al. Development of a quality-of-movement measure for children with cerebral palsy. Phys Ther. 1991;71(11):820–828; discussion 828-832.
27. Haley SM. Pediatric Evaluation of Disability Inventory (PEDI): Development, Standardization and Administration Manual. Boston, MA: New England Medical Center; 1992.
28. Msall ME, DiGaudio K, Rogers BT, et al. The Functional Independence Measure for Children (WeeFIM): conceptual basis and pilot use in children with developmental disabilities. Clin Pediatr (Phila). 1994;33(7):421–430.
29. Lundkvist Josenby A, Jarnlo GB, Gummesson C, Nordmark E. Longitudinal construct validity of the GMFM-88 total score and goal total score and the GMFM-66 score in a 5-year follow-up study. Phys Ther. 2009;89(4):342–350.
30. Russell DJ, Avery LM, Rosenbaum PL, Raina PS, Walter SD, Palisano RJ. Improved scaling of the gross motor function measure for children with cerebral palsy: evidence of reliability and validity. Phys Ther. 2000;80(9):873–885.
31. Beckung E, Carlsson G, Carlsdotter S, Uvebrant P. The natural history of gross motor development in children with cerebral palsy aged 1 to 15 years. Dev Med Child Neurol. 2007;49(10):751–756.
32. Elad D, Barak S, Eisenstein E, Bar O, Herzberg O, Brezner A. Reliability and validity of Hebrew Pediatric Evaluation of Disability Inventory (PEDI) in children with cerebral palsy: health care professionals vs. mothers. J Pediatr Rehabil Med. 2012;5(2):107–115.
33. Salavati M, Krijnen WP, Rameckers EAA, et al. Reliability of the modified Gross Motor Function Measure-88 (GMFM-88) for children with both Spastic Cerebral Palsy and Cerebral Visual Impairment: a preliminary study. Res Dev Disabil. 2015;45-46:32–48.
34. Ko J, Kim M. Reliability and responsiveness of the gross motor function measure-88 in children with cerebral palsy. Phys Ther. 2013;93(3):393–400.
35. Wang H-Y, Yang YH. Evaluating the responsiveness of 2 versions of the gross motor function measure for children with cerebral palsy. Arch Phys Med Rehabil. 2006;87(1):51–56.
36. Ko J, Kim M. Inter-rater reliability of the K-GMFM-88 and the GMPM for children with cerebral palsy. Ann Rehabil Med. 2012;36(2):233–239.
37. Park EY, Kim WH, Choi YI. Factor analysis of the WeeFIM in children with spastic cerebral palsy. Disabil Rehabil. 2013;35(17):1466–1471.
38. Terwee CB, Bot SDM, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.
39. Terwee CB, Mokkink LB, Knol DL, Ostelo RWJG, Bouter LM, de Vet HCW. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21(4):651–657.
40. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996;1(1):30–46.
41. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–428.
42. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19(1):231–240.
Keywords:

cerebral palsy; functional skills; motor skills; outcome measure; psychometric properties

© 2019 Academy of Pediatric Physical Therapy of the American Physical Therapy Association