Reliability and Validity of the Gross Motor Function Classification System for Cerebral Palsy

Bodkin, Amy Winter MS, PT, PCS; Robinson, Cordelia PhD; Perales, Frida P. MA

Pediatric Physical Therapy:
Research Report

Purpose: The purposes of this study were to evaluate the interrater reliability of the Gross Motor Function Classification System (GMFCS) when ratings are made from videotapes, and to evaluate its criterion-related and construct validity, aspects of reliability and validity not previously published.

Methods: Two experienced pediatric physical therapists rated 30 videotape segments of children with cerebral palsy (CP) or Down syndrome (DS) to test interrater reliability. Criterion-related validity was evaluated by comparing GMFCS levels with tests of motor and nonmotor development. Construct validity was assessed by comparing GMFCS trends over time in children with CP and DS.

Results: Interrater reliability was high (κ = 0.84). Correlation was higher between GMFCS level and tests of motor development than between GMFCS level and tests of nonmotor development. The GMFCS level remained relatively stable in children with CP but tended to improve in children with DS.

Conclusions: This study extends the evidence for the reliability and validity of the GMFCS, supporting its use in clinical practice and research.

In Brief

This study of the reliability and validity of the GMFCS supports its use in clinical practice and research.

Author Information

Center for Gait & Movement Analysis and Physical Therapy Program (A.W.B.), JFK Partners (A.W.B., C.R.), University of Colorado Health Sciences Center, Denver, CO; Mandel School of Applied Social Sciences, Case Western Reserve University (F.P.P.), Cleveland, OH

Address correspondence to: Amy Winter Bodkin, MS, PT, PCS, Rehabilitation Medicine, University of Colorado Health Sciences Center, 4200 E. Ninth Avenue, Box A036/B476, Denver, CO 80262. Email:



Interest in developing measures to evaluate outcomes of intervention for children with cerebral palsy has grown over the years. Changes in measures of motor functioning have commonly been used to assess intervention outcomes. However, severity of the condition and rates of motor development vary considerably, making it difficult to compare groups of children with cerebral palsy. The Gross Motor Function Classification System (GMFCS) was developed to classify severity of functional limitation/disability in children with cerebral palsy. 1 The GMFCS is a five-level scale that rates a child’s gross motor function with an emphasis on movement initiation, sitting control, and walking. Level I represents the highest gross motor function, whereas level V represents the lowest (Appendix). The GMFCS appears promising as a risk-adjustment instrument and is currently being used in studies of children who have cerebral palsy. 2–4 Although the findings of initial investigations of its validity and reliability are positive, 1,5–9 further evaluation is warranted.

Several previous studies have demonstrated the reliability of the GMFCS. Palisano et al 1 reported moderate interrater reliability in children with cerebral palsy who were younger than two years old (κ = 0.55) and excellent reliability in children from two to 12 years old (κ = 0.75) whose motor performance was rated by two physical and occupational therapists who were familiar with the child. Interrater reliability of assigning a GMFCS level to children from ages one to 12 years using medical records was also high (G = 0.93). 5 Wood and Rosenbaum 5 demonstrated that GMFCS levels were relatively stable across time in children whose motor performance was rated during each of four age ranges: age one year to the second birthday, age two years to the fourth birthday, age four years to the sixth birthday, and age six years to the twelfth birthday. They also demonstrated that GMFCS level at one to two years of age predicted which children would be household or community ambulators at age 12 years with 74% accuracy and which children would not be community ambulators at age 12 years with 90% accuracy. To our knowledge, the reliability of GMFCS ratings made from videotapes has not been published.

Establishing the validity of a new instrument is challenging, and multiple aspects of validity should be evaluated, including content, criterion-related, and construct validity. Content validity refers to whether an instrument makes sense and whether it includes most aspects of the trait being measured. Content validity of the GMFCS was established during its development using a nominal group process and the Delphi method. 1

Criterion-related validity is defined as agreement between a new instrument and widely accepted or validated tools that measure similar traits. Concurrent validity, one type of criterion-related validity, is defined as agreement between new and accepted instruments administered in the same time period, whereas predictive validity, another type of criterion-related validity, is defined as the capacity of the new test to predict outcome. GMFCS level has been reported to be strongly correlated with the handicap code of the International Classification of Impairments, Disabilities, and Handicaps (ICIDH) developed by the World Health Organization in 1980 (r = 0.95, p < 0.0001). 6,7 This correlation demonstrated concurrent validity between two scales designed to rate disability. Because the GMFCS is designed to reflect gross motor function, concurrent validity can also be evaluated by comparing the GMFCS level with established tests of gross motor skills. Strong correlation between the GMFCS level and one test of motor function, the Gross Motor Function Measure, 8 has been established (r = 0.91, p < 0.0001). 9 Correlation between the GMFCS and other accepted tests of motor skills has not been documented. In addition, the GMFCS has not been compared with tests of nonmotor function to confirm that GMFCS levels truly reflect gross motor function rather than global development.

Construct validity refers to how well a measure conforms to theoretical constructs concerning the entity under study. One method of evaluating construct validity is to test samples of subjects that would be expected to differ and to determine whether the scale does in fact produce different results for those samples. The GMFCS was developed to classify gross motor function in children with cerebral palsy and was designed to remain relatively constant across time to assist with prognosticating gross motor function. 1,5 Several studies support the stability of GMFCS level over time in children with cerebral palsy. 1,5 However, because the GMFCS was developed specifically to classify the severity of cerebral palsy, its use has not been reported for children with diagnoses in whom GMFCS level would be expected to change over time. For example, children with Down syndrome are expected to walk independently in the home and community and to attain gross motor skills such as climbing stairs, running, and jumping. Therefore, even though the GMFCS was not developed for use with children with Down syndrome, their GMFCS level should change toward level I as they get older rather than remain stable over time as in cerebral palsy.


We had three specific aims for this study. The first was to evaluate interrater reliability of GMFCS ratings made by experienced physical therapists using videotapes. The second was to examine criterion-related validity by comparing GMFCS levels with tests of motor and nonmotor development. The third was to evaluate construct validity by comparing the stability of GMFCS levels over time in children with cerebral palsy and children with Down syndrome. We wanted to confirm the value of the GMFCS as an easy-to-use, quick, valid, and reliable severity rating scale for cerebral palsy.

Methods



Subjects were 50 children with a diagnosis of cerebral palsy or Down syndrome who were receiving early intervention services at one of nine sites located in six states (New York, Virginia, Florida, Ohio, Alabama, and Colorado). Twenty-three children had cerebral palsy and 27 had Down syndrome. The sample was ethnically diverse, consisting of Asian American (6%), African American (14%), Hispanic (20%), and white (60%) children. The average age at admission to the study was 13.9 months for the group with cerebral palsy and 15.3 months for the group with Down syndrome. 10

Interrater Reliability

Children were videotaped at admission to the study (time 1) and approximately one and two years later (time 2 and time 3, respectively). All 50 children were videotaped at least twice, and 43 children were videotaped a third time, for a total of 143 videotape segments. Children were videotaped in six possible positions (supine, prone, sitting, quad/weight-bearing, kneeling, and standing) appropriate for their current level of motor functioning. The GMFCS level for each videotape segment was determined by an experienced pediatric physical therapist using GMFCS guidelines. 1 Thirty segments were randomly selected, using a random number table, to evaluate interrater reliability. We chose 30 segments because we judged that rating approximately 20% of the segments (30 of 143) would give a good estimate of reliability. The 30 segments were from 22 children, eight with cerebral palsy and 14 with Down syndrome. A second experienced pediatric physical therapist rated the 30 segments. Both physical therapists had 20 years of pediatric physical therapy experience. Neither physical therapist received specific training in using the GMFCS; however, before reliability testing, they reviewed the classification system and together rated several segments that were not used in the reliability sample. Interrater reliability of GMFCS levels was calculated using the κ statistic. 11
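The κ statistic corrects the observed proportion of agreement, p_o, for the agreement expected by chance, p_e, which is computed from each rater's marginal distribution of levels: κ = (p_o − p_e)/(1 − p_e). The sketch below is a minimal Python illustration that assumes the unweighted (Cohen's) form of κ for two raters; the text does not state whether a weighted variant was used, and the ratings are hypothetical, not the study's data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters assigning categorical levels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    # Observed agreement: share of segments given the same level by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each level, product of the two raters' marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    levels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[lvl] * counts_b[lvl] for lvl in levels) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical GMFCS levels for 10 videotape segments (illustrative only).
therapist_1 = [1, 1, 2, 3, 3, 4, 5, 1, 2, 4]
therapist_2 = [1, 1, 2, 3, 4, 4, 5, 1, 2, 4]
print(round(cohens_kappa(therapist_1, therapist_2), 3))  # 0.872
```

In this hypothetical example p_o = 0.90 and p_e = 0.22, so κ = 0.68/0.78 ≈ 0.87.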

Validity

Children in this study did not have evaluations at precisely the same age. As a result, several children had two evaluations before their second birthday and no third evaluation, whereas several others did not have their first evaluation until after their second birthday. Because different GMFCS criteria apply before and after the second birthday, we expected GMFCS levels to be more consistent for the validity studies if all children had been evaluated using similar criteria. Therefore, only children who were videotaped and tested at least once between their first and second birthdays and at least once after their second birthday were included in the validity analyses. Forty-three children fulfilled these criteria: 19 with cerebral palsy and 24 with Down syndrome. Testing done closest to, but before, the second birthday was designated as time A (mean age = 16.6 months, SD = 3.4, range = 12–23 months). Final testing after the second birthday was designated as time B (mean age = 36.8 months, SD = 4.1, range = 24–44 months). Children were tested with the Bayley Scales of Infant Development–Mental Scale, 12 the Peabody Developmental Motor Scales–Gross Motor Scale, 13 and the Vineland Adaptive Behavior Scales 14 each time that they were videotaped. Trained research assistants administered the tests according to standardized procedures. Age-equivalent scores were calculated for each child on the Bayley Mental Development Scale, the Peabody Gross Motor Scale, and the Vineland Communication, Gross Motor, and Adaptive Behavior Scales. Criterion-related validity of the GMFCS was evaluated by comparing GMFCS levels with these well-established, standardized tests that measure a variety of domains of development. Correlations between GMFCS level and each of these age-equivalent scores were determined using the Spearman rank correlation coefficient.
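The rule for choosing time A and time B can be stated as a small function. This is a hypothetical Python sketch, not the study's code: it assumes ages are recorded in months, treats an assessment at exactly 24 months as falling after the second birthday (consistent with the reported time B range of 24–44 months), and uses invented child identifiers and ages.

```python
from typing import Dict, List, Optional, Tuple


def select_time_points(ages_months: List[float]) -> Optional[Tuple[float, float]]:
    """Return (age at time A, age at time B) for one child, or None if excluded.

    Time A: the assessment closest to, but before, the second birthday
    (at or after 12 months and before 24 months).
    Time B: the final assessment at or after 24 months.
    """
    first_window = [age for age in ages_months if 12 <= age < 24]
    second_window = [age for age in ages_months if age >= 24]
    if not first_window or not second_window:
        return None  # no assessment in one of the two windows: excluded
    return max(first_window), max(second_window)


# Hypothetical assessment ages (months) for four children.
assessments: Dict[str, List[float]] = {
    "child_01": [10, 22, 34],
    "child_02": [14, 20],
    "child_03": [26, 38],
    "child_04": [12, 23, 44],
}
included = {}
for child_id, ages in assessments.items():
    points = select_time_points(ages)
    if points is not None:
        included[child_id] = points
print(included)  # {'child_01': (22, 34), 'child_04': (23, 44)}
```

Children like the hypothetical child_02 (two evaluations before the second birthday) and child_03 (first evaluation after it) are the cases the exclusion rule removes.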

GMFCS levels across time were compared between children with cerebral palsy and children with Down syndrome to evaluate construct validity. GMFCS levels of the two diagnostic groups were compared at time A and at time B using the Mann-Whitney U test.
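Because GMFCS levels are ordinal with only five values, many children share a level, so ties matter for the Mann-Whitney U test. Below is a minimal Python sketch of U and its tie-corrected normal approximation (Z), written without a continuity correction; the function names and example levels are hypothetical, and statistical packages may differ in which group's U they report and hence in the sign of Z.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end share ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_z(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return U for sample x and the tie-corrected normal-approximation Z."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("variance is zero: all observations are tied")
    return u1, (u1 - mean_u) / math.sqrt(var_u)


# Hypothetical GMFCS levels for two small groups (illustrative only).
levels_group_1 = [1, 1, 2]
levels_group_2 = [1, 2, 3]
u, z = mann_whitney_z(levels_group_1, levels_group_2)
print(u, round(z, 3))  # 2.5 -0.943
```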

Results


Interrater Reliability

Interrater reliability between the two experienced pediatric physical therapists assigning GMFCS levels from videotaped segments was high (κ = 0.84, p < 0.0001). Table 1 shows the distribution of agreements and disagreements between the two therapists; they assigned the same level to 27 of the 30 segments. None of the disagreements exceeded one classification level. Two of the three disagreements were between levels III and IV; the other was between levels II and III. Two of the disagreements involved children with cerebral palsy; the third involved a child with Down syndrome.
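The counts reported for the reliability sample allow a quick consistency check. With three disagreements among 30 segments, observed agreement is p_o = 27/30 = 0.90; rearranging κ = (p_o − p_e)/(1 − p_e) gives the chance agreement implied by the reported κ, p_e = (p_o − κ)/(1 − κ). A small Python sketch of that arithmetic, assuming the unweighted κ formula (the helper name is hypothetical); because κ is reported to two decimals, the implied p_e is approximate.

```python
def implied_chance_agreement(p_observed: float, kappa: float) -> float:
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    if kappa == 1:
        raise ValueError("p_e cannot be recovered when kappa equals 1")
    return (p_observed - kappa) / (1 - kappa)


segments = 30
disagreements = 3
p_o = (segments - disagreements) / segments  # 27 / 30 = 0.90
p_e = implied_chance_agreement(p_o, 0.84)
print(round(p_o, 2), round(p_e, 3))  # 0.9 0.375
```

Allowing for rounding of κ (0.835 to 0.845), the implied chance agreement lies roughly between 0.35 and 0.39.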

Criterion-Related Validity

GMFCS level was moderately correlated with the age-equivalent scores of all developmental tests at time A (Table 2). At time B, GMFCS level remained moderately correlated with the two gross motor tests, the Peabody Gross Motor Scale and the Vineland Gross Motor Scale, but was no longer correlated with the two tests of nonmotor domains, the Bayley Mental Development Scale and the Vineland Communication Scale, or with the composite Vineland Adaptive Behavior Scale (Table 2).
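The Spearman coefficients summarized in Table 2 can be computed as the Pearson correlation of ranks, with tied values given average ranks, which matters here because the GMFCS has only five levels. Because level I denotes the highest gross motor function, children with higher age-equivalent scores would be expected to have lower GMFCS levels, so the coefficients would be expected to be negative; "moderately correlated" presumably refers to their magnitude. A minimal Python sketch with hypothetical data:

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end share ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical GMFCS levels and gross motor age equivalents in months (illustrative only).
gmfcs_level = [1, 1, 2, 3, 4, 5]
age_equivalent = [20, 18, 15, 12, 9, 6]
print(round(spearman_rho(gmfcs_level, age_equivalent), 3))  # -0.986
```

The tie at level I keeps the coefficient slightly above −1 even though the hypothetical scores decrease steadily with level.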

Construct Validity

Figures 1 and 2 show the distribution of GMFCS levels at times 1 and 2 for both diagnoses. Table 3 compares the GMFCS level at times A and B. At time B, most children with Down syndrome were at level I (21 of 24), whereas GMFCS levels of children with cerebral palsy were more variable and had changed less. At time A, there was no difference in GMFCS level between the two diagnoses (Z = −1.005, p = 0.315); at time B, children with Down syndrome had higher gross motor function (levels closer to level I) than their peers with cerebral palsy (Z = −2.491, p = 0.013).
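The reported p values are consistent with two-sided tests based on the normal approximation to the Mann-Whitney U statistic, for which p = 2[1 − Φ(|Z|)] = erfc(|Z|/√2), where Φ is the standard normal distribution function. A short Python check, assuming that two-sided normal-approximation form:

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal statistic: P(|Z| >= |z|)."""
    # For a standard normal variable, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


for label, z in [("time A", -1.005), ("time B", -2.491)]:
    print(label, round(two_sided_p(z), 3))
# time A 0.315
# time B 0.013
```

Both values agree with the p values reported for times A and B.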

Discussion


Our results demonstrate that the GMFCS can be used to rate motor behavior reliably from videotapes of children with cerebral palsy or Down syndrome. Knowing that videotape rating is reliable is important because videotapes may be a more efficient way to rate children in some clinical and research situations. Previous studies have demonstrated GMFCS reliability using a review of medical records 5 or therapists’ knowledge of the children being rated. 1 In our study, children with Down syndrome were rated reliably; however, it is important to note that the GMFCS was not developed for other diagnoses and especially should not be used for prognostication in diagnoses other than cerebral palsy. In fact, we found that GMFCS level tended to change toward level I in our sample of children with Down syndrome, whereas it stayed more stable in children with cerebral palsy, similar to earlier research. 5 We do not advocate using the GMFCS for diagnoses other than cerebral palsy, but in this situation, we thought that it was valuable to rate these young children with Down syndrome so we could examine construct validity. We believe that this was acceptable because the children were so young and the children with Down syndrome demonstrated variability on the GMFCS. There would be no benefit in rating older children with Down syndrome because of the lack of variability in their levels and the reduced possibility for change in level over time.

We compared GMFCS level with two tests of gross motor function (the Peabody Gross Motor Scale and the Vineland Gross Motor Scale), two tests of nonmotor function (the Bayley Mental Scale and the Vineland Communication Scale), and a composite, the Vineland Adaptive Behavior Scale, which combines the Vineland Communication, Motor, Socialization, and Daily Living Skills Scales. At time B, GMFCS levels were moderately correlated with tests of motor development but not with tests of nonmotor domains of development. This finding supports the criterion-related validity of the GMFCS, strengthening the assumption that GMFCS level is a reflection of motor function. It is interesting to note that at time A, GMFCS level was correlated with all the developmental tests, including those testing developmental domains other than motor development. This correlation could be present because assessment of early development in all domains is constrained by motor ability. For example, Bayley Mental Scale items around 16 months of age (the average age of our sample at time A) require motor responses such as pointing, placing shapes or pegs in a board, stacking blocks, and holding a crayon. A similar constraint on assessment of cognitive function is found in items for early ages on the Vineland Scale.

As expected, GMFCS levels in children with Down syndrome changed toward level I, whereas in children with cerebral palsy, there was more variability in level between children and more stability in level within children across time. This finding supports the construct validity of the GMFCS because the scale distinguished groups of children with different diagnoses in the way we expected. More pronounced differences between diagnostic groups might be found if the children were followed for a longer period and if the children with cerebral palsy showed more variability.

A limitation of this study is that the children with cerebral palsy were relatively high functioning. Eleven of the 19 children with cerebral palsy had level I gross motor function at time B (Fig. 2), indicating an overall high level of function. This finding may be explained by the inclusion and exclusion criteria of the original study from which these data were drawn. Children who were “so severely impaired that the probability of achieving significant gross motor gains was extremely low” 10 were excluded, leading to a higher functioning sample than might be expected in a typical early intervention population. In addition, these children were recruited from early intervention programs and, as a result, were very young. Future studies should investigate a broader range of function and ages in children with cerebral palsy.

Conclusions


In summary, our study supports the reliability and validity of the GMFCS. We demonstrated strong interrater reliability using videotape segments. Moderate correlation between GMFCS level and established tests of gross motor function supports the criterion-related validity of the GMFCS. Furthermore, although we do not recommend using the GMFCS with populations other than children with cerebral palsy except in limited situations, we were able to demonstrate construct validity because GMFCS levels remained more stable over time in children with cerebral palsy than in children with Down syndrome. The GMFCS appears to be a good clinical tool for risk adjustment in outcome studies of children with cerebral palsy, as well as a tool to predict future function in children with cerebral palsy.

Acknowledgments


The authors thank Steven Rosenberg, PhD, for his assistance with statistical analysis and manuscript review, and Ann Cooper Rich, PT, for her assistance with reliability testing.

References


1. Palisano R, Rosenbaum P, Walter S, et al. Development and reliability of a system to classify gross motor function in children with cerebral palsy. Dev Med Child Neurol. 1997; 39: 214–223.
2. Bower E, Michell D, Burnett M, et al. Randomized controlled trial of physiotherapy in 56 children with cerebral palsy followed for 18 months. Dev Med Child Neurol. 2001; 43: 4–15.
3. Liptak GS, O’Donnell M, Conaway M, et al. Health status of children with moderate to severe cerebral palsy. Dev Med Child Neurol. 2001; 43: 364–370.
4. Mall V, Heinen F, Kirschner J, et al. Evaluation of botulinum toxin A therapy in children with adductor spasm by gross motor function measure. J Child Neurol. 2000; 15: 214–217.
5. Wood E, Rosenbaum P. The gross motor function classification system for cerebral palsy: a study of reliability and stability over time. Dev Med Child Neurol. 2000; 42: 292–296.
6. Beckung E, Hagberg G. Correlation between ICIDH handicap code and Gross Motor Function Classification System in children with cerebral palsy. Dev Med Child Neurol. 2000; 42: 669–673.
7. World Health Organization. International Classification of Impairments, Disabilities, and Handicaps. Geneva: WHO; 1980.
8. Russell D, Rosenbaum P, Gowland C, et al. Gross Motor Function Measure: A Measure of Gross Motor Function in Cerebral Palsy. 2nd ed. Hamilton, Ontario, Canada: Institute for Applied Health Sciences, McMaster University; 1993.
9. Palisano RJ, Hanna SE, Rosenbaum PL, et al. Validation of a model of gross motor function for children with cerebral palsy. Phys Ther. 2000; 80: 974–985.
10. Mahoney G, Robinson C, Fewell RR. The effects of early motor intervention on children with Down syndrome or cerebral palsy: a field-based study. J Dev Behav Pediatr. 2001; 22: 153–162.
11. Hulley SB, Cummings SR, Browner WS, et al. Designing Clinical Research. 2nd ed. Philadelphia: Lippincott Williams & Wilkins; 2001.
12. Bayley N. Bayley Scales of Infant Development. New York: The Psychological Corporation; 1969.
13. Folio MR, Fewell RR. Peabody Developmental Motor Scales and Activity Cards. Allen, TX: DLM Teaching Resources; 1983.
14. Sparrow SS, Balla DA, Cicchetti DV. Vineland Adaptive Behavior Scales. Circle Pines, MN: American Guidance Service; 1984.

Appendix
Description of GMFCS for Children with Cerebral Palsy Before the Second Birthday and from Age Two Years to the Fourth Birthday



reproducibility of results; evaluation study; severity of illness index; child; activities of daily living; age factors; cerebral palsy; disabled persons/classification; motor skills/classification

© 2003 Lippincott Williams & Wilkins, Inc.