“Cerebral palsy (CP) is a group of disorders of the development of movement and posture, causing activity limitations, that are attributed to nonprogressive disturbances that occurred in the developing fetal or infant brain.”1 (p9) The current literature provides a variety of information on the development of movement in individuals with CP; however, little clinically applicable knowledge is available regarding the postural component of these disorders. Because CP affects a large number of individuals, about 2 to 2.5 per 1000 live births,2 a thorough understanding of all characteristics of CP is important.
Little information is available on tests of postural stability suitable for children in both the sitting and standing positions across all levels of the Gross Motor Function Classification System (GMFCS)3 in clinical or community settings. The method of examining the postural stability of children with CP has largely been restricted to laboratories.4,5 When evaluating postural stability, simply altering sensory conditions or perturbing individuals in a laboratory setting to identify changes in body structure and function is not enough. At a minimum, results must be able to be generalized to these children's everyday environments while participating in everyday activities, through examining postural stability in a functional manner.
At the time this study was conducted, no standardized clinical assessment of postural stability that could accommodate children across all levels of the GMFCS was available for regular use in practice. As described elsewhere,6 no existing measure provides a comprehensive, reliable, and valid assessment over time of the postural stability of children with CP and varying functional abilities. Specific purposes of measures include discrimination, prediction, and evaluation,7 and different psychometric requirements accompany each purpose. Construct validity and interrater reliability are important for all these purposes. Test-retest reliability is especially important for discriminative and predictive purposes. Evidence of responsiveness is essential for evaluative purposes. Two recently developed measures, the Pediatric Reach Test (PRT)8 and the Early Clinical Assessment of Balance (ECAB),6 have the potential to be a reference standard clinical measurement of postural stability for children with CP because they both can be used across all GMFCS levels. Evidence of validity has been established for both measures; however, evidence of reliability has been established only for the PRT. Ideally, one of these measures would provide a standard that can be used to discriminate among and predict future outcomes of children with CP, which was our primary interest at the time this study was conducted; however, we were also interested in a preliminary evaluation of the ability to use the measures to detect change.
The primary purpose of this study was to compare the psychometric properties of the PRT and the ECAB by determining the validity and reliability of both measures. Specifically, we aimed at determining the construct validity between the measures, interrater and test-retest reliabilities of the 2 measures, and their associations with the Gross Motor Function Measure, 66-item version, Basal and Ceiling approach (GMFM-66-B&C).9 A secondary purpose was to determine the responsiveness of each measure to change, and a final purpose was to determine time to complete each test. In this study, we used the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines,10 which recommend the term construct validity for the examination of the degree to which the scores of an instrument are related to the scores on other instruments. We hypothesized that the PRT and the ECAB would be moderately correlated (rs = 0.60-0.80) and have high interrater and test-retest reliabilities (intraclass correlation coefficient [ICC] > 0.90). Furthermore, we hypothesized that the PRT and the ECAB would have a moderate (rs = 0.60-0.80) and strong (rs ≥ 0.90) relationship, respectively, with the GMFM-66-B&C. The measure with the highest reliability and strongest correlation with the GMFM-66-B&C would be considered optimal, in the context of the ability of each measure to detect change over time, determined by the standard error of measurement (SEM) and the minimal detectable change (MDC), as well as the time to complete each measure.
This study was a measurement study comprising validity and reliability components. Approval was obtained from Western University (Research Ethics Board) and Thames Valley Children's Center (Research Advisory Committee); informal approval was obtained from each additional site, per site requirements, which accepted Western University's approval. All parents provided written consent for their children to participate.
Children between the ages of 2 and 7 years (ie, up to the eighth birthday) with a primary diagnosis of CP were eligible to participate in this study. Children with significant health comorbidities, dual diagnoses (eg, CP and autism), or who had orthopedic surgery within the 4 months prior to the first assessment were excluded.
A convenience sample of 28 children (mean age = 56 [SD = 23.7] months) was recruited by participating physical therapists (PTs) from 6 children's rehabilitation centers in Southwestern Ontario. All children participated in each aspect of the study. A summary of the children's age, sex, distribution of involvement, and GMFCS level is presented in Table 1.
Sample Size Calculation
A planned sample size of 30 participants would provide a power value of 0.95 to detect an ICC of greater than 0.80.11 Power is slightly lower with the obtained sample of 28 children.
Postural stability was measured using 2 instruments. The PRT8 was developed as a simple and time-efficient discriminative tool that measures the distance in centimeters that a child is able to reach forward and sideways from a fixed base of support without loss of balance. It incorporates aspects of balance in both the sitting and standing positions to accommodate children across the spectrum of GMFCS ability levels. Concurrent validity was supported with the observation of moderate to strong correlations between the standing section of the PRT and reference standard laboratory tests of limits of stability (ie, the threshold point at which a child needs to take a step when leaning forward, backward, or sideward to prevent a fall) when tested on children who were developing typically (r = 0.42-0.77). Construct validity was supported with a strong correlation between the total PRT score and the GMFCS level (r = −0.88) in a sample of children with CP. Test-retest and interrater reliability ICCs ranged from 0.54 to 0.88 and 0.50 to 0.93, respectively.8
The second instrument, the ECAB6 (available at http://www.canchild.ca/en/ourresearch/moveplay.asp), was developed through integration of 2 existing balance measures to accommodate children with CP with varying levels of involvement across the GMFCS. Part 1 of the ECAB tests early postural stability via reactive and anticipatory head stability and trunk stability in floor sitting, whereas part 2 tests anticipatory postural stability starting with bench sitting, then transfers, and static and dynamic standing.6 Construct and discriminative validities were supported with strong correlations between the ECAB and the GMFM-66-B&C (r = 0.95) and significant differences among the GMFCS levels, respectively.6 Evidence of reliability of the ECAB has not yet been established.
Basic Motor Abilities
Basic motor abilities were measured using the GMFM-66-B&C,9 which is an officially recognized version of the GMFM-66. (The second version of the Gross Motor Ability Estimator [GMAE] is available by searching for GMAE-2 at www.canchild.ca, in which the B&C version is one of several scoring options.) To administer the B&C approach, a modified score sheet has been developed, on which items have been ordered by difficulty level, on the basis of the Rasch analysis12 (available at http://www.canchild.ca/en/ourresearch/moveplay.asp). The GMAE is used to calculate a total score.13 Concurrent validity was supported with a strong relationship between the GMFM-66-B&C and the GMFM-66 (ICC = 0.987; 95% confidence interval [CI] = 0.972-0.994). Test-retest reliability was evident with an ICC of 0.994 (95% CI = 0.987-0.997).9 Further evidence of criterion validity was obtained in a separate study, in which basal and ceiling items were extracted from an existing database of full GMFM-66 scores.14 Correspondence between the GMFM-66 and the GMFM-66-B&C scores was high (ICC = 0.99; 95% CI = 0.988-0.993).
Prior to data collection, all PT assessors received a training booklet with a summary of the study, a figure containing the study design, and the standardized item administration guidelines for both measures. The first author completed criterion testing for the GMFM-66 and the GMFCS, attaining the standard of greater than 80% item agreement with the reference standard, established by the larger study team.15 All recruited children were assessed twice; a 2-week time interval occurred between each session to minimize recall of the instrument scores by the assessors and to minimize the children's potential motor progress. During the initial visit, the first author and a second assessor (ie, a PT from 1 of the 6 centers at which the children received rehabilitation services) each administered the PRT and the ECAB to determine interrater reliability of the 2 measures. The order of testing was variable because of (1) preference of the assessing PT, relating to wanting to see how the measure was administered, (2) child behavioral factors such as shyness, and (3) physical attributes such as extreme spasticity and issues related to safety. This visit took a maximum of 1 hour. During the second visit, the assessors were randomly allocated to administer either the PRT or the ECAB to determine test-retest reliability. In addition, the first author administered the GMFM-66-B&C to determine construct validity of each balance measure. This visit also took a maximum of 1 hour. During each visit, the time taken to administer and complete each postural stability test was recorded.
Scores for the PRT are in centimeters. Summed ECAB subscale scores comprise a total raw score. The GMFM-66 score is the score established using the GMAE. Possible scores on both the ECAB and the GMFM-66 are 0 to 100. Construct validity between the PRT and the ECAB was determined using the Spearman rho. Interrater and test-retest reliabilities between and within the postural stability measures were examined using the ICC (2,1). Construct validity of the PRT and the ECAB with the GMFM-66-B&C was determined using the Spearman rho. Comparison of time to completion of the 2 measures of the postural stability was determined at the second data collection point with a paired t test. An alpha level of 0.05 was used for each statistical test.
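As an illustration of the reliability index named above, the following is a minimal sketch of ICC (2,1), the two-way random-effects, absolute-agreement, single-rater form, computed from the mean squares of a subjects-by-raters table. The function name `icc_2_1` and the example scores are hypothetical and are not the study's data or analysis code.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per child and one column per rater.

    Uses the two-way ANOVA mean squares: between subjects (rows),
    between raters (columns), and residual error.
    """
    n = len(scores)        # number of children
    k = len(scores[0])     # number of raters (or test occasions)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores for 5 children rated by 2 assessors (not study data).
example = [[40, 42], [55, 53], [12, 15], [78, 80], [33, 30]]
print(round(icc_2_1(example), 3))
```

Because this form penalizes systematic differences between raters as well as random error, it suits a design in which different therapists may score the same child.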
Responsiveness of both measures was explored using the MDC, the calculation of which requires the SEM. The SEM provides an estimate of the amount of measurement error in an individual's observed test score. It was calculated for each measure using the test-retest reliability estimates from different raters during the second data collection point to increase generalizability, using the following formula: SEM = SD × √(1 − reliability estimate), where SD is the standard deviation of the scores. The 95% CI around a test score is determined by adding and subtracting 1.96 times the SEM to and from the test score. The MDC is the minimal amount of change that is not likely to be due to chance variation in measurement16; specifically, the change score needs to be more than that accounted for by potential measurement error. The MDC at the 95% confidence level is calculated by multiplying the SEM by 1.96 and multiplying the resulting value by the square root of 2 to account for the additional measurement error incurred on the 2 test occasions.16
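The two formulas described in this paragraph can be sketched as follows. The function names `sem` and `mdc95` are illustrative, and the standard deviation and reliability passed to `sem` in the usage lines are hypothetical inputs, because the standard deviations used in the study are not restated here.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD times sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def mdc95(sem_value):
    """Minimal detectable change at the 95% confidence level.

    1.96 scales the SEM to a 95% bound; sqrt(2) accounts for the
    measurement error incurred on each of the 2 test occasions.
    """
    return 1.96 * math.sqrt(2.0) * sem_value


# Hypothetical inputs (not study data): SD = 10, reliability = 0.75.
print(sem(10.0, 0.75))         # 5.0

# Applying mdc95 to the rounded SEM values reported in this article:
print(round(mdc95(16.8), 2))   # 46.57 (PRT, centimeters)
print(round(mdc95(3.6), 2))    # 9.98  (ECAB, raw score points)
```

The small difference between 46.57 and the reported PRT value of 46.5 is presumably due to rounding of the SEM before it is multiplied.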
The summary scores for both of the postural stability measures and the GMFM-66-B&C are presented in Table 2. For all postural stability measures, a full range of the score distributions was observed, providing evidence that the measures did detect variation among the functionally heterogeneous group of children.
To establish construct validity between the PRT and the ECAB, we had 4 possible combinations of scores to consider. The first author was always the first rater and the participating PTs were the second raters. To maximize the generalizability of this finding, we report the correlation between the scores of the 2 measures obtained by the 6 participating PTs on the 28 participants at the first study visit. A Spearman correlation of 0.88 (P < .001) was obtained.
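For readers unfamiliar with the statistic, the Spearman correlation is the Pearson correlation computed on ranks, with tied scores sharing the mean of their ranks. The sketch below assumes this standard definition; the helper names and the example scores are hypothetical and are not the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical PRT (cm) and ECAB (raw) scores for 5 children (not study data).
prt = [12.0, 30.5, 55.0, 41.0, 90.0]
ecab = [20.0, 15.0, 60.0, 48.0, 85.0]
print(round(spearman_rho(prt, ecab), 2))
```

A rank-based coefficient is a reasonable choice here because the two measures use different units (centimeters and raw points) and their relationship need not be linear.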
Table 3 contains the interrater and test-retest reliability coefficients and the construct validity with the GMFM-66-B&C for both the PRT and the ECAB. Because the ICC values are so high, 3 decimal places are reported. For interrater and test-retest reliabilities (different raters only), the ECAB demonstrated higher reliability than the PRT, as demonstrated by the nonoverlapping 95% CIs (Table 3). Nonoverlapping 95% CIs indicate statistically significant differences in the magnitude of reliability coefficients between the measures. For test-retest reliability (same rater), a trend to higher reliability for the ECAB was noted (ie, a very small 95% CI overlap was detected). The strength of association of each measure to the GMFM-66-B&C did not differ meaningfully.
The SEM was higher for the PRT than that for the ECAB (16.8 vs 3.6). The MDC (95% CI) showed a similar pattern with values of 46.5 and 10.0 for the PRT and the ECAB, respectively. These values are in centimeters and raw score values for the 2 measures, respectively.
A paired-samples t test was conducted to compare the time to completion for the PRT and the ECAB at the second data collection point (Table 4). The second data collection point has greater generalizability and is a more stable estimate as a result of the postlearning effect. A significant difference was found between the completion times, with the PRT taking significantly less time (6.4 minutes, on average) to complete.
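The test statistic behind a paired-samples comparison of this kind can be sketched as the mean of the within-child differences divided by its standard error. The function name `paired_t` and the completion times below are hypothetical; they are not the values in Table 4, and the sketch stops at the statistic and its degrees of freedom without computing a P value.

```python
import math


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two matched lists."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical completion times in minutes for 4 children (not study data).
ecab_minutes = [12.0, 15.0, 11.0, 14.0]
prt_minutes = [6.0, 8.0, 5.0, 9.0]
t_stat, df = paired_t(ecab_minutes, prt_minutes)
print(round(t_stat, 2), df)
```

Pairing is appropriate because each child completed both measures, so the comparison is made within children instead of between two independent groups.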
The purpose of this study was to evaluate the reliability of the PRT and the ECAB measures of postural stability of young children with CP and their construct validity (with respect to associations between the 2 postural stability measures and each of them with the GMFM-66-B&C). Overall, on the basis of the results, both tests demonstrated strong psychometric properties; however, on comparison, the ECAB demonstrated stronger validity and reliability than the PRT, with lower measurement error and the potential to be the better measure to detect change over time. The PRT, however, performed better than expected, on the basis of stronger than hypothesized correlations with the GMFM-66-B&C.
This study illustrated that the PRT and the ECAB were strongly correlated. It was hypothesized that the PRT and the ECAB would be moderately correlated (rs = 0.60-0.80), but the observed correlation was higher (rs = 0.88). A decision about the preferred measure cannot be made from this piece of evidence.
The ECAB demonstrated excellent consistency; the interrater and test-retest reliability assessments all resulted in ICCs greater than 0.986, which was consistent with the hypothesis of obtaining ICCs greater than 0.90. The PRT also had high test-retest reliability with the same rater (0.940); however, point estimates for test-retest with different raters and interrater reliability were lower than hypothesized (0.884 and 0.874, respectively). The lack of overlapping CIs between the PRT and the ECAB demonstrates that the reliability indices are statistically significantly different, with the ECAB demonstrating greater reliability.
The PRT and the ECAB have good construct validity with respect to associations with the GMFM. Children who obtained higher scores on the postural stability measures also attained a higher score on the GMFM. As postural stability is known to modulate gross motor functional ability, it is reasonable to consider that both the ECAB and the PRT can provide a good indication of the child's gross motor status.
The time to complete each test was compared on the second test occasion to increase generalizability (ie, by controlling, to some extent, for the second raters' learning of both measures). The ECAB took almost twice as long as the PRT to administer; however, it is important to note that both measures were completed efficiently.
The primary interest of this study was the use of the measures for discriminative and predictive purposes; however, exploration of their ability to detect change over time (ie, to serve potentially an evaluative purpose) was also warranted. The effects of maturation and/or physical therapy on individuals are often measured by change in scores over time.16 Mean change scores may provide information about the statistical significance of a change; however, this index does not provide information about the meaningfulness of the change. Understanding whether the change in scores is clinically significant or due to an error of measurement is important. This aspect of interpretation can be explored using this study's data by establishing the SEM and the MDC.
The SEM was higher for the PRT than that for the ECAB (16.8 vs 3.6). To illustrate the utility of these values, we use the median values of 38.0 and 40.3 for the PRT and the ECAB, respectively, as estimates of an obtained score. For the PRT, one can be 95% confident that the actual score is between 5.1 and 70.9, when taking measurement error into account. For the ECAB, one can be 95% confident that the actual score is between 33.2 and 47.4. The ECAB generates a more precise estimate of postural stability than the PRT, thus contributing to its clinical utility.
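The intervals quoted in this paragraph follow directly from adding and subtracting 1.96 times the SEM to and from the observed score. A minimal sketch of that arithmetic, using the median scores and SEM values reported above (the function name `ci95` is illustrative):

```python
def ci95(score, sem_value):
    """95% confidence interval around an observed score, given the SEM."""
    half_width = 1.96 * sem_value
    return (score - half_width, score + half_width)


prt_low, prt_high = ci95(38.0, 16.8)    # PRT median score and SEM (cm)
ecab_low, ecab_high = ci95(40.3, 3.6)   # ECAB median score and SEM (raw)

print(round(prt_low, 1), round(prt_high, 1))     # 5.1 70.9
print(round(ecab_low, 1), round(ecab_high, 1))   # 33.2 47.4
```

The PRT interval spans about 66 cm, whereas the ECAB interval spans about 14 raw score points, which is the basis for describing the ECAB estimate as more precise.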
The MDC was 46.5 and 10.0 for the PRT and the ECAB measures, respectively. This represented approximately 25% and 10% of the possible range of scores, for the PRT and the ECAB, respectively (see Table 2, T2 column for the estimates of range used for this interpretation; 0 to 200 for the PRT and 2 to 100 for the ECAB). This means that a child would need to change by approximately 50 cm of total reach (ie, approximately 25% of the possible range of scores for children of this age) for a change over and above measurement error to be declared. If the PRT is used as an outcome measure, PTs should exercise caution when interpreting changes in PRT scores over time. The ECAB required a more reasonable 10% of the total range of scores for a true change to be detected. This provides preliminary evidence that the ECAB is potentially better than the PRT in detecting change over time.
The ECAB is a more easily administered measure because the performance is evaluated and then scored on the basis of predetermined criteria. The PRT is more difficult to administer because the clinician has to identify the starting point and end point of the reaching task, and this can vary depending on the skill level of the assessor and compliance of the child with test procedures. The ECAB was easier for the raters to learn. Conversely, the therapists had a more difficult time learning the PRT and as a result, anecdotally, indicated preference for the ECAB as compared with the PRT.
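The proportions discussed in this paragraph can be checked by dividing each MDC by the width of the score range used for interpretation. A minimal sketch, using the MDC values and ranges stated above (the function name is illustrative):

```python
def mdc_percent_of_range(mdc, low, high):
    """MDC expressed as a percentage of the width of the score range."""
    return 100.0 * mdc / (high - low)


prt_pct = mdc_percent_of_range(46.5, 0.0, 200.0)    # PRT range: 0 to 200 cm
ecab_pct = mdc_percent_of_range(10.0, 2.0, 100.0)   # ECAB range: 2 to 100

print(round(prt_pct, 1))    # 23.2
print(round(ecab_pct, 1))   # 10.2
```

With these inputs the PRT figure is about 23%, which the text rounds to approximately 25%, and the ECAB figure is about 10%.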
Both measures were difficult to administer to younger children, specifically, children aged 2 to 3 years. This was believed to be due to the complexity of the instructions for both measures. For example, it was difficult to illustrate to the young children that they must keep their feet at a fixed point when reaching in the PRT, even when using the suggested “footprints.” The later items in the ECAB were difficult to administer, such as instructing the child to keep his or her eyes closed for 10 seconds while maintaining standing balance or turning 360° in one direction, stopping, and then turning 360° in the other direction.
Evidence of responsiveness of both measures is limited to the SEM and the MDC. A longitudinal study investigating change over time is warranted to fully explore this psychometric property.
When comparing the psychometric properties of both the PRT and the ECAB, the ECAB is considered to be a more reliable and clinically useful measure of postural stability in young children with CP for the purposes of discrimination and prediction. In addition, the calculations of the SEM and the MDC provide preliminary evidence that the ECAB provides a more precise estimate and has better potential to detect change. Finally, the therapists, at least anecdotally, preferred the ECAB because of ease of learning and administration.
1. Rosenbaum P, Paneth N, Leviton A, Goldstein M, Bax M. A report: the definition and classification of cerebral palsy April 2006. Dev Med Child Neurol. 2007;109(suppl):8–14.
2. Rosen MG, Dickinson JC. The incidence of cerebral palsy. Am J Obstet Gynecol. 1992;167:417–423.
3. Palisano RJ, Rosenbaum P, Walter S, Russell D, Wood E, Galuppi B. Development and reliability of a system to classify gross motor function in children with cerebral palsy. Dev Med Child Neurol. 1997;39(4):214–223.
4. Brogren Carlberg E, Hadders-Algra M. Postural control in sitting children with cerebral palsy. In: Hadders-Algra M, Brogren Carlberg E, eds. Postural Control: A Key Issue in Developmental Disorders. London, UK: Mac Keith Press; 2008:74–96.
5. Woollacott MH, Crenna P. Postural control in standing and walking in children with cerebral palsy. In: Hadders-Algra M, Brogren Carlberg E, eds. Postural Control: A Key Issue in Developmental Disorders. London, UK: Mac Keith Press; 2008:97–130.
6. McCoy SW, Bartlett DJ, Yocum A, et al. Development and validity of the early clinical assessment of balance for young children with cerebral palsy [published online ahead of print October 2, 2013]. Dev Neurorehabil. doi: 10.3109/17518423.2013.827755.
7. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis. 1985;38:27–36.
8. Bartlett DJ, Birmingham T. Validity and reliability of a Pediatric Reach Test. Pediatr Phys Ther. 2003;15:84–92.
9. Brunton LK, Bartlett DJ. Validity and reliability of two abbreviated versions of the Gross Motor Function Measure. Phys Ther. 2011;91:577–588.
10. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737–745.
11. Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med. 1987;6:441–448.
12. Avery LM, Russell DJ, Raina PS, Walter SD, Rosenbaum PL. Rasch analysis of the Gross Motor Function Measure: validating the assumptions of the Rasch model to create an interval-level measure. Arch Phys Med Rehabil. 2003;84:697–705.
13. Russell DJ, Rosenbaum PL, Avery LM, Lane M. Gross Motor Function Measure (GMFM-66 & GMFM-88) User's Manual. London, UK: Mac Keith Press; 2002.
14. Avery LM, Russell DJ, Rosenbaum PL. Criterion validity of the GMFM-66 item set and the GMFM-66 basal and ceiling approaches for estimating GMFM-66 scores. Dev Med Child Neurol. 2013;55:534–538.
15. Bartlett DJ, Chiarello LA, McCoy SW, et al. The Move & PLAY study: an example of comprehensive rehabilitation outcomes research. Phys Ther. 2010;90:1660–1672.
16. Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Phys Ther. 2006;86:735–743.