One of the primary reasons physical therapists use standardized tests and diagnostic screening measures is to objectively identify delays in the attainment of motor skills among designated populations.1 According to the Individuals with Disabilities Education Improvement Act of 2004 (IDEA),2 the term “infant or toddler with a disability” is defined as follows:
an individual under 3 years of age who needs early intervention services because the individual is experiencing developmental delays measured by appropriate diagnostic instruments and procedures in 1 or more of the areas of cognitive development, physical development, communication development, social or emotional development, and adaptive development; or has a diagnosed physical or mental condition that has a high probability of resulting in developmental delay.2(p118)
IDEA emphasizes the need to enhance the development of infants and toddlers with disabilities in order to minimize their potential for developmental delay.2 Two diagnostic scales that are designed to screen infants and children for developmental delays and are frequently used by pediatric physical therapists are the Peabody Developmental Motor Scales, Second Edition (PDMS-2)3 and the Bayley Scales of Infant and Toddler Development, Third Edition (Bayley-III).4
Both tests were developed to assess young children's motor skills. The PDMS-2 is a revised and improved version of the original Peabody Developmental Motor Scales (PDMS).5 The PDMS-2 is well established and widely used by physical therapists to assess motor skills in children attending early intervention programs because of its ease of administration, its scoring criteria, and the process used to establish its reference data.
The Bayley-III represents a revision of the Bayley Scales of Infant Development, Second Edition (BSID-II).6 The goals of the revision4 were to (1) update reference data, item administration, and stimulus materials; (2) develop 5 distinct scales (Cognitive, Language, Motor, Social-Emotional, and Adaptive Behavior); (3) strengthen the psychometric quality of the instrument; (4) improve the clinical utility of the instrument; and (5) simplify administration procedures.
The concurrent validity of the BSID-II (the predecessor of the Bayley-III) with the PDMS-2 has been assessed and questioned.7–9 Provost et al7 assessed children with developmental delays between the ages of 3 and 41 months. They found moderate to high correlations between the BSID-II Motor Scale and the PDMS-2 standard scores, and high to very high correlations between the BSID-II Motor Scale and the PDMS-2 subtests for age-equivalent (AE) scores. However, when the psychomotor development index (PDI), a standard score converted from the raw scores on the Motor Scale, was considered, more than 75% of the children who scored as significantly delayed (PDI ≤ 69) on the BSID-II Motor Scale did not score as significantly delayed (Total Motor Quotient [TMQ] ≤ 69) on the PDMS-2.7 Of the children who scored within age-appropriate limits or above on the BSID-II Motor Scale, 80% to 95% were categorized as developing typically on the PDMS-2.7 Nevertheless, among the children who were developing appropriately, only 41% to 59% of the scores on the PDMS-2 were comparable to those on the BSID-II.7 Furthermore, half of the children who scored as developing typically on the PDMS-2 were considered delayed on the BSID-II Motor Scale.
In a similar study, Connolly et al8 evaluated 12-month-old infants who were developing typically. They found low and nonsignificant correlations between quotient scores on the PDMS-2 and the BSID-II Motor Scale, as well as between the PDI and each of the Fine Motor Quotient (FMQ), the Gross Motor Quotient (GMQ), and the TMQ. However, Connolly et al found a high and significant correlation between the AE scores of the PDMS-2 Gross Motor subtest for locomotion and the BSID-II Motor Scale (r = 0.71, P < .05).8 In contrast, correlations between the AE scores of the BSID-II Motor Scale and those of the other PDMS-2 subtests were low and nonsignificant: the Fine Motor subtest for grasp (r = 0.13), the stationary Gross Motor subtest (r = 0.28), the Fine Motor subtest for visual motor integration (r = −0.29), and the Gross Motor subtest for object manipulation (r = 0.41).
One of the primary purposes of the Bayley-III is to “identify children with developmental delay and to provide information for intervention planning.”10(p1) Previous studies of the BSID-II and the PDMS-2 showed differences in the clinical identification of children with developmental delays.7–9 These studies indicate the need for research to determine whether the revisions made to the Bayley-III strengthen its concurrent validity with other tools used to identify children with developmental delays. If 2 diagnostic tests have good concurrent validity and similar capacities to identify children at risk for developmental delays, clinicians would benefit from using the more quickly administered tool. A strength of the Bayley-III Motor Scale is that it has fewer items than the PDMS-2 and therefore requires less administration time.3,4 This may increase efficiency for the test administrator and decrease the probability of the child becoming fatigued and inattentive. In addition, the Bayley-III requires fewer testing materials than the PDMS-2, so the test administrator need not transport cumbersome materials.3,4 However, if 2 widely accepted diagnostic tests disagree, physical therapists face a dilemma: regardless of the test results, they must ultimately rely on their judgment and on other risk factors to decide on the best care for the child. Therapists must also consider the types of items in each test and their appropriateness for specific age groups or populations of children.
Our study examined concurrent validity between the Bayley-III Motor Scale and the PDMS-2 in children between birth and 26 months of age who have, or are at risk for, developmental delays. The following questions were addressed: (1) Do the AE scores on the Bayley-III Motor Scale (Gross Motor and Fine Motor subtests) correlate with the AE scores on the PDMS-2 Gross Motor and Fine Motor subtests? (2) Do the standard scores on the Bayley-III Motor Scale (composite score) correlate with the standard scores on the PDMS-2 GMQ, FMQ, and TMQ? and (3) Do the AE scores and standard scores for the Bayley-III Motor Scale (Gross Motor and Fine Motor subtests) and the AE scores and standard scores for the PDMS-2 agree in identifying children who meet clinically important criteria?
The participants in this study were infants and children ranging in age from 29 days to 25 months 10 days. They were recruited from Methodist-LeBonheur Hospital's Pediatric Department, University Therapists at the University of Tennessee Health Science Center, and Jackson-Madison County General Hospital's Kiwanis Center for Child Development. Each participant was enrolled in an early intervention program, and the participant's parent or legally authorized representative was first contacted by telephone by the director of the physical therapy department at the respective institution to determine interest in participating in the study. The inclusion criteria were a corrected age of birth to 26 months, previous identification by a pediatrician as developmentally delayed or at risk for developmental delays, and receipt of services at 1 of the data collection sites. Exclusion criteria were use of medications that might impair motor ability, English as a second language, or a current illness that would interfere with testing. In addition, the children were recruited on the basis of age (corrected age for infants born preterm) for placement into 1 of 4 age groups: below 6 months, 6 months to 12 months 15 days, 12 months 16 days to 18 months, and above 18 months. Table 1 presents descriptive data for the participants. Thirty-two male children (66.7%) and 16 female children (33.3%) were included in the sample. The gender distribution of our sample is representative of the 2008 Tennessee Child Count Data,11 published by the state of Tennessee for children receiving early intervention services under IDEA, in which 62.1% of the children were male. As noted in Table 1, the majority of the children were white (66.7%), and the distribution of race/ethnicity in this sample is representative of the ethnic diversity of the 2008 Tennessee Child Count Data.11
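Because placement into age groups depended on corrected age for infants born preterm, a short sketch of that bookkeeping may help. It rests on our own assumptions, not details reported in the study: age is corrected only for births before 37 weeks, corrected age is measured from a 40-week due date, and the “12 months 16 days to 18 months” group is read as running up to the day before 19 months (consistent with the oldest group starting at 19 months in the conclusions). All function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple
import calendar

# Assumptions (not reported in the study): correct only births before 37 weeks,
# and measure corrected age from a 40-week due date.
PRETERM_CUTOFF_WEEKS = 37
TERM_WEEKS = 40


def add_months(d: date, n: int) -> date:
    """Shift d by n calendar months, clamping to the last day of the target month."""
    total = d.month - 1 + n
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def months_days_between(start: date, end: date) -> Tuple[int, int]:
    """Return the age from start to end as (completed months, remaining days)."""
    if end < start:
        raise ValueError("end date precedes start date")
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months, (end - add_months(start, months)).days


def corrected_age(birth: date, test: date, gestational_weeks: int) -> Tuple[int, int]:
    """Age at testing, counted from the due date for infants born preterm."""
    if gestational_weeks < PRETERM_CUTOFF_WEEKS:
        weeks_early = TERM_WEEKS - gestational_weeks
    else:
        weeks_early = 0
    return months_days_between(birth + timedelta(weeks=weeks_early), test)


def age_group(months: int, days: int) -> str:
    """Place a (months, days) age into one of the study's 4 groups."""
    age = (months, days)  # tuples compare months first, then days
    if age < (6, 0):
        return "below 6 months"
    if age <= (12, 15):
        return "6 to 12 months 15 days"
    if age < (19, 0):  # hypothetical reading of the "to 18 months" boundary
        return "12 months 16 days to 18 months"
    return "above 18 months"


# Hypothetical infant: born at 32 weeks on 1 Jan 2009, tested on 1 Jul 2009.
age = corrected_age(date(2009, 1, 1), date(2009, 7, 1), gestational_weeks=32)
print(age, age_group(*age))
```

For this hypothetical infant, the chronological age is 6 months, but the corrected age is 4 months 5 days, which places the infant in the below 6 months group rather than the 6 months to 12 months 15 days group.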
All 48 children in the sample qualified for early intervention services by exhibiting developmental delays or by having a diagnosis that placed them “at risk” for developmental delay. The children presented with a wide variety of diagnoses, and almost half (48%) were born preterm. Diagnoses included cerebral palsy, genetic disorders, orthopedic impairments, neural tube defects, cardiac disorders, and developmental delay. Many of the children had multiple diagnoses.
No testing was done until all aspects of the study had been explained to the child's parent or legally authorized representative and their questions had been answered. All parents or legally authorized representatives signed an informed consent form before data collection. The study was approved by the University of Tennessee Institutional Review Board, the Methodist-LeBonheur Hospital Institutional Review Board, and the Jackson-Madison County General Hospital Institutional Review Board.
The PDMS-2 is a standardized diagnostic assessment used to measure a child's motor developmental status.12 It was designed to (1) estimate a child's motor competence; (2) compare gross and fine motor abilities to identify disparities between them; (3) provide qualitative and quantitative information about individual skills; (4) evaluate a child's progress; and (5) serve as a research tool.3 The test yields separate Gross Motor and Fine Motor quotients and comprises 6 subtests of motor ability (reflexes, stationary, locomotion, object manipulation, grasping, and visual-motor integration). The PDMS-2 also provides a TMQ. The PDMS-2 Examiner's Manual includes instructions for converting item scores on each subtest, and the Gross Motor, Fine Motor, and Total Motor scores, into standard scores, percentile ranks, and AE scores.
The PDMS-2 has been shown to have very good to excellent internal consistency (r = 0.89–0.97), test-retest reliability (r = 0.89–0.96), and interrater reliability (r = 0.96–0.99).3 According to the PDMS-2 Examiner's Manual, criterion-prediction validity was assessed by comparing PDMS-2 scores with scores on other accepted motor development tools, such as the Mullen Scales of Early Learning: AGS Edition.3,13 The resulting correlation coefficients ranged from r = 0.55 to 0.91.3 Validity was also examined for age differentiation: correlation coefficients for 12-month age intervals ranged from r = 0.80 to 0.93, indicating that subtest scores were associated with age, consistent with the developmental pattern of motor behaviors.3
The Bayley-III consists of 5 distinct scales, 3 scales that are administered to the infant or toddler by an evaluator (Cognitive scale, Language scale, and Motor scale) and 2 scales that are to be completed by the parent or main caregiver (the Social-Emotional scale and the Adaptive Behavior scale). The Language scale includes the receptive communication subtest and the expressive communication subtest. The Motor scale includes a Fine Motor subtest and a Gross Motor subtest. Types of scores that are available from the Bayley-III scales and subtests are scaled scores, composite scores, percentile ranks, confidence intervals, and developmental AEs. The Bayley-III composite scores are derived from the sum of subtest scaled scores.
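Both instruments report subtest scores on a metric with a mean of 10 and an SD of 3, and composite or quotient scores on a metric with a mean of 100 and an SD of 15. As a rough illustration only, the sketch below converts such scores to z scores and to normal-curve percentile ranks. The published manuals use empirical norm tables (including the tables that map summed scaled scores to composite scores), which this sketch does not reproduce, so its percentiles are approximations; the helper names are hypothetical.

```python
from statistics import NormalDist

# Score metrics shared by both instruments: subtest scaled/standard scores
# (mean 10, SD 3) and composite/quotient scores (mean 100, SD 15).
SUBTEST = NormalDist(mu=10, sigma=3)
COMPOSITE = NormalDist(mu=100, sigma=15)


def z_score(score: float, metric: NormalDist) -> float:
    """Distance of a score from the normative mean, in SD units."""
    return (score - metric.mean) / metric.stdev


def approx_percentile(score: float, metric: NormalDist) -> float:
    """Normal-curve percentile rank; the manuals' norm tables may differ slightly."""
    return 100 * metric.cdf(score)


print(z_score(7, SUBTEST), z_score(85, COMPOSITE))  # both 1 SD below the mean
print(round(approx_percentile(85, COMPOSITE), 1))   # about the 16th percentile
```

Expressing both scales in SD units is what allows a single cutoff, such as 1.5 SD below the mean, to be applied to the Bayley-III composite and the PDMS-2 TMQ alike.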
The Bayley-III Motor Scale demonstrates very high internal consistency (r = 0.92). Very high internal consistency was also found within the Fine Motor (r = 0.94) and Gross Motor (r = 0.98) subtests for special groups. Special group studies included children with high-incidence characteristics (eg, premature birth) and/or clinical diagnoses such as pervasive developmental disorder, Down syndrome, language impairment, small for gestational age, prenatal alcohol exposure, cerebral palsy, and intrapartum asphyxia. High test-retest reliability was found for 4 age groups (r = 0.79–0.84), and high interrater reliability was also reported (r = 0.76).4
Based on the data reported for the test, the Bayley-III Motor Scale appears to demonstrate content, construct, and concurrent validity.4 However, when scores on the Bayley-III Motor Scale were compared with scores of motor ability from several other motor development assessments, the correlations ranged from low to moderate, with the majority of moderate strength: PDMS-2 (Total Motor, r = 0.57; Fine Motor, r = 0.48; Gross Motor, r = 0.51); Adaptive Behavior Assessment System, Second Edition (Total Motor, r = 0.33; Fine Motor, r = 0.14; Gross Motor, r = 0.42); and Vineland Adaptive Behavior Scale, Interview Edition (Total Motor, r = 0.62).
Once a child was identified as meeting the study criteria, the caregiver was contacted by telephone to determine whether he or she was interested in participating. The caregiver's questions were answered, and a testing appointment was scheduled. Each child was tested in the physical therapy department at the clinical site where the child was receiving early intervention services. Guidelines for testing environments provided in the PDMS-2 and Bayley-III test manuals were followed to ensure consistency across the 3 clinical sites.3,10 The rooms were well lit, free of distractions, and large enough for the child to perform test activities, including walking and throwing. Testing time ranged from 1 to 2 hours, depending on the age and cooperation of the child. Each child wore comfortable clothing that allowed free movement and either shoes with nonskid soles or no shoes.
The PDMS-2 and the Bayley-III Motor Scale were administered to each participant by 2 examiners: one examiner administered the PDMS-2, and the other administered the Bayley-III Motor Scale. For children younger than 12 months, the reflexes, stationary, and locomotion Gross Motor subtests were used to determine the PDMS-2 GMQ and TMQ. For children 12 months of age or older, the stationary, locomotion, and object manipulation Gross Motor subtests were used to determine the GMQ and TMQ. The order of the tests was randomized, with a short rest between the 2 tests. Individual test items were scored during the testing session, but the quotient scores, percentile scores, and AE scores were not calculated until testing was completed. Scores were obtained for all children on all test items; therefore, the scoring of refused items was not an issue.
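The age-dependent choice of PDMS-2 Gross Motor subtests described above can be written as a small lookup. Beyond what the paragraph states, this sketch assumes that the FMQ is based on the grasping and visual-motor integration subtests, that the TMQ combines all Gross Motor and Fine Motor subtests administered, and that age means corrected age. It returns subtest names only, not scores, and the function names are hypothetical.

```python
from typing import Dict, List

# Assumption: the FMQ uses the grasping and visual-motor integration subtests.
FINE_MOTOR_SUBTESTS = ["grasping", "visual-motor integration"]


def gross_motor_subtests(age_months: float) -> List[str]:
    """PDMS-2 Gross Motor subtests used for the GMQ, by (corrected) age in months."""
    if age_months < 12:
        return ["reflexes", "stationary", "locomotion"]
    return ["stationary", "locomotion", "object manipulation"]


def quotient_subtests(age_months: float) -> Dict[str, List[str]]:
    """Subtests contributing to each PDMS-2 quotient (names only, no scoring)."""
    gross = gross_motor_subtests(age_months)
    fine = list(FINE_MOTOR_SUBTESTS)
    return {"GMQ": gross, "FMQ": fine, "TMQ": gross + fine}


print(quotient_subtests(8)["TMQ"])
print(quotient_subtests(14)["GMQ"])
```

Keeping the age rule in one function makes the switch at 12 months explicit: the reflexes subtest drops out and object manipulation enters, so the GMQ and TMQ for children on either side of that boundary are built from different subtests.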
Data were collected by 5 physical therapist students in the final academic semester of a professional doctor of physical therapy (DPT) program, 2 physical therapists enrolled in a transitional DPT program who had 25 and 30 years of clinical experience, and 1 physical therapist who is a pediatric clinical specialist. The 5 DPT students had completed all academic coursework, had previously used the PDMS-2 during supervised clinical experiences to assess infants and toddlers (both those developing typically and those with special needs), and had participated in 2 training sessions on the use of the Bayley-III. During data collection, the 5 DPT students were supervised by 1 of the 2 transitional DPT students or by the pediatric clinical specialist, all of whom had experience administering the PDMS-2 and the Bayley scales.
Before the study, a power analysis was conducted to estimate an appropriate sample size. Although there are no formal standards for power, most researchers use 0.80 as the standard for adequate power.14 The Pearson product moment correlation coefficient (r) is widely used as an effect size when paired quantitative data are available.15 Cohen15 proposed the following benchmarks for r: small, 0.10; medium, 0.30; and large, 0.50. We specified a medium effect size and a level of significance of α = 0.05 for the power analysis. In our power analysis, we found that only 9 subjects in each age band were needed to achieve 0.80 power using a 2-tailed test at α = 0.05 for the Pearson product moment correlation coefficient.14
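A common way to approximate the sample size needed to detect a population correlation ρ with a 2-tailed test uses the Fisher z transformation: n ≈ ((z(1 − α/2) + z(power)) / atanh ρ)² + 3. The sketch below implements that textbook approximation with Python's standard library. It is a generic illustration, not the procedure or software used for the original power analysis, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(rho: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for a 2-tailed test of H0: rho = 0.

    Uses the Fisher z approximation n = ((z_{1-alpha/2} + z_{power}) / atanh(rho))**2 + 3,
    rounded up to the next whole participant.
    """
    if not 0 < abs(rho) < 1:
        raise ValueError("rho must be nonzero and strictly between -1 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for power = 0.80
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(rho))) ** 2 + 3)


for label, rho in [("small", 0.10), ("medium", 0.30), ("large", 0.50)]:
    print(label, rho, n_for_correlation(rho))
```

With α = 0.05 and power = 0.80, this approximation gives 783, 85, and 30 participants for Cohen's small, medium, and large benchmarks, respectively. Because the inputs and software behind the reported power analysis are not specified, these figures need not match the per-band sample size given above.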
Data were analyzed using STATISTICA (data analysis software system), version 7.1.16 Descriptive statistics were used to describe sample demographics, including age, gender, ethnicity, and diagnosis, and to summarize the standard and AE scores on the Bayley-III Motor Scale and the PDMS-2. For the PDMS-2, each child's AE scores, GMQ, FMQ, and TMQ were determined. For the Bayley-III Motor Scale, the AE scores and the Motor composite score were determined. Standard scores were thus calculated for each child on both the Bayley-III Motor Scale and the PDMS-2.
The Pearson product moment correlation coefficient (r) was used to assess concurrent validity between the AE scores, and between the standard scores, of the 2 tests within each of the 4 age groups. The magnitude of each correlation was interpreted using descriptive terms for the strength of correlation coefficients,17 with 0.00 to 0.25 indicating little, if any, correlation; 0.26 to 0.49, low correlation; 0.50 to 0.69, moderate correlation; 0.70 to 0.89, high correlation; and 0.90 to 1.00, very high correlation. P values were also calculated. A level of significance of α = 0.05 was chosen to control for type I errors.
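The descriptive bands above can be applied mechanically once r is computed. The sketch below computes Pearson's r for paired scores and labels its magnitude with the study's terms. Two choices are our assumptions: |r| is rounded to 2 decimals so that values falling between the published bands (eg, 0.255) still receive a label, and the sign is ignored, so a negative coefficient is labeled by its magnitude. P values are not computed here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product moment correlation coefficient for paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r: float) -> str:
    """Label |r| with the study's descriptive terms (rounded to 2 decimals)."""
    m = round(abs(r), 2)
    if m <= 0.25:
        return "little, if any"
    if m <= 0.49:
        return "low"
    if m <= 0.69:
        return "moderate"
    if m <= 0.89:
        return "high"
    return "very high"


print(describe_strength(0.71), describe_strength(-0.29))  # high low
```

Applied to 2 of the coefficients reported by Connolly et al, r = 0.71 is labeled high and r = −0.29 is labeled low.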
Correlations between the AE scores of the Bayley-III Gross Motor and Fine Motor subtests and the PDMS-2 subtests for each age group are shown in Table 2. High to very high positive and significant correlations were found between all AE scores for the above 18 months age group. Moderate to high negative and significant correlations were found between the Fine Motor AE scores of the Bayley-III and the PDMS-2 grasp and visual motor subtest AE scores in children aged 6 months to 12 months 15 days. However, correlations between the Gross Motor AE scores of the Bayley-III and the PDMS-2 reflex, stationary, and locomotion subtest AE scores were low and nonsignificant for children aged 6 months to 12 months 15 days. Correlations between all AE scores for both the below 6 months and the 12 months 16 days to 18 months age groups were low and nonsignificant.
Correlations between quotient and composite scores of the Bayley-III and the PDMS-2 for each age group are presented in Table 3. For children younger than 6 months, a moderate and significant correlation was found between the Bayley-III composite score and the PDMS-2 TMQ as well as the FMQ. A moderate but nonsignificant correlation was present between the Bayley-III composite score and the PDMS-2 GMQ. For children 6 to 12 months 15 days, a high and significant correlation was seen between the Bayley-III composite and PDMS-2 TMQ and GMQ; the relationship with PDMS-2 FMQ, however, was low moderate and nonsignificant. A very high and significant correlation was found between all quotient scores for the 12 months 16 days to 18 months age group. The above 18 months age group demonstrated a high and significant relationship for all quotient scores.
The Bayley-III composite score and the PDMS-2 TMQ were compared for frequency of agreement in identifying children who scored within the average range and those who scored more than 1.5 standard deviations below the mean. These results are listed in Table 4. Overall, classifications based on the total scores agreed for 79.17% of the children. However, the Bayley-III identified 6 more children than the PDMS-2 as functioning more than 1.5 standard deviations below the mean: 1 more in the 6 months to 12 months 15 days age group, 4 more (33.3%) in the 12 months 16 days to 18 months age group, and 1 more in the above 18 months age group. There was 100% agreement within the below 6 months age group.
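The agreement analysis can be sketched as follows. The sketch assumes that both scores are on a mean-100, SD-15 metric, so that the 1.5 SD cutoff is 77.5, and it simplifies the study's 2 categories to “below the cutoff” versus “not below the cutoff.” With 48 children, 79.17% agreement corresponds to 38 concordant classifications. The helper names and the 4 sample score pairs are hypothetical.

```python
from typing import Sequence

# Assumption: both the Bayley-III composite and the PDMS-2 TMQ use mean 100, SD 15.
MEAN, SD = 100, 15
CUTOFF = MEAN - 1.5 * SD  # 77.5


def below_cutoff(score: float) -> bool:
    """True when a standard score is more than 1.5 SD below the mean."""
    return score < CUTOFF


def percent_agreement(bayley: Sequence[float], pdms: Sequence[float]) -> float:
    """Percentage of children classified the same way by both tests."""
    if len(bayley) != len(pdms) or len(bayley) == 0:
        raise ValueError("need paired, non-empty score lists")
    same = sum(below_cutoff(b) == below_cutoff(p) for b, p in zip(bayley, pdms))
    return 100 * same / len(bayley)


# Hypothetical scores for 4 children (Bayley-III composite, PDMS-2 TMQ).
print(percent_agreement([70, 90, 75, 100], [72, 95, 85, 60]))  # 50.0
print(round(100 * 38 / 48, 2))  # 79.17, ie, 38 of 48 concordant
```

Percent agreement counts disagreements in either direction equally, so it does not by itself show which test flags more children; that direction is what the per-group counts in the paragraph above describe.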
The identification of infants who are at risk for developmental delays or who are already experiencing delays is an important function of early intervention programs under IDEA. According to the American Academy of Pediatrics, early intervention is considered the key to minimizing the long-term effects of developmental delay.18 Our data support the concurrent validity of the Bayley-III composite scores with the PDMS-2 TMQ scores. However, for infants younger than 6 months, only moderate correlations were found between the Bayley-III composite scores and the PDMS-2 GMQ and FMQ, with a lower, nonsignificant correlation for the GMQ and a slightly higher, significant correlation for the FMQ. Furthermore, for children aged 6 months to 12 months 15 days, the correlation between the Bayley-III composite score and the PDMS-2 FMQ was only low moderate. These findings raise concerns about using the Bayley-III as a substitute for the PDMS-2 when assessing Gross Motor skills in infants younger than 6 months or Fine Motor skills in children aged 6 months to 12 months 15 days.
In addition, the results of the study indicate concurrent validity for AE scores between the Bayley-III and the PDMS-2 in children older than 18 months but do not support the concurrent validity of AE scores between these 2 tests for the 3 younger age groups. An exception was found for the 6 months to 12 months 15 days age group, in which moderate to high negative correlations were seen between the Bayley-III Fine Motor AE scores and the PDMS-2 grasp and visual motor AE scores. We believe that these negative correlations may be due to several factors, such as when the reference data were collected, the inclusion of children with specific clinical diagnoses in the reference data, and differences in age intervals. The Bayley-III reference data were developed from a sample collected between January 2004 and October 2004, whereas the PDMS-2 reference data were based on a sample collected during winter 1997 and spring 1998.3,4 Children's skill sets may have changed during the intervening years, and the Bayley-III may represent more current reference data. The Bayley-III reference data also included the scores of children with specific clinical diagnoses to enhance representativeness, which may have contributed to the differences in AE scores.4 Lastly, the reference data for the 2 tests may use different AE intervals, which might contribute to the negative correlations between AEs. We are unable to explain why the children older than 18 months had much higher positive correlations, but we speculate that motor skills may change less rapidly during this age period than at earlier ages. In addition, the children older than 18 months had been enrolled in early intervention programs for a longer period of time.
Our findings should raise concern about the use of AE scores for children younger than 18 months in determining whether early intervention services should be provided. Every state has a Part C program under IDEA for children from birth through 2 years of age and their families, and each state sets its own eligibility rules. For example, in some states, the term “infants and toddlers with disabilities” refers to children who are functioning at least 25% below their chronological age in 2 major skill areas (cognitive, motor, communication, social, or adaptive) or who show a 40% delay in 1 area.19 Some states, however, consider information from the child's doctor as well as the results of a developmental test to determine whether a child meets the eligibility criteria.19 Therefore, based on our findings, if only AE scores were considered, a child younger than 18 months might be eligible for early intervention services if the Bayley-III were used but not if the PDMS-2 were used. In other cases, the child might meet the criteria based on the PDMS-2 but not on the Bayley-III.
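The example eligibility rule can be expressed with the commonly used percent-delay formula, percent delay = 100 × (chronological age − AE) / chronological age. The sketch below assumes that formula, ages in months, and “at least” thresholds (25% or more in 2 areas, or 40% or more in 1 area). Actual state rules differ, and the function names and scores in the example are hypothetical.

```python
from typing import Dict


def percent_delay(chronological_months: float, ae_months: float) -> float:
    """Delay as a percentage of chronological age: 100 * (CA - AE) / CA."""
    if chronological_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * (chronological_months - ae_months) / chronological_months


def meets_example_rule(chronological_months: float, ae_by_area: Dict[str, float]) -> bool:
    """Hypothetical state rule: >= 25% delay in 2 areas or >= 40% delay in 1 area."""
    delays = [percent_delay(chronological_months, ae) for ae in ae_by_area.values()]
    return sum(d >= 25 for d in delays) >= 2 or any(d >= 40 for d in delays)


# Hypothetical 12-month-old with a communication AE of 9 months and a motor AE
# that differs by 1 month depending on which test supplied it.
print(meets_example_rule(12, {"communication": 9, "motor": 9}))   # True
print(meets_example_rule(12, {"communication": 9, "motor": 10}))  # False
```

In this hypothetical case, a motor AE of 9 months gives a 25% delay in 2 areas and meets the rule, whereas a motor AE of 10 months (about a 16.7% delay) does not, so a 1-month difference between the 2 tests' AE scores would change the eligibility decision.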
Previous research by Connolly et al8 on 12-month-old children who were developing typically did not support concurrent validity between standard scores of the earlier BSID-II and the PDMS-2. They also found a lack of agreement between the AE scores of the BSID-II Motor Scale and the PDMS-2 subtests, except for locomotion. The results of our study suggest that the revisions made to the BSID-II in creating the Bayley-III have improved the correlations with the PDMS-2 for standard scores and for some AE scores. Because of the stronger correlations found in our study, we suggest that quotient scores from the PDMS-2 or the Bayley-III be used when the purpose of testing is to determine eligibility for early intervention services. However, further research is needed to assess concurrent validity for both standard scores and AE scores in children aged 12 months and younger, and for AE scores in children aged 12 months 16 days to 18 months.
Our research supports the use of the Bayley-III composite score as a substitute for the PDMS-2 when only the TMQ is considered. Data analysis revealed 79.17% agreement between these 2 test scores in identifying children with average scores and those scoring more than 1.5 standard deviations below the mean. Because the Bayley-III requires less administration and scoring time, it could provide clinicians with a more time-efficient means of assessment.
Two strengths of this study were the inclusion of children identified as developmentally delayed or at risk for developmental delay and the wide variety of these children's diagnoses. However, a question remains about the severity of the developmental delays of the children tested and its effect on the concurrent validity of the Bayley-III and the PDMS-2.
The majority of children in our sample who were 12 months of age or younger scored in the average range on the Bayley-III composite and/or the PDMS-2 TMQ. Many of the children older than 12 months scored more than 1.5 standard deviations below the mean on one or both tests. Because the quotient score correlations were stronger in the groups older than 12 months, substituting the Bayley-III for the PDMS-2 may be more clinically appropriate for children with a suspected developmental delay than for younger infants who are only at risk for delay. Alternatively, neither of these tests may be the most appropriate for infants younger than 6 months who are at risk for developmental delay; as noted in our data, all participants younger than 6 months scored in the average range. Spittle et al20 performed a systematic review of 9 assessment tools (including the Bayley-III and the PDMS-2) used during the first year of life to discriminate, predict, or evaluate the motor development of infants born preterm. Neither of these tests was included in their recommendations of the best assessment tools for this population. Future studies could focus on children with varying diagnoses to determine whether differences in motor development can be identified with either the PDMS-2 or the Bayley-III.
A possible limitation of this study is the lack of geographic diversity, since all children tested were from the mid-south region of the United States. Further studies should be performed in different geographic regions. Although children included in the study represented a variety of ethnic groups, further studies may be warranted for specific ethnic populations as well.
This study supports concurrent validity between the PDMS-2 TMQ scores and the Bayley-III composite scores for children aged 29 days to 25 months 10 days who have, or are at risk for, developmental delays. With the exceptions noted, the Bayley-III and PDMS-2 standard scores yield similar results. When AE scores are required, this study supports substituting Bayley-III AE scores for PDMS-2 AE scores only for children aged 19 months to 25 months 10 days.
However, neither of these tests may be the best assessment tool for infants younger than 12 months at risk for or with developmental delay.
The authors thank the following physical therapist students who assisted in the initial design of the study, analyses of cited studies, and data collection: Alexi Adams Breaux, Shaun Barrios, Angie Isaacs Hartman, Luisa Ramirez de Lynch, Rachel Wilson Segars, and Hannah Wood Taylor. The authors also thank the families and children who graciously agreed to participate in the study.