PN01 does not appear in the analysis because every infant passed the item. Figure 2 shows how the AIMS items and the infants in this study are distributed along the same continuum. An infant's ability level (raw score) is based on the number of AIMS items passed; the lowest possible score reflects failure on every item and the highest a pass on every item. Each infant's raw score is converted to a measure, on a 0–100 scale in this case, and infants are represented as # or · on the left side of the figure. The difficulty levels of the AIMS items are displayed alongside the infants on the same ruler. Ideally, items would be distributed fairly evenly across the range of subject ability, yielding high precision at all levels of ability with no gaps and no floor or ceiling effect. A ceiling effect is evident: several infants sit at the top of the ability scale, where no more items are available to differentiate among their ability levels, in addition to the 11 infants with perfect scores who were excluded from the analysis.
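The raw-score-to-measure conversion can be illustrated with a minimal sketch, assuming the dichotomous Rasch model and item difficulties that are already known (in logits). The functions `p_pass`, `estimate_ability`, and `to_ruler`, the difficulty values, and the linear constants used to rescale logits onto a 0–100 ruler are all hypothetical; the transformation used in this analysis is not reproduced here. The sketch also shows why extreme scores are set aside: a perfect (or zero) raw score has no finite maximum-likelihood estimate.

```python
import math


def p_pass(theta, b):
    """Rasch probability that an infant of ability theta passes an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(raw_score, difficulties, tol=1e-8, max_iter=200):
    """Maximum-likelihood ability estimate (logits) for a raw score, given known item difficulties.

    Returns None for a zero or perfect score: the likelihood keeps increasing as theta
    moves toward -infinity (zero score) or +infinity (perfect score), so no finite estimate exists.
    """
    n_items = len(difficulties)
    if raw_score <= 0 or raw_score >= n_items:
        return None
    theta = math.log(raw_score / (n_items - raw_score))  # rough starting value
    for _ in range(max_iter):
        probs = [p_pass(theta, b) for b in difficulties]
        expected_score = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        # Newton-Raphson step on the log-likelihood, clamped to +/-1 logit for stability.
        step = (raw_score - expected_score) / information
        step = max(-1.0, min(1.0, step))
        theta += step
        if abs(step) < tol:
            break
    return theta


def to_ruler(theta, centre=50.0, units_per_logit=10.0):
    """Hypothetical linear rescaling of a logit measure onto a 0-100 style ruler."""
    return centre + units_per_logit * theta


# Usage with five hypothetical item difficulties (not AIMS values).
items = [-2.0, -1.0, 0.0, 1.0, 2.0]
for score in range(len(items) + 1):
    theta = estimate_ability(score, items)
    if theta is None:
        print(score, "extreme score: no finite measure")
    else:
        print(score, round(to_ruler(theta), 1))
```

The estimate is the ability at which the expected raw score equals the observed raw score, which is why infants who passed every item cannot be placed on the ruler and were excluded.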
Precision at Different Ability Levels
Item difficulty levels ranged from 35 to 75, with a few gaps along the measurement continuum. As Figure 2 shows, several items often share the same difficulty level in the middle ranges, whereas only one or two items share a difficulty level toward either end of the measurement continuum. After SD09 (controlled lowering through standing), only standing items are available to assess an infant's ability level, and their difficulty levels are widely spaced.
The arrows in Figure 2 indicate gaps between item difficulties. PN01 was dropped because every infant passed it, indicating that the item was too easy for these infants, which is not unexpected because all the subjects were at least three months of age. Gaps also exist among the eight most difficult standing items: controlled lowering through standing (SD09), cruising with rotation (SD10), stands alone (SD11), early stepping (SD12), standing from modified squat (SD13), standing from quadruped position (SD14), walks alone (SD15), and squat (SD16). No items at all are available to discriminate among the most competent 12-month-old infants.
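Locating gaps like those marked by the arrows can be done mechanically: sort the item difficulties and flag adjacent items whose spacing exceeds a chosen threshold. The following is a minimal sketch; the function name `find_gaps`, the threshold, and the item labels and difficulty values in the example are hypothetical and are not the measures reported for the AIMS.

```python
def find_gaps(difficulties, threshold):
    """Return (easier_item, harder_item, spacing) for each pair of items that are adjacent
    in difficulty order and whose spacing exceeds threshold."""
    ordered = sorted(difficulties.items(), key=lambda item: item[1])
    gaps = []
    for (easier, b_low), (harder, b_high) in zip(ordered, ordered[1:]):
        spacing = b_high - b_low
        if spacing > threshold:
            gaps.append((easier, harder, spacing))
    return gaps


# Hypothetical difficulties on a 0-100 ruler (illustrative only, not AIMS estimates).
example = {"I1": 45.0, "I2": 46.5, "I3": 66.0, "I4": 68.0, "I5": 72.0, "I6": 73.0}
for easier, harder, spacing in find_gaps(example, threshold=3.0):
    print(f"gap of {spacing:.1f} between {easier} and {harder}")
```

The threshold is a judgment call; one option is to compare the spacing with the typical standard error of person measures in that region of the ruler.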
In addition to the gaps in item difficulty measures, some individual items fit the psychometric model poorly. The model expects that only more able infants pass the more difficult items, whereas less able infants pass only easier items. The expected value of the infit and outfit mean square statistics (infit and outfit MNSQ in Table 3) is 1. Following Wright and Linacre, 18 a criterion value of 1.4, the value suggested for rating scales, was used to judge whether an item misfit the model. Four items had infit mean square values greater than 1.4, indicating noise in the data: PN08 (rolling prone to supine without rotation), SU04 (supine lying), SU05 (hands to knees), and SD03 (supported standing). Nine items had outfit mean square values greater than 1.4: PN08 (rolling prone to supine without rotation), PN14 (propped side-lying), SU05 (hands to knees), SU09 (rolling supine to prone with rotation), ST10 (sitting to prone), ST12 (sitting without arm support), SD03 (supported standing), SD11 (stands alone), and SD12 (early stepping).
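For reference, the conventional definitions of the two mean square statistics for a dichotomous Rasch item are given below (the notation is ours, not taken from Table 3). Here x_ni is 1 if infant n passed item i and 0 otherwise, theta_n is the infant's ability, b_i is the item's difficulty (both in logits), and N is the number of infants who attempted the item.

```latex
\begin{aligned}
P_{ni} &= \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}, \qquad
W_{ni} = P_{ni}\,(1 - P_{ni}), \qquad
z_{ni} = \frac{x_{ni} - P_{ni}}{\sqrt{W_{ni}}}, \\[4pt]
\text{Outfit MNSQ}_i &= \frac{1}{N} \sum_{n=1}^{N} z_{ni}^{2}, \qquad
\text{Infit MNSQ}_i = \frac{\sum_{n=1}^{N} W_{ni}\, z_{ni}^{2}}{\sum_{n=1}^{N} W_{ni}}
  = \frac{\sum_{n=1}^{N} (x_{ni} - P_{ni})^{2}}{\sum_{n=1}^{N} W_{ni}}.
\end{aligned}
```

Both statistics have an expected value of about 1 when responses fit the model. Because the weight W_ni is largest when an infant's ability is close to the item's difficulty, infit is dominated by on-target responses, whereas the unweighted outfit is more sensitive to unexpected responses from infants far above or below the item.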
A systematic hierarchy of item difficulty was found that was consistent with the order of items on the test scoring form. Within each test position, the AIMS items are arranged by difficulty level: the Rasch measures increase consistently with item number, so infants who pass higher-numbered items within a position sequence have higher ability levels than infants who pass only lower-numbered items. This study therefore provides evidence supporting the validity of using the AIMS both to assess overall motor ability in infants and to evaluate skills in different positions in space.
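The ordering check behind this statement, that within each position the Rasch measures rise with item number, can be expressed as a short sketch. It assumes item codes made of a two-letter position prefix followed by an item number (as in PN01 or SD09); the function name `order_violations` and the measures in the example are hypothetical and invented for illustration.

```python
def order_violations(measures):
    """Group items by position prefix (eg 'PN', 'SU', 'ST', 'SD') and return, per position,
    the adjacent item pairs whose Rasch measure decreases as the item number increases.
    Equal measures are not counted as violations."""
    by_position = {}
    for code, measure in measures.items():
        position, number = code[:2], int(code[2:])
        by_position.setdefault(position, []).append((number, code, measure))
    violations = {}
    for position, items in by_position.items():
        items.sort()  # order by item number within the position
        reversed_pairs = [
            (lower[1], higher[1])
            for lower, higher in zip(items, items[1:])
            if higher[2] < lower[2]
        ]
        if reversed_pairs:
            violations[position] = reversed_pairs
    return violations


# Invented measures: SD03 is deliberately placed below SD02 to show a violation.
example = {"SU01": 36.0, "SU02": 38.5, "SD01": 40.0, "SD02": 42.0, "SD03": 41.0}
print(order_violations(example))
```

An empty result for every position would correspond to a hierarchy in which measures never decrease as item number increases.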
A ceiling effect existed in this analysis of longitudinal test results from infants ranging in age from three to 12 months corrected age. Few items are available to differentiate among infants at the top of the ability continuum, where the items are widely spaced in difficulty.
A second purpose of this study was to explore whether measurement properties of the AIMS could explain the results of studies showing that it produced unstable longitudinal results. Similarly, Coryell et al 19 found instability in motor scores on the Bayley Motor Scale across the first year and suggested that limitations of the assessment itself might contribute to such instability. The Rasch analysis performed in this study revealed discontinuities in item difficulty on the AIMS. Gaps exist at several difficulty levels, which means that a large increase in ability is required to pass one more item across a gap. The gaps lie between the various standing items, beginning with items expected to be passed by infants about nine months old. This finding is consistent with the report of Bartlett 20 that infants who scored low on the AIMS at 10 months of age would not necessarily score low on the AIMS at 15 months or on the Peabody Developmental Motor Scale at 18 months. One possible explanation is that the gaps beyond nine months of age on the AIMS reduce measurement precision. An infant who has not yet passed the next item above a gap may have motor ability close to that of the items already passed, close to that of the failed item, or anywhere in between. Because the AIMS lacks items at this level, the true ability level of these infants cannot be revealed.
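Why gaps reduce precision can be sketched with the model standard error of a person measure under the dichotomous Rasch model, SE(theta) = 1 / sqrt(sum of P_i(theta)(1 − P_i(theta))) over the items. Items whose difficulty is far from an infant's ability add little to that sum, so where items are sparse the standard error grows. The item bank below is hypothetical, chosen only to mimic a dense middle range and a sparse top; it is not the AIMS calibration.

```python
import math


def p_pass(theta, b):
    """Rasch probability of passing an item of difficulty b at ability theta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standard_error(theta, difficulties):
    """Model standard error of an ability estimate at theta: 1 / sqrt(test information)."""
    information = sum(p_pass(theta, b) * (1.0 - p_pass(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(information)


# Hypothetical item bank: densely packed in the middle, only two items at the top.
dense_middle = [-1.0, -0.5, -0.25, 0.0, 0.0, 0.25, 0.5, 1.0]
sparse_top = [2.5, 4.0]
bank = dense_middle + sparse_top

for theta in (0.0, 3.25):
    print(f"theta={theta:+.2f}  SE={standard_error(theta, bank):.2f}")
```

In this toy bank an infant located in the sparse upper region, between the two top items, is measured with a noticeably larger standard error than one in the dense middle.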
Coster 3 pointed out that a one-point change in raw score on the AIMS could produce a large change in PR in early infancy. A one-point difference in total raw score, from 6 to 7, at one month of age changes the percentile rank from the 25th to the 43rd, whereas a one-point change in total raw score, from 40 to 41, at eight months of age changes the percentile rank only from the 51st to the 56th. Furthermore, Fetters and Tronick 23 found that the AIMS at seven months of age yields better sensitivity and specificity for predicting Peabody Gross Motor Scale scores at 15 months than the AIMS at four months of age. The present study did not find gaps at the lower difficulty levels, but infants were first tested on the AIMS at three months of age, and all infants passed the first prone item even though their PRs ranged from 1% to 99%. Further investigation is needed to determine whether the AIMS items are adequate to measure motor ability precisely in infants younger than three months of age.
Items could be added to fill the gaps beyond the difficulty level of independent standing, such as crawling upstairs, flinging a ball with an extensor thrust of the arm in standing, walking upstairs, downstairs, or backward, crawling backward to go downstairs, or kicking a ball in standing. Alternatively, other gross motor assessments that contain more standing items than the AIMS, such as the Peabody Developmental Motor Scales, 24 the Bayley Scales of Infant Development II, 25 or the Bayley Infant Neurodevelopmental Screener, 26 which are designed to measure motor development up to 72, 36, and 24 months, respectively, can be used to document change after a child passes the “controlled lowering through standing” item. A single assessment on the AIMS at about nine months of age or older cannot be used as the sole basis for clinical impressions, except for infants with delayed motor development who are not yet standing.
The AIMS items cluster in the middle range of difficulty levels, whereas fewer items are available toward either end of the measurement continuum. Only standing items are available for testing after an infant can lower him- or herself from standing in a controlled manner. This indicates that the AIMS is sufficiently precise to discriminate among infants whose ability levels are in the middle range, but not at the higher end, ie, after achieving controlled lowering through standing (SD09). Darrah et al 21 found that using the 10th percentile cutoff point at four months and the fifth percentile at eight months yielded high sensitivity and specificity for predicting the pediatrician’s assessment at 18 months. Another study found that month-to-month correlations between PRs were strongest between 7.5 and 8.5 months and were unstable before 5.5 months. 4 These findings are consistent with ours, suggesting that the AIMS is most accurate in the mid-ability range. A further study found only a moderate correlation (0.51) between AIMS raw scores at six and 12 months. 22 Variation in scores is necessary when calculating correlation coefficients, 8 because decreased variability decreases the correlation. The variation of AIMS scores is limited at the higher end of the scale because only a few standing items are available for assessment at 12 months, which compromises any attempt to examine correlations involving scores obtained at this age. Small score variance might also explain the lower correlations between scores at three months of age and at later ages, because only a few items are available for use before three months.
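The point that reduced variability lowers a correlation can be shown with a small simulation, a generic restriction-of-range illustration rather than an analysis of AIMS data. Two hypothetical scores share the same underlying ability plus independent measurement error; the correlation is computed once for the whole simulated sample and once for the subgroup scoring in the top 20% on the first occasion. All parameters (sample size, error size, cutoff, seed) are arbitrary assumptions.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2004)  # fixed seed so the simulation is repeatable
n = 5000
ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
score_first = [a + rng.gauss(0.0, 0.6) for a in ability]   # hypothetical first test
score_second = [a + rng.gauss(0.0, 0.6) for a in ability]  # hypothetical retest

r_full = pearson(score_first, score_second)

# Keep only the cases in the top 20% of the first score: a narrower range of scores.
cutoff = sorted(score_first)[int(0.8 * n)]
kept = [(a, b) for a, b in zip(score_first, score_second) if a >= cutoff]
r_restricted = pearson([a for a, _ in kept], [b for _, b in kept])

print(f"full sample r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

The restricted correlation comes out markedly lower even though the relation between the two scores is generated the same way for every simulated infant; only the spread of scores has changed.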
Ten items with high infit or outfit values, or both, did not fit the Rasch model; ie, on these items, infants with higher ability levels did not consistently score higher than infants with lower ability levels, and vice versa. Several factors can contribute to item misfit: (1) the item may measure a construct different from that intended by the test's authors, (2) the item may be hard to observe (eg, it requires facilitation or appears only for a short period of time), (3) some infants never meet all three criteria needed to pass the item, (4) some infants skip this developmental stage or develop alternative motor patterns, or (5) testers cannot rate the item reliably (they do not understand the item or do not apply consistent scoring criteria). The testers in this study had been trained to be reliable and consistent raters, making testers an unlikely source of misfit. A review of the component analysis from the Rasch output suggests that the items belong to one construct. Thus, items that are difficult to observe, or the skipping of particular motor milestones by some infants, might contribute to high infit or outfit values on the AIMS. Bartlett 20 speculated that low scores on the AIMS at 10 months could be explained by infants not crawling or using alternative motor patterns to move around. The crawling items, however, were not among the misfit items in this study.
Large infit and large outfit statistics have different meanings. For items with infit misfit, erratic responses occur in infants whose ability levels are near the item's difficulty level. It is hard to know whether some infants skip the misfitting items or whether results are affected by the AIMS scoring rule that automatically credits items below the performance window as passed. For items with outfit misfit, erratic responses occur in infants whose ability levels are well above or below the item's difficulty level, ie, infants with higher ability levels failed some easier items, or infants with lower ability levels passed more difficult items. For example, some infants whose ability levels were much lower than the difficulty levels of SD11 and SD12 passed these two items. This phenomenon might be related to movement experience: these infants might have been placed in, or played in, the standing position more than other positions, leading to precocious performance on these items.
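The different sensitivities of the two statistics can be illustrated numerically. This minimal sketch computes infit and outfit mean squares for a single hypothetical item of difficulty 0 logits from eight hypothetical infants, first with responses perfectly ordered by ability and then after changing one response so that the least able infant passes, analogous to the precocious passes of SD11 and SD12. Infit is the sum of squared residuals divided by the sum of model variances; outfit is the mean of squared standardized residuals. The abilities, difficulty, and responses are invented for illustration.

```python
import math


def p_pass(theta, b):
    """Rasch probability of passing an item of difficulty b at ability theta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(abilities, responses, b):
    """Return (infit, outfit) mean squares for one dichotomous item of difficulty b."""
    squared_residuals, variances = [], []
    for theta, x in zip(abilities, responses):
        p = p_pass(theta, b)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(variances)
    # Infit: squared standardized residuals weighted by model variance, which equals
    # the sum of squared residuals divided by the sum of variances.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


abilities = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]  # hypothetical infants (logits)
ordered = [0, 0, 0, 0, 1, 1, 1, 1]     # passes follow ability exactly
surprised = [1, 0, 0, 0, 1, 1, 1, 1]   # least able infant unexpectedly passes

for label, responses in (("ordered", ordered), ("surprised", surprised)):
    infit, outfit = item_fit(abilities, responses, b=0.0)
    print(f"{label:9s} infit={infit:.2f} outfit={outfit:.2f}")
```

In this toy example the single off-target pass pushes outfit well above the 1.4 criterion while infit stays below it. The perfectly ordered baseline gives values below 1, which Rasch analysts usually read as overfit (responses more predictable than the model expects).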
The erratic responses could not be linked to specific groups of infants (eg, premature infants vs infants born full-term); therefore, the misfit items should be revised or deleted from the AIMS. An analysis without the misfit items yielded a mean infit of 0.99 and a mean outfit of 0.46, and every item had infit and outfit values within the expected range. A new gap appeared, however, when SU04, SU05, and SD03 were deleted. These three misfit items might also contribute to the instability of scores in early infancy, because unreliable items cannot discriminate properly among infants' abilities.
This study examined the structure of the AIMS using Rasch analysis. A ceiling effect exists in this sample, and only a few items are available for testing in the early months. Although the hierarchical ordering of the items within each testing position was confirmed, measurement precision for older infants is reduced because only standing items, with widely spaced difficulties, are available after an infant passes the controlled lowering through standing item (SD09, the ninth standing item). Although testing infants at three-month intervals up to 12 months may have affected the results, we do not believe that the AIMS is suitable for documenting motor developmental change once an infant can lower him- or herself from standing in a controlled manner (SD09). After about nine to 10 months of age, we suggest using other standardized tests unless the infant being tested is not yet standing.
Subjects were recruited at the University of Illinois at Chicago Medical Center, the University of Chicago Hospitals, and Lutheran General Hospital. The authors thank Dolores Schorr, Pat Byrne-Bowens, Dawn Kuerschner, Carrie Ryan, and Kathy Tolzien for assistance in recruiting subjects; Elizabeth Branenn, Mary Carter, Judy Flegel, LouAnn Gouker, Pamela Klaska, Thubi Kolobe, Maureen Lenke, Gail Liberg, Elizabeth Osten, Jennifer Padek, Celina Wise, and Laura Zawacki for testing infants; and all the infants and their families for their participation, without whom this work would not have been possible.
1. Piper MC, Darrah J. Motor Assessment of the Developing Infant. Philadelphia: WB Saunders; 1994.
2. Piper M. Theoretical foundations for physical therapy assessment in early infancy. In: Wilhelm I, ed. Physical Therapy Assessment in Early Infancy. New York: Churchill Livingstone; 1993: 1–12.
3. Coster W. Critique of the Alberta Infant Motor Scale (AIMS). Phys Occup Ther Pediatr. 1995; 15: 53–69.
4. Darrah J, Redfern L, Maguire TO, et al. Intra-individual stability of rate of gross motor development in full-term infants. Early Hum Dev. 1998; 52: 169–179.
5. Campbell S, Kolobe T, Wright BD, et al. Validity of the Test of Infant Motor Performance for prediction of 6-, 9-, and 12-month scores on the Alberta Infant Motor Scale. Dev Med Child Neurol. 2002; 44: 263–272.
6. Barbosa VM, Campbell SK, Sheftel D, et al. Longitudinal performance of infants with cerebral palsy on the Test of Infant Motor Performance and on the Alberta Infant Motor Scale. Phys Occup Ther Pediatr. 2003; 23(3): 7–29.
7. Plewis I, Bax M. The uses and abuses of reliability measures in developmental medicine. Dev Med Child Neurol. 1982; 24: 388–390.
8. Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Practice. Upper Saddle River, NJ: Prentice Hall; 2000.
9. Velozo CA, Kielhofner G, Lai J. The use of Rasch analysis to produce scale-free measurement of functional ability. Am J Occup Ther. 1999; 53: 83–90.
10. Wright BD, Masters GN. Rating Scale Analysis: Rasch Measurement. Chicago: MESA Press; 1982.
11. Campbell SK. Test-retest reliability of the Test of Infant Motor Performance. Pediatr Phys Ther. 1999; 11: 60–66.
12. Campbell SK, Kolobe THA. Concurrent validity of the Test of Infant Motor Performance with the Alberta Infant Motor Scale. Pediatr Phys Ther. 2000; 12: 1–8.
13. Campbell SK, Hedeker D. Validity of the Test of Infant Motor Performance for discriminating among infants with varying risk for poor motor outcome. J Pediatr. 2001; 139: 546–551.
14. Davidson EC, Hobel CJ. POPRAS: A Guide to Using the Perinatal, Intrapartum, Postpartum Record. Torrance, CA: South Bay Regional Perinatal Project Professional Staff Association; 1978.
15. Ross MG, Hobel CJ, Bragonier JR, et al. A simplified risk-scoring system for prematurity. Am J Perinatol. 1986; 3: 339–344.
16. Bond TG, Fox CM. Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Mahwah, NJ: Lawrence Erlbaum Associates; 2001.
17. Lunz ME, Wright BD, Linacre JM. Measuring the impact of judge severity on examination scores. Appl Measure Educ. 1990; 3: 331–345.
18. Wright BD, Linacre JM. Reasonable mean-square fit values. Rasch Measure Trans. 1994; 8: 370.
19. Coryell J, Provost B, Wilhelm IJ, et al. Stability of Bayley Motor Scale scores in the first year of life. Phys Ther. 1989; 69: 834–841.
20. Bartlett D. Comparison of 15-month motor and 18-month neurological outcomes of term infants with and without motor delays at 10-months-of-age. Phys Occup Ther Pediatr. 2000; 19: 61–71.
21. Darrah J, Piper M, Watt MJ. Assessment of gross motor skills of at-risk infants: predictive validity of the Alberta Infant Motor Scale. Dev Med Child Neurol. 1998; 40: 485–491.
22. Jeng SF, Yau KIT, Chen LC, et al. Alberta Infant Motor Scale: reliability and validity when used on preterm infants in Taiwan. Phys Ther. 2000; 80: 168–178.
23. Fetters L, Tronick EZ. Discriminate power of the Alberta Infant Motor Scale and the Movement Assessment of Infants for prediction of Peabody Gross Motor Scale scores of infants exposed in utero to cocaine. Pediatr Phys Ther. 2000; 12: 16–23.
24. Folio MR, Fewell RR. Peabody Developmental Motor Scales and Activity Cards Manual. Allen, TX: DLM Teaching Resources; 1983.
25. Bayley N. Bayley Scales of Infant Development. 2nd ed. San Antonio, TX: The Psychological Corporation; 1993.
26. Aylward GP. The Bayley Infant Neurodevelopmental Screener. San Antonio, TX: The Psychological Corporation; 1995.
© 2004 Lippincott Williams & Wilkins, Inc.
Keywords: child development; infant; developmental disabilities/diagnosis; motor skills; classification; predictive value of tests; physical therapy/methods; sensitivity specificity; psychometrics/methods