Spinal muscular atrophy (SMA) is the most common fatal genetic disorder of infancy. This progressive anterior horn cell disorder affects approximately 1 of every 6000 to 10,000 children who are born alive.1 SMA is due to a homozygous mutation within the survival of motor neurons 1 gene (SMN1).2 A higher copy number of a homologous gene, SMN2, ameliorates disease severity.3,4 Infants with typical SMA type I (Werdnig-Hoffmann disease, SMA-I) usually have 2 SMN2 copies. With the elucidation of the molecular genetic basis of SMA, a number of medications have been proposed as potential treatments.5,6
Fundamental to conducting a clinical trial in a neuromuscular disorder is having a validated motor outcome measure that accurately reflects change in strength or motor function and is sensitive to small degrees of change. Whereas older children and adults with SMA can be tested reliably with manual muscle testing7 or myometry measurements,8 measurement of maximal volitional strength cannot be obtained in infants. Gathering a variety of age-appropriate motor responses is instead hypothesized to capture a coherent overview of the infant's neck, trunk, and proximal-distal limb power. When these responses are graded on a scale and summed, they can provide a measure of change over time. To be truly useful, such a motor scale should be easy to perform without expensive equipment or specially trained examiners. The motor scale should be reliable in the hands of experienced testers and valid for the specific disorder under investigation, and it must be sensitive to change. At present, these characteristics have not been shown for any motor scale specifically in SMA-I. The initial part of the Alberta Infant Motor Scales (AIMS)9 has only a few items that are typically demonstrated by infants with SMA-I, and the reliance on multiple qualitative descriptors makes the scoring of items in this population problematic. The Motor Scale of the Bayley Scales of Infant Development10 and the Peabody Developmental Motor Scales11 similarly include few items that are typically completed by infants with SMA-I. The items on these tests that are occasionally completed by children with SMA-I are limited to arm and hand function examined in the supine position. These types of fine motor items have been found to be highly dependent on arm and trunk position and have been inconsistently performed by patients with SMA-I (A.M.G., R.S.F., unpublished observations).
The Test of Infant Motor Performance (TIMP) captures 13 spontaneous motor observations and 29 elicited motor behaviors. The observed items (1–13) are scored 0 (no) or 1 (yes) and the elicited items (14–42) are scored on a 0 to 6 ordinal scale. The TIMP was designed to focus on infants born from 34 weeks postmenstrual age to term and followed to 4 months chronological age, with the goal of identifying early those at risk for chronic motor deficits.12,13 The item set addresses postural and selective control of movement and is well tailored to a population with physiologically limited strength and a limited repertoire of motor skills. The infant born premature at 34 weeks postmenstrual age scores near the bottom of this range due to limited antigravity extremity movement and head control in the supine position. The 4-month-old infant who is typically developing and who is close to beginning to roll over, a skill that infants with SMA-I rarely achieve, represents the upper range of TIMP performance. The TIMP has been shown to be safe even in the fragile group of premature infants of 34 weeks postmenstrual age. It has been used to test over a thousand premature infants in the normalization and development process, during which the test proved to be well tolerated.14 It has been shown to be reliable in infants who are born prematurely, with Pearson product moment correlations of 0.89.15 It has not been studied, however, in infants with a primary neuromuscular disorder such as SMA-I or congenital myopathy. Validity of the TIMP has been demonstrated in infants from 32 weeks postconceptual age to 3.5 months post-term, in whom no ceiling or floor effect was seen during the development process, and the test covers the motor skills typically seen in infants with SMA-I.
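The scoring scheme described in the preceding paragraph can be illustrated with a short Python sketch. It is not the official TIMP scoring procedure: it treats 6 as the upper bound for every elicited item, as the text describes, whereas the response scale for each item is defined in the TIMP manual. The function name `timp_total` and the dictionary layout are hypothetical, and the example scores are invented.

```python
OBSERVED_ITEMS = range(1, 14)   # items 1-13, scored 0 (no) or 1 (yes)
ELICITED_ITEMS = range(14, 43)  # items 14-42, scored 0 to 6 (assumed bound)


def timp_total(scores):
    """Sum item scores after checking each one against its allowed range.

    `scores` maps item number (1-42) to the integer score given by the
    evaluator. Items that were not administered are entered as 0.
    """
    total = 0
    for item in list(OBSERVED_ITEMS) + list(ELICITED_ITEMS):
        if item not in scores:
            raise ValueError(f"item {item} has no score")
        value = scores[item]
        upper = 1 if item in OBSERVED_ITEMS else 6
        if not isinstance(value, int) or not 0 <= value <= upper:
            raise ValueError(f"item {item}: score {value!r} outside 0-{upper}")
        total += value
    return total


# Invented example: 6 observed behaviors present, every elicited item scored 1.
example = {i: (1 if i <= 6 else 0) for i in OBSERVED_ITEMS}
example.update({i: 1 for i in ELICITED_ITEMS})
print(timp_total(example))  # 6 + 29 = 35
```

Validating each score before summing catches transcription errors on the score sheet, which matters when totals from several sites are pooled.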
The validity of the TIMP has also been established through expert review and ecologic item validation, and construct validity has been demonstrated through sensitivity to developmental change and discrimination based on level of risk assessed in the neonatal period. In addition, the TIMP has been shown to possess predictive validity with the AIMS in later infancy and the Bruininks-Oseretsky Test of Motor Proficiency at school age.16
As the item set in the TIMP captures those skills that an infant with SMA-I typically can perform and allows for change in either direction, we hypothesized that it would be a useful measure for an infant with SMA-I. The AmSMART group elected to use the TIMP as the motor scale outcome measure in a Phase 1/2 riluzole study for SMA-I.17 In preparation for this trial, the clinical evaluators from each study site met for a joint training session to establish reliability of the TIMP for this population.
The purpose of this study was to evaluate the interrater and test-retest reliabilities of the TIMP. The results of this study are presented and discussed here.
MATERIALS AND METHODS
This study was approved by each participating institution's Institutional Review Board, and the parent of each participating infant signed an Institutional Review Board-approved informed consent form.
Training Session Preparation
Two coordinators prepared to serve as the trainers for the TIMP training session. One of the coordinators attended a 2-day workshop on the TIMP. The second coordinator was trained by the primary coordinator. Two TIMP training videos (TV1 and TV2) were prepared and independently rated by these training coordinators. The infants used in these training videos were recruited from the pediatric neuromuscular and the physical therapy clinics at the site where the 2 coordinators work. The weighted κ values for these 2 videos were 0.74 and 0.62. Independently, a third coordinator discussed the test items with the developer of the TIMP (A.M.G., personal communication with Suzann Campbell) and had gained experience in administering the TIMP, AIMS, Bayley Scales of Infant Development, and Peabody Developmental Motor Scales to infants with SMA-I and other neuromuscular disorders of infancy before the training session.
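For readers unfamiliar with the statistic, agreement between two raters on ordinal item scores can be summarized with Cohen's weighted κ. The following Python sketch is illustrative only. The weighting scheme used in this study is not stated, so linear weights are an assumption here; the function name, the single 0 to 6 scale applied to every item, and the example ratings are all hypothetical.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories, weighting="linear"):
    """Cohen's weighted kappa for two raters scoring the same items.

    Scores are integers 0 .. n_categories - 1. Disagreement weights grow
    linearly (or quadratically) with the distance between the two scores.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    valid = range(n_categories)
    if any(s not in valid for s in list(rater_a) + list(rater_b)):
        raise ValueError("score outside 0 .. n_categories - 1")

    power = 1 if weighting == "linear" else 2
    n = len(rater_a)
    span = n_categories - 1

    def weight(i, j):
        return (abs(i - j) / span) ** power

    pairs = Counter(zip(rater_a, rater_b))
    margin_a = Counter(rater_a)
    margin_b = Counter(rater_b)

    # Weighted disagreement actually observed between the two raters.
    observed = sum(weight(i, j) * c for (i, j), c in pairs.items()) / n
    # Weighted disagreement expected by chance from the marginal totals.
    expected = sum(
        weight(i, j) * margin_a[i] * margin_b[j]
        for i in valid
        for j in valid
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters give one identical score")
    return 1 - observed / expected


# Invented ratings of 8 items by two raters on a 0-6 scale.
rater_1 = [0, 2, 3, 6, 4, 1, 5, 2]
rater_2 = [0, 3, 3, 5, 4, 1, 6, 1]
print(round(weighted_kappa(rater_1, rater_2, n_categories=7), 2))
```

Weighted κ gives partial credit when two raters choose adjacent scores, which suits an ordinal scale better than unweighted κ.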
Before the training meeting, materials were sent to the clinical evaluators at each participating site. These included the TIMP manual, “The Test of Infant Motor Performance”: self-study CD ROM program,18 and 2 videos (TV1 and TV2) for review. To determine whether an in-person training session was necessary, each evaluator was asked to review the training CD and manual and to score the performance of the 2 infants in TV1 and TV2. An additional 3 videos were prepared by the lead coordinators. Two infants with nonspecific low tone and weakness were used for training videos TV3 and TV4 and an infant with SMA-I was used for TV5. Each of these 5 training videos was used in the training session.
Sixteen evaluators from 10 centers participated in a 2-day session conducted by the central site's 2 training evaluators (18 total evaluators from 11 centers). On the first day of training, the TIMP score sheets for the 2 pretraining videos were submitted to the trainers. Published articles on the TIMP and the manual of procedures for administering and scoring the test correctly were reviewed and discussed by the clinical evaluators. The ratings for the 2 pretraining video assessments TV1 and TV2 were reviewed and the scoring of each item was discussed in detail. During the discussion, 2 items (item 33, lateral straightening of the head and body with arm support, and item 34, lateral hip abduction reaction) were identified in which the infants were placed in positions that were not considered to be comfortable. These items were not administered and were scored as zero, in keeping with the instructions for administering the TIMP.12
On the second day, a live-patient evaluation by the lead evaluator was performed using an infant with trisomy 21 and typical hypotonia and motor delays (LP1), with the entire group observing and scoring independently. The score sheets for LP1 were collected and the subject's performance on each item was discussed. Then the evaluators independently observed and rated 2 additional videos (TV3 and TV4), after which the score sheets were collected and the scoring for each item was reviewed. Finally, a video of a patient with SMA-I (TV5) was observed and rated. Because of the large number of evaluators, it was not possible to test the infant multiple times, so videotaped assessments were used. One additional evaluator was trained subsequently by the lead evaluator using the 5 videos from the 2-day training session.
After the interrater training session, an intrarater study was performed with 11 infants with SMA-I tested by 8 of these 19 evaluators at 8 sites. Overall, 6 evaluators examined 1 subject on 2 occasions, 1 evaluator examined 2 subjects on 2 occasions, and 1 evaluator examined 3 subjects on 2 occasions. Each infant was reevaluated approximately 1 month after the first assessment by the same evaluator. The SMA-I riluzole trial, of which this study was a part, included 2 baseline visits 1 month apart before starting the drug. It was at these time points that data were captured for this reliability study. As in the more indolent SMA types II and III, infants with type I often enter a plateau phase after an initial precipitous decline in strength. This is reflected in the plateau in the survival curve for SMA-I.19 It was anticipated, therefore, that there would be little change in these subjects over a 4-week interval, unless there was an intervening illness.
The TIMP was administered to these subjects with SMA-I in the same manner as in the training video and as was demonstrated in the training session. The observational items were typically obtained within 5 minutes and the elicited items within an additional 25 minutes. Rest periods were used liberally if, in the opinion of the evaluator, the subject was overly fussy or showed signs of fatigue. None of the infants had a significant illness, was on any acute medication, or had yet started the riluzole at the time of testing. Administering the full TIMP without a rest takes approximately 30 minutes. Patients were examined on a firm padded exam table and were evaluated in their diaper to be free of the constraints of clothing. Parents were allowed to observe the testing and helped in calming the infant if upset.
Interrater reliability from the training session was analyzed for pairs of evaluators across 40 items, both for the total group of subjects and for the subject with SMA-I alone (the 2 items with scores of 0 for all subjects were excluded from the analysis). The average weighted κ and 95% confidence intervals (95% CI) were calculated for (1) all subjects rated by the 16 evaluators present at the training session, (2) the pretraining videos (TV1 and TV2) rated by 15 evaluators (1 evaluator did not participate), (3) the post-training videos (TV3 and TV4) rated by the 16 evaluators, (4) the subject with SMA-I (TV5) rated by 15 evaluators at the training session (1 evaluator did not rate this subject) and the evaluator trained at a later date, and (5) the subject with SMA-I (TV5) rated by the 8 evaluators participating in the test-retest reliability portion of the study. For test-retest reliability, agreement between the total TIMP scores at the 2 time points was assessed using intraclass correlation coefficients (ICCs).20 The Bradley-Blackwood test was used to simultaneously test for equality of means and variances between the 2 time periods.20
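As an illustration of the test-retest statistic, the Python sketch below computes an intraclass correlation coefficient from paired total scores. The ICC form used in this study is not specified in the text; the sketch assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1). The function name and the example scores are hypothetical.

```python
def icc_2_1(rows):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `rows` holds one list per subject, each with k scores (one per occasion).
    """
    n = len(rows)
    k = len(rows[0]) if rows else 0
    if n < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 scores")

    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, occasions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Invented total scores for 5 infants at two visits about 1 month apart.
visits = [[12, 14], [30, 28], [45, 47], [22, 21], [60, 57]]
print(round(icc_2_1(visits), 3))
```

Because this form penalizes systematic shifts between occasions as well as random scatter, a uniform drop in scores at the second visit would lower the coefficient even if the rank order of infants were unchanged.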
RESULTS
All infants who participated completed the testing without adverse reaction. Subject characteristics and TIMP data from the test-retest reliability sessions are presented in Table 1.
The overall average weighted κ from the training session for the 16 evaluators, 0.61 (95% CI 0.59–0.63), was considered substantial.21 An acceptable level of κ (reliability) should exceed 0.60.22 No further training was considered necessary. The average weighted κ based on the 2 pretraining videos (TV1 and TV2) for 15 evaluators (1 evaluator did not rate these videos) was 0.50 (95% CI 0.47–0.53; individually, 0.53 and 0.45), whereas the average weighted κ for the 2 post-training videos (excluding the patient with SMA-I) was 0.61 (95% CI 0.58–0.64; individually, 0.64 and 0.58). The average weighted κ for the patient with SMA-I for the 16 trained evaluators (excluding 1 rater at the training session who left before rating this subject and including the evaluator trained at a later date) was 0.78 (95% CI 0.74–0.81), and for the 8 evaluators subsequently participating in the intrarater reliability study it was 0.71 (95% CI 0.68–0.75).
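The descriptive label "substantial" follows the agreement bands proposed by Landis and Koch.21 As a small illustration, the Python sketch below maps the average κ values reported in this paragraph onto those bands. The function name is hypothetical, and the bands are conventional descriptors, not statistical tests.

```python
def landis_koch_label(kappa):
    """Return the Landis and Koch descriptor for a kappa value."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


reported = {
    "all training subjects": 0.61,
    "pretraining videos": 0.50,
    "post-training videos": 0.61,
    "SMA-I video, 16 evaluators": 0.78,
    "SMA-I video, 8 evaluators": 0.71,
}
for name, kappa in reported.items():
    print(f"{name}: {kappa:.2f} -> {landis_koch_label(kappa)}")
# all training subjects: 0.61 -> substantial
# pretraining videos: 0.50 -> moderate
# post-training videos: 0.61 -> substantial
# SMA-I video, 16 evaluators: 0.78 -> substantial
# SMA-I video, 8 evaluators: 0.71 -> substantial
```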
The ICC for the TIMP total score was 0.85 (95% CI 0.54–0.96) and the Bradley-Blackwood test was not significant (p = 0.46), which indicated that the means (34.0 vs 33.4) and standard deviations (18.51 vs 15.02) across the 2 time periods were similar. Infants had not changed significantly in performance over 1 month. There were no significant Spearman rank order correlations between age and total TIMP score at either time point in this group of subjects.
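The Bradley-Blackwood procedure regresses the paired differences on the paired sums and tests both regression coefficients at once, which amounts to a joint test of equal means and equal variances. The Python sketch below shows one way to compute it with the standard library only; it relies on the closed-form upper-tail probability of an F distribution with 2 numerator degrees of freedom. The function name and the example data are hypothetical, and this is not the authors' analysis code.

```python
def bradley_blackwood(x1, x2):
    """Joint test of equal means and variances for paired measurements.

    Returns (F, p) where F has (2, n - 2) degrees of freedom under the
    null hypothesis.
    """
    n = len(x1)
    if n != len(x2) or n < 3:
        raise ValueError("need at least 3 paired observations")

    d = [a - b for a, b in zip(x1, x2)]  # differences
    s = [a + b for a, b in zip(x1, x2)]  # sums
    s_mean = sum(s) / n
    d_mean = sum(d) / n

    # Ordinary least-squares fit of d on s.
    sxx = sum((u - s_mean) ** 2 for u in s)
    if sxx == 0:
        raise ValueError("paired sums are all identical")
    sxy = sum((u - s_mean) * (v - d_mean) for u, v in zip(s, d))
    slope = sxy / sxx
    intercept = d_mean - slope * s_mean
    sse = sum((v - intercept - slope * u) ** 2 for u, v in zip(s, d))
    if sse == 0:
        raise ValueError("regression fits perfectly; F is undefined")

    df2 = n - 2
    f_stat = ((sum(v * v for v in d) - sse) / 2) / (sse / df2)
    # For 2 numerator df: P(F > f) = (1 + 2 f / df2) ** (-df2 / 2).
    p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
    return f_stat, p_value


# Invented total scores for 6 infants at two baseline visits.
time_1 = [12, 30, 45, 22, 60, 38]
time_2 = [14, 28, 47, 21, 57, 40]
f_stat, p_value = bradley_blackwood(time_1, time_2)
print(round(f_stat, 3), round(p_value, 3))
```

A nonsignificant result, as reported here, is consistent with stable scores across the 2 visits but does not by itself prove equivalence, particularly with only 11 subjects.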
DISCUSSION
The results from this study demonstrate that the TIMP can be administered safely and effectively to infants with SMA-I who are weak. In that population it can be reliably performed and captures a range of motor function without a floor or ceiling effect in the small number of subjects studied.
For the purposes of current clinical trials for SMA-I, we reviewed the literature for several existing motor scales that have been used in infants. The AIMS has been adopted as a motor outcome measure in clinical trials for the classic infantile presentation of Pompe disease and has been demonstrated to be sensitive to change in that disorder.23 For SMA-I, however, we searched for a test measure that would capture change in infants who are very weak.
The training materials (training CD and manual) helped prepare the evaluators for the interrater training session. The additional in-person training and discussion of each item helped improve the group's overall agreement (reliability). It is recommended that in-person training be used to supplement the training materials.
No significant adverse effects occurred as the result of this testing. Although all infants were able to complete the test, the infants' behavioral state was often a limiting factor in obtaining optimal responses. We emphasize the need to perform testing with the infant in an alert and calm or mildly protesting state. Those who were lethargic or agitated did not score well. In this case, the examiner would offer a rest period and attempt the item again.
The TIMP has limitations in SMA-I. In this trial, the subjects often demonstrated limited tolerance to some of the positions required in the TIMP. Because of the limited respiratory capacity of these infants and their reliance on diaphragmatic breathing, the items in the prone position made many of the infants irritable, and those test items (36–39) were less effectively obtained. Additionally, some of the head control and defensive reaction items (15–17, 25, 26) were reported to provoke irritability in some of the infants who were weaker. When these 9 items were excluded from the 40-item analysis, the ICC improved minimally from 0.85 to 0.88 and the Bradley-Blackwood test remained nonsignificant (means ± standard deviations for time 1: 26.27 ± 13.61 vs time 2: 26.55 ± 11.89).
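The item bookkeeping behind this reduced analysis can be checked with a short Python sketch. The item numbers are taken from the text; the helper names are hypothetical, and the reduced count of 31 items is simply 40 minus 9, since the reduced item count is not stated explicitly.

```python
def item_set(*ranges):
    """Expand inclusive (first, last) pairs into a set of item numbers."""
    return {i for first, last in ranges for i in range(first, last + 1)}


ALL_ITEMS = item_set((1, 42))
NOT_ADMINISTERED = item_set((33, 34))              # scored 0 for all subjects
PRONE_ITEMS = item_set((36, 39))                   # poorly tolerated in prone
HEAD_AND_DEFENSIVE = item_set((15, 17), (25, 26))  # provoked irritability

analyzed = ALL_ITEMS - NOT_ADMINISTERED
poorly_tolerated = PRONE_ITEMS | HEAD_AND_DEFENSIVE
reduced = analyzed - poorly_tolerated


def subset_total(scores, items):
    """Total score restricted to the given item numbers."""
    return sum(scores[i] for i in items)


print(len(analyzed), len(poorly_tolerated), len(reduced))  # 40 9 31
```

Applying `subset_total` with `reduced` to each infant's score sheet at both visits would give the reduced totals on which a reanalysis of this kind could be run.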
The preliminary evidence from this study indicates that the TIMP is a reliable test in this population. The interrater reliability portion of this study included infants with hypotonia and weakness, but only one with confirmed SMA-I. These interrater reliability data, therefore, must be interpreted within this heterogeneous context. By comparison, the test-retest reliability portion included only infants with confirmed SMA-I and is more homogeneous. The former demonstrates broad reliability for use with infants with hypotonia in a training session and the latter demonstrates reliability more specifically in SMA-I. Ideally, a revision that is more specifically tailored to SMA-I would eliminate these limitations, improve subject tolerance for testing, and shorten the administration time. A longer study using the TIMP in infants with SMA-I would demonstrate the degree of change over time and indicate the ability of this test measure to capture small degrees of change. It would also permit stratification of subjects considered for an intervention trial based on initial motor abilities. In addition, this test could be used clinically to monitor the population of infants with SMA-I. Development of a revised version of the TIMP would require the establishment of both reliability and validity before adoption as an outcome measure for a clinical trial in SMA-I.
The TIMP is a motor scale that can be administered safely to infants with SMA-I who are weak. Reliability of test administration has been demonstrated in this study. Because some of the test items are not performed easily in this population, a further goal of ongoing research is validation of an SMA-specific assessment tool.
The authors appreciate the support of the subjects and their parents for this study. Participation by the following 6 clinical evaluators in the intrarater reliability portion of this study was invaluable and each is recognized here: Ann E. Fritch, Cincinnati, OH; Jean L. Stout, St. Paul, MN; Janine L. Wood, Salt Lake City, UT; Melanie P. Sherbino, Toronto, Canada; Catherine A. Seiner, St. Louis, MO; and Terry A. Shubert, Rochester, MN.
1. Emery AE. Population frequencies of inherited neuromuscular diseases—a world survey. Neuromuscul Disord
2. Lefebvre S, Bürglen L, Reboullet S, et al. Identification and characterization of a spinal muscular atrophy-determining gene. Cell
3. Wirth B, Brichta L, Schrank B, et al. Mildly affected patients with spinal muscular atrophy are partially protected by an increased SMN2 copy number. Hum Genet
4. Swoboda KJ, Prior TW, Scott CB, et al. Natural history of denervation in SMA: relation to age, SMN2 copy number, and function. Ann Neurol
5. Sumner CJ. Therapeutics development for spinal muscular atrophy. NeuroRx
6. Russman BS, Iannaccone ST, Samaha FJ. A phase 1 trial of riluzole in spinal muscular atrophy. Arch Neurol
7. Wang HY, Yang YH, Jong YJ. Evaluation of muscle strength in patients with spinal muscular atrophy. Kaohsiung J Med Sci
8. Merlini L, Mazzone ES, Solari A, et al. Reliability of hand-held dynamometry in spinal muscular atrophy. Muscle Nerve
9. Piper MC, Darrah J. Motor Assessment of the Developing Infant. Philadelphia, PA: WB Saunders; 1994.
10. Bayley N. Manual for the Bayley Scales of Infant Development. 3rd ed. Australia: Psychological Corp; 1993.
11. Folio MR, Fewell RR. Peabody Developmental Motor Scales. 2nd ed. Texas: Pro-ed; 2000.
12. Campbell SK. The Test of Infant Motor Performance: test User's Manual, version 2.0: A Training Manual. Chicago, IL; 2005.
13. Campbell SK, Hedeker D. Validity of the Test of Infant Motor Performance for discriminating among infants with varying risk for poor motor outcome. J Pediatr
14. Campbell SK, Levy P, Zawacki L, et al. Population-based age standards for interpreting results on the Test of Infant Motor Performance. Pediatr Phys Ther
15. Campbell SK. Test-retest reliability of the Test of Infant Motor Performance. Pediatr Phys Ther
16. Flegel J, Kolobe THA. Predictive validity of the test of infant motor performance as measured by the Bruininks-Oseretsky test of motor proficiency at school age. Phys Ther
17. Iannaccone S, Hynan L, AmSMART Group. Challenges of enrollment for SMA type I clinical trials [abstract]. Neuromuscul Disord
18. Liao PM CS, Girolami GL, et al. The Test of Infant Motor Performance: A Self-Study CD-ROM Program V3.0©2001. Available at: http://www.thetimp.com. Accessed August 20, 2007.
19. Oskoui M, Levy G, Garland CJ, et al. The changing natural history of spinal muscular atrophy Type 1. Neurology
20. Bartko J. Measures of agreement: a single procedure. Stat Med
21. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics
22. Hartman DP. Considerations in the choice in interobserver reliability estimates. J Appl Behav Anal
23. Klinge L, Straub V, Neudorf U, et al. Enzyme replacement therapy in classical infantile Pompe disease: results of a ten-month follow-up study. Neuropediatrics