INTRODUCTION AND PURPOSE
“Function” refers to a person's ability to perform basic daily living skills such as eating, toileting, grooming, walking, and interacting with others. The measurement of function has application in clinical and educational settings1 and has been well established in the adult population2,3; however, much less attention has been devoted to pediatric populations, particularly among very young children (younger than 3 years).4 There is a need for quick, easily administered, valid, and reliable functional assessments, comprising minimal data sets, which can be used for measuring the effects of interventions across the spectrum of pediatric services.5
After a very thorough review of the literature, it was found that the majority of instruments available and in use for monitoring pediatric outcomes of rehabilitation were instruments that had been created to measure and monitor progress in growth and development in infants and young children. These instruments include the Battelle Developmental Inventory6; the Carolina Curriculum for Infants and Toddlers with Special Needs7; the Assessment, Evaluation, and Programming System for Infants and Toddlers8; the Test of Infant Motor Performance9; and the Ages and Stages Questionnaire: A Parent Completed, Child Monitoring System.10 These instruments are used to assess developmental skills, and the information elicited is used for diagnostic purposes, treatment planning, and to aid in determination of eligibility for early intervention and preschool services.
Differences Between Development and Functional Assessment
Although developmental status is the most often measured outcome in young children, it may not be appropriate for use in pediatric rehabilitation11 because many of the instruments were designed to identify the impairments in development that may affect function but not specifically to measure and monitor function directly or the outcome of interventions designed to improve functional deficits. Functional assessment typically measures what children do routinely and consistently, as opposed to the child's capabilities (behaviors that the child may be able to perform under optimal or specific conditions). When evaluating growth and development, the child is asked to perform new tasks representative of skills acquired by typically developing children at various ages. The timing and pattern of acquisition of these new skills are interpreted as an indication of the child's rate of maturation. However, change over time is difficult to interpret when the individual tasks do not remain the same, and the standards for comparison for growth and development assessments are derived from populations of typically developing children of the same age; these instruments may not be appropriate for comparing and tracking children with physical, cognitive, or developmental impairments over time.5
There are few instruments available to measure a child's functioning in terms of basic daily living skills, and some of the functional assessments that have been used in pediatric populations are less applicable to very young children.5 Some instruments commonly used to measure skill acquisition include the Peabody Developmental Motor Scales,12 the Ounce Scale,13 and the Pediatric Evaluation of Disability Inventory (PEDI).14 These instruments measure the acquisition of skills, such as feeding, dressing, and grooming, but are very lengthy (PEDI includes 271 items and Peabody includes 249 items) and may require longitudinal assessment with different questions asked at different points in time (PEDI), which can make it difficult to consistently measure change and outcomes of intervention in a standardized way.
The WeeFIM instrument (WeeFIM), adapted from the FIM instrument (FIM) in 1987,2,3 is used to assess function in children1 and has been shown to be a reliable and valid measure.15–17 The WeeFIM instrument was designed to measure degrees to which a young child achieves independence in performing basic daily living tasks. The WeeFIM instrument was patterned after the adult FIM instrument to provide continuity of measurement of functional health status from childhood into adulthood. It was found that the WeeFIM instrument has a measurement gap between birth and 2 years of age; children younger than 2 years tend to elicit a floor effect, whereby the items may be too difficult for their developmental age.17 Therefore, the WeeFIM 0-3 instrument has been developed to measure the attributes of very young children, in other words, the precursors to performing basic daily living tasks. For instance, a 4-year-old child with or without functional impairments may be completely independent in feeding, requiring no assistance from a caregiver; however, it is unlikely that a 1- or 2-year-old child, even without any functional impairments, is completely independent in feeding (may require food to be cut, etc). There are precursors to mastering the functional skill of feeding, such as grasping an object, sipping from a cup, and even the most basic, swallowing food or drink (as opposed to intravenous or gastrointestinal tube feeding). The WeeFIM 0-3 instrument is intended to measure some of the precursors to basic daily living function. It is not intended to encompass full completion of a functional task but only to sample aspects of functional skills, such as feeding, locomotion, and cognition. To our knowledge, an instrument that measures early function in the young pediatric population is not available.
OBJECTIVES AND SPECIFIC AIMS
This study describes the psychometric properties of a new functional assessment tool, the WeeFIM 0-3 instrument, which is intended to measure early function in young children with physical, cognitive, or developmental impairments from birth to 3 years of age. The tool was designed as a companion to an existing pediatric functional assessment tool, the WeeFIM instrument.18
The specific aims of the study were as follows: (1) to determine whether significant differences exist in WeeFIM 0-3 ratings (motor, cognitive, and behavioral domains) in children with impairments when compared with those without impairments, controlling for age and gender; (2) to determine the internal consistency and interitem correlations, validity (concurrent, predictive, and construct), and the hierarchical properties of the WeeFIM 0-3 instrument and the domains (motor, cognitive, and behavioral) within the instrument.
A multidisciplinary workgroup of researchers, occupational therapists, physical therapists, speech/language pathologists, and a physiatrist worked to develop a new module tailored to young children with impairments that could augment the WeeFIM instrument. All of the members of the workgroup have extensive experience in the provision of pediatric habilitative and rehabilitative services. The workgroup began by reviewing items in existing growth and development instruments, such as the Ages and Stages, the Battelle Developmental Inventory, and the PEDI. This was performed as part of a needs assessment to identify what gaps existed in the most often used assessment instruments. New items were created to address the gaps identified; the workgroup initially created a checklist of 90 function-oriented tasks. The instrument was pilot tested at 2 pediatric rehabilitation sites located in the United States, which administered the tool to 50 children (approximately 25 per site) aged 6 to 36 months.
Data were analyzed using Rasch analysis (described in more detail later). An item hierarchy (order of difficulty) was established, and items that were redundant in terms of task difficulty were identified. As a result of the analyses, adjustments to the tool were made, which included the removal of redundant items (as indicated by the Wright person-item map), resulting in a final version of 36 items. Following these modifications, the data satisfactorily met the Rasch model measurement requirements, and the instrument in its revised form was used in this study.
The 36 items (listed in Table 1) that comprise the WeeFIM 0-3 instrument measure early function in 3 domains: motor (16 items), cognitive (13 items), and behavioral perceptions (7 items). The motor and cognitive domains measure physical and cognitive functioning. The behavior domain is intended to measure difficulties or tensions between the caregiver and child rather than specific behaviors of the child; for instance, 1 question asks about the degree of difficulty in bathing the child, thus it could serve as an “early warning” that parental/caregiver tolerance is being stressed.
Design and Methodology
A cross-sectional study design was used. The study population was defined as children aged newborn to 36 months, with and without impairments. Children with impairments were recruited through a Listserv communication sent to inpatient and outpatient facilities that subscribe to the Uniform Data System for Medical Rehabilitation (UDSMR). Sixteen facilities from the United States, 1 in Canada, and 1 in Chile volunteered to participate in administering the instrument to willing patients. The facilities were instructed to collect data from children younger than 36 months who were receiving services from their programs. Children were chosen randomly by a facility staff representative, and inclusion was based on the child's age and the willingness of the parent(s)/caregiver(s) to participate.
To examine the psychometric properties of the instrument, it was imperative to include children without impairments to determine the extent to which the instrument adequately discriminated between populations. Childcare centers located in the northeastern United States were recruited at random by mailed invitation to collect data from children without impairments. Sixteen childcare centers were recruited and participated. The center administrator selected all children younger than 3 years who were in attendance on the day that UDSMR staff delivered the WeeFIM 0-3 forms. The administrator placed the form into each eligible child's backpack/diaper bag. If a center had a low response rate (<10% return), distribution using the same protocol was attempted at another time to increase the rate of return. Completed forms were returned to UDSMR by fax or postal mail.
The instrument is intended to be administered by proxy report (parent/caregiver or healthcare professional). Rater type (mother, father, caregiver/other, healthcare provider, or combination) is collected as part of the questionnaire. In the healthcare facility, the clinician would provide the child's caregiver with the questionnaire and may assist the caregiver in completing the form if needed. This method of assessment is gaining more recognition as a valid and reliable method to measure functional status and quality of life.4 There is no training required to administer the WeeFIM 0-3 instrument; it is easy to administer and takes approximately 10 minutes to complete.
Each item is rated on a 3-level ordinal scale (1, rarely/never; 2, sometimes; or 3, usually) to indicate the frequency of occurrence. There are short but detailed instructions at the beginning of the questionnaire describing the rating levels and instructing that if there is difficulty deciding between 2 levels, to choose the lower rating; for example, if deciding to select “sometimes” or “usually,” the rater is instructed to select “sometimes.”
The study was approved by the Institutional Review Board at the University at Buffalo, State University of New York. Consent was obtained, participation was voluntary, and no incentives were provided.
Statistical analyses were computed using SPSS version 14.0 and Winsteps version 3.61.0. Winsteps was used for Rasch analysis only. Descriptive and multivariate statistics were performed on the data to determine whether significant differences exist in the WeeFIM 0-3 ratings because of impairment, gender, or age. A 1-way, combined-factors analysis of variance (ANOVA) was used, age was controlled, and mean differences between children with and without impairments were tested. Internal consistency of the WeeFIM 0-3 instrument and its domains was assessed using Cronbach's alpha.
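As an illustration of the internal consistency statistic named above, the following is a minimal sketch of Cronbach's alpha in Python. It is not the SPSS procedure used in the study; the function name and the example ratings are hypothetical and serve only to show the calculation on items rated on the 1-to-3 scale.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of respondents, each a list of item ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(ratings[0])
    if n_items < 2 or len(ratings) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in ratings]) for i in range(n_items)
    ]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings (1 = rarely/never, 2 = sometimes, 3 = usually)
# for 5 children on 4 items; these are NOT study data.
example_ratings = [
    [1, 1, 2, 1],
    [2, 2, 2, 1],
    [2, 3, 3, 2],
    [3, 3, 3, 3],
    [1, 2, 1, 1],
]
alpha = cronbach_alpha(example_ratings)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values close to 1 indicate that the items vary together, which is how the domain-level alphas reported in the results would be read.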
Concurrent validity was assessed using the classification table output from binary logistic regression to investigate whether the WeeFIM 0-3 instrument could correctly identify the presence of impairment among children, that is, whether the child was from the impairment group or the control/normative group (hereafter referred to as “impairment status”). Predictive validity was assessed using a logistic regression model to determine whether the WeeFIM 0-3 instrument could predict impairment status and the proportion of variance accounted for by the instrument. Construct validity was assessed by performing a confirmatory factor analysis with Varimax rotation and Kaiser normalization.
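The classification table described above can be sketched as follows. This is an illustrative reconstruction, not the SPSS output used in the study: it assumes predicted probabilities from an already fitted logistic model and a cutoff of 0.5 (an assumption; the cutoff is not stated in the text). The function name and example values are hypothetical.

```python
def classification_table(probabilities, observed, cutoff=0.5):
    """Cross-tabulate predicted against observed impairment status.

    probabilities: predicted probability of impairment for each child
    observed: 1 if the child has an impairment, 0 otherwise
    Returns the 2 x 2 counts and the overall percentage correctly classified.
    """
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for prob, status in zip(probabilities, observed):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and status == 1:
            counts["true_pos"] += 1
        elif predicted == 1 and status == 0:
            counts["false_pos"] += 1
        elif predicted == 0 and status == 0:
            counts["true_neg"] += 1
        else:
            counts["false_neg"] += 1
    correct = counts["true_pos"] + counts["true_neg"]
    percent_correct = 100.0 * correct / len(observed)
    return counts, percent_correct


# Hypothetical predicted probabilities and observed statuses (not study data).
example_probs = [0.9, 0.2, 0.6, 0.4]
example_status = [1, 0, 0, 1]
table, pct = classification_table(example_probs, example_status)
print(table, pct)
```

The overall percentage correct is the figure that corresponds to statements of the form "correctly predicted impairment status of X% of children."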
Rasch analysis was used to establish the construct validity and hierarchical properties of the 3 domains within the WeeFIM 0-3 instrument. The Wright item-person map, useful for determining whether the items are appropriate for the respondents, was visually inspected and the item hierarchy was established. Rasch analyses output reports 2 fit statistics as chi square ratios, infit and outfit. These statistics determine how well the dataset meets requirements of the Rasch model.19 Item infit or outfit mean square (MNSQ) values of approximately 1 are ideal by Rasch model specifications and indicate local independence.20 Items with fit statistics substantially greater than 1 may belong to a different underlying construct and may indicate that they should be omitted from the instrument. Some researchers use 1.5 or even 2.0 as the threshold for item misfit.21 In this study, a more conservative approach was used and 1.3 was the threshold maintained in the analysis, which has been used in other studies.19,22
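To make the infit and outfit mean squares concrete, the sketch below computes them for a single item under the dichotomous Rasch model. This is a simplification: the WeeFIM 0-3 items use a 3-level rating scale, which requires a polytomous model such as the one fitted in Winsteps, and person and item measures are assumed here to be already estimated. All function names and values are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (infit MNSQ, outfit MNSQ) for one item.

    responses: 0/1 responses to the item, one per person
    abilities: person measures in logits, in the same order
    """
    squared_residuals = 0.0
    variances = 0.0
    standardized_squares = 0.0
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        var = p * (1.0 - p)            # model variance of the response
        resid_sq = (x - p) ** 2        # squared residual
        squared_residuals += resid_sq
        variances += var
        standardized_squares += resid_sq / var
    infit = squared_residuals / variances            # information-weighted
    outfit = standardized_squares / len(responses)   # unweighted mean square
    return infit, outfit


def is_misfit(infit, outfit, threshold=1.3):
    """Flag an item when either mean square exceeds the threshold."""
    return infit > threshold or outfit > threshold


# Hypothetical responses from 4 persons whose ability equals the item difficulty.
infit, outfit = item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0)
print(infit, outfit, is_misfit(infit, outfit))
```

The default threshold of 1.3 mirrors the criterion stated above; passing 1.5 or 2.0 reproduces the more lenient criteria mentioned in the same paragraph.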
Raw data (not Rasch transformed) were used for all analyses. Descriptive and multivariate statistics and logistic regressions included all participants, and all other analyses (Cronbach's alpha, confirmatory factor analyses, and Rasch analyses) included only children with impairments; this is because the WeeFIM 0-3 instrument is not intended for use in a general population (such as to screen for impairments).
The rehabilitation facilities obtained data from an average of 10 participants, with a range of 1 to 29. The response rate could not be ascertained because the number of patients treated at the facility was privileged information. Among childcare centers, the average response rate was 40% with a range of 7% to 55%. Overall (children with and without impairments), 65% of respondents were mothers, 11% were fathers, 10% were caregivers/other, 6% were healthcare providers, and 8% were a combination of the above (multiple response variable).
There were 527 children in the study, of whom 173 had impairments and 354 did not; the distribution of the type of impairment (primary impairment for which the child was being treated) is displayed in Table 2.
Of all children, 51% were boys; there were no significant differences in gender distribution by impairment status. There was a significant difference in racial distribution by impairment. Eighty-seven percent of the children without impairments were white, whereas only 42% of children with impairments were white, χ2 = 181.7 (df = 6), p < 0.01. Children with impairments were significantly younger than those without impairments, 13 months (10.7 SD) compared with 20 months (10.0 SD), F = 56.3 (df = 1), p < 0.01, η2 = 0.10. Age was categorized for subsequent analyses into 3 groups: 0 to 12 months, 13 to 24 months, and 25 to 36 months. The groupings are broad but were necessary to retain statistical power.
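The effect size for the age difference can be checked from the F statistic alone. The sketch below uses the standard identity for a 1-way design, eta squared = (F × df effect) / (F × df effect + df error). The error degrees of freedom are not reported in the text; 525 is an assumption (527 children minus 2 groups, with no missing ages).

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared for a 1-way ANOVA, recovered from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F = 56.3 and df = 1 are taken from the text; df_error = 525 is assumed.
eta_sq = eta_squared_from_f(56.3, 1, 525)
print(f"eta squared: {eta_sq:.3f}")
```

Under that assumption the result is approximately 0.10, in line with the reported value, meaning that impairment group accounts for roughly a tenth of the variance in age.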
A 1-way ANOVA was computed. There were no significant differences in mean ratings between boys and girls on the WeeFIM 0-3 instrument or the domains within. This lack of gender effect remained even after stratifying the data by impairment and age groups.
Age in months was significantly correlated (p < 0.01) with the total WeeFIM 0-3 rating and all 3 domains for both children with impairments (WeeFIM 0-3 total = 0.70, motor = 0.78, cognitive = 0.72, and behavior = 0.33) and children without (WeeFIM 0-3 total = 0.79, motor = 0.76, cognitive = 0.79, and behavior = 0.16).
Differences in WeeFIM 0-3 domain ratings between children with and without impairments, controlling for age, were tested using a 1-way ANOVA. Children without impairments had significantly higher (p < 0.05) mean motor, cognitive, and behavior ratings than children with impairments within each age stratum. The distribution of mean ratings, standard deviations, and significance values is displayed in Table 3.
Cronbach's alpha for the total WeeFIM 0-3 instrument was 0.95. Cronbach's alpha was 0.97 for the motor domain, 0.96 for the cognitive domain, and 0.76 for the behavior domain. Internal consistency of the WeeFIM 0-3 instrument was also assessed separately by rater type and remained high (Cronbach's alpha was >0.86 for all).
Overall, total items of the WeeFIM 0-3 instrument correctly predicted impairment status of 89.4% of children. Among domains, the motor items correctly identified 87.5% of the children, followed by 85.2% for the cognitive items and 83.8% for the behavior items.
A logistic regression was used to assess the predictive validity of the WeeFIM 0-3 instrument and domains on the full dataset. Four models were analyzed. In the first model, the dependent variable was the impairment status, and the independent variable was the WeeFIM 0-3 total. In the remaining 3 models, the dependent variable was the impairment status, and the independent variable was, in turn, the total motor score, the total cognitive score, and the total behavior score. The total WeeFIM 0-3 score was significant in predicting impairment status: B = −0.79 (SE = 0.01), Wald = 159.8 (df = 1), p < 0.001, Cox and Snell R2 = 0.40, and Nagelkerke R2 = 0.56. The total motor score was significant in predicting impairment status: B = −0.13 (SE = 0.01), Wald = 162.4 (df = 1), p < 0.001, Cox and Snell R2 = 0.36, and Nagelkerke R2 = 0.50. The total cognitive score was significant in predicting impairment status: B = −0.20 (SE = 0.02), Wald = 146.5 (df = 1), p < 0.001, Cox and Snell R2 = 0.37, and Nagelkerke R2 = 0.52. The total behavior score was significant in predicting impairment status: B = −0.65 (SE = 0.06), Wald = 108.5 (df = 1), p < 0.001, Cox and Snell R2 = 0.32, and Nagelkerke R2 = 0.44.
Construct validity was assessed by a confirmatory factor analysis with Varimax rotation and Kaiser normalization on children with impairments only. Findings were significant, indicating 3 components that accounted for 68% of the variance in the model; the eigenvalue of rotated sums of square loadings = 2.6, Kaiser-Meyer-Olkin = 0.95, Bartlett's test of sphericity χ2 = 6567.3 (df = 630), p < 0.01. Two behavior items, bathing and dressing up, did not load in the model.
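The two pseudo R-squared values reported for each model are related: the Nagelkerke value is the Cox and Snell value divided by its maximum attainable value, which depends only on the split between the two outcome groups. The sketch below illustrates that relationship. It assumes all 527 children (173 with impairments, 354 without) entered each model with no missing data, which the text does not state; the function names are hypothetical.

```python
import math


def max_cox_snell(n_events, n_nonevents):
    """Upper bound of the Cox and Snell R2 given the outcome split.

    Equals 1 - exp(2 * LL0 / n), where LL0 is the log-likelihood of the
    intercept-only model.
    """
    n = n_events + n_nonevents
    p = n_events / n
    null_log_likelihood = n_events * math.log(p) + n_nonevents * math.log(1 - p)
    return 1 - math.exp(2 * null_log_likelihood / n)


def nagelkerke_from_cox_snell(r2_cox_snell, n_events, n_nonevents):
    """Rescale a Cox and Snell R2 so that its maximum is 1."""
    return r2_cox_snell / max_cox_snell(n_events, n_nonevents)


# Cox and Snell values as reported in the text; group sizes 173 and 354 assumed.
reported = {"total": 0.40, "motor": 0.36, "cognitive": 0.37, "behavior": 0.32}
for label, r2_cs in reported.items():
    r2_n = nagelkerke_from_cox_snell(r2_cs, 173, 354)
    print(f"{label}: Nagelkerke R2 is about {r2_n:.2f}")
```

Under these assumptions the rescaled values agree with the reported Nagelkerke figures to within rounding of the two-decimal Cox and Snell inputs.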
Rasch analyses were used to establish the construct validity and hierarchical properties of the 3 domains within the WeeFIM 0-3 instrument for children with impairments. Cases scoring at the extremes (all low or all high item responses) were removed because Rasch modeling cannot provide direct estimates for extremes; thus, the motor domain analyses included 147 cases, the cognitive domain analyses included 141 cases, and behavior domain analyses included 142 cases. All 3 response categories were well represented by a distinct peak in the probability curve for each of the 3 domains.
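The exclusion of extreme cases described above can be sketched as a simple filter. This is an illustrative helper, not the Winsteps procedure; it assumes each case is a list of ratings for one domain on the 1-to-3 scale, and the function name and example data are hypothetical.

```python
def drop_extreme_cases(cases, lowest=1, highest=3):
    """Remove cases whose ratings are all at the lowest or all at the highest level."""
    kept = []
    for ratings in cases:
        all_low = all(r == lowest for r in ratings)
        all_high = all(r == highest for r in ratings)
        if not (all_low or all_high):
            kept.append(ratings)
    return kept


# Hypothetical domain ratings for 4 children (not study data).
example_cases = [
    [1, 1, 1, 1],  # extreme low: no direct estimate possible
    [3, 3, 3, 3],  # extreme high: no direct estimate possible
    [1, 2, 3, 2],
    [2, 2, 2, 2],  # uniform but not extreme, so it is kept
]
retained = drop_extreme_cases(example_cases)
print(len(retained), "of", len(example_cases), "cases retained")
```

Because the filter is applied per domain, the number of retained cases can differ across the motor, cognitive, and behavior analyses.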
The Wright item-person maps were visually inspected. Figure 1 displays the 3 separate maps, 1 for each domain.
Motor items were found to be distributed in a unidimensional way, and all respondents and all items but one (nutritional intake) displayed acceptable fit; nutritional intake was greater than 2 SD below the mean, indicating the item may be too “easy” for the sample of respondents (almost all had endorsed). The majority of the motor items were within 1 SD of the mean with some items 1 or more SD from the mean, which indicates an appropriate hierarchy of difficulty among items. Additionally, there were few item redundancies (eg, locomotion and crawls). Item redundancy may indicate that more than 1 item is measuring the same level of ability or skill. Cognitive items and behavior items were distributed unidimensionally, and all respondents and all items displayed acceptable fit. For the cognitive items, most were within 1 SD above or below the mean, with little item redundancy (only joint attention). For the behavioral items, nearly all were clustered around the mean, with only 1 item, meal time, more than 1 SD above the mean, and 1 item, cuddle, more than 1 SD below the mean and no item redundancy.
Of the 16 motor items, 4 (tool, 2 hands, lift head and nutritional intake) had misfit, with infit or outfit values more than 1.3. For the well-fitting motor items, the infit MNSQ statistics ranged from 0.61 to 1.16 and outfit MNSQ ranged from 0.19 to 1.22 (Table 4).
Of the 13 cognitive items, 3 items (5 words, familiar sounds, and interest) had misfit. For the well-fitting cognitive items, the infit MNSQ statistics ranged from 0.58 to 1.17 and outfit MNSQ ranged from 0.22 to 1.13 (Table 5).
Only 1 of the 7 behavior items, cuddle, had misfit, with an outfit MNSQ of 1.34; for the other items, the infit MNSQ statistics ranged from 0.80 to 1.12 and outfit MNSQ ranged from 0.84 to 1.21 (Table 6).
The WeeFIM 0-3 instrument displayed high internal consistency, construct validity, and discriminatory capabilities. Overall, children with impairments were significantly younger than those without. This could be due to the setting from which children without impairments were recruited; daycares were used to recruit participants, and it is likely that very young children were not yet enrolled in a daycare program. However, when controlling for age, children without impairments scored significantly higher on the WeeFIM 0-3 ratings than children with impairments. Additionally, younger children both with and without impairments scored lower on the WeeFIM 0-3 ratings than older children, which confirms that functional items are developmentally driven and progress consistently with age, although the differences between children with and without impairments remain and the WeeFIM 0-3 instrument can be used to detect them. Thus, the functional items (motor and cognitive domains) seem to be appropriate for the identified age cohort and display an appropriate hierarchy in terms of item arrangement and item difficulty in Rasch analyses. The behavior domain showed less variation between age groups and between children with and without impairments. The behavior domain is more of a measure of the parent/caregiver's difficulty with the child's routine behaviors and not a measure of the child's skills or abilities. We believe that the behavior domain may be less sensitive for detecting differences between children with impairments and children without and less dependent on the child's age. However, because the instrument is intended for use in children with impairments (as opposed to a population without impairments), we believe that it is a clinically relevant domain; if difficulties between the parent/caregiver and child are detected, referrals for needed services can be made, and caregivers may develop coping methods to decrease tensions with the child's behaviors, and this change may be reflected and captured over time.
Some of the items, such as nutritional intake, tool, 2 hands, lift head, bathe and dress, 5 words, familiar sounds, and interest did not display acceptable fit in the Rasch analyses. Misfitting items may indicate issues with the individual items, such as poorly worded items, or differential item functioning, which is when different populations may answer latent trait items differently, or data redundancy. The researchers should examine any misfitting items to determine what issues exist among the items; and if the item is poorly worded, or redundant, it may be determined that the item should be removed from the instrument. However, statistical decisions to remove certain items from an analysis are no substitute for consideration of content coverage of the construct being measured23 or those items that are relevant clinically. Elimination of items based on fit criteria alone without considering content coverage may lead to the failure of the instrument to capture important aspects of the construct, causing construct underrepresentation.24 We believe that the aforementioned misfitting items within the WeeFIM 0-3 tool are due to possible differential item functioning based on type of impairment, such that some of the tasks are easier among children with some impairments relative to others or based on severity of impairment. Additional studies with a larger, more diverse sample are needed so that analysis can stratify by impairment type and possibly severity of impairment before adjusting the instrument to exclude misfitting items.
This study was limited in sample size (especially among children younger than 6 months of age), racial distribution, diversity and severity of impairment types, and lack of sequential measurements. In addition, the study examined data collected at a single point in time; additional research should consider collecting data with multiple assessments per child over time and with different raters so the instrument's test-retest and interrater and intrarater reliability can be ascertained and to determine whether the instrument can assess change in children over time.
This instrument measures performance of the same set of skills at different chronological ages, thus allowing for comparison of the individual child's performance at one age with the same child's performance at a different period in time, and for comparison with different children of the same age. This eliminates the need for external age-adjusted standards for comparison derived from typically developing children of the same age whose level of skill acquisition may be unattainable for the child with significant disability. A focus on self-comparison at the progressive ages encourages clinicians and families to set realistic goals that recognize the child's ability to achieve the greatest level of functional independence possible within existing functional limitations.
The WeeFIM 0-3 instrument is a quick, easily administered, valid measure of early function. The instrument, as opposed to currently existing tools, is designed to merge with a similar tool for older children (WeeFIM instrument) and adults (FIM instrument) for longitudinal tracking.
The instrument displayed high internal consistency, construct, and predictive validity. The WeeFIM 0-3 instrument maintained a hierarchy of item difficulty as indicated using Rasch analysis and can be used to accurately discriminate between children with and without impairments.
The WeeFIM 0-3 instrument is a promising tool to measure and track outcomes of therapeutic interventions. The WeeFIM 0-3 instrument offers a method for therapists to guide decisions to change techniques or to continue a treatment when working with patients. In addition, the WeeFIM 0-3 instrument offers the opportunity for concrete discussions with caregivers concerning the course, need, and progress in treatment. The instrument's characteristics also include ease of administration, a limited amount of training required to administer and score, and efficiency of scoring and interpretation of results, all of which are of great importance regarding clinical utility,25 although further studies with a larger sample are needed to determine the instrument's sensitivity and ability to measure change over time.
The authors thank the families, childcare centers, and subscribers to the UDSMR for their participation. The authors extend their gratitude to Deborah Denniger-Bryant and Shirley Carlson, who contributed much to the development of the instrument and performed the initial instrument testing. In addition, the authors thank Susan Braun for her assistance with the development of the instrument and the conceptual framework of the study.