INTRODUCTION
Parkinson’s disease (PD) characteristically produces a variety of motor deficits (eg, bradykinesia, resting tremor, and rigidity), including deterioration in balance and postural control. These impairments often result in considerable functional limitation and disability, thereby predisposing the individual to falling. Falling is not only common in PD, as suggested by a three-month fall rate of 46%,1 but can also lead to serious injury. Gray and Hildebrand2 found that 40% of falls in a cohort of 118 participants with PD resulted in injury. When propulsion was associated with the fall, 56% resulted in injury. Additionally, ten years after diagnosis, those with PD are 20 times more likely to experience a hip fracture than age- and sex-matched control groups.3 Injuries from falling may ultimately lead to hospitalization, wheelchair confinement, and/or an incapacitating fear of renewed falling,4 all of which may decrease the quality of life of patients with PD and, subsequently, increase the associated costs to society.
Fall prevention is an important aspect of healthcare for those with PD. Early identification of those at risk of falls affords the individual with PD the opportunity to participate in a fall-prevention program or specialized rehabilitation treatment such as gait and step perturbation training, which has been demonstrated to be effective in reducing falls in PD.5 It should be noted that not all those with PD are at risk of falling.6 Some may have had little functional decline or relatively little involvement of gait and balance systems. Therefore, the individualized nature of the disease process underscores the importance of accurately discriminating those who are likely to fall.
At present, it is unclear whether commonly used clinical assessment tools are able to accurately identify persons who are at risk of falling in PD. In a study of 59 participants with PD, Bloem et al7 found disease severity and history of previous falls to be the best predictors of falls compared to other commonly used clinical tests including: Unified Parkinson’s Disease Rating Scale (UPDRS), retropulsion test, Tinetti mobility index, and both the Romberg and sharpened Romberg tests. Furthermore, Bloem et al7 found that none of the above-mentioned tests were adequate fall predictors. Conversely, Dibble and Lange8 found that many current clinical tests are able to identify PD fallers; however, the sensitivity levels of the clinical tests were low. Jacobs et al9 found that a few individual items of the UPDRS are related to fall history. However, they assessed only a few items of the UPDRS inventory and did not compare them with other standardized clinical assessment tools.
The intent of the present study was to analyze a wider variety of commonly used clinical assessment tools than previously reported to discriminate fallers from nonfallers in a sample of individuals with PD. Therefore, the primary purpose of this study was to identify which commonly used clinical tool was the best discriminator of falls in individuals with PD. In order to aid clinical decision making, the following aspects of these clinical tools were explored: cutoff scores, sensitivity and specificity, odds ratios, likelihood ratios, change in probability, and areas under the receiver operating characteristic (ROC) curve.
METHODS
Study Participants
With institutional review board approval, 49 participants (mean age, 70.9 years; SD = 8.9; 20 females, 29 males) diagnosed by a neurologist with idiopathic PD were included in the study. Participants were excluded if they had a history of dementia as reported by the family, comorbidities that would pose a fall risk (eg, cerebral vascular accident, orthopedic injuries), or were unable to walk unassisted for 10 minutes, either with manual contact or with an assistive device. Twenty-five of the participants were classified as fallers (mean age, 71.8 years; SD = 7.4; 11 females, 14 males; mean duration since diagnosis, 75.6 months; SD = 66.9), operationally defined as those who had had at least one fall in the past year. A fall was operationally defined as unexpectedly ending up on the ground or floor during a routine daily task.10 The remaining 24 participants were classified as nonfallers (mean age, 70.1 years; SD = 6.9; nine females, 15 males; mean duration since diagnosis, 45.4 months; SD = 36.9).
Procedure
Each participant was assessed during their medication on-time using three different categories of testing tools: PD-specific scales, balance-specific scales, and functional gait scales. All participants scored 26 or higher on the Mini-Mental State Examination (MMSE), suggesting no mental impairment.11
PD-Specific Scales
PD-specific scales such as the Hoehn and Yahr (HY) and UPDRS are useful in determining the widespread effects of PD. The UPDRS is used to evaluate many different aspects including, but not limited to, functional status, cognitive impairments, ambulation, tremor, movement initiation, and pain levels.12 The HY is a widely used reliable and valid measurement tool that evaluates the stage of PD.13,14 The modified HY has been shown to correlate with motor function in people with PD,14 while the UPDRS has shown “both scientific and clinical credibility” when determining the effects of PD.15
Balance-Specific Scales
Balance was evaluated using three nondisease-specific balance scales: the Berg Balance Scale (BBS), the Sensory Organization Test (SOT), and the Activities-Specific Balance Confidence Scale (ABC). The BBS is a widely used performance-based assessment tool for static and dynamic balance16 and has high levels of interrater and intrarater reliability.16 The BBS consists of 14 tasks used to determine the participant’s ability to balance in different functional situations. The BBS is not specific to PD. However, Qutubuddin and colleagues17 suggest that the BBS is a valid tool for screening and assessing patients with PD. The BBS is scored on a scale of zero to 56, with higher scores correlating with increased levels of balance performance.
The SOT was completed using dynamic posturography (NeuroCom Smart Balance Master® system, Clackamas, OR). The SOT score represents the participant’s ability to perform six different static balance conditions on a force plate by selectively taxing the three main components of the sensory balance system (vision, vestibular information, and proprioception). The SOT score represents a composite of the six balance conditions and is expressed in terms of percentage ranging from zero (abnormally large sway) to 100 (small sway).10 The SOT protocol has been shown to have value in differentiating elderly fallers from nonfallers10 and has been found to be a reliable tool in studies using healthy populations.18
The ABC is a self-reported measure of confidence with various daily functional tasks that require balance.19 Powell and Myers19 found the ABC to be internally consistent with good test-retest reliability and validity.
Functional Gait
Various dynamic gait tasks were assessed using three scales, which are not specific to PD: Self-Selected Gait Velocity (SSGV), Dynamic Gait Index (DGI), and a standardized obstacle course. The SSGV requires the participant to walk 10 meters at a comfortable speed.20 Participants are timed during this process, and their speed is calculated. In patients with stroke, SSGV has been found to be able to differentiate household and community ambulators.21
The DGI is a performance-based assessment of eight dynamic balance tasks. The participant is graded on a scale of zero to 24, with higher scores indicating higher function.22 The DGI measures the participant’s ability to modify his or her gait in response to changing demands that progressively challenge the participant. The following interrater and intrarater reliability scores were found in a reliability study using patients with multiple sclerosis: interrater reliability ranged from 0.910 to 0.976 and intrarater reliability ranged from 0.760 to 0.986.23
Last, the obstacle course was designed by the research team to provide a functional, performance-based task that would objectively challenge balance. The obstacle course was divided into three 10-ft intervals each with a different task: (1) step over three evenly spaced six-inch high obstacles, (2) ambulate over a 72 × 6.25-in balance beam, and (3) weave in and out of five cones placed 18 in apart. Each participant was timed on a round trip excursion through the obstacle course while in an overground harness system. This overground harness system extended the length of the obstacle course, but did not provide any deweighting. It was used as a safety device in the event of a fall. Participants were penalized five seconds for each minor fall, defined as a minor loss of balance (eg, knocking over a cone, stepping off the balance beam) that the participant was able to recover without using the tethering support of the overground harness system. Participants were penalized 10 seconds for each major fall, defined as a loss of balance that required the tethering support of the overground harness system to prevent a fall to the ground.
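The penalty scheme for the obstacle course can be sketched in a few lines. This is an illustration only: it assumes that the penalties were simply added to the measured round-trip time, which the description implies but does not state outright, and the function name and example values are hypothetical.

```python
# Penalties described in the text, in seconds.
MINOR_FALL_PENALTY_S = 5
MAJOR_FALL_PENALTY_S = 10

def obstacle_course_score(raw_time_s, minor_falls, major_falls):
    """Return the penalized course time in seconds.

    Assumption: penalties are added to the measured round-trip time.
    """
    return (raw_time_s
            + MINOR_FALL_PENALTY_S * minor_falls
            + MAJOR_FALL_PENALTY_S * major_falls)

# Hypothetical participant: 42.0 s round trip, two minor falls, one major fall.
score = obstacle_course_score(42.0, minor_falls=2, major_falls=1)
print(score)  # 62.0
```

Under this assumption, a slower but error-free trial can score better than a faster trial with several losses of balance.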
Statistical Analyses
To compare the fallers and the nonfallers, t tests were used. A χ2 analysis was used to compare the number of participants based on HY staging. Discriminant function analysis (DFA) was used to examine the relative strength of each variable in discriminating faller and nonfaller group membership. DFA unique effects reflect the relative strength of association between each of the variables over and above that of the other variables on the discriminant function. The DFA combined unique and shared effects reflect the relative strength of each of the variables independently of the other variables on the discriminant function.
In order to determine the overall predictive value of these variables to falling, the data were analyzed using sensitivity and specificity values for each level of each variable. The overall predictive accuracy of these measures was assessed using the area under the ROC curve. These curves were calculated by using the method of Dorfman and Alf,24 in which each curve is plotted on a graph with the true-positive rate (sensitivity) against the false-positive rate (1 – specificity) for each cutoff score. The area under the ROC curve, which can range from 0.50, having no prognostic ability, to 1.00, having perfect prognostic ability, was computed for the dichotomous status (faller or nonfaller) for each of the variables. The area under the ROC curve represents the probability that, given a randomly chosen pair consisting of one faller and one nonfaller, the measure would correctly identify the faller. Thus, an area under the ROC curve of 0.800 would mean that the faller in such a pair would be correctly identified 80% of the time. ROC curves were also used to calculate cutoff scores. Odds ratios, likelihood ratios, and change in probability were calculated for each possible cutoff score. The proportions of fallers vs nonfallers above and below the cutoff scores were analyzed using χ2 analyses.
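The pairwise interpretation of the area under the ROC curve can be illustrated with a short sketch. Note that this is the simple nonparametric (pairwise) estimate, not the maximum-likelihood method of Dorfman and Alf used in the study. The function name and the scores are hypothetical and were invented purely for illustration; the sketch assumes a scale on which lower scores indicate greater fall risk.

```python
from itertools import product

def pairwise_auc(faller_scores, nonfaller_scores, lower_is_riskier=True):
    """Fraction of faller/nonfaller pairs ranked correctly; ties count as 0.5."""
    correct = 0.0
    for f, n in product(faller_scores, nonfaller_scores):
        if f == n:
            correct += 0.5          # tie: credit half a pair
        elif (f < n) == lower_is_riskier:
            correct += 1.0          # faller scored on the riskier side
    return correct / (len(faller_scores) * len(nonfaller_scores))

# Hypothetical balance scores (not study data).
fallers = [38, 41, 44, 47, 52]
nonfallers = [45, 50, 53, 55, 56]
auc = pairwise_auc(fallers, nonfallers)
print(f"AUC = {auc:.2f}")  # AUC = 0.88
```

In this invented sample, the faller has the lower score in 22 of the 25 possible faller-nonfaller pairs, giving an area of 0.88.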
RESULTS
The t tests and χ2 analyses revealed for the most part a significant difference between the fallers and the nonfallers (Table 1). Using DFA, a statistically significant group difference (faller vs nonfaller) was found in the combined discriminators, λ = 0.380, χ2(8) = 38.677, P < 0.0005, with a moderately strong association between groups and function (squared canonical correlation = 0.619). SOT and SSGV were not included in the final function as they did not contribute significantly to the final model. Based on an analysis of the unique effects, the UPDRS activities of daily living subscale (UPDRS-ADL), BBS, the UPDRS motor subscale, and DGI (in descending order of contribution) were the best discriminators of group membership. While the HY and ABC appear to be as good as the other variables based on t tests and χ2 analysis, they were not found to be as strong as the UPDRS-ADL, BBS, UPDRS motor subscale, and DGI when analyzed together on DFA. Based on an analysis of the unique effects combined with the shared effects, the UPDRS-ADL, the HY, the BBS, the ABC, and the DGI (in descending order of contribution) were the best discriminators of group membership. Using the discriminant functions, participants were correctly classified as a faller or a nonfaller 91.3% of the time (80.4% cross-validated).
TABLE 1: Mean and SD of Fallers and Nonfallers for All Clinical Measures
The best cutoff scores, area under the ROC curves, sensitivity, specificity, likelihood ratios, posttest probability, and odds ratios were calculated for each clinical tool using 2 × 2 contingency tables (Table 2). There was a statistically significant difference in the proportion of fallers vs nonfallers for those above and below the cutoff scores on most of the clinical measures (Table 3).
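Most of these quantities follow directly from the four cells of a 2 × 2 table of test result against fall status. The sketch below shows the standard formulas; the function name and the counts are hypothetical and are not taken from the study tables. A "positive" test is assumed to mean a score on the faller side of the cutoff.

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Diagnostic accuracy measures from a 2 x 2 table.

    tp: fallers testing positive      fp: nonfallers testing positive
    fn: fallers testing negative      tn: nonfallers testing negative
    Assumes every cell is nonzero so that no denominator is zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_lr": sensitivity / (1 - specificity),
        "negative_lr": (1 - sensitivity) / specificity,
        "odds_ratio": (tp * tn) / (fp * fn),
    }

# Hypothetical counts (not study data): 25 fallers, 24 nonfallers.
metrics = two_by_two_metrics(tp=20, fp=6, fn=5, tn=18)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts, sensitivity is 0.800, specificity is 0.750, the positive likelihood ratio is 3.2, and the odds ratio is 12.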
TABLE 2: Cutoff Scores, Area Under the Curve (AUC), Sensitivity, Specificity, Positive and Negative Likelihood Ratios (+LR and –LR), Posttest Probability, and Odds Ratio for Each Clinical Measure Examined in This Study
TABLE 3: Statistically Significant Differences in the Proportions of Fallers Who Are Above and Below the Cutoff Scores
DISCUSSION
Results from this study suggest that the BBS, UPDRS-Overall, and UPDRS-ADL were the top three clinical measures for fall discrimination. The following clinical tests had moderate discriminative values: DGI, obstacle course, and ABC. Because the administration of the entire battery of these measurement tools would be impractical and time-consuming from a clinical perspective, it seems logical that identification of one or a small subgroup of these tests would be the most effective for clinical decision making and would be a more time-efficient strategy. Therefore, based on our data, the aforementioned clinical tools may all be useful clinically in discriminating fallers from nonfallers.
A few clinical measures examined in this study (ie, SOT, SSGV, UPDRS other subscales) did not have good overall discriminative value and are consequently not recommended as good discriminators of fallers. While the SOT and SSGV are valued clinically, our data suggest that their value in discriminating fallers in our study was fair at best.
Dibble and Lange8 found similar results, with the BBS being the best overall predictor of falls in patients with PD compared to the Functional Reach Test (FRT) and the Timed Up and Go test (TUG). In order to maximize the opportunity to treat fallers while avoiding the treatment of nonfallers, Dibble and Lange suggest that it is necessary to evaluate the sensitivity and specificity of cutoff points for these scales. Furthermore, they suggest that current cutoff scores have high specificity and low sensitivity. Clinically, this may lead to false negatives. Thus, those who are at risk of falling may be incorrectly labeled as nonfallers due to low levels of sensitivity in the currently used cutoff scores. False negatives may result in the occurrence of a preventable fall, whereas false positives result in unnecessary treatment. Dibble and Lange suggest that there are fewer adverse effects of treating someone who may not be at risk of falling vs missing the opportunity to treat someone who is at risk of falling.
Our results suggested that a cutoff score of 44 on the BBS should be used when evaluating a patient with PD. This number differs from the suggested cutoff score of Dibble and Lange8 of 54 (out of a possible 56), which emphasized sensitivity. Using this cutoff score of 54 to analyze the data from this study, we found a sensitivity of 1.00 and a specificity of 0.208. Using the proposed cutoff score of 54 in our data, the difference in probability from 0.51 (pretest) to 0.568 (posttest) would represent only a modest improvement for clinical decision making. While a cutoff score that is highly sensitive decreases the chance of incorrectly classifying a faller as a nonfaller, a cutoff score that is only two points less than a perfect score may place an excessive number of nonfallers into a faller category. This could result in a situation in which some patients may be referred for unnecessary treatment. From a cost-containment perspective, this could logically lead to increased healthcare costs. In contrast, our decision to choose a cutoff score that resulted in the greatest shift in probability, from pretest to posttest, was made to maximize clinical decision making. Before any clinical test was performed, the probability that a given participant was a faller (pretest probability) was 51%. For a participant who tested positive at the proposed BBS cutoff score of 44 from this study, that probability (posttest probability) rose to 94.4%. Also, an odds ratio of 48.9 would suggest that the odds of being a faller are 48.9 times greater for a person with a score below the cutoff value than for someone scoring above 44. Bogle Thorbahn and Newton25 suggested a cutoff score of 45 in elderly patients. When using their proposed cutoff score in our data, we found a sensitivity of 0.680, a specificity of 0.875, and a posttest probability of 80.2%. While their proposed cutoff score differs from our score by only one point, our suggested cutoff score had a higher posttest probability. Decreasing the cutoff score by only one point increased the posttest probability by approximately 14%.
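The shift from pretest to posttest probability can be reproduced with the standard odds form of Bayes' theorem. The sketch below uses the pretest probability, sensitivity, and specificity reported above for the cutoff score of 54; the function name is hypothetical, and the calculation assumes that the posttest probability refers to a positive test result.

```python
def posttest_probability(pretest_prob, sensitivity, specificity):
    """Posttest probability of being a faller after a positive test."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    positive_lr = sensitivity / (1 - specificity)   # positive likelihood ratio
    posttest_odds = pretest_odds * positive_lr
    return posttest_odds / (1 + posttest_odds)

# Values reported in the text for a BBS cutoff score of 54.
p = posttest_probability(pretest_prob=0.51, sensitivity=1.00, specificity=0.208)
print(round(p, 3))  # 0.568
```

Because the specificity at this cutoff is low, the positive likelihood ratio is only about 1.26, which explains why the probability moves so little from its pretest value.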
Because the UPDRS-Overall and the UPDRS-ADL also offer substantial shifts in probability, 89.9% and 85.9%, respectively, these tools may also prove useful in clinical decision making. These results contrast with work by Bloem et al,7 who did not find any diagnostic utility for the UPDRS-Overall score. However, Bloem et al did not investigate the diagnostic utility of the subscales in their logistic regression analysis.
The UPDRS-Overall and, hence, the UPDRS-ADL are easy to administer and require no specialized equipment. However, it is not entirely surprising that the ADL subscale is a good discriminator of fallers as one of the questions in this subscale specifically ranks the frequency of falls. The ADL subscale also has questions about the frequency of freezing when walking and the amount of assistance required when walking. From a construct validity perspective, these seem to logically relate to balance. However, from a clinical perspective, the other questions in the domain seem quite tangential to falling (eg, speech, salivation, swallowing, handwriting, cutting food, hygiene) and could, therefore, result in misleading inferences about the nature of a patient’s fall status.
Another important clinical consideration when choosing a scale is the difference between a performance-based test, such as the BBS, and a combination of both performance-based and subjective self-reported tests such as the UPDRS-Overall or the UPDRS-ADL. In addition to minimizing responder bias and response shift, an objective testing tool minimizes the error associated with a subjective, self-reported measure. In support of this assertion, Shulman et al26 found individuals with PD tend to overestimate their function in self-reported ratings. Additionally, Kamata et al27 reported that persons with PD overestimated their own stability limits. Taken together, these studies suggest that use of self-reported tests may be problematic in that subjects with PD may have a tendency toward overestimation of their own ability. In another study of older adults, Kempen et al28 found only a moderate correlation between performance-based and self-reported tests. In the domain of motor function, they found that discrepancies in the two scales were attributed to personality and affective functioning in the self-reported scales. Therefore, performance-based tests, such as the BBS, which theoretically eliminate responder bias, may provide a more accurate assessment of a patient with PD. However, the BBS is not without its own limitations. While the BBS is considered an objective measurement tool, most of the items on the BBS require some subjectivity in the assignment of rank. Additionally, the BBS does not assess other elements that may likely relate to fall risk in PD (eg, cognitive impairment and elements of gait [freezing, attentional distractions, directional and speed changes]).
One limitation of this study was that the faller and nonfaller classification of each participant was based on the participant’s report of falling in the past year. Although no mental impairment was detected by the MMSE, we have no way of knowing whether the participants had memory impairment. Second, from a design perspective, this study may have been strengthened by using a prospective design in which clinical measures of balance are conducted and the participants record their falls (eg, fall diary) over the next year. Another limitation that may pose a threat to internal validity is observer and subject bias. The participants, knowing that they were being tested, and the observer, knowing the participants’ fall status, may have altered their performance or expectancy during the testing based on these potential biases.
CONCLUSIONS
Results from this study suggest that commonly used standardized clinical measures, such as the UPDRS-Overall and the motor and ADL subscales of the UPDRS, BBS, HY, and DGI, are most effective at discriminating PD fallers from nonfallers. Moreover, it appears that the three best scales are the BBS, UPDRS-Overall, and UPDRS-ADL. The UPDRS cognition subscale, the ABC, and obstacle course negotiation offered a moderate contribution to the discrimination of PD fallers from nonfallers. SSGV and computerized dynamic posturography using SOT did not have much value in discriminating PD fallers from nonfallers in this study.
ACKNOWLEDGMENTS
All funding for this study was provided by a grant from the American Parkinson’s Disease Association. We also acknowledge the assistance of the following research assistants: Tarah Badger, MSPT, Scott Snow, MSPT, Michael Russell, DPT, and Dustin Miller, MSPT.
REFERENCES
1. Pickering RM, Grimbergen YA, Rigney U, et al. A meta-analysis of six prospective studies of falling in Parkinson’s disease. Mov Disord. 2007;22:1892–1900.
2. Gray P, Hildebrand K. Fall risk factors in Parkinson’s disease. J Neurosci Nurs. 2000;32:222–228.
3. Johnell O, Melton LJ 3rd, Atkinson EJ, O’Fallon WM, Kurland LT. Fracture risk in patients with parkinsonism: a population-based study in Olmsted County, Minnesota. Age Ageing. 1992;21:32–38.
4. Grimbergen YA, Munneke M, Bloem BR. Falls in Parkinson’s disease. Curr Opin Neurol. 2004;17:405–415.
5. Protas EJ, Mitchell K, Williams A, Qureshy H, Caroline K, Lai EC. Gait and step training to reduce falls in Parkinson’s disease. NeuroRehabilitation. 2005;20:183–190.
6. Jankovic J, Kapadia AS. Functional decline in Parkinson disease. Arch Neurol. 2001;58:1611–1615.
7. Bloem BR, Grimbergen YA, Cramer M, Willemsen M, Zwinderman AH. Prospective assessment of falls in Parkinson’s disease. J Neurol. 2001;248:950–958.
8. Dibble LE, Lange M. Predicting falls in individuals with Parkinson disease: a reconsideration of clinical balance measures. J Neurol Phys Ther. 2006;30:60–67.
9. Jacobs JV, Horak FB, Tran VK, Nutt JG. Multiple balance tests improve the assessment of postural stability in subjects with Parkinson’s disease. J Neurol Neurosurg Psychiatry. 2006;77:322–326.
10. Wallmann HW. Comparison of elderly nonfallers and fallers on performance measures of functional reach, sensory organization, and limits of stability. J Gerontol A Biol Sci Med Sci. 2001;56:M580–M583.
11. Folstein MF, Folstein SE, McHugh PR. “Mini-mental state.” A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12:189–198.
12. Fahn S, Elton RL, eds. Unified Parkinson’s Disease Rating Scale. Florham Park, NJ: Macmillan Healthcare Information; 1987.
13. Jankovic J, McDermott M, Carter J, et al. Variable expression of Parkinson’s disease: a base-line analysis of the DATATOP cohort. The Parkinson Study Group. Neurology. 1990;40:1529–1534.
14. Goetz CG, Poewe W, Rascol O, et al. Movement Disorder Society Task Force report on the Hoehn and Yahr staging scale: status and recommendations. Mov Disord. 2004;19:1020–1028.
15. Movement Disorder Society Task Force on Rating Scales for Parkinson’s Disease. The Unified Parkinson’s Disease Rating Scale (UPDRS): status and recommendations. Mov Disord. 2003;18:738–750.
16. Berg K, Wood-Dauphinee S, Williams JI, Gayton D. Measuring balance in the elderly: preliminary development of an instrument. Physiother Can. 1989;41:304–311.
17. Qutubuddin AA, Pegg PO, Cifu DX, Brown R, McNamee S, Carne W. Validating the Berg Balance Scale for patients with Parkinson’s disease: a key to rehabilitation evaluation. Arch Phys Med Rehabil. 2005;86:789–792.
18. Clark S, Rose DJ, Fujimoto K. Generalizability of the limits of stability test in the evaluation of dynamic balance among older adults. Arch Phys Med Rehabil. 1997;78:1078–1084.
19. Powell LE, Myers AM. The Activities-Specific Balance Confidence (ABC) Scale. J Gerontol A Biol Sci Med Sci. 1995;50:M28–M34.
20. Sullivan KJ, Knowlton BJ, Dobkin BH. Step training with body weight support: effect of treadmill speed and practice paradigms on poststroke locomotor recovery. Arch Phys Med Rehabil. 2002;83:683–691.
21. Perry J, Garrett M, Gronley JK, Mulroy SJ. Classification of walking handicap in the stroke population. Stroke. 1995;26:982–989.
22. Shumway-Cook A, Woollacott M. Motor Control: Translating Research into Clinical Practice. Philadelphia: Lippincott Williams & Wilkins; 2007.
23. McConvey J, Bennett SE. Reliability of the Dynamic Gait Index in individuals with multiple sclerosis. Arch Phys Med Rehabil. 2005;86:130–133.
24. Dorfman DD, Alf E. Maximum-likelihood estimation of parameters of signal-detection theory and determination of confidence intervals-rating-method data. J Math Psychol. 1969;6:487–496.
25. Bogle Thorbahn LD, Newton RA. Use of the Berg Balance Test to predict falls in elderly persons. Phys Ther. 1996;76:576–583.
26. Shulman LM, Pretzer-Aboff I, Anderson KE, et al. Subjective report versus objective measurement of activities of daily living in Parkinson’s disease. Mov Disord. 2006;21:794–799.
27. Kamata N, Matsuo Y, Yoneda T, Shinohara H, Inoue S, Abe K. Overestimation of stability limits leads to a high frequency of falls in patients with Parkinson’s disease. Clin Rehabil. 2007;21:357–361.
28. Kempen GI, van Heuvelen MJ, van den Brink RH, et al. Factors affecting contrasting results between self-reported and performance-based levels of physical limitation. Age Ageing. 1996;25:458–464.