Accidental falls are a major public health concern, with a substantial impact on quality of life, health, and healthcare costs. Falls are ranked by the World Health Organization (WHO) as the second leading cause of unintentional injuries and deaths worldwide, after motor vehicle traffic crashes. One of the major fall risk factors is age. Bone frailty and chronic and/or degenerative conditions associated with physical, sensory, and cognitive changes in advancing age increase the risk of falling and being injured and contribute to long-term pain, loss of confidence and independence, and increased mortality.[1,4] Fall consequences in high-income countries account for 1% to 2% of health costs and carry a huge socio-economic burden. Accordingly, prevention programs and effective policies are needed to reduce fall risks. Primary prevention should focus on early fall-risk assessment, when the risk is still low, and on preventive interventions.
Many international guidelines (e.g., American and British Geriatric Societies or English National Institute for Health and Care Excellence) suggest assessing fall risk through a combination of questionnaires investigating personal history of falls and functional tests assessing gait and balance.[1,6] However, fall screening and prevention for the elderly are rarely integrated into either primary or secondary care settings, perhaps because they are perceived as onerous by health professionals. Indeed, the administration time reported for the most frequently used functional tests is quite long, ranging from 20 to 60 minutes. Recent advances in technology can support the identification of more effective, time-efficient, and appealing prevention strategies. Automated tools that measure gait and balance through standardized protocols can offer several potential advantages, including relief of health professionals from screening procedures, lower costs, enhanced patient compliance and satisfaction, and better prediction and prevention of falls. However, these tools have rarely been tested.
Against this background, we decided to test an automated device to answer a technical question: “how well does an automated system that integrates a fall risk scale differentiate between people at low and high risk of falling?” This study aimed to evaluate the performance of an automated device as the index test for fall risk assessment in the elderly, measured by device failure and diagnostic test accuracy (DTA). Device safety was also investigated.
2.1 Study design and setting
We performed a prospective study, adopting a futility and diagnostic test accuracy design. The study was approved by the Ethics Committee of the San Raffaele Hospital, Milan (May 11, 2015). All participants gave their informed, written consent prior to participation. The participants were recruited at the IRCCS Orthopedic Institute Galeazzi, Milan, Italy, between November 2015 and December 2016. All procedures were performed in accordance with the Declaration of Helsinki. Study reporting followed the STARD (Standards for Reporting of Diagnostic Accuracy) statement (supplementary material S1, https://links.lww.com/MD/D254). The trial is registered in ClinicalTrials.gov, identifier NCT02655796.
Healthy male and female volunteers aged over 65 years were consecutively recruited through announcements placed in the community and in the hospital. Inclusion criteria were: ability to walk unassisted and without walking aids, and no severe cognitive impairment (e.g., dementia). Exclusion criteria were: medical conditions limiting mobility (e.g., diabetes, obesity, serious ocular disorders such as glaucoma); vestibular disorders (e.g., labyrinthitis) and proprioceptive disorders (e.g., ataxia) that can compromise subject safety during risk assessment; wearing a pacemaker; history of orthopedic surgery (e.g., knee or hip prosthesis) during the previous 6 months; and taking medicines that alter coordination and equilibrium (e.g., anti-epileptics, sedative-hypnotics). Potentially eligible participants were identified at the IRCCS Orthopedic Institute Galeazzi in consultation with orthopedists and other specialists. They were interviewed before enrollment to explore potential reasons for ineligibility and asked to provide informed, written consent. Recruitment continued until the required sample size was reached.
2.3 OAK device
We tested the OAK device (Khymeia, Noventa Padovana, Italy), a new virtual-reality-based system that can perform fall risk assessment using any functional fall-risk scale that can be automated and implemented on the OAK software platform. The scales best suited to integration mainly address gait and balance and are simple enough to be administered through a virtual-reality interface with motion sensors. For these reasons, we implemented the brief balance evaluation systems test (Brief-BESTest). It consists of a subset of 8 items derived from the original BESTest[12,13] and addresses 6 postural constructs: mechanical constraints, stability limits/verticality, anticipatory postural adjustments, postural responses, sensory orientation, and gait stability. Each Brief-BESTest item strongly represents the context of balance impairment as assessed by the original BESTest. It yields a point-score from 0 to 24 and includes tasks such as the Timed Up and Go (TUG) test and the one-leg stance, which are commonly used by physical therapists to assess strength, gait, and balance. The psychometric properties and level of accuracy of the Brief-BESTest are similar to those of the Mini-BESTest, a longer version of the Brief-BESTest. In contrast to the Mini-BESTest, the Brief-BESTest assesses 6 balance dimensions (all those outlined in the original BESTest) and is gaining popularity among clinicians for predicting falls, thanks to its time-efficiency and limited equipment needs. Appendix 1 (https://links.lww.com/MD/D254) presents the complete Brief-BESTest.
The OAK device comprises several integrated technologies that interact with one another (Fig. 1).
The main structure is equipped with 2 stabilometric platforms that record the center of pressure of each limb, 4 antennas that generate a low-intensity magnetic field in which the subject moves, 3 bars that detect the subject's body weight during exercise, and a virtual reality monitor that presents the exercises. The subject is set up with a belt, 2 gloves, and 2 wrappers, one on each leg. Each glove and wrapper contains a passive magnetic sensor with 6 degrees of freedom. The sensors are attached to the hands, lower back, and back of the knees and record the subject's motions in real time within the magnetic field. The sensors are connected to a portable device (HUB) attached to the lower back, which transmits the motion data to the main processor via Bluetooth. The OAK device is also connected to a portable computer programmed to integrate the data and calculate the score for each item and an overall score based on the Brief-BESTest score assignment. A multidisciplinary team of physicians, physiotherapists, bioengineers, and programmers developed and adapted the exercises from the Brief-BESTest assessment scale to the OAK technology. Even though the system is automated, a minimum level of supervision was necessary for safety reasons for 2 items considered critical for those assumed to be at high risk of falling.
2.4 Reference standard assessment
The reference standard used as comparator was the STEADI (Stopping Elderly Accidents, Deaths & Injuries) algorithm, developed by the US Centers for Disease Control and Prevention (CDC). The CDC algorithm was selected to screen subjects at low, moderate, and high risk of falling (Appendix 2, https://links.lww.com/MD/D254). This algorithm can be regarded as an established system that screens multiple domains across self-report and objective measures. The algorithm comprises 3 questions: Have you fallen in the past year? Do you feel unsteady when standing or walking? Do you worry about falling? A “yes” answer to any of these key screening questions classifies the subject as being at increased risk of falling. Further assessments are then recommended to investigate the presence of gait, strength, or balance problems. Among the functional tests suggested in the algorithm flow, we chose a single one, the Timed Up and Go (TUG) test, as we felt most clinicians would be familiar with it. A TUG test time of ≥12 seconds was used to identify individuals at moderate/high risk of falling. The number of falls and injuries in the past year discriminates between moderate and high risk: falling twice or more, or once with injury, classifies the person as being at high risk of falling.
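The classification flow described above can be sketched in a few lines of Python. This is only an illustration of the logic as stated in the text, not the CDC's or the authors' implementation: the `Screening` field names are hypothetical, and the handling of a positive screen with a TUG time under 12 seconds (treated here as low risk) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """Answers and measurements for one subject (field names are hypothetical)."""
    fallen_past_year: bool
    feels_unsteady: bool
    worries_about_falling: bool
    tug_seconds: float
    falls_past_year: int = 0
    injurious_falls: int = 0


TUG_CUTOFF_SECONDS = 12.0  # threshold stated in the text


def classify_fall_risk(s: Screening) -> str:
    """Return 'low', 'moderate', or 'high' following the flow described above."""
    screen_positive = (
        s.fallen_past_year or s.feels_unsteady or s.worries_about_falling
    )
    # Assumption: a negative screen, or a TUG time under the cut-off, means low risk.
    if not screen_positive or s.tug_seconds < TUG_CUTOFF_SECONDS:
        return "low"
    # Two or more falls, or one fall with injury, separates high from moderate risk.
    if s.falls_past_year >= 2 or (s.falls_past_year >= 1 and s.injurious_falls >= 1):
        return "high"
    return "moderate"


if __name__ == "__main__":
    subject = Screening(True, False, True, tug_seconds=13.4, falls_past_year=1)
    print(classify_fall_risk(subject))
```

Keeping the TUG cut-off as a named constant makes it easy to see how the classification would shift under a different threshold.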
2.5 DTA comparison
We compared the diagnostic accuracy of the OAK device, measured via the Brief-BESTest, against the CDC-STEADI fall risk algorithm. Our interest was in the technical side, that is, the diagnostic maturity of the machine, not in differences in DTA between scales.
Primary outcomes were: performance failure, defined as the proportion of participants for whom the OAK device failed to provide a fall risk assessment (i.e., any software or hardware defect that prevented or interrupted risk assessment), and DTA, evaluated as the proportion of participants identified by the OAK device as being at moderate/high risk of falling among those at moderate/high risk according to the CDC algorithm for fall risk assessment. The secondary outcome was safety in terms of serious adverse events and adverse events during the assessment, as recorded by a health professional observing the procedure. Measures of interest were collected during testing with no follow-up.
At the beginning of the assessment session, the subject's general characteristics and results of evaluation with the CDC algorithm were collected. The subjects were fitted with the magnetic sensors and instructed to follow the directions for completing a series of 8 exercises. If the subject did not complete an exercise correctly, he/she was instructed to repeat it. If completed successfully, the next exercise in the series was automatically displayed on the monitor. If an exercise was not completed within 30 seconds, the application automatically stopped the exercise, graded the subject as unable to perform it, and moved on to the next exercise. At the end of the session, the device evaluated performance on a Brief-BESTest point-score ranging from 0 to 24.
Trained physicians or physiotherapists performed the CDC evaluations. A physiotherapist and a bioengineer observed the interaction between the subjects and the device. For the reactive postural response (items 5 and 6), close supervision was provided.
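As a rough illustration of how a session total could be assembled from the 8 exercises, the sketch below sums per-item scores into the 0 to 24 range. It assumes the standard Brief-BESTest item range of 0 to 3 and assumes that an exercise stopped at the 30-second limit contributes 0 points; the function names and the input format are hypothetical and do not describe the OAK software.

```python
from typing import Sequence, Tuple

N_ITEMS = 8         # exercises in the session, per the text
MAX_ITEM_SCORE = 3  # Brief-BESTest items are scored 0-3, giving 0-24 overall


def item_score(raw_score: int, completed_in_time: bool) -> int:
    """Score one exercise; a timed-out exercise is scored 0 (assumption)."""
    if not completed_in_time:
        return 0
    if not 0 <= raw_score <= MAX_ITEM_SCORE:
        raise ValueError("item score must be between 0 and 3")
    return raw_score


def total_score(items: Sequence[Tuple[int, bool]]) -> int:
    """Sum the item scores of a full session into a 0-24 total."""
    if len(items) != N_ITEMS:
        raise ValueError("a session consists of exactly 8 exercises")
    return sum(item_score(raw, ok) for raw, ok in items)


if __name__ == "__main__":
    # Hypothetical session: seven exercises completed, the last one timed out.
    session = [(3, True), (2, True), (3, True), (1, True),
               (2, True), (2, True), (3, True), (3, False)]
    print(total_score(session))
```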
At the end of the assessment, subjects received a list of recommended exercises and standardized training on how to improve strength, gait, and mobility and reduce their risk of falling. This was not part of the intervention under study but reflects good clinical practice.
2.8 Sample size calculation
The sample size was calculated based on the 2 primary outcomes: OAK performance and accuracy of the system for assessing risk of falling. Using a futility design, we tested the hypothesis that the OAK system would fail to correctly complete the assessment with an incidence of at least 5% and not above 15% with a type I error of 10% and a type II error of 15% (power 85%). Given these estimates, a sample size of 47 subjects was calculated. Furthermore, with a type I error of 5% and a type II error of 20% (power 80%), a sample of 60 subjects at moderate/high risk of falling according to the CDC algorithm would have been sufficient to assess the accuracy of the OAK system for sensitivity, which was expected to be equal to 95% and not lower than 85%. Considering a drop-out percentage of 20% for any reason (including device malfunction), we calculated a sample size of 75 subjects at moderate/high risk of falling. To be more conservative, we planned to recruit 80 subjects at moderate/high risk and 20 subjects at low risk of falling (for a total of 100 subjects) screened with the CDC algorithm. Sample size was calculated using Stata software (StataCorp. 2013. Stata Statistical Software: Release 13. StataCorp LP, College Station, TX).
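The step from 60 to 75 subjects is consistent with dividing the required number by the expected retention (1 minus the drop-out proportion). The short sketch below reproduces that arithmetic; the function name is hypothetical, and the formula is the usual inflation rule rather than something the text states explicitly.

```python
def inflate_for_dropout(n_required: int, dropout_percent: int) -> int:
    """Smallest enrolment that still leaves n_required subjects after drop-out.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    retained_percent = 100 - dropout_percent
    return -(-n_required * 100 // retained_percent)


if __name__ == "__main__":
    # 60 subjects needed for the sensitivity analysis, 20% expected drop-out.
    print(inflate_for_dropout(60, 20))  # 75
```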
2.9 Data collection
Data were collected on a case report form. General characteristics (age, sex, body weight, height, retrospective fall occurrence) and the single-item and total scores of the CDC algorithm evaluation and the Brief-BESTest were extracted. The data were entered into a database and then analyzed.
2.10 Statistical analysis
Demographic characteristics were summarized as absolute and relative frequencies for categorical variables, and as mean with standard deviation (SD) or median with interquartile range (IQR) for continuous variables, as appropriate. The performance outcome was analyzed following an exact binomial distribution, providing the corresponding estimate with the upper limit of the relative 90% confidence interval (CI). For the accuracy outcomes, nonparametric receiver operating characteristic (ROC) analysis was performed, providing the corresponding sensitivity and specificity estimates and a ROC area with the relative 95% CI. The Youden index was used to select an optimal cut-off point.
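To make the accuracy analysis concrete, the following sketch computes sensitivity and specificity at a cut-off, the Youden index J = sensitivity + specificity - 1, and a nonparametric area under the ROC curve. It is a minimal illustration in plain Python, not the Stata procedure used in the study. It assumes that lower Brief-BESTest scores indicate higher risk, so a score at or below the cut-off is called "at risk", and the toy scores and labels are invented for demonstration only.

```python
from typing import Optional, Sequence, Tuple


def sens_spec(scores: Sequence[int], at_risk: Sequence[bool],
              cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when score <= cutoff is called 'at risk'."""
    pairs = list(zip(scores, at_risk))
    tp = sum(1 for s, r in pairs if r and s <= cutoff)
    fn = sum(1 for s, r in pairs if r and s > cutoff)
    tn = sum(1 for s, r in pairs if not r and s > cutoff)
    fp = sum(1 for s, r in pairs if not r and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores: Sequence[int],
                  at_risk: Sequence[bool]) -> Tuple[int, float]:
    """Cut-off that maximises J = sensitivity + specificity - 1."""
    best: Optional[Tuple[int, float]] = None
    for cutoff in sorted(set(scores)):
        se, sp = sens_spec(scores, at_risk, cutoff)
        j = se + sp - 1
        if best is None or j > best[1]:
            best = (cutoff, j)
    assert best is not None
    return best


def auc(scores: Sequence[int], at_risk: Sequence[bool]) -> float:
    """Nonparametric AUC: chance that an at-risk subject scores lower than
    a not-at-risk subject, counting ties as one half."""
    pos = [s for s, r in zip(scores, at_risk) if r]
    neg = [s for s, r in zip(scores, at_risk) if not r]
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: 5 subjects at risk per the reference standard, 5 not.
    toy_scores = [10, 12, 14, 16, 18, 15, 19, 20, 22, 23]
    toy_labels = [True] * 5 + [False] * 5
    print(youden_cutoff(toy_scores, toy_labels))
    print(auc(toy_scores, toy_labels))
```

Confidence intervals for the AUC are omitted here; they would normally come from a statistical package.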
A cohort of 183 healthy adults aged over 65 years who volunteered to participate was screened over a recruitment period of 13 months in order to achieve the planned sample size. Overall, 131 (71.6%) women and 52 (28.4%) men (mean age, 74 ± 6 years) were assessed. The CDC algorithm screened 41 (22.4%) subjects as being at high risk of falling, 39 (21.3%) at moderate risk, and 103 (56.3%) at low risk of falling (Fig. 2). Almost half had fallen at least once in the past year (n = 91, 49.7%; mean, 1.8 ± 1.6 falls); 48 (52.7%) of these sustained at least 1 injury. The final number recruited exceeded the trial recruitment target because most subjects were classified as being at low risk of falling, so reaching the target number of participants at high risk required a larger sample. Table 1 presents the sample characteristics and results of CDC assessment.
3.2 Primary outcomes
3.2.1 OAK performance
The OAK device failed to assess the risk of falling in 9 instances: 6 failures were due to software issues and 3 were caused by connection problems between the sensors and the hardware. The incidence of device failure was 4.9% (upper limit of the 90% CI 7.7%, <15%), well below the threshold for futility (early termination of the study), which was preset at an incidence not above 15% and not lower than 5%. Considering the administration of virtual tasks, the mean time needed to complete the whole test was 9.6 minutes (SD 4.3 minutes).
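The failure incidence and an exact binomial upper limit can be recomputed from the counts given above (9 failures in 183 subjects). The sketch below uses a Clopper-Pearson style bound found by bisection. The text does not state whether the reported limit is one-sided or belongs to a two-sided interval, so the one-sided 90% convention used here is an assumption, and the result should only be compared with the reported 7.7% with that caveat in mind.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_upper_limit(failures: int, n: int, alpha: float) -> float:
    """One-sided exact upper limit at confidence level 1 - alpha.

    Solves P(X <= failures | p) = alpha for p by bisection.
    """
    lo, hi = failures / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(failures, n, mid) > alpha:
            lo = mid  # CDF still too large, so the limit lies higher
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    failures, n = 9, 183  # counts reported in the text
    print(f"incidence: {100 * failures / n:.1f}%")
    # Assumption: one-sided 90% limit (alpha = 0.10).
    print(f"upper limit: {100 * exact_upper_limit(failures, n, 0.10):.1f}%")
```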
3.2.2 OAK accuracy
As compared with evaluation with the CDC algorithm, nonparametric ROC analysis of assessment with the OAK device provided an area under the curve (AUC) of 82% (95% CI 76–88%) (Fig. 3). Based on the Youden index, the optimal cut-off point for the Brief-BESTest via OAK was a point-score of 16 out of 24 (i.e., a point-score between 17 and 24 classifies a subject as being at low risk, whereas a lower score classifies the subject as being at moderate/high risk), corresponding to a sensitivity of 84% and a specificity of 67%. The hypothesis for OAK system performance in terms of sensitivity cannot be rejected (sensitivity <85%). In other words, the system failed to reach the threshold for sensitivity set a priori. When the data were analyzed by sex, the specificity and sensitivity in women were similar to those obtained in the overall analysis, whereas in men the specificity was 66% and the sensitivity 94%. However, we did not find any statistically significant differences due to sex.
3.3 Secondary outcome
No adverse or serious adverse events related to use of the OAK device were reported during the study.
This study is one of the first attempts to evaluate the diagnostic accuracy of an automated screening tool for assessing the risk of falls in a large cohort of elderly individuals with different baseline risks. The tool incorporates an innovative technology that uses accelerometers and balance and movement sensors for the assessment of fall risk in a single examination. It can be used under a health care professional's supervision or independently, with recommended exercises on how to improve strength, gait, and mobility to reduce the risk of falling. User-device interaction and the programmed exercise series were reportedly easy to follow. The automated tool was well accepted: none of the subjects refused to be screened or interrupted the exercise. The incidence of device failures was <5%; there were no substantial problems with software or hardware defects. Nevertheless, considerable room remains for improving device design, user-friendliness, and cost on this prototype.
The OAK device showed discriminative power, with an AUC above 80%, which can be judged as good accuracy, similar to that of other medical diagnostic technologies with a broad impact on health (e.g., mammography for breast cancer screening). The AUC is similar to results found for manual application of the Brief-BESTest in other populations, namely people with type 2 diabetes and Parkinson disease.[15,16] The device showed a sensitivity of 84% and a specificity of 67%: these results did not allow us to reject our null hypothesis, meaning that the accuracy is not yet fully adequate to detect the risk of falling in clinical contexts. However, measures of screening accuracy should be interpreted with the awareness that these results were obtained by comparing an automated device to an algorithm used by experienced physiotherapists in clinical practice.
As a device for screening elderly persons for the risk of falling, the OAK can be rated similar to other clinical balance tests investigating fall-risk assessment.[15–17] A recent review of risk assessment tools commonly adopted in community-dwelling elderly people found that the TUG test has the highest sensitivity (90%) for detecting people at fall risk. However, this result was obtained using a cut-off of 20 seconds, which lowers the specificity to 22%. With less extreme cut-offs (e.g., 11 seconds), the TUG test has a sensitivity of 83% and a specificity of 72%. These values are comparable with our data. Other tests such as the Berg Balance Scale have a sensitivity of 25% with a threshold of 45 out of 56 points, which is inadequate for the identification of the majority of people at risk of falling. Some advantages of using OAK via the Brief-BESTest are the comprehensive functional assessment, including TUG and balance exercises, and the shorter administration time required when compared with the Berg Balance Scale or BESTest.
However, this study also has several limitations. Several competing reasons might explain the mediocre discriminative power of the tool. Falls have a multifactorial nature. OAK assessment is strictly functional and, when it is associated with a single scale (e.g., Brief-BESTest), is unlikely to be the most accurate predictor of falls in terms of AUC.
We recruited a large number of healthy elderly people. It is likely that the sample comes from a homogeneous population: differences in the risk of falling are small, making discrimination difficult. Since neurological, musculoskeletal, and cognitive symptoms are predictive of later incidence of falls,[19,20] taking these factors as inclusion criteria would have amplified the differences between the groups, possibly resulting in better differentiation. The choice of the comparative reference standard might also have influenced the diagnostic performance values. The lack of international consensus on the most accurate fall risk assessment tool made our choice challenging. The CDC algorithm might not be the best tool to assess the risk of falling in our population. Though widely used in the United States, it might not be applicable to other populations or not fully adaptable to automated screening devices. Moreover, the CDC algorithm considers domains such as fear or retrospective number of falls that rely on emotions or past events, dimensions outside the range of functional physical activities (e.g., one-leg stance or equilibrium with eyes closed) that were implemented in the device. However, we expect that the next generation of devices might easily apply constructs outside the physical dimension (e.g., cognitive abilities) and thus improve device accuracy. Also, the adoption of existing tests, such as the Brief-BESTest, might have been a limitation: the advantages of technological and virtual-reality features could be more fully exploited by developing ad hoc tests (e.g., reaching for a moving target while standing or counting during one-leg stance).
Further steps are needed to improve the capacity of automated devices to assess the risk of falls. The device is not completely automated. The set-up requires an external health professional, although it has the advantage of recording additional clinical information captured by accelerometers and balance and movement sensors (e.g., pressure center, deviations) that is not available in manual assessments. Two items are physically challenging and demand some supervision by the therapist, especially in people presumed to be at high risk of falling. The OAK device could be simplified with the use of wireless technology based on sensors that detect direction and speed of human movements. This would help to reduce overall device dimensions. We hope that self-administered assessment on an automated screening device can be envisaged in an in- or out-patient setting, with or without the supervision of a health care professional. Indeed, the use of virtual reality to reduce fall risk has already been explored in treadmill training[21,22] and balance-based exercises,[23,24] where it was found to add value and information to conventional methods. Even if they are sometimes not completely automated for safety reasons (e.g., safety harness[21,22]), innovative devices, games, and virtual reality can open the way to the development and application of new methods for self-assessment of the risk of falls or self-training to improve balance and movement. Familiarity with innovative technologies and attention to health status will increase the demand for these devices.
Our study reflects an early research phase. These preliminary results invite health professionals to use innovative devices with care, even if they have the advantage of meeting some of the numerous challenges (e.g., time constraints, competing demands, and inadequate reimbursement) to incorporating fall prevention into practice. Automated assessment of fall risk should be further scrutinized before being used in clinical practice as a screening test. Its ability to discriminate between patients at different risk of falling is still limited.
The authors wish to thank all the participants, the medical staff, physicians, and physical therapists for their assistance with recruitment and assessment. They thank the external English service for language revision.
Conceptualization: Silvia Gianola, Irene Tramacere, Lorenzo Moja.
Data curation: Greta Castellini.
Formal analysis: Greta Castellini, Irene Tramacere.
Funding acquisition: Lorenzo Moja.
Investigation: Greta Castellini, Silvia Gianola, Elena Stucovitz.
Methodology: Greta Castellini, Silvia Gianola, Elena Stucovitz.
Project administration: Greta Castellini.
Resources: Elena Stucovitz.
Software: Irene Tramacere.
Supervision: Giuseppe Banfi, Lorenzo Moja.
Visualization: Irene Tramacere.
Writing – original draft: Greta Castellini, Silvia Gianola, Elena Stucovitz, Irene Tramacere.
Writing – review & editing: Greta Castellini, Silvia Gianola, Elena Stucovitz, Irene Tramacere, Giuseppe Banfi, Lorenzo Moja.
Silvia Gianola orcid: 0000-0003-3770-0011.
Greta Castellini orcid: 0000-0002-3345-8187.
Irene Tramacere orcid: 0000-0001-5550-3412.
Giuseppe Banfi orcid: 0000-0001-9578-5338.
Elena Stucovitz orcid: 0000-0003-3102-9888.
Lorenzo Moja orcid: 0000-0001-6680-6507.
1. National Institute for Health and Care Excellence. Falls: the assessment and prevention of falls in older people; 2013. NICE guideline [CG161].
2. WHO. Global Report on Falls Prevention in Older Age. Geneva: World Health Organization; 2014.
3. Houry D, Florence C, Baldwin G, et al. The CDC Injury Center's response to the growing public health problem of falls among older adults. Am J Lifestyle Med 2016;10:
4. O’Loughlin JL, Robitaille Y, Boivin JF, et al. Incidence of and risk factors for falls and injurious falls among the community-dwelling elderly. Am J Epidemiol 1993;137:342–54.
5. Heinrich S, Rapp K, Rissmann U, et al. Cost of falls in old age: a systematic review. 2010 (ISSN 1433–2965).
6. Hermens H, Freriks B, Merletti R. SENIAM 8: European recommendations for surface electromyography. The Netherlands: Roessingh Research and Development bv; 1999. ISBN: 90-75452-15-2.
7. Sarmiento K, Lee R. STEADI: CDC's approach to make older adult fall prevention part of every primary care practice. J Safety Res 2017;63:105–9.
8. O’Hoski S, Winship B, Herridge L, et al. Increasing the clinical utility of the BESTest, mini-BESTest, and brief-BESTest: normative values in Canadian adults who are healthy and aged 50 years or older. Phys Ther 2014;94:334–42.
9. Ni Scanaill C, Garattini C, Greene BR, et al. Technology innovation enabling falls risk assessment in a community setting. Ageing Int 2011;36:217–31.
10. Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med 2003;138:W1–2.
11. Padgett PK, Jacobs JV, Kasser SL. Is the BESTest at its best? A suggested brief version based on interrater reliability, validity, internal consistency, and theoretical construct. Phys Ther 2012;92:1197–207.
12. Padgett PK, Jacobs JV, Kasser SL. Is the BESTest at its best? A suggested brief version based on interrater reliability, validity, internal consistency, and theoretical construct. Phys Ther 2012;92:1197–207.
13. Duncan RP, Leddy AL, Cavanaugh JT, et al. Comparative utility of the BESTest, mini-BESTest, and brief-BESTest for predicting falls in individuals with Parkinson disease: a cohort study. Phys Ther 2013;93:542–50.
14. Swets JA. Measuring the accuracy of diagnostic systems. Science 1988;240:1285–93.
15. Marques A, Silva A, Oliveira A, et al. Validity and relative ability of 4 balance tests to identify fall status of older adults with type 2 diabetes. J Geriatr Phys Ther 2017;40:227–32.
16. Duncan RP, Leddy AL, Cavanaugh JT, et al. Comparative utility of the BESTest, mini-BESTest, and brief-BESTest for predicting falls in individuals with Parkinson disease: a cohort study. Phys Ther 2013;93:542–50.
17. Lee J, Geller AI, Strasser DC. Analytical review: focus on fall screening assessments. PM R 2013;5:609–21.
18. Muir SW, Berg K, Chesworth B, et al. Balance impairment as a risk factor for falls in community-dwelling older adults who are high functioning: a prospective study. Phys Ther 2010;90:338–47.
19. Kose N, Cuvalci S, Ekici G, et al. The risk factors of fall and their correlation with balance, depression, cognitive impairment and mobility skills in elderly nursing home residents. Saudi Med J 2005;26:978–81.
20. Vieira ER, Palmer RC, Chaves PH. Prevention of falls in older people living in the community. BMJ 2016;353:i1419.
21. Mirelman A, Rochester L, Maidan I, et al. Addition of a non-immersive virtual reality component to treadmill training to reduce fall risk in older adults (V-TIME): a randomised controlled trial. Lancet 2016;388:1170–82.
22. Mirelman A, Rochester L, Reelick M, et al. V-TIME: a treadmill training program augmented by virtual reality to decrease fall risk in older adults: study design of a randomized controlled trial. BMC Neurol 2013;13:15.
23. Yesilyaprak SS, Yildirim MS, Tomruk M, et al. Comparison of the effects of virtual reality-based balance exercises and conventional exercises on balance and fall risk in older adults living in nursing homes in Turkey. Physiother Theory Pract 2016;32:191–201.
24. Cho GH, Hwangbo G, Shin HS. The effects of virtual reality-based balance training on balance of the elderly. J Phys Ther Sci 2014;26:615–7.