Falls are common adverse events for hospitalized patients and have been targeted as an important safety goal in the United States (National Priorities Partnership, 2008) and internationally (e.g., Healey & Scobie, 2007). A recent national study of falls found fall rates of 2.8–4.0 falls per 1,000 patient days in medical, medical–surgical, and surgical units and injury fall rates ranging between 0.7 and 1.1 (Bouldin et al., 2012). Patients experiencing an injurious fall incident have substantially longer lengths of stay and higher operational costs than similar patients without a fall incident (Wong et al., 2011). Although the importance is unchallenged, the prevention of falls has proven to be a difficult task (Coussement et al., 2008). Previous research found discrepancies in the definition of falls (Hauer, Lamb, Jorstad, Todd, & Becker, 2006), comparability of falls captured by different reporting methods (Hill et al., 2010; Sari, Sheldon, Cracknell, & Turnbull, 2007), and underreporting of falls from incident reports (Shorr et al., 2008). In clinical practice, routine data collection and reporting of fall events by care providers enables learning from experience (Leape, 2002) and is a core strategy in fall prevention in hospitals. Inconsistencies in fall classification and reporting might be one contributing factor to the difficulty of identifying effective fall prevention measures (Haines, Massey, Varghese, Fleming, & Gray, 2009; Hill et al., 2010; Shorr et al., 2008).
The National Quality Forum (NQF) has established a process to endorse consensus standards for hospital care performance measurement, including falls (NQF, 2004). The goal of hospital care performance measurement is to increase knowledge about quality and variation in performance, create incentives to improve performance, and facilitate benchmarking and sharing of best practices (NQF, 2003). Given the difficulties of fall classification and reporting identified by previous research, it remains to be shown how the classification of fall events by staff might result in measurement error or bias.
The purposes of this study were to investigate potential ambiguities of the NQF definition of falls and to explore the impact of fall classifications at the individual, unit, or hospital level. Investigated were (a) how experts classify a set of fall scenarios according to the NQF fall definition (National Database of Nursing Quality Indicators [NDNQI], 2010); (b) how hospital staff classify the same fall scenarios; and (c) whether fall classifications differ among units, individuals, and hospitals.
Online Video Survey
The study consisted of two phases and two samples: (a) an online survey of experts and (b) a multi-site online survey of hospital staff clustered within units and hospitals. The online survey for both samples contained videos of 20 different fall scenarios. These fall videos were based on previous work (Haines et al., 2009) and were adapted and supplemented by the research team for the U.S. study context. Scenarios were rerecorded by professional actors and filmed in a simulation laboratory with the intention of providing a realistic depiction of clinical situations. To control for a potential bias in the videos resulting from respondent reaction to different actors, two scenarios were each filmed twice using different actors.
The goal for the expert sample was to determine the correct classification of scenarios, and therefore, the NQF fall definition was accessible to the experts during the online survey. Scenarios could be classified as fall, nonfall, or unclear. Expert judgments were used as the standard against which hospital staff responses were assessed. To reflect current practice, where the definition would not be available to hospital staff, the hospital staff survey did not include the NQF definition. The survey also contained an item to assess whether respondents had experienced each fall scenario during the last 4 weeks. A written description of the filmed scenarios is shown in Table 1.
The current NQF-endorsed fall definition was used to classify the fall scenarios (NDNQI, 2010). A fall was defined as “an unplanned descent to the floor (or extension of the floor, e.g., trash can or other equipment) with or without injury to the patient, and occurs on an eligible reporting nursing unit. All types of falls are to be included whether they result from physiological reasons (fainting) or environmental reasons (slippery floor). Include assisted falls—when a staff member attempts to minimize the impact of the fall.”
Participants and Setting
The study was approved by the human subjects committee of the University of Kansas Medical Center. In the first phase of the study, 24 experts consisting of staff of the NDNQI and fall researchers were asked to classify the video fall scenarios. The NDNQI is a program of the American Nurses Association and is administered under contract to the University of Kansas School of Nursing. The NDNQI is the measure developer and steward for the NQF patient fall indicator. Currently, more than 1,800 hospitals with more than 10,000 units voluntarily collect and report fall data to the NDNQI.
For the second phase of the study, 910 hospitals representing 2,984 units were invited to participate. Invitations were sent via e-mail to NDNQI site coordinators who manage data collection efforts at each participating hospital. Inclusion criteria for hospital units were (a) participation in NDNQI; (b) reported fall data for the previous four quarters (fourth quarter of 2008 to third quarter of 2009); and (c) classified as an adult critical care, step down, medical, surgical, medical–surgical, or rehabilitation unit. Two hundred and forty-seven hospitals (response rate = 27.1%) encompassing 615 units (response rate = 20.6%) agreed to participate in the study. To calculate response rates, site coordinators were asked to provide the number of staff members allowed to file incident reports on enrolled units. Eligible staff could include registered and unlicensed nursing staff, physical therapists, and physicians. In total, 21,043 hospital staff members were invited to participate in the online video survey. The total number of respondents was 8,655, resulting in a response rate of 41.1%. The median unit-level response rate was 39.3%, and the mean response rate was 48.3%. Because of a technical error, not all responses could be matched with a unit, which reduced the analytical sample to 6,342 staff in 362 units in 170 hospitals. No differences in age (p = .625) or gender (p = .259) were identified between the respondents associated with a unit and those who could not be associated with a unit.
For the analytical design, a measurement model similar to the model of hospital differences by van Dishoeck, Lingsma, Mackenbach, and Steyerberg (2011) was used. This model is used to separate observed hospital differences of a given performance indicator into random variation, patient characteristics, registration bias, quality of care, and residual confounding. Through the standardization of the fall scenarios (each respondent classified the same set of scenarios), patient characteristics, registration bias, and quality of care were held constant. Random variation was addressed by measuring the residual variance on the hospital, unit, and individual level with a random effects model to measure and represent sources of potential bias (e.g., through different reporting cultures or local reporting policies).
Study aims 1 and 2 were assessed using the percentage of experts or hospital staff classifying each fall scenario as either a fall or a nonfall. Unambiguous scenarios were those where the rating significantly deviated from 50% based on the binomial distribution with a uniform prior distribution. Using the posterior probability that the experts' ratings were larger than 0.5 (chance) or less than 0.5, a specific cutoff point of 70.2% differentiated an ambiguous from an unambiguous fall scenario (two-sided Bayesian, p < .05). Intraclass correlations (ICCs) were provided to describe the between-unit variation for each scenario; the lower the ICC, the lower the variation of fall classifications between hospital units. Low variation between units indicates similar abilities of unit personnel to identify a scenario as a fall or a nonfall. Because of the binary outcome (classification as fall or nonfall) and to provide credibility intervals, ICCs were calculated from generalized mixed models fitted by Markov chain Monte Carlo (MCMC) simulation with a noninformative prior distribution. From these MCMC draws, equal-tailed 95% credibility intervals were calculated (Nakagawa & Schielzeth, 2010). Also calculated was the percentage of respondents who had experienced a specific fall scenario during the last 4 weeks, to give an account of how common the videotaped scenarios are.
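The ambiguity criterion described above can be illustrated with a short Python sketch. This is not the authors' R code. It assumes that k of n raters classify a scenario as a fall, that the prior is uniform, and that the posterior is therefore Beta(k + 1, n − k + 1); the function names are hypothetical. Because a rule based on whole rater counts is discrete, the sketch need not reproduce the reported 70.2% cutoff exactly.

```python
from math import comb


def posterior_prob_above_half(k: int, n: int) -> float:
    """P(theta > 0.5 | k of n raters said 'fall') under a uniform prior.

    The posterior is Beta(k + 1, n - k + 1). For integer parameters its
    CDF at 0.5 equals a binomial tail probability, so
    P(theta > 0.5) = P(X <= k) with X ~ Binomial(n + 1, 0.5).
    """
    return sum(comb(n + 1, i) for i in range(k + 1)) / 2 ** (n + 1)


def is_unambiguous(k: int, n: int, alpha: float = 0.05) -> bool:
    """Two-sided rule: posterior mass on one side of 0.5 exceeds 1 - alpha/2."""
    p_above = posterior_prob_above_half(k, n)
    return max(p_above, 1.0 - p_above) > 1.0 - alpha / 2


# Illustration with a panel of 24 raters (the expert sample size in the text).
# The counts k below are hypothetical examples, not study data.
n_raters = 24
for k in (13, 14, 17, 24):
    share = k / n_raters
    print(f"{k}/{n_raters} = {share:.1%}: unambiguous = {is_unambiguous(k, n_raters)}")
```

The rule is symmetric, so a scenario that almost no rater calls a fall is flagged as an unambiguous nonfall in the same way.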
The experts did not classify all fall scenarios unambiguously as either a fall or a nonfall, so staff classifications of hospital falls were assessed from three points of view: (a) as sensitivity by the proportion of staff that identified unique scenarios as falls in accordance with the classification by the expert panel (Scenarios 1, 2, 4, 5, 7, 12, 13, and 19; Scenario 15 was not included, as it was a repetition of another scenario); (b) as specificity by the proportion of staff who identified nonfall scenarios in accordance with the expert panel (Scenarios 6, 9, 10, and 14); and (c) by a crude classification rate, which takes all scenarios into account. Sensitivity and specificity are used to assess the variation between units and hospitals based on the agreement of fall classifications with reference to the current NQF fall definition and the expert classification (definition compliance rate). The crude classification rate disregards the fall definition and assesses agreement among all scenarios.
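The three measures can be sketched for a single respondent as follows. This is a minimal Python illustration, not the authors' code: the scenario sets are those listed above, the respondent's answers are invented, and the crude classification rate is interpreted here as the share of all scenarios that a respondent labels as a fall, which is an assumption about the exact computation.

```python
# Expert reference sets as listed in the text.
EXPERT_FALLS = {1, 2, 4, 5, 7, 12, 13, 19}
EXPERT_NONFALLS = {6, 9, 10, 14}


def rater_rates(responses: dict[int, bool]) -> dict[str, float]:
    """Compute agreement measures for one respondent.

    responses maps scenario number -> True if the respondent said 'fall'.
    Assumes the respondent rated at least one scenario in each reference set.
    """
    falls = [responses[s] for s in EXPERT_FALLS if s in responses]
    nonfalls = [responses[s] for s in EXPERT_NONFALLS if s in responses]
    return {
        # Share of expert-defined falls that the respondent also called a fall.
        "sensitivity": sum(falls) / len(falls),
        # Share of expert-defined nonfalls that the respondent called a nonfall.
        "specificity": sum(not r for r in nonfalls) / len(nonfalls),
        # Share of all rated scenarios labeled as a fall (assumed definition).
        "crude_rate": sum(responses.values()) / len(responses),
    }


# Hypothetical respondent who labels nine of the 20 scenarios as falls.
said_fall = {1, 2, 3, 4, 5, 7, 8, 12, 13}
example = {scenario: scenario in said_fall for scenario in range(1, 21)}
rates = rater_rates(example)
print(rates)
```

Averaging these per-respondent values within units or hospitals gives the quantities whose between-cluster variation the ICCs describe.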
For aim 3, generalized mixed models with MCMC simulations were used to calculate ICCs with 95% credibility intervals. To assess the influence of different factors, ICCs were calculated from unconditional mixed models with a random effect for units (Models 1, 6, and 11), hospitals (Models 2, 7, and 12), and individuals (Models 3, 8, and 13). Because we are interested primarily in between-unit and between-hospital variation, unit-level (Models 4, 9, and 14) and hospital-level (Models 5, 10, and 15) ICCs, adjusted for unit type (as fixed effect) and individuals (as random effect), were provided.
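How an ICC and its equal-tailed credibility interval follow from MCMC output can be sketched in Python. The sketch is not the authors' R analysis. It assumes a logit-link model with additive overdispersion, for which a latent-scale ICC can be written as the cluster variance divided by the sum of the cluster variance, the residual variance, and π²/3; the text does not state which exact formula was used. The posterior draws are simulated placeholders, not study results.

```python
import math
import random
from statistics import quantiles

# Variance of the standard logistic distribution (logit link).
LOGIT_VARIANCE = math.pi ** 2 / 3


def latent_icc(var_cluster: float, var_residual: float) -> float:
    """Latent-scale ICC for a logit-link mixed model (assumed formula)."""
    return var_cluster / (var_cluster + var_residual + LOGIT_VARIANCE)


def equal_tailed_95(draws):
    """Equal-tailed 95% interval: the 2.5% and 97.5% sample quantiles."""
    cuts = quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Placeholder posterior draws of the variance components (invented values).
random.seed(1)
unit_var_draws = [random.gammavariate(20.0, 0.01) for _ in range(2000)]
resid_var_draws = [random.gammavariate(50.0, 0.02) for _ in range(2000)]

# Compute the ICC draw by draw, then summarize the resulting distribution.
icc_draws = [latent_icc(u, e) for u, e in zip(unit_var_draws, resid_var_draws)]
lower, upper = equal_tailed_95(icc_draws)
posterior_mean = sum(icc_draws) / len(icc_draws)
print(f"ICC = {posterior_mean:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

Computing the ICC within each draw, rather than from the posterior means of the variances, carries the uncertainty of both variance components into the interval.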
Because using ICCs does not identify unusual units, differences in fall classifications were visualized using a funnel plot (Spiegelhalter, 2005). Unusual units refer to those units that deviate from the sample distribution. In this study, unusual units are units where the classification of fall scenarios significantly differed from the sample. The plot provides point estimates of the predicted fall rate for each unit based on the random effects of Model 14, which include fixed effects for unit type and random effects for each individual and hospital unit. Because no established standards exist to assess differences among clusters (units) formally, the funnel plots also display confidence limits with and without a Bonferroni correction for p = .05 and p = .01. The plot depicts how many of the units are outside the limits of the sampling distribution.
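The effect of a Bonferroni correction on funnel-plot limits can be illustrated with a short Python sketch. It uses a normal approximation to the binomial distribution around a target proportion, which is a simplification: the plot in this study is based on model predictions, so this is not the procedure used for Figure 1. The function name, the unit size of 340 classifications, and the choice of 0.60 as target are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def funnel_limits(p0: float, n: int, alpha: float = 0.05, n_tests: int = 1):
    """Normal-approximation control limits around a target proportion p0.

    n is the number of classifications contributed by a unit. With
    n_tests > 1, alpha is divided by n_tests (Bonferroni correction).
    """
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    half_width = z * sqrt(p0 * (1 - p0) / n)
    return max(0.0, p0 - half_width), min(1.0, p0 + half_width)


# Hypothetical unit: 17 respondents x 20 scenarios = 340 classifications.
target = 0.60   # illustrative target proportion
n_obs = 340
n_units = 362   # number of units compared simultaneously

plain = funnel_limits(target, n_obs, alpha=0.05)
corrected = funnel_limits(target, n_obs, alpha=0.05, n_tests=n_units)
print("95% limits, unadjusted:", plain)
print("95% limits, Bonferroni:", corrected)
```

The corrected limits are wider, so fewer units fall outside them; the limits also narrow as the number of classifications per unit grows, which produces the funnel shape.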
All analyses and the funnel plot were computed using R software (R Development Core Team, 2011) including the packages MCMCglmm (Hadfield, 2010) and rptR (Nakagawa & Schielzeth, 2010).
Fifty-eight percent of the experts were registered nurses (RNs), 21% were advanced practice nurses, and 20% were falls researchers with various backgrounds such as medicine, biostatistics, or physical therapy. Twenty-five percent of the experts had bachelor’s degrees, 29% had master’s degrees, and 46% had doctoral degrees. Fourteen out of 20 scenarios (10 falls and four nonfalls, including one repetition) were determined to be unambiguously fall or nonfall scenarios according to the NQF definition (Table 1); six scenarios (including one repetition) did not achieve unambiguous ratings by the experts (Scenarios 3, 8, 11, and 15–17). No differences, or only small differences, were found for the repeated scenarios with different actors (100% vs. 100% for Scenarios 6 and 14; 54% vs. 58% for Scenarios 8 and 16).
Hospital Staff Classifications
Hospital staff were RNs (78%), nursing assistants (8%), licensed practical nurses (4%), patient care technicians (3%), physical therapists (2%), and others (5%). Education levels of participants were bachelor’s degree (42%), associate degree (32%), diploma (16%), master’s degree (6%), and others (4%). Twelve scenarios could be classified clearly as either fall or nonfall scenarios. Only small differences were found for the same scenarios with different actors (98% vs. 97% for Scenarios 6 and 14; 62% vs. 64% for Scenarios 8 and 16). Five of the unclear scenarios (3, 8, and 15–17) overlapped with the scenarios assessed as unclear by the experts. However, hospital staff did not classify Scenarios 18 and 20 clearly, and Scenario 11 was classified as a nonfall although it was unclear from the experts’ point of view. Taking the cutoff point of 70.2% into account, experts and hospital staff agreed on eight fall, three nonfall, and five ambiguous scenarios. There was disagreement between experts and hospital staff on the other four scenarios. Considering the expert classifications including all unambiguous scenarios, the sensitivity was 0.90 (SD = 0.16, n = 6,342) and the specificity was 0.88 (SD = 0.19). The individual mean probability for classifying a scenario as a fall was 0.60 (SD = 0.18).
Scenarios can be divided into two groups of relative frequency: 14 were less-frequent scenarios with <4% of respondents having experienced the scenario in the last 4 weeks, and six were more-frequent scenarios with >6% having experienced the scenario in the last 4 weeks. Three scenarios (16, 18, and 20) had ICCs with credibility limits of .05 or above; for the remaining 17 scenarios, the ICCs indicated only weak differences between units. None of these three scenarios was clearly classified as a fall by hospital staff.
Variation Among Individuals, Units, and Hospitals
An overview of professional background, gender, and age of the study participants is shown in Table 2. Most participants were RNs. The proportion of RNs was highest in critical care units (92%) and lowest in rehabilitation units (60%). Nine out of 10 nurses were women, and the average age was 40 years.
The ICC of the sensitivity ranged from 0.042 to 0.254 for the models with one random effect (Table 3), with the largest variation between individuals, followed by between-unit and between-hospital variations. We also considered whether the professional background of the respondents (RNs vs. licensed practical nurses vs. others, as fixed effect) played a role and potentially influenced the ICC. Although the association was significant (MCMC, p < .03) and the model fit improved in Model 3 for sensitivity, the ICC decreased only from 0.254 to 0.248 (95% CI [0.227, 0.270]), indicating that only a relatively small proportion of the ICC is explained by the professional background. For the crude classification rates, ICCs were markedly lower than for sensitivity and specificity and ranged between 0.018 and 0.055. Again, the between-individual differences were the largest source of variation, followed by between-unit and between-hospital variation. As expected, unit and hospital ICCs for sensitivity, specificity, and the crude classification rate adjusted for unit type (fixed effect) and individuals (random effect) were slightly smaller than ICCs from unconditional models.
Point estimates for the crude classification rate of all units are shown in Figure 1. Depending on how the control limits are defined, 39 (95% control limit, no adjustment), 20 (99% control limit, no adjustment), or 6 (99% control limit, Bonferroni adjustment) units would be classified as unusual units. On the basis of the funnel plot, 10.8% of units were outside the 95% control limit; however, taking multiple testing into account and using the Bonferroni correction, this percentage shrank to a negligible 1.7%.
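The percentages follow directly from the unit counts, as the Python sketch below shows. For comparison, it also prints how many of the 362 units would be expected outside unadjusted limits by chance alone if no unit truly differed; this benchmark assumes exact limits and independent units and is an illustrative addition, not a result reported in the study.

```python
N_UNITS = 362  # units in the analytical sample

# Number of units outside the control limits, as reported in the text.
flagged = {
    "95% limit, no adjustment": 39,
    "99% limit, no adjustment": 20,
    "99% limit, Bonferroni": 6,
}


def percent_flagged(count: int, n_units: int = N_UNITS) -> float:
    """Share of units outside the limits, in percent."""
    return 100.0 * count / n_units


for label, count in flagged.items():
    print(f"{label}: {count} units = {percent_flagged(count):.1f}%")

# Chance benchmark for unadjusted limits (assumption: no true differences).
expected_by_chance = {level: (1 - level) * N_UNITS for level in (0.95, 0.99)}
for level, expected in expected_by_chance.items():
    print(f"{level:.0%} limit: about {expected:.1f} units expected by chance")
```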
Expert and hospital staff classifications were generally in agreement, with the exception of four scenarios. Whereas experts were asked to classify according to the NQF definition, hospital staff were not. The extent to which participating hospitals might have trained staff with regard to the NQF definition is unknown, but the high level of agreement between experts and hospital staff implies that the definition meets the common sense or typical hospital standard definition (face validity) of what clinicians believe should be classified as a fall. On the basis of the NQF definition, 14 of the 20 fall scenarios were classified consistently by experts. However, there was less agreement among experts on six scenarios, and three of these were reported by hospital staff as occurring frequently, ranking 4th, 6th, and 10th in terms of frequency. Thus, some relatively frequent fall scenarios are not delineated clearly by the NQF fall definition.
These results are consistent with previous research identifying a lack of standardization in fall definitions (Hauer et al., 2006) and in the application of such definitions in practice settings (Haines et al., 2009). The results indicate that consistent classification is also difficult for experts who should be able to classify fall scenarios. The NQF consensus definition might be the root of these difficulties; however, compared with other commonly used definitions (e.g., Kellogg International Work Group, 1987; Lamb, Jørstad-Stein, Hauer, & Becker, 2005), the NQF definition is more comprehensive and more specific regarding fall scenarios than, for example, the World Health Organization definition used in the study by Haines et al. (2009). Nevertheless, the results suggest the need for further clarification of the NQF definition of falls. Considering the ambiguous fall scenarios, two areas in need of clarification emerge: scenarios where patients land on a chair or bed with or without assistance (Scenarios 3, 8, 11, and 16) and scenarios where the patient is asked retrospectively if a fall has occurred (Scenarios 2, 15, and 17). One way of clarifying these situations would be to include beds and chairs as extensions of the floor; this classification might help to classify some scenarios (e.g., Scenario 8). However, this would lead to a range of situations to be classified as falls that are not usually considered as a fall (Scenarios 3 and 10). The main difference between these scenarios might be that some show an uncontrolled landing (Scenario 8), whereas others give the impression of a controlled catch, which could be used in a revision of the definition. A future revision of the definition should also clarify the classification of patient-reported incidents as falls, unless there are plausible reasons to believe that a fall did not occur. 
For any changes to the definition, it seems to be crucial to conduct cognitive testing of the changes to make sure that the definition is understood by staff and helps to classify unclear situations.
Variability between individuals was considerably higher (ICCs ranging between 0.055 and 0.251) than for units or hospitals, indicating that there are opportunities to increase agreement through training of hospital staff. Given the weak influence of professional background on the sensitivity, it seems to be appropriate to develop a training program that addresses all professional groups.
The ICCs and a funnel plot were used to assess the variability between units, hospitals, and individuals to explore the impact of ambiguities in the definitions and differing reporting cultures on the unit or hospital level. Although it is not possible to clearly separate the effects of ambiguities in the definition from those of differing reporting cultures, it seems reasonable to associate the latter with ICCs at the unit and hospital level. Adjusted ICCs for units and hospitals for sensitivity, specificity, and the crude classification rate indicate little variability among units and hospitals and, therefore, little bias or evidence of systematic differences through reporting cultures. There are no established standards as to when between-unit or between-hospital variability is considered clinically relevant on the organizational level. However, the funnel plots presented here are in line with current thoughts on how unusual performance of organizations in healthcare can be identified (Spiegelhalter, 2005) and offer information on variability between units and hospitals beyond the ICC. In summary, results indicate only small bias from ambiguities in the definitions and differing reporting cultures. Measures to further reduce this bias should be targeted at the individual level, which remains the largest source of variation.
One limitation of this study is that, although a large sample was employed, a response rate of 41% leaves potential for nonresponse bias. Comparing the hospital size measured by the number of beds, no differences between the sample and all NDNQI hospitals could be found (χ2 = 8.8, df = 5, p = .12). However, NDNQI hospitals are more often not-for-profit, larger, and with a higher case-mix index than the U.S. population of hospitals (Lake, Shang, Klaus, & Dunton, 2010) and, therefore, are not representative of the U.S. population of hospitals.
One of the strengths of the study can also be considered a limitation: the use of video vignettes and associated questions of reliability and validity. Hospital staff classified the repeated scenarios filmed with different actors similarly, which gives some indication of the reliability of the videos. Although this approach allows delivering a standardized representation of fall situations, it remains unclear how the situations would be judged in a real-world setting and whether they are realistic in the clinical context. Although a measure of the relative frequency of certain fall situations was included, the distribution of different fall situations, how common each scenario is, and whether common situations were omitted are not known. Any future testing on fall classifications by hospital staff should therefore explore how representative scenarios are for fall situations in clinical practice. This also includes the question of whether contextual factors might influence fall classification and subsequent reporting, such as time pressure or trends of not reporting falls without a recognizable impact on the patient.
Future research may be needed to test a revised fall definition, to develop fall identification training programs, and to test the impact of the training on fall classification. Earlier work on how to develop and test a training program on the pressure ulcer indicator could be used as a template for a fall classification training program (Bergquist-Beringer et al., 2009; Bergquist-Beringer, Gajewski, Dunton, & Klaus, 2011; Gajewski, Hart, Bergquist-Beringer, & Dunton, 2007).
The results of this study indicate that, for some fall scenarios, the NQF definition needs improvement to classify all fall scenarios correctly and consistently. Using ICCs, the primary source of variability was found to be between individuals, which might be reduced by training of hospital staff. From the quality improvement, the benchmarking, and the public reporting perspectives, the results show little impact of individual variations in classification on aggregated fall rates at the unit or hospital level.
Bergquist-Beringer S., Davidson J., Agosto C., Linde N. K., Abel M., Spurling K., Christopher A. (2009). Evaluation of the National Database of Nursing Quality Indicators (NDNQI) training program on pressure ulcers. Journal of Continuing Education in Nursing, 40, 252–258.
Bergquist-Beringer S., Gajewski B., Dunton N., Klaus S. (2011). The reliability of the National Database of Nursing Quality Indicators pressure ulcer indicator: A triangulation approach. Journal of Nursing Care Quality, 26, 292–301.
Bouldin E. D., Andresen E. M., Dunton N. E., Simon M., Waters T. M., Liu M., Shorr R. I. (2012). Falls among adult patients hospitalized in the United States: Prevalence and trends. Journal of Patient Safety. doi:10.1097/PTS.0b013e3182699b64
Coussement J., De Paepe L., Schwendimann R., Denhaerynck K., Dejaeger E., Milisen K. (2008). Interventions for preventing falls in acute- and chronic-care hospitals: A systematic review and meta-analysis. Journal of the American Geriatrics Society, 56, 29–36.
Gajewski B. J., Hart S., Bergquist-Beringer S., Dunton N. (2007). Inter-rater reliability of pressure ulcer staging: Ordinal probit Bayesian hierarchical model that allows for uncertain rater response. Statistics in Medicine, 26, 4602–4618. doi:10.1002/sim.2877.
Hadfield J. D. (2010). MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R Package. Journal of Statistical Software, 33 (2), 1–22.
Haines T. P., Massey B., Varghese P., Fleming J., Gray L. (2009). Inconsistency in classification and reporting of in-hospital falls. Journal of the American Geriatrics Society, 57, 517–523.
Hauer K., Lamb S. E., Jorstad E. C., Todd C., Becker C. (2006). Systematic review of definitions and methods of measuring falls in randomised controlled fall prevention trials. Age and Ageing, 35, 5–10.
Healey F., Scobie S. (2007). Slips, trips and falls in hospital. London, UK: NHS Patient Safety Agency.
Hill A. M., Hoffmann T., Hill K., Oliver D., Beer C., McPhail S., Haines T. P. (2010). Measuring falls events in acute hospitals—A comparison of three reporting methods to identify missing data in the hospital reporting system. Journal of the American Geriatrics Society, 58, 1347–1352.
Kellogg International Work Group. (1987). The prevention of falls in later life. A report of the Kellogg International Work Group on the prevention of falls by the elderly. Danish Medical Bulletin, 34 (Suppl 4), 1–24.
Lake E. T., Shang J., Klaus S., Dunton N. E. (2010). Patient falls: Association with hospital Magnet status and nursing unit staffing. Research in Nursing & Health, 33, 413–425.
Lamb S. E., Jørstad-Stein E. C., Hauer K., Becker C. (2005). Development of a common outcome data set for fall injury prevention trials: The Prevention of Falls Network Europe consensus. Journal of the American Geriatrics Society, 53, 1618–1622.
Leape L. L. (2002). Reporting of adverse events. New England Journal of Medicine, 347, 1633–1638.
Nakagawa S., Schielzeth H. (2010). Repeatability for Gaussian and non-Gaussian data: A practical guide for biologists. Biological Reviews of the Cambridge Philosophical Society, 85, 935–956.
National Database of Nursing Quality Indicators. (2010). Guidelines for data collection on the American Nurses Association’s National Quality Forum endorsed measures. Kansas City, KS: University of Kansas Medical Center.
National Priorities Partnership. (2008). National priorities and goals: Aligning our efforts to transform America’s healthcare. Washington, DC: National Quality Forum.
National Quality Forum. (2003). A comprehensive framework for hospital care performance evaluation: A consensus report. Washington, DC: Author.
National Quality Forum. (2004). National voluntary consensus standards for nursing-sensitive care: An initial performance measure set. Washington, DC: Author.
R Development Core Team. (2011). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Sari A. B., Sheldon T. A., Cracknell A., Turnbull A. (2007). Sensitivity of routine system for reporting patient safety incidents in an NHS hospital: Retrospective patient case note review. British Medical Journal, 334 (7584), 79.
Shorr R. I., Mion L. C., Chandler A. M., Rosenblatt L. C., Lynch D., Kessler L. A. (2008). Improving the capture of fall events in hospitals: Combining a service for evaluating inpatient falls with an incident report system. Journal of the American Geriatrics Society, 56, 701–704.
Spiegelhalter D. J. (2005). Funnel plots for comparing institutional performance. Statistics in Medicine, 24, 1185–1202.
van Dishoeck A. M., Lingsma H. F., Mackenbach J. P., Steyerberg E. W. (2011). Random variation and rankability of hospitals using outcome indicators. BMJ Quality & Safety, 20, 869–874.
Wong C. A., Recktenwald A. J., Jones M. L., Waterman B. M., Bollini M. L., Dunagan W. C. (2011). The cost of serious fall-related injuries at three Midwestern hospitals. Joint Commission Journal on Quality and Patient Safety, 37, 81–87.