Background. Length of stay (LOS) is an important measure of hospital activity and health care utilization, but its empirical distribution is often positively skewed.
Objective. This study reviews mean and median regression approaches for analyzing LOS, an outcome with implications for service planning, resource allocation, and bed utilization.
Methods. The two approaches are applied to hospital discharge data on cesarean delivery. Both models adjust for patient and health-related characteristics and for the dependence among LOS outcomes nested within hospitals. The two estimation methods are also compared in a simulation study.
Results. In the empirical application, the mean regression results are somewhat sensitive to the degree of trimming chosen. The factors identified by median regression, namely number of diagnoses, number of procedures, and payment classification, are robust to high-LOS outliers. The simulation experiment shows that median regression can outperform mean regression even when the response variable is only moderately positively skewed.
Conclusion. Median regression appears to be a suitable alternative for analyzing clustered, positively skewed LOS data without arbitrary transformation or trimming.
During the past decade, the reduction of in-patient stays and related resource use has become a major priority for public hospitals. Length of stay (LOS) is an important measure of health care utilization and a determinant of hospitalization costs. Health care providers and hospital administrators are interested in early and accurate LOS predictions for both economic and organizational reasons. In addition to these aspects of quality control, patients also have an interest in anticipated dates of discharge. 1 Unfortunately, LOS guidelines are not rigorously evidence-based; they are often drawn from benchmarking comparisons between institutions or from group consensus rather than from epidemiologic data. 2
An understanding of the factors influencing in-patient LOS can help clinicians optimize care and rationalize their medical practice, assist administrators with budget planning and resource allocation, and potentially improve patient satisfaction and quality of care. 3,4 Moreover, hospital performance in terms of LOS is not directly comparable without adjustment for patient casemix. 5 However, the skewness of the LOS variable poses a problem for statistical modeling and analysis. 6 It is well known that the empirical distribution of LOS is positively skewed and varies considerably across Diagnosis Related Groups (DRG). Trimming methods have been used to discriminate outliers from normal stays, but they are determined arbitrarily and without theoretical support. 7 In pediatric LOS, for example, outliers are sometimes defined as the 2% of discharges with the longest LOS, a choice that is somewhat arbitrary but has been applied by others in the analysis of administrative data. 2 The objective of combining trimming with transformation is to minimize the effects of extreme outliers and to satisfy the normality assumption for the LOS distribution. Selecting an appropriate exclusion threshold is further complicated by the presence of multiple diagnoses, some with limited sample sizes (number of discharges), so that a uniform cutoff is difficult to apply in practice.
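To make the trimming rule concrete, the following Python sketch drops a given fraction of discharges with the longest LOS and compares the mean and median before and after. The helper name `trim_longest`, the choice to drop round(n × fraction) stays, and the LOS values are hypothetical illustrations, not the rule or data used in this study.

```python
import statistics


def trim_longest(los, fraction):
    """Drop the `fraction` of stays with the longest LOS (0.02 = top 2%).

    Hypothetical helper: the number dropped is round(n * fraction), and ties
    at the cutoff are broken by sort order, which is only one possible rule.
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    ordered = sorted(los)
    k = round(len(ordered) * fraction)
    return ordered[: len(ordered) - k]


# Hypothetical toy LOS values in days (not data from this study):
# most stays last 2-4 days, with a short tail of long stays.
los = [2] * 40 + [3] * 40 + [4] * 15 + [10, 15, 20, 30, 60]

for fraction in (0.0, 0.02, 0.05):
    kept = trim_longest(los, fraction)
    print(f"trim {fraction:.0%}: n={len(kept)}, "
          f"mean={statistics.mean(kept):.2f}, "
          f"median={statistics.median(kept):g}")
# trim 0%: n=100, mean=3.95, median=3
# trim 2%: n=98, mean=3.11, median=3
# trim 5%: n=95, mean=2.74, median=3
```

On these toy values the mean falls steadily as more of the tail is removed, while the median stays at 3 days, so any summary built on the mean depends on where the cutoff is placed.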
For highly positively skewed LOS data, which are common for complex diagnoses or conditions, the sample mean often exceeds the median dramatically. Federman et al 8 therefore suggested predicting the median of psychiatric LOS. Although bivariate techniques such as rank-based methods have been widely used, relatively little attention has been paid to robust multivariate techniques that do not rely on the normality assumption. This study reviews the mean and median regression approaches and compares their performance in a simulation experiment. An empirical data set on maternity LOS for cesarean delivery is used to illustrate their practical application.
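The contrast between mean and median regression can be illustrated with a single covariate. The Python sketch below fits a least-squares (mean) line in closed form and a least-absolute-deviations (median) line by checking every line through two observations, which is exact for one covariate because some optimal LAD line interpolates at least two data points. It is a simplified illustration under stated assumptions: the helper names `ols_fit` and `lad_fit`, the covariate (a hypothetical number of diagnoses), and the toy data are invented here, and it ignores the clustering of discharges within hospitals that the models in this study account for.

```python
from itertools import combinations


def ols_fit(x, y):
    """Mean (least-squares) regression of y on one covariate x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope  # (intercept, slope)


def lad_fit(x, y):
    """Median (least-absolute-deviations) regression with one covariate.

    Some optimal LAD line passes through at least two observations, so
    checking every pair is exact; it costs O(n^3), so it is only suitable
    for small illustrative samples.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a candidate fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        sad = sum(abs(yk - intercept - slope * xk) for xk, yk in zip(x, y))
        if best is None or sad < best[0]:
            best = (sad, intercept, slope)
    return best[1], best[2]  # (intercept, slope)


# Hypothetical toy data (not from this study): x = number of diagnoses,
# y = LOS in days; nine stays follow LOS = 1 + x and one is an extreme outlier.
x = list(range(1, 11))
y = [xi + 1 for xi in x[:-1]] + [60]

a_mean, b_mean = ols_fit(x, y)
a_med, b_med = lad_fit(x, y)
print(f"mean regression:   LOS = {a_mean:.2f} + {b_mean:.2f} * x")
print(f"median regression: LOS = {a_med:.2f} + {b_med:.2f} * x")
# mean regression:   LOS = -8.80 + 3.67 * x
# median regression: LOS = 1.00 + 1.00 * x
```

With one extreme stay among ten, the least-squares slope is pulled to about 3.7, whereas the median line still recovers LOS = 1 + x. For realistic sample sizes and several covariates, a linear-programming based quantile-regression routine from a standard statistical package would replace the pair search.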