Suppose that we had incorrectly treated the 560 observations as a sample from a single log-normal distribution. The 95% confidence interval for the log-normal mean would then have been calculated from the 2.5% and 97.5% percentiles of the exponentials of the simulated pivotal statistics from equation (4) in Appendix 1. Those limits are 1.95 to 2.33 minutes. Consistent with the results in the Motivation—Means of Log-Normally Distributed Data section, the interval does not even include the sample mean of the 560 observations, 2.46 minutes. We include a brief derivation in Appendix 3 showing why this behavior occurs.
Although the pivotal analysis is sensitive to violations of a single log-normal distribution, many distribution-free statistical methods are suitable for calculating the confidence interval for the mean of a positively valued, right-skewed distribution when the sample size is large.15 Banik and Kibria15 recently performed Monte Carlo simulations comparing the performance of 18 such methods. For example, Chen's method16 estimates the 100(1 − α)% confidence interval by modifying the Student t distribution interval to include the coefficient of skewness; the simple formula is given as equation (13) in Appendix 3. Applied to our data, Chen's 95% confidence interval is 1.96 to 2.94 minutes. Although these methods for single groups do not help with the ANOVA of the preceding section, they give us the confidence interval that we want for the pooled data.
Anesthesia providers rely on receiving communications within minutes of their transmission, ideally <0.5 minutes for emergencies. The analysis of latencies between when messages are sent and when they are received will be an essential component of practical and regulatory assessment of clinical and managerial decision-support systems.1–3 Quantifying latency precisely is important because its impact on decision making, although sometimes large, is consistently complicated.5,6 Latency data that include human response times have moderate sample sizes, large coefficients of variation (>1.70), and highly heterogeneous coefficients of variation among groups.
We showed that, for such data, ANOVA can produce highly inaccurate results and correspondingly poor managerial decisions, whether performed in the time scale or in the log scale followed by taking the exponential of the result. In contrast, inference about mean latencies based on log-normal distributions can be performed using pivotal methods. Fixed-effects 2-way ANOVA can be extended to the comparison of means of log-normal distributions. Pivotal inference does not require that the coefficients of variation of the studied log-normal distributions be equal, and it can be used to assess the interaction and the proportional main effects of 2 factors. We also showed that latency data that include a human behavioral component can be bimodal in the log scale (i.e., a mixture of distributions). In such situations, ANOVA can be performed on the homogeneous segment of the data, followed by a single-group analysis applied to all or portions of the data using a more robust method.
One alternative approach that would not be suitable is the use of rank-based methods. First, these methods produce P-values, not the confidence intervals needed to assess the practical importance of ratios of means. Second, the methods compare mean ranks between (among) groups, not means. For example, consider 2 groups following log-normal distributions with equal means, equal sample sizes of 50 subjects, but unequal coefficients of variation, 1.87 in one group and 0.81 in the other. The Wilcoxon test rejects the null hypothesis of equality at the α = 0.05 criterion for 68% of simulations.17 The value of 68% is far from the appropriate 5.0% of simulations because, although the means are equal, the medians are unequal, and differences in medians drive the mean ranks that the Wilcoxon test compares.
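The reported rejection rate can be reproduced approximately by simulation. The following Python sketch back-calculates log-normal parameters from the stated coefficients of variation (this is our parameterization for illustration, not the cited study's code):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Choose log-normal parameters so the 2 groups have equal means but
# coefficients of variation 1.87 and 0.81:
#   CV^2 = exp(sigma^2) - 1,  mean = exp(mu + sigma^2 / 2).
sigma2_1 = np.log(1 + 1.87**2)
sigma2_2 = np.log(1 + 0.81**2)
mu1 = 0.0
mu2 = mu1 + (sigma2_1 - sigma2_2) / 2   # equalizes the 2 means

n, reps, alpha = 50, 500, 0.05
rejections = 0
for _ in range(reps):
    x = rng.lognormal(mu1, np.sqrt(sigma2_1), n)
    y = rng.lognormal(mu2, np.sqrt(sigma2_2), n)
    # Wilcoxon-Mann-Whitney test of the 2 independent samples
    p = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
    rejections += (p < alpha)

print(rejections / reps)   # far above the nominal 0.05
```

The rejection rate is driven by the unequal medians, exp(mu1) versus exp(mu2), even though the means are identical.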
Another alternative approach would be suitable if the sample sizes were much larger.18 First, sequential batches of latencies measured at equal time intervals (e.g., first 288, second 288, … ) are created. Second, the ratio of the means is calculated pairwise for each batch. Third, the analyses are performed using the ratios of the means, with the sample size being the number of batches. This method of batch means is how most statistical problems in operating room management are analyzed.19–21 The method is suitable for analyzing latencies when there are no humans involved (e.g., a computer sending messages through the 2 systems under comparison every 5 minutes and recording when messages were received on each device). For example, after 20 workdays, there could be 20 batches each with the pairwise comparison of 288 latencies, where 288 = 24 hours × 12 tests per hour. The method of batch means was not suitable for analyzing the data in Table 1 because the sample size of the smallest group was several-fold too small for the method. The same applied to the data8 in the Motivation—Means of Log-Normally Distributed Data section. For the 2-group comparison in the Motivation—Means of Log-Normally Distributed Data section, a Taylor series expansion method could be used with several batches, some small. However, the method would not be suitable for the ANOVA.22
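For the no-humans scenario just described, the method of batch means can be sketched as follows; the latencies are simulated and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical paired latencies: 20 workdays x 288 tests/day for 2 systems.
days, per_day = 20, 288
sys1 = rng.lognormal(mean=0.5, sigma=0.8, size=(days, per_day))
sys2 = rng.lognormal(mean=0.7, sigma=0.8, size=(days, per_day))

# One ratio of means per batch (workday); the effective sample size is
# the number of batches, not the number of individual latencies.
ratios = sys2.mean(axis=1) / sys1.mean(axis=1)

# Ordinary t-based 95% confidence interval on the 20 batch ratios.
m = ratios.mean()
se = ratios.std(ddof=1) / np.sqrt(days)
t_crit = 2.093   # Student t quantile, 19 df, 2-sided alpha = 0.05
lo, hi = m - t_crit * se, m + t_crit * se
print(lo, hi)
```

Because each batch ratio is a mean of 288 observations, the batch ratios are approximately normally distributed, which is what justifies the ordinary t interval at the batch level.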
Franklin Dexter is the Statistical Editor and section Editor for Economics, Education, and Policy for Anesthesia & Analgesia. This manuscript was handled by Steve Shafer, Editor-in-Chief, and Dr. Dexter was not involved in any way with the editorial process or decision.
† Maximum likelihood estimation was performed using the R function vglm in the VGAM package with family mix2normal1. Accessed January 8, 2011: http://rgm2.lab.nig.ac.jp/RGM2/R_man-2.9.0/library/VGAM/man/mix2normal1.html
1. Epstein RH, Dexter F, Ehrenfeld JM, Sandberg WS. Implications of event entry latency on anesthesia information management system decision support systems. Anesth Analg 2009;108:941–7
2. Dexter F, Epstein RH, Lee JD, Ledolter J. Automatic updating of times remaining in surgical cases using Bayesian analysis of historical case duration data and instant messaging updates from anesthesia providers. Anesth Analg 2009;108:929–40
3. Epstein RH, Ekbatani A, Kaplan J, Shechter R, Grunwald Z. Development of a staff recall system for mass casualty incidents using cell phone text messaging. Anesth Analg 2010;110:871–8
4. Poon EG, Kuperman GJ, Fiskio J, Bates DW. Real-time notification of laboratory data requested by users through alphanumeric pagers. J Am Med Inform Assoc 2002;9:217–22
5. Dexter F, Macario A, Traub RD. Enterprise-wide patient scheduling information systems to coordinate surgical clinic and operating room scheduling can impair operating room efficiency. Anesth Analg 2000;91:617–26
6. Epstein RH, Dexter F, Piotrowski E. Automated correction of room location errors in anesthesia information management systems. Anesth Analg 2008;107:965–71
7. Dexter F, Bayman EO, Epstein RH. Statistical modeling of average and variability of time to extubation for meta-analysis comparing desflurane to sevoflurane. Anesth Analg 2010;110:570–80
8. St. Jacques P, France DJ, Pilla M, Lai E, Higgins MS. Evaluation of a hands-free wireless communication device in the perioperative environment. Telemed J e-Health 2006;12:42–9
9. Limpert E, Stahel WA, Abbt M. Log-normal distributions across the sciences: keys and clues. BioScience 2001;51:341–52
10. Weerahandi S. Generalized confidence intervals. J Am Stat Assoc 1993;88:899–905
11. Krishnamoorthy K, Mathew T. Inferences on the means of lognormal distributions using generalized p-values and generalized confidence intervals. J Stat Plan Infer 2003;115:103–21
12. Krishnamoorthy K, Lu F, Mathew T. A parametric bootstrap approach for ANOVA with unequal variances: fixed and random models. Comput Stat Data Anal 2007;51:5731–42
13. Spangler WE, Strum DP, Vargas LG, May JH. Estimating procedure times for surgeries by determining location parameters for the lognormal model. Health Care Manag Sci 2004;7:97–104
14. Li X. A generalized p-value approach for comparing the means of several lognormal distributions. Statist Probab Letters 2009;79:1404–8
15. Banik S, Kibria BMG. Comparison of some parametric and nonparametric type one sample confidence intervals for estimating the mean of a positively skewed distribution. Commun Stat Simulat 2010;39:361–89
16. Chen L. Testing the mean of skewed distributions. J Am Stat Assoc 1995;90:767–72
17. Zhou XH, Gao S, Hui SL. Methods for comparing the means of two independent log-normal samples. Biometrics 1997;53:1129–35
18. Law AM, Kelton WD. Simulation Modeling and Analysis. 2nd ed. New York: McGraw-Hill, 1991:551–3
19. Masursky D, Dexter F, O'Leary CE, Applegeet C, Nussmeier NA. Long-term forecasting of anesthesia workload in operating rooms from changes in a hospital's local population can be inaccurate. Anesth Analg 2008;106:1223–31
20. Dexter F, Epstein RH, Marcon E, Ledolter J. Estimating the incidence of prolonged turnover times and delays by time of day. Anesthesiology 2005;102:1242–8
21. Wachtel RE, Dexter F. Influence of the operating room schedule on tardiness from scheduled start times. Anesth Analg 2009;108:1889–901
22. Ledolter J, Dexter F. Analysis of interventions influencing or reducing patient waiting while stratifying by surgical procedure. Anesth Analg 2011;112:950–7
23. Student. The probable error of a mean. Biometrika 1908;6:1–25
From Motivation—Means of Log-Normally Distributed Data Section
The following steps calculate a 100(1−α)% confidence interval for the ratio of 2 means11:

For j = 1 to 2
For k = 1 to m (e.g., m = 100,000)
Generate Zjk: a normally distributed random number with mean 0 and variance 1
Generate Ujk: the square root of a χ² distributed random number with nj − 1 df
Calculate the pivotal statistic

Tmean_jk = ȳj − (Zjk/Ujk) sj √(nj − 1)/√nj + (nj − 1)sj²/(2Ujk²),   (4)

where ȳj and sj are the sample means and sample standard deviations of the log-transformed data
Calculate the 100(α/2) and 100(1−α/2) percentiles of the simulated quantities

exp(Tmean_1k − Tmean_2k), k = 1, …, m.   (5)

Equation (4) for Tmean_jk follows from the same 2 statistical properties as Student's t test23: √nj(ȳj − μj)/σj follows a standard normal distribution (Zj), and (nj − 1)sj²/σj² follows a χ² distribution (Uj²). Data and R code are available as a Web supplement (see Supplemental Digital Content 1, http://links.lww.com/AA/A292).
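These steps can be sketched in code. The following Python sketch (not the authors' supplemental R code) uses simulated log-scale data with a known true ratio of means, exp(1); the pivotal statistic is written as we read equation (4) from Krishnamoorthy and Mathew11:

```python
import numpy as np

rng = np.random.default_rng(2)

def lognormal_mean_pivots(logged, m=100_000, rng=rng):
    """Simulated pivotal statistics for eta = mu + sigma^2/2
    of one log-normal group, from its log-transformed data."""
    n = len(logged)
    ybar, s = logged.mean(), logged.std(ddof=1)
    z = rng.standard_normal(m)                # Z_jk
    u = np.sqrt(rng.chisquare(n - 1, m))      # U_jk
    return (ybar
            - (z / u) * s * np.sqrt(n - 1) / np.sqrt(n)
            + (n - 1) * s**2 / (2 * u**2))

# Two hypothetical groups whose true ratio of means is exp(1) ~ 2.72.
y1 = rng.normal(1.0, 0.5, 200)   # log-scale data, group 1
y2 = rng.normal(0.0, 0.5, 200)   # log-scale data, group 2

ratio = np.exp(lognormal_mean_pivots(y1) - lognormal_mean_pivots(y2))
lo, hi = np.percentile(ratio, [2.5, 97.5])
print(lo, hi)   # 95% CI for the ratio of the 2 log-normal means
```

The 2.5% and 97.5% percentiles of the m simulated ratios are the confidence limits, as in step (5).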
We performed Monte Carlo simulations of equations (4) and (5) for pairs of log-normal distributions with common μ = 1.0 and σ² = 0.1. Whereas for n1 = n2 = 40 the coverage of α = 0.050 confidence intervals was 0.050 (SE 0.003), for n1 = n2 = 20 coverage was 0.046, and for n1 = n2 = 10 coverage was 0.040. Simulated coverage was no different for 2 normal distributions with the same parameters. Krishnamoorthy and colleagues showed similar deterioration of coverage for smaller sample sizes and normal distributions while using different parameter values and numbers of groups.11,12 To understand these results, we consider the extreme case n1 = n2 = 2, for which our simulations of 2 normal distributions demonstrated very poor coverage (0.004), in comparison with exact coverage for a single normal distribution (0.050). The generalized probability value of the pivotal test statistic is

GPV = P[(s1t1 − s2t2)/√(s1² + s2²) ≥ √2(ȳ1 − ȳ2)/√(s1² + s2²)],

where t1 and t2 have t distributions with 1 degree of freedom and sp² = (s1² + s2²)/2 is the pooled variance. Under the assumption of independent normal random variables with equal means and variances, t = (ȳ1 − ȳ2)/sp = √2(ȳ1 − ȳ2)/√(s1² + s2²) has a t distribution with 2 df. Write W = W1 + W2, where W1 = a1t1 is Cauchy with scale a1 = s1/√(s1² + s2²) and W2 = a2t2 is Cauchy with scale a2 = s2/√(s1² + s2²); because −t2 has the same distribution as t2, the left-hand quantity above has the distribution of W. Hence W = W1 + W2 is Cauchy with scale λ = (s1 + s2)/√(s1² + s2²), ranging from a minimum of 1 (when 1 sample variance dominates the other) to a maximum of √2 (when the sample variances are the same). Consequently, P[GPV ≤ w] = G(F−1(w)), where F is the cdf of the Cauchy distribution with scale λ and G is the cdf of the t distribution with 2 df. For scale λ = √2 and w = 0.025, F−1(0.025) = −17.9692 and G(F−1(0.025)) = G(−17.9692) = 0.0015. For scale λ = 1 (which implies a t distribution with 1 degree of freedom) and w = 0.025, F−1(0.025) = −12.7062 and G(F−1(0.025)) = G(−12.7062) = 0.0031. For a 2-sided test with significance level 0.05, the rejection probability is doubled, implying bounds 0.003 ≤ P[GPV ≤ 0.05] ≤ 0.006, matching the above simulated value of 0.004. Thus, failure to achieve the 0.05 significance level results from reliance on separate variance estimates from small sample sizes in comparison with assuming that the variances are equal. This is the so-called Behrens–Fisher problem (e.g., Student t distribution with unequal versus equal variances).
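The quoted quantiles and tail probabilities can be checked directly, e.g., in Python with SciPy:

```python
from math import sqrt
from scipy.stats import cauchy, t

# F: Cauchy cdf with scale lambda; G: t cdf with 2 df.
lam = sqrt(2)                        # equal sample variances
q = cauchy.ppf(0.025, scale=lam)     # F^-1(0.025), about -17.9692
p = t.cdf(q, df=2)                   # G(F^-1(0.025)), about 0.0015
print(q, p)

q1 = cauchy.ppf(0.025, scale=1)      # about -12.7062 (t with 1 df)
p1 = t.cdf(q1, df=2)                 # about 0.0031
print(q1, p1)
```

Note that the Cauchy 0.025 quantile with scale 1 equals the familiar t quantile with 1 df, 12.7062, and multiplying by √2 gives 17.9692.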
From Analysis of Variance Section
Our analysis of the fixed-effects 2-way ANOVA with 2 levels for the treatment factor (referred to as factor 1) and g levels for the group factor (referred to as factor 2) starts with an overall test of equality of the 2g means, E(Xij) = exp(μij + 0.5σij²) = exp(ηij), i = 1, 2 and j = 1, 2, …, g. The null hypothesis is written as H0: η11 = η21 = … = η1g = η2g. For our data, the 2nd factor (previous message) has g = 2 groups.
Li defines the vector of pivotal statistics Tη̂ = (Tη̂11, Tη̂21, …, Tη̂1g, Tη̂2g)′, with elements14

Tη̂ij = ȳij − (Zij/Uij) sij √(nij − 1)/√nij + (nij − 1)sij²/(2Uij²),   (6)

where ȳij and sij are the sample means and sample standard deviations of the log-transformed data for the (ij)th treatment/group combination, and the Zij and Uij² are independent standard normal and χ² (with nij − 1 df) random variables. For simplicity of notation, the superscript k corresponding to k = 1, 2, …, m simulations is not displayed. The term Zij√(nij − 1)/Uij follows a Student t distribution with nij − 1 df and thus has expectation of zero. Furthermore, E[(nij − 1)/Uij²] = (nij − 1)/(nij − 3). Therefore, from equation (6), the elements of the mean (vector) E(Tη̂) are14

E(Tη̂ij) = ȳij + (nij − 1)sij²/(2(nij − 3)).   (7)

Because the variance of the t distribution with nij − 1 df equals (nij − 1)/(nij − 3), the diagonal covariance (matrix) V = V(Tη̂) has elements14

V(Tη̂ij) = (sij²/nij)(nij − 1)/(nij − 3) + (sij⁴/2)(nij − 1)²/((nij − 3)²(nij − 5)).   (8)
The null hypothesis H0: η11 = η21 = … = η1g = η2g is expressed as H0: HAη = 0, where 0 = (0, 0, …, 0)′ is a (2g−1) × 1 vector of zeros, η = (η11, η21, …, η1g, η2g)′, and HA is the (2g−1) × 2g constraint matrix whose kth row contains 1 in position k, −1 in position k + 1, and zeros elsewhere, so that each row equates 2 successive elements of η.
Li14 showed that the null hypothesis can be tested by computing the test statistic
For each of the m simulations, the vector of pivotal statistics Tη̂ and the value of the test statistic TSA are calculated. The generalized probability value P[TSA > 0] is estimated from the proportion of the simulated TSA > 0. The null hypothesis H0: η11 = η21 = … = η1g = η2g is rejected when that estimated P[TSA > 0] < 0.05. Failure to reject the null hypothesis is consistent with all means being the same, i.e., with neither factor having an effect on the response.
We adapt Li's general testing approach in equation (10) to test interaction and main effects. For each level j of the group factor (factor 2), the log-ratio of the 2 treatment means, η2j − η1j = log[E(X2j)/E(X1j)], expresses the proportional effect of treatment. Interaction is absent if these g log-ratios are the same. We write the null hypothesis of no interaction H0: η21 − η11 = η22 − η12 = … = η2g − η1g as H0: HINTη = 0, where 0 = (0, 0, …, 0)′ is a (g−1) × 1 vector of zeros, η = (η11, η21, …, η1g, η2g)′, and HINT is the (g−1) × 2g constraint matrix whose jth row contains −1, 1 in the positions of η1j, η2j and 1, −1 in the positions of η1,j+1, η2,j+1, with zeros elsewhere.
The test statistic for interaction is obtained from equation (10), replacing the constraint matrix HA with HINT. Rejection of H0 implies that the proportional treatment effects depend on the group (2nd) factor. When there is appreciable interaction, it can be misleading to talk about main effects of treatment (factor 1) and group (factor 2). If the 2nd factor represents groups that cannot be controlled, such as the number of previous messages, confidence intervals of the proportional treatment effects would be reported separately for each group.
There is no main effect of treatment (factor 1) if η21 + η22 + … + η2g = η11 + η12 + … + η1g, or (η21 − η11) + (η22 − η12) + … + (η2g − η1g) = 0 (i.e., the sum or average of the g proportional treatment effects is zero). The null hypothesis is expressed as H0: HF1η = 0, with the 1 × 2g constraint matrix HF1 = (−1, 1, −1, 1, …, −1, 1).
There is no main effect of group (factor 2) if the g sums η1j + η2j, j = 1, 2, …, g, are the same. The null hypothesis is expressed as H0: HF2η = 0, with HF2 the (g−1) × 2g matrix whose jth row contains 1, 1 in the positions of η1j, η2j and −1, −1 in the positions of η1,j+1, η2,j+1, with zeros elsewhere.
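Under the ordering η = (η11, η21, …, η1g, η2g)′, the 4 constraint matrices can be constructed and sanity-checked mechanically. The following Python sketch (our reading of the constraint layouts, shown for a hypothetical g = 3) verifies that each null hypothesis holds when all elements of η are equal:

```python
import numpy as np

g = 3  # levels of the group factor; eta has 2g elements

# H_A: (2g-1) x 2g, consecutive differences -> all 2g means equal
HA = np.zeros((2 * g - 1, 2 * g))
for k in range(2 * g - 1):
    HA[k, k], HA[k, k + 1] = 1, -1

# H_INT: (g-1) x 2g, equality of the g log-ratios eta_2j - eta_1j
HINT = np.zeros((g - 1, 2 * g))
for j in range(g - 1):
    HINT[j, 2 * j], HINT[j, 2 * j + 1] = -1, 1        # +(eta2j - eta1j)
    HINT[j, 2 * j + 2], HINT[j, 2 * j + 3] = 1, -1    # -(eta2,j+1 - eta1,j+1)

# H_F1: 1 x 2g, sum of the g proportional treatment effects
HF1 = np.tile([-1.0, 1.0], g).reshape(1, -1)

# H_F2: (g-1) x 2g, equality of the g group sums eta_1j + eta_2j
HF2 = np.zeros((g - 1, 2 * g))
for j in range(g - 1):
    HF2[j, 2 * j : 2 * j + 2] = 1
    HF2[j, 2 * j + 2 : 2 * j + 4] = -1

# Sanity check: when all eta are equal, every null hypothesis holds.
eta = np.full(2 * g, 0.7)
for H in (HA, HINT, HF1, HF2):
    assert np.allclose(H @ eta, 0)
print("all constraints satisfied")
```

For example, η = (0, 1, 0, 1, 0, 1)′ has equal proportional treatment effects, so HINTη = 0, but HF1η = 3 ≠ 0, so the main effect of treatment would be tested as nonzero.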
From Mixture Distributions Section
The 95% confidence interval for the mean calculated using equation (4) and all 560 observations did not straddle the sample mean. To understand why, let W follow the Bernoulli distribution with parameter p. The pivotal approach assuming a single log-normal distribution X = exp(Y) effectively assumes a normally distributed Y = WY1 + (1−W)Y2 with mean μM = pμ1 + (1−p)μ2 and variance σM² = pσ1² + (1−p)σ2² + p(1−p)(μ1 − μ2)².
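For a 2-component mixture, the standard moments are μM = pμ1 + (1−p)μ2 and σM² = pσ1² + (1−p)σ2² + p(1−p)(μ1 − μ2)²; the variance formula can be checked numerically against the raw second moment (hypothetical parameter values for illustration):

```python
# Hypothetical mixture parameters (illustration only).
p, mu1, mu2, s1, s2 = 0.7, 0.4, 1.6, 0.25, 0.45

# Mixture mean and variance of Y = W*Y1 + (1 - W)*Y2, W ~ Bernoulli(p):
mu_M = p * mu1 + (1 - p) * mu2
var_M = p * s1**2 + (1 - p) * s2**2 + p * (1 - p) * (mu1 - mu2) ** 2

# Same variance from E(Y^2) - [E(Y)]^2, using component second moments.
ey2 = p * (s1**2 + mu1**2) + (1 - p) * (s2**2 + mu2**2)
print(mu_M, var_M, ey2 - mu_M**2)
```

The agreement is exact because the p(1−p)(μ1 − μ2)² term is precisely the between-component contribution to E(Y²) − [E(Y)]².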
The assumed mean Mean(X) = exp(μM + 0.5σM²) = 1.99 minutes, which is within the estimated 95% confidence interval of 1.95 to 2.33 minutes. However, 1.99 minutes is smaller than the sample mean of 2.46 minutes, which is not within the confidence interval. Instead of relying on the pivotal method, which is inaccurate in this context, we apply Chen's method to estimate the 100(1 − α)% confidence interval for the mean. Chen's method modifies the Student t distribution interval to include the coefficient of skewness16:
where x̄, sx, and γ̂ are the sample mean, SD, and skewness of the observations of the single group in the time scale. Banik and Kibria15 showed that for a log-normal distribution with n = 100 and σM = 1.0 (for our data, n = 560 and σ̂M = 0.92), Chen's method covers the true mean for 93.8% of simulations, close to the desired 95.0% rate.
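As a sketch of Chen's method, the following Python function implements the skewness-modified t interval in the form we understand from Chen16; the data here are simulated, and the exact expression should be checked against equation (13) before reuse:

```python
import numpy as np
from scipy import stats

def chen_ci(x, alpha=0.05):
    """Skewness-modified t confidence interval for the mean of a
    single positively skewed sample (a sketch after Chen, 1995)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    gamma = stats.skew(x, bias=False)   # sample coefficient of skewness
    bounds = []
    for q in (alpha / 2, 1 - alpha / 2):
        tq = stats.t.ppf(q, n - 1)
        # Skewness correction shifts both limits toward the long tail.
        shift = tq + gamma * (1 + 2 * tq**2) / (6 * np.sqrt(n))
        bounds.append(xbar + shift * s / np.sqrt(n))
    return tuple(bounds)

rng = np.random.default_rng(3)
x = rng.lognormal(0.5, 0.9, 560)   # hypothetical right-skewed sample
lo, hi = chen_ci(x)
print(lo, hi)
```

For right-skewed data (γ̂ > 0), both limits shift upward relative to the ordinary t interval, which is what allows the interval to remain consistent with a sample mean pulled up by the long tail.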