Anesthesiology, October 2002, Volume 97, Issue 4

Sample Size Calculations in Clinical Research

Bacchetti, Peter, Ph.D.*; Leung, Jacqueline M., M.D., M.P.H.


To the Editor:—
We write to make the case that the practice of providing a priori sample size calculations, recently endorsed in an Anesthesiology editorial, 1 is in fact undesirable. Presentation of confidence intervals serves the same purpose, but is superior because it more accurately reflects the actual data, is simpler to present, addresses uncertainty more directly, and encourages more careful interpretation of results. The clinical trial report 2 lauded in the editorial in fact serves to illustrate the drawbacks of sample size calculation as a data analysis tool. The a priori calculation presented is based on assumptions about length of stay (normally distributed with a SD of 4.5 days) that did not hold in the actual data, an analysis (comparison of mean length of stay between two groups by t test) that was not presented, and a sample size that was not attained. It therefore does not help the reader interpret the results, which is the proper goal when reporting on a study that has been completed. The post hoc power calculation presented retains most of these deficiencies, and therefore does not help the reader to assess the strength of evidence against a 1.0-day mean advantage for one treatment versus another. In contrast, a confidence interval for the difference in means would directly address this issue. Although the presence of outliers would require a bootstrapping method 3 to obtain a valid confidence interval for a difference in means, this bit of extra effort is certainly worthwhile for the central issue of a study, and in any case, much better than relying on convoluted reasoning with invalid power approximations.
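The bootstrapping approach mentioned above (reference 3) can be sketched briefly. The original patient data are not available here, so the length-of-stay values below are hypothetical, chosen only to mimic a skewed distribution with outliers; this is a minimal percentile-bootstrap sketch, not the analysis the trial authors would have run.

```python
import random
import statistics

def bootstrap_ci_diff_means(a, b, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b).

    Resamples each group with replacement, recomputes the difference
    in means, and takes the empirical alpha/2 and 1 - alpha/2 quantiles.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = [rng.choice(a) for _ in a]
        resample_b = [rng.choice(b) for _ in b]
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical lengths of stay (days); the 30 and 25 play the role of outliers
# that would invalidate a normal-theory confidence interval.
group1 = [5, 6, 7, 5, 6, 8, 30, 5, 7, 6]
group2 = [6, 7, 8, 7, 9, 7, 8, 25, 6, 8]
lo, hi = bootstrap_ci_diff_means(group1, group2)
print(f"95% bootstrap CI for difference in mean LOS: ({lo:.1f}, {hi:.1f})")
```

A reader can check directly whether a 1.0-day advantage lies inside or outside such an interval, which is exactly the question a post hoc power calculation fails to answer.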
Perhaps the worst aspect of reporting sample size or power calculations is that it encourages interpretation of studies’ results based only on P values, in particular the widespread fallacy of interpreting P > 0.05 as proving the null hypothesis. The other article 4 cited by the editorial provides a glaring example of this type of reasoning, concluding that reporting of sample size calculations did not change over time in any journal but did increase overall (see their fig. 2). Returning to the clinical trial report, consider the statement that death rates “were similar” in the four subgroups. While this is an accurate characterization of what was actually observed, unsophisticated readers are liable to interpret this (contrary to the authors’ intentions) to mean that the study found strong evidence against any substantial difference in death rates. In fact, the exact 5 95% confidence interval around the odds ratio for death comparing intravenous versus epidural postoperative analgesia goes from 0.36 to 5.4, which is wide enough to make clear to most readers that this study by itself provides only very weak evidence against a clinically important difference in death rates.
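The exact interval quoted above comes from exact logistic regression (reference 5), which requires specialized software. For illustration only, here is the simpler Woolf (log) approximation for an odds-ratio confidence interval; the cell counts are hypothetical, and with death counts this sparse the approximation is unreliable, which is precisely why the letter relies on an exact method instead.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Woolf approximate 95% CI for the odds ratio of a 2x2 table.

    Table layout: group 1 has `a` events and `b` non-events;
    group 2 has `c` events and `d` non-events. The CI is computed
    on the log scale, where the standard error is
    sqrt(1/a + 1/b + 1/c + 1/d), then exponentiated.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

# Hypothetical counts: 4 deaths in 200 vs. 3 deaths in 200
or_hat, lo, hi = odds_ratio_ci(4, 196, 3, 197)
print(f"OR = {or_hat:.2f}, approximate 95% CI ({lo:.2f}, {hi:.2f})")
```

Even with the crude approximation, the width of the resulting interval makes the point of the letter concrete: sparse event counts yield intervals far too wide to support any claim that rates "were similar."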
We urge reviewers, editors, and quality studies to give authors full credit for providing confidence intervals instead of sample size calculations in reports of completed studies. Indeed, for the reasons illustrated here, it would be best to discourage the practice of using sample size and power calculations as substitutes for more direct assessment of uncertainty using confidence intervals.
Peter Bacchetti, Ph.D.*
Jacqueline M. Leung, M.D., M.P.H.


1. Todd MM: Clinical research manuscripts in Anesthesiology. Anesthesiology 2001; 95: 1051–3

2. Norris EJ, Beattie C, Perler BA, Martinez EA, Meinert CL, Anderson GF, Grass JA, Sakima NT, Gorman R, Achuff SC, Martin BK, Minken SL, Williams GM, Traystman RJ: Double-masked randomized trial comparing alternate combinations of intraoperative anesthesia and postoperative analgesia in abdominal aortic surgery. Anesthesiology 2001; 95: 1054–67

3. Efron B, Tibshirani RJ: An Introduction to the Bootstrap. London, Chapman and Hall, 1993

4. Pua HL, Lerman J, Crawford MW, Wright JG: An evaluation of the quality of clinical trials in anesthesia. Anesthesiology 2001; 95: 1068–73

5. Mehta CR, Patel NR: Exact logistic regression: theory and examples. Stat Med 1995; 14: 2143–60

© 2002 American Society of Anesthesiologists, Inc.
