The frequency of type 2 error (TTE), or lacking statistical power to demonstrate a difference between two treatment arms, has been well-documented.1–3 Analysis of randomized controlled trials (RCTs) published in Annals of Surgery, Surgery, and Archives of Surgery between 1988 and 1998 with nonstatistically significant or “negative” results determined that more than three-quarters of them were at risk for TTE.4 Subsequent analysis of RCTs published in the same journals between 1999 and 2002 corroborated this finding with a TTE rate exceeding 80%.5 More recent work in 2015 investigating the orthopedic surgical community suggested that >90% of studies reporting a negative primary outcome were at risk for TTE.6 Similar concerns have been raised for observational studies.7
Given the persistence of TTE in the surgical literature, it is imperative to recognize its impact on surgical practice. So, what is TTE, why is it important, and what should the surgical community do about it?
TTE arises from the challenge of proving the absence of something. The problem lies in the fact that when we do not find something, we do not know whether it truly does not exist or whether we just have not searched thoroughly enough. In other words, an absence of evidence is not evidence of absence.
TTE is a common occurrence in daily life. To better understand TTE, imagine that you are at home relaxing after a long day in the operating room. Your partner enters and asks you, “Have you seen my keys in the house?” You acquiesce, but limit your search to only the couch. If you then tell your partner that the keys are not in the house, you would be predisposed to TTE. The keys may still be in the house; you just did not search thoroughly enough.
IMPLICATIONS OF TTE IN SURGICAL SCIENCE
The Drug Price Competition and Patent Term Restoration Act of 1984 established the precedent of noninferiority in medicine by upholding an equivalency standard for the introduction of generic medications.8 The surgical community has since adopted this standard for reviewing new treatment modalities without understanding the nuances of equivalency standards and the potential for TTE. Demonstrating equivalence is challenging because it is often mistaken for demonstrating the absence of a difference;9 the two are not synonymous, and treating an absence of difference as evidence of equivalence is erroneous. With this noninferiority standard in place, TTE in the surgical literature may inappropriately dissuade investigators from pursuing studies that might be better considered as preliminary work, given the limitation in power. Consequently, this misinterpretation risks the adoption of new interventions that may in fact be inferior to those currently in use.
SOLUTIONS TO MITIGATE TTE: DISCLOSURE OF STATISTICAL POWER
Traditional teaching for addressing TTE is to increase the sample size until the study reaches the conventional, and arbitrary, threshold of 80% power. This standard is akin to the arbitrary P value of 0.05 used to define statistical significance. However, outside of population-level database analyses, it is nearly impossible to achieve sample sizes comparable to those in medical trials to reach 80% power.9 We believe that studies that do not reach 80% power can still be informative as long as they properly disclose the limitation of statistical power.
For the individual surgeon-scientist, the solution is to properly report power in their articles. To date, explicit power calculations are rare.1,4,5 We believe that power calculations will help readers better understand the limitations, implications, and applications of such research studies. Full disclosure of the limits of our studies will not eliminate TTE, but it will mitigate its impact and help prevent readers from drawing erroneous conclusions. For example, instead of merely reporting a finding of “no difference,” future surgeon-scientists should report a negative result with a disclosure of power: “no difference with a power of X.”
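To make the recommendation concrete, the post-hoc power behind a statement such as “no difference with a power of X” can be computed directly from the sample sizes and the observed effect. The sketch below uses the standard normal approximation to the two-proportion z-test; the function names and all trial numbers (30% vs 20% event rates, 50 patients per arm) are hypothetical illustrations, not data from any study cited here.

```python
# Minimal sketch: post-hoc power for a two-arm trial with a binary outcome,
# using the normal approximation to the two-proportion z-test.
# All numbers are illustrative assumptions, not data from any cited study.
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def two_proportion_power(p1: float, p2: float, n1: int, n2: int) -> float:
    """Approximate power to detect the observed difference p1 - p2
    at a two-sided significance level of 0.05 (normal approximation)."""
    z_crit = 1.959964  # critical z for alpha = 0.05, two-sided
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return normal_cdf(abs(p1 - p2) / se - z_crit)


# Hypothetical trial: 30% vs 20% complication rate, 50 patients per arm.
power = two_proportion_power(0.30, 0.20, 50, 50)
print(f"no difference with a power of {power:.0%}")
```

With these assumed numbers the power comes out well below the 80% convention, which is exactly the kind of limitation the disclosure statement would communicate to the reader.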
Let us revisit our previous TTE analogy. You decide to tell your partner that you limited your search for keys only to the couch (ie, searching only a small part of the house). Your partner therefore accurately concludes that more search efforts are worthwhile within the house. Your disclosure of the limitations of your search prevented your partner from prematurely buying a new set of keys. In other words, reporting the power of a study is analogous to disclosing the extent of a search effort.
A systems-level solution to improve the quality of surgical research is to address the concept of TTE in the publishing guidelines of institutional and journal review boards; failing to disclose power, and thereby miscommunicating the limitations of a study, raises an ethical concern. The Consolidated Standards of Reporting Trials (CONSORT) and Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines were developed to standardize the reporting and review of clinical studies to improve their quality. However, although these guidelines have addressed many methodological shortcomings, they currently require only a traditional sample size calculation based on the standard assumption of 80% power.10–12 Because 80% power is difficult to achieve in surgical studies, we argue that the CONSORT and STROBE guidelines should be modified to require the disclosure of power—even if <80%—given the sample size and effect size observed in the study.
Academic surgeons must be the gatekeepers of surgical innovation. Although we may want new surgical interventions to be made available to our patients as quickly as possible, we need to recognize that the limitations implied by TTE may lead us to prematurely implement these interventions and inappropriately adopt new strategies. While we value efficiency, we must place equal importance on reasonable certainty. At the very least, we need to begin to convey the uncertainty associated with our studies so that patients and providers can be empowered to make appropriate decisions.
The authors greatly appreciate the assistance of Helen Mayer in contributing to the review of the literature and preliminary drafts.
1. Ko CY, Sack J, Chang JT, et al. Reporting randomized, controlled trials—where quality of reporting may be improved. Dis Colon Rectum
2. Chung KC, Kalliainen LK, Spilson SV, et al. The prevalence of negative studies with inadequate statistical power: an analysis of the plastic surgery literature. Plast Reconstr Surg
3. Solomon MJ, Laxamana A, Devore L, et al. Randomized controlled trials in surgery. Surgery
4. Dimick JB, Diener-West M, Lipsett PA. Negative results of randomized clinical trials published in the surgical literature: equivalency or error? Arch Surg
5. Maggard MA, O’Connell JB, Liu JH, et al. Sample size calculations in surgery: are they done correctly? Surgery
6. Sabharwal S, Patel NK, Holloway I, et al. Sample size calculations in orthopaedics randomised controlled trials: revisiting research practices. Acta Orthop Belg
7. Hutton B, Joseph L, Fergusson D, et al. Risks of harms using antifibrinolytics in cardiac surgery: systematic review and network meta-analysis of randomised and observational studies. BMJ
8. 98th Congress. Drug Price Competition and Patent Term Restoration Act of 1984. 1984.
9. Chang DC, Yu PT, Easterlin MC, et al. Demystifying sample-size calculation for clinical trials and comparative effectiveness research: the impact of low-event frequency in surgical clinical research. Surg Endosc
10. Schulz KF, Altman DG, Moher D, et al. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomized trials. Ann Intern Med
11. Piaggio G, Elbourne DR, Pocock SJ, et al. Reporting of noninferiority and equivalence randomized trials extension of the CONSORT 2010 statement. JAMA
12. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Int J Surg