A Proposal to Mitigate the Consequences of Type 2 Error in Surgical Science

Bababekov, Yanik J. MD, MPH; Stapleton, Sahael M. MD; Mueller, Jessica L. BA; Fong, Zhi Ven MD, MPH; Chang, David C. PhD, MPH, MBA

doi: 10.1097/SLA.0000000000002547
SURGICAL PERSPECTIVES

Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA.

Reprints: Yanik J. Bababekov, MD, MPH, Department of Surgery, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114-3117. E-mail: ybababekov@partners.org.

The authors declare no conflicts of interest.

The frequency of type 2 error (TTE), or lacking statistical power to demonstrate a difference between two treatment arms, has been well-documented.1–3 An analysis of randomized controlled trials (RCTs) with nonstatistically significant or “negative” results published in Annals of Surgery, Surgery, and Archives of Surgery between 1988 and 1998 determined that more than three-quarters of them were at risk for TTE.4 A subsequent analysis of RCTs published in the same journals between 1999 and 2002 corroborated this finding, with a TTE rate exceeding 80%.5 More recent work from 2015 investigating the orthopedic surgical community suggests that >90% of studies reporting a negative primary outcome were at risk for TTE.6 Similar concerns have been raised for observational studies.7

Given the persistence of TTE in the surgical literature, it is imperative to recognize its impact on surgical practice. So, what is TTE, why is it important, and what should the surgical community do about it?

TTE

TTE arises from the challenge of proving the absence of something. The problem lies in the fact that when we do not find something, we do not know whether it truly does not exist or whether we just have not searched thoroughly enough. In other words, an absence of evidence is not evidence of absence.

TTE is a common occurrence in daily life. To better understand TTE, imagine that you are at home relaxing after a long day in the operating room. Your partner enters and asks you, “Have you seen my keys in the house?” You acquiesce, but limit your search to only the couch. If you then tell your partner that the keys are not in the house, you would be predisposed to TTE. The keys may still be in the house; you just did not search thoroughly enough.

IMPLICATIONS OF TTE IN SURGICAL SCIENCE

The Drug Price Competition and Patent Term Restoration Act of 1984 established the precedent of noninferiority in medicine by upholding an equivalency standard for the introduction of generic medications.8 The surgical community has since adopted this standard for reviewing new treatment modalities without understanding the nuances of equivalency standards and the potential for TTE. The challenge of demonstrating equivalence arises because many people believe that it is tantamount to demonstrating the absence of difference.9 The belief that the absence of difference is synonymous with evidence of equivalence is erroneous. With this noninferiority standard in place, TTE in the surgical literature may inappropriately dissuade future investigators from completing studies that might be better considered as preliminary work, given the limitation in power. Consequently, this misinterpretation risks the adoption of new interventions that may in fact be inferior to those currently in use.

SOLUTIONS TO MITIGATE TTE: DISCLOSURE OF STATISTICAL POWER

Traditional teaching for addressing TTE is to increase sample size until the study reaches 80% power, an unspoken and arbitrary convention. This standard is similar to the arbitrary P value of 0.05 used to define statistical significance. However, outside of population-level database analyses, it is nearly impossible for surgical studies to achieve sample sizes comparable to those in medical trials and thereby reach 80% power.9 We believe that studies that do not reach 80% power can still be informative as long as they properly disclose the limitation of statistical power.
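To illustrate why 80% power can be hard to reach when events are infrequent, the following minimal sketch estimates the sample size per arm needed to compare two proportions. It assumes a two-sided z-test with the usual normal approximation and equal arm sizes. The function name `sample_size_per_arm` and the 5% versus 3% complication rates are hypothetical choices for illustration and are not taken from any study cited here.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect p1 vs p2 (two-sided z-test).

    Uses the standard normal-approximation formula for two independent
    proportions with equal allocation. Illustrative only.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical example: complication rate of 5% versus 3%.
n = sample_size_per_arm(0.05, 0.03)
print(f"Patients needed per arm for 80% power: {n}")
```

Under these assumptions the requirement is on the order of 1,500 patients per arm, which is beyond the reach of many single-institution surgical studies.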

For the individual surgeon-scientist, the solution is to properly report power in their articles. As of yet, explicit power calculations are rare.1,4,5 We believe that power calculations will help readers better understand the limitations, implications, and applications of such research studies. Full disclosure of the limits of our studies will not eliminate TTE, but it will mitigate its impact and help prevent readers from drawing erroneous conclusions. For example, instead of merely reporting a finding of “no difference,” future surgeon-scientists should report a negative result with a disclosure of power: “no difference with a power of X.”
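As a rough illustration of what such a disclosure could look like, the sketch below computes the approximate power of a two-arm comparison of proportions for a given sample size. It assumes a two-sided z-test with the normal approximation and equal arm sizes, and it ignores the negligible probability of rejecting in the wrong direction. The function name `power_two_proportions`, the arm size of 100, and the 5% versus 3% event rates are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation with equal arm sizes. Illustrative only.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    sd_null = sqrt(2 * p_bar * (1 - p_bar))          # SD term under H0
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))     # SD term under H1
    z = (abs(p1 - p2) * sqrt(n_per_arm) - z_alpha * sd_null) / sd_alt
    return NormalDist().cdf(z)


# Hypothetical negative study: 100 patients per arm, 5% vs 3% event rates.
power = power_two_proportions(0.05, 0.03, n_per_arm=100)
print(f"No difference detected, with a power of {power:.0%}")
```

With these hypothetical inputs the power is on the order of 10%, so a finding of "no difference" would say very little about whether the two treatments truly differ.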

Let us revisit our previous TTE analogy. You decide to tell your partner that you limited your search for keys only to the couch (ie, searching only a small part of the house). Your partner therefore accurately concludes that more search efforts are worthwhile within the house. Your disclosure of the limitations of your search prevented your partner from prematurely buying a new set of keys. In other words, reporting the power of a study is analogous to disclosing the extent of a search effort.

A systems-level solution to improve the quality of surgical research is to address the concept of TTE in publishing guidelines of institutional and journal review boards, as there is an ethical consideration of not disclosing power and miscommunicating the limitations of a study. The Consolidated Standards of Reporting Trials (CONSORT) and Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) were developed to standardize the reporting and review of clinical studies to improve their quality. However, although these guidelines have addressed many methodological shortcomings, they currently require only a traditional sample size calculation based on the standard assumption of 80% power.10–12 But, as 80% power is difficult to achieve in surgical studies, we argue that the CONSORT and STROBE guidelines should be modified to include the disclosure of power—even if <80%—with the given sample size and effect size observed in that study.

CONCLUSION

Academic surgeons must be the gatekeepers of surgical innovation. Although we may want new surgical interventions to be made available to our patients as quickly as possible, we need to recognize that the limitations implied by TTE may lead us to prematurely implement these interventions and inappropriately adopt new strategies. While we value efficiency, we must place equal importance on reasonable certainty. At the very least, we need to begin to convey the uncertainty associated with our studies so that patients and providers can be empowered to make appropriate decisions.

Acknowledgments

The authors greatly appreciate the assistance of Helen Mayer in contributing to the review of the literature and preliminary drafts.

REFERENCES

1. Ko CY, Sack J, Chang JT, et al. Reporting randomized, controlled trials—where quality of reporting may be improved. Dis Colon Rectum 2002; 45:443–447.
2. Chung KC, Kalliainen LK, Spilson SV, et al. The prevalence of negative studies with inadequate statistical power: an analysis of the plastic surgery literature. Plast Reconstr Surg 2002; 109:1–6.
3. Solomon MJ, Laxamana A, Devore L, et al. Randomized controlled trials in surgery. Surgery 1994; 115:707–712.
4. Dimick JB, Diener-West M, Lipsett PA. Negative results of randomized clinical trials published in the surgical literature - Equivalency or error? Arch Surg 2001; 136:796–800.
5. Maggard MA, O’Connell JB, Liu JH, et al. Sample size calculations in surgery: are they done correctly? Surgery 2003; 134:275–279.
6. Sabharwal S, Patel NK, Holloway I, et al. Sample size calculations in orthopaedics randomised controlled trials: revisiting research practices. Acta Orthop Belg 2015; 81:115–122.
7. Hutton B, Joseph L, Fergusson D, et al. Risks of harms using antifibrinolytics in cardiac surgery: systematic review and network meta-analysis of randomised and observational studies. BMJ 2012; 345:e5798.
8. 98th Congress. Drug Price Competition and Patent Term Restoration Act of 1984. 1984.
9. Chang DC, Yu PT, Easterlin MC, et al. Demystifying sample-size calculation for clinical trials and comparative effectiveness research: the impact of low-event frequency in surgical clinical research. Surg Endosc 2013; 27:359–363.
10. Schulz KF, Altman DG, Moher D, et al. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomized trials. Ann Intern Med 2010; 152:726–732.
11. Piaggio G, Elbourne DR, Pocock SJ, et al. Reporting of noninferiority and equivalence randomized trials extension of the CONSORT 2010 statement. JAMA 2012; 308:2594–2604.
12. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Int J Surg 2014; 12:1495–1499.
Keywords:

power; research methods; sample size; surgical science; type 2 error

Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved.