Systematic Reviews of Anesthesiologic Interventions Reported as Statistically Significant: Problems with Power, Precision, and Type 1 Error Protection

Imberger, Georgina MBBS, PhD, FANZCA*†; Gluud, Christian MD, Dr Med Sci*; Boylan, John MB, FRCP(C)‡; Wetterslev, Jørn MD, PhD*

doi: 10.1213/ANE.0000000000000892
Economics, Education, and Policy: Research Report

BACKGROUND: The GRADE Working Group assessment of the quality of evidence is being used increasingly to inform clinical decisions and guidelines. The assessment involves explicit consideration of all sources of uncertainty. One of these sources is imprecision or random error. Many published meta-analyses are underpowered and likely to be updated in the future. When data are sparse and there are repeated updates, the risk of random error is increased. Trial Sequential Analysis (TSA) is one of several methodologies that estimate this increased risk (and decreased precision) in meta-analyses. With nominally statistically significant meta-analyses of anesthesiologic interventions, we used TSA to estimate power and imprecision in the context of sparse data and repeated updates.

METHODS: We conducted a search to identify all systematic reviews with meta-analyses that investigated an intervention that may be implemented by an anesthesiologist during the perioperative period. We randomly selected 50 meta-analyses that reported a statistically significant dichotomous outcome in their abstract. We applied TSA to these meta-analyses by using 2 main TSA approaches: relative risk reduction 20% and relative risk reduction consistent with the conventional 95% confidence limit closest to null. We calculated the power achieved by each included meta-analysis, by using each TSA approach, and we calculated the proportion that maintained statistical significance when allowing for sparse data and repeated updates.

RESULTS: From 11,870 titles, we found 682 systematic reviews that investigated anesthesiologic interventions. In the 50 sampled meta-analyses, the median number of trials included was 8 (interquartile range [IQR], 5–14), the median number of participants was 964 (IQR, 523–1736), and the median number of participants with the outcome was 202 (IQR, 96–443). By using both of our main TSA approaches, only 12% (95% CI, 5%–25%) of the meta-analyses had power ≥80%, and only 32% (95% CI, 20%–47%) of the meta-analyses preserved the risk of type 1 error <5%.

CONCLUSIONS: Most nominally statistically significant meta-analyses of anesthesiologic interventions are underpowered, and many do not maintain their risk of type 1 error <5% if TSA monitoring boundaries are applied. Consideration of the effect of sparse data and repeated updates is needed when assessing the imprecision of meta-analyses of anesthesiologic interventions.

From the *Copenhagen Trial Unit, Centre for Clinical Intervention Research, Rigshospitalet, Copenhagen, Denmark; †Department of Anesthesia & Perioperative Medicine, Monash University, Melbourne, Australia; and ‡Department of Anaesthesia, St. Vincent’s Hospital, Dublin, Ireland.

Accepted for publication May 5, 2015.

Funding: None.

Conflict of Interest: See Disclosures at the end of the article.

Reprints will not be available from the authors.

Address correspondence to Georgina Imberger, MBBS, PhD, FANZCA, Copenhagen Trial Unit, Centre for Clinical Intervention Research, Rigshospitalet, Copenhagen University Hospital, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark. Address e-mail to gimberger@gmail.com.

The GRADE Working Group’s approach for assessing the quality of evidence is being used increasingly to inform clinical decisions and guidelines.1 The assessment involves explicit consideration of all sources of uncertainty2 by the use of formal systematic review, often with meta-analysis. One of the sources of uncertainty is imprecision or random error.3 GRADE recommends grading down the quality of evidence because of imprecision when there is a small total number of participants, or a small total number of participants with the outcome, and they suggest a consideration of the “optimal information size” to inform this assessment.3

Many published meta-analyses are underpowered. A recent survey in which the author used conventional power analysis and testing for a relative risk reduction (RRR) of 30% showed that 78% of Cochrane meta-analyses with a dichotomous outcome have power <80%, and 50% have power <27%.4 Compared with single clinical trials, meta-analyses usually include more variation in populations and interventions. Many argue that this increased heterogeneity means that more participants are needed in a meta-analysis to achieve a given amount of power.5–12 The problem of sufficient power in meta-analyses may be even larger than that suggested when power analyses intended for a single trial are used.

When meta-analyses are underpowered, there is an increased risk of random error and associated imprecision. This increased risk has been demonstrated by theoretical argument,8,13,14 in simulation studies,11,15–18 and in empirical studies.19 If meta-analyses are small, they are likely to be updated in the future. Indeed, the Cochrane Collaboration recommends that all systematic reviews be updated every 2 years.20 The resulting sequential multiplicity causes a further increase in the risk of random error,11 analogous to the increased risk of error present if interim analyses are done in a single trial using a constant threshold for statistical significance. In a single trial, it has long been accepted that adjustments of thresholds for statistical significance are required for the increased random error caused by sparse data and repetitive testing.21,22 In a single trial, early stopping is accepted as problematic, and monitoring boundaries, incorporating the sample size calculation, are commonly used to control the risk of random error at desired levels and to allow us to make inferential conclusions.

Issues related to sequential multiplicity have received much less attention in the context of meta-analysis.23,24 However, as a methodologic concern, discussion about these issues is increasing.5,6,9–11,13,17,25–28 Several techniques have been proposed, including frequentist methods such as Trial Sequential Analysis (TSA),5,6,9,10 Whitehead’s triangular test,28 and an application of the law of the iterated logarithm,17 as well as Bayesian methods such as a semi-Bayes procedure.25 All these techniques provide more conservative thresholds for statistical significance when meta-analyses are small, aiming to preserve the overall risk of type 1 error at the desired level (usually ≤5%) if and when the meta-analysis reaches a satisfactory size.

As a part of assessment of a body of evidence, and the range of possible effects of an intervention, assessment of power and imprecision is important. In this study, we used TSA to assess the power and the imprecision, allowing for sparse data and repeated updates, of a population of nominally statistically significant meta-analyses of anesthesiologic interventions. Our goal was to estimate how many of these meta-analyses were underpowered, to estimate how many of them maintained precision consistent with statistical significance when using TSA, and to explore the important issue of assessing imprecision in meta-analyses as a part of the overall assessment of the quality of a body of evidence.

METHODS

Search

We searched Medline, EMBASE, and The Cochrane Database of Systematic Reviews until April 1, 2011, with the goal of identifying all published systematic reviews with meta-analyses of randomized clinical trials investigating anesthesiologic interventions. The search strategy we used is listed in Appendix 1.

Sampling of Included Meta-Analyses

We reviewed all the abstracts generated by the aforementioned search and selected the systematic reviews for which there was a description of a meta-analysis that investigated an anesthesiologic intervention. We defined an anesthesiologic intervention as one that may be implemented by an anesthesiologist during the perioperative period. We ordered the selected abstracts alphabetically according to the surname of the first author. To create a random sample representative of the whole population of anesthesiologic systematic reviews, we generated a list of random numbers. In the order dictated by the random numbers, we reviewed each systematic review. We included the systematic review if the abstract quoted a dichotomous meta-analytic outcome with a clear description consistent with statistical significance. We excluded reviews if the full publication contained insufficient data to repeat the meta-analysis or if they had been published more than 10 years before the time of the search. We continued this process until we had a sample of 50 systematic reviews for inclusion. When >1 statistically significant meta-analysis with a dichotomous outcome was reported in the same abstract, we selected the meta-analysis that was presented first.
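
The sampling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`first_author`, `significant_dichotomous`), the eligibility predicate, the seed, and the demo data are all hypothetical, and the real eligibility check also required reading the full publication.

```python
import random

def sample_reviews(abstracts, is_eligible, target=50, seed=0):
    """Visit reviews in a random order and keep eligible ones until `target` is reached.

    abstracts   -- list of dicts with a 'first_author' key (hypothetical layout)
    is_eligible -- hypothetical predicate standing in for the inclusion/exclusion criteria
    """
    # Order alphabetically by first author's surname, as in the described procedure.
    ordered = sorted(abstracts, key=lambda a: a["first_author"].lower())
    # A shuffled index list plays the role of the list of random numbers.
    rng = random.Random(seed)
    visit_order = list(range(len(ordered)))
    rng.shuffle(visit_order)

    included = []
    for idx in visit_order:
        if len(included) == target:
            break
        if is_eligible(ordered[idx]):
            included.append(ordered[idx])
    return included

# Made-up records: two thirds are flagged as reporting a significant dichotomous outcome.
demo = [
    {"first_author": f"Author{i:03d}", "significant_dichotomous": i % 3 != 0}
    for i in range(300)
]
sample = sample_reviews(demo, lambda r: r["significant_dichotomous"])
```

Fixing the seed makes the visiting order reproducible, which is one way to document a random selection after the fact.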

Data Extraction

Table 1

Table 1 describes the data extracted from each included meta-analysis, describes the meta-analysis itself, and lists the trials included.

Trial Sequential Analysis

TSA is a methodology that combines conventional meta-analysis techniques with thresholds for declaring significance in the context of sparse data and repeated updates. The required information size (RIS) is used to estimate whether the size of a meta-analysis is adequate.10,14 RIS is comparable with the optimal information size referred to by GRADE in the context of assessing imprecision3; both estimates use conventional power analysis, incorporating an estimate of the proportion in the control group with the outcome, a specified intervention effect, and chosen risks of type 1 and type 2 errors. In addition, RIS includes an estimation of heterogeneity in the calculation,14,29 reflecting an increased risk of type 2 error when there is more between-trial variation.
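
As a rough illustration of the calculation just described, the sketch below computes a conventional two-group information size for a dichotomous outcome and inflates it by 1/(1 − D2). The formula used here is the commonly cited form from the TSA literature; it is an assumption that it matches the TSA software's internal calculation exactly, and the function name and example inputs are hypothetical.

```python
from statistics import NormalDist

def required_information_size(p_control, rrr, alpha=0.05, power=0.80, diversity=0.0):
    """Heterogeneity-adjusted required information size (total participants).

    p_control -- proportion with the outcome in the control group
    rrr       -- anticipated relative risk reduction (e.g. 0.20)
    diversity -- D2, between 0 (no heterogeneity) and just under 1
    """
    if not 0.0 <= diversity < 1.0:
        raise ValueError("diversity must be in [0, 1)")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided type 1 error
    z_beta = z.inv_cdf(power)            # power = 1 - type 2 error
    p_exp = p_control * (1 - rrr)        # anticipated proportion in the intervention group
    p_bar = (p_control + p_exp) / 2
    delta = p_control - p_exp
    unadjusted = 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    # More between-trial variation (larger D2) means more participants are required.
    return unadjusted / (1 - diversity)

# Illustrative inputs only: 20% control proportion, 20% RRR.
ris = required_information_size(0.20, 0.20)
ris_adjusted = required_information_size(0.20, 0.20, diversity=0.5)
```

With D2 = 0.5 the requirement doubles, which shows how strongly the heterogeneity adjustment can affect whether a meta-analysis is judged adequately sized.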

With an estimate of RIS, a threshold, or boundary, can be constructed for statistical significance using an α-spending function, known from interim analyses of single trials.30 The threshold for statistical significance varies, such that it is more conservative when the data are sparse and becomes progressively more lenient as the accrued information gets closer to the RIS.30

The TSA monitoring boundaries are based on the Lan-DeMets α-spending function, which was intended initially for flexible repeated significance testing in a single trial.30–32 The monitoring thresholds are constructed as a function of the strength of the evidence, and the underlying statistical methodology depends on an assumption that data will continue to accumulate until either a monitoring boundary is crossed or the RIS is passed.30–35 The goal of the adjusted thresholds is to maintain the overall risk of type 1 error ≤5%, independent of how many times a hypothesis is repeatedly tested. The α-spending function uses accumulated information as an independent variable (accumulated number of participants in a meta-analysis) and the maximal allowed cumulative type 1 error as the dependent variable.30,31 The maximal allowed cumulative type 1 error is the largest amount of type 1 error that may be spent at a given accumulated number of participants (to ensure that the overall risk of type 1 error stays <5%).14
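
To make the shape of such a function concrete, here is a minimal sketch of an O'Brien-Fleming-type Lan-DeMets spending function, α(t) = 2 − 2Φ(z₁₋α/₂ / √t), where t is the accumulated information divided by the RIS. The text above does not name the specific spending function, so the O'Brien-Fleming type is an assumption here, and the function name is hypothetical. Converting the spent α into boundary z-values requires recursive numerical integration, which is not attempted.

```python
from statistics import NormalDist

def obf_alpha_spent(information_fraction, alpha=0.05):
    """Cumulative type 1 error allowed at a given information fraction.

    information_fraction -- accumulated participants / RIS (values above 1 are capped at 1)
    """
    if information_fraction <= 0:
        raise ValueError("information_fraction must be positive")
    t = min(information_fraction, 1.0)
    z = NormalDist()
    z_half_alpha = z.inv_cdf(1 - alpha / 2)
    # Very little alpha is spent early; the full alpha is available at t = 1.
    return 2 * (1 - z.cdf(z_half_alpha / t ** 0.5))

spent_quarter = obf_alpha_spent(0.25)
spent_half = obf_alpha_spent(0.50)
spent_full = obf_alpha_spent(1.00)
```

The allowed error at a quarter of the RIS is far below 0.05, which is why a small meta-analysis needs a much more extreme test statistic to cross the boundary.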

As each new trial is added to the cumulative meta-analysis, a new cumulative z-score is calculated, representing the updated summary test statistic of all the included trials. The cumulative z-score is used to estimate the random error in the data. A greater z-score (a lower P value) is consistent with a lower probability that the data came from a population where the null hypothesis is true. The TSA monitoring thresholds are constructed using z-values for the dependent variable, such that the z-value for a given number of participants corresponds to the maximal allowed cumulative type 1 error for that number of participants. When a z-score calculated from a meta-analysis is higher than a z-score that TSA has calculated as the maximum allowed (at that point), then the TSA monitoring threshold is crossed and statistical significance is declared.
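
The updating of the cumulative z-score can be sketched as follows. The sketch assumes a fixed-effect, inverse-variance pooling of log relative risks with no zero-event handling, which is a simplification; the trial counts are made up for illustration, and the function name is hypothetical.

```python
import math

def cumulative_z_scores(trials):
    """Return the cumulative z-score after each trial is added.

    trials -- list of (events_intervention, n_intervention, events_control, n_control),
              in the order the trials are added; all event counts must be > 0.
    """
    sum_w = 0.0    # running sum of inverse-variance weights
    sum_wy = 0.0   # running sum of weight * log relative risk
    z_scores = []
    for a, n1, c, n2 in trials:
        log_rr = math.log((a / n1) / (c / n2))
        variance = 1 / a - 1 / n1 + 1 / c - 1 / n2
        weight = 1 / variance
        sum_w += weight
        sum_wy += weight * log_rr
        # pooled estimate / its standard error = (sum_wy / sum_w) / (1 / sqrt(sum_w))
        z_scores.append(sum_wy / math.sqrt(sum_w))
    return z_scores

# Hypothetical trials; a negative z favours the intervention when RR < 1.
demo_trials = [(10, 100, 20, 100), (15, 150, 25, 150), (30, 300, 45, 300)]
z_path = cumulative_z_scores(demo_trials)
```

In a TSA, the absolute value of each entry in `z_path` would be compared with the monitoring boundary at the corresponding accumulated number of participants.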

TSA Approaches Used

We used TSA software to conduct the TSA analyses (http://www.ctu.dk/tsa/).14 We used the same meta-analytical technique—effect measure, model, and zero event handling—as the authors of the systematic review from which the meta-analysis originated. We constructed TSA boundaries (2 sided), using the α-spending function, for each included meta-analysis. We created 2 main TSA approaches for each included meta-analysis, by using 2 anticipations of intervention effect, as described under RRR below:

  • Type 1 error: 5%
  • Power: 80%
  • Proportion with the outcome in the control group: Based on the unweighted mean of the proportion of participants with the outcome in the control groups of the included trials
  • RRR:
    1. An a priori 20% RRR
    2. An a priori RRR consistent with the conventional 95% confidence interval (CI) closest to null
  • Heterogeneity adjustment: We used diversity (D2),29 from all the included trials, for heterogeneity adjustment

We performed sensitivity analyses using 2 extra TSA approaches based on the RRRs of 10% and 30%.
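
The inputs listed above can be derived from trial-level data along the following lines. This is a sketch under assumptions: it works on the relative risk scale, handles only significant results of benefit (upper confidence limit below 1), and estimates the between-trial variance with the DerSimonian-Laird method before computing D2 as the relative difference between the random-effects and fixed-effect variances of the pooled estimate. All function names are hypothetical.

```python
def unweighted_control_proportion(trials):
    """Unweighted mean of control-group outcome proportions.

    trials -- list of (events_intervention, n_intervention, events_control, n_control)
    """
    return sum(c / n2 for _, _, c, n2 in trials) / len(trials)

def rrr_from_ci_limit_closest_to_null(rr_lower, rr_upper):
    """RRR implied by the 95% confidence limit closest to RR = 1 (benefit only)."""
    if not rr_lower < rr_upper < 1.0:
        raise ValueError("expects a significant benefit: rr_lower < rr_upper < 1")
    return 1.0 - rr_upper

def diversity(effects, variances):
    """D2 = (v_random - v_fixed) / v_random for per-trial effects (e.g. log RR)."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    pooled_fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # DerSimonian-Laird estimate of between-trial variance (assumed estimator).
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    v_fixed = 1 / sum_w
    v_random = 1 / sum(1 / (v + tau2) for v in variances)
    return (v_random - v_fixed) / v_random

# Illustrative values only.
p_control = unweighted_control_proportion([(10, 100, 20, 100), (15, 150, 30, 100)])
rrr = rrr_from_ci_limit_closest_to_null(0.60, 0.90)
d2_homogeneous = diversity([-0.5, -0.5, -0.5], [0.10, 0.20, 0.05])
d2_heterogeneous = diversity([-1.0, 0.0, -0.2], [0.01, 0.01, 0.01])
```

A confidence limit close to 1 implies a small anticipated RRR and therefore a large RIS, which is why this second approach is often the more demanding of the two.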

Estimation of Power

By using the 2 main TSA approaches described earlier, we used the TSA software to determine the power of each included meta-analysis (http://www.ctu.dk/tsa/).14 We rounded our power estimates to the closest 5% in each decade (e.g., 82% was rounded to 80% and 54% was rounded to 55%). The sample size calculation done using the TSA software is a conventional sample size calculation with an adjustment factor for heterogeneity, estimated using diversity (D2).14,29
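
A power estimate of this kind can be approximated by inverting the conventional sample size formula, with the accumulated number of participants deflated by (1 − D2). The sketch below is an assumption about the general approach, not a reproduction of the TSA software's calculation; function names and inputs are hypothetical. The rounding helper follows the examples given above (82% to 80%, 54% to 55%).

```python
import math
from statistics import NormalDist

def achieved_power(n_total, p_control, rrr, alpha=0.05, diversity=0.0):
    """Approximate power of a meta-analysis with n_total participants."""
    z = NormalDist()
    p_exp = p_control * (1 - rrr)
    p_bar = (p_control + p_exp) / 2
    delta = p_control - p_exp
    # Heterogeneity reduces the effective amount of information.
    effective_n = n_total * (1 - diversity)
    z_beta = math.sqrt(effective_n * delta ** 2 / (4 * p_bar * (1 - p_bar))) - z.inv_cdf(1 - alpha / 2)
    return z.cdf(z_beta)

def round_to_nearest_5_percent(power):
    """Round a power given as a fraction to the nearest 5 percentage points."""
    return 5 * math.floor(power * 100 / 5 + 0.5)

# Illustrative inputs: 20% control proportion, 20% RRR, no heterogeneity.
power_at_ris = achieved_power(2896, 0.20, 0.20)
power_small = achieved_power(964, 0.20, 0.20)
```

Under these illustrative inputs, a meta-analysis of about 1000 participants falls well short of 80% power, even before any heterogeneity adjustment.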

Assessment of Imprecision: Will the Type 1 Error Remain <5%?

We considered meta-analyses that crossed the TSA monitoring boundaries for statistical significance of benefit to have preserved the risk of type 1 error <5%. We considered meta-analyses that did not cross the TSA monitoring boundaries to have a risk of type 1 error of >5%. The 2 main TSA approaches used anticipations of intervention effect size that were generically reasonable (RRR, 20%) and specific for the individual meta-analysis (RRR consistent with the conventional 95% CI closest to null). Our primary result was the power and precision achieved when both TSA approaches were used. We also presented the results for the TSA approaches individually and for the sensitivity analyses using other anticipations for intervention effect size (RRR, 10% and RRR, 30%).

RESULTS

Search and Selection of Meta-Analyses

Figure 1

Our searches produced 11,870 titles. Of these abstracts, 682 described systematic reviews with a meta-analysis investigating interventions that might be implemented by an anesthesiologist. Sixty-five percent (95% CI, 57%–73%) of the abstracts clearly described a statistically significant conclusion of benefit of the intervention with a dichotomous outcome. Of the systematic reviews with a statistically significant dichotomous outcome in their abstract, 36% (95% CI, 26%–48%) did not provide sufficient data in the publication to repeat the meta-analysis. Figure 1 summarizes the process that resulted in the random selection of 50 meta-analyses for inclusion.

Characteristics of Included Meta-Analyses

Table 2

Table 2 describes the characteristics of the 50 included meta-analyses.36–85 The median number of trials included in the meta-analyses was 8 (interquartile range [IQR], 5–14). The median number of participants was 964 (IQR, 523–1736). The median number of participants with the outcome was 202 (IQR, 96–443).

Estimation of Power

Table 3

Table 4

Table 3 shows the power estimates for all the included meta-analyses, using the 2 main TSA approaches. Table 4 summarizes the proportions of meta-analyses that achieved power ≥80% and ≤50%. Only 6 of 50 (12%; 95% CI, 5%–25%) of the meta-analyses had power ≥80% for both TSA approaches. Twenty-five of fifty (50%; 95% CI, 37%–63%) of the meta-analyses had power ≤50% for both TSA approaches. Twenty-one of fifty (44%; 95% CI, 30%–59%) had power ≤10% for at least 1 of the 2 main TSA approaches.
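
For readers who want to check intervals around proportions such as these, the sketch below computes an exact (Clopper-Pearson) binomial interval by bisection. The article does not state which interval method was used, so the exact method is an assumption and its limits may differ by about a percentage point from the reported values; function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def exact_ci(x, n, conf=0.95):
    """Clopper-Pearson confidence interval for x successes out of n."""
    alpha = 1 - conf

    def solve(cdf_at, target):
        # cdf_at(p) decreases as p increases, so plain bisection on [0, 1] works.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

ci_6_of_50 = exact_ci(6, 50)
ci_16_of_50 = exact_ci(16, 50)
```

With a sample of 50, these intervals span roughly 20 to 30 percentage points, which is worth keeping in mind when reading the proportions reported here.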

Assessment of Imprecision: Will the Type 1 Error Remain <5%?

Table 5

Table 3 summarizes the results of the TSAs and whether the type 1 error was preserved <5%. Table 4 summarizes the proportion of meta-analyses that did preserve the type 1 error <5%. Only 16 of 50 (32%; 95% CI, 20%–47%) of the meta-analyses preserved the risk of type 1 error <5% (i.e., crossed the TSA thresholds for significance) using both the main TSA approaches (RRR, 20% and RRR consistent with the conventional 95% confidence limit closest to null). For the sensitivity analyses, using the most conservative estimate of 10% RRR, only 6 of 50 (12%; 95% CI, 5%–25%) of the meta-analyses preserved the risk of type 1 error <5%. Even with a lenient estimate of 30% RRR, only 31 of 50 (62%; 95% CI, 47%–75%) preserved the risk of type 1 error <5%. Table 5 summarizes the results of the TSAs in relation to the number of trials, the number of participants, and the number of participants with the outcome.

DISCUSSION

Using a random sample of systematic reviews of anesthesiologic interventions, this study examined the first statistically significant dichotomous outcome quoted in each abstract. Given that they were presented in the abstracts, these results are likely to be commonly read and considered important. Responses to the descriptions of statistical significance no doubt vary with different readers having different abilities to critique and interpret the imprecision of meta-analysis results. However, as a generalization, results described as statistically significant in an abstract may be interpreted by readers as having an acceptable amount of certainty about the range of possible effects. Our aim was to investigate how many of these meta-analyses were in fact sufficiently powered and how many, considering the effect of sparse data and repeated updates, may be less precise than described by a conventional meta-analysis technique.

Our study demonstrates that lack of power and imprecision are common problems in meta-analyses of anesthesiologic interventions. Using 2 reasonable TSA approaches, most of the meta-analyses were underpowered. Only 12% (95% CI, 5%–25%) of the meta-analyses had power ≥80% for both main TSA approaches, and 50% (95% CI, 37%–63%) had power ≤50%. Regarding precision, when adjusting for the increased risk of random error caused by sparse data and repeated updates, only 32% (95% CI, 20%–47%) of apparently statistically significant conclusions in anesthesiologic meta-analyses preserve a risk of type 1 error <5%.

Our findings deal with the issue of imprecision as a source of uncertainty in a body of evidence. A complete assessment of evidence involves a consideration of other equally important factors as well, as described by GRADE, including the risk of bias in included trials,86 publication bias,87 heterogeneity (or consistency),88 and indirectness.89 Further, the methodology of the systematic review itself may be a source of systematic error, and the techniques used to collect, select, and extract and analyze the evidence also need to be considered.90 All sources of error contribute to uncertainty in the result of a meta-analysis. Translation of evidence into a recommendation, or a clinical decision, involves yet more consideration, including weighing up the importance of the outcome in the meta-analysis relative to other important outcomes.

Our search for eligible systematic reviews represents a strength of this study; it was thorough and sensitive, and we feel confident that we included all of the systematic reviews published in the defined time frame that investigate interventions that may be implemented by anesthesiologists during the perioperative period. We chose our sample from this population using a random sequence of numbers, so our inferential estimates should accurately represent the full population of anesthesia meta-analyses for the time period of our search. As a limitation to our selection process, to maintain consistency, we only included dichotomous outcomes. Our results may not be applicable to continuous outcomes.

The TSA modeling was the source of both strengths and limitations. It is impossible to choose parameters for a TSA approach and to be certain that they are correct. When testing hypotheses using frequentist techniques, one must choose values for the parameters. These choices define the clinical question being asked. The same parameters are also the values that we are trying to determine. For example, the effect size is a parameter that must be predicted to test a hypothesis and the effect size is also the information we seek. Therefore, TSA methodology well illustrates that the parameters are variables in the process of gathering information and that conclusions change as they do. We aimed to use an approach to the TSA modeling that incorporated this innate uncertainty in testing hypotheses, choosing parameters to create the most reasonable TSA approaches for a given clinical question and scenario.

For any individual meta-analysis, parameters should be chosen to best fit the clinical question being asked. In this study, our purpose was more general: we examined a large population of meta-analyses and wanted a broad overview. For our aim, we chose 2 estimates for effect size. The first was an RRR of 20%, which represented a generic estimate that is often reasonable. The second was the RRR consistent with the conventional 95% confidence limit closest to null, representing an estimate relevant to that individual meta-analysis. Our approaches allowed some coverage of what would be reasonable in an individual meta-analysis, while still allowing a generalized comparison.

We did extend our exploration, including a sensitivity analysis using 2 further estimates for effect size: RRR of 10% and RRR of 30%. Using the most conservative estimate of 10% RRR, only 12% (95% CI, 5%–25%) of the meta-analyses preserved the risk of type 1 error <5%. Even with a lenient estimate of 30% RRR, only 62% (95% CI, 47%–75%) preserved the risk of type 1 error <5%. For a broad range of estimates for effect size, TSA showed that nominally statistically significant meta-analyses of anesthesiologic interventions often have a risk of type 1 error >5%.

Challenges involved with controlling errors in a meta-analysis are similar to those for controlling errors in a single trial.91 TSA uses methodology that has had widespread use and acceptance in the context of interim analyses in a single trial, incorporating a measure of heterogeneity developed to inform the RIS.7,14 Other techniques also have well-argued theoretical groundings, all aiming to create a more stringent threshold for declaring significance when data are sparse and updates are repeated.25,28,36 Using these techniques for meta-analysis, exploration with simulation and empirical study has started,5,6,9–11,15,17–19,92 but issues related to usability and reproducibility need further attention.

Bayesian approaches to cumulative meta-analysis provide an alternative to frequentist methods, offering potential intuitive advantages. Results are communicated as probabilities of a defined hypothesis, given the accumulated data. However, Bayesian approaches have their own set of limitations and complexities, including the challenge of defining a specific alternative hypothesis, and the variation in the prior probabilities of intervention effect and heterogeneity. Some questions will always remain contentious, independent of the method of modeling, including the actual threshold at which an intervention should be used. That is, independent of the method we use, we always have to decide how much certainty is enough.93

Heterogeneity in PICOT (population, intervention, control, outcome, and time), the “apples and oranges” argument, is a well-known objection to meta-analysis. Beyond the concern about the applicability of a meta-analysis result when heterogeneity is substantial, heterogeneity of PICOT and the related parameters is very influential when assessing the risk of random error and precision, making it yet more difficult to reach discrete conclusions. Despite the complexity that it brings, heterogeneity should not be a reason to preclude summarizing bodies of evidence. Rather, we should accept that for each meta-analysis, there are multiple clinical questions that could be asked, with multiple different answers and, importantly, multiple different levels of certainty. The conclusion of each meta-analysis needs to be considered in the context of that meta-analysis, the question it is asking, the assumptions it makes, and the sources of uncertainty.

The oldest meta-analysis found in our search was published in 1990,94 reflecting that meta-analysis is a young methodology, which will no doubt continue to evolve. Assessment of error and improving communication of uncertainty are important challenges, and published systematic reviews are a useful tool to inform discussion and consider improvement. Refutation of the null hypotheses is a cornerstone of the theory of hypothesis testing and inductional thinking. Quoting Sir Karl Popper, “the scientist must consciously and cautiously try to uncover error in order to refute his theories.”95 The important implication of this statement is that research must be designed to allow for null hypotheses with “sufficient room” to discard chosen alternative hypotheses of a realistic size.

Our study demonstrates that most nominally statistically significant meta-analyses in our specialty are underpowered. Moreover, by using a technique that is an international standard for interim analyses in a single trial, many of these meta-analyses do not maintain statistical significance when imprecision is considered in the context of sparse data and repeated updates. The issue of imprecision needs to be considered when interpreting the conclusions of meta-analyses of anesthesiologic interventions.

Appendix 1

The following is the search strategy used in this study.

  1. systematic review.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  2. meta-analysis.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  3. metaanalysis.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  4. 1 or 2 or 3
  5. perioperative.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  6. peri-operative.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  7. intraoperative.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  8. intra-operative.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  9. postoperative.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  10. post-operative.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  11. PACU.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  12. recovery.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  13. post?an?esth*.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  14. ambulatory care.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  15. an?esth*.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  16. post-an?esth*.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  17. postan?esth*.mp. [mp=title, abstract, subject headings, heading word, drug trade name, original title, device manufacturer, drug manufacturer]
  18. 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17
  19. 4 and 18

DISCLOSURES

Author: Georgina Imberger, MBBS, PhD, FANZCA.

Contribution: This author helped design the study, conduct the study, analyze the data, and write the manuscript.

Attestation: Georgina Imberger has seen the original study data, reviewed the analysis of the data, approved the final manuscript, and is the author responsible for archiving the study files.

Conflicts of Interest: The author has no conflicts of interest to declare.

Author: Christian Gluud, MD, Dr Med Sci.

Contribution: This author helped design the study and write the manuscript.

Attestation: Christian Gluud has seen the original study data and approved the final manuscript.

Conflicts of Interest: Christian Gluud developed the TSA methodology.

Author: John Boylan, MB, FRCP(C).

Contribution: This author helped conduct the study, analyze the data, and write the manuscript.

Attestation: John Boylan has seen the original study data, reviewed the analysis of the data, and approved the final manuscript.

Conflicts of Interest: The author has no conflicts of interest to declare.

Author: Jørn Wetterslev, MD, PhD.

Contribution: This author helped design the study, analyze the data, and write the manuscript.

Attestation: Jørn Wetterslev has seen the original study data, reviewed the analysis of the data, and approved the final manuscript.

Conflicts of Interest: Jørn Wetterslev developed the TSA methodology.

This manuscript was handled by: Franklin Dexter, PhD, MD.

REFERENCES

1. Guyatt GH, Oxman AD, Schünemann HJ, Tugwell P, Knottnerus A. GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. J Clin Epidemiol. 2011;64:380–2
2. Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, Vist GE, Falck-Ytter Y, Meerpohl J, Norris S, Guyatt GH. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011;64:401–6
3. Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, Devereaux PJ, Montori VM, Freyschuss B, Vist G, Jaeschke R, Williams JW Jr, Murad MH, Sinclair D, Falck-Ytter Y, Meerpohl J, Whittington C, Thorlund K, Andrews J, Schünemann HJ. GRADE guidelines 6. Rating the quality of evidence—imprecision. J Clin Epidemiol. 2011;64:1283–93
4. Turner RM, Bird SM, Higgins JP. The impact of study size on meta-analyses: examination of underpowered studies in Cochrane reviews. PLoS One. 2013;8:e59202
5. Brok J, Thorlund K, Gluud C, Wetterslev J. Trial sequential analysis reveals insufficient information size and potentially false positive results in many meta-analyses. J Clin Epidemiol. 2008;61:763–9
6. Brok J, Thorlund K, Wetterslev J, Gluud C. Apparently conclusive meta-analyses may be inconclusive—trial sequential analysis adjustment of random error risk due to repetitive testing of accumulating data in apparently conclusive neonatal meta-analyses. Int J Epidemiol. 2009;38:287–98
7. Pogue J, Yusuf S. Overcoming the limitations of current meta-analysis of randomised controlled trials. Lancet. 1998;351:47–52
8. Pogue JM, Yusuf S. Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Control Clin Trials. 1997;18:580–93
9. Thorlund K, Devereaux PJ, Wetterslev J, Guyatt G, Ioannidis JP, Thabane L, Gluud LL, Als-Nielsen B, Gluud C. Can trial sequential monitoring boundaries reduce spurious inferences from meta-analyses? Int J Epidemiol. 2009;38:276–86
10. Wetterslev J, Thorlund K, Brok J, Gluud C. Trial sequential analysis may establish when firm evidence is reached in cumulative meta-analysis. J Clin Epidemiol. 2008;61:64–75
11. Borm GF, Donders AR. Updating meta-analyses leads to larger type I errors than publication bias. J Clin Epidemiol. 2009;62:825–30.e10
12. Ioannidis J, Lau J. Evolution of treatment effects over time: empirical insight from recursive cumulative metaanalyses. Proc Natl Acad Sci USA. 2001;98:831–6
13. Bender R, Bunce C, Clarke M, Gates S, Lange S, Pace NL, Thorlund K. Attention should be given to multiplicity issues in systematic reviews. J Clin Epidemiol. 2008;61:857–65
14. Thorlund K, Engstrøm J, Wetterslev J, Brok J, Imberger G, Gluud C. User Manual for Trial Sequential Analysis (TSA). Copenhagen: Copenhagen Trial Unit; 2011
15. Berkey CS, Mosteller F, Lau J, Antman EM. Uncertainty of the time of first significance in random effects cumulative meta-analysis. Control Clin Trials. 1996;17:357–71
16. Thorlund K, Imberger G, Walsh M, Chu R, Gluud C, Wetterslev J, Guyatt G, Devereaux PJ, Thabane L. The number of patients and events required to limit the risk of overestimation of intervention effects in meta-analysis—a simulation study. PLoS One. 2011;6:e25491
17. Hu M, Cappelleri JC, Lan KK. Applying the law of iterated logarithm to control type I error in cumulative meta-analysis of binary outcomes. Clin Trials. 2007;4:329–40
18. Whitehead A. A prospectively planned cumulative meta-analysis applied to a series of concurrent clinical trials. Stat Med. 1997;16:2901–13
19. Pereira TV, Ioannidis JP. Statistically significant meta-analyses of clinical trials have modest credibility and inflated effects. J Clin Epidemiol. 2011;64:1060–9
20. Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions. Chichester: John Wiley & Sons; 2009
21. McPherson K. Statistics: the problem of examining accumulating data more than once. N Engl J Med. 1974;290:501–2
22. Bassler D, Montori VM, Briel M, Glasziou P, Guyatt G. Early stopping of randomized clinical trials for overt efficacy is problematic. J Clin Epidemiol. 2008;61:241–6
23. Biester K, Lange S. The Multiplicity Problem in Systematic Reviews [Abstract]. XIII Cochrane Colloquium; 2005 Oct 22–26; Melbourne, Australia:153
24. Imberger G, Vejlby AD, Hansen SB, Møller AM, Wetterslev J. Statistical multiplicity in systematic reviews of anaesthesia interventions: a quantification and comparison between Cochrane and non-Cochrane reviews. PLoS One. 2011;6:e28422
25. Higgins JPT, Whitehead A, Simmonds M. Sequential methods for random-effects meta-analysis. Stat Med. 2011;30:903–21
26. Imberger G, Wetterslev J, Gluud C. Trial sequential analysis has the potential to improve the reliability of conclusions in meta-analysis. Contemp Clin Trials. 2013;36:254–5
27. Miladinovic B, Mhaskar R, Hozo I, Kumar A, Mahony H, Djulbegovic B. Optimal information size in trial sequential analysis of time-to-event outcomes reveals potentially inconclusive results because of the risk of random error. J Clin Epidemiol. 2013;66:654–9
28. van der Tweel I, Bollen C. Sequential meta-analysis: an efficient decision-making tool. Clin Trials. 2010;7:136–46
29. Wetterslev J, Thorlund K, Brok J, Gluud C. Estimating required information size by quantifying diversity in random-effects model meta-analyses. BMC Med Res Methodol. 2009;9:86
30. DeMets DL, Lan KK. Interim analysis: the alpha spending function approach. Stat Med. 1994;13:1341–52
31. Lan KK, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–63
32. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Boca Raton, FL: Taylor & Francis; 2010
33. Armitage P. Sequential analysis in therapeutic trials. Annu Rev Med. 1969;20:425–30
34. Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika. 1977;64:191–9
35. Pocock SJ. Interim analyses for randomized clinical trials: the group sequential approach. Biometrics. 1982;38:153–62
36. Afolabi BB, Lesi FE, Merah NA. Regional versus general anaesthesia for caesarean section. Cochrane Database Syst Rev. 2006:CD004350
37. Alghamdi AA, Albanna MJ, Guru V, Brister SJ. Does the use of erythropoietin reduce the risk of exposure to allogeneic blood transfusion in cardiac surgery? A systematic review and meta-analysis. J Card Surg. 2006;21:320–6
38. Alhassan MB, Kyari F, Ejere HO. Peribulbar versus retrobulbar anaesthesia for cataract surgery. Cochrane Database Syst Rev. 2008:CD004083
39. Apfel CC, Zhang K, George E, Shi S, Jalota L, Hornuss C, Fero KE, Heidrich F, Pergolizzi JV, Cakmakkaya OS, Kranke P. Transdermal scopolamine for the prevention of postoperative nausea and vomiting: a systematic review and meta-analysis. Clin Ther. 2010;32:1987–2002
40. Bagshaw SM, Galbraith PD, Mitchell LB, Sauve R, Exner DV, Ghali WA. Prophylactic amiodarone for prevention of atrial fibrillation after cardiac surgery: a meta-analysis. Ann Thorac Surg. 2006;82:1927–37
41. Beattie WS, Wijeysundera DN, Karkouti K, McCluskey S, Tait G. Does tight heart rate control improve beta-blocker efficacy? An updated analysis of the noncardiac surgical randomized trials. Anesth Analg. 2008;106:1039–48
42. Bulley S, Derry S, Moore RA, McQuay HJ. Single dose oral rofecoxib for acute postoperative pain in adults. Cochrane Database Syst Rev. 2009:CD004604
43. Carless P, Moxey A, O’Connell D, Henry D. Autologous transfusion techniques: a systematic review of their efficacy. Transfus Med. 2004;14:123–44
44. Carless PA, Rubens FD, Anthony DM, O’Connell D, Henry DA. Platelet-rich-plasmapheresis for minimising peri-operative allogeneic blood transfusion. Cochrane Database Syst Rev. 2011:CD004172
45. Cid J, Lozano M. Tranexamic acid reduces allogeneic red cell transfusions in patients undergoing total knee arthroplasty: results of a meta-analysis of randomized controlled trials. Transfusion. 2005;45:1302–7
46. Cinà CS, Abouzahr L, Arena GO, Laganà A, Devereaux PJ, Farrokhyar F. Cerebrospinal fluid drainage to prevent paraplegia during thoracic and thoracoabdominal aortic aneurysm surgery: a systematic review and meta-analysis. J Vasc Surg. 2004;40:36–44
47. Cohen AT, Hirst C, Sherrill B, Holmes P, Fidan D. Meta-analysis of trials comparing ximelagatran with low molecular weight heparin for prevention of venous thromboembolism after major orthopaedic surgery. Br J Surg. 2005;92:1335–44
48. Craven PD, Badawi N, Henderson-Smart DJ, O’Brien M. Regional (spinal, epidural, caudal) versus general anaesthesia in preterm infants undergoing inguinal herniorrhaphy in early infancy. Cochrane Database Syst Rev. 2003:CD003669
49. Cunningham M, Bunn F, Handscomb K. Prophylactic antibiotics to prevent surgical site infection after breast cancer surgery. Cochrane Database Syst Rev. 2006:CD005360
50. Davies RG, Myles PS, Graham JM. A comparison of the analgesic efficacy and side-effects of paravertebral vs epidural blockade for thoracotomy—a systematic review and meta-analysis of randomized trials. Br J Anaesth. 2006;96:418–26
51. Derry S, Barden J, McQuay HJ, Moore RA. Single dose oral celecoxib for acute postoperative pain in adults. Cochrane Database Syst Rev. 2008:CD004233
52. Derry C, Derry S, Moore RA, McQuay HJ. Single dose oral naproxen and naproxen sodium for acute postoperative pain in adults. Cochrane Database Syst Rev. 2009:CD004234
53. Giglio MT, Marucci M, Testini M, Brienza N. Goal-directed haemodynamic therapy and gastrointestinal complications in major surgery: a meta-analysis of randomized controlled trials. Br J Anaesth. 2009;103:637–46
54. Gurusamy KS, Li J, Sharma D, Davidson BR. Cardiopulmonary interventions to decrease blood loss and blood transfusion requirements for liver resection. Cochrane Database Syst Rev. 2009:CD007338
55. Handoll HH, Koscielniak-Nielsen ZJ. Single, double or multiple injection techniques for axillary brachial plexus block for hand, wrist or forearm surgery. Cochrane Database Syst Rev. 2006:CD003842
56. Handoll HH, Farrar MJ, McBirnie J, Tytherleigh-Strong G, Milne AA, Gillespie WJ. Heparin, low molecular weight heparin and physical methods for preventing deep vein thrombosis and pulmonary embolism following surgery for hip fractures. Cochrane Database Syst Rev. 2002:CD000305
57. Henry DA, Carless PA, Moxey AJ, O’Connell D, Stokes BJ, Fergusson DA, Ker K. Anti-fibrinolytic use for minimising perioperative allogeneic blood transfusion. Cochrane Database Syst Rev. 2011:CD001886
58. Hurley RW, Cohen SP, Williams KA, Rowlingson AJ, Wu CL. The analgesic effects of perioperative gabapentin on postoperative pain: a meta-analysis. Reg Anesth Pain Med. 2006;31:237–47
59. Kuratani N, Oi Y. Greater incidence of emergence agitation in children after sevoflurane anesthesia as compared with halothane: a meta-analysis of randomized controlled trials. Anesthesiology. 2008;109:225–32
60. Landoni G, Mizzi A, Biondi-Zoccai G, Bruno G, Bignami E, Corno L, Zambon M, Gerli C, Zangrillo A. Reducing mortality in cardiac surgery with levosimendan: a meta-analysis of randomized controlled trials. J Cardiothorac Vasc Anesth. 2010;24:51–7
61. Lavoie A, Guay J. Anesthetic dose neuraxial blockade increases the success rate of external fetal version: a meta-analysis. Can J Anaesth. 2010;57:408–14
62. Lee A, Fan LT. Stimulation of the wrist acupuncture point P6 for preventing postoperative nausea and vomiting. Cochrane Database Syst Rev. 2009:CD003281
63. Mauermann WJ, Shilling AM, Zuo Z. A comparison of neuraxial block versus general anesthesia for elective total hip replacement: a meta-analysis. Anesth Analg. 2006;103:1018–25
64. McIlroy DR, Myles PS, Phillips LE, Smith JA. Antifibrinolytics in cardiac surgical patients receiving aspirin: a systematic review and meta-analysis. Br J Anaesth. 2009;102:168–78
65. Mhyre JM, Greenfield ML, Tsen LC, Polley LS. A systematic review of randomized controlled trials that evaluate strategies to avoid epidural vein cannulation during obstetric epidural catheter placement. Anesth Analg. 2009;108:1232–42
66. Natanson C, Kern SJ, Lurie P, Banks SM, Wolfe SM. Cell-free hemoglobin-based blood substitutes and risk of myocardial infarction and death: a meta-analysis. JAMA. 2008;299:2304–12
67. Paranjothy S, Griffiths JD, Broughton HK, Gyte GM, Brown HC, Thomas J. Interventions at caesarean section for reducing the risk of aspiration pneumonitis. Cochrane Database Syst Rev. 2010:CD004943
68. Poeze M, Greve JW, Ramsay G. Meta-analysis of hemodynamic optimization: relationship to methodological quality. Crit Care. 2005;9:R771–9
69. Popping DM, Elia N, Marret E, Remy C, Tramer MR. Protective effects of epidural analgesia on pulmonary complications after abdominal and thoracic surgery: a meta-analysis. Arch Surg. 2008;143:990–9
70. Rajagopalan S, Mascha E, Na J, Sessler DI. The effects of mild perioperative hypothermia on blood loss and transfusion requirement. Anesthesiology. 2008;108:71–7
71. Rajeev S, Wong DT. Effect of beta-blockers on perioperative myocardial ischemia in patients undergoing noncardiac surgery. Curr Drug Targets. 2009;10:833–41
72. Richman JM, Joe EM, Cohen SR, Rowlingson AJ, Michaels RK, Jeffries MA, Wu CL. Bevel direction and postdural puncture headache: a meta-analysis. Neurologist. 2006;12:224–8
73. Roy YM, Derry S, Moore RA. Single dose oral lumiracoxib for postoperative pain in adults. Cochrane Database Syst Rev. 2010:CD006865
74. Sanchez-Manuel FJ, Lozano-Garcia J, Seco-Gil JL. Antibiotic prophylaxis for hernia repair. Cochrane Database Syst Rev. 2007:CD003769
75. Shiga T, Wajima Z, Inoue T, Ogawa R. Magnesium prophylaxis for arrhythmias after cardiac surgery: a meta-analysis of randomized controlled trials. Am J Med. 2004;117:325–33
76. Sun JC, Whitlock R, Cheng J, Eikelboom JW, Thabane L, Crowther MA, Teoh KH. The effect of pre-operative aspirin on bleeding, transfusion, myocardial infarction, and mortality in coronary artery bypass surgery: a systematic review of randomized and observational studies. Eur Heart J. 2008;29:1057–71
77. Tanaka Y, Nakayama T, Nishimori M, Sato Y, Furuya H. Lidocaine for preventing postoperative sore throat. Cochrane Database Syst Rev. 2009:CD004081
78. Thomsen T, Tønnesen H, Møller AM. Effect of preoperative smoking cessation interventions on postoperative complications and smoking cessation. Br J Surg. 2009;96:451–61
79. Toms L, McQuay HJ, Derry S, Moore RA. Single dose oral paracetamol (acetaminophen) for postoperative pain in adults. Cochrane Database Syst Rev. 2008:CD004602
80. Wang G, Bainbridge D, Martin J, Cheng D. The efficacy of an intraoperative cell saver during cardiac surgery: a meta-analysis of randomized trials. Anesth Analg. 2009;109:320–30
81. Whitlock RP, Chan S, Devereaux PJ, Sun J, Rubens FD, Thorlund K, Teoh KH. Clinical benefit of steroid use in patients undergoing cardiopulmonary bypass: a meta-analysis of randomized trials. Eur Heart J. 2008;29:2592–600
82. Wijeysundera DN, Bender JS, Beattie WS. Alpha-2 adrenergic agonists for the prevention of cardiac complications among patients undergoing surgery. Cochrane Database Syst Rev. 2009:CD004126
83. Zaric D, Pace NL. Transient neurologic symptoms (TNS) following spinal anaesthesia with lidocaine versus other local anaesthetics. Cochrane Database Syst Rev. 2009:CD003006
84. Zufferey P, Merquiol F, Laporte S, Decousus H, Mismetti P, Auboyer C, Samama CM, Molliex S. Do antifibrinolytics reduce allogeneic blood transfusion in orthopedic surgery? Anesthesiology. 2006;105:1034–46
85. Zürcher M, Tramèr MR, Walder B. Colonization and bloodstream infection with single- versus multi-lumen central venous catheters: a quantitative systematic review. Anesth Analg. 2004;99:177–82
86. Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, Montori V, Akl EA, Djulbegovic B, Falck-Ytter Y, Norris SL, Williams JW Jr, Atkins D, Meerpohl J, Schünemann HJ. GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol. 2011;64:407–15
87. Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, Alonso-Coello P, Djulbegovic B, Atkins D, Falck-Ytter Y, Williams JW Jr, Meerpohl J, Norris SL, Akl EA, Schünemann HJ. GRADE guidelines: 5. Rating the quality of evidence—publication bias. J Clin Epidemiol. 2011;64:1277–82
88. Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, Alonso-Coello P, Glasziou P, Jaeschke R, Akl EA, Norris S, Vist G, Dahm P, Shukla VK, Higgins J, Falck-Ytter Y, Schünemann HJ; GRADE Working Group. GRADE guidelines: 7. Rating the quality of evidence—inconsistency. J Clin Epidemiol. 2011;64:1294–302
89. Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, Alonso-Coello P, Falck-Ytter Y, Jaeschke R, Vist G, Akl EA, Post PN, Norris S, Meerpohl J, Shukla VK, Nasser M, Schünemann HJ; GRADE Working Group. GRADE guidelines: 8. Rating the quality of evidence—indirectness. J Clin Epidemiol. 2011;64:1303–10
90. Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, Porter AC, Tugwell P, Moher D, Bouter LM. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:10
91. Jakobsen JC, Gluud C, Winkel P, Lange T, Wetterslev J. The thresholds for statistical and clinical significance—a five-step procedure for evaluation of intervention effects in randomised clinical trials. BMC Med Res Methodol. 2014;14:34
92. Jakobsen JC, Wetterslev J, Winkel P, Lange T, Gluud C. Thresholds for statistical and clinical significance in systematic reviews with meta-analytic methods. BMC Med Res Methodol. 2014;14:120
93. Johnson VE. Revised standards for statistical evidence. Proc Natl Acad Sci USA. 2013;110:19313–7
94. Pace NL. Prevention of succinylcholine myalgias: a meta-analysis. Anesth Analg. 1990;70:477–83
95. Popper KR. Conjectures and Refutations: The Growth of Scientific Knowledge. New York: Basic Books; 1962:412 pp; Science 1963;140:643
© 2015 International Anesthesia Research Society