Invited commentaries

When may systematic reviews and meta-analyses be considered reliable?

Afshari, Arash; Wetterslev, Jørn

Author Information
European Journal of Anaesthesiology: February 2015 - Volume 32 - Issue 2 - p 85-87
doi: 10.1097/EJA.0000000000000186

This Invited Commentary accompanies the following original article:

Stevanovic A, Rossaint R, Fritz HG, et al. Airway reactions and emergence times in general laryngeal mask airway anaesthesia. A meta-analysis. Eur J Anaesthesiol 2015; 32:106–116.

In a systematic review in this issue of the European Journal of Anaesthesiology, Stevanovic et al.1 compare the rate of upper airway adverse events and recovery times after desflurane with those after other volatile anaesthetics and propofol in patients undergoing general surgery with a laryngeal mask airway (LMA). The authors found desflurane comparable with the other agents with regard to upper airway reactions, but desflurane appeared to offer a significantly faster emergence from general anaesthesia. Furthermore, the authors advocate the need for large multicentre trials because large inter-study variations were detected. In light of such uncertainty, does the inclusion of fewer than 600 patients in most of the primary and secondary outcome analyses across 13 trials justify drawing firm conclusions, or even partially answer the clinical questions of interest? To answer these questions, one should look beyond the numbers and reflect on some fundamental issues. Systematic reviews and meta-analyses serve the important purpose of identifying potential benefits or harms of an intervention and should ideally provide answers in a rapid and reliable manner. They are often the cornerstone of clinical practice guidelines and may serve as a justification for funding of further trials.2

Authors of systematic reviews are obliged to consider the limitations of the included trials in relation to their design, conduct, analysis and presentation of data. Biased ‘quality’ assessment of the trials included in meta-analyses often leads to discrepant results between different systematic reviews examining identical questions, and between meta-analyses and subsequent large trials.3 This is often due to the inclusion of biased results from randomised controlled trials (RCTs). Quality assessment of RCTs can be carried out in many ways, such as the use of scales and checklists that assess the trials and provide a score.4 However, because most of these scales have methodological shortcomings and often lack transparency, the Cochrane Collaboration discourages the use of such scales and scores and instead recommends the Cochrane risk of bias assessment tool, which allows readers to judge for themselves whether they agree with the authors’ assessments.5,6 The latter strategy requires a risk of bias judgement incorporating details of trial conduct such as random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting and other sources of bias.7,8

Over the years, the quality of published systematic reviews has remained at best questionable, despite the introduction of the QUOROM (QUality Of Reporting Of Meta-analyses) and PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) statements.9,10 The leading medical journals are partly responsible for this lack of adherence because they often fail to inform authors and peer-reviewers of their responsibilities and fail to demand adherence to a reporting-quality statement in the manuscript.11 This remains the case despite ample evidence that the introduction of, and adherence to, the QUOROM and PRISMA statements have improved the quality of systematic reviews in medical journals.12,13

As a consequence, the validity and contribution to clinical practice of published systematic reviews are rightly being called into question by many.14 Interpretation of evidence is challenging when available data are scarce. Inclusion of a small number of trials, or of few patients, in a meta-analysis increases the risk of random error, which may ultimately lead to false conclusions.15–17 This uncertainty increases further as more statistical tests are carried out, whether because of a large number of primary and secondary outcomes within a systematic review or because of the emergence of additional data. The latter is often referred to as ‘multiplicity due to multiple and repeated significance testing’.18,19 As systematic reviews and meta-analyses should be subject to updates, and thus to repeated significance testing, the risk of type I error increases; the actual risk of type I error may be as high as 10 to 30%.20,21 Meta-analyses often provide evidence of substantial heterogeneity across included trials (e.g. in methods, populations, interventions and duration of intervention), and issues such as between-trial variation may further reduce the precision of their findings.22,23
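The inflation of type I error from repeated significance testing is easy to demonstrate. The following sketch (ours, not part of the commentary; Python, standard library only) simulates a cumulative meta-analysis of ten equally informative trials under the null hypothesis, re-testing the pooled z-statistic at a nominal two-sided 5% level after each new trial is added:

```python
import random
from statistics import NormalDist

# Illustrative Monte Carlo sketch: under the null hypothesis, how often does
# a cumulative meta-analysis cross the conventional significance threshold
# at least once when it is re-tested after each of 10 trials?
random.seed(1)
Z_CRIT = NormalDist().inv_cdf(0.975)  # ~1.96, nominal two-sided 5% level

def false_positive_rate(n_looks=10, n_sims=20_000):
    hits = 0
    for _ in range(n_sims):
        cum = 0.0
        for k in range(1, n_looks + 1):
            cum += random.gauss(0.0, 1.0)   # each trial's standardised effect
            if abs(cum / k ** 0.5) > Z_CRIT:  # pooled z-statistic after k trials
                hits += 1
                break
    return hits / n_sims

rate = false_positive_rate()
print(f"overall type I error with 10 looks: {rate:.3f}")  # well above nominal 0.05
```

With ten looks the simulated overall type I error comes out near 19%, squarely within the 10 to 30% range cited above.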

Computation of statistical power (sample size calculation) is essential before carrying out a randomised clinical trial: it is based on an expected event rate in the control group, adjusted by an anticipated relative risk reduction in the intervention group, with maximum acceptable risks of type I and type II errors. Despite a general misperception that systematic reviews and meta-analyses provide high statistical power, the contrary is often the case.24 Furthermore, the inclusion of studies without adequate power (small sample sizes) in meta-analyses may act paradoxically: such studies not only contribute very little information but may also reduce the overall precision and power of the meta-analysis, and as a consequence perhaps even delay decisions on the benefits or harms of an intervention.24,25 Thus, extrapolating this thinking from RCTs to systematic reviews means applying measures to calculate appropriate and adequate power in order to be able to accept or refute positive findings or nonsignificant results. Inadequate power (a small number of patients) and small event rates in a meta-analysis tend to lead to overestimation of intervention effect estimates, not only because of bias but, equally importantly, because of the risk of random error and lack of precision.22 Indeed, evidence suggests that the majority of large treatment effects emerge from small studies, and as additional trials are conducted, these effect sizes are often substantially reduced.26
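For illustration, the fixed-sample calculation described above can be sketched in a few lines (ours, not from the commentary; the event rates below are hypothetical, and the normal-approximation formula for two proportions is one standard choice among several):

```python
import math
from statistics import NormalDist

def required_sample_size(p_control, rrr, alpha=0.05, power=0.80):
    """Total patients (two equal groups) needed to detect a relative risk
    reduction `rrr` from a control event rate `p_control`, using the
    standard normal-approximation formula for comparing two proportions."""
    p_exp = p_control * (1 - rrr)          # anticipated intervention event rate
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)  # quantiles for type I and II risks
    variance = p_control * (1 - p_control) + p_exp * (1 - p_exp)
    n_per_group = (z_a + z_b) ** 2 * variance / (p_control - p_exp) ** 2
    return 2 * math.ceil(n_per_group)

# Hypothetical example: 20% control event rate, 25% relative risk reduction
print(required_sample_size(0.20, 0.25))  # -> 1806 patients in total
```

Note how quickly the required total grows as the anticipated effect shrinks; halving the relative risk reduction roughly quadruples the number of patients needed.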

The required information size of a meta-analysis (the number of participants needed to provide a reliable and conclusive estimate) can be calculated by applying sequential methods such as trial sequential analysis (TSA).17,18,23 Sequential methods may help to establish when firm evidence is reached in a cumulative meta-analysis.27 In addition, they provide information on the absence of a clinically relevant effect in a meta-analysis as early as possible. As a consequence, large numbers of apparently statistically conclusive meta-analyses may turn out to be inconclusive (false positives or false negatives) because of a lack of power.18,21,28 This problem is not to be underestimated: the estimated proportion of false-positive meta-analyses has been reported to be as high as 16 to 37%, an alarmingly large figure with a severe impact on our clinical decision-making and on the treatment of our patients.29 When addressing the level of statistical significance, the use of P values and unadjusted 95% confidence intervals (CIs) is generally the norm, and the assumption seems to be that a meta-analysis in itself provides all the necessary information and the required number of randomised patients. However, this may be a naive and dangerous presumption, because P values and CIs are closely related, and their interpretation may require estimates of how much information is needed. As in the case of interim analyses in RCTs, the statistics in meta-analyses also need adjustment. Sequential methods such as TSA increase the validity and reliability of CIs when they are sequentially adjusted.30
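One common way such sequential adjustment is implemented is through an alpha-spending function. The sketch below (ours, for illustration; an O'Brien-Fleming-type Lan-DeMets spending function, a typical but not the only choice in TSA-style monitoring) shows how little of the overall 5% alpha is 'spent' at early interim looks, which is what makes early crossings of the boundary so demanding:

```python
from statistics import NormalDist

nd = NormalDist()

def alpha_spent(t, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1),
    using the O'Brien-Fleming-type Lan-DeMets spending function:
    alpha*(t) = 2 * (1 - Phi(z_{1-alpha/2} / sqrt(t)))."""
    return 2 * (1 - nd.cdf(nd.inv_cdf(1 - alpha / 2) / t ** 0.5))

for t in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {t:.2f}: cumulative alpha spent {alpha_spent(t):.4f}")
```

At one quarter of the required information size, well under 0.1% of the total alpha has been spent, while the full 5% is available only once the required information size is reached.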

The authors of the systematic review in this issue of the EJA should be commended for summarising the evidence of the effect of desflurane, compared with other anaesthetics, on adverse airway events.1 However, we miss an evaluation of the information required for the meta-analysis to be conclusive. Assuming that all the included trials were at low risk of bias, a required information size of 2220 randomised patients would have been necessary to detect or reject a 40% relative risk reduction, equalling a number needed to treat (NNT) of 23, with the usual choices of type I and type II error and a diversity of 41%. At present, the meta-analysis of the primary outcome of adverse airway events is merely an interim analysis in this process, and the TSA-adjusted 95% CI is 0.32 to 3.92, allowing for both considerable harm and considerable benefit. Further, as only two trials described allocation sequence generation and allocation concealment, the trials were certainly not all free of bias risk, and it would have been interesting to estimate the possible effect of bias by comparing the trials with the lowest risk of bias with those at high risk of bias. These considerations suggest that the authors are indeed correct in their call for a new large trial with a low risk of bias, but they could have substantiated this call by estimating how much information had actually been gathered in relation to the information needed, and how much the risk of bias may have influenced the results of the meta-analysis.
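As a back-of-envelope check, the quoted figures can be combined in a short calculation (our reconstruction, not the authors' own computation; it assumes the conventional 5% two-sided alpha and 80% power as the 'usual choices', and the diversity-adjusted required information size of Wetterslev et al.23):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf
z_a, z_b = z(0.975), z(0.80)         # assumed 5% two-sided alpha, 80% power

# Figures quoted in the commentary: NNT 23 at a 40% relative risk reduction,
# diversity (D^2) of 41%.
rrr, nnt, diversity = 0.40, 23, 0.41
p_control = (1 / nnt) / rrr          # implied control event rate, ~0.109
p_exp = p_control * (1 - rrr)

variance = p_control * (1 - p_control) + p_exp * (1 - p_exp)
n_fixed = 2 * (z_a + z_b) ** 2 * variance / (p_control - p_exp) ** 2

# Diversity-adjusted required information size: RIS = n_fixed / (1 - D^2)
ris = n_fixed / (1 - diversity)
print(f"fixed-sample size ~{n_fixed:.0f}, diversity-adjusted RIS ~{ris:.0f}")
```

Under these assumptions the calculation lands close to the 2220 randomised patients quoted above, illustrating how strongly the 41% diversity inflates the roughly 1300-patient fixed-sample requirement.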

We believe that it is time to move beyond merely reflecting on unadjusted P values and CIs in meta-analyses as if all the necessary information had been gathered, and to promote the application of sequential methods in systematic reviews, adopting an ‘interim analysis’ perspective on meta-analysis. We strongly urge journals, editors and peer-reviewers to pay more attention to quality indicators and to demand adherence to the PRISMA guidelines and risk of bias assessment as mandatory requirements prior to the publication of any systematic review.

Acknowledgements relating to this article

Assistance with the Commentary: none.

Financial support or sponsorship: none.

Conflicts of interest: none.

Comment from the Editor: this Invited Commentary was accepted by the editors but was not sent for external peer review.

References


1. Stevanovic A, Rossaint R, Fritz HG, et al. Airway reactions and emergence times in general laryngeal mask airway anaesthesia. A meta-analysis. Eur J Anaesthesiol 2015; 32:106–116.
2. Swingler GH, Volmink J, Ioannidis JP. Number of published systematic reviews and global burden of disease: database analysis. BMJ 2003; 327:1083–1084.
3. LeLorier J, Grégoire G, Benhaddad A, et al. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 1997; 337:536–542.
4. Crowe M, Sheppard L. A review of critical appraisal tools show they lack rigor: alternative tool structure is proposed. J Clin Epidemiol 2011; 64:79–89.
5. Lundh A, Gøtzsche PC. Recommendations by Cochrane Review Groups for assessment of the risk of bias in studies. BMC Med Res Methodol 2008; 8:22.
6. Higgins JPT, Green S, eds. Cochrane handbook for systematic reviews of interventions. Version 5.0.0. Wiley; 2009.
7. Higgins JP, Altman DG, Gøtzsche PC, et al. Cochrane Bias Methods Group; Cochrane Statistical Methods Group. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 2011; 343:d5928.
8. Hróbjartsson A, Boutron I, Turner L, et al. Assessing risk of bias in clinical trials included in Cochrane Reviews: the why is easy, the how is a challenge. Cochrane Database Syst Rev 2013; 4:ED000058.
9. Moher D, Cook DJ, Eastwood S, et al. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 1999; 354:1896–1900.
10. Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ 2009; 339:b2535.
11. Tao KM, Li XQ, Zhou QH, et al. From QUOROM to PRISMA: a survey of high-impact medical journals’ instructions to authors and a review of systematic reviews in anesthesia literature. PLoS One 2011; 6:e27611.
12. Tunis AS, McInnes MD, Hanna R, et al. Association of study quality with completeness of reporting: have completeness of reporting and quality of systematic reviews and meta-analyses in major radiology journals changed since publication of the PRISMA statement? Radiology 2013; 269:413–426.
13. Panic N, Leoncini E, De Belvis G, et al. Evaluation of the endorsement of the preferred reporting items for systematic reviews and meta-analysis (PRISMA) statement on the quality of published systematic review and meta-analyses. PLoS One 2013; 8:e83138.
14. Gagnier JJ, Kellam PJ. Reporting and methodological quality of systematic reviews in the orthopaedic literature. J Bone Joint Surg Am 2013; 95:e771–e777.
15. Flather MD, Farkouh ME, Pogue JM, et al. Strengths and limitations of meta-analysis: larger studies may be more reliable. Control Clin Trials 1997; 18:568–579.
16. Pogue JM, Yusuf S. Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Control Clin Trials 1997; 18:580–593.
17. Thorlund K, Devereaux PJ, Wetterslev J, et al. Can trial sequential monitoring boundaries reduce spurious inferences from meta-analyses? Int J Epidemiol 2009; 38:276–286.
18. Higgins JP, Whitehead A, Simmonds M. Sequential methods for random-effects meta-analysis. Stat Med 2011; 30:903–921.
19. Imberger G, Vejlby AD, Hansen SB, et al. Statistical multiplicity in systematic reviews of anaesthesia interventions: a quantification and comparison between Cochrane and non-Cochrane reviews. PLoS One 2011; 6:e28422.
20. Borm GF, Donders AR. Updating meta-analyses leads to larger type I errors than publication bias. J Clin Epidemiol 2009; 62:825–830.
21. Brok J, Thorlund K, Gluud C, et al. Trial sequential analysis reveals insufficient information size and potentially false positive results in many meta-analyses. J Clin Epidemiol 2008; 61:763–769.
22. Thorlund K, Imberger G, Walsh M, et al. The number of patients and events required to limit the risk of overestimation of intervention effects in meta-analysis – a simulation study. PLoS One 2011; 6:e25491.
23. Wetterslev J, Thorlund K, Brok J, et al. Estimating required information size by quantifying diversity in random-effects model meta-analyses. BMC Med Res Methodol 2009; 9:86.
24. Hedges LV, Pigott TD. The power of statistical tests in meta-analysis. Psychol Methods 2001; 6:203–217.
25. Turner RM, Bird SM, Higgins JP. The impact of study size on meta-analyses: examination of underpowered studies in Cochrane reviews. PLoS One 2013; 8:e59202.
26. Pereira TV, Horwitz RI, Ioannidis JP. Empirical evaluation of very large treatment effects of medical interventions. JAMA 2012; 308:1676–1684.
27. Wetterslev J, Thorlund K, Brok J, et al. Trial sequential analysis may establish when firm evidence is reached in cumulative meta-analysis. J Clin Epidemiol 2008; 61:64–75.
28. Brok J, Thorlund K, Wetterslev J, et al. Apparently conclusive meta-analyses may be inconclusive – trial sequential analysis adjustment of random error risk due to repetitive testing of accumulating data in apparently conclusive neonatal meta-analyses. Int J Epidemiol 2009; 38:287–298.
29. Pereira TV, Ioannidis JP. Statistically significant meta-analyses of clinical trials have modest credibility and inflated effects. J Clin Epidemiol 2011; 64:1060–1069.
30. Thorlund K, Engstrøm J, Wetterslev J, et al. User manual for trial sequential analysis (TSA). Copenhagen, Denmark: Copenhagen Trial Unit, Centre for Clinical Intervention Research; 2011. pp. 1–115.
© 2015 European Society of Anaesthesiology