Systematic reviews are essential for summarizing and clarifying existing data and help avoid unnecessary duplication of prior work. For example, 64 randomized controlled trials were conducted between 1987 and 2002 to study the effect of the drug aprotinin on perioperative blood transfusions. A systematic review and cumulative meta-analysis of these studies by Fergusson et al.10 revealed that the benefit of the drug had been clearly established by the twelfth published study, but that subsequent trials failed to cite and consider the data from prior studies. Thus, 52 unnecessary randomized controlled trials were funded, conducted, and published, resulting both in wasted resources and in the unethical treatment of patients who were randomized unnecessarily and did not receive the drug. Citing this particular example, the editors of The Lancet changed their publication policy to require that all new randomized controlled trials submitted for publication cite or include a proper systematic review of the existing literature relevant to their research.2 It is important, therefore, that all readers of and contributors to the medical literature become familiar with the principles and methods of systematic review.
HOW TO START A SYSTEMATIC REVIEW
Writing a systematic review typically follows these five fundamental steps: (1) formulating a question; (2) conducting a literature search; (3) refining the search by applying predetermined inclusion and exclusion criteria; (4) extracting the appropriate data and assessing their quality and validity; and (5) synthesizing, interpreting, and reporting the data. An example of a study following these five steps is described in Table 3.
FORMULATING A QUESTION
Systematic reviews are carried out to answer a focused question related to the biology, diagnosis, treatment, or outcome of a specific disease or condition. Formulating the right question is the crucial first step but is not always straightforward. It is often helpful to begin with a generalized, free-form question. The authors should ask themselves four structured questions: (1) What is the population of interest? (2) What are the interventions being considered? (3) What are the outcomes of interest? (4) What study designs are appropriate to answer the question?11
How we focus and refine the structured question will directly determine the inclusion and exclusion criteria of the systematic review, the number of relevant studies available, and whether the results will apply to the general patient population. If we focus the question too narrowly, the number of included studies and patients may be small and the precision of the review will be low. If we leave the question broad, we may capture many more studies and patients, but may fail to detect important relationships between subgroups of patients and the outcomes of interest. For instance, if preoperative antibiotics reduce the infection rate in obese women but not in nonobese women, the benefit to obese patients may not be detected if all patients are included in the final analysis. Although the relationship between certain risk factors and outcome, such as obesity and infection, can be investigated using post hoc subgroup analysis, this approach is not considered rigorous because it can lead to spurious findings. A systematic review should be based on principles of hypothesis testing, and the hypotheses must be conceived a priori.
Once the study question is formalized, the authors must compose a comprehensive list of inclusion and exclusion criteria against which all retrieved studies will be reviewed. The criteria will depend on the nature of the patient population, the intervention, the outcome, and the type of study to be included. Consensus agreement between two or more investigators should be obtained to formulate a comprehensive and workable list. Too restrictive a list will yield few studies, small numbers of patients, and reduced precision, whereas broad criteria may result in an unmanageable number of studies, poor study quality, and invalid conclusions. To avoid selection bias, inclusion and exclusion criteria should be agreed upon and formalized before data extraction and analysis.
Systematic reviews do not have to be only of randomized controlled trials. Many clinical questions do not lend themselves to randomization and blinding, particularly those pertaining to disease prevalence and risk factors. Ethical and practical considerations dictate that we use observational studies, whether prospective cohort, case-control, cross-sectional, or case series, to study such associations.12 Observational studies should also be reviewed systematically and their results combined by meta-analysis to answer questions regarding disease prevalence, risk factors, and adverse effects of treatment.13 Systematic reviews of observational studies now represent half of all published systematic reviews.14
SEARCHING THE LITERATURE
A comprehensive and reproducible literature search is the foundation of a systematic review (Table 4). If clinical trials are designed to answer questions about a population of patients by testing hypotheses on a smaller sample of study patients, then one can think of systematic reviews as clinical trials, but with published studies taking the place of individual patients in a study sample. In a clinical trial, to make valid inferences about the larger patient population, the investigator must be sure that the study sample is truly representative of the larger patient population. In the same way, the conclusions of a systematic review are only valid if the trials included are representative of all trials available on the subject. This can usually be achieved only if all available trials are included in the review, and any omissions may bias the systematic review. Because of the large number of publications, inaccessibility of journals, inaccurate indexing, and inadequate searching, almost all literature searches and retrievals are subject to bias and omissions; some may be avoidable, and some may not.
Most literature searches now begin on an electronic database of citations, such as MEDLINE, EMBASE, CINAHL, or the Cochrane Central Register of Controlled Trials.15–17 No single database is likely to contain all published studies on a given subject. Scherer et al.18 found that a MEDLINE search captured only about half of all trials available in the literature, and that only 76 percent of clinical trials listed in MEDLINE could actually be identified in journals, a finding they attributed to inaccurate indexing. Since that study, MEDLINE has undergone a major indexing upgrade and is now felt to be much more accurate and comprehensive than it had been in the past.19 Nonetheless, it is appropriate to search multiple citation databases and combine several search strategies and keywords. Information on comprehensive search strategies can be found in several references.11,20 For completeness, it is recommended that reviewers consult additional citation sources, including the bibliographies of textbook chapters and traditional review articles, conference proceedings, and recognized experts in the field. The bibliographies of articles identified in the initial search should also be checked manually.
The initial list of citations should be purposefully broad and inclusive. Once the list is compiled, the titles can be rapidly searched and compared with the inclusion and exclusion criteria. The abstracts of studies that pass the title search should be retrieved and read. Based on data presented in the abstract, additional nonrelevant articles can be excluded. The complete text of the remaining articles should be retrieved and read, and those not meeting the inclusion criteria should be eliminated. It is important to keep track of the number of articles retrieved at each stage of the process and the number rejected under each inclusion and exclusion criterion. This information is important for the reproducibility of the article extraction process and will be presented as a “citation attrition diagram” or “trial flow diagram” (Fig. 1). Citation retrieval and elimination should be performed independently by at least two investigators.21 Where the reviewers disagree over the inclusion of a study, a decision should be reached by consensus agreement. This will maximize the reliability and reproducibility of the systematic review and minimize omissions and biases.
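The stagewise bookkeeping described above (how many citations entered each screening step and how many were rejected) can be sketched in a few lines; the stage names and counts below are invented purely for illustration:

```python
# Hypothetical counts for a "citation attrition" or "trial flow" diagram;
# the stage names and numbers are invented for illustration only.
stages = [
    ("Citations identified by database and hand searching", 1240),
    ("Titles passing the initial screen", 310),
    ("Abstracts meeting inclusion criteria", 88),
    ("Full-text articles included in the final review", 24),
]

# Report each stage together with the number rejected at the previous
# step, as a trial flow diagram would.
for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
    print(f"{name}: {n}  ({prev_n - n} excluded)")
```

Keeping these tallies as the screening proceeds, rather than reconstructing them afterward, makes the flow diagram trivially reproducible by a second reviewer.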
Despite multiple precautions, bias is inevitably introduced during the process of literature search and selection (Table 5). Because most reviews depend on published articles, the most important form of bias is publication bias, which refers to the selective publication of articles that show positive treatment effects and statistical significance.22 The odds that a clinical trial, observational study, or laboratory-based experiment will be published are three to eight times greater if the results are statistically significant than if they are not, despite the fact that studies that do not reach statistical significance are not of poorer quality.23,24 It is then easy to suppose that a systematic review based on published data alone is more likely to show a positive treatment effect. There are two possible solutions to this problem: (1) a comprehensive search for unpublished studies through a manual search of conference proceedings, correspondence with experts, and a search of clinical trial registries; and (2) voluntary, prospective registration of all pending and in-progress clinical trials in a central database (such as the Cochrane Central registry), where they will remain forever listed independent of publication status and the direction of outcome. Preregistration of clinical trials has many other benefits to the scientific community and is gaining popularity. According to a 2004 meeting of the International Committee of Medical Journal Editors, many mainstream journals, including the Journal of the American Medical Association, require trial registration before the enrollment of any patients.25 Reviewers must keep in mind that the use of data from unpublished, non–peer-reviewed studies is highly controversial.
English-language bias occurs when reviewers exclude papers published in languages other than English, usually to simplify the search and article retrieval process and to eliminate the costs and problems associated with translation. This is a common practice in up to 75 percent of meta-analyses published in English-language journals.26 Depending on the subject matter, this strategy may exclude 20 to 50 percent of appropriate clinical trials, whose scientific quality ratings are generally on par with those in the English-language literature.27 Moreover, there is evidence that trials with favorable results are more likely to be submitted to international, English-language journals, a practice that compounds the bias in English-only systematic reviews.20 The overall effect of language bias on the validity of the systematic review process is difficult to estimate.28
Hand-searching the bibliographies of published articles, reviews, and textbooks is subject to citation bias. Citation bias is the increased frequency with which studies with significant or positive results are referenced in other publications, compared with studies with inconclusive or negative findings.26
Reviews based on observational studies may be affected by bias in patient selection or treatment allocation. This was found in our systematic review and meta-analysis of distal radius fracture fixation, comparing the outcomes of external fixation and internal fixation.29 Because the review was based on observational studies with nonrandom allocation, it is possible that patients with more severe fractures were preferentially treated with external fixation, because the treating physicians believed that internal fixation would be too difficult for more severe fracture patterns. It is clearly not possible to eliminate all sources of bias. Authors should be aware of the potential sources of bias, minimize them when possible by adhering to scientific principles, and investigate and report their presence for the readers’ consideration.
Once the list of included studies has been compiled, all relevant data must be extracted. This process is labor intensive and demanding, but must be meticulous and accurate. The list of data to be extracted should be agreed upon by a priori consensus during the design stage of the study. After testing the a priori stated hypotheses, it is permissible for the investigators to explore the data for other possible associations, while keeping in mind that positive associations may present themselves randomly and cannot be stated as true associations or causation. If these positive associations have face validity (i.e., make sense), the findings from the exploratory phase of the study should be used to spur future investigations to test additional hypotheses.
Collected data include study characteristics, sample demographics, and outcome data. It is necessary to design a review-specific data extraction form, so that the same data are extracted from each study and missing data are clearly apparent. This will facilitate data entry. To ensure that data extraction is accurate and reproducible, it should be performed by at least two independent readers. If this is not possible, or when the number of retrieved studies is large, a random sample of articles should be selected and subject to repeated data extraction and verification by an independent investigator. Ideally, data extraction should be performed with the investigators blinded to the article’s authors and source.11,19
The validity of a systematic review ultimately depends on the scientific method of the retrieved studies and the reporting of data. Randomized controlled trials are considered to be more rigorous than observational studies, and a review based on well-designed randomized controlled trials will likely be more valid and accurate than a review based on observational studies or case reports.7,30 Likewise, studies that present their data uniformly and use consistently high standards for reporting are better suited for systematic review than studies with incomplete methodology, missing data, and inconsistent outcome measurement. The Quality of Reporting of Meta-Analyses guidelines require that a systematic review assess and report the quality of the studies and data on which it is based. The most common way to assess and report study quality has been to use a composite, numerical scoring instrument. More than 35 different quality assessment instruments have been published in the literature, and most are designed for randomized clinical trials. The Jadad score31 and the T. C. Chalmers score19 are two examples of quality assessment instruments. Different assessment scores can be found for different study designs, including randomized controlled trials and prospective cohort studies, and some are designed for specific interventions; to date, however, no assessment scores have been published for retrospective case series. No single score is universally accepted, and the utility, validity, and accuracy of quality assessment remain problematic. For these reasons, the common practice of using the numerical quality score as a weighting factor when combining articles by meta-analysis should be avoided.32,33 When an appropriate instrument cannot be found, relevant study characteristics can be summarized qualitatively, including study design, methods of patient recruitment, loss to follow-up, and the presence of bias in patient selection and treatment allocation.
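To illustrate how a composite, numerical scoring instrument such as the Jadad scale31 tallies points, the sketch below renders a simplified version of its five-point checklist. This is an illustrative simplification, not the validated instrument: the real scale awards and deducts points based on how the published report describes its methods, and those judgments require a trained reader.

```python
def jadad_score(randomized, randomization_appropriate,
                double_blind, blinding_appropriate,
                withdrawals_described):
    """Simplified tally of the five-point Jadad scale for a trial report.

    Points are awarded for describing randomization and double-blinding,
    for using an appropriate method for each, and for accounting for
    withdrawals and dropouts; inappropriate methods are penalized.
    """
    score = 0
    if randomized:
        score += 1
        score += 1 if randomization_appropriate else -1
    if double_blind:
        score += 1
        score += 1 if blinding_appropriate else -1
    if withdrawals_described:
        score += 1
    return max(score, 0)

# A properly randomized, double-blinded trial that accounts for
# withdrawals earns the maximum score.
print(jadad_score(True, True, True, True, True))    # prints 5
print(jadad_score(True, False, False, False, True)) # prints 1
```

Note that, as the text cautions, such a score should describe study quality for the reader, not serve as a weighting factor in the meta-analysis itself.32,33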
Once the data have been extracted and their quality and validity assessed, the outcomes of individual studies within a systematic review may be pooled and presented as a summary outcome or effect. This is often done to combine the results of inconclusive or underpowered studies to arrive at a more precise estimate of the true effect of intervention.
Meta-analysis is a statistical technique for combining the results of independent, but similar, studies to obtain an overall estimate of a treatment effect.34 Meta-analysis combines the quantitative outcomes of an intervention using measures of variability within and between studies. A quantitative synthesis that does not account for variability in the data is not a true meta-analysis. Thus, while all meta-analyses are based on systematic review of the literature, not all systematic reviews necessarily include a meta-analysis. Data that are very conflicting and widely variable should not, under most circumstances, be combined numerically.35 Wide variations in outcomes of similar studies are usually the result of significant underlying differences in study methods, patient samples, data collection, or data reporting. Such differences may not be apparent in the published report, but they will obscure the true effect of the intervention under study. In cases of high study heterogeneity, a narrative synthesis can be substituted, as is frequently the case for systematic reviews based on case series and small-sample trials. The value of the systematic review lies in the methodological search for the underlying causes of heterogeneity, which allows the authors to make evidence-based recommendations for future investigations. The details of meta-analysis and heterogeneity testing are beyond the scope of this article but can be found in several references.19,34,36–38 If a meta-analysis is to be included in a systematic review, an experienced statistician or an epidemiologist should be consulted during all phases of the study.
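The core arithmetic of a fixed-effect meta-analysis, together with a standard heterogeneity check (Cochran's Q and the derived I² statistic), can be sketched as follows. The per-study effect estimates and standard errors below are invented for illustration; a real analysis would extract them from the included studies and would be done in consultation with a statistician, as the text advises.

```python
import math

# Hypothetical per-study treatment effects (e.g., log odds ratios) with
# their standard errors; these numbers are invented for illustration.
effects = [0.42, 0.31, 0.55, 0.18, 0.47]
std_errs = [0.21, 0.15, 0.30, 0.12, 0.25]

# Inverse-variance weighting: more precise studies receive more weight.
weights = [1.0 / se ** 2 for se in std_errs]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

# Cochran's Q measures between-study variability; I^2 expresses the
# proportion of total variation attributable to heterogeneity rather
# than chance (0 percent when Q does not exceed its degrees of freedom).
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
print(f"Q = {q:.2f} on {df} df; I^2 = {i_squared:.1f}%")
```

When I² is high, this fixed-effect pooling is inappropriate; the reviewers should investigate the sources of heterogeneity, consider a random-effects model, or fall back on narrative synthesis as described above.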
PRESENTING AND REPORTING
The format and content of a systematic review should follow the recommended guidelines appropriate for the type of study. The Quality of Reporting of Meta-Analyses consensus should be followed for reviews of randomized controlled trials,39 and the Meta-Analysis of Observational Studies in Epidemiology consensus should be followed for other, nonexperimental study designs.12 Details can be found in the appropriate consensus statements.12,39 These guidelines ensure that the information necessary for reproduction and verification of the study is presented in a clear, organized, and accessible fashion. Having completed a very thorough analysis of all existing literature on the subject, the authors are in a unique position to make recommendations and suggestions for future investigation that are based on the best available evidence.
SUMMARY AND RECOMMENDATIONS
Systematic review is a research and analysis tool used to summarize and resolve conflicts in the existing literature. Systematic reviews are rapidly replacing traditional, nonscientific, narrative reviews and are preferred and even required by some mainstream medical journals. Clinicians need to understand the limitations of systematic reviews and how to independently evaluate and even perform a review. Systematic reviews and meta-analyses are forms of original research designed to answer specific questions and require careful planning and execution. If performed correctly, they can be powerful tools.
Neither of the authors has any financial interest in any information presented in this article.
1. Moher, D., Cook, D. J., Eastwood, S., Olkin, I., Rennie, D., and Stroup, D. F. Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-Analyses. Lancet 354: 1896, 1999.
2. Young, C., and Horton, R. Putting clinical trials into context. Lancet 366: 107, 2005.
3. Cook, D. J., Mulrow, C. D., and Haynes, R. B. Systematic reviews: Synthesis of best evidence for clinical decisions. Ann. Intern. Med. 126: 376, 1997.
4. Mulrow, C. D. The medical review article: State of the science. Ann. Intern. Med. 106: 485, 1987.
5. Bhandari, M., Guyatt, G. H., Montori, V., Devereaux, P. J., and Swiontkowski, M. F. User's guide to the orthopedic literature: How to use a systematic literature review. J. Bone Joint Surg. (Am.) 84: 1672, 2002.
6. Egger, M., Smith, G. D., and O'Rourke, K. Rationale, potentials, and promise of systematic reviews. In M. Egger, G. D. Smith, and D. G. Altman (Eds.), Systematic Reviews in Health Care: Meta-Analysis in Context, 2nd Ed. London: BMJ Publishing Group, 2001.
7. Harbour, R., and Miller, J. A new system for grading recommendations in evidence-based guidelines. B.M.J. 323: 334, 2001.
8. Mann, J. J., Apter, A., Bertolote, J., et al. Suicide prevention strategies: A systematic review. J.A.M.A. 294: 2064, 2005.
9. Patsopoulos, N. A., Analatos, A. A., and Ioannidis, J. P. Relative citation impact of various study designs in the health sciences. J.A.M.A. 293: 2362, 2005.
10. Fergusson, D., Glass, K. C., Hutton, B., and Shapiro, S. Randomized controlled trials of aprotinin in cardiac surgery: Could clinical equipoise have stopped the bleeding? Clin. Trials 2: 218, 2005.
11. Khan, K. S., Kunz, R., Kleijnen, J., and Antes, G. Systematic Reviews to Support Evidence-Based Medicine. London: The Royal Society of Medicine Press, 2003.
12. Stroup, D. F., Berlin, J. A., Morton, S. C., et al. Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-Analysis of Observational Studies in Epidemiology (MOOSE) Group. J.A.M.A. 283: 2008, 2000.
13. Egger, M., Schneider, M., and Davey Smith, G. Spurious precision? Meta-analysis of observational studies. B.M.J. 316: 140, 1998.
14. Dickersin, K. Systematic reviews in epidemiology: Why are we so far behind? Int. J. Epidemiol. 31: 6, 2002.
18. Scherer, R. W., Dickersin, K., and Langenberg, P. Full publication of results initially presented in abstracts: A meta-analysis. J.A.M.A. 272: 158, 1994.
19. Egger, M., Smith, G. D., and Altman, D. G. (Eds.). Systematic Reviews in Health Care: Meta-Analysis in Context, 2nd Ed. London: BMJ Publishing Group, 2001.
20. Dickersin, K., Scherer, R., and Lefebvre, C. Identifying relevant studies for systematic reviews. B.M.J. 309: 1286, 1994.
21. Akobeng, A. K. Understanding randomised controlled trials. Arch. Dis. Child. 90: 840, 2005.
22. Montori, V., and Guyatt, G. Summarizing the evidence: Publication bias. In R. Hayward (Ed.), User's Guides Interactive. Chicago: JAMA Publishing Group, 2002.
23. Dickersin, K., and Min, Y. I. Publication bias: The problem that won't go away. Ann. N.Y. Acad. Sci. 703: 135, 1993.
24. Easterbrook, P. J., Berlin, J. A., Gopalan, R., and Matthews, D. R. Publication bias in clinical research. Lancet 337: 867, 1991.
25. DeAngelis, C. D., Drazen, J. M., Frizelle, F. A., et al. Is this clinical trial fully registered? A statement from the International Committee of Medical Journal Editors. J.A.M.A. 293: 2927, 2005.
26. Egger, M., and Smith, G. D. Bias in location and selection of studies. B.M.J. 316: 61, 1998.
27. Moher, D., Fortin, P., Jadad, A. R., et al. Completeness of reporting of trials published in languages other than English: Implications for conduct and reporting of systematic reviews. Lancet 347: 363, 1996.
28. Moher, D., Pham, B., Klassen, T. P., et al. What contributions do languages other than English make on the results of meta-analyses? J. Clin. Epidemiol. 53: 964, 2000.
29. Margaliot, Z., Haase, S. C., Kotsis, S. V., Kim, H. M., and Chung, K. C. A meta-analysis of outcomes of external fixation versus plate osteosynthesis for unstable distal radius fractures. J. Hand Surg. (Am.) 30: 1185, 2005.
30. Atkins, D., Eccles, M., Flottorp, S., et al. Systems for grading the quality of evidence and the strength of recommendations I: Critical appraisal of existing approaches. The GRADE Working Group. B.M.C. Health Serv. Res. 4: 38, 2004.
31. Jadad, A. R., Moher, D., and Klassen, T. P. Guides for reading and interpreting systematic reviews: II. How did the authors find the studies and assess their quality? Arch. Pediatr. Adolesc. Med. 152: 812, 1998.
32. Berlin, J. A., and Rennie, D. Measuring the quality of trials: The quality of quality scales. J.A.M.A. 282: 1083, 1999.
33. Juni, P., Witschi, A., Bloch, R., and Egger, M. The hazards of scoring the quality of clinical trials for meta-analysis. J.A.M.A. 282: 1054, 1999.
34. Normand, S. L. Meta-analysis: Formulating, evaluating, combining, and reporting. Stat. Med. 18: 321, 1999.
35. Berlin, J. A. Commentary: Summary statistics of poor quality studies must be treated cautiously. B.M.J. 314: 337, 1997.
36. Berman, N. G., and Parker, R. A. Meta-analysis: Neither quick nor easy. B.M.C. Med. Res. Methodol. 2: 10, 2002.
37. Deeks, J. J., Altman, D. G., and Bradburn, M. J. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In M. Egger, G. D. Smith, and D. G. Altman (Eds.), Systematic Reviews in Health Care: Meta-Analysis in Context, 2nd Ed. London: BMJ Publishing Group, 2001.
38. Sutton, A. J., Abrams, K. R., and Jones, D. R. An illustrated guide to the methods of meta-analysis. J. Eval. Clin. Pract. 7: 135, 2001.
39. Moher, D., Cook, D. J., Eastwood, S., Olkin, I., Rennie, D., and Stroup, D. F. Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. QUOROM Group. Br. J. Surg. 87: 1448, 2000.
©2007 American Society of Plastic Surgeons