The Place of Systematic Reviews in the Hierarchy of Evidence
Broadly defined, evidence is any empirical observation about the association between events. 17 One observation (such as a case report) therefore is evidence, but the inferences from that observation are limited to the circumstances and characteristics of the observer and that which was observed. If different observers independently make the same observation (such as multiple case reports or a case series), then inferences about these observations are stronger. As investigators do more complex observations (such as prospective cohort studies) and experiments (such as randomized trials), inferences become even stronger. However, biased estimates of the true nature and magnitude of the association of interest still may be present in carefully designed observations and experiments. Arguably, only the best-designed experiments may achieve enough safeguards against the intrusion of bias and may provide a reasonable estimate of the true association between two events (such as the association between treatment and cure). Herein, the authors suggest a hierarchy of evidence in which evidence is ordered according to the strength of inference: unsystematic observations are at the bottom and methodologically sound randomized clinical trials are at the top of the hierarchy.
A methodologically sound experiment, however, may not estimate with enough precision the true nature of the association between two events if the experiment involves a small number of observations. 18 Also, it may not be applicable to a wide variety of patients and clinical situations. Therefore, it is rare that a single small trial would completely inform clinical decision-making. Moreover, when investigators do multiple small trials, these trials often are unable to rule out clinically important differences between two interventions, or they reach contradictory results. 20 Therefore, to obtain unbiased and precise estimates of truth, investigators need large-scale randomized evidence. 35 One approach is to do large simple clinical trials that reach widely applicable and precise estimates because of their inclusion of a broad array of participants and their large size, respectively. Similar benefits can be obtained through a different method: summarizing and synthesizing the existing evidence.
Traditionally, content experts write summaries of evidence in book chapters or in narrative reviews in medical journals. Although popular with editors and readers, and potentially useful in providing a broad overview of a topic, these reviews have important shortcomings. 21,27 For instance, narrative reviews are (1) less likely to answer a specific clinical question; (2) less likely to contain research that was systematically sought from the literature; (3) more likely to contain research reports selected by the authors; (4) more likely to qualitatively summarize research without reporting the relative weights afforded to different studies; and (5) more likely to make recommendations weighted strongly by opinion, which may or may not reflect the evidence. The process of identifying, selecting, and combining the evidence typically is not set out in a protocol before the review begins, and is not described explicitly in the publication. Readers cannot ascertain the strength of the review methods (as distinct from the methods of the primary studies summarized in the review). Perhaps more importantly, some of these narrative reviews provide a summary of the evidence that is not consistent with the available evidence. A study by Antman and collaborators 3 showed this was the case. When these investigators compared the state of knowledge about the efficacy and safety of thrombolytics after myocardial infarction with concurrent expert recommendations expressed in narrative reviews, experts were not recommending thrombolytics more than a decade after the evidence had confirmed that thrombolytics saved lives and chance was an unlikely explanation for that benefit (p < 0.001). In the same study, the authors showed how experts writing narrative reviews continued to recommend lidocaine prophylaxis after myocardial infarction even though there never had been evidence of benefit. 
Another finding of this study was the variation between expert reviews: although some recommended the interventions, others considered them contraindicated, and still others considered them experimental. Therefore, traditional unsystematic narrative reviews and opinions based on unsystematic observations usually are inconsistent with the evidence, lag behind the evidence, and disagree with each other. For these reasons, traditional, unsystematic reviews are not an appropriate approach to summarizing the evidence. In addition, their infrequent use of quantitative summaries does little to increase the precision of the estimates of treatment effect from small studies.
The solution would be to do systematic summaries of all the available evidence with the intent of obtaining an unbiased and precise measure of the true magnitude and direction of the association between events that would be widely applicable. Such systematic summaries are called systematic reviews and the quantitative (statistical) pooling of estimates from individual studies is called meta-analysis. Although the terms may be used interchangeably, they should be distinguished. It always is appropriate to summarize the available evidence in a systematic review, but it may not always be desirable or appropriate to quantitatively pool the study results to reach one estimate. Therefore, not all systematic reviews include meta-analyses. The decision to do a meta-analysis to statistically pool individual study results in a systematic review is a fundamental methodologic issue and will be discussed in detail later.
Cochrane, a British physician and epidemiologist, in 1979 stated that “a great criticism of our profession [is] that we have not organized a critical summary, by specialty or subspecialty, adapted periodically, of all relevant randomized controlled trials.”9 With the rediscovery of meta-analysis in psychology in the mid-1970s and its attractive application in medicine in the 1980s, the popularity of systematic reviews and meta-analyses increased in the 1980s and 1990s. The Cochrane Collaboration, answering Cochrane’s clarion call in 1992, became a point of contact for researchers interested in producing and disseminating updated systematic reviews of treatment effectiveness. During the past decade, there was an explosion of methodologic research with evidence-based guidelines on how to do 12 and report 22 systematic reviews and meta-analyses, and how to determine whether investigators did a review in accord with sound methodologic principles. 33,34 The Cochrane Collaboration completed 1000 reviews in 2001. Many more systematic reviews produced by other investigators also are available. This is true in surgery in general and in orthopaedics in particular. In a review of orthopaedic literature published in the past 15 years, Bhandari et al 7 showed that there were one or two meta-analyses published per year in the 1980s and eight to 10 meta-analyses published per year in the 1990s.
In general, systematic reviews are observations of previous research and similar to other retrospective observations, they are open to systematic error (also referred to as bias), and to the influence of other variables on the apparent association between the events of interest (also referred to as confounding). In the critical appraisal of meta-analyses in orthopaedics by Bhandari et al, 7 the authors showed that the review methods were of limited quality and that they failed to improve with time. Furthermore, it seemed that studies of lower methodologic quality dealt mostly with surgical questions, were published in surgical journals, and had positive results. Unfortunately, a recent review shows that the methodologic quality of meta-analyses and systematic reviews addressing clinical questions in other disciplines and produced by the Cochrane Collaboration and other authors is poor. 39
In the next section, the authors will review the appropriate methods to complete a methodologically sound systematic review and will discuss frequent pitfalls. Readers interested in doing a systematic review may wish to download and peruse the detailed Reviewers’ Handbook edited by the Cochrane Collaboration and published on their Website. 41 Participation in the Collaboration and association with other investigators with an interest in systematic reviews could be additional steps to increasing skills in doing and interpreting systematic reviews.
Doing a Systematic Review
Table 1 describes doing a systematic review. The first step is to define the review question. This question drives and focuses the process by clearly establishing the patients, intervention, and outcomes of interest. This usually determines the types of study designs that best answer the question. The question also determines the composition of the review team. The review team commonly includes a content area expert, a methodologist, a statistician, and a librarian with expertise in systematic literature searches. The inclusion in the review team of an expert in research methodology (such as a clinical epidemiologist) may be associated with a systematic review of higher methodologic quality, and was important in explaining differences between the quality of published meta-analyses in orthopaedics. 7
Formulating the question typically is an iterative process of narrowing and expanding the focus to make the review feasible. Focusing the question limits the scope of the review to studies that try to estimate a similar true treatment efficacy. As an example, consider a review to determine the effect of all forms of palliative therapy on bone metastases in which the authors would like to provide one estimate of the efficacy of all the treatment modalities. Intuitively, this does not seem to be a sensible proposal. First, there are numerous palliative treatments such as radiation, bisphosphonates, and oral morphine that exert their effects through different mechanisms of action. Second, different cancer types respond differently to these treatments (consider the sensitivity to radiation of different cancer types). Third, each treatment has different rates and types of side effects. The patients, their different cancers, and the different palliative treatments have different biologies, such that pooling them to find one underlying true effect for all of them on all these patients does not seem sensible. In contrast, consider the question of pooling all studies of thromboembolism prophylaxis with anticoagulants in patients having knee replacement surgery. Although warfarin and various heparin agents have different mechanisms of action, the similar nature of their effect (to decrease the risk of thrombosis and increase the risk of bleeding by interfering with the coagulation cascade) and the relatively homogeneous patient group appear similar enough that pooling these studies would sensibly estimate the underlying truth about the efficacy of thromboembolism prevention with anticoagulants in patients having knee surgery.
Studies that provide answers to the same narrowly focused question still may yield different estimates of treatment effect. These between-study differences (also referred to as heterogeneity of study results) may be explained by differences in patient characteristics, in experimental and control interventions, in outcome measures and outcome ascertainment, in overall study design and methodologic quality, and by chance. Exploration of these differences often yields new insights and hypotheses. The examination of heterogeneity is a crucial methodologic issue in systematic reviews and meta-analyses and the authors will discuss it further later.
The sine qua non of systematic reviews is the systematic search and identification of studies. Bhandari et al 7 found that 83% of orthopaedic meta-analyses explicitly stated the methods used to search for evidence and that 73% of the meta-analyses had reasonably comprehensive strategies. By systematic, the authors mean the use of a search protocol that lists all potential data sources and multiple (and frequently overlapping) strategies to consult them. The main concern is that important studies could be missed if reviewers only consult a narrow choice of data sources. Electronic databases, such as MEDLINE (a database produced by the United States National Library of Medicine and freely accessible), provide the bulk of data for most systematic reviews. MEDLINE indexes a large number of publications, mostly from the United States (EMBASE, another popular electronic database, has some important overlap with MEDLINE but has greater European and multidisciplinary coverage). For instance, the choice of searching just in MEDLINE also entails searching mostly for studies published in English. Limited searches (such as searching only for studies published in peer-reviewed journals, in English, or very recently) may lead to a biased sample of the available evidence (rather than to all the available evidence) and bias the evidence summary. One form of such bias may result from publication bias. 26
Publication bias refers to the selective publication of research findings based on the magnitude, direction, or statistical significance of the study results. 26 Studies that show small effects or fail to reject the null hypothesis (also known imprecisely as negative studies) tend not to get submitted by their authors, to remain unpublished or to see their publication delayed, and to be published in a language other than English and in journals not indexed in MEDLINE. As a consequence, reviewers may not identify such a study. Publication bias is more likely in small studies; large studies tend to get published regardless of their results. Therefore, systematic reviews summarizing small studies may overestimate the treatment effect by including small studies with positive results and omitting small studies with negative results. Sometimes this overestimation is called postpublication bias. Postpublication bias can be minimized by searching for studies using multiple data sources including multiple electronic sources (MEDLINE, EMBASE, Cochrane CENTRAL, the Food and Drug Administration, and other Internet websites), contacting experts in the field, searching thesis and dissertation databases, reviewing clinical trial registries (such as the National Institutes of Health CRISP database), and handsearching journals in the specialty.
Methodologists have proposed several statistical and graphic strategies to determine whether a review is affected by publication bias. 26 None of these strategies is widely accepted, but the most widely used is the funnel plot. The funnel plot is a graph of the studies arranged by sample size (Y axis) and effect size (X axis). Such an arrangement creates a funnel-shaped distribution of studies with its base on the X axis. The pooled estimate of the effect (dominated by the studies with the largest samples) defines the center of the funnel (its axis) and its position on the X axis. The studies occupy symmetric locations centered on the funnel’s axis. Assuming the studies are fairly similar, a funnel shape arises because smaller studies will tend by chance to vary more widely from the pooled estimate than the larger studies. Smaller studies will take more peripheral positions (overestimating and underestimating the effect), forming the base of the funnel. A paucity of small negative studies (determined graphically or through statistical testing) would suggest publication bias.
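The intuition behind the funnel shape can be illustrated with a small simulation. The sketch below (hypothetical numbers; it draws no actual plot) shows that small studies estimating the same underlying event risk scatter far more widely than large ones, which is what produces the broad base and narrow tip of the funnel.

```python
import random
import statistics

random.seed(1)
TRUE_RISK = 0.2  # assumed true event risk; no real between-study differences

def observed_risk(n):
    """Observed event proportion in a single study arm of n patients."""
    return sum(random.random() < TRUE_RISK for _ in range(n)) / n

# Many small and many large studies estimating the same truth
small = [observed_risk(25) for _ in range(200)]
large = [observed_risk(400) for _ in range(200)]

# Small studies scatter widely (base of the funnel);
# large studies cluster near the truth (tip of the funnel)
print(round(statistics.stdev(small), 3))
print(round(statistics.stdev(large), 3))
```

If publication suppressed the small studies that happened to fall on one side of the truth, one side of the funnel’s base would be missing, producing the asymmetry that the graphic and statistical approaches look for.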
Unearthing of unpublished reports is not free of controversy or pitfalls. Several members of the scientific community (such as journal editors in a survey published in 1993 10) think that the quality of unpublished work tends to be poorer, despite some evidence of no difference in quality between published and unpublished (or delayed) papers. 14 Another pitfall results from an interested party’s generous release of a selected (and therefore biased) sample of unpublished studies to meta-analysts from an undisclosed universe of data on file. The approaches to statistically minimize the impact of publication bias are evolving. 36 Although extensive searching and contact of experts in the field, drug manufacturers, regulatory agencies, and other parties represent the best approach to limit postpublication bias, trial registries and prepublication of trial protocols show the most promise as tools to decrease publication bias.
The specific search strategy will vary according to the data source and the topic. Electronic search strategies are complex and structured and include descriptors of the topic, the methods of study (such as the study design), and limits (such as terms to exclude closely related but irrelevant topics or a time frame for the year of publication). Examples of search strategies can be found in the Cochrane Collaboration’s Reviewers’ Handbook. 42 Robinson and Dickersin 37 recently published a new and more sensitive strategy to identify randomized trials to be included in systematic reviews.
The next step in doing a systematic review is to select studies for inclusion in the review. The question usually defines the type of study to include and therefore the inclusion and exclusion criteria. In defining these, as in all previous steps, subject matter experts and research methodologists usually have important input. Inclusion criteria tend to define the study designs of interest and the type of patients, interventions, control interventions, and outcomes that matter to answer the review question. These criteria tend to be broad and inclusive so that the results will be generalizable although still reflecting the focused nature of the question. The exclusion criteria tend to be few and limit the sample of studies to increase the homogeneity and maintain the focus of the review. Reviewers should decide and define the inclusion and exclusion criteria in the review protocol. Bhandari et al 7 showed that 78% of orthopaedic meta-analyses explicitly reported their criteria for inclusion of studies in the review, but only 43% of them were judged as able to avoid bias in the selection of studies for review.
Reviewers face several pitfalls in defining the inclusion and exclusion criteria. Inclusion criteria may be too broad; reviews with such criteria may end up including heterogeneous studies assessing different truths. More worrisome is when too many exclusion criteria are proposed. These tend to introduce bias by restricting the sample to studies with characteristics linked to the treatment effect. For instance, if the researchers stipulate that studies in languages other than English should be excluded, they may be excluding negative studies and producing a review with overly sanguine conclusions. Also, the opportunity to explore certain study characteristics as causes of heterogeneity later in the review could be lost by the use of these characteristics as exclusion criteria. For instance, if reviewers investigating the prevention of deep vein thrombosis after knee replacement surgery exclude studies in which patients did not start anticoagulation until the day of hospital discharge, they will impair their ability to determine whether the timing of anticoagulation impacts its efficacy. Particular attention should be given to avoiding the stipulation of arbitrary exclusions based on the data. This may occur even if the exclusion criteria are formulated a priori, because the content experts of the review team and others may be familiar with the available evidence.
It is important to limit investigator bias throughout all aspects of the review. To reiterate the threats to validity, even with a protocol outlining the review methods, reviewing research is a retrospective exercise, and therefore is subject to random and systematic error. The Cochrane Collaboration routinely submits review protocols for peer review and publishes protocols in the Cochrane Library for open and widespread criticism and commentary as another step to avoid bias in the review process. The application of inclusion and exclusion criteria to abstracts and papers in full text should be piloted to ensure that the reviewers apply them uniformly. Usually, two or more reviewers working independently apply these criteria to later quantify their disagreements (sometimes using statistical methods that measure agreement beyond chance like the kappa statistic). Substantial agreement between reviewers (κ > 0.7) indicates that criteria were clear, objective, and consistently applied. Disagreements are resolved frequently by discussion or through arbitration by a third reviewer. Showing reliability of article selection does not eliminate any kind of error, but does decrease the probability of serious error.
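Cohen’s kappa, mentioned above, corrects raw agreement between two reviewers for the agreement expected by chance alone. A minimal sketch (the screening decisions below are hypothetical):

```python
def cohen_kappa(r1, r2):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    labels = set(r1) | set(r2)
    # Agreement expected if each rater kept their marginal rates but
    # rated independently of the other
    expected = sum((r1.count(lab) / n) * (r2.count(lab) / n) for lab in labels)
    return (observed - expected) / (1 - expected)

# Two reviewers' independent include/exclude calls on ten abstracts
reviewer_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
reviewer_b = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]

# Raw agreement is 9/10, but kappa = 0.8 after removing chance agreement
print(round(cohen_kappa(reviewer_a, reviewer_b), 2))
```

A value above 0.7, as here, would suggest the selection criteria were clear, objective, and consistently applied.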
Similar safeguards should be in place when abstracting data from the studies. The data to be abstracted should be pertinent to the review question, be specified in the review protocol, and be abstracted systematically. A duplicate independent process should be used, similar to that described for article selection. Data abstraction is best done after a structured form has been developed and piloted by reviewers, to enhance the efficient and accurate abstraction of key information. After data abstractors quantify and resolve disagreements, data are entered into a database (sometimes in duplicate or with multiple verification strategies depending on the data volume and capacity of the review team).
Perhaps the most important element to abstract is the methodologic quality of the primary studies. Two of the methodologic features of randomized trials for which there is empiric evidence of bias are allocation concealment and blinding. 23,38 Allocation concealment, such that no one involved in the study can determine the arm to which the next enrolled patient will be allocated, is reported infrequently. Blinding, and which parties in the study remained unaware of allocation after randomization and throughout the study, also is described poorly. 13,25 In a recent summary, poor methodologic quality of the primary studies accounted for an overestimation of treatment effects by 15% because of inadequate blinding and by 30% because of inadequate concealment of the allocation sequence (in contrast with the impact of the reporting bias the authors discussed in the previous paragraphs, which appears not to exceed 10%). 15 Despite its importance, reviewers may encounter challenges in abstracting the methodologic quality elements from each study because many elements are reported poorly in the studies. Bhandari et al 4 reported that 20% of trials provided enough information to ascertain the appropriateness of allocation concealment and that 11% reported on blinding of the outcome assessors (often the only practical blinding procedure in surgical trials). As in previous steps of the review requiring judgment, the provision of duplicate and independent assessments of study methodology, with quantification and resolution of disagreements by consensus or arbitration, minimizes the introduction of bias.
Reviewers can report composite quality scores with higher scores representing studies with more safeguards against bias. Many composite scoring tools are available, 19 but recent trends favor a more transparent analysis and reporting of individual quality components (appropriate generation of random allocation sequence, allocation concealment, blinding, completeness of followup, intention-to-treat analysis). In addition to reporting the quality of the studies included in the review, some reviewers may use methodologic aspects to weight studies with more safeguards against bias more heavily during statistical pooling. Others will use these study characteristics to assemble study subgroups (all studies with appropriate allocation concealment) before pooling.
In the foregoing sections, the authors have summarized the steps necessary to do a systematic review (with or without meta-analysis) that provides a valid answer to the review question. These steps minimize heterogeneity (the pooling of apples and oranges) and minimize bias in the identification, inclusion, and abstraction of studies (including publication and investigator bias). In the next section the authors will describe approaches to summarizing the data, including whether quantitative summaries should be done, which statistical approaches to use, and how to do analyses that test possible explanations for differences in treatment effects between studies, including subgroup and sensitivity analyses.
When and How to Do Meta-Analysis and Heterogeneity
The objective of the systematic review is to answer a focused question with all the available evidence. This objective entails a tension between defining the question in narrow terms and providing a widely applicable answer. Reviewers often resolve this tension by choosing to answer a broad question with a summary of dissimilar studies. This solution leaves the reviewers with a greater number of studies to summarize which, in turn, increases the precision of the summary. The reviewer also can capitalize on the dissimilarity between studies to find and test possible determinants of treatment effect among the dissimilar study characteristics. Furthermore, the reviewer may end up with an answer that will be applicable to a wide array of patients and clinical settings. This approach also prevents a data-driven focus on a particular subgroup, with the attendant danger of obtaining a biased answer to the question. 31 However, the reviewer may be left with studies so different from each other that one summary would not sensibly describe all the included studies.
When is it sensible to pool? This should be determined before doing the review to prevent this decision from being driven by the data. Indeed, this decision often is part of the process of defining the scope of the review question, as discussed above. However, the decision to do a meta-analysis often is revisited after careful analysis of the studies being pooled. If estimates of treatment effect from individual studies are similar, then there should be little hesitation to pool (the investigator would be determining the best estimate of the same underlying truth that all the studies being pooled are trying to measure). Reviewers also could encounter studies that differ in their estimates of effect but have widely overlapping confidence intervals, suggesting that chance remains a good explanation for the between-study differences and that pooling would be reasonable. Because these heuristics may appear unsatisfactory, reviewers may seek comfort in formal statistical testing for differences between studies. The null hypothesis for these statistical tests states that there is no difference in results between studies. Unfortunately, these tests tend to have limited power to detect heterogeneity when the review includes few studies with few patients in each study. Therefore, tests that do not reject the null hypothesis do not necessarily indicate the absence of large heterogeneity. If important heterogeneity exists, the reviewers may decide not to pool, and a qualitative summary of the evidence would follow. Alternatively, the investigators may decide it is sensible to pool, and an exploration of study characteristics proposed a priori to explain heterogeneity would follow. In the analysis of orthopaedic meta-analyses by Bhandari et al, 7 fewer than half of the meta-analyses stated the rationale for pooling the results, and 70% reported the methods used to pool results.
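The formal statistical test for between-study differences is commonly based on Cochran’s Q, which compares each study’s result with the inverse-variance pooled estimate; under the null hypothesis of no between-study differences, Q follows a chi-square distribution with k − 1 degrees of freedom for k studies. A minimal sketch with hypothetical log odds ratios:

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations of the study effects
    from the inverse-variance (fixed-effect) pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return q, pooled

# Hypothetical log odds ratios and variances from four small trials
effects = [-0.4, -0.2, -0.5, -0.3]
variances = [0.05, 0.08, 0.06, 0.07]
q, pooled = cochran_q(effects, variances)

# With 4 studies, df = 3 and the 5% chi-square cutoff is 7.81; Q falls far
# below it here, but with so few small studies the test has little power,
# so failing to reject does not prove the studies are homogeneous
print(round(q, 2), round(pooled, 2))
```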
Explanations for heterogeneity come from two sources of variability. The first is clinical heterogeneity, which results from differences in the patients, interventions, and outcomes assessed in each study. Assessment of clinical heterogeneity requires critical appraisal and clinical sensibility, not a statistical test. The second type is methodologic heterogeneity, which refers to differences in treatment effects between studies that result from differences in methodologic safeguards against bias. Exploring heterogeneity may shed light on important determinants of treatment effect and increase the scientific and clinical value of the meta-analysis. 44
To explain heterogeneity, reviewers could test whether grouping the studies according to certain characteristics reduces heterogeneity within each subgroup (this is referred to as subgroup analysis or stratified meta-analysis). If so, then this result indicates that the characteristic that defined the subgroup is related to treatment effect and that the pooled estimate within the subgroup may represent the studies in that subgroup more sensibly. An alternative approach is to do a multivariable analysis, akin to those done in primary studies, to determine the impact of predictor variables on the dependent variable using regression models. Because investigators use characteristics of the studies (and not of individual study participants, for instance) as predictor variables and the pooled estimate as the dependent variable, the technique often is called meta-regression. Meta-regression uses conventional procedures to incorporate independent variables in the model, including each study’s methodologic characteristics (was the study randomized?), clinical variables (such as the length of followup), and interactions between variables, and tests the ability of the model and its explanatory independent variables to predict the variability of the outcome of interest (the dependent variable). Variables that provide more predictive information about the dependent variable are likely to be responsible for between-study differences when they are not uniform across the studies. Not all reviews include sufficient primary studies for reviewers to be able to do subgroup analyses and meta-regression. Meta-analyses of individual patient data from multiple studies offer the best opportunity to explore heterogeneity. Also, the predictors should be stated a priori to prevent the random nature of the data from steering the analysis toward random and arbitrary selection of predictors and assembly of subgroups.
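In its simplest form, meta-regression with a single study-level covariate is a weighted least-squares regression of the study effect estimates on that covariate, using inverse-variance weights. The sketch below uses hypothetical data in which unconcealed trials show larger apparent effects:

```python
def weighted_meta_regression(x, y, variances):
    """One-covariate meta-regression: weighted least squares of study
    effect estimates (y) on a study-level characteristic (x)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    slope = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
             / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)))
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical studies: x = 1 if allocation was concealed, 0 if not;
# y = observed log odds ratio (unconcealed studies show larger effects)
concealed = [1, 1, 1, 0, 0, 0]
log_or = [-0.20, -0.25, -0.15, -0.55, -0.60, -0.50]
variances = [0.04, 0.05, 0.04, 0.06, 0.05, 0.06]
intercept, slope = weighted_meta_regression(concealed, log_or, variances)

# The positive slope says concealed trials report smaller apparent
# benefits, so concealment status helps explain between-study differences
print(round(intercept, 2), round(slope, 2))
```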
Once the reviewers decide to pool the results, they need to decide on the appropriate statistical approach. Although a complete discussion of all models available is beyond the scope of this review, the authors will focus on two commonly used models: the fixed-effects model and the random-effects model. 24 The fixed-effects model assumes that if all studies done were to be infinitely large they would all estimate the same underlying true effect (or fixed effect). This means that statistical procedures using a fixed-effects assumption incorporate the variability within each trial but do not incorporate the variability between trials (because there would be none if all studies were sufficiently large). This model weights studies by their sample size (usually by the inverse of a measure of within-study variability) such that large studies will have a stronger influence on the pooled estimate. Also, because only one source of variability is included, the confidence interval (a measure of precision) of the pooled estimate will be narrow. In contrast, the random-effects model assumes that the effects estimated by the individual studies are themselves distributed around the true underlying effect. Therefore, this model incorporates within-study and between-study variation. If there were important between-study differences (that is, if there is heterogeneity), the confidence interval of a random-effects point estimate would be wider than that from a fixed-effects model. Also, smaller studies would have greater representation in a random-effects meta-analysis. Therefore, random-effects meta-analyses are more susceptible to publication bias (when small negative studies are not included in the review because they remain unpublished) and could overestimate the uncertainty (if a small study causes large between-study variability). If the studies are heterogeneous, in general, a random-effects model should be used, given that this model takes heterogeneity into account whereas fixed-effects models ignore it.
However, this choice does not exonerate the reviewer from investigating what led to heterogeneity of study results.
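As a minimal sketch (not part of the original article, and using hypothetical effect estimates), the two models can be illustrated in Python: inverse-variance weighting for the fixed-effects model, and the DerSimonian-Laird estimate of between-study variance for the random-effects model. With heterogeneous studies, the random-effects confidence interval comes out wider, as described above.

```python
import math

def fixed_effects(effects, ses):
    """Inverse-variance fixed-effects pooling: each study is weighted by
    1/se^2, so large (precise) studies dominate the pooled estimate."""
    w = [1.0 / se**2 for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se = math.sqrt(1.0 / sum(w))  # only within-study variability
    return pooled, se

def random_effects(effects, ses):
    """DerSimonian-Laird random-effects pooling: the between-study
    variance tau^2 is added to each study's own variance."""
    w = [1.0 / se**2 for se in ses]
    pooled_fe, _ = fixed_effects(effects, ses)
    # Cochran's Q measures between-study variability
    q = sum(wi * (yi - pooled_fe)**2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance estimate
    w_star = [1.0 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Three hypothetical trials with heterogeneous effects (eg, log odds ratios)
effects = [0.1, 0.5, 0.9]
ses = [0.1, 0.1, 0.1]
fe, fe_se = fixed_effects(effects, ses)
re, re_se, tau2 = random_effects(effects, ses)
# The random-effects interval is wider because tau^2 > 0 here
print(f"fixed:  {fe:.3f} +/- {1.96 * fe_se:.3f}")
print(f"random: {re:.3f} +/- {1.96 * re_se:.3f} (tau^2 = {tau2:.3f})")
```

Because the three hypothetical studies are equally precise, both models give the same point estimate; only the width of the confidence interval differs, which is exactly the behavior the text describes.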
In the review by Bhandari et al, 7 the most common method of pooling in orthopaedic meta-analyses was simple addition of results across studies, a method that yields invalid answers because it fails to account for study weights (frequently derived from sample size and within-study variability) and for heterogeneity.
Reviewers may choose to do and present both fixed-effects and random-effects models to determine whether the choice of analysis affects the results of the review. This is an example of sensitivity analysis. This type of analysis refers to the reviewers’ exploration of the impact that decisions taken at the protocol development, conduct, or reporting stages of the systematic review could have on the pooled estimate of effect. Other examples of sensitivity analysis include inclusion of studies in the meta-analysis that were excluded by arbitration, and exclusion of a very large study that could be dominating the pooled estimate (to explore heterogeneity).
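One common form of sensitivity analysis, excluding each study in turn to see whether any single trial drives the pooled result, can be sketched as follows (a hypothetical example, not from the original article; the fixed-effects pooling here is the standard inverse-variance method):

```python
def pool(effects, ses):
    """Inverse-variance fixed-effects pooled estimate."""
    w = [1.0 / se**2 for se in ses]
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

# Hypothetical data: two small trials and one very large (precise) trial
effects = [0.10, 0.15, 0.60]
ses = [0.30, 0.30, 0.05]  # the last study carries most of the weight

full = pool(effects, ses)
# Leave-one-out sensitivity analysis: re-pool with each study removed
for i in range(len(effects)):
    rest_y = effects[:i] + effects[i + 1:]
    rest_se = ses[:i] + ses[i + 1:]
    print(f"without study {i + 1}: {pool(rest_y, rest_se):.3f} "
          f"(full: {full:.3f})")
```

Removing the large third study shifts the pooled estimate substantially, which signals that a single trial dominates the meta-analysis and that this dominance should be reported and explored.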
Publication of the Systematic Review and Use of Systematic Reviews in Clinical Decision-Making
Because bias could be introduced in the editorial and publication stage of the systematic review, methodologists have reviewed the evidence of bias at this stage and proposed a set of standards for publication of systematic reviews of randomized trials (the QUOROM statement 22). The QUOROM statement specifies the elements that should be included in the report to facilitate evaluation of the methodologic quality of the review, and the data elements needed for other investigators to reproduce its results. For example, to indicate how the large number of original citations was culled down to the final number of included studies, a QUOROM study flow diagram is recommended. Explicit reporting of all steps in the review methods is required, as are evidence-based conclusions in the discussion section and contextualization of the review findings in light of the totality of other available evidence.
The users of systematic reviews and meta-analyses also need tools to locate and critically appraise systematic reviews and meta-analyses. The Cochrane Collaboration plans, conducts, and disseminates systematic reviews of effectiveness. These reviews can be found in the Cochrane Library. 41 Peer-reviewed journals publish systematic reviews and meta-analyses done by other groups. The abstracts for Cochrane and other systematic reviews can be searched in PubMed 28 and in the Database of Abstracts of Reviews of Effectiveness, or DARE. 30 Shojania and Bero 40 developed a search strategy that identifies, with great sensitivity, systematic reviews and meta-analyses indexed in PubMed. This search strategy can be found in the Clinical Queries tool of PubMed. 29 Secondary journals (journals that identify the most relevant literature published in other journals) may represent another source of systematic reviews. The ACP Journal Club, 1 for instance, frequently highlights systematic reviews and meta-analyses. The periodical Clinical Evidence 8 includes summaries of evidence that often highlight these studies. Certain agencies, such as the Agency for Healthcare Research and Quality, 2 frequently commission systematic reviews, which then are published on the agency’s Website. These reviews tend to study health technologies and can be found by searching the Internet (using a search engine such as Google 16 and typing the phrase health technology assessment or meta-analysis and the topic of interest).
Once found, systematic reviews need to be appraised critically to determine the degree to which the methods used limit bias, the results and their precision, and the applicability of the results to the clinical situation that triggered the inquiry. This appraisal strategy is included in the Users’ Guides series, which was summarized in a book by the members of the Evidence-Based Medicine Working Group; the appraisal guide as applied to systematic reviews is summarized in Table 2. 32 The Journal of Bone and Joint Surgery 43 periodically publishes similar guides using orthopaedic illustrations. 5,6
Systematic reviews of the available evidence play a fundamental role in establishing what is known and what is not known. They inform clinicians wanting to practice evidence-based medicine. Summaries of evidence also facilitate clinical practice by allowing it to be based on the available evidence while leaving the clinician and the patient time to weigh other elements into the decision, such as the patient’s values and preferences for healthcare and the clinical circumstances. They also are valuable teaching tools. Furthermore, most granting agencies demand a systematic review of the literature to critically appraise what has been done before, to build on prior strengths, and to create new and necessary knowledge to address residual uncertainty. Therefore, systematic reviews and meta-analyses help to identify areas in need of additional research. 11
The systematic summary of the evidence is an expensive and labor-intensive research effort that cannot realistically be done by clinicians during clinical practice. Clinicians, teachers, and researchers in the discipline of orthopaedics should consider reading and doing not only clinical trials of orthopaedic interventions, but also systematic reviews of previous research. The authors hope this discussion will foster enthusiasm for doing so.
1. ACP Journal Club: http://www.acpjc.org. Accessed November 28, 2002.
2. Agency for Healthcare Research and Quality: Evidence-based Practice. http://www.ahrq.gov/clinic/epcix.htm. Accessed November 26, 2002.
3. Antman E, Lau J, Kupelnick B, Mosteller F, Chalmers T: A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts: Treatments for myocardial infarction. JAMA 268:240–248, 1992.
4. Bhandari M, Guyatt G, Lochner H, Sprague S, Tornetta P: Application of the Consolidated Standards of Reporting Trials (CONSORT) in the fracture care literature. J Bone Joint Surg 84A:485–489, 2002.
5. Bhandari M, Guyatt GH, Swiontkowski MF: User’s guide to the orthopaedic literature: How to use an article about a surgical therapy. J Bone Joint Surg 83A:916–926, 2001.
6. Bhandari M, Guyatt GH, Swiontkowski MF: User’s guide to the orthopaedic literature: How to use an article about prognosis. J Bone Joint Surg 83A:1555–1564, 2001.
7. Bhandari M, Morrow F, Kulkarni AV, Tornetta III P: Meta-analyses in orthopaedic surgery: A systematic review of their methodologies. J Bone Joint Surg 83A:15–24, 2001.
8. Clinical Evidence: http://www.clinicalevidence.com. Accessed November 28, 2002.
9. Cochrane A: 1931–1971: A Critical Review, With Particular Reference to the Medical Profession. Medicines for the Year 2000. London, Office of Health Economics l–11, 1979.
10. Cook DJ, Guyatt GH, Ryan G, et al: Should unpublished data be included in meta-analyses? Current convictions and controversies. JAMA 269:2749–2753, 1993.
11. Cook DJ, Mulrow CD, Haynes RB: Systematic reviews: Synthesis of best evidence for clinical decisions. Ann Intern Med 126:376–380, 1997.
12. Cook DJ, Sackett DL, Spitzer WO: Methodologic guidelines for systematic reviews of randomized control trials in health care from the Potsdam Consultation on Meta-Analysis. J Clin Epidemiol 48:167–171, 1995.
13. Devereaux P, Manns B, Ghali W, et al: Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. JAMA 285:2000–2003, 2001.
14. Egger M, Dickersin K, Davey Smith G: Problems and Limitations in Conducting Systematic Reviews. In Egger M, Davey Smith G, Altman D (eds). Systematic Reviews in Health Care Meta-Analyses in Context. London, BMJ Publishing Group 43–68, 2001.
15. Egger M, Ebrahim S, Davey Smith G: Where now for meta-analysis? Int J Epidemiol 31:1–5, 2002.
16. Google: http://www.google.com. Accessed November 28, 2002.
17. Guyatt G, Haynes R, Jaeshke R, et al: The Philosophy of Evidence-Based Medicine. In Guyatt G, Rennie D (eds). Users’ Guides to the Medical Literature: A Manual For Evidence-Based Clinical Practice. Chicago, American Medical Association Press 3–12, 2002.
18. Ioannidis J, Lau J: Evolution of treatment effects over time: Empirical insight from recursive meta-analyses. Proc Nat Acad Sci USA 98:831–836, 2001.
19. Juni P, Altman D, Egger M: Assessing the Quality of Randomized Controlled Trials. In Egger M, Davey Smith G, Altman D (eds). Systematic Reviews in Health Care: Meta-Analysis in Context. London, BMJ Publishing Group 87–108, 2001.
20. Lochner HV, Bhandari M, Tornetta III P: Type-II error rates (beta errors) of randomized trials in orthopaedic trauma. J Bone Joint Surg 83A:1650–1655, 2001.
21. McAlister FA, Clark HD, van Walraven C, et al: The medical review article revisited: Has the science improved? Ann Intern Med 131:947–951, 1999.
22. Moher D, Cook DJ, Eastwood S, et al: Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement: Quality of Reporting of Meta-analyses. Lancet 354:1896–1900, 1999.
23. Moher D, Pham B, Jones A, et al: Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 352:609–613, 1998.
24. Montori V, Guyatt G, Oxman A, Cook D: FixedEffects and Random-Effects Models. In Guyatt G, Rennie D (eds). Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago, American Medical Association Press 539–545, 2001.
25. Montori VM, Bhandari M, Devereaux PJ, et al: In the dark: The reporting of blinding status in randomized controlled trials. J Clin Epidemiol 55:787–790, 2002.
26. Montori VM, Smieja M, Guyatt GH: Publication bias: A brief review for clinicians. Mayo Clin Proc 75:1284–1288, 2000.
27. Mulrow CD: The medical review article: State of the science. Ann Intern Med 106:485–488, 1987.
28. National Library of Medicine: PubMed. http://www.pubmed.gov. Accessed November 28, 2002.
29. National Library of Medicine: PubMed Clinical Queries. http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.html. Accessed November 28, 2002.
30. NHS Centre for Reviews and Dissemination: Database of Abstracts of Reviews of Effectiveness (DARE). http://nhscrd.york.ac.uk/darehp.htm. Accessed November 28, 2002.
31. Oxman A, Guyatt G: When to Believe a Subgroup Analysis. In Guyatt G, Rennie D (eds). Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago, American Medical Association Press 553–565, 2002.
32. Oxman A, Guyatt G, Cook D, Montori V: Summarizing the Evidence. In Guyatt G, Rennie D (eds). Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago, American Medical Association Press 155–173, 2002.
33. Oxman AD, Guyatt GH: Validation of an index of the quality of review articles. J Clin Epidemiol 44:1271–1278, 1991.
34. Oxman AD, Guyatt GH, Singer J, et al: Agreement among reviewers of review articles. J Clin Epidemiol 44:91–98, 1991.
35. Peto R, Collins R, Gray R: Large-scale randomized evidence: Large, simple trials and overviews of trials. J Clin Epidemiol 48:23–40, 1995.
36. Pham B, Platt R, McAuley L, Klassen T, Moher D: Is there a “best” way to detect and minimize publication bias? An empirical evaluation. Eval Health Prof 24:109–125, 2001.
37. Robinson K, Dickersin K: Development of a highly sensitive search strategy for the retrieval of reports of controlled trials using PubMed. Int J Epidemiol 31:150–153, 2002.
38. Schulz K, Chalmers I, Hayes R, Altman D: Empirical evidence of bias: Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273:408–412, 1995.
39. Shea B, Moher D, Graham I, Pham B, Tugwell P: A comparison of the quality of Cochrane reviews and systematic reviews published in paper-based journals. Eval Health Prof 25:116–129, 2002.
40. Shojania K, Bero L: Taking advantage of the explosion of systematic reviews: An efficient MEDLINE search strategy. Eff Clin Pract 4:157–162, 2001.
41. The Cochrane Collaboration: http://www.cochrane.org. Accessed November 28, 2002.
42. The Cochrane Collaboration: Reviewers’ Handbook. http://www.cochrane.dk/cochrane/handbook/handbook.htm. Accessed November 28, 2002.
43. The Journal of Bone and Joint Surgery: http://www.ejbjs.org. Accessed November 28, 2002.
44. Thompson S: Why and How Sources of Heterogeneity Should be Investigated. In Egger M, Davey Smith G, Altman D (eds). Systematic Reviews in Health Care Meta Analysis in Context. London, BMJ Publishing Group 157–175, 2001.
Mohit Bhandari, MD, MSc; and Paul Tornetta, III, MD—Guest Editors