As the name suggests, evidence-based medicine is about finding evidence and using that evidence to make clinical decisions. A cornerstone of evidence-based medicine is the hierarchical system of classifying evidence. This hierarchy is known as the levels of evidence. Physicians are encouraged to find the highest level of evidence to answer clinical questions. Several articles published in plastic surgery journals concerning evidence-based medicine topics have touched on this subject.1–6 Specifically, previous articles have discussed the lack of higher level evidence in Plastic and Reconstructive Surgery and the need to improve the evidence published in the Journal. Before that can be accomplished, it is important to understand the history behind the levels and how they should be interpreted. This article focuses on the origin of levels of evidence, their relevance to the evidence-based medicine movement, and the implications for the field of plastic surgery and the everyday practice of plastic surgery.
HISTORY OF LEVELS OF EVIDENCE
The levels of evidence were originally described in a report by the Canadian Task Force on the Periodic Health Examination in 1979.7 The report's purpose was to develop recommendations on the periodic health examination and base those recommendations on evidence in the medical literature. The authors developed a system of rating evidence (Table 1) when determining the effectiveness of a particular intervention. The evidence was taken into account when grading recommendations. For example, a grade A recommendation was given if there was good evidence to support a recommendation that a condition be included in the periodic health examination. The levels of evidence were further described and expanded by Sackett8 in an article on levels of evidence for antithrombotic agents in 1989 (Table 2). Both systems place randomized controlled trials at the highest level and case series or expert opinions at the lowest level. The hierarchies rank studies according to the probability of bias. Randomized controlled trials are given the highest level because they are designed to be unbiased and have less risk of systematic errors. For example, by randomly allocating subjects to two or more treatment groups, these types of studies also randomize confounding factors that may bias results. A case series or expert opinion is often biased by the author's experience or opinions, and there is no control of confounding factors.
MODIFICATION OF LEVELS
Since the introduction of levels of evidence, several other organizations and journals have adopted variations of the classification system. Diverse specialties are often asking different questions, and it was recognized that the type and level of evidence needed to be modified accordingly. Research questions are divided into the following categories: treatment, prognosis, diagnosis, and economic/decision analysis. For example, Table 3 shows the levels of evidence developed by the American Society of Plastic Surgeons for prognosis9 and Table 4 shows the levels developed by the Centre for Evidence-Based Medicine for treatment.10 The two tables highlight the types of studies that are appropriate for the question (prognosis versus treatment) and how quality of data is taken into account when assigning a level. For example, randomized controlled trials are not appropriate when looking at the prognosis of a disease. The question in this instance is, “What will happen if we do nothing at all?” Because a prognosis question does not involve comparing treatments, the highest evidence would come from a cohort study or a systematic review of cohort studies. The levels of evidence also take into account the quality of the data. For example, in the chart from the Centre for Evidence-Based Medicine, a poorly designed randomized controlled trial has the same level of evidence as a cohort study.
A grading system that provides strength of recommendations based on evidence has also changed over time. Table 5 shows the Grade Practice Recommendations developed by the American Society of Plastic Surgeons. The grading system provides an important component in evidence-based medicine and assists in clinical decision making. For example, a strong recommendation is given when there is level I evidence and consistent evidence from level II, III, and IV studies available. The grading system does not degrade lower level evidence when deciding recommendations if the results are consistent.
INTERPRETATION OF LEVELS
Many journals assign a level to the articles they publish, and authors often assign a level when submitting an abstract to conference proceedings. This allows the reader to know the level of evidence of the research, but the designated level of evidence does not always guarantee the quality of the research. It is important that readers not assume that level I evidence is always the best choice or appropriate for the research question. This concept will be very important for all of us to understand as plastic surgery moves toward evidence-based medicine. Because innovation and technique articles are needed to move our surgical specialty forward, it will always have important articles with a lower level of evidence.
Although randomized controlled trials are often assigned the highest level of evidence, not all randomized controlled trials are conducted properly, and the results should be scrutinized carefully. Sackett8 stressed the importance of estimating types of errors and the power of studies when interpreting results from randomized controlled trials. For example, a poorly conducted randomized controlled trial may report a negative result because of low power when in fact a real difference exists between treatment groups. Scales such as the Jadad scale have been developed to judge the quality of randomized controlled trials.11 Although physicians may not have the time or inclination to use a scale to assess quality, there are some basic items that should be taken into account. Items used for assessing randomized controlled trials include randomization, blinding, a description of the randomization and blinding process, a description of the number of subjects who withdrew or dropped out of the study, the confidence intervals around study estimates, and a description of the power analysis. For example, Bhandari et al.12 published an article assessing the quality of surgical randomized controlled trials. The authors evaluated the quality of randomized controlled trials reported in the Journal of Bone and Joint Surgery from 1988 to 2000. They identified 72 randomized controlled trials during this time period, and the mean score was 68 percent. Articles with a score of greater than 75 percent were deemed high quality, and 60 percent of the articles had a score less than 75 percent. The main reasons for the low quality scores were the lack of appropriate randomization and blinding and the lack of a description of patient exclusion criteria. Another article found that articles in the Journal of Bone and Joint Surgery with a level I rating had the same quality scores as those with a level II rating.13 Therefore, one should not assume that level I studies are of higher quality than level II studies.
Resources for surgeons to use when appraising levels of evidence include the users' guides published in the Canadian Journal of Surgery14,15 and the Journal of Bone and Joint Surgery.16 Similar articles that are not specific to surgery have been published in the Journal of the American Medical Association.17,18
PLASTIC SURGERY AND EVIDENCE-BASED MEDICINE
The field of plastic surgery has been slow to adopt evidence-based medicine. This was demonstrated in an article examining the level of evidence of articles published in Plastic and Reconstructive Surgery.19 The authors assigned levels of evidence to articles published in Plastic and Reconstructive Surgery over a 20-year period. The majority of studies (93 percent in 1983) were level IV or V, which denotes case series and case reports. Although the results were disappointing, there was some improvement over time. By 2003, there were more level I studies (1.5 percent) and fewer level IV and V studies (87 percent). A recent analysis looked at the number of level I studies in five different plastic surgery journals from 1978 to 2009. The authors defined level I studies as randomized controlled trials and meta-analyses and restricted their search to these studies. The number of level I studies increased from one in 1978 to 32 by 2009.20 From these results, we see that the field of plastic surgery is improving the level of evidence but still has a long way to go, especially in improving the quality of studies published. For example, approximately one-third of the studies involved double blinding, but the majority did not randomize subjects, describe the randomization process, or perform a power analysis. Power analysis is another area of concern in plastic surgery. A review of the plastic surgery literature found that the majority of published studies have inadequate power to detect moderate to large differences between treatment groups.21 Regardless of the level of evidence for a study, if the study is underpowered, the interpretation of results is questionable.
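As a rough illustration of why an underpowered study is difficult to interpret, the following Python sketch estimates the power of a two-group comparison of means using a normal approximation. It is not taken from the cited review. The function name `approximate_power`, the standardized effect size of 0.5 (a conventional choice for a "moderate" difference), and the sample sizes are assumptions chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means.

    Uses a normal approximation with equal group sizes and a standardized
    effect size (difference in means divided by the common standard
    deviation). This is a sketch, not a substitute for a formal power
    analysis.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value, e.g. about 1.96
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical sample sizes per group for an assumed moderate effect (0.5).
for n in (20, 64, 100):
    print(n, round(approximate_power(0.5, n), 2))
```

Under these assumptions, a study with 20 subjects per group has power of roughly 35 percent to detect a moderate difference, so a negative result says little about whether a real difference exists. Approximately 64 subjects per group are needed to reach the conventional 80 percent.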
Although the goal is to improve the overall level of evidence in plastic surgery, this does not mean that all lower level evidence should be discarded. Case series and case reports are important for hypothesis generation and can lead to more controlled studies. In addition, in the face of overwhelming evidence to support a treatment, such as the use of antibiotics for wound infections, there is no need for a randomized controlled trial.
CLINICAL EXAMPLES USING LEVELS OF EVIDENCE
To understand how the levels of evidence work and aid the reader in interpreting levels, we provide some examples from the plastic surgery literature. The examples also show the peril of medical decisions based on results from case reports.
An association was hypothesized between lymphoma and silicone breast implants based on case reports.22–27 The level of evidence for case reports, depending on the scale used, is IV or V. These case reports were used to generate the hypothesis that a possible association existed. Because of these results, several large retrospective cohort studies from the United States, Canada, Denmark, Sweden, and Finland were conducted.28–32 The level of evidence for a retrospective cohort study is II. All of these studies had many years of follow-up for a large number of patients. Some of the studies found an elevated risk and others found no risk for lymphoma. None of the studies reached statistical significance. Therefore, higher level evidence from cohort studies does not provide evidence of any risk of lymphoma. Finally, a systematic review was performed that combined the evidence from the retrospective cohorts.27 The results found an overall standardized incidence ratio of 0.89 (95 percent confidence interval, 0.67 to 1.18). Because the confidence interval includes 1, the results indicate no statistically significant increase in incidence. The level of evidence for the systematic review is I. Based on the best available evidence, there is no association between lymphoma and silicone implants. This example shows how studies with a low level of evidence were used to generate a hypothesis, which then led to higher level evidence that disproved the hypothesis. This example also demonstrates that randomized controlled trials are not feasible for rare events such as cancer and emphasizes the importance of observational studies for a specific study question. A case-control study is a better option and provides higher level evidence for testing the prognosis of the long-term effect of silicone breast implants.
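The reading of the confidence interval in this example can be sketched in a few lines of Python. The numbers are those reported for the systematic review. The function names `ci_excludes_null` and `approx_log_se` are hypothetical, and the assumption that the interval was constructed on the logarithmic scale with a multiplier of 1.96 is ours, not something stated in the review.

```python
from math import exp, log


def ci_excludes_null(lower, upper, null_value=1.0):
    """Return True if the interval excludes the null value.

    For a ratio measure such as a standardized incidence ratio, the null
    value is 1 (observed incidence equals expected incidence).
    """
    return not (lower <= null_value <= upper)


def approx_log_se(lower, upper, z=1.96):
    """Approximate standard error on the log scale.

    Assumes the 95 percent interval is symmetric on the log scale,
    which is an assumption for this sketch.
    """
    return (log(upper) - log(lower)) / (2 * z)


# Values reported for the systematic review of retrospective cohorts.
sir, lower, upper = 0.89, 0.67, 1.18

print("Statistically significant:", ci_excludes_null(lower, upper))
print("Approximate log-scale standard error:", round(approx_log_se(lower, upper), 3))
# The geometric midpoint of the limits should sit close to the reported ratio.
print("Geometric midpoint of limits:", round(exp((log(lower) + log(upper)) / 2), 2))
```

Because 1 lies between 0.67 and 1.18, the check reports that the result is not statistically significant, which is consistent with the interpretation given above.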
Another example is the injection of epinephrine in fingers. Based on case reports before 1950, physicians were advised that epinephrine injection can result in finger ischemia.33 We see in this example that level IV or V evidence was accepted as fact and incorporated into medical textbooks and teaching. However, not all physicians accepted this evidence, and some were performing injections of epinephrine into the fingers with no adverse effects on the hand. Higher level evidence was needed to resolve this issue. An in-depth review of the literature from 1880 to 2000 by Denkler33 identified 48 cases of digital infarction, of which 21 had been injected with epinephrine. Further analysis found that the addition of procaine to the epinephrine injection was the cause of the ischemia.34 The procaine used in these injections included toxic acidic batches that were recalled in 1948. In addition, several cohort studies found no complications from the use of epinephrine in the fingers and hand.35–37 The results from these cohort studies increased the level of evidence. Based on the best available evidence from these studies, the hypothesis that epinephrine injection will harm fingers was rejected. This example highlights the biases inherent in case reports. It also shows the risk when spurious evidence is handed down and integrated into medical teaching.
OBTAINING THE BEST EVIDENCE
We have established the need for randomized controlled trials to improve evidence in plastic surgery but have also acknowledged the difficulties, particularly with randomization and blinding. Although randomized controlled trials may not be appropriate for many surgical questions, well-designed and well-conducted cohort or case-control studies could boost the level of evidence. Many of the current studies tend to be descriptive and lack a control group. The way forward seems clear. Plastic surgery researchers need to consider using a cohort or case-control design whenever a randomized controlled trial is not possible. If designed properly, the level of evidence for observational studies can approach or surpass that of a randomized controlled trial. In some instances, observational studies and randomized controlled trials have yielded similar results.38 If enough cohort or case-control studies become available, the prospect of systematic reviews of these studies will increase, which will increase overall evidence levels in plastic surgery.
The levels of evidence are an important component of evidence-based medicine. Understanding the levels and why they are assigned to publications and abstracts helps the reader to prioritize information. This is not to say that all level IV evidence should be ignored and all level I evidence accepted as fact. The levels of evidence provide a guide, and the reader needs to be cautious when interpreting these results.
This work was supported in part by a Midcareer Investigator Award in Patient-Oriented Research (K24 AR053120) from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (to K.C.C.).
1. McCarthy CM, Collins ED, Pusic AL. Where do we find the best evidence? Plast Reconstr Surg. 2008;122:1942–1947; discussion 1948–1951.
2. Chung KC, Swanson JA, Schmitz D, Sullivan D, Rohrich RJ. Introducing evidence-based medicine to plastic and reconstructive surgery. Plast Reconstr Surg
3. Chung KC, Ram AN. Evidence-based medicine: The fourth revolution in American medicine? Plast Reconstr Surg
4. Rohrich RJ. So you want to be better: The role of evidence-based medicine in plastic surgery. Plast Reconstr Surg
5. Burns PB, Chung KC. Developing good clinical questions and finding the best evidence to answer those questions. Plast Reconstr Surg
6. Sprague S, McKay P, Thoma A. Study design and hierarchy of evidence for surgical decision making. Clin Plast Surg
7. The periodic health examination. Canadian Task Force on the Periodic Health Examination. Can Med Assoc J
8. Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest
10. Centre for Evidence Based Medicine (Web site). Available at: http://www.cebm.net. Accessed December 17, 2010.
11. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Control Clin Trials
12. Bhandari M, Richards RR, Sprague S, Schemitsch EH. The quality of reporting of randomized trials in the Journal of Bone and Joint Surgery from 1988 through 2000. J Bone Joint Surg Am
13. Poolman RW, Struijs PA, Krips R, Sierevelt IN, Lutz KH, Bhandari M. Does a “Level I Evidence” rating imply high quality of reporting in orthopaedic randomised controlled trials? BMC Med Res Methodol
14. Urschel JD, Goldsmith CH, Tandan VR, Miller JD. Users' guide to evidence-based surgery: How to use an article evaluating surgical interventions. Evidence-Based Surgery Working Group. Can J Surg
15. Thoma A, Farrokhyar F, Bhandari M, Tandan V; Evidence-Based Surgery Working Group. Users' guide to the surgical literature: How to assess a randomized controlled trial in surgery. Can J Surg
16. Bhandari M, Guyatt GH, Swiontkowski MF. User's guide to the orthopaedic literature: How to use an article about prognosis. J Bone Joint Surg Am
17. Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature: II. How to use an article about therapy or prevention. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA
18. Guyatt GH, Haynes RB, Jaeschke RZ, et al. Users' Guides to the Medical Literature: XXV. Evidence-based medicine: Principles for applying the Users' Guides to patient care. Evidence-Based Medicine Working Group. JAMA
19. Loiselle F, Mahabir RC, Harrop AR. Levels of evidence in plastic surgery research over 20 years. Plast Reconstr Surg
20. McCarthy JE, Chatterjee A, McKelvey TG, Jantzen EM, Kerrigan CL. A detailed analysis of level I evidence (randomized controlled trials and meta-analyses) in five plastic surgery journals to date: 1978 to 2009. Plast Reconstr Surg
21. Chung KC, Kalliainen LK, Spilson SV, Walters MR, Kim HM. The prevalence of negative studies with inadequate statistical power: An analysis of the plastic surgery literature. Plast Reconstr Surg. 2002;109:1–6; discussion 7–8.
22. Newman MK, Zemmel NJ, Bandak AZ, Kaplan BJ. Primary breast lymphoma in a patient with silicone breast implants: A case report and review of the literature. J Plast Reconstr Aesthet Surg
23. Gaudet G, Friedberg JW, Weng A, Pinkus GS, Freedman AS. Breast lymphoma associated with breast implants: Two case-reports and a review of the literature. Leuk Lymphoma
24. Sahoo S, Rosen PP, Feddersen RM, Viswanatha DS, Clark DA, Chadburn A. Anaplastic large cell lymphoma arising in a silicone breast implant capsule: A case report and review of the literature. Arch Pathol Lab Med
25. Keech JA Jr, Creech BJ. Anaplastic T-cell lymphoma in proximity to a saline-filled breast implant. Plast Reconstr Surg
26. Duvic M, Moore D, Menter A, Vonderheid EC. Cutaneous T-cell lymphoma in association with silicone breast implants. J Am Acad Dermatol
27. Lipworth L, Tarone RE, McLaughlin JK. Breast implants and lymphoma risk: A review of the epidemiologic evidence through 2008. Plast Reconstr Surg
28. Lipworth L, Tarone RE, Friis S, et al. Cancer among Scandinavian women with cosmetic breast implants: A pooled long-term follow-up study. Int J Cancer
29. Deapen DM, Hirsch EM, Brody GS. Cancer risk among Los Angeles women with cosmetic breast implants. Plast Reconstr Surg
30. Brisson J, Holowaty EJ, Villeneuve PJ, et al. Cancer incidence in a cohort of Ontario and Quebec women having bilateral breast augmentation. Int J Cancer
31. Pukkala E, Boice JD Jr, Hovi SL, et al. Incidence of breast and other cancers among Finnish women with cosmetic breast implants, 1970-1999. J Long Term Eff Med Implants
32. Brinton LA, Lubin JH, Burich MC, Colton T, Brown SL, Hoover RN. Cancer risk at sites other than the breast following augmentation mammoplasty. Ann Epidemiol
33. Denkler K. A comprehensive review of epinephrine in the finger: To do or not to do. Plast Reconstr Surg
34. Thomson CJ, Lalonde DH, Denkler KA, Feicht AJ. A critical look at the evidence for and against elective epinephrine use in the finger. Plast Reconstr Surg
35. Lalonde D, Bell M, Benoit P, Sparkes G, Denkler K, Chang P. A multicenter prospective study of 3,110 consecutive cases of elective epinephrine use in the fingers and hand: The Dalhousie Project clinical phase. J Hand Surg Am
36. Chowdhry S, Seidenstricker L, Cooney DS, Hazani R, Wilhelmi BJ. Do not use epinephrine in digital blocks: Myth or truth? Part II: A retrospective review of 1111 cases. Plast Reconstr Surg
37. Wilhelmi BJ, Blackwell SJ, Miller JH, et al. Do not use epinephrine in digital blocks: Myth or truth? Plast Reconstr Surg
38. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med