Development, Reporting, and Evaluation of Clinical Practice Guidelines

Benzon, Honorio T. MD*; Joshi, Girish P. MBBS, MD, FFARCSI; Gan, Tong J. MD, MBA, MHS, FRCA; Vetter, Thomas R. MD, MPH§

doi: 10.1213/ANE.0000000000004441
General Articles

Clinical practice parameters have been published with greater frequency by professional societies and groups of experts. These publications run the gamut of practice standards, practice guidelines, consensus statements or practice advisories, position statements, and practice alerts. The definitions of these terms have been clarified in an accompanying article. In this article, we present the criteria for high-quality clinical practice parameters and outline a process for developing them, specifically the Delphi method, which is increasingly being used to build consensus among content experts and stakeholders. Several tools for grading the level of evidence and strength of recommendation are offered and compared. The speciousness of categorizing guidelines as evidence-based or consensus-based will be explained. We examine the recommended checklist for reporting and appraise the tools for evaluating a practice guideline. This article is geared toward developers and reviewers of clinical practice guidelines and consensus statements.

From the *Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, Illinois

Department of Anesthesiology, University of Texas Southwestern Medical Center, Dallas, Texas

Department of Anesthesiology, Stony Brook University, Stony Brook, New York

§Department of Surgery and Perioperative Care, University of Texas, Austin, Texas.

Published ahead of print 9 October 2019.

Accepted for publication August 12, 2019.

Funding: None.

The authors declare no conflicts of interest.

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s website.

Reprints will not be available from the authors.

Address correspondence to Honorio T. Benzon, MD, Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Feinberg Pavilion, Suite 5-704, 251 E Huron St, Chicago, IL 60611. Address e-mail to

See Editorial, p 1462

ACC = American College of Cardiology; AGREE = Appraisal of Guidelines for REsearch and Evaluation; AHA = American Heart Association; ANHMRC = Australian National Health and Medical Research Council; ASA = American Society of Anesthesiologists; COGS = Conference on Guideline Standardization; EBM = Evidence-Based Medicine; GRADE = Grades of Recommendation, Assessment, Development, and Evaluation; GIN = Guidelines International Network; iCAHE = International Centre for Allied Health Evidence; IOM = Institute of Medicine; MeSH = Medical Subject Headings; NICE = National Institute for Health and Care Excellence; NIH = National Institutes of Health; OCEBM = Oxford Centre for Evidence-Based Medicine; PICO = population, intervention, comparator, or outcome; RAND = Research ANd Development; RCT = randomized clinical trial; RIGHT = Reporting Items for Practice Guidelines in HealThcare; SIGN = Scottish Intercollegiate Guideline Network; USPSTF = US Preventive Services Task Force; USTFCPS = US Task Force on Community Preventive Services; WHO = World Health Organization

Clinical practice parameters bridge the gap between the constant flow of research publications and actual clinical practice.1 Practice guidelines and consensus statements have been published with ever-increasing frequency such that the practicing clinician has a difficult time adjusting to new and sometimes controversial information. While some of these clinical practice parameters are generated by organizations or societies with well-defined protocols and adequate support staff, others do not have such expertise or infrastructure. Some practice parameters also originate from groups of individuals who are experts on the topic examined. In both instances, the developers of such clinical practice parameters may not have adequate knowledge of the rigorous process in developing quality guidelines.

Clinical practice parameters include practice standards, practice guidelines, consensus statements or practice advisories, position statements, and practice alerts. These terms have different meanings, requirements, and implications. Proposed standardized definitions for these terms are presented in an accompanying article.2

Adherence to a well-defined and standard process of developing and reporting a guideline is just as important as using the correct nomenclature. In this article, we discuss the process of development of practice guidelines and consensus statements, offer tools for grading the level of evidence and strength of recommendations, and list the items to be included in the final reported product. We also assess the tools for evaluating practice guidelines and consensus statements. The rigorous approach that we will outline may be too comprehensive for a position statement and not relevant for a practice alert. It is our primary objective to assist developers and reviewers of clinical practice guidelines and consensus statements.

Practice guidelines originated as policies or official statements from organizations on the proper management of specific clinical conditions or the indications for a procedure or other treatment.1 The history and development of clinical practice guidelines have been chronicled by Woolf.3

Practice guidelines were initially developed via informal consensus, by which the decisions were often poorly defined and the rationale for the recommendations arbitrary, resulting in guidelines of poor quality. Formal development methods began in the 1970s with the National Institutes of Health Consensus Development Program, wherein an expert panel made recommendations after a 2 1/2-day conference.3 In the 1980s, the American Medical Association Diagnostic and Therapeutic Technology Assessment Program conducted appropriateness assessments by simply collecting expert opinions.3 Importantly, the Research ANd Development (RAND) Corporation introduced a more formal approach to consensus development regarding the appropriateness of therapeutic interventions by way of a 2-step Delphi method.3

While scientific evidence has been historically considered, there was initially no clear connection between the quality of the evidence and the strength of the recommendation. Clear linkage of the grade of the evidence to the recommendations was made by the Canadian Task Force on the Periodic Health Examination in 1979,4 the Clinical Efficacy Assessment Project of the American College of Physicians in 1980,5 and the US Preventive Services Task Force (USPSTF) in 1984.6

The formal processes for guideline development include the Consensus Group Conference, the Nominal Group Technique, and the Delphi method.7,8 The Consensus Group Conference provides a face-to-face meeting, does not allow for follow-up feedback, and is costly.3,7 The Nominal Group Technique involves a structured meeting at which the invited participants are asked to list their ideas on a topic.9,10 Each participant presents the most important idea on his/her list. All the posited ideas are listed, then followed by a structured discussion and ranking by the group.10

The Delphi method—or modified Delphi method when it involves several rounds of discussions—is a frequently used and accepted process.8,11 It involves the following:

  • selection of relevant experts;
  • appointment of a facilitator or leader who has content expertise;
  • gathering of data through a rigorous literature review that assesses relevant published studies retrieved from search engines using Medical Subject Headings (MeSH) terms;
  • presentation of statements and recommendations to the whole group; and
  • appropriate grading of the evidence and strength of the recommendation.

The statements and recommendations are modified after comments and revisions from all the members. The process is repeated 2 to 4 times until there is convergence of opinion or when a point of diminished or no return is reached.

A reasonable amount of time is allowed between each step in the Delphi method. Three to 4 months is reasonable between the appointment of topic leaders and presentation of their initial recommendations to the whole group; the time allowed also depends on the expanse and complexity of the topic. Depending on whether a face-to-face meeting with voting was held, a 1-month interval is usually adequate between successive votes until final decisions are made. The final recommendations are voted on by all the members of the group; any dissent is recorded, and the dissenting rationale is provided. A majority of the votes is probably adequate for a recommendation to be approved; the number of votes may be included to show the robustness of the recommendation(s). With some practice guidelines, the set of recommendations is approved by the Board of Directors of each participating organization. In some organizations, the guideline is voted on by the society’s House of Delegates, but without any provision or opportunity to amend (wordsmith) any of the recommendations.
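The final voting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Ballot and tally_votes, the simple-majority threshold of 0.5, and the example panel are hypothetical and are not taken from any published Delphi protocol. The sketch counts the votes, records each dissent with its rationale, and reports the vote count alongside the decision.

```python
from dataclasses import dataclass


@dataclass
class Ballot:
    """One panel member's vote on a single recommendation (hypothetical structure)."""
    member: str
    agree: bool
    rationale: str = ""  # dissenting rationale, expected when agree is False


def tally_votes(ballots, threshold=0.5):
    """Tally one recommendation.

    Assumption: a recommendation is approved when the share of agreeing
    votes is strictly greater than `threshold` (simple majority by default).
    """
    if not ballots:
        raise ValueError("at least one ballot is required")
    votes_for = sum(1 for b in ballots if b.agree)
    votes_against = len(ballots) - votes_for
    return {
        "approved": votes_for / len(ballots) > threshold,
        "votes_for": votes_for,
        "votes_against": votes_against,
        # Dissent is kept with its rationale so it can be reported.
        "dissent": [(b.member, b.rationale) for b in ballots if not b.agree],
    }


# Hypothetical 5-member panel voting on one recommendation.
panel = [
    Ballot("Member A", True),
    Ballot("Member B", True),
    Ballot("Member C", False, "Supporting trials enrolled a different population."),
    Ballot("Member D", True),
    Ballot("Member E", True),
]
result = tally_votes(panel)
print(result["approved"], result["votes_for"], result["votes_against"])
```

A group that requires a supermajority could pass a higher threshold (for example, 0.75) without changing the rest of the sketch.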

Ways to improve the Delphi method include the involvement of patients, use of open questions posed to participants, no guiding or prompting of participants, and minimization of attrition.8

Advantages of the Delphi method are the ability of experts from all over the world to participate, allowance of private decisions by each participant, and flexibility because the number of rounds can be adjusted.10 These salutary characteristics are the reasons for the widespread use of the Delphi method in the development of practice guidelines.

Limitations of the Delphi method include the arbitrary selection and definition of expertise of the participants, difficulty in coordinating a large group, occurrence of fatigue after 2 or 3 rounds, overestimation of the consensus with attrition of participants, and the absence of personal (face-to-face) contact if no live meeting is held.10

With both the Delphi method and Nominal Group Technique, the topic is precisely defined; the ideas or responses are collated and summarized, and any irrelevant material is removed; and decision-making is facilitated by the topic/section leaders and organizers of the group. In both approaches, members should be recruited with the expectation that they know the subject matter, will contribute, and will remain involved until the process is completed.

The 8 standards proposed by the Institute of Medicine (IOM) for developing trustworthy clinical practice guidelines include the following12,13:

  • transparency of the process for creating the guideline,
  • management of conflict of interest,
  • systematic review—guideline development intersection,
  • establishing evidence foundations for and rating strength of guideline recommendations,
  • articulation of recommendations, and
  • external review and updating.

Challenges facing organizations in adhering to the standards recommended by the IOM include the requisite resources and expense needed to develop guidelines that meet the IOM’s criteria.14

Tools to aid the development of practice guidelines include the Guidelines International Network (GIN),15 Guidelines 2.0 Checklist,16 National Institute for Health and Care Excellence (NICE),17 and the Scottish Intercollegiate Guideline Network (SIGN).18

Guidelines International Network

Table 1.

The GIN recommendations include 11 criteria for high-quality guidelines (Table 1).15 They recommend that a panel of 10 to 20 members be selected based on their scientific and clinical knowledge and that the chair be neutral, knowledgeable, and without preconceived notions. The composition of the panel should include content experts, methodologists, and ideally, health care consumers and economists. Any financial or nonfinancial conflict of interest of potential members should be disclosed, discussed, resolved, and noted. A formal process of development should be agreed on a priori. The literature should be systematically reviewed and the evidence graded, and the recommendations should be unambiguous and actionable.15 An expiration date and a process for updating the guideline are recommended.

Guidelines 2.0 Checklist

The Guidelines 2.0 checklist is more comprehensive than the GIN recommendations.16 It includes 18 topics, with each topic consisting of 6 to 13 subitems. The additional subject matters include organization, budget, planning, and training; identification and ranking of priorities; recognition of target audience; question generation; wording of recommendations; print or online reporting and peer review; dissemination and implementation; and evaluation and use.

Of note, economic analyses (ie, cost-effectiveness of the recommendations) are not included in the GIN recommendations or the Guidelines 2.0 checklist; however, they are an added feature in the NICE guidelines.19 Economic analyses are appropriate and should be discussed in relation to their relevance and unique application to a country, state, or province.

The methodology used by the American Society of Anesthesiologists (ASA) in developing clinical practice parameters is discussed by Apfelbaum et al20 and Apfelbaum and Connis.21 The process starts with nominating a Writing Committee. A member of the ASA Committee on Standards and Practice Parameters and a nonclinical PhD methodologist are included in the Writing Committee. This is followed by extensive review of the literature, formal reliability assessment to determine any bias in the evidence being considered, and grading the levels of evidence. Summary and conclusions are made, but the product does not comment on the strength of recommendations.

The ASA employs its own statisticians and creates de novo meta-analyses if an adequate number of randomized controlled trials have been published. In its levels of evidence, the ability to perform a meta-analysis separates level 1 from level 2 in the category A designation (Supplemental Digital Content, Table 1) and determines whether the practice parameter is labeled a practice guideline (when a meta-analysis can be performed) instead of a practice advisory.21

Opinion surveys are conducted to address the feasibility of implementation of their guideline. The final document is submitted to the ASA Board of Directors and then to the ASA House of Delegates for a vote. If the clinical practice parameter is not approved, it is then revised and resubmitted the following year.

The Evidence-Based Medicine (EBM) movement started in the 1990s as a new approach to medical practice and teaching of the practice of medicine.22 With EBM, the paradigm shifted from intuition and unsystematic clinical experience as the basis for clinical decision-making to critical appraisal of the literature.

EBM is the “conscientious, explicit, and judicious use of current evidence in making decisions about the care of individual patients.”23 It involves integrating clinical expertise with the best available clinical evidence from systematic research.24

EBM relies mainly on randomized clinical trials (RCTs) and meta-analysis, and this overdependence has been faulted.25 Criticisms against these types of publications include limitations of studies with high risk of bias; deductions from results with wide confidence intervals; inconsistency of outcomes from different investigations; difficulty of applying the conclusions from specific population studies to other age groups26; duplicate publication of data by investigators; publication preferences of journal editors; nonsubmission of industry-supported studies with negative results; reluctance of editors to publish “negative trials”; industry sponsorship of RCTs, meta-analysis, and journal supplements27,28; favorable conclusions in meta-analysis sponsored by industry29; and the existence of predatory journals that publish “evidence” that is below the usual standards.24

While they focused on educating clinicians, the architects of EBM initially neglected clinical practice guidelines.30 The IOM’s call for standardization of clinical practice through development and application of clinical practice guidelines12,13 provided an impetus for the EBM Group to develop a new approach to rating the quality of evidence and grading the strength of recommendations. Their rubric, termed the Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) system, was reported in 2004.31

Subsequent to their GRADE levels of evidence,31 the EBM group discussed the differences between the Delphi method and the Nominal Group Technique and noted the accommodation of a larger number of participants in the Delphi process.32 More recent publications added consultation with stakeholders, implementation and evaluation of the impact of the guideline, and updating the guideline.33,34 The group’s discussion of the elements involved in developing guidelines is not as detailed as in the GIN recommendations or the Guidelines 2.0 checklist.

Typically, the evidence presented is assigned a level or grade, which is followed by a recommendation or set of recommendations. The strength of the recommendation(s) is also graded.35 Several groups have published tools to assess the robustness of the evidence and the soundness of the recommendations. These include the GRADE (Supplemental Digital Content, Table 2),35,36 USPSTF (Supplemental Digital Content, Table 3),37 American College of Cardiology (ACC) and American Heart Association (AHA) (Supplemental Digital Content, Table 4),38 Oxford Centre for Evidence-Based Medicine (OCEBM) (Supplemental Digital Content, Table 5a, b),39 and the ASA scoring systems.21

The overall approach to identifying and rating of the quality of evidence by the various groups is similar. The GRADE, ACC/AHA, and OCEBM recognize the impact of RCTs on the levels of evidence, compared to case reports and expert opinions.

The approach to grading the strength of a recommendation varies considerably. In the GRADE approach, a particular level of quality does not imply a particular strength of recommendation, such that low-quality evidence can lead to a strong recommendation. A weak recommendation requires evaluation of the evidence and shared decision-making, such that the informed choice reflects the patient’s values and preferences.36 To avoid unintended conflation with “weak evidence,” alternative terms were proposed for weak recommendation: conditional, discretionary, or qualified.

In contrast, the USPSTF does not make a recommendation if evidence of benefits and harms is lacking.40 The USPSTF also provides a rating of “I statement” when there is insufficient evidence to recommend a service or screening. The OCEBM modified their 2009 scoring tool in 2011 such that the group did not provide criteria for the strength of recommendations.39

The GRADE approach is the most commonly used tool and has been adopted by several national organizations. The GIN, World Health Organization (WHO), NIH, and NICE use the GRADE rating system in their guidelines.41,42 However, it has several limitations. Interrater agreement of the GRADE is low, it lacks internal consistency, and there is very little proof of its effectiveness.43 Per the developers of the GRADE system, their grading tool is appropriate for addressing clinical questions and for evaluating preventive and therapeutic interventions rather than for inquiries on public health and public health systems.30

Modifying the grading tools has been discouraged.16 As the developers of the GRADE approach have noted, the elements of their process are interlinked; modifications may perplex some users; and the adjustments may compromise the goal of a system with which users, policy makers, and patients are familiar.34 This suggestion has not deterred developers from making minor variations of the different scoring tools.31

The different tools for grading of evidence have been compared44—including the GRADE, OCEBM, USPSTF, US Task Force on Community Preventive Services (USTFCPS),45 Australian National Health and Medical Research Council (ANHMRC), and the SIGN. The investigators noted a poor agreement in the sensibility of the 6 systems. The OCEBM was the only tool that was suitable for the 4 types of questions asked: diagnosis, prognosis, effectiveness, and harms. None of the systems were noted to be usable for all of the target groups: professionals, patients, and policy makers, although the USPSTF and GRADE systems were noted to be suitable for professionals.

We do not recommend a specific grading tool. The tools used by the ACC/AHA appear to be simple and may be adopted by those not experienced in developing guidelines or may be used in guidelines with less supporting evidence (eg, consensus statements and position statements).

The 2009 OCEBM grading tool is also simple, and as noted previously, it is the most suitable tool when considering questions related to diagnosis and therapy. However, the 2011 OCEBM update does not provide grading of the strength of recommendation(s) (see Supplemental Digital Content, Table 5a, b). For this reason, some authors use the simpler 2011 OCEBM criteria for their grading of evidence but continue to use the 2009 criteria for their strength of recommendations.

It is ideal if a specialty or subspecialties apply the same scoring rubric—for example, the ACC/AHA and the European Society of Cardiology use the same definitions for their levels of recommendations.46 The inability of anesthesiology subspecialty groups to generate their own meta-analysis from published studies hinders the application of the present ASA scoring system to their practice guidelines.

A consensus-based guideline does not mandate that its conclusions be based on the published evidence. Hence, the use of consensus to develop recommendations has been questioned.47–49 The availability of publications with higher or stronger levels of evidence (ie, RCTs) prompted developers to label their practice guidelines as evidence-based and not consensus-based.26 However, this practice of categorizing guidelines based on the type of evidence has been opposed.26 This is partly because the EBM group refined their definition of “evidence-based” to “best available evidence.” This includes the use of analytical observational studies, case reports, and clinician experience in making recommendations consistent with “the circumstances and their values.”26 Other reasons for accommodating consensus-based recommendations include the requirements that deliberate interpretations are essential regardless of the type of evidence and that optimal decisions should balance benefits and harms, and not just the quality of evidence.26

Two initiatives attempted to improve the reporting of practice guidelines by providing a list of information to be included. The Conference on Guideline Standardization (COGS) checklist includes 18 items for standard reporting of guidelines (Supplemental Digital Content, Table 6). A more recent tool is the Reporting Items for Practice Guidelines in HealThcare (RIGHT) checklist,51 which consists of 22 items. The additional items from the RIGHT list include identification of the nature of the guideline (eg, practice standard, guideline, alert, or consensus statement); summary of the recommendations; statement of the key health care questions that were the basis for the recommendations, presented in the population, intervention, comparator, or outcome (PICO) or other format; and whether the guideline underwent an independent review and was subjected to a quality assurance process. Of note, the original COGS checklist has not been updated, so it is prudent to model the reporting of a clinical practice guideline on the RIGHT checklist.

Appraisal of Guidelines for REsearch and Evaluation

Originally released in 200352 and revised in 2010,53 the Appraisal of Guidelines for REsearch and Evaluation (AGREE) instrument is a tool to evaluate the quality of published guidelines (Supplemental Digital Content, Table 7).

With the AGREE instrument, the appraiser selects 1 of 7 answers (1 = strongly disagree; 7 = strongly agree) for each of the 23 items.53,54 A score of 1 indicates a poorly reported guideline or an absence of information. A score of 7 indicates an exceptional guideline and that all of the criteria and considerations conveyed in the user’s manual were met. A score between 2 and 6 indicates that the reporting of the item does not fully meet the AGREE II criteria.53 A score is calculated for each of the 6 AGREE II domains (see Supplemental Digital Content, Table 6). The domain score is calculated by adding up all the scores of the individual items in a domain and by scaling the total as a percentage of the maximum possible score for that domain.55 Domain scores that are >70% represent a guideline of high quality.55
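The domain-score arithmetic can be illustrated with a minimal Python sketch. It assumes the scaling given in the AGREE II user's manual, in which the minimum possible score is subtracted from both the obtained total and the maximum possible score before the percentage is taken. The function names, the ratings in the example (4 appraisers rating a 3-item domain), and the use of the >70% cutoff as a simple flag are illustrative assumptions.

```python
def agree_domain_score(ratings):
    """Scaled AGREE II domain score as a percentage (0 to 100).

    `ratings` holds one list per appraiser; each inner list contains that
    appraiser's 1-7 scores for every item in the domain.
    Assumed scaling: (obtained - minimum) / (maximum - minimum) * 100.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one appraiser and one item")
    n_items = len(ratings[0])
    for row in ratings:
        if len(row) != n_items:
            raise ValueError("every appraiser must rate every item")
        if any(not 1 <= score <= 7 for score in row):
            raise ValueError("item scores must be between 1 and 7")
    n_appraisers = len(ratings)
    obtained = sum(sum(row) for row in ratings)
    max_possible = 7 * n_items * n_appraisers
    min_possible = 1 * n_items * n_appraisers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


def is_high_quality(domain_score, threshold=70.0):
    """Flag a domain score above the >70% cutoff cited in the text."""
    return domain_score > threshold


# Illustrative ratings: 4 appraisers, 3 items in the domain.
example = [
    [5, 6, 6],
    [6, 6, 7],
    [2, 4, 3],
    [3, 3, 2],
]
score = agree_domain_score(example)
print(round(score, 1), is_high_quality(score))  # 56.9 False
```

In this example the obtained total is 53, the maximum possible is 84, and the minimum possible is 12, so the scaled score is (53 - 12)/(84 - 12), or about 57%, which falls below the 70% cutoff.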

International Centre for Allied Health Evidence Instrument

Another instrument to evaluate practice guidelines was developed by the International Centre for Allied Health Evidence (iCAHE).56 It is simple, short, and binary scored. The instrument contains 14 items to be answered “yes” or “no.” The iCAHE added 3 domains: (1) currency (dates when literature was included, readability, and ease of navigation of guidelines), (2) availability (full text and complete reference list), and (3) summary of recommendations.

However, the iCAHE does not address the issues of clarity, applicability (resources implications of applying recommendations and monitoring/auditing criteria), and editorial independence (see Supplemental Digital Content, Table 6). Although it compares favorably with the AGREE II instrument, the iCAHE was considered underpowered by the developers.56

It takes approximately 20 minutes to 1 hour to evaluate a guideline with the AGREE II and 5 minutes with the iCAHE.52 A comparison of the assessments of these 2 instruments of the quality of guidelines on traumatic brain injury observed that the top-rated guidelines scored 85%–97% on the AGREE II compared to 98%–100% on the iCAHE, while the lower-rated guidelines had scores of 56% and 71%–74% on the AGREE II and iCAHE, respectively.56

As the iCAHE is underpowered and the AGREE II checklist is still evolving, with an AGREE A3 initiative,56 either one of these tools can presently be used as a guide to assess the quality of any guideline.

Evaluation of guidelines is not widely applied and continues to evolve. Developers of clinical guidelines should be familiar with these instruments and preferably evaluate their guidelines before presenting them to their respective societies for approval and before submission to a journal. Journal reviewers should also be aware of these evaluation tools.

Practice guidelines and consensus statements that have clear objectives, are rigorously developed, and are correctly reported (Table 2) assuage most uncertainties about the guideline and answer potential questions about its implementation. Rigorous evaluation of a guideline assures the reader that it has undergone a precise development and reporting process. As guidelines need to constantly evolve, an understanding of the development, reporting, and evaluation of clinical practice parameters is essential.

Table 2.

It should be noted that practice guidelines provide guidance to the clinician but do not replace the opinion of a well-educated, judicious, conscientious, and experienced clinician.43 Clinicians are not required to unequivocally follow a guideline; the recommendations may be adopted, rejected, or modified according to clinical needs and constraints.21

Name: Honorio T. Benzon, MD.

Contribution: This author helped contribute to conceptualizing, designing, and writing of the manuscript. He has been involved with development of recommendations for the American Society of Anesthesiologists Committee on Pain Medicine and American Society of Regional Anesthesia and Pain Medicine.

Name: Girish P. Joshi, MBBS, MD, FFARCSI.

Contribution: This author helped contribute to conceptualizing, designing, and writing of the manuscript. He has been involved with development of recommendations for the Society for Ambulatory Anesthesia (SAMBA), Society of Anesthesia and Sleep Medicine (SASM), and Procedure Specific Pain Management (PROSPECT) Working Group and is a member of the American Society of Anesthesiologists Committee on Standards and Practice Parameters.

Name: Tong J. Gan, MD, MBA, MHS, FRCA.

Contribution: This author helped contribute to the conceptualizing, designing, and revising of the manuscript. He has been involved with development of recommendations for the Society for Ambulatory Anesthesia (SAMBA), American Society for Enhanced Recovery (ASER), and Perioperative Quality Initiative (POQI) and is a member of the American Society of Anesthesiologists Committee on Standards and Practice Parameters.

Name: Thomas R. Vetter, MD, MPH.

Contribution: This author helped contribute to the conceptualizing, designing, and revising of the manuscript.

This manuscript was handled by: Jean-Francois Pittet, MD.

1. Woolf SH, Grol R, Hutchinson A, Eccles M, Grimshaw J. Clinical guidelines: potential benefits, limitations, and harms of clinical guidelines. BMJ. 1999;318:527–530.
2. Joshi GP, Benzon HT, Gan TJ, Vetter TR. Consistent definitions of clinical practice guidelines, consensus statements, position statements, and practice alerts. Anesth Analg. 2019;129:1767–1770.
3. Woolf SH. Practice guidelines, a new reality in medicine. II. methods of developing guidelines. Arch Intern Med. 1992;152:946–952.
4. Canadian Task Force on the Periodic Health Examination. The periodic health examination. Can Med Assoc J. 1979;121:1193–1254.
5. White LJ, Ball JR. The clinical efficacy assessment project of the American College of Physicians. Int J Technol Assess Health Care. 1985;1:69–74.
6. US Preventive Services Task Force. Guide to Clinical Preventive Services: An Assessment of the Effectiveness of 169 Interventions. Baltimore, MD: Williams & Wilkins; 1989.
7. Fretheim A, Schünemann HJ, Oxman AD. Improving the use of research evidence in guideline development: 5. Group processes. Health Res Policy Syst. 2006;4:17.
8. Sinha IP, Smyth RL, Williamson PR. Using the Delphi technique to determine which outcomes to measure in clinical trials: recommendations for the future based on a systematic review of existing studies. PLoS Med. 2011;8:e1000393.
9. Van de Ven A, Delbecq A. The nominal group as a research instrument for exploratory health studies. Am J Public Health. 1972;62:337–342.
10. Fink A, Kosecoff J, Chassin M, Brook RH. Consensus methods: characteristics and guidelines for use. Am J Public Health. 1984;74:979–983.
11. Verhagen AP, de Vet HCW, de Bie RA, Kessels AGH, Boers LM, Knipschild PG. The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol. 1998;51:1235–1241.
12. Field MJ, Lohr KN; Institute of Medicine (US) Committee to Advise the Public Health Service on Clinical Practice Guidelines. Clinical Practice Guidelines: Directions for a New Agency. Washington, DC: National Academy Press; 1990.
13. Graham R, Mancher M, Wolman DM, Greenfield S, Steinberg E; IOM (Institute of Medicine). Clinical Practice Guidelines We Can Trust. Washington, DC: National Academies Press; 2011.
14. Kuehn BM. IOM sets out “gold standard” practices for creating guidelines, systematic reviews. JAMA. 2011;305:1846–1848.
15. Qaseem A, Forland F, Macbeth F, Ollenschläger G, Phillips S, van der Wees P; Board of Trustees of the Guidelines International Network. Guidelines International Network: toward international standards for clinical practice guidelines. Ann Intern Med. 2012;156:525–531.
16. Schünemann HJ, Wiercioch W, Etxeandia I, et al. Guidelines 2.0: systematic development of a comprehensive checklist for a successful guideline enterprise. CMAJ. 2014;186:E123–E142.
17. The National Institute for Health and Clinical Excellence. The Guidelines Manual (January 2009). London, UK. NICE: Available at: manual. Accessed September 10, 2019.
18. Harbour R, Miller J. A new system for grading recommendations in evidence based guidelines. BMJ. 2001;323:334–336.
19. Drummond M. Clinical guidelines: a NICE way to introduce cost-effectiveness considerations? Value Health. 2016;19:525–530.
20. Apfelbaum JL, Connis RT, Nickinovich DG. 2012 Emery A. Rovenstine memorial lecture: the genesis, development, and future of the American Society of Anesthesiologists evidence-based practice parameters. Anesthesiology. 2013;118:767–768.
21. Apfelbaum JL, Connis RT. The American Society of Anesthesiologists practice parameter methodology. Anesthesiology. 2019;130:367–384.
22. Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA. 1992;268:2420–2425.
23. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312:71–72.
24. Knottnerus JA, Tugwell P. Evidence-based medicine: achievements and prospects. J Clin Epidemiol. 2017;84:1–2.
25. Wilmshurst P. Evidence based medicine: can we trust the evidence? Int J Cardiol. 2013;636–637.
26. Djulbegovic B, Guyatt G. Evidence vs consensus in clinical practice guidelines. JAMA. 2019 July 19 [Epub ahead of print].
27. Ioannidis JP. Evidence-based medicine has been hijacked: a report to David Sackett. J Clin Epidemiol. 2016;73:82–86.
28. Ebrahim S, Bance S, Athale A, Malachowski C, Ioannidis JP. Meta-analyses with industry involvement are massively published and report no caveats for antidepressants. J Clin Epidemiol. 2016;70:155–163.
29. Jørgensen AW, Hilden J, Gøtzsche PC. Cochrane reviews compared with industry supported meta-analyses and other meta-analyses of the same drugs: systematic review. BMJ. 2006;333:782.
30. Djulbegovic B, Guyatt GH. Progress in evidence-based medicine: a quarter century on. Lancet. 2017;390:415–423.
31. Atkins D, Best D, Briss PA, et al.; GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328:1490.
32. Jaeschke R, Guyatt GH, Dellinger P, et al.; GRADE Working Group. Use of GRADE grid to reach decisions on clinical practice guidelines when consensus is elusive. BMJ. 2008;337:a744.
33. Guyatt GH, Oxman AD, Schünemann HJ, Tugwell P, Knottnerus A. GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. J Clin Epidemiol. 2011;64:380–382.
34. Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64:383–394.
35. Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011;64:401–406.
36. Andrews J, Guyatt G, Oxman AD, Alderson P, Dahm P, Falck-Ytter Y, et al. GRADE guidelines: 14. Going from evidence to recommendations: the significance and presentation of recommendations. J Clin Epidemiol. 2013;66:719–725.
37. Grossman DC, Curry SJ, Owens DK, et al.; US Preventive Services Task Force. Screening for ovarian cancer: US Preventive Services Task Force recommendation statement. JAMA. 2018;319:588–594.
38. Smith SC Jr, Feldman TE, Hirshfeld JW Jr, et al.; American College of Cardiology/American Heart Association Task Force on Practice Guidelines; ACC/AHA/SCAI Writing Committee to Update the 2001 Guidelines for Percutaneous Coronary Intervention. ACC/AHA/SCAI 2005 guideline update for percutaneous coronary intervention-summary article: a report of the American College of Cardiology/American Heart Association task force on practice guidelines (ACC/AHA/SCAI writing committee to update the 2001 guidelines for percutaneous coronary intervention). J Am Coll Cardiol. 2006;47:216–235.
39. Howick J, Chalmers I, Glasziou P, Greenhalgh T, Heneghan C, Liberati A, et al. Explanation of the 2011 Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence (Background Document). Oxford Centre for Evidence-Based Medicine. Available at: Accessed June 20, 2019.
40. Campos-Outcalt D. USPSTF recommendations: a 2017 roundup. J Fam Pract. 2017;66:310–314.
41. Neumann I, Santesso N, Akl EA. A guide for health professionals to interpret and use recommendations in guidelines developed with the GRADE approach. J Clin Epidemiol. 2016;72:45–55.
42. Thornton J, Alderson P, Tan T. Introducing GRADE across the NICE clinical guideline program. J Clin Epidemiol. 2013;66:124–131.
43. Kavanagh BP. The GRADE system for rating clinical guidelines. PLoS Med. 2009;6:e1000094.
44. Atkins D, Eccles M, Flottorp S, et al.; GRADE Working Group. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches. BMC Health Serv Res. 2004;4:38.
45. Briss PA, Zaza S, Pappaioanou M, et al. Developing an evidence-based guide to community preventive services–methods. The task force on community preventive services. Am J Prev Med. 2000;18:35–43.
46. Silber S. A new and rapid scoring system to assess the scientific evidence from clinical trials. J Interven Cardiol. 2006;19:485–492.
47. Abernethy AP, Raman G, Balk EM, et al. Systematic review: reliability of compendia methods for off-label oncology indications. Ann Intern Med. 2009;150:336–343.
48. Wagner J, Marquart J, Ruby J, et al. Frequency and level of evidence used in recommendations by the National Comprehensive Cancer Network guidelines beyond approvals of the US Food and Drug Administration: retrospective observational study. BMJ. 2018;360:k668.
49. Green AK, Wood WA, Basch EM. Time to reassess the cancer compendia for off-label drug coverage in oncology. JAMA. 2016;316:1541–1542.
50. Shiffman RN, Shekelle P, et al. Standardized reporting of clinical practice guidelines: a proposal from the Conference on Guideline Standardization. Ann Intern Med. 2003;139:493–498.
51. Chen Y, Yang K, Marušić A, et al.; RIGHT (Reporting Items for Practice Guidelines in Healthcare) Working Group. A reporting tool for practice guidelines in health care: the RIGHT statement. Ann Intern Med. 2017;166:128–132.
52. AGREE Collaboration. Development and validation of an international appraisal instrument for assessing the quality of clinical practice guidelines: the AGREE project. Qual Saf Health Care. 2003;12:18–23.
53. Brouwers MC, Kho ME, Browman GP, et al.; AGREE Next Steps Consortium. AGREE II: advancing guideline development, reporting and evaluation in health care. CMAJ. 2010;182:E839–E842.
54. Brouwers MC, Kerkvliet K, Spithoff K; AGREE Next Steps Consortium. The AGREE reporting checklist: a tool to improve reporting of clinical practice guidelines. BMJ. 2016;352:i1152.
55. AGREE Next Steps Consortium (2017). The AGREE II Instrument [Electronic version]. Available at: Accessed July 15, 2019.
56. Grimmer K, Dizon JM, Milanese S, et al. Efficient clinical evaluation of guideline quality: development and testing of a new tool. BMC Med Res Methodol. 2014;14:63.

Copyright © 2019 International Anesthesia Research Society