The translation of novel scientific findings into clinical practice has become a major challenge in medicine. To guide physicians in the implementation of best practice, professional societies and consensus groups now routinely develop practice guidelines and recommendations. These guidelines are meant to synthesize the best available data and make practical recommendations for clinicians.1,2
The past two decades have witnessed a dramatic increase in the number of available guidelines.2–4 For adult pharyngitis alone there are 10 different guidelines from various groups.5 Although guidelines provide useful information for clinicians, they have limitations.4 First and foremost, guidelines are only as good as the evidence that underlies them. Evaluations of a number of guidelines have found that many recommendations are based on low-quality evidence and expert opinion.4,6–10 This is particularly problematic because expert opinion is subject to bias, whether explicit or implicit.2,4,11,12 Other limitations have also been noted. Some authors have suggested that guidelines are too narrowly focused and often lack the flexibility to address patient-specific issues.4
Despite the limitations of guidelines, adherence to these recommendations is often used as a benchmark for quality.13–18 Given the importance that guidelines have assumed in practice, it is critical to examine the quality of evidence that forms the basis for the recommendations. The objective of our study was to examine the quality of evidence that underlies the recommendations made by the American College of Obstetricians and Gynecologists (the College) and to define areas where high-quality evidence is lacking.
MATERIALS AND METHODS
The College produces a number of disease-specific guidelines and recommendations. Our analysis focused specifically on practice bulletins. In general, practice bulletins address a specific topic, provide a literature review of the subject, and provide a summary of recommendations. A list of current practice bulletins was obtained in January 2011 from the American Congress of Obstetricians and Gynecologists web site.
Practice bulletins generally provide a list of references on which the topic review is based. Drawing on the available data, each practice bulletin provides a “summary of recommendations.” Each recommendation is associated with a level of evidence that is based on the type and consistency of the studies associated with the recommendation. The levels of evidence used by the College (Table 1)19 are as follows: level A, recommendations based on good and consistent scientific evidence; level B, recommendations based on limited or inconsistent scientific evidence; and level C, recommendations based primarily on consensus and expert opinion.
We collected each recommendation and its associated level of evidence from the current practice bulletins. Committee opinions and other education bulletins were not analyzed. The topics were stratified by the study team into those primarily concerned with obstetrics and those focused on gynecology. The gynecologic topics were also stratified by subspecialty into the following: general gynecology, family planning, oncology, urogynecology, and reproductive endocrinology. We developed a classification system to describe the type of recommendations each guideline made. Each recommendation was classified as one of the following: diagnosis, counseling, evaluation, treatment, or mode of delivery (obstetrics only). Table 1 describes our system for guideline classification. Each recommendation was classified by two independent reviewers. Discrepancies were resolved by consensus. Discrepancies that required resolution were noted for 28 (3.9%) of the individual recommendations.
Descriptive statistics were used to report study findings. The levels of evidence for obstetric and gynecologic recommendations were compared using χ2 tests. P<.05 was considered statistically significant. All tests were two-sided.
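The χ2 comparison described above can be sketched in a few lines. The counts below are illustrative placeholders, not the study's data; for a 2×3 table (two groups, three evidence levels) the statistic has 2 degrees of freedom, and for 2 degrees of freedom the chi-square p-value reduces exactly to exp(−χ2/2).

```python
import math

def chi2_2x3(observed):
    """Pearson chi-square test for a 2x3 contingency table.

    Returns (statistic, p_value). With (2-1)*(3-1) = 2 degrees of
    freedom, the chi-square survival function is exactly exp(-x/2).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)

# Hypothetical counts of level A/B/C recommendations for two groups
# (placeholders only, not the counts reported in this study).
table = [[30, 40, 30],   # group 1: level A, B, C
         [45, 35, 20]]   # group 2: level A, B, C
stat, p = chi2_2x3(table)
print(f"chi2 = {stat:.2f}, P = {p:.3f}")  # significant if P < .05
```

A two-sided Pearson χ2 on the full 2×3 table, as sketched here, tests whether the overall distribution across evidence levels differs between groups; a dedicated statistics library would also handle larger tables and other degrees of freedom.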
RESULTS
A total of 84 practice bulletins that offered 717 individual recommendations were identified (Table 2). Forty-eight (57.1%) of the guidelines were obstetric and 36 (42.9%) were gynecologic. Among the individual recommendations, 365 (50.9%) addressed obstetric topics and 352 (49.1%) gynecologic topics. When all of the recommendations were considered, 215 (30.0%) provided level A evidence, 270 (37.7%) level B, and 232 (32.3%) level C. Figure 1 provides a breakdown of the level of evidence for the obstetrics and gynecology guidelines. Among obstetrics recommendations, 93 (25.5%) were level A, 145 (39.7%) level B, and 127 (34.8%) level C. For the gynecology recommendations, 122 (34.7%) were level A, 125 (35.5%) level B, and 105 (29.8%) level C. The gynecology recommendations were more likely to be of level A evidence than the obstetrics recommendations (P=.049).
The recommendations were then stratified by our classification schema. For gynecology, 67.9% of recommendations addressed treatment, 17.6% evaluation, 8.8% diagnosis, and 5.7% counseling. Of the obstetrics recommendations, 46.0% concerned treatment, 23.6% evaluation, 15.6% diagnosis, 8.2% counseling, and 6.6% mode of delivery. Among the obstetrics recommendations, level A evidence was noted for 24.6% of the diagnostic recommendations, 46.7% of the counseling recommendations, 20.9% of the guidelines for evaluation, 27.4% of the treatment recommendations, and 4.2% of the guidelines concerning mode of delivery. For gynecology, level A recommendations were found for 29.0% of the diagnostic guidelines, 35.0% of the counseling recommendations, 24.2% of evaluation guidelines, and 38.1% of those recommendations that addressed treatment (Fig. 2).
When the gynecology guidelines were stratified by subspecialty, the majority of recommendations (n=225, 63.9%) addressed general gynecology topics. Sixteen percent (16.2%) of the guidelines concerned family planning, 8.5% oncology, 7.7% urogynecology, and 3.7% reproductive endocrinology (Fig. 3). For each of the subspecialties, treatment recommendations predominated. Level A recommendations were noted for 40% of the oncology recommendations, 38.6% of the family planning guidelines, 33.3% of the general gynecology recommendations, 30.8% of the reproductive endocrinology guidelines, and 33.3% of urogynecology recommendations.
The guidelines were then stratified by the year in which they were issued (Fig. 4). For both obstetrics and gynecology there was no clear trend in the level of evidence based on the year of issue. For example, among the 17 gynecology recommendations issued in 2000, 6 (35.3%) were considered level A. In 2010, 45 specific gynecology recommendations were made including 16 (35.6%) that were based on level A evidence.
DISCUSSION
Our findings suggest that only a third of the recommendations put forth by the College in their practice bulletins are based on high-quality, consistent scientific evidence. Among the more than 700 specific recommendations issued over the past decade, 30% are level A guidelines based on good and consistent scientific evidence. Thirty-eight percent of recommendations are based on limited or inconsistent evidence and 32% are based primarily on consensus and expert opinion.
Development of College practice bulletins follows a rigorous process.20,21 The College has two standing committees responsible for the development of practice bulletins, one for gynecology and one for obstetrics. Each committee meets twice a year. When the decision is made to commission a new topic, the committee initially chooses one or more authors who are experts on the topic. Each bulletin is assigned a committee member who serves as a primary reviewer. The primary reviewer facilitates the development of the bulletin and monitors the progress of the authors. Once a bulletin is completed, it is reviewed by multiple committee members who comment on content and recommend revisions.20 Practice bulletins provide a more comprehensive topic review, whereas committee opinions are shorter and provide more targeted recommendations.21
The College's process for the development of its practice bulletins differs somewhat from that of other specialty societies that often convene a panel of content experts for the development of each guideline.2,8,9,22 For example, the National Comprehensive Cancer Network develops guidelines widely used in oncology. For each of its guidelines, the National Comprehensive Cancer Network convenes a panel of disease-site experts from member institutions. The guideline then undergoes a further level of review by content experts who are not panel members.8,23 The American College of Cardiology and the American Heart Association convene writing groups for each guideline. All recommendations are voted on by the group.24
With the proliferation of guidelines, a number of studies have begun to examine the evidence that underlies their recommendations.6–9 Of the 1,025 recommendations issued by the National Comprehensive Cancer Network, 6% were category I, based on randomized controlled trials with uniform consensus, whereas 83% were category IIA, lower level of evidence but with uniform consensus.8 The American College of Cardiology–American Heart Association recommendations also use a level of evidence categorization although with different standards. Level A recommendations are based on multiple randomized trials or meta-analyses, level B on a single randomized trial or nonrandomized studies, and level C on expert opinion, case studies, or standards of care. Among the 2,711 recommendations issued, 314 (12%) were level A and 1,246 (46%) were level C.9 Of the 4,218 recommendations issued by the Infectious Disease Society of America, only 14% were supported by level I evidence.6 The level of evidence of the College's recommendations is in line with these findings.
A number of factors likely account for the findings we noted. Compared with some fields in medicine such as cardiology, there are few randomized controlled trials in obstetrics and gynecology. Many disease processes in obstetrics and gynecology are infrequent, present under relatively urgent circumstances, or are only rarely associated with meaningfully poor outcomes, all of which make the conduct of randomized trials either difficult or impractical. This fact is highlighted in a recent practice bulletin on vaginal birth after previous cesarean delivery. Despite the bulletin citing 136 references, with the exception of one prospective trial of a decision-aid, the highest-quality studies among the references are considered only level II-2 (evidence obtained from well-designed cohort or case-control analytic studies, preferably from more than one center or research group).19 Perhaps even more concerning is that in obstetrics and gynecology, recommendations by professional societies often differ. A recent analysis comparing the College's recommendations with those of the Royal College of Obstetricians and Gynaecologists noted that only 28% of obstetric recommendations were the same, 56% were not comparable, and 16% were opposite.25 Although our findings clearly highlight the need for increased research in the specialty, reductions in federal funding severely limit the conduct of high-quality research. In light of these limitations, the College's guidelines provide an excellent summary of the available literature and make recommendations based on the best available data.
We recognize several important limitations of our study. We developed our categorization system of recommendations in an attempt to apply a similar system to obstetrics and gynecology recommendations. Clearly this system is highly dependent on the reviewer, and many recommendations were difficult to precisely categorize. Although we recognize that other classification schemas could be developed to more precisely categorize recommendations, the goal of our analysis was merely to provide descriptive data. A priori we chose to focus our analysis only on practice bulletins and not to examine committee opinions. Our results might have been different had committee opinions been included, and this certainly warrants further study. Finally, we used the grading (A, B, C) provided by the College in its bulletins. We did not “regrade” the evidence underlying the recommendations provided. The reproducibility of this grading scheme is unknown, and reanalysis might have altered our findings somewhat.18
A major issue faced by the College and other professional societies is the role of guidelines in areas of relative uncertainty. Many experts have pointed out the problems that arise when guidelines rely on expert opinion that is subject to bias.4,10–12,15 Some organizations, such as the U.S. Preventive Services Task Force, do not issue guidelines when evidence is insufficient.26,27 The corollary is that care must still be rendered in clinical scenarios where high-quality data are lacking. One could argue that these are the situations in which clinicians may benefit most from guidelines, even when the recommendations are based on observational studies or consensus. Many professional societies are now re-evaluating how evidence is graded and how guidelines are issued. Although a uniform system is lacking, the GRADE (Grading of Recommendations Assessment, Development, and Evaluation) system has been adopted by many societies. GRADE rates the quality of evidence underlying each explicit recommendation in one of four categories (high, moderate, low, or very low), based on the likelihood that further research would change confidence in the estimated effect.28,29 The GRADE system has been adopted by a number of organizations, including the Agency for Healthcare Research and Quality, the American College of Physicians, and the American College of Chest Physicians.30
Our findings highlight the difficulties in developing high-quality clinical guidelines. Although guidelines do not equate with the standard of care and cannot replace clinical judgment, the College's recommendations clearly exert a strong influence on the practice of obstetrics and gynecology.31 Although the College's guidelines provide a reliable source of information based on the best available data, clinicians must remain mindful of the limitations of guidelines. There is an urgent need to continue to conduct high-quality research in obstetrics and gynecology and to provide the funding to undertake such research.
REFERENCES
1. Powers JH. Practice guidelines: belief, criticism, and probability. Arch Intern Med 2011;171:15–7.
2. Sniderman AD, Furberg CD. Why guideline-making requires reform. JAMA 2009;301:429–31.
3. Burgers JS, Bailey JV, Klazinga NS, Van Der Bij AK, Grol R, Feder G. Inside guidelines: comparative analysis of recommendations and evidence in diabetes guidelines from 13 countries. Diabetes Care 2002;25:1933–9.
4. Shaneyfelt TM, Centor RM. Reassessment of clinical practice guidelines: go gently into that good night. JAMA 2009;301:868–9.
5. Matthys J, De Meyere M, van Driel ML, De Sutter A. Differences among international pharyngitis guidelines: not just academic. Ann Fam Med 2007;5:436–43.
6. Lee DH, Vielemeyer O. Analysis of overall level of evidence behind Infectious Diseases Society of America practice guidelines. Arch Intern Med 2011;171:18–22.
7. McAlister FA, van Diepen S, Padwal RS, Johnson JA, Majumdar SR. How evidence-based are the recommendations in evidence-based guidelines? PLoS Med 2007;4:e250.
8. Poonacha TK, Go RS. Level of scientific evidence underlying recommendations arising from the National Comprehensive Cancer Network clinical practice guidelines. J Clin Oncol 2011;29:186–91.
9. Tricoci P, Allen JM, Kramer JM, Califf RM, Smith SC Jr. Scientific evidence underlying the ACC/AHA clinical practice guidelines. JAMA 2009;301:831–41.
10. Wright JM. Practice guidelines by specialist societies are surprisingly deficient. Int J Clin Pract 2007;61:1076–7.
11. Choudhry NK, Stelfox HT, Detsky AS. Relationships between authors of clinical practice guidelines and the pharmaceutical industry. JAMA 2002;287:612–7.
12. Detsky AS. Sources of bias for authors of clinical practice guidelines. CMAJ 2006;175:1033, 1035.
13. Boyd CM, Darer J, Boult C, Fried LP, Boult L, Wu AW. Clinical practice guidelines and quality of care for older patients with multiple comorbid diseases: implications for pay for performance. JAMA 2005;294:716–24.
14. Glickman SW, Ou FS, DeLong ER, Roe MT, Lytle BL, Mulgund J, et al. Pay for performance, quality of care, and outcomes in acute myocardial infarction. JAMA 2007;297:2373–80.
15. Hirsh J, Guyatt G. Clinical experts or methodologists to write clinical guidelines? Lancet 2009;374:273–5.
16. Landercasper J, Dietrich LL, Johnson JM. A breast center review of compliance with National Comprehensive Cancer Network Breast Cancer guidelines. Am J Surg 2006;192:525–7.
17. Merritt TA, Gold M, Holland J. A critical evaluation of clinical practice guidelines in neonatal medicine: does their use improve quality and lower costs? J Eval Clin Pract 1999;5:169–77.
18. Chauhan SP, Berghella V, Sanderson M, Magann EF, Morrison JC. American College of Obstetricians and Gynecologists practice bulletins: an overview. Am J Obstet Gynecol 2006;194:1564–72.
19. Vaginal birth after previous cesarean delivery. ACOG Practice Bulletin No. 115. American College of Obstetricians and Gynecologists. Obstet Gynecol 2010;116:450–63.
20. American College of Obstetricians and Gynecologists committee descriptions. Available at: http://www.acog.org/departments/committeesAndCouncils/committeeDescriptions.pdf. Retrieved March 16, 2011.
21. Kirkpatrick DH, Burkman RT. Does standardization of care through clinical guidelines improve outcomes and reduce medical liability? Obstet Gynecol 2010;116:1022–6.
22. Wright TC Jr, Massad LS, Dunton CJ, Spitzer M, Wilkinson EJ, Solomon D. 2006 consensus guidelines for the management of women with abnormal cervical cancer screening tests. Am J Obstet Gynecol 2007;197:346–55.
23. National Comprehensive Cancer Network. Available at: www.nccn.org. Retrieved March 16, 2011.
24. American Heart Association statement and guideline development. Available at: http://www.americanheart.org/downloadable/heart/1274968068651SGProcessFlowchartFinal052710.pdf. Retrieved March 16, 2011.
25. Chauhan SP, Hendrix NW, Berghella V, Siddiqui D. Comparison of two national guidelines in obstetrics: American versus Royal College of Obstetricians and Gynecologists. Am J Perinatol 2010;27:763–70.
26. Petitti DB, Teutsch SM, Barton MB, Sawaya GF, Ockene JK, DeWitt T. Update on the methods of the U.S. Preventive Services Task Force: insufficient evidence. Ann Intern Med 2009;150:199–205.
27. Sawaya GF, Guirguis-Blake J, LeFevre M, Harris R, Petitti D; U.S. Preventive Services Task Force. Update on the methods of the U.S. Preventive Services Task Force: estimating certainty and magnitude of net benefit. Ann Intern Med 2007;147:871–5.
28. Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol 2011;64:383–94.
29. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924–6.
30. Kavanagh BP. The GRADE system for rating clinical guidelines. PLoS Med 2009;6:e1000094.
31. Al-Niaimi A, Chauhan SP, Gupta LM, Bailet JW. Factors influencing the evolving practice of obstetricians in eastern Wisconsin: a survey. Am J Perinatol 2008;25:321–4.