Scientific Evidence Underlying the American College of Obstetricians and Gynecologists' Practice Bulletins

Wright, Jason D. MD; Pawar, Neha MD; Gonzalez, Julie S. R. MD; Lewin, Sharyn N. MD; Burke, William M. MD; Simpson, Lynn L. MD; Charles, Abigail S. MS; D'Alton, Mary E. MD; Herzog, Thomas J. MD

doi: 10.1097/AOG.0b013e3182267f43
Original Research

OBJECTIVE: Clinical guidelines are an important source of guidance for clinicians. Few studies have examined the quality of scientific data underlying evidence-based guidelines. We examined the quality of evidence that underlies the recommendations made by the American College of Obstetricians and Gynecologists (the College).

METHODS: The current practice bulletins of the College were examined. Each bulletin makes multiple recommendations. Each recommendation is assigned, based on the quality and quantity of the underlying evidence, to one of three levels of evidence: A (good and consistent evidence), B (limited or inconsistent evidence), or C (consensus and opinion). We analyzed the distribution of levels of evidence for obstetrics and gynecology recommendations.

RESULTS: A total of 84 practice bulletins that offered 717 individual recommendations were identified. Forty-eight (57.1%) of the guidelines were obstetric and 36 (42.9%) were gynecologic. When all recommendations were considered, 215 (30.0%) provided level A evidence, 270 (37.7%) level B, and 232 (32.3%) level C. Among obstetric recommendations, 93 (25.5%) were level A, 145 (39.7%) level B, and 117 (34.8%) level C. For the gynecologic recommendations, 122 (34.7%) were level A, 125 (35.5%) level B, and 105 (29.8%) level C. The gynecology recommendations were more likely to be of level A evidence than the obstetrics recommendations (P=.049).

CONCLUSION: One third of the recommendations put forth by the College in its practice bulletins are based on good and consistent scientific evidence.


One third of the recommendations put forth by the American College of Obstetricians and Gynecologists in its Practice Bulletins are based on good and consistent scientific evidence.

From the Divisions of Gynecologic Oncology and Maternal Fetal Medicine, Department of Obstetrics and Gynecology, Columbia University College of Physicians and Surgeons, and the Herbert Irving Comprehensive Cancer Center, New York, New York.

See related editorial on page 501.

Corresponding author: Jason D. Wright, MD, Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Columbia University College of Physicians and Surgeons, 161 Fort Washington Avenue, 8th Floor, New York, NY 10032; e-mail:

Financial Disclosure: The authors did not report any potential conflicts of interest.

The translation of novel scientific findings into clinical practice has become a major challenge in medicine. To guide physicians in the implementation of best practice, professional societies and consensus groups now routinely develop practice guidelines and recommendations. These guidelines are meant to synthesize the best available data and make practical recommendations for clinicians.1,2

The past two decades have witnessed a dramatic increase in the number of available guidelines.2–4 For adult pharyngitis alone there are 10 different guidelines from various groups.5 Although guidelines provide useful information for clinicians, they have limitations.4 First and foremost, guidelines are only as good as the evidence that underlies them. Evaluations of a number of guidelines have found that many recommendations are based on low-quality evidence and expert opinion.4,6–10 This is particularly problematic as expert opinion is subject to bias, either implicit or subconscious.2,4,11,12 Other limitations have also been noted. Some authors have suggested that guidelines are too narrowly focused and often lack the flexibility to address patient-specific issues.4

Despite the limitations of guidelines, adherence to these recommendations is often used as a benchmark for quality.13–18 Given the importance that guidelines have assumed in practice, it is critical to examine the quality of evidence that forms the basis for the recommendations. The objective of our study was to examine the quality of evidence that underlies the recommendations made by the American College of Obstetricians and Gynecologists (the College) and to define areas where high-quality evidence is lacking.

The College produces a number of disease-specific guidelines and recommendations. Our analysis focused specifically on practice bulletins. In general, practice bulletins address a specific topic, provide a literature review of the subject, and provide a summary of recommendations. A list of current practice bulletins was obtained in January 2011 from the American Congress of Obstetricians and Gynecologists web site.

Practice bulletins generally provide a list of references on which the topic review is based. Based on the summation of available data, each practice bulletin provides a “summary of recommendations.” Each recommendation is associated with a level of evidence that is based on the type and consistency of the studies associated with the recommendation. The levels of evidence used by the College are as follows (Table 1)19: level A, recommendations are based on good and consistent scientific evidence; level B, recommendations are based on limited or inconsistent scientific evidence; and level C, recommendations are based primarily on consensus and expert opinion.

Table 1

We collected each recommendation and its associated level of evidence from the current practice bulletins. Committee opinions and other education bulletins were not analyzed. The topics were stratified by the study team into those primarily concerned with obstetrics and those focused on gynecology. The gynecologic topics were also stratified by subspecialty into the following: general gynecology, family planning, oncology, urogynecology, and reproductive endocrinology. We developed a classification system to describe the type of recommendations each guideline made. Each recommendation was classified as one of the following: diagnosis, counseling, evaluation, treatment, or mode of delivery (obstetrics only). Table 1 describes our system for guideline classification. Each recommendation was classified by two independent reviewers. Discrepancies were resolved by consensus. Discrepancies that required resolution were noted for 28 (3.9%) of the individual recommendations.

Descriptive statistics were used to report study findings. The levels of evidence for obstetric and gynecologic recommendations were compared using χ² tests. P<.05 was considered statistically significant. All tests were two-sided.

A total of 84 practice bulletins that offered 717 individual recommendations were identified (Table 2). Forty-eight (57.1%) of the guidelines were obstetric and 36 (42.9%) were gynecologic. Among the individual recommendations, 365 (50.9%) were obstetric and 352 (49.1%) were gynecologic topics. When all of the recommendations were considered, 215 (30.0%) provided level A evidence, 270 (37.7%) level B, and 232 (32.3%) level C. Figure 1 provides a breakdown of the level of evidence for the obstetrics and gynecology guidelines. Among obstetrics recommendations, 93 (25.5%) were level A, 145 (39.7%) level B, and 117 (34.8%) level C. For the gynecology recommendations, 122 (34.7%) were level A, 125 (35.5%) level B, and 105 (29.8%) level C. The gynecology recommendations were more likely to be of level A evidence than the obstetrics recommendations (P=.049).
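The comparison above can be checked with a short calculation. The sketch below runs a Pearson χ² test on the 2×3 table of counts exactly as printed in this paragraph. That the reported P value came from a 2×3 test (rather than, for example, level A versus all other levels) is an assumption, and the function name `pearson_chi2_2x3` is illustrative only. The printed obstetric counts sum to 355 rather than the 365 obstetric recommendations stated, so the result should be read with that in mind.

```python
import math

def pearson_chi2_2x3(row_a, row_b):
    """Pearson chi-square for a 2x3 table; returns (statistic, p_value).

    With 2 degrees of freedom the chi-square survival function has the
    closed form exp(-x / 2), so no statistics library is needed.
    """
    rows = [row_a, row_b]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            # Expected count under independence of specialty and level.
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)

# Counts as printed in the text (levels A, B, C).
obstetrics = [93, 145, 117]
gynecology = [122, 125, 105]

chi2, p = pearson_chi2_2x3(obstetrics, gynecology)
print(f"chi2 = {chi2:.2f}, df = 2, P = {p:.3f}")
```

Run as written, the sketch gives a statistic of about 6.03 with 2 degrees of freedom and a P value of about .049, consistent with the value reported above.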

Table 2

Fig. 1

The recommendations were then stratified by our classification schema. For gynecology, 67.9% of recommendations addressed treatment, 17.6% evaluation, 8.8% diagnosis, and 5.7% counseling. Of the obstetrics recommendations, 46.0% concerned treatment, 23.6% evaluation, 15.6% diagnosis, 8.2% counseling, and 6.6% mode of delivery. Among the obstetrics recommendations, level A evidence was noted for 24.6% of the diagnostic recommendations, 46.7% of the counseling recommendations, 20.9% of the guidelines for evaluation, 27.4% of the treatment recommendations, and 4.2% of the guidelines concerning mode of delivery. For gynecology, level A recommendations were found for 29.0% of the diagnostic guidelines, 35.0% of the counseling recommendations, 24.2% of evaluation guidelines, and 38.1% of those recommendations that addressed treatment (Fig. 2).

Fig. 2

When the gynecology guidelines were stratified by subspecialty, the majority of recommendations (n=225, 63.9%) addressed general gynecology topics. Sixteen percent (16.2%) of the guidelines concerned family planning, 8.5% oncology, 7.7% urogynecology, and 3.7% reproductive endocrinology (Fig. 3). For each of the subspecialties, treatment recommendations predominated. Level A recommendations were noted for 40% of the oncology recommendations, 38.6% of the family planning guidelines, 33.3% of the general gynecology recommendations, 30.8% of the reproductive endocrinology guidelines, and 33.3% of urogynecology recommendations.
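The percentages above can be turned back into approximate counts as a consistency check. The sketch below is not part of the original analysis. It assumes the 352 gynecologic recommendations reported in the Results as the denominator and rounds each share to the nearest whole recommendation, so the per-subspecialty counts it prints are inferred rather than reported (only n=225 for general gynecology is given in the text).

```python
TOTAL_GYN = 352  # gynecologic recommendations reported in the Results

# subspecialty: (share of gynecologic recommendations %, level A share %)
shares = {
    "general gynecology": (63.9, 33.3),
    "family planning": (16.2, 38.6),
    "oncology": (8.5, 40.0),
    "urogynecology": (7.7, 33.3),
    "reproductive endocrinology": (3.7, 30.8),
}

inferred = {}
for name, (pct_of_total, pct_level_a) in shares.items():
    n = round(TOTAL_GYN * pct_of_total / 100)  # inferred count, not reported
    level_a = round(n * pct_level_a / 100)     # inferred level A count
    inferred[name] = (n, level_a)
    print(f"{name}: n ~ {n}, level A ~ {level_a}")

total_n = sum(n for n, _ in inferred.values())
total_a = sum(a for _, a in inferred.values())
print(f"sum of inferred counts: {total_n}; inferred level A: {total_a}")
```

Under these assumptions the inferred counts sum to 352 recommendations, of which 122 are level A, matching the totals reported for gynecology.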

Fig. 3

The guidelines were then stratified by the year in which they were issued (Fig. 4). For both obstetrics and gynecology there was no clear trend in the level of evidence based on the year of issue. For example, among the 17 gynecology recommendations issued in 2000, 6 (35.3%) were considered level A. In 2010, 45 specific gynecology recommendations were made including 16 (35.6%) that were based on level A evidence.

Fig. 4

Our findings suggest that only a third of the recommendations put forth by the College in its practice bulletins are based on high-quality, consistent scientific evidence. Among the more than 700 specific recommendations issued over the past decade, 30% are level A guidelines based on good and consistent scientific evidence. Thirty-eight percent of recommendations are based on limited or inconsistent evidence and 32% are based primarily on consensus and expert opinion.

Development of College practice bulletins follows a rigorous process.20,21 The College has two standing committees responsible for the development of practice bulletins, one for gynecology and one for obstetrics. Each committee meets twice a year. When the decision is made to commission a new topic, the committee initially chooses one or more authors for the bulletin who are experts on the topic. Each bulletin is assigned a committee member who serves as a primary reviewer. The primary reviewer facilitates the development of the bulletin and monitors progress of the authors. Once a bulletin is completed, it is reviewed by multiple committee members who provide comment on content and recommend revisions.20 Practice bulletins provide a more comprehensive topic review whereas committee opinions are shorter and provide more targeted recommendations.21

The College's process for the development of its practice bulletins differs somewhat from that of other specialty societies that often convene a panel of content experts for the development of each guideline.2,8,9,22 For example, the National Comprehensive Cancer Network develops guidelines widely used in oncology. For each of its guidelines, the National Comprehensive Cancer Network convenes a panel of disease-site experts from member institutions. The guideline then undergoes a further level of review by content experts who are not panel members.8,23 The American College of Cardiology and the American Heart Association convene writing groups for each guideline. All recommendations are voted on by the group.24

With the proliferation of guidelines, a number of studies have begun to examine the evidence that underlies their recommendations.6–9 Of the 1,025 recommendations issued by the National Comprehensive Cancer Network, 6% were category I, based on randomized controlled trials with uniform consensus, whereas 83% were category IIA, lower level of evidence but with uniform consensus.8 The American College of Cardiology–American Heart Association recommendations also use a level of evidence categorization although with different standards. Level A recommendations are based on multiple randomized trials or meta-analyses, level B on a single randomized trial or nonrandomized studies, and level C on expert opinion, case studies, or standards of care. Among the 2,711 recommendations issued, 314 (12%) were level A and 1,246 (46%) were level C.9 Of the 4,218 recommendations issued by the Infectious Diseases Society of America, only 14% were supported by level I evidence.6 The level of evidence of the College's recommendations is in line with these findings.
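The figures quoted above can be placed side by side. The sketch below only tabulates numbers already given in the text (with the College's 215 of 717 level A recommendations taken from the Results). Each organization defines its top evidence tier differently, so the shares are not strictly comparable, and the labels used here are shorthand rather than official names.

```python
# (organization and top tier, total recommendations, share in top tier %)
# Shares are taken from the text; tiers are defined differently by each body.
guideline_sets = [
    ("National Comprehensive Cancer Network (category I)", 1025, 6.0),
    ("ACC-AHA (level A)", 2711, 100 * 314 / 2711),
    ("Infectious Diseases Society of America (level I)", 4218, 14.0),
    ("the College (level A)", 717, 100 * 215 / 717),
]

# Sort from the smallest to the largest top-tier share.
ranked = sorted(guideline_sets, key=lambda row: row[2])
for name, total, pct_top in ranked:
    print(f"{name}: {pct_top:.1f}% of {total:,} recommendations")
```

Sorted this way, the top-tier shares run from 6.0% to 30.0%, with the College's practice bulletins at the upper end of the range.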

A number of factors likely account for the findings we noted. Compared with some fields in medicine such as cardiology, there are few randomized controlled trials in obstetrics and gynecology. Many disease processes in obstetrics and gynecology are infrequent, present under relatively urgent circumstances, or are only rarely associated with meaningfully poor outcomes, all of which make the conduct of randomized trials either difficult or impractical. This fact is highlighted in a recent practice bulletin on vaginal birth after previous cesarean delivery. Despite the bulletin citing 136 references, with the exception of one prospective trial of a decision-aid, the highest-quality studies among the references are considered only level II-2 (evidence obtained from well-designed cohort or case-control analytic studies, preferably from more than one center or research group).19 Perhaps even more concerning is that in obstetrics and gynecology, recommendations by professional societies often differ. A recent analysis comparing the College's recommendations with those of the Royal College of Obstetricians and Gynaecologists noted that only 28% of obstetric recommendations were the same, 56% were not comparable, and 16% were opposite.25 Although our findings clearly highlight the potential for increased research in the specialty, reductions of federal funding severely limit the conduct of high-quality research. In light of these limitations, the College's guidelines provide an excellent summary of available literature and make recommendations based on the best available data.

We recognize several important limitations of our study. We developed our categorization system of recommendations in an attempt to apply a similar system for obstetrics and gynecology recommendations. Clearly this system is highly dependent on the reviewer, and many recommendations were difficult to precisely categorize. Although we recognize that other classification schemas could be developed to more precisely categorize recommendations, the goal of our analysis was merely to provide descriptive data. A priori we chose to focus our analysis only on practice bulletins and not examine committee opinions. Our results might have been different had committee opinions been included, and this certainly warrants further study. Finally, we used the grade (A, B, or C) assigned by the College in its bulletins. We did not “regrade” the evidence underlying the recommendations provided. The reproducibility of this grading scheme is unknown, and reanalysis might have altered our findings somewhat.18

A major issue faced by the College and other professional societies is the role of guidelines in areas of relative uncertainty. Many experts have pointed out the problems that arise when guidelines rely on expert opinion that is subject to bias.4,10–12,15 Some organizations, such as the U.S. Preventive Services Task Force, do not issue guidelines when evidence is insufficient.26,27 The corollary to this is that care must be rendered in clinical scenarios where high-quality data are lacking. One could argue that these are the situations when clinicians may benefit most from guidelines, even when these recommendations are based on observational studies or consensus. Many professional societies are now re-evaluating how evidence is graded and how guidelines are issued. Although a uniform system is lacking, the GRADE (Grading of Recommendations Assessment, Development, and Evaluation) system has been adopted by many societies. GRADE examines explicit recommendations and then ranks each recommendation into one of four categories (high, moderate, low, and very low). The ranking is based entirely on whether further research is likely to challenge the reported effect.28,29 The GRADE system has been adopted by a number of organizations including the Agency for Healthcare Research and Quality, the American College of Physicians, and the American College of Chest Physicians.30

Our findings highlight the difficulties in developing high-quality clinical guidelines. Although guidelines do not equate with standard of care and cannot replace clinical judgment, the College's recommendations strongly influence the practice of obstetrics and gynecology.31 Although the College's guidelines provide a reliable source of information based on the best available data, clinicians must remain mindful of the limitations of guidelines. There is an urgent need to continue to conduct high-quality research in obstetrics and gynecology and to provide the funding to undertake such research.

