OBJECTIVE: The Bishop score is the most commonly used method to assess the readiness of the cervix for induction. However, it was created without modern statistical methods. Our objective was to determine whether a simplified score can predict vaginal delivery equally well.
METHODS: Data were analyzed for 5,610 nulliparous women with singleton, uncomplicated pregnancies between 37 0/7 and 41 6/7 weeks of gestation undergoing labor induction. These women had all five components of the Bishop score recorded. Logistic regression was performed and a simplified score created with significant components. Positive and negative predictive values and positive likelihood ratios were calculated.
RESULTS: In the regression model, only dilation, station, and effacement were significantly associated with vaginal delivery (P<.01). The simplified Bishop score was then devised using these three components (range 0–9) and compared with the original Bishop score (range 0–13) for prediction of successful induction, resulting in vaginal delivery. Compared with the original Bishop score (greater than 8), the simplified Bishop score (greater than 5) had a similar or better positive predictive value (87.7% compared with 87.0%), negative predictive value (31.3% compared with 29.8%), positive likelihood ratio (2.34 compared with 2.19), and correct classification rate (51.0% compared with 47.3%). Application of the simplified Bishop score in other populations, including indicated induction and spontaneous labor at term and preterm, was associated with similar vaginal delivery rates compared with the original Bishop score.
CONCLUSION: The simplified Bishop score comprised of dilation, station, and effacement attains a similarly high predictive ability of successful induction as the original score.
LEVEL OF EVIDENCE: II
The Bishop score can be simplified to three components: dilation, effacement, and station.
From the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland.
The data included in this article were obtained from the Consortium on Safe Labor, which was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, through Contract No. HHSN267200603425C. Institutions involved in the Consortium include, in alphabetical order: Baystate Medical Center, Springfield, Massachusetts; Cedars-Sinai Medical Center Burnes Allen Research Center, Los Angeles, California; Christiana Care Health System, Newark, Delaware; Georgetown University Hospital, MedStar Health, Washington, DC; Indiana University Clarian Health, Indianapolis, Indiana; Intermountain Healthcare and the University of Utah, Salt Lake City, Utah; Maimonides Medical Center, Brooklyn, New York; MetroHealth Medical Center, Cleveland, Ohio; Summa Health System, Akron City Hospital, Akron, Ohio; The EMMES Corporation, Rockville, Maryland (Data Coordinating Center); University of Illinois at Chicago, Chicago, Illinois; University of Miami, Miami, Florida; and University of Texas Health Science Center at Houston, Houston, Texas.
Presented as a poster at the Annual Meeting of the Society for Maternal-Fetal Medicine, February 10, 2011, San Francisco, CA.
Corresponding author: S. Katherine Laughon, MD, MS, Epidemiology Branch, NICHD, National Institutes of Health, 6100 Executive Boulevard, Room 7B03, Rockville, MD 20852; e-mail: firstname.lastname@example.org.
Financial Disclosure: The authors did not report any potential conflicts of interest.
In the 1960s, Dr Edward Bishop developed a pelvic scoring system, with a possible range from 0 to 13, using cervical dilation, effacement, station, consistency, and position.1 Based on clinical experience, he concluded that elective induction in multiparous women with uncomplicated pregnancies at term was successful with a score of greater than 8.
Shortly after the Bishop score was introduced, other investigators created weighting for the components of the score and found that cervical dilation was more strongly associated with the duration of the latent phase than were the other components. However, the weighted Bishop score did not provide a clinically significant improvement in predicting duration of labor compared with the original score.2,3 New scores have been proposed, the Bishop score has been modified, and attempts have been made to improve the Bishop score by adjusting for additional maternal and obstetric characteristics, but these scores in general have not proven to be superior to the original score, and these more cumbersome scores have not been widely adopted into busy clinical practice.4–8 The Bishop score remains the most commonly used system to assess for preinduction readiness.9
Because the original Bishop score was created on an empiric basis without modern statistical methods and the five components are correlated, the question remains whether all components are necessary in predicting vaginal delivery. If only some of the components are independently associated with successful induction, then the score can be reduced to contain only those components with equivalent ability to predict a successful induction. Our objective was to determine whether a simplified Bishop score can predict vaginal delivery equally well in nulliparous women with uncomplicated pregnancies undergoing induction of labor at term in contemporary obstetric practice. We then investigated whether a simplified Bishop score could be applied for other indications for induction and at different gestational ages.
MATERIALS AND METHODS
The Consortium on Safe Labor was a study conducted by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health involving 228,668 deliveries between 2002 and 2008 from 12 clinical centers and 19 hospitals.10 Institutional review board approval was obtained by all participating institutions. Data were collected from electronic medical records, including demographics, medical history, labor and delivery information as well as obstetric, postpartum, and neonatal outcomes. Additional data from the neonatal intensive care unit were collected and linked to the newborn record. The patient data were supplemented with maternal and newborn discharge International Classification of Diseases, 9th Revision codes for each delivery. Each site transferred data in electronic format to the data coordinating center where data were mapped to common categories for each predefined variable. Data were cleaned and logic checking performed. Validation studies indicated that the electronic medical records were an accurate representation of the medical charts.10
Eleven sites provided indications for induction. We included nulliparous women with a singleton gestation in vertex presentation who delivered between 37 0/7 and 41 6/7 weeks of gestation and had uncomplicated pregnancies, undergoing elective or postdate induction of labor or induction for precursors that could have been expectantly managed, including uncomplicated gestational hypertension11 or chronic hypertension before 39 weeks of gestation12; history of maternal, obstetric, or fetal indication in a prior pregnancy; or induction for suspected fetal macrosomia without diabetes.13 We excluded women with a previous uterine scar (n=12), stillbirth (n=16), any child with congenital anomalies (n=795), or who had an induction for any other reason, including chorioamnionitis, fetal compromise, maternal preeclampsia, maternal medical conditions, and vaginal bleeding. A total of 12,996 women were available for final data analysis; of these, 5,610 women had all five components of the Bishop score recorded and were designated the “training” population.
Logistic regression with backward elimination was performed to investigate which components of the Bishop score (dilation, effacement, station, consistency, and position) were significantly associated with successful vaginal delivery in a model adjusted for site. A simplified Bishop score was created by comparing the regression coefficients and using only the components that had a final P<.01 by Wald test. The significance level of P<.01 for an effect to stay in the model was chosen because although P<.05 might be statistically significant, the purpose of the study was to simplify the score. We chose to include only those components that were the main contributors to success of vaginal delivery. The regression model for the simplified Bishop score was validated using a bootstrap method with samples of the same size as the original data set.14 Bootstrapping is a technique that allows a given population to be randomly resampled to create multiple data sets of the same size. The analysis was rerun in each bootstrap sample to evaluate whether our decision-making regarding choice of which of the five components of the original Bishop score to include in a simplified score was robust. Logistic regression was performed with a P<.01 significance level for the effect to stay in the model in a backward elimination step using the data set from each of the 1,000 bootstrap samples.
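The resampling step described above can be sketched as follows. This is only an illustration of the bootstrap selection-frequency idea, not the analysis code used in the study: the names `bootstrap_selection_frequency`, `toy_select`, and `make_toy_rows` are hypothetical, the data are simulated, and `toy_select` is a deliberately crude stand-in for backward-elimination logistic regression (it keeps a component when its mean points differ between outcome groups by more than an arbitrary gap).

```python
import random
from collections import Counter

COMPONENTS = ("dilation", "effacement", "station", "consistency", "position")


def bootstrap_selection_frequency(rows, select, n_boot=1000, seed=0):
    """Resample `rows` with replacement (same size as the original data),
    rerun `select` on each resample, and return the fraction of resamples
    in which each component was retained."""
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select(sample)))
    return {name: counts[name] / n_boot for name in counts}


def toy_select(sample, min_gap=0.3):
    """Hypothetical stand-in for the model-selection step: keep a component
    if its mean points differ between outcome groups by more than min_gap."""
    yes = [r for r in sample if r["vaginal"]]
    no = [r for r in sample if not r["vaginal"]]
    if not yes or not no:
        return []
    kept = []
    for name in COMPONENTS:
        mean_yes = sum(r[name] for r in yes) / len(yes)
        mean_no = sum(r[name] for r in no) / len(no)
        if mean_yes - mean_no > min_gap:
            kept.append(name)
    return kept


def make_toy_rows(n=200, seed=42):
    """Simulated data (not study data): the outcome depends on dilation only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {name: rng.randint(0, 2) for name in COMPONENTS}
        row["vaginal"] = rng.random() < 0.5 + 0.2 * row["dilation"]
        rows.append(row)
    return rows


if __name__ == "__main__":
    freq = bootstrap_selection_frequency(make_toy_rows(), toy_select, n_boot=200)
    for name, share in sorted(freq.items()):
        print(f"{name}: retained in {share:.1%} of resamples")
```

A component retained in nearly every resample is a stable choice; one retained in only a fraction of resamples is more sensitive to the particular sample drawn.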
Interactions were explored between the components that were statistically significant and Spearman's correlation coefficients were calculated. Sensitivity, specificity, positive and negative predictive values, and positive likelihood ratio were calculated for the original Bishop score and the simplified Bishop score. The correct classification rate was calculated by adding the number of true positives and true negatives and dividing by the total number of patients classified.
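For readers who want the definitions in computable form, the measures named above can be written as a short sketch based on a 2×2 table of score result (above or at-or-below the cutoff) against outcome (vaginal or cesarean delivery). The function name `diagnostic_metrics` and the counts in the usage lines are hypothetical and are not taken from the study tables.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2-table measures.

    tp: score above cutoff, vaginal delivery
    fp: score above cutoff, cesarean delivery
    fn: score at or below cutoff, vaginal delivery
    tn: score at or below cutoff, cesarean delivery
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Positive likelihood ratio is undefined when specificity is exactly 1.
    lr_pos = float("inf") if specificity == 1 else sensitivity / (1 - specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": lr_pos,
        # True positives plus true negatives over everyone classified.
        "correct_classification": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in diagnostic_metrics(tp=80, fp=10, fn=20, tn=40).items():
        print(f"{name}: {value:.3f}")
```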
The simplified Bishop score was compared with the original Bishop score in two test populations in which women had all cervical components present: women at term (37 0/7–41 6/7 weeks of gestation) and preterm (32 0/7–36 6/7 weeks of gestation) undergoing an indicated induction of labor, including maternal, obstetric, or fetal indications for induction (for example, preeclampsia, maternal medical diseases, small for gestational age, oligohydramnios). These test populations did not include any women in the training population. To test the Bishop score and simplified Bishop score in a “natural experiment,” we also evaluated these scores in women with spontaneous labor at term (37 0/7–41 6/7 weeks of gestation) and preterm (32 0/7–36 6/7 weeks of gestation).
RESULTS
There were 5,610 women included in the training population and their characteristics are presented in Table 1. Most women were between the ages of 18 and 34 years and had an average height between 60 and 68 inches. Approximately one third of women were overweight (body mass index [calculated as weight (kg)/[height (m)]²] 25.0–29.9) at delivery and 38.9% of women were obese (body mass index 30.0 or greater). The majority (69.3%) of women were white or non-Hispanic followed by 10.9% African American or non-Hispanic, and 6.7% Hispanic. Induction of labor occurred more often in women with private insurance (77.0%), nonsmokers (97.1%), and at or after 39 weeks of gestation. There were 1,716 (30.6%) women who had a Bishop score greater than 8 before induction.
Overall, 75.3% of women (n=4,224) had a vaginal delivery. In the regression model, dilation had the highest regression coefficient (.45) followed by station (.32), and these cervical components were both highly significant (P<.001 and P=.009, respectively; Table 2). Effacement had a regression coefficient that was similar to consistency (.15 compared with .13, respectively), although effacement was highly significant (P<.001), whereas consistency was not (P=.07). Cervical position had a very small contribution to the model (regression coefficient=.01) and was not significant (P=.06). There were no significant interactions between these components, although they were correlated (Spearman's r=.3–.5, P<.001). We chose to include dilation, station, and effacement in a simplified score because these were the cervical components that had the largest three regression coefficients and were highly significantly associated with success of vaginal delivery.
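As an illustration of how the two scores relate, the sketch below sums already-assigned component points. It assumes the conventional Bishop point ranges (0–3 for dilation, effacement, and station; 0–2 for consistency and position), which are consistent with the totals of 0–13 and 0–9 given in this article but are not spelled out in it. The function names are hypothetical; the cutoffs of greater than 8 and greater than 5 are the ones compared in this article.

```python
# Assumed maximum points per component (conventional Bishop table).
MAX_POINTS = {
    "dilation": 3,
    "effacement": 3,
    "station": 3,
    "consistency": 2,
    "position": 2,
}
SIMPLIFIED_COMPONENTS = ("dilation", "station", "effacement")


def _total(points, names):
    """Sum the points for the named components after range checks."""
    total = 0
    for name in names:
        value = points[name]
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} points out of range: {value}")
        total += value
    return total


def original_bishop(points):
    """Five-component score, range 0-13."""
    return _total(points, MAX_POINTS)


def simplified_bishop(points):
    """Three-component score (dilation, station, effacement), range 0-9."""
    return _total(points, SIMPLIFIED_COMPONENTS)


def exceeds_cutoff(score, cutoff):
    """True when the score is strictly greater than the cutoff."""
    return score > cutoff


if __name__ == "__main__":
    # Hypothetical examination, for illustration only.
    exam = {"dilation": 2, "effacement": 2, "station": 2,
            "consistency": 1, "position": 1}
    print(original_bishop(exam), exceeds_cutoff(original_bishop(exam), 8))
    print(simplified_bishop(exam), exceeds_cutoff(simplified_bishop(exam), 5))
```

The two cutoffs are matched on sensitivity and specificity at the population level, so an individual examination can fall on different sides of them, as the hypothetical example does.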
To validate the process of developing a simplified score, a bootstrap method was used. The bootstrap method resulted in dilation and station always being chosen in the model and effacement chosen for 70.5% of the different bootstrap samples, overall supporting our choice of cervical components from the regression model (Table 3).
At a given sensitivity and specificity for vaginal delivery, the positive predictive value, negative predictive value, and correct classification rates of a simplified Bishop score based on dilation, effacement, and station only were similar to those of the original Bishop score (Table 4). For example, for the original Bishop score greater than 8, the simplified Bishop score with the closest sensitivity and specificity would be greater than 5. Compared with the original Bishop score greater than 8, the simplified Bishop score greater than 5 had a similar positive predictive value (87.7% for the simplified compared with 87.0% for the original score) and negative predictive value (31.3% for the simplified compared with 29.8% for the original score). The positive likelihood ratio and the correct classification rate were also similar or slightly better (2.34 compared with 2.19 and 51.0% compared with 47.3%, respectively).
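The reported positive predictive values and positive likelihood ratios can be cross-checked against the overall vaginal delivery rate of 75.3% using the odds form of Bayes' theorem (post-test odds = pre-test odds × likelihood ratio). The sketch below is a consistency check on the published figures rather than part of the original analysis; the function name `ppv_from_lr` is hypothetical.

```python
def ppv_from_lr(prevalence, lr_pos):
    """Positive predictive value implied by a prevalence and a positive
    likelihood ratio, via post-test odds = pre-test odds * LR+."""
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * lr_pos
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    vaginal_delivery_rate = 0.753  # overall rate in the training population
    print(f"simplified, LR+ 2.34: {ppv_from_lr(vaginal_delivery_rate, 2.34):.3f}")
    print(f"original,   LR+ 2.19: {ppv_from_lr(vaginal_delivery_rate, 2.19):.3f}")
```

The implied values round to 87.7% and 87.0%, which agree with the positive predictive values reported for the simplified and original scores.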
We then compared the simplified Bishop score with the original Bishop score for the following separate populations of women: term (37 0/7–41 6/7 weeks of gestation) indicated induction and spontaneous labor and preterm (32 0/7–36 6/7 weeks of gestation) indicated induction and spontaneous labor. The simplified Bishop score was associated with a similar vaginal delivery rate compared with the original Bishop score (Fig. 1). For illustration, a simplified Bishop score greater than 5 performed similarly to an original Bishop score greater than 8 in both the indicated inductions and spontaneous labor at term and preterm with the similar correct classification rates (Table 5).
DISCUSSION
In nulliparous women with uncomplicated pregnancies undergoing an induction of labor at term, a simplified Bishop score with three components (dilation, station, and effacement) predicted vaginal delivery similarly to the original Bishop score. The simplified Bishop score also was comparable to the original Bishop score in predicting successful vaginal delivery in women with an indicated induction both at term and preterm between 32 0/7 and 36 6/7 weeks of gestation. Even in women who presented in spontaneous labor at term and preterm, the simplified Bishop score was similar to the original Bishop score, suggesting that the simplified score is equivalent to the original score in the setting in which it was developed.
Other attempts at modifying or evaluating the Bishop score have used different outcomes such as length of labor or achieving active labor, and many included multiparous women who are known to have more successful inductions.3–5,7 We chose vaginal delivery as the primary outcome, because this is what clinicians and patients define as success. Our study also has the advantage of having a large number of nulliparous women. Thus, we were able to use modern statistical methods to find which components of the Bishop score were independently associated with vaginal delivery to create a simplified score.
There is a possibility that women who had all five components of the Bishop score recorded are different in baseline characteristics from women who were missing some of the components. However, most of the women (72.2%) had dilation, station, and effacement present, and many clinicians informally already use a simplified Bishop score. It is more likely that the recording of some compared with five components of the Bishop score was based on clinician preference rather than something inherently different about a woman undergoing an induction. Given the large numbers, we were able to test the simplified score in other populations of women, including indicated induction and spontaneous labor both term and preterm, and the simplified Bishop score performed similarly to the original Bishop score in predicting vaginal delivery in all of these settings, which suggests that missing cervical components were likely not an issue.
Our findings are similar to a prospective study of 134 women undergoing an induction of labor at term, in which only the cervical components of dilation and effacement were associated with vaginal delivery within 24 hours.15 Using an “abbreviated” Bishop score (dilation and effacement only) greater than 3, the predictive characteristics for vaginal delivery (excluding 23 women who had an emergency cesarean delivery for maternal or fetal indications) were positive predictive value 85.5%, negative predictive value 65.7%, and positive likelihood ratio 2.61, which were similar to our simplified Bishop score greater than 5. An older, smaller study of 40 nulliparous and 69 multiparous women also found that only dilation was associated with the length of latent phase of labor after labor induction.16 Our study found both effacement and station to be significant in addition to dilation, likely because we had a large number of women and thus more power. Although the addition of position or consistency may be significantly associated with successful vaginal delivery in a different population of women, the purpose of our model was to simplify the score, so we chose only the components that were both highly significant in the regression and contributed the most to vaginal delivery as determined by the regression coefficients. Of note, simplifying the score even further by using only the two components with the highest regression coefficients, dilation and station, resulted in a worse correct classification rate compared with the simplified Bishop score using all three components of dilation, station, and effacement (data not shown).
Our findings are also supported by a secondary analysis of four randomized controlled trials with a total of 781 women comparing different induction methods for indicated induction after 37 weeks of gestation, and the cervical components dilation, effacement, and station were independently associated with vaginal delivery within 24 hours after adjusting for maternal and obstetric characteristics, although only position and station were associated with spontaneous vaginal delivery.17
Other studies have created variations of the Bishop score. In a prospective study of 1,189 women undergoing mostly indicated inductions, Lange et al7 used linear regression to create a new score with the cervical components of dilation and station from the original Bishop score and length measured as centimeters as opposed to percentage, with dilation multiplied by two. The indications for induction (premature rupture of membranes, amniotomy, and medically induced) and definitions of failure (delivery within 24 hours or labor established within 8 hours for the medically induced group) differed from those in our study, and the overall rate of failure was lower (approximately 15% compared with 25% in our study). Nonetheless, Lange's score was found to perform similarly to the original Bishop score in that population of women. Dhall et al8 also created a new score in 200 women undergoing indicated induction with a slightly lower vaginal delivery rate (71.5%) than our study. Dilation, effacement, and consistency were rescored and weighted, and parity was also included. The Dhall score had higher prediction of success rate at both ends of the score, but the study was limited because no women had a Bishop score greater than 8. In addition, using a reasonably accurate prediction of Dhall score of 7 or greater, which corresponded to a Bishop score cutoff point of 4, the Dhall score only performed significantly better in multiparous but not nulliparous women.
In summary, reassessing the original Bishop score using modern statistical methods resulted in a simplified score with only three components (dilation, station, and effacement) yielding an equivalently high predictive ability. The simplified Bishop score performed similarly to the original Bishop score in predicting vaginal delivery in indicated inductions term and preterm as well as in spontaneous labor at term and preterm. Given that our study is a large, nationally representative cohort reflecting current clinical practice, our findings are generalizable. Because cervical position and consistency do not add to the overall ability to predict vaginal delivery, we believe that the original Bishop score can be replaced with a simplified score using dilation, station, and effacement only.
REFERENCES
1. Bishop EH. Pelvic scoring for elective induction. Obstet Gynecol 1964;24:266–8.
2. Friedman EA, Niswander KR, Bayonet-Rivera NP, Sachtleben MR. Relation of prelabor evaluation to inducibility and the course of labor. Obstet Gynecol 1966;28:495–501.
3. Friedman EA, Niswander KR, Bayonet-Rivera NP, Sachtleben MR. Prelabor status evaluation. II. Weighted score. Obstet Gynecol 1967;29:539–44.
4. Burnett JE Jr. Preinduction scoring: an objective approach to induction of labor. Obstet Gynecol 1966;28:479–83.
5. Fields H. Induction of labor. Readiness for induction. Am J Obstet Gynecol 1966;95:426–9.
6. Hughey MJ, McElin TW, Bird CC. An evaluation of preinduction scoring systems. Obstet Gynecol 1976;48:635–41.
7. Lange AP, Secher NJ, Westergaard JG, Skovgard I. Prelabor evaluation of inducibility. Obstet Gynecol 1982;60:137–47.
8. Dhall K, Mittal SC, Kumar A. Evaluation of preinduction scoring systems. Aust N Z J Obstet Gynaecol 1987;27:309–11.
9. Baacke KA, Edwards RK. Preinduction cervical assessment. Clin Obstet Gynecol 2006;49:564–72.
10. Zhang J, Troendle J, Reddy UM, Laughon SK, Branch DW, Burkman R, et al. Contemporary cesarean delivery practice in the United States. Am J Obstet Gynecol 2010;203:326.e1–326.e10.
11. Sibai BM. Diagnosis and management of gestational hypertension and preeclampsia. Obstet Gynecol 2003;102:181–92.
12. Sibai BM. Chronic hypertension in pregnancy. Obstet Gynecol 2002;100:369–77.
13. Irion O, Boulvain M. Induction of labour for suspected fetal macrosomia. The Cochrane Database of Systematic Reviews 2000, Issue 2. Art. No.: CD000938. DOI: 10.1002/14651858.CD000938.
14. Izrael D, Battaglia AA, Hoaglin DC, Battaglia MP. SAS® Macros and Tools for Working With Weighted Logistic Regression Models That Use Survey Data. Seattle (WA): Seattle SAS Users Group International Proceedings; 2003. Available at: www2.sas.com/proceedings/sugi28/275-28.pdf. Accessed October 21, 2010.
15. Reis FM, Gervasi MT, Florio P, Bracalente G, Fadalti M, Severi FM, et al. Prediction of successful induction of labor at term: role of clinical history, digital examination, ultrasound assessment of the cervix, and fetal fibronectin assay. Am J Obstet Gynecol 2003;189:1361–7.
16. Watson WJ, Stevens D, Welter S, Day D. Factors predicting successful labor induction. Obstet Gynecol 1996;88:990–2.
17. Crane JM, Delaney T, Butt KD, Bennett KA, Hutchens D, Young DC. Predictors of successful labor induction with oral or vaginal misoprostol. J Matern Fetal Neonatal Med 2004;15:319–23.
© 2011 by The American College of Obstetricians and Gynecologists.