Intrapartum electronic fetal heart rate (FHR) monitoring was introduced into clinical practice more than a half century ago as a means of assessing fetal tolerance of the hypoxic stress associated with labor.1–6 In the United States, the use of this tool has been almost universal for several decades. Although such monitoring has virtually eliminated unexpected intrapartum fetal death, this tool has failed to reduce the incidence of neonatal hypoxic ischemic encephalopathy or improve any index of long-term neurologic outcomes.7,8
It is clear that the overwhelming majority of cases of abnormal neonatal neurologic outcomes, including cerebral palsy, have their origins in either prelabor developmental events, premature birth, or sentinel intrapartum events such as uterine rupture, placental abruption, or umbilical cord accidents.7 However, the continued birth of acidemic neonates without identifiable intrapartum hypoxic events and the success of total body cooling in improving neurologic outcomes of encephalopathic neonates born with a wide range of pH values support the belief that even uneventful labor may not be tolerated by some fetuses and that our ability to identify such fetuses before birth remains limited.9–12 Many theories have been proposed to explain the disappointing failure of electronic FHR monitoring to reliably identify such fetuses and allow preventive intervention, with major themes centering on errors in pattern interpretation, variations in pattern interpretation even among recognized experts, and inadequate standardization of pattern management.13–16 Investigators continue to attempt to more effectively mine the information available from visual observation of FHR patterns in the hope of identifying an as yet unrecognized approach to FHR interpretation that will unlock the true power of such monitoring and lead to improved neonatal outcomes.17–19 Each of these theories shares the underlying assumption that “better use” of FHR data will lead to better clinical outcomes.
The use of electronic FHR monitoring is predicated on both the existence of a useful correlation between certain FHR patterns and fetal arterial pH, and on a second assumption that arterial pH is a reliable reflection of fetal tolerance of the hypoxic stress of labor.20–24 We sought to evaluate the correlation between neonatal acidemia and short-term newborn condition, as reflected in Apgar score, in a large contemporary cohort in which universal cord gases were collected. We hypothesized that this correlation is poor, reflecting wide biological variability in the ability of the human fetus to tolerate various degrees of hypoxia-induced acidemia without sustaining injury. To the extent that this is true, one of the fundamental assumptions underlying the use of electronic FHR monitoring would be weakened, with important implications for patient management and future research.
We performed a retrospective cohort study of all singleton, term (at least 37 weeks of gestation), nonanomalous neonates delivered at Texas Children's Hospital Pavilion for Women in Houston, Texas, from the time of its opening in March 2012 through July 2020, for whom both complete umbilical artery cord gas values and 1- and 5-minute Apgar scores were available. We also separately reviewed the medical records of all neonates born at our institution who underwent total body cooling for presumed intrapartum hypoxic injury. Basic demographic data as well as cord gas values and Apgar scores were extracted using a medical record query from Epic. Umbilical arterial and venous cord gases are collected, by protocol, in all neonates born at our institution. Apgar scores are assigned by either the neonatal transition nurse who routinely attends each delivery or, if present, by a neonatologist. Base excess is a calculated value. The technique of direct pH measurement had an interassay coefficient of variation of 0.03% and an intra-assay coefficient of variation of 0.01%.
To characterize the correlation of umbilical artery pH and base excess with Apgar scores (our primary objective), Spearman correlation coefficients were calculated for these parameters over the entire range of pH and base excess values. In addition, such correlation was examined within several specific subranges of arterial pH and base excess values. A correlation coefficient greater than 0.75 was considered strong, a coefficient of 0.45–0.75 moderate, and a coefficient of less than 0.45 weak. Receiver operating characteristic analysis of these data was performed to determine an optimal statistical threshold of pH for the prediction of significant newborn depression (5-minute Apgar score less than 4).25 Similar analysis was performed for the absence of newborn depression (5-minute Apgar score greater than 7). Statistical analysis was performed using SAS 9.4. The study was approved by the Institutional Review Board of Baylor College of Medicine.
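The correlation step described above can be illustrated with a minimal sketch. It is not the SAS analysis used in the study: the function names (average_ranks, spearman_rho, strength) are hypothetical, the sample values are invented for illustration only, and applying the strength cutoffs to the absolute value of the coefficient is an assumption, because the text does not say how negative coefficients were classified.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def strength(rho):
    """Label a coefficient with the cutoffs stated in the text.

    Assumption: the cutoffs are applied to the magnitude of rho.
    """
    magnitude = abs(rho)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.45:
        return "moderate"
    return "weak"


# Invented illustrative values, NOT study data.
ph = [7.05, 7.31, 7.18, 7.26, 7.22, 7.10, 7.29, 7.24]
apgar_5min = [9, 9, 8, 9, 9, 8, 10, 7]
rho = spearman_rho(ph, apgar_5min)
print(round(rho, 3), strength(rho))
```

Because Apgar scores take only a few integer values, ties are frequent, which is why the sketch averages tied ranks rather than using the shortcut formula that assumes no ties.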
During the study period, 47,626 neonates were born at our institution; 29,787 met inclusion criteria, as defined above (Fig. 1 and Table 1). During the period of this study, 8,914 primary cesarean deliveries were performed; 44% of these procedures were associated with at least one diagnostic code suggesting concern for fetal well-being as an indication for cesarean delivery.
Thirty-five neonates with arterial pH values did not have recorded values for base excess. Seventy percent of neonates with a pH less than 7.0 and 85% of neonates with base excess less than −12 had 5-minute Apgar scores greater than 7. Fewer than 1% of neonates with a pH greater than 7.2 or base excess greater than −4 had 5-minute Apgar scores less than 4. Of seven newborns with umbilical cord gases who underwent cooling for presumed intrapartum hypoxia–ischemia, three (43%) had an umbilical artery pH greater than 7.15 (range 6.72–7.17) (Table 2).
Spearman correlation coefficients relating pH and base excess to 1- and 5-minute Apgar scores were poor (Table 3). In all cases, the correlation was weaker at 5 minutes than at 1 minute, as represented by a decrease in the correlation coefficient.
Receiver operating characteristic curve analysis suggests that an arterial pH of 7.23 yields the best discrimination for prediction of a depressed or not depressed newborn (5-minute Apgar score 7 or less or greater than 7) and that a pH threshold of 7.22 yields the best discrimination for prediction of a severely or not severely depressed newborn (Apgar score less than 4 or 4 or greater) (Table 4 and Fig. 2). However, sensitivities and specificities for even these ideal thresholds remain in ranges considered by statisticians to be only fair.
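How a pH threshold with the "best discrimination" can be derived from a receiver operating characteristic curve is sketched below. The text does not name the selection criterion, so the use of Youden's J (sensitivity + specificity − 1) here is an assumption, as are the function names, the convention that a pH below the cutoff counts as a positive test, and the invented sample values.

```python
def roc_points(ph_values, depressed, thresholds):
    """Return (cutoff, sensitivity, specificity) for each candidate cutoff.

    Assumption: a pH below the cutoff is a positive test for depression.
    """
    positives = sum(depressed)
    negatives = len(depressed) - positives
    points = []
    for cutoff in thresholds:
        tp = sum(1 for p, d in zip(ph_values, depressed) if d and p < cutoff)
        fp = sum(1 for p, d in zip(ph_values, depressed) if not d and p < cutoff)
        sensitivity = tp / positives
        specificity = 1 - fp / negatives
        points.append((cutoff, sensitivity, specificity))
    return points


def best_cutoff(points):
    """Pick the cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(points, key=lambda t: t[1] + t[2] - 1)


def auc(points):
    """Trapezoidal area under sensitivity plotted against 1 - specificity."""
    curve = sorted([(1 - sp, se) for _, se, sp in points] + [(0.0, 0.0), (1.0, 1.0)])
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(curve, curve[1:]))


# Invented illustrative values, NOT study data (1 = depressed newborn).
ph = [7.00, 7.05, 7.10, 7.20, 7.25, 7.30]
depressed = [1, 1, 0, 1, 0, 0]
cutoffs = [7.03, 7.08, 7.15, 7.22, 7.28, 7.35]

points = roc_points(ph, depressed, cutoffs)
cutoff, sensitivity, specificity = best_cutoff(points)
print(cutoff, round(sensitivity, 2), round(specificity, 2), round(auc(points), 3))
```

A statistically optimal cutoff chosen this way can still perform modestly: the criterion only identifies the best available trade-off between sensitivity and specificity, not a clinically adequate one.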
Normal labor is a process involving repetitive contraction-induced reductions of fetal oxygen delivery and has for decades been recognized as a process that is “invariably asphyxiating.”20,26 This long-term hypoxic stress results in the development of significant metabolic acidemia, with a median pH value at the time of uncomplicated vaginal delivery of 7.25 (2.5th percentile 7.08).26 This pH reflects a level of hypoxia-induced acidemia that would generally indicate critical illness in an adult. Although unique fetal hematologic characteristics retard the development of this acidemia during labor, the effect of acidemia per se is not affected by these changes, and a significant spectrum of fetal tolerance of such degrees of acidemia would be expected.26
No data exist correlating any FHR pattern with any index of long-term neonatal outcome. Rather, the use of electronic FHR monitoring is predicated on the existence of both a meaningful correlation between certain FHR patterns and umbilical artery pH, and the assumption of a similar correlation between pH and fetal tolerance of labor and delivery, as reflected by the condition of the newborn. The former correlation, based on early studies by Hon, Kubli, Saling, Quilligan, Paul, and others, is well documented.1,2,4–6,20,22–24 However, a careful review of these foundational studies of electronic FHR monitoring reveals that the commonly recognized FHR patterns in use today were validated primarily on the basis of their ability to predict a fetal pH above or below certain pH values (most commonly 7.15–7.2), a cutoff above which fetal tolerance of the hypoxic stress of labor was assumed and below which intervention was felt to be indicated to avoid fetal injury. However, the choice of these pH values as a meaningful threshold was conceded by these original pioneers to be “arbitrary.”6 Despite this uncertain scientific foundation, protocols for fetal scalp blood sampling, a procedure originally felt to be an essential component of intrapartum monitoring, dictated repeat pH sampling every 30 minutes for a pH 7.2–7.25 and expeditious delivery for fetuses with scalp pH less than 7.2.8,27,28 This technique has been largely abandoned worldwide after the documentation that the presence of a spontaneous or induced FHR acceleration reliably predicted a pH greater than 7.2.8,28,29 Thus, when we refer to a given FHR pattern as reassuring or worrisome, we are really saying that we believe the umbilical cord pH to be above or below values approaching 7.2; to the extent that this pH value is poorly correlated with newborn outcome, the identification of fetuses with pH above or below this value becomes less relevant, regardless of the precision with which pH can be predicted. 
Similar considerations apply to the commonly cited base excess threshold of −12 as a predictor of fetal outcome.30
The Apgar score serves to describe the physiologic condition of a neonate immediately after birth.25 Our data demonstrate a weak-to-absent correlation between acidemia in general (as reflected by pH), and metabolic acidemia specifically (as reflected by base excess) and even short-term fetal condition, as reflected in 1- and 5-minute Apgar scores. These results document the presence of wide biological variability in fetal tolerance of the hypoxic stress of labor and suggest that such tolerance is poorly reflected by arterial blood gas analysis, and hence also by the recognition of FHR patterns designed to predict such values. Given this poor correlation between acidemia and short-term neonatal outcomes (Apgar scores) and considering the remarkable recuperative capacities of the newborn, the well documented poor correlation between any FHR pattern or Apgar score and long-term neurologic outcome should not be surprising; a significant deterioration in correlation coefficients between pH and Apgar score was seen even between 1 and 5 minutes.7,8 We believe our data provide insight into the physiologic basis of this “disappointing story.”13 Although several of the demographic differences described in Table 1 are statistically significant due to the very large sample size, none are of clinical significance.
Receiver operating characteristic analysis of these data demonstrates that the original arbitrary pH cutoff of 7.2 defining risk of significant fetal depression was remarkably close to the ideal values for identification or exclusion of a mildly or severely depressed neonate (7.22–7.23). However, an area under the curve of 0.722 or 0.742 would be considered statistically “poor” in terms of predicting a severely depressed neonate at birth; basing the performance of a major operative procedure with significant implications for future pregnancies on such a test might be considered questionable for a specialty concerned with the practice of evidence-based medicine.31 Our data do confirm a better correlation between a higher pH (and, thus, a category I FHR tracing, which reliably predicts such a pH) and a neonate with a normal 5-minute Apgar score, as well as an association with a lower risk of subsequent hypoxic ischemic encephalopathy. However, we note that only seven of the 29,787 (0.02%) neonates in this study had evidence of hypoxic encephalopathy sufficient to warrant cooling, using standard cooling criteria.9–11 Thus, although our observation that 98.7% of neonates with a pH greater than 7.2 had 5-minute Apgar scores greater than 7 would support the traditional view of FHR monitoring as having excellent sensitivity for predicting a nonacidemic neonate, these data also indicate that simply having a term, nonanomalous fetus is associated with a 99.9% incidence of giving birth to a neonate without signs of hypoxic ischemic encephalopathy, without respect to the FHR pattern.
Our finding of a significant increase in the rate of cesarean delivery in women whose fetuses were born with a low pH could be viewed as an affirmation of the predictive value of electronic FHR monitoring. However, our finding of a lower rate of cesarean delivery in women with greater base deficit suggests that the wrong women often undergo cesarean delivery based on FHR patterns reflecting benign respiratory acidemia, whereas many of those with damaging metabolic acidemia are overlooked. When almost half of the women in our facility who undergo primary cesarean delivery do so at least partially out of concern for the FHR tracing, this is not an inconsequential concern. This was also a recognized fundamental flaw in the use of fetal scalp blood sampling.29
This analysis does not suggest an abandonment of intrapartum fetal monitoring; the prevention of unexpected intrapartum fetal death, a problem rarely even considered in countries where the use of electronic FHR monitoring is ubiquitous but still a major problem in low-resource nations, is a major obstetric success story. Nor do our data question whether it is better to be born with a higher pH rather than a lower pH; a greater proportion of neonates with pH less than 7.0 go to cooling than neonates with a pH greater than 7.15.9–11 Although these data do not suggest an alteration of our current approach to intrapartum monitoring, they do raise two important points. First, the significant contribution of electronic FHR monitoring to the current cesarean delivery rate combined with the failure of such monitoring to significantly reduce the rate of neonatal hypoxic ischemic encephalopathy or subsequent cerebral palsy has less to do with inadequate pattern interpretation by clinicians and much more to do with a wide biological variability in the ability of the human fetus to tolerate the hypoxic stress of labor, leading to a highly variable and unpredictable individual threshold for injury. Second, further refinement of techniques of FHR interpretation, designed to more accurately predict fetal pH, or the use of alternative approaches to the evaluation of fetal acidemia (such as lactate levels) do not affect this fundamental weakness in the use of electronic FHR monitoring and are unlikely to improve the clinical usefulness of this tool.18,32 These conclusions are supported by an increasing recognition of the wide range of newborn pH values associated with neonatal hypoxic encephalopathy in neonates who are undergoing cooling.9–11,33
Strengths of this study include our exclusive use of cord arterial blood gas values, the contemporary nature of our data, a large sample size, and the fact that cord gases are routinely collected and analyzed in all deliveries at our institution, barring technical issues with arterial sampling. Thus, this study is unlikely to be biased by selective cord sampling of depressed neonates, a problem plaguing most previous studies. Previous data that used selective pH sampling in much smaller series have documented the absence of immediate evidence of neurologic impairment in many neonates with umbilical cord pH values less than 7.20 or even less than 7.0, as well as a poor correlation between severe acidemia and long-term neurologic injury.34–36 However, the systematic analysis of newborn condition across the entire spectrum of pH and of base excess values using universal sampling of more than 29,000 term neonates makes our data highlighting the inconsistencies in these associations unique and supports these previously published small case series. In a similar manner, although the relatively poor ability of electronic FHR monitoring to predict fetal compromise and the relatively good ability of this technique to exclude such compromise are widely recognized, our data present a detailed statistical exploration of these empiric observations and suggest a physiologic basis for these observations not previously available. The major weakness of our study is that long-term neurologic follow-up of almost 30,000 neonates is not possible within the current health care system. We have used Apgar score, a universally accepted means of assessing immediate newborn condition, as a short-term indicator of fetal tolerance of labor.
The 5-minute Apgar score in particular also has been shown to correlate well with longer-term outcomes.37 Our description of the poor correlation of fetal pH and base excess with even short-term outcomes (Apgar score) helps explain the well-recognized poor predictive value of Apgar score for long-term neurologic outcome.7 Although it is recognized that Apgar scores may be depressed as a result of factors other than fetal acidemia, such cases of birth depression would be evenly distributed across the range of pH values seen in this study because they are, by definition, unrelated to acidemia. Thus, our conclusions would not be affected by such cases.
In conclusion, the scientific foundations of electronic FHR monitoring link certain FHR patterns to fetal pH rather than to long-term or even short-term newborn outcomes. The clinical use of this tool assumes the existence of a reliable correlation between the degree of metabolic acidemia incurred as a result of the labor process and these outcomes. At the extremes of pH, this assumption has at least some modicum of validity. However, in the pH ranges commonly seen during the process of labor, this correlation is poor, reflecting wide biological variation in the ability of the fetus to tolerate the hypoxic stress of labor and its resultant metabolic acidemia. We believe this variability, rather than shortcomings of, or errors in, pattern interpretation, to be the weak link in the chain responsible for the observed increased rate of cesarean delivery with no measurable reduction in long-term adverse outcomes. Under these circumstances, further refinements in pattern interpretation, although potentially improving the prediction of fetal pH, are unlikely to significantly alter these clinical outcomes. As far as our current approach to FHR interpretation is concerned, it would appear that this is as good as it gets.
1. Hon EH. The electronic evaluation of the fetal heart rate; preliminary report. Am J Obstet Gynecol 1958;75:1215–30. doi: 10.1016/0002-9378(58)90707-5
2. Hon EH. The fetal heart rate patterns preceding death in utero. Am J Obstet Gynecol 1959;78:47–56. doi: 10.1016/0002-9378(59)90639-8
3. Hon EH. Electronic evaluation of the fetal heart rate. VI. Fetal distress-a working hypothesis. Am J Obstet Gynecol 1962;83:333–53. doi: 10.1016/s0002-9378(16)35841-0
4. Paul RH, Suidan AK, Yeh S, Schifrin BS, Hon EH. Clinical fetal monitoring. VII. The evaluation and significance of intrapartum baseline FHR variability. Am J Obstet Gynecol 1975;123:206–10.
5. Quilligan EJ, Katigbak E, Nowacek C, Czarnecki N. Correlation of fetal heart rate patterns and blood gas values. 1. Normal heart rate values. Am J Obstet Gynecol 1964;90:1343–9. doi: 10.1016/0002-9378(64)90858-0
6. Kubli FW, Hon EH, Khazin AF, Takemura H. Observations on heart rate and pH in the human fetus during labor. Am J Obstet Gynecol 1969;104:1190–206. doi: 10.1016/s0002-9378(16)34294-6
7. American College of Obstetricians and Gynecologists, American Academy of Pediatrics. Neonatal encephalopathy and neurologic outcome. 2nd ed. Accessed January 14, 2020. https://www.acog.org/-/media/project/acog/acogorg/clinical/files/task-force-report/articles/2014/neonatal-encephalopathy-and-neurologic-outcome.pdf
8. Clark SL, Gimovsky ML, Miller FC. The scalp stimulation test: a clinical alternative to fetal scalp blood sampling. Am J Obstet Gynecol 1984;148:274–7. doi: 10.1016/s0002-9378(84)80067-8
9. El-Dib M, Inder TE, Chalak LF, Massaro AN, Thoresen M, Gunn AJ. Should therapeutic hypothermia be offered to babies with mild neonatal encephalopathy in the first 6 h after birth? Pediatr Res 2019;85:442–8. doi: 10.1038/s41390-019-0291-1
10. Gluckman PD, Wyatt JS, Azzopardi D. Selective head cooling with mild systemic hypothermia after neonatal encephalopathy: multicentre randomised trial. Lancet Lond Engl 2005;365:663–70. doi: 10.1016/S0140-6736(05)17946-X
11. Tagin MA, Woolcott CG, Vincer MJ, Whyte RK, Stinson DA. Hypothermia for neonatal hypoxic ischemic encephalopathy: an updated systematic review and meta-analysis. Arch Pediatr Adolesc Med 2012;166:558–66. doi: 10.1001/archpediatrics.2011.1772
12. Clark SL, Meyers JA, Frye DK, Garthwaite T, Lee AJ, Perlin JB. Recognition and response to electronic fetal heart rate patterns: impact on newborn outcomes and primary cesarean delivery rate in women undergoing induction of labor. Am J Obstet Gynecol 2015;212:494.e1–6. doi: 10.1016/j.ajog.2014.11.019
13. Freeman R. Intrapartum fetal monitoring--a disappointing story. N Engl J Med 1990;322:624–6. doi: 10.1056/NEJM199003013220910
14. Macones GA, Hankins GDV, Spong CY, Hauth J, Moore T. The 2008 National Institute of Child Health and Human Development workshop report on electronic fetal monitoring: update on definitions, interpretation, and research guidelines. J Obstet Gynecol Neonatal Nurs 2008;37:510–5. doi: 10.1111/j.1552-6909.2008.00284.x
15. Parer JT, King T. Fetal heart rate monitoring: is it salvageable? Am J Obstet Gynecol 2000;182:982–7. doi: 10.1016/s0002-9378(00)70358-9
16. Clark SL, Hankins GDV. Temporal and demographic trends in cerebral palsy--fact and fiction. Am J Obstet Gynecol 2003;188:628–33. doi: 10.1067/mob.2003.204
17. Cahill AG, Roehl KA, Odibo AO, Macones GA. Association and prediction of neonatal acidemia. Am J Obstet Gynecol 2012;207:206.e1–8. doi: 10.1016/j.ajog.2012.06.046
18. Cahill AG, Tuuli MG, Stout MJ, López JD, Macones GA. A prospective cohort study of fetal heart rate monitoring: deceleration area is predictive of fetal acidemia. Am J Obstet Gynecol 2018;218:523.e1–12. doi: 10.1016/j.ajog.2018.01.026
19. Frey HA, Liu X, Lynch CD. An evaluation of fetal heart rate characteristics associated with neonatal encephalopathy: a case-control study. Int J Obstet Gynaecol 2018;125:1480–7. doi: 10.1111/1471-0528.15222
20. Modanlou H, Yeh SY, Hon EH, Forsythe A. Fetal and neonatal biochemistry and Apgar scores. Am J Obstet Gynecol 1973;117:942–51. doi: 10.1016/0002-9378(73)90066-5
21. Hon EH, Khazin AF. Observations on fetal heart rate and fetal biochemistry. I. Base deficit. Am J Obstet Gynecol 1969;105:721–9. doi: 10.1016/0002-9378(69)90008-8
22. Khazin F, Hon EH. Observations of fetal heart rate and fetal biochemistry II: fetal maternal pH differences. Am J Obstet Gynecol 1971;109:432–9. doi: 10.1016/0002-9378(71)90341-3
23. Saling E. A new method for examination of the child during labor. Introduction, technic and principles [in German]. Arch Gynakol 1962;197:108–22. doi: 10.1007/BF02590014
24. Beard RW, Morris ED, Clayton SG. pH of foetal capillary blood as an indicator of the condition of the foetus. J Obstet Gynaecol Br Commonw 1967;74:812–22. doi: 10.1111/j.1471-0528.1967.tb15562.x
25. American Academy of Pediatrics Committee on Fetus and Newborn, American College of Obstetricians and Gynecologists Committee on Obstetric Practice. The Apgar score. Pediatrics 2015;136:819–22. doi: 10.1542/peds.2015-2651
26. Pammi M, Clark SL, Shamshirsaz AA. Intrapartum maternal oxygen supplementation-friend or foe? JAMA Pediatr 2021;175:236–7. doi: 10.1001/jamapediatrics.2020.5363
27. Zalar RW, Quilligan EJ. The influence of scalp sampling on the cesarean section rate for fetal distress. Am J Obstet Gynecol 1979;135:239–46. doi: 10.1016/0002-9378(79)90352-1
28. Clark SL, Gimovsky ML, Miller FC. Fetal heart rate response to scalp blood sampling. Am J Obstet Gynecol 1982;144:706–8. doi: 10.1016/0002-9378(82)90441-0
29. Clark SL, Paul RH. Intrapartum fetal surveillance: the role of fetal scalp blood sampling. Am J Obstet Gynecol 1985;153:717–20. doi: 10.1016/0002-9378(85)90330-8
30. Low JA, Lindsay BG, Derrick EJ. Threshold of metabolic acidosis associated with newborn complications. Am J Obstet Gynecol 1997;177:1391–4. doi: 10.1016/s0002-9378(97)70080-2
31. Akobeng AK. Understanding diagnostic tests 3: receiver operating characteristic curves. Acta Paediatr 2007;96:644–7. doi: 10.1111/j.1651-2227.2006.00178.x
32. Gjerris AC, Staer-Jensen J, Jørgensen JS, Bergholt T, Nickelsen C. Umbilical cord blood lactate: a valuable tool in the assessment of fetal metabolic acidosis. Eur J Obstet Gynecol Reprod Biol 2008;139:16–20. doi: 10.1016/j.ejogrb.2007.10.004
33. Azzopardi DV, Strohm B, Edwards AD. Moderate hypothermia to treat perinatal asphyxial encephalopathy. N Engl J Med 2009;361:1349–58. doi: 10.1056/NEJMoa0900854
34. Goldaber KG, Gilstrap LC, Leveno KJ, Dax JS, McIntire DD. Pathologic fetal acidemia. Obstet Gynecol 1991;78:1103–7.
35. Malin GL, Morris RK, Khan KS. Strength of association between umbilical cord pH and perinatal and long term outcomes: systematic review and meta-analysis. BMJ 2010;340:c1471. doi: 10.1136/bmj.c1471
36. Low JA, Muir DW, Pater EA, Karchmar EJ. The association of intrapartum asphyxia in the mature fetus with newborn behavior. Am J Obstet Gynecol 1990;163:1131–5. doi: 10.1016/0002-9378(90)90670-3
37. Cnattingius S, Johansson S, Razaz N. Apgar score and risk of neonatal death among preterm infants. N Engl J Med 2020;383:49–57. doi: 10.1056/NEJMoa1915075