Contents: Original Research

Diagnostic Accuracy of Fetal Heart Rate Monitoring in the Identification of Neonatal Encephalopathy

Graham, Ernest M. MD; Adami, Rebecca R. MD; McKenney, Stephanie L. BA; Jennings, Jacky M. PhD; Burd, Irina MD, PhD; Witter, Frank R. MD

doi: 10.1097/AOG.0000000000000424

The Perinatal Quality Foundation has created an examination containing both knowledge-based and judgment questions to optimize and standardize the interpretation of electronic fetal monitoring (EFM).1 Recognition of the importance of EFM interpretation led to the requirement in 2005 that all caregivers in the hospitals insured by the Medical Center Insurance Company pass this EFM credentialing examination before being allowed to work on any of the labor and delivery floors in their network and undergo recertification every 3 years.1 It was hoped that having labor and delivery staff more rapidly identify abnormalities indicative of metabolic acidemia on intrapartum EFM tracings and communicate these findings among team members would decrease the incidence of hypoxic–ischemic encephalopathy and related litigation. The 2012 American College of Obstetricians and Gynecologists survey on professional liability found that neurologically impaired infant claims were the most common primary allegation in obstetric claims, occurring in 28.8%.2

Meta-analysis of randomized controlled trials comparing EFM with intermittent auscultation has failed to show that EFM decreases neurologic morbidity or mortality3; however, the combined sample size of 12 randomized controlled trials has been criticized as insufficient to evaluate whether EFM can significantly lower neonatal morbidity and mortality.4

Although some investigators have concluded that, because of the low prevalence of the target conditions and mediocre validity (ability to distinguish between those diseased and well), the positive predictive value of EFM for fetal death in labor or cerebral palsy is near zero,5 researchers using the U.S. 2004 linked birth and infant death data found that 89% of singleton pregnancies had EFM and that the use of EFM was associated with a substantial decrease in early neonatal mortality and morbidity that lowered infant mortality.4 Electronic fetal monitoring is premised on the assumption that abnormalities indicative of severe metabolic acidosis leading to hypoxic–ischemic encephalopathy should be present in the tracing before delivery. Our objective in this study was to estimate the diagnostic accuracy of human assessment of electronic fetal heart rate tracings during the hour before delivery to identify abnormalities associated with hypoxic–ischemic encephalopathy qualifying for whole-body hypothermia treatment.


This is a case–control study of all neonates born at two hospitals within our system with suspected hypoxic–ischemic encephalopathy treated with whole-body hypothermia within 6 hours of birth during the 6.5-year period from January 1, 2007, to July 1, 2013. Neonates in the control group were matched to each neonate in the case group in a two-to-one fashion using the subsequent two deliveries in the same hospital matched by gestational age within 1 week and mode of delivery. This study was conducted using the standards for reporting of diagnostic accuracy.6 It was approved by the institutional review board of the Johns Hopkins School of Medicine. Neonates were eligible for treatment with whole-body hypothermia if moderate to severe encephalopathy7 was present at birth (manifested as lethargy, stupor, coma, decreased or no activity, distal flexion, complete extension, decerebrate posture, hypotonia or flaccidity, abnormal primitive reflexes, bradycardia, periodic breathing, apnea, or seizures) and they had a cord gas or early neonatal gas at less than 1 hour with pH 7.0 or less or base deficit greater than 16 mM. They were also eligible if the cord or early neonatal gas at less than 1 hour showed pH 7.01–7.15 and base deficit 10–15.9 mM, provided that moderate to severe encephalopathy was present together with evidence of an acute sentinel event,8 a 10-minute Apgar score less than 5, or a need for assisted ventilation initiated at birth and continued for at least 10 minutes. It is the policy within the two hospitals to obtain umbilical artery cord gases at all deliveries, and the number of neonates with cord pH less than 7.0 or base deficit greater than 12 mM was recorded.
Exclusion criteria for whole-body hypothermia treatment included greater than 6 hours of life, gestational age less than 35 weeks, severe growth restriction (birth weight less than 1,800 g), major congenital anomaly, severe persistent pulmonary hypertension with anticipated need for extracorporeal membrane oxygenation, coagulopathy with active bleeding, and suspected sepsis with severe hemodynamic compromise requiring large doses of pressors.

Neonatal and maternal medical records were reviewed to identify relevant clinical data. Intrauterine growth restriction was defined as an estimated fetal weight less than the 10th percentile.9 Oligohydramnios was defined as an amniotic fluid index less than 5.0 cm with intact membranes at the time of the admission in which delivery occurred. The clinical diagnosis of chorioamnionitis was made in the presence of maternal fever with at least one other finding of fetal tachycardia, uterine tenderness, or purulent vaginal discharge.

The two hospitals use universal continuous EFM during labor, which is stored electronically. The primary exposure of the study was the noncomputer-assisted interpretation of the last hour of tracing before delivery, which was reviewed independently by three obstetricians blinded to outcome using the National Institute of Child Health and Human Development and the American College of Obstetricians and Gynecologists three-tiered category system and definitions.10 The reviewers were an obstetric resident (R.R.A.) and two maternal-fetal medicine attendings (F.R.W., E.M.G.), all of whom had passed our required EFM course. Each reviewer assessed the last hour of tracing and assigned category based on the most nonreassuring portion of the tracing, and the final category was assigned based on consensus among the reviewers. Each reviewer recorded the baseline fetal heart rate (FHR), time with FHR greater than 160 beats per minute (bpm, tachycardia) or less than 110 bpm (bradycardia), number of accelerations, reactivity, total number of decelerations, and number of late, variable, or early decelerations.10 Reactivity was defined as the presence of at least two FHR accelerations that peak (but do not necessarily remain) at least 15 bpm above the baseline and last 15 seconds within a 20-minute period that occurred any time during the last hour before delivery. Variability was classified as absent (undetectable), minimal (amplitude range 5 bpm or less), moderate (amplitude range from 6–25 bpm), and marked (amplitude range greater than 25 bpm).10 Absent and minimal were considered decreased variability. The number of prolonged decelerations lasting 2–10 minutes was recorded as well as the nadir and length of the most severe prolonged deceleration. 
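
The variability thresholds described above can be written out as a small lookup. This is only an illustrative sketch in Python, not the reviewers' procedure (the tracings in this study were read visually, without computer assistance). The function names are hypothetical, and two details are assumptions: an amplitude of 0 bpm stands in for "undetectable," and a non-integer amplitude between 5 and 6 bpm is treated as moderate.

```python
def classify_variability(amplitude_bpm: float) -> str:
    """Map an amplitude range (bpm) to a variability class.

    Assumption: 0 bpm represents "undetectable" (absent) variability,
    and values between 5 and 6 bpm fall into "moderate".
    """
    if amplitude_bpm < 0:
        raise ValueError("amplitude range cannot be negative")
    if amplitude_bpm == 0:
        return "absent"
    if amplitude_bpm <= 5:
        return "minimal"
    if amplitude_bpm <= 25:
        return "moderate"
    return "marked"


def is_decreased_variability(label: str) -> bool:
    """Absent and minimal variability are grouped as decreased variability."""
    return label in {"absent", "minimal"}


if __name__ == "__main__":
    for amplitude in (0, 3, 12, 30):
        label = classify_variability(amplitude)
        print(amplitude, label, is_decreased_variability(label))
```
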
Because some have hypothesized that non-National Institute of Child Health and Human Development measures of FHR decelerations immediately before delivery, which account for properties such as depth, duration, and frequency, would have a greater predictive ability for acidemia compared with the National Institute of Child Health and Human Development category system,11 we performed a review of the literature to identify other clinician-determined fetal heart rate parameters. Human fetal acidosis has been shown to correlate with severe variable decelerations, although not with mild or moderate variables.12 Severe variable decelerations were those with a drop to less than 70 bpm or lasting greater than 60 seconds.12 The number of contractions in the last hour before delivery was counted, and the ratios of late decelerations and of variable decelerations to contractions were expressed as percentages. Total deceleration area was calculated as the sum of the area within all decelerations in the final 30 minutes (debt 30) and final 60 minutes (debt 60) of the tracing as a measure of both quantity and severity.11 The area within each deceleration was approximated as one half×(width in seconds×depth in bpm).11 Other non-National Institute of Child Health and Human Development FHR tracing characteristics such as shoulders, slow return, and variability within the deceleration during the 30 minutes before delivery have not been shown to be associated with acidemia or neonatal depression and were not included.13
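
The total deceleration area described above is simple arithmetic, and a short Python sketch may make it concrete. The function names and the example decelerations are hypothetical. How a deceleration is assigned to the 30- or 60-minute window is not specified in the text, so assigning it by its start time before delivery is an assumption made here for illustration.

```python
def deceleration_area(width_s: float, depth_bpm: float) -> float:
    """Approximate one deceleration's area as one half x width x depth."""
    return 0.5 * width_s * depth_bpm


def total_deceleration_area(decelerations, window_min: float) -> float:
    """Sum the areas of decelerations inside the final `window_min` minutes.

    `decelerations` holds (minutes_before_delivery, width_s, depth_bpm) tuples.
    Assumption: a deceleration belongs to the window if it starts within it.
    """
    return sum(
        deceleration_area(width_s, depth_bpm)
        for minutes_before, width_s, depth_bpm in decelerations
        if minutes_before <= window_min
    )


if __name__ == "__main__":
    # Made-up tracing: three decelerations at 5, 20, and 45 minutes before delivery.
    tracing = [(5, 60, 40), (20, 30, 20), (45, 90, 50)]
    debt_30 = total_deceleration_area(tracing, 30)  # 1200 + 300
    debt_60 = total_deceleration_area(tracing, 60)  # 1200 + 300 + 2250
    print(debt_30, debt_60)
```

Because the area combines width and depth, one deep, prolonged deceleration can contribute more than several shallow ones, which is why the measure reflects both quantity and severity.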

Categorical EFM tracing parameters were determined by consensus among the three reviewers, and continuous parameters were averaged. We calculated κ statistics to assess the interobserver reliability in classifying the categorical EFM parameters reactivity, variability, and category. Predefined criteria for agreement were: poor (κ 0.0–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and excellent (0.81–1.0).14 We calculated the Pearson correlation coefficient to assess the interobserver reliability in classifying continuous EFM parameters such as FHR at baseline, accelerations, decelerations, and debt time.
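
To illustrate the κ statistic and the predefined agreement bands, the sketch below computes Cohen's κ for two raters in Python. This is not the analysis code used in the study (which was run in Stata); the text does not state which multi-rater form of κ was applied to the three reviewers, so the two-rater version is shown as an assumption. The function names and the example ratings are hypothetical, and negative κ values are labeled "poor" by assumption because the stated bands start at 0.0.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters.

    Raises ZeroDivisionError if chance agreement is exactly 1
    (both raters used a single identical label throughout).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / n**2
    return (observed - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Map kappa to the predefined agreement bands quoted in the text."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "excellent"


if __name__ == "__main__":
    # Made-up category assignments for six tracings by two reviewers.
    reviewer_1 = ["II", "II", "III", "I", "II", "II"]
    reviewer_2 = ["II", "II", "II", "I", "II", "III"]
    kappa = cohen_kappa(reviewer_1, reviewer_2)
    print(round(kappa, 2), agreement_label(kappa))
```

In this made-up example the reviewers agree on four of six tracings, but because category II dominates, half of that agreement is expected by chance, which pulls κ down to the "fair" band.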

Bivariate analyses were performed using conditional logistic regression to account for matching for both continuous and categorical variables. Categorical variables with multiple components such as race were compared using χ2. Multivariable logistic regression models were used to determine the diagnostic accuracy of EFM parameters in the identification of neonates with encephalopathy treated with whole-body hypothermia. Variables significant at a P value of <.10 in bivariate analyses were used in the multivariable regression. Final variable selection in the multivariable regression was determined by statistical significance as defined by a confidence interval that did not include 1.0 and a P<.05. For each significant final variable in the multivariable analysis, receiver operator characteristic curves were produced and sensitivity and specificity calculated. Stata 10 was used for statistical analysis.
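
For readers less familiar with these measures, the following Python sketch shows how sensitivity, specificity, and the area under the receiver operator characteristic curve can be computed for a single continuous predictor. It is not the Stata analysis used in this study and does not involve the regression models. The function names, the example scores, and the threshold are hypothetical. The area under the curve is computed here by its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a tracing positive when score >= threshold (labels: 1 = case, 0 = control)."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels) -> float:
    """Area under the ROC curve via pairwise case-control comparisons."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        (case > control) + 0.5 * (case == control)
        for case in cases
        for control in controls
    )
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Made-up deceleration-area scores for three cases and three controls.
    scores = [900, 300, 1500, 200, 400, 100]
    labels = [1, 1, 1, 0, 0, 0]
    print(sensitivity_specificity(scores, labels, threshold=500))
    print(roc_auc(scores, labels))
```

An area of 0.5 corresponds to chance discrimination and 1.0 to perfect separation of cases from controls.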


During the 6.5-year study period, 39 neonates were treated with whole-body cooling (1 case/521 deliveries at 35 weeks of gestation or greater). There were no meaningful differences in maternal demographics between the neonates in the case group and those in the control group (Table 1). Sentinel events occurred in 13 (33.3%) of the neonates in the case group and none of the neonates in the control group (P<.001). There was no difference in intrapartum oxytocin exposure. The neonates in the case group had a significantly higher incidence of clinical chorioamnionitis, but there was no difference in histologic chorioamnionitis or funisitis (Table 1).

Table 1:
Maternal Variables

Neonates in the case group were significantly more likely to have 1- and 5-minute Apgar scores less than 7 and significantly lower cord pH and higher base deficit (Table 2). Neonates in the case group were significantly more likely to have respiratory distress, seizures, and a longer length of stay as well as a significantly higher incidence of positive blood cultures in the neonatal period (Table 2).

Table 2:
Neonatal Variables

The κ coefficient for the three reviewers for categorical variables ranged from 0.48 for reactivity to 0.37 for short-term variability, indicating moderate to fair reproducibility (Table 3). The Pearson correlation coefficient for the three reviewers for continuous variables ranged from 0.94 for fetal heart rate baseline to 0.28 for severe variables (Table 3). The Pearson correlation coefficient for total number of decelerations between the two maternal-fetal medicine attendings was 0.67 and between the resident and each attending was 0.82 and 0.81. For late decelerations, the Pearson correlation coefficient between the two maternal-fetal medicine attendings was 0.67 and between the resident and each attending was 0.76 and 0.64. There was no statistically significant difference in tracing category between the neonates in the case group and those in the control group with 77% of neonates in the case group and 90% of neonates in the control group being category II (Table 4). The neonates in the case group were significantly more likely to be nonreactive during the hour before delivery. Of the 16 neonates in the case group (41.0%) that were reactive during the hour before delivery, eight had sentinel events. There was no difference in the number of accelerations or total decelerations, but the neonates in the case group had a significantly increased number of late decelerations during the hour before delivery. There was no difference in variable decelerations per contraction or late decelerations per contraction. The measurement of total deceleration area debt 30 and debt 60 was significantly increased in the neonates in the case group (Table 4).

Table 3:
Correlation Among the Three Reviewers in Assessing Fetal Heart Rate Parameters in the Last Hour of Electronic Fetal Monitoring Before Delivery

Table 4:
Fetal Heart Rate Characteristics in the Last Hour of Monitoring Before Delivery

Multivariable logistic regression was performed for all fetal heart rate tracing parameters with P<.10 on bivariate analysis, adjusting for clinical chorioamnionitis, which was significantly increased in the neonates in the case group. There was a significantly increased odds ratio for decreased early decelerations and increased debt 30 and debt 60 in neonates in the case group (Table 5). For the detection of cases using significant variables from the multivariable logistic regression, the area under the receiver operator characteristic curve ranged from 0.66 to 0.72 and the sensitivity from 23.1% to 35.9% (Table 6).

Table 5:
Multivariable Analysis of Fetal Heart Rate Tracing Characteristics in the Last Hour of Monitoring Before Delivery Adjusting for the Presence of Clinical Chorioamnionitis

Table 6:
Area Under Receiver Operator Characteristic Curve, Sensitivity, and Specificity of Electronic Fetal Heart Rate Monitoring During the Last Hour Before Delivery


Although we had hoped to be able to identify specific FHR abnormalities associated with neonatal hypoxic–ischemic encephalopathy and focus on these in our team training to decrease the incidence of this catastrophic complication, we were unable to identify such changes. Our study found a significantly increased rate of nonreactive FHR tracings and late decelerations among fetuses diagnosed with hypoxic–ischemic encephalopathy as neonates on bivariate analysis, but these differences were not significant in multivariable analysis. The measures of total deceleration area in the 30 minutes (debt 30) and 60 minutes (debt 60) before delivery were significantly increased in encephalopathic fetuses on bivariate and multivariable analysis, but as a result of the high incidence of decelerations in normal neonates, their sensitivity and specificity in detecting neurologic injury were too low to be clinically useful. Multivariable analysis also showed a significant decrease in early decelerations in encephalopathic neonates, but this change also had a sensitivity and specificity too low to be useful in detecting injury in the general obstetric population.

Retrospective observational studies on the effects of EFM on decreasing hypoxic–ischemic encephalopathy are limited because tracing abnormalities may prompt intervention before the deterioration to metabolic acidosis and result in a metabolically normal newborn.15 This “treatment paradox effect” can occur where an outcome (neonatal hypoxic–ischemic encephalopathy) with a known association with the test predictor (EFM abnormalities) can be ameliorated or avoided by an intervention,16 but if early intervention based on EFM abnormalities leads to a decreased incidence of hypoxic–ischemic encephalopathy, we should be able to describe and quantitate these abnormalities when comparing encephalopathic neonates with neonates who were neurologically normal controls.

Our finding that only 12.8% of these encephalopathic neonates had category III tracings in the hour before delivery is consistent with the rarity of severely abnormal tracings in other studies. A study using computerized software with five-tier color-coded levels compared the last 3 hours before delivery of 60 fetuses that developed encephalopathy with 280 with metabolic acidosis but without encephalopathy and 2,132 fetuses with normal cord gases and found that only 8.3% of neonates born with severe metabolic acidosis and encephalopathy ever reached the red level.15 Even when a red-level EFM tracing occurred, its average cumulative duration was very short and similar across the three study groups, ranging from 4.7 to 6.3 minutes. The area under the curve for red-level EFM to identify encephalopathy was 0.53, which is no better than chance. High sensitivity for the encephalopathic neonates was achieved at the expense of a high false-positive rate in the normal neonates. Electronic fetal monitoring patterns that detected approximately 75% of the encephalopathic neonates were also present in 29% of the normal neonates.15 In a study of 154 FHR tracings evaluated by three maternal-fetal medicine specialists using the National Institute of Child Health and Human Development three-tier system, category III was rare, occurring in only 1.9%, and interobserver reliability for category III was poor (κ 0.0) under idealized study conditions, mainly as a result of lack of agreement regarding absent compared with minimal variability.14 Review of the entire intrapartum EFM tracing for 48,444 patients found that category III was identified in only 0.1% of patients (1 in every 897).17 Category II tracings were very common, occurring in 84% of labors, which was very similar to our finding of 77% of encephalopathic neonates and 90% of neonates in the control group.

A 1997 confidential inquiry into all cases of neonates with grade II and III encephalopathy born in Trent, U.K., concluded that there was evidence of a peripartum insult in 88%.18 The investigators found a major episode of suboptimal care in 64% of all cases of neonatal encephalopathy and in 75% of the deaths and concluded that incorrect EFM interpretation led the list of causes.18 If it is true that there is now reassuring evidence for the use of EFM in that its use is linked with long-term improvement such as a significant decrease in neonatal and infant morbidity and mortality,4 then there should be identifiable fetal heart rate patterns associated with neonatal encephalopathy, which the obstetrician can quantitate. However, studies have shown that the false-positive rate of EFM for predicting cerebral palsy is high at greater than 99%19 and that the use of EFM has not resulted in a reduction of cerebral palsy.20

When investigators have found an association between category II EFM in the last 2 hours of labor and short-term neonatal morbidity, they were not able to describe the association between specific components of category II features and morbidity.17 The overall EFM pattern during the last 30 minutes before delivery is universally category II,11 and more than 98% of term fetuses with terminal decelerations deliver with normal umbilical cord gas pH levels.21

This study began as a safety project by resident and attending obstetricians who after passing the required course on EFM would use this knowledge to identify and quantitate EFM abnormalities in the last hour of labor in encephalopathic neonates. If such changes could be identified, we hoped to decrease the incidence of hypoxic–ischemic encephalopathy in our unit by focusing on these changes. Unfortunately, although we found some EFM abnormalities that were significantly different when compared with neurologically normal control neonates, the high incidence of these abnormalities in normal neonates and the low incidence of hypoxic–ischemic encephalopathy show they would not be able to identify encephalopathic neonates. Electronic fetal monitoring is not a precision technology, and EFM generates many patterns that are not easily fitted into defined categories by obstetricians even after specific training. Although we are unable to measure predictive values in a case–control study, our findings of low sensitivity for any of the EFM abnormalities are consistent with the study by Grimes and Peipert,5 who reported that the positive predictive value of EFM for detecting hypoxic–ischemic encephalopathy is near zero. The high frequency of EFM abnormalities in normal neonates, coupled with the low prevalence of hypoxic–ischemic encephalopathy of 1.0–3.8 per 1,000 term births,22–24 does not allow us to quantify specific abnormalities during the last hour before delivery that have precision in identifying neonatal encephalopathy qualifying for treatment with hypothermia within 6 hours of birth.
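
The effect of low prevalence on predictive value can be illustrated with Bayes' rule. The Python sketch below is a hypothetical calculation, not a result of this study: a case–control design cannot measure predictive values directly. The sensitivity of 35.9% is the upper end of the range reported above and the prevalence values are the 1.0–3.8 per 1,000 range cited above, but the specificity of 90% is an assumed figure chosen only for illustration, and the function name is hypothetical.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Bayes' rule: true positives divided by all positive test results."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    assumed_specificity = 0.90  # assumption for illustration only
    for prevalence in (1.0 / 1000, 3.8 / 1000):
        ppv = positive_predictive_value(0.359, assumed_specificity, prevalence)
        print(f"prevalence {prevalence:.4f}: PPV {ppv:.3%}")
```

Under these assumptions, fewer than 2 in 100 abnormal tracings would belong to a neonate with encephalopathy, because false positives among the far larger group of normal neonates swamp the true positives.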


1. Berkowitz RL, D'Alton ME, Goldberg JD, O'Keeffe DF, Spitz J, Depp R, et al. The case for an electronic fetal heart rate monitoring credentialing examination. Am J Obstet Gynecol 2014;210:204–7.
2. Klagholz J, Strunk AL. Overview of the 2012 ACOG survey on professional liability. Washington, DC: The American Congress of Obstetricians and Gynecologists; 2012. Available at:∼/media/Departments/Professional%20Liability/2012PLSurveyNational.pdf. Retrieved May 14, 2014.
3. Alfirevic Z, Devane D, Gyte GM. Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour. The Cochrane Database of Systematic Reviews 2013, Issue 5. Art. No.: CD006066. DOI: 10.1002/14651858.CD006066.pub2.
4. Chen HY, Chauhan SP, Ananth CV, Vintzileos AM, Abuhamad AZ. Electronic fetal heart rate monitoring and its relationship to neonatal and infant mortality in the United States. Am J Obstet Gynecol 2011;204:491.e1–10.
5. Grimes DA, Peipert JF. Electronic fetal monitoring as a public health screening program: the arithmetic of failure. Obstet Gynecol 2010;116:1397–400.
6. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. The Standards for Reporting of Diagnostic Accuracy Group. Croat Med J 2003;44:639–50.
7. Sarnat HB, Sarnat MS. Neonatal encephalopathy following fetal distress. A clinical and electroencephalographic study. Arch Neurol 1976;33:696–705.
8. American College of Obstetricians and Gynecologists, American Academy of Pediatrics. Neonatal encephalopathy and neurologic outcome. 2nd ed. Washington (DC): American College of Obstetricians and Gynecologists; 2014.
9. Hadlock FP, Harrist RB, Martinez-Poyer J. In utero analysis of fetal growth: a sonographic weight standard. Radiology 1991;181:129–33.
10. Intrapartum fetal heart rate monitoring: nomenclature, interpretation, and general management principles. ACOG Practice Bulletin No. 106. American College of Obstetricians and Gynecologists. Obstet Gynecol 2009;114:192–202.
11. Cahill AG, Roehl KA, Odibo AO, Macones GA. Association and prediction of neonatal acidemia. Am J Obstet Gynecol 2012;207:206.e1–8.
12. Kubli FW, Hon EH, Khazin AF, Takemura H. Observations on heart rate and pH in the human fetus during labor. Am J Obstet Gynecol 1969;104:1190–206.
13. Cahill AG, Roehl KA, Odibo AO, Macones GA. Association of atypical decelerations with acidemia. Obstet Gynecol 2012;120:1387–93.
14. Blackwell SC, Grobman WA, Antoniewicz L, Hutchinson M, Gyamfi Bannerman C. Interobserver and intraobserver reliability of the NICHD 3-Tier Fetal Heart Rate Interpretation System. Am J Obstet Gynecol 2011;205:378.e1–5.
15. Elliott C, Warrick PA, Graham E, Hamilton EF. Graded classification of fetal heart rate tracings: association with neonatal metabolic acidosis and neurologic morbidity. Am J Obstet Gynecol 2010;202:258.e1–8.
16. Maso G, Businelli C, Piccoli M, Montico M, De Seta F, Sartore A, et al. The clinical interpretation and significance of electronic fetal heart rate patterns 2 h before delivery: an institutional observational study. Arch Gynecol Obstet 2012;286:1153–9.
17. Jackson M, Holmgren CM, Esplin MS, Henry E, Varner MW. Frequency of fetal heart rate categories and short-term neonatal outcome. Obstet Gynecol 2011;118:803–8.
18. Draper ES, Kurinczuk JJ, Lamming CR, Clarke M, James D, Field D. A confidential enquiry into cases of neonatal encephalopathy. Arch Dis Child Fetal Neonatal Ed 2002;87:F176–80.
19. Nelson KB, Dambrosia JM, Ting TY, Grether JK. Uncertain value of electronic fetal monitoring in predicting cerebral palsy. N Engl J Med 1996;334:613–8.
20. Clark SL, Hankins GD. Temporal and demographic trends in cerebral palsy—fact and fiction. Am J Obstet Gynecol 2003;188:628–33.
21. Cahill AG, Caughey AB, Roehl KA, Odibo AO, Macones GA. Terminal fetal heart decelerations and neonatal outcomes. Obstet Gynecol 2013;122:1070–6.
22. Low JA, Lindsay BG, Derrick EJ. Threshold of metabolic acidosis associated with newborn complications. Am J Obstet Gynecol 1997;177:1391–4.
23. Graham EM, Ruis KA, Hartman AL, Northington FJ, Fox HE. A systematic review of the role of intrapartum hypoxia-ischemia in the causation of neonatal encephalopathy. Am J Obstet Gynecol 2008;199:587–95.
24. Badawi N, Kurinczuk JJ, Keogh JM, Alessandri LM, O'Sullivan F, Burton PR, et al. Intrapartum risk factors for newborn encephalopathy: the Western Australian case-control study. BMJ 1998;317:1554–8.
© 2014 by The American College of Obstetricians and Gynecologists. Published by Wolters Kluwer Health, Inc. All rights reserved.