Original Clinical Report

Prediction of Late-Onset Sepsis in Preterm Infants Using Monitoring Signals and Machine Learning

Cabrera-Quiros, Laura PhD1; Kommers, Deedee MD, PhD1,2; Wolvers, Maria K. MSc2; Oosterwijk, Laurien MSc2; Arents, Niek MD3; van der Sluijs-Bens, Jacqueline MD2; Cottaar, Eduardus J. E. PhD1; Andriessen, Peter MD, PhD1,2; van Pul, Carola PhD1,4

doi: 10.1097/CCE.0000000000000302


Sepsis is one of the most common causes of death among neonates. Furthermore, neonatal sepsis is associated with neurodevelopmental delay and a prolonged length of hospital stay for prematurely born infants (1–3). Preterm infants are particularly vulnerable both to sepsis acquired during birth (early-onset sepsis) and to hospital-acquired sepsis, referred to as late-onset sepsis (LOS) when disease onset occurs more than 3 days after birth (4). Since antibiotic treatment often differs between early-onset sepsis and LOS, adequate diagnosis is important. Furthermore, since timely treatment with antibiotics is associated with better outcomes, early diagnosis is also important.

Blood cultures are the gold standard for the diagnosis of sepsis, and when adequate volumes are obtained, they have excellent sensitivity (5). However, blood cultures can take up to 48 hours to yield a result and are prone to false positives due to contamination (4,6). Therefore, instead of awaiting the result of a blood culture, clinicians in practice start broad-spectrum antibiotic treatment immediately after blood is drawn. Because blood sampling is invasive, blood cultures are not taken routinely, but only when sepsis is suspected. This moment of clinical suspicion can be denoted as a Cultures, Resuscitation, and Antibiotics Started Here (CRASH) moment (7). Since the clinical signs and symptoms of infection often remain nonspecific and subtle until this acute deterioration, it would be of great value to predict sepsis before a CRASH moment occurs (8).

While electronic medical record (EMR) data have been used for sepsis prediction in adults and newborns (9,10), these data have a relatively low temporal resolution and are likely to lack important information on heart rate variability (HRV). Previous research has shown that characteristic features of heart rate (e.g., reduced HRV and transient decelerations) precede the CRASH moment in preterm infants by several hours (7,8,11,12).

In premature infants, displaying an HRV-based score to clinicians lowered sepsis-associated mortality by approximately 20% (13), and a cross-correlation index between heart rate and oxygen saturation proved to be the best illness predictor for preclinical detection of sepsis (14). In a previous study using regression analysis (15), we showed increased numbers of heart rate decelerations, increased respiratory instability, and decreased spontaneous infant activity (i.e., lethargy) in the hours preceding sepsis. Regression analysis formalizes the relationship between independent and dependent variables as mathematical equations for the data at hand. In contrast, machine learning (ML) learns from data and optimizes its predictions specifically for an unseen cohort, a capacity called generalization.

The aim of the current study is to predict upcoming sepsis in preterm infants by developing ML models based on patient monitoring data. To adjust for the maturational effects of gestational and postnatal age, we designed a matched-control study of infants with proven sepsis (LOS patients) and age-matched controls (C).

For the models, we use different vital signs (heart rate, HRV, respiration, and motion features), in accordance with recommendations in earlier work (16) and insights from a previous study (15). We hypothesize that combining information from multiple types of vital signals increases the discriminative value of the ML models to differentiate patients with LOS from controls.


Pseudonymized data from the EMR, electrocardiogram, and chest impedance (CI) were used in this retrospective study. The EMR data were used only to create the patient groups (see Matching Process), while the electrocardiogram and CI data were used for the ML experiments. The medical ethical committee of Máxima Medical Center provided a waiver for this study, in accordance with the Dutch law on medical research with humans.


In the study period (July 1, 2016, to December 1, 2018), 492 preterm infants born at less than 32 weeks’ gestation were admitted to the neonatal ICU (NICU). Eligible for this study were 358 inborn infants with a minimum length of stay of 5 days, allowing clinical and laboratory follow-up to define sepsis and control cases. We excluded 25 infants with a sepsis onset less than 72 hours after birth (early-onset sepsis) and eight infants with a large intraventricular hemorrhage (grade ≥ III), congenital anomalies, or a syndrome. Hence, the clinical characteristics and blood culture data of 325 infants were collected from the EMR.

A proven LOS was defined by clinical signs of a generalized infection (e.g., apneas, temperature instability, feeding intolerance, worsening respiratory distress, or hemodynamic instability) according to the Vermont Oxford criteria, isolation of a pathogen from a blood culture obtained after day 3 of life, and treatment with IV antibiotics (17,18). When coagulase-negative staphylococci were isolated from the blood culture, infants with generalized signs of infection were classified as LOS only if the C-reactive protein level was greater than or equal to 10 mg/L at least once within 5 days after the clinical onset of LOS, to minimize the risk of including infants with contaminated blood cultures (19). As preterm infants may suffer from multiple sepsis episodes, only the first proven LOS episode was included. Infants were initially treated with broad-spectrum antibiotics (penicillin G and gentamicin) according to our local antibiotic protocol, and this treatment was adjusted based on the blood culture result. In principle, antibiotic therapy was continued for at least 5 days, or shortened to 3 days in the case of coagulase-negative staphylococcal sepsis when clinical improvement was rapid and central lines were removed or not present (20).
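The inclusion rule above is essentially a small decision procedure. The sketch below expresses it in Python under stated assumptions: the `CultureEpisode` record, its field names, and the `CONS` organism label are hypothetical illustrations, not the study's EMR schema, and the clinical judgments (generalized signs, IV treatment) are taken as already-recorded booleans.

```python
from dataclasses import dataclass
from typing import List, Optional

CONS = "coagulase-negative staphylococci"  # hypothetical label for CoNS isolates


@dataclass
class CultureEpisode:
    """Hypothetical record of one suspected-sepsis episode (field names are illustrative)."""
    day_of_life: float                    # postnatal day on which the blood culture was drawn
    generalized_signs: bool               # generalized signs of infection (Vermont Oxford)
    pathogen: Optional[str]               # isolated organism, or None for a negative culture
    iv_antibiotics: bool                  # treated with IV antibiotics
    max_crp_5d: Optional[float] = None    # highest CRP (mg/L) within 5 days after onset


def is_proven_los(ep: CultureEpisode) -> bool:
    """Apply the proven-LOS criteria to one episode."""
    if ep.pathogen is None or ep.day_of_life <= 3:
        return False                      # negative culture, or not late onset
    if not (ep.generalized_signs and ep.iv_antibiotics):
        return False
    if ep.pathogen == CONS:
        # CoNS counts only with CRP >= 10 mg/L at least once (guards against contamination)
        return ep.max_crp_5d is not None and ep.max_crp_5d >= 10.0
    return True


def first_proven_los(episodes: List[CultureEpisode]) -> Optional[CultureEpisode]:
    """Only the first proven LOS episode per infant is retained."""
    proven = [ep for ep in episodes if is_proven_los(ep)]
    return min(proven, key=lambda ep: ep.day_of_life) if proven else None
```

Keeping the CoNS/CRP branch separate makes the contamination safeguard explicit and easy to audit.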

Controls were recruited from 1) infants with blood cultures obtained directly after birth to rule out perinatal infection, in whom no pathogen was isolated and antibiotics were discontinued within 48 hours after birth, or 2) infants who never required a blood culture.

All preterm infants received continuous cardiorespiratory monitoring (Philips IntelliVue MX 800; Philips, Hamburg, Germany). In extremely preterm infants (less than 27 weeks’ gestation), however, plethysmography rather than the electrocardiogram was frequently used to monitor heart rate in the first week of life; because the electrocardiogram signal is an input to our models, these infants were excluded from our study. For each infant with sepsis, a CRASH moment was defined (21), which served as the anchor point for the analysis. For the controls, an “equivalent crash moment” (ECM) was determined as the anchor point, based on comparable gestational and postnatal age (see Matching Process below). The electrocardiogram (250 Hz) and CI (62.5 Hz) waveforms for the 48 hours before and after the anchor point were retrieved from a data warehouse (PIIC-iX, Data Warehouse Connect; Philips Medical Systems, Andover, MA).
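To make the data volume concrete: at the stated sampling rates, a complete 96-hour window (48 hours on each side of the anchor) holds 86,400,000 electrocardiogram samples and 21,600,000 CI samples per infant. The sketch below cuts such a window from a waveform array; it assumes the waveform is a NumPy array and that the anchor is given in seconds from the start of the recording, which is a hypothetical convention rather than the data warehouse's export format.

```python
import numpy as np

FS_ECG = 250.0             # Hz, electrocardiogram sampling rate stated in the text
FS_CI = 62.5               # Hz, chest impedance sampling rate stated in the text
HALF_WINDOW_S = 48 * 3600  # 48 hours before and after the anchor point


def window_around_anchor(signal, fs, anchor_s, before_s=HALF_WINDOW_S, after_s=HALF_WINDOW_S):
    """Return the samples from before_s seconds before to after_s seconds after the anchor.

    anchor_s is the CRASH/ECM time in seconds from the start of the recording
    (a hypothetical convention); the window is clipped to the available data.
    """
    start = max(int(round((anchor_s - before_s) * fs)), 0)
    stop = min(int(round((anchor_s + after_s) * fs)), len(signal))
    return signal[start:stop]


# Number of samples in a complete 96-hour window for each waveform
n_ecg = int(2 * HALF_WINDOW_S * FS_ECG)   # 86,400,000 samples
n_ci = int(2 * HALF_WINDOW_S * FS_CI)     # 21,600,000 samples
```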

Matching Process

From the cohort (n = 325), we identified 93 (28.6%) preterm infants with culture-proven LOS, of whom 62 had sufficiently long continuous cardiorespiratory monitoring data (electrocardiogram, CI). We identified 194 (59.7%) infants as potential controls, of whom 69 had continuous cardiorespiratory data (electrocardiogram, CI) available.

As the 62 LOS infants differed significantly from the 69 control infants in gestational age (28.1 ± 2.0 vs 29.5 ± 1.4 wk; p < 0.001), and since maturation may influence HRV, infants were matched on gestational age (± 3 days). Subsequently, the CRASH moment of each LOS infant was right-aligned with the matched control infant to create the ECM (9). Figure 1 shows a graphical representation of this alignment of CRASH and ECM. In this way, we could compare the vital signs without differences due to maturational effects. This process resulted in a group of 32 LOS infants and 32 matched controls. Table 1 shows the clinical characteristics of the matched LOS and control groups. Online Table I (Supplemental Digital Content) shows the clinical details of the LOS infants.
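The text specifies the matching tolerance (gestational age ± 3 days) but not the matching algorithm. The sketch below assumes a greedy one-to-one strategy, which is our assumption and not necessarily the authors' procedure; patient IDs are hypothetical, and the subsequent ECM alignment is not modeled.

```python
def match_on_gestational_age(los_ga, control_ga, tol_days=3):
    """Greedy one-to-one matching on gestational age (GA).

    los_ga, control_ga: dicts mapping a (hypothetical) patient ID to GA in days.
    Each LOS infant, in order of increasing GA, is paired with the closest
    still-unmatched control within +/- tol_days; infants without a match are dropped.
    """
    available = dict(control_ga)
    pairs = []
    for los_id, ga in sorted(los_ga.items(), key=lambda kv: kv[1]):
        candidates = [(abs(c_ga - ga), c_id) for c_id, c_ga in available.items()
                      if abs(c_ga - ga) <= tol_days]
        if not candidates:
            continue
        _, best = min(candidates)          # closest control (ties broken by ID)
        pairs.append((los_id, best))
        del available[best]                # one-to-one: each control is used once
    return pairs
```

A greedy pass is simple but can leave some LOS infants unmatched when an optimal assignment would have matched them. This is one reason why a matched cohort can end up much smaller than the eligible pool, as in the reduction to 32 pairs reported here.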

TABLE 1. Clinical Characteristics of the Late-Onset Sepsis and Control Groups After Matching for Gestational Age
Characteristics | Late-Onset Sepsis (n = 32) | Controls (n = 32) | p
Gestational age (wk) | 28.8 ± 1.6 | 28.9 ± 1.5 | NS
Birth weight (g) | 1,149 ± 324 | 1,179 ± 298 | NS
Singletons | 21 (66%) | 23 (72%) | NS
Female gender | 16 (50%) | 21 (66%) | NS
Intrapartum maternal antibiotic treatment | 6 (19%) | 7 (22%) | NS
Antenatal corticosteroids (two doses of betamethasone) | 29 (91%) | 31 (97%) | NS
Cesarean section | 23 (72%) | 21 (66%) | NS
Umbilical artery pH | 7.26 ± 0.12 | 7.28 ± 0.10 | NS
Apgar score at 5 min (a) | 7 (6–8) | 8 (7–8) | NS
Surfactant replacement therapy | 17 (53%) | 18 (56%) | NS
Patent ductus arteriosus (ibuprofen) | 10 (31%) | 7 (22%) | NS
Patients with a central line in the 48 hr before CRASH | 12 (38%) | 7 (22%) | NS
CRASH or ECM at postnatal age (d) | 8.0 ± 5.6 | 6.9 ± 4.4 | NS
CRASH or ECM at postmenstrual age (wk) | 29.9 ± 1.3 | 29.9 ± 1.5 | NS
Caffeine therapy prior to CRASH | 31 (97%) | 31 (97%) | NS
Spontaneous ventilation prior to CRASH | 4 (13%) | 4 (13%) | NS
Nasal continuous positive airway pressure prior to CRASH | 26 (81%) | 28 (87%) | NS
Noninvasive positive airway pressure ventilation prior to CRASH | 2 (6%) | 0 (0%) | NS
Artificial ventilation prior to CRASH | 0 (0%) | 0 (0%) | NS
Mechanically ventilated after CRASH/ECM | 13 (41%) | 0 (0%) | < 0.01
In-hospital mortality | 2 (6.3%) | 0 (0%) | NS
CRASH = Cultures, Resuscitation, and Antibiotics Started Here, ECM = equivalent crash moment, NS = not significant.
(a) The Apgar score is a standard newborn status score ranging from 0 to 10, based on Appearance (skin color), Pulse, Grimace, Activity, and Respiration.
Categorical variables are expressed as n (%), and continuous variables as mean ± sd or median (interquartile range).
Tests used: paired Student t test, Wilcoxon signed rank test, or Fisher exact test.

Figure 1.
Illustration of the Cultures, Resuscitation, and Antibiotics Started Here (CRASH) moment and equivalent crash moment (ECM) after matching. LOS = late-onset sepsis.

ML Models

Following common practices in ML, features to represent the patient’s state were defined and calculated. This was done for intervals of 1 hour in order to assess changes in the signals over time.

Electrocardiogram processing was similar to our previous work (15): R-wave peaks in the QRS complexes were detected with a peak detection algorithm, the intervals between consecutive R-wave peaks (RR intervals) were obtained, and from these the normal-to-normal (NN) intervals were calculated to measure HRV. The following time-domain features were then calculated: the sd of the NN intervals (SDNN); the square root of the mean of the squares of successive differences between adjacent NN intervals (RMSSD); the percentage of decelerations (pDec), defined as the percentage of NN intervals longer than the mean NN interval of the past 5 minutes; and SDDec, the sd of all NN intervals that contribute to pDec. The latter two features thus focus on variations in HRV due to decelerations.
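The four time-domain features can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the study's code: NN intervals are given in seconds in temporal order, the "past 5 minutes" is a trailing window ending at the previous interval, the first interval (which has no history) is excluded from the pDec denominator, and SDNN and SDDec use the sample sd (ddof=1). The loop is quadratic for readability.

```python
import numpy as np


def hrv_time_domain(nn_s, window_s=300.0):
    """Time-domain HRV features for one interval of NN intervals (seconds, in temporal order)."""
    nn = np.asarray(nn_s, dtype=float)
    t = np.cumsum(nn)                             # time (s) at the end of each NN interval
    sdnn = nn.std(ddof=1)                         # SDNN
    rmssd = np.sqrt(np.mean(np.diff(nn) ** 2))    # RMSSD

    # pDec: NN interval longer than the mean NN interval of the preceding window_s seconds
    is_dec = np.zeros(len(nn), dtype=bool)
    for i in range(1, len(nn)):
        past = nn[:i][t[:i] >= t[i] - window_s]   # intervals ending in the past 5 minutes
        is_dec[i] = nn[i] > past.mean()
    pdec = 100.0 * is_dec[1:].mean()              # first interval has no history

    # SDDec: sd of the NN intervals that contribute to pDec
    dec = nn[is_dec]
    sddec = dec.std(ddof=1) if dec.size > 1 else float("nan")
    return {"SDNN": sdnn, "RMSSD": rmssd, "pDec": pdec, "SDDec": sddec}
```

For example, the series 0.4, 0.5, 0.4, 0.6, 0.4 s contains two intervals (0.5 and 0.6 s) that exceed their running 5-minute mean, so pDec is 50%. SDDec is then the sd of those two intervals.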

In addition, phase-rectified signal averaging was used to quantify the coherence time of each quasi-periodicity in the signal, while the signal is in a certain phase with regard to an anchor point, such as an acceleration or a deceleration. Two representative features of the resulting waveform, the average deceleration response (ADR) and the average acceleration response (AAR), are defined as the average of the difference between the 50 data points after the anchor point (including the anchor point itself) and the 50 data points before the anchor point.
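One reading of this definition, sketched in NumPy under assumptions: an RR interval longer (shorter) than its predecessor is a deceleration (acceleration) anchor. Windows of 50 points before and 50 points from the anchor onward are averaged over all anchors to form the phase-rectified curve, and the response is the mean of its second half minus the mean of its first half. Classical phase-rectified signal averaging often uses only a few points around the anchor and discards implausible jumps before choosing anchors; neither refinement is applied here.

```python
import numpy as np


def prsa_responses(rr, half_width=50):
    """Average deceleration and acceleration responses (ADR, AAR) of an RR series.

    Anchors: rr[i] > rr[i-1] (deceleration) or rr[i] < rr[i-1] (acceleration).
    Each response = mean of the averaged curve over the anchor and the following
    half_width - 1 points minus its mean over the half_width points before it.
    """
    rr = np.asarray(rr, dtype=float)
    # anchors need half_width points before and half_width points (incl. anchor) after
    idx = np.arange(half_width, len(rr) - half_width + 1)

    def response(is_anchor):
        anchors = idx[is_anchor[idx]]
        if anchors.size == 0:
            return float("nan")
        segments = np.stack([rr[i - half_width:i + half_width] for i in anchors])
        curve = segments.mean(axis=0)              # averaged curve; curve[half_width] is the anchor
        return curve[half_width:].mean() - curve[:half_width].mean()

    dec = np.zeros(len(rr), dtype=bool)
    acc = np.zeros(len(rr), dtype=bool)
    dec[1:] = rr[1:] > rr[:-1]
    acc[1:] = rr[1:] < rr[:-1]
    return response(dec), response(acc)
```

With this sign convention, the ADR is positive and the AAR is negative for typical RR series, because decelerations lengthen and accelerations shorten the intervals after the anchor.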

For the respiration features, as in our previous article, the CI signal was first filtered to remove cardiac artifacts. Breaths were then detected by peak detection to determine normal breath-to-breath intervals, analogous to the electrocardiogram processing. The mean respiratory rate, the interdecile range of the respiratory rate (RespIDR), the sd of the respiratory rate (RespSD), and the skewness of the respiratory rate (RespSkew) were calculated. RespSkew measures the asymmetry of the respiratory rate distribution relative to a symmetric Gaussian distribution.

The last group of features quantifies motion, for which we used the signal instability index (SII). The SII is a nonparametric measure based on the kernel density estimate. Applied to a band-pass filtered (0.001–0.40 Hz) electrocardiogram waveform, it yields an estimate of motion every second (1 Hz), using the electrocardiogram data of the past 10 seconds. As described in our previous article, low SII values indicate the absence of movement, while higher values are a quantitative estimate of body movement. Three features, namely the interdecile range (SII-IDR), the sd, and the skewness (SII-Skew) of the SII, were used to quantify body movement or the lack thereof. Low values of the SII-IDR and the sd represent a lack of movement (lethargy), as do large positive values of the skewness.
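The respiration and motion features share the same distribution summaries (mean, interdecile range, sd, skewness) over a 1-hour interval. The sketch below assumes breath-to-breath intervals are already available in seconds and that the SII series is supplied ready-made, because the kernel-density-based SII computation is not specified in enough detail here to reproduce. The prefixes are chosen to mirror the feature names used in this article, and skewness is the biased Fisher-Pearson coefficient, which is our choice.

```python
import numpy as np


def breaths_per_minute(breath_intervals_s):
    """Convert breath-to-breath intervals (s) into instantaneous respiratory rate (breaths/min)."""
    return 60.0 / np.asarray(breath_intervals_s, dtype=float)


def distribution_features(x, prefix):
    """Mean, interdecile range, sd and skewness of a series within one 1-hour interval."""
    x = np.asarray(x, dtype=float)
    x = x[np.isfinite(x)]                        # drop missing samples
    p10, p90 = np.percentile(x, [10, 90])
    centered = x - x.mean()
    skew = np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5  # Fisher-Pearson (biased) skewness
    return {
        f"{prefix}Mean": x.mean(),
        f"{prefix}IDR": p90 - p10,
        f"{prefix}SD": x.std(ddof=1),
        f"{prefix}Skew": skew,
    }


# Toy usage with made-up numbers (not study data)
resp_feats = distribution_features(breaths_per_minute([1.0, 1.2, 0.9, 1.1, 1.0]), "Resp")
sii_feats = distribution_features([0.1, 0.1, 0.2, 0.1, 3.0], "SII-")
```

The toy SII series is mostly near zero with one burst of movement. It therefore shows the large positive skewness that the text associates with lethargy punctuated by occasional activity.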

For each patient, the raw signal (e.g., RR intervals) was normalized by subtracting its mean and dividing by its sd. For each feature, a mean value per time interval was then calculated per group, allowing us to assess differences between populations and variations over time. This was done for the 24 hours before and after the CRASH (or ECM).
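The normalization and per-group averaging can be sketched as follows. Assumptions: the sd in the per-patient z-score is the sample sd (ddof=1), and each patient's hourly feature values are already aligned so that the same array index means the same hour relative to CRASH or ECM. Patient IDs are hypothetical, and NaN marks hours without usable data.

```python
import numpy as np


def zscore(x):
    """Per-patient normalization: subtract the patient's mean, divide by the patient's sd."""
    x = np.asarray(x, dtype=float)
    return (x - np.nanmean(x)) / np.nanstd(x, ddof=1)


def hourly_group_mean(feature_by_patient):
    """Mean of one feature per 1-hour interval across the patients of one group.

    feature_by_patient: dict mapping a (hypothetical) patient ID to an array with one
    value per hour, aligned so that index h is the same hour relative to CRASH/ECM
    for every patient; NaN marks hours without usable data.
    """
    stacked = np.vstack([np.asarray(v, dtype=float) for v in feature_by_patient.values()])
    return np.nanmean(stacked, axis=0)
```

Running `hourly_group_mean` separately on the LOS and control dictionaries gives the two hourly curves that are compared per interval.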

A summary of the features is presented in Online Table II (Supplemental Digital Content). These were the best performing features in our previous study (15). Each 1-hour interval was labeled as LOS or C according to the group of the patient it came from, which ensures a strict separation between LOS and C patients.

Three classic and powerful ML approaches were selected as our models: a logistic regressor, a naive Bayes classifier, and a nearest mean classifier. The logistic regressor provides insight into the importance of each feature for the classification (sepsis vs control), which allows the results to be interpreted in terms useful for clinicians.
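Of the three models, the nearest mean classifier is the simplest to state. It stores one mean feature vector per class and assigns a new 1-hour interval to the class whose mean is closest. A minimal NumPy sketch, assuming Euclidean distance on already-normalized features (the distance metric used in the study is not stated):

```python
import numpy as np


class NearestMeanClassifier:
    """Assign each sample to the class whose training mean (centroid) is closest in Euclidean distance."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # one mean feature vector per class (e.g., LOS and control)
        self.means_ = np.vstack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # squared Euclidean distance from every sample to every class mean
        d2 = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d2.argmin(axis=1)]
```

With only two class means to estimate, this model has very few parameters, which suits small datasets. In scikit-learn, `sklearn.neighbors.NearestCentroid`, `GaussianNB`, and `LogisticRegression` provide ready-made versions of the three model types.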

The features were divided into four subsets according to their nature: 1) HRV, 2) movement, 3) respiration, and 4) all, a combination of the other three. For training and evaluation of the three classifiers, we used leave-k-subjects-out cross-validation. All samples of k subjects (sampled randomly without replacement) were left out for independent testing, while the remaining patients were used for training and parameter tuning. This ensured that all intervals used for testing came from patients not seen during training. The process was repeated until every subject had been left out, and the mean and sd of the results across folds are reported.
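The key property of leave-k-subjects-out cross-validation is that splits are made over patients, not over 1-hour intervals. A minimal sketch of the fold generator, assuming one subject ID per interval; the value of k and the random seed are illustrative, since the text does not report k.

```python
import numpy as np


def leave_k_subjects_out(subject_ids, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which k whole subjects are held out per fold.

    Subjects are drawn randomly without replacement, so every subject is tested
    exactly once and no subject contributes intervals to both sides of a fold.
    """
    subject_ids = np.asarray(subject_ids)
    order = np.random.default_rng(seed).permutation(np.unique(subject_ids))
    for start in range(0, len(order), k):
        held_out = order[start:start + k]              # last fold may hold fewer than k
        test_mask = np.isin(subject_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

In each fold, one would fit a classifier on the training rows and score it on the test rows. The mean and sd over folds then correspond to the "mean ± sd per fold" values reported for the evaluation metrics.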

We trained our classifiers on a 3-hour segment, consisting of the three 1-hour intervals preceding the CRASH moment (or ECM) for each patient. This segment length was based on the analysis of group differences over time, which showed the greatest differences between LOS and controls in the last 3 hours before CRASH.

For testing, we used each 1-hour interval in the 24 hours preceding the CRASH. As evaluation metrics of the classification, we calculated for each cross-validation fold the sensitivity, specificity, area under the receiver operating characteristic curve (AUC), accuracy, and precision (positive predictive value [PPV]) for all experiments, where accuracy = (TP + TN)/n and precision = TP/(TP + FP), with TP = true positives, TN = true negatives, FP = false positives, and n = total number of samples. Accuracy is an appropriate metric here because our dataset is balanced (i.e., it contains a similar number of samples from each class).
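The metric definitions above translate directly into code. The sketch below assumes binary labels (1 = LOS) and a continuous score per interval for the AUC. It computes the AUC through its rank interpretation, as the probability that a randomly chosen LOS interval scores higher than a randomly chosen control interval, with ties counted as one half. Division by zero (e.g., no predicted positives) is not guarded.

```python
import numpy as np


def classification_metrics(y_true, y_pred, scores):
    """Accuracy, sensitivity, specificity, precision (PPV) and AUC for one test fold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)

    # AUC as P(score of a random positive > score of a random negative), ties count 1/2
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y_true], s[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    auc = (greater + 0.5 * ties) / (len(pos) * len(neg))

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "auc": auc,
    }
```

For example, with labels 1, 1, 0, 0, predictions 1, 0, 0, 1, and scores 0.9, 0.4, 0.2, 0.6, every count-based metric is 0.5. The AUC is 0.75, because three of the four positive/negative score pairs are correctly ordered.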


Continuous variables are expressed as mean (sd), or as median (interquartile range) in case of a non-normal distribution. Differences in demographic variables between the matched groups were tested with the paired t test, the Wilcoxon signed rank test, or the Fisher exact test, depending on the variable type and on whether the data were normally distributed. Group differences (LOS vs C) in the features were evaluated with the Wilcoxon rank-sum test. A p value of less than 0.05 was considered significant. The evaluation metrics of the ML classification are described above.
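For readers unfamiliar with the Wilcoxon rank-sum test, the sketch below shows the large-sample version in pure Python. It pools both groups, assigns average ranks to ties, and compares the first group's rank sum W with its null mean n1(n1 + n2 + 1)/2 and variance n1·n2(n1 + n2 + 1)/12. It then returns a two-sided p value from the normal approximation. No tie correction of the variance is applied, which is our simplification; the exact form of the test used in the study is not stated. `scipy.stats.ranksums` implements a comparable normal approximation.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])

    # assign average ranks (1-based) to runs of tied values
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1
        i = j + 1

    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)  # rank sum of x
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p = 2 * (1 - Phi(|z|))
    return z, p
```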


Figure 2 presents the mean values per interval of the different features for LOS (red) and C (green) for the 24 hours before and after the CRASH or ECM. Intervals marked with * or ** differ significantly between LOS and C with p < 0.05 and p < 0.01, respectively. In general, the plots show significant differences in the signals between LOS and C in the hours preceding the CRASH moment, with most features differing significantly as early as 4 hours before this anchor point. Online Figure I (Supplemental Digital Content) shows these results for all (nonmatched) 62 LOS and 69 C patients.

Figure 2.
Time series analysis for late-onset sepsis (LOS) (red) and controls (green) for the 32 matched pairs (for the 62 LOS and 69 control patients of the total cohorts before matching, see Online Figure I, Supplemental Digital Content). The value displayed at each time point is the average value over the hour preceding that time point. * and ** denote a significant difference between LOS and controls with p < 0.05 and p < 0.01, respectively. As the greatest differences for various features were observed approximately 3 hr before Cultures, Resuscitation, and Antibiotics Started Here (CRASH), 3-hr segments were used for training the classifiers. ECG = electrocardiogram, pDec = percentage of decelerations, RespIDR = interdecile range of respiratory rate, RespSD = sd of respiratory rate, RMSSD = square root of the mean of the squares of successive differences between adjacent normal-to-normal intervals, SDDec = sd of the RR intervals contributing to pDec, SDNN = sd of the RR intervals, SII = signal instability index, SII-IDR = interdecile range of the SII, SII-Skew = skewness of the SII.

Figure 3 shows the accuracy on the test set per time interval for the ML experiments, using the 3-hour segment for training and the 1-hour segment preceding each time point for testing. Each plot corresponds to a different feature set used for training. In general, the accuracy for the combination of all features (purple plot) is higher than that of the other feature sets up to about 6–9 hours before the CRASH moment. In addition, accuracy increases as the CRASH moment approaches, with the highest performance 3 hours before CRASH.

Figure 3.
Accuracy per 1-hr time interval of the machine learning models trained on the 3-hr segment. Results for: A, logistic regressor; B, naive Bayes classifier; C, nearest mean classifier. In general, the accuracy when combining all features is superior to that of any single feature subset. CRASH = Cultures, Resuscitation, and Antibiotics Started Here, HRV = heart rate variability.

In addition, Table 2 presents a summary of all evaluation metrics for the classification of intervals at 0, 1, 3, and 5 hours before the CRASH, using all features combined. Similar to Figure 3, the best results are found for the intervals at 3 hours before the CRASH.

TABLE 2. Evaluation Metrics (Mean ± sd per Fold) for the Classification of Proven Sepsis at Different Times Before Cultures, Resuscitation, and Antibiotics Started Here (or Equivalent Crash Moment), Using All Features and the 3-Hour Segment for Training
Machine Learning Method | Time Before CRASH (or ECM) | Accuracy | AUC | Sensitivity | Specificity | Precision (PPV)
Logistic regressor | 0 hr | 0.71 ± 0.17 | 0.71 ± 0.12 | 0.68 ± 0.21 | 0.74 ± 0.19 | 0.74 ± 0.22
Logistic regressor | 1 hr | 0.66 ± 0.16 | 0.66 ± 0.21 | 0.65 ± 0.22 | 0.68 ± 0.18 | 0.67 ± 0.16
Logistic regressor | 3 hr | 0.79 ± 0.12 | 0.79 ± 0.11 | 0.78 ± 0.07 | 0.80 ± 0.22 | 0.82 ± 0.18
Logistic regressor | 5 hr | 0.76 ± 0.16 | 0.76 ± 0.20 | 0.82 ± 0.19 | 0.70 ± 0.18 | 0.73 ± 0.16
Naive Bayes | 0 hr | 0.67 ± 0.10 | 0.67 ± 0.04 | 0.63 ± 0.14 | 0.71 ± 0.22 | 0.73 ± 0.19
Naive Bayes | 1 hr | 0.62 ± 0.13 | 0.62 ± 0.06 | 0.53 ± 0.13 | 0.72 ± 0.22 | 0.69 ± 0.22
Naive Bayes | 3 hr | 0.71 ± 0.10 | 0.71 ± 0.05 | 0.68 ± 0.09 | 0.74 ± 0.15 | 0.73 ± 0.13
Naive Bayes | 5 hr | 0.69 ± 0.10 | 0.69 ± 0.08 | 0.66 ± 0.16 | 0.71 ± 0.15 | 0.69 ± 0.14
Nearest mean | 0 hr | 0.63 ± 0.10 | 0.63 ± 0.15 | 0.63 ± 0.14 | 0.64 ± 0.22 | 0.68 ± 0.19
Nearest mean | 1 hr | 0.72 ± 0.10 | 0.72 ± 0.09 | 0.68 ± 0.17 | 0.75 ± 0.08 | 0.73 ± 0.09
Nearest mean | 3 hr | 0.70 ± 0.14 | 0.70 ± 0.19 | 0.67 ± 0.13 | 0.74 ± 0.19 | 0.73 ± 0.20
Nearest mean | 5 hr | 0.68 ± 0.13 | 0.68 ± 0.15 | 0.66 ± 0.14 | 0.68 ± 0.17 | 0.68 ± 0.14
AUC = area under the receiver operating characteristic curve, CRASH = Cultures, Resuscitation, and Antibiotics Started Here, ECM = equivalent crash moment, PPV = positive predictive value.
Sensitivity: true positives over all actual positives; specificity: true negatives over all actual negatives; precision or positive predictive value: true positives over predicted positives.
Accuracy column has the same values as in Figure 3.
Boldface values indicate the best scores per column.

When analyzing the feature weights of our logistic regressor trained with all features irrespective of category, we found that RMSSD and AAR (HRV subset) and RespSD and RespIDR (respiration subset) were the four most important features for the classifier’s decision.
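Weight-based importance is only meaningful when the features share a common scale. The sketch below assumes that importance is the absolute value of the coefficient on standardized features, which is our assumption since the text does not state how the weights were compared. It fits an L2-regularized logistic regression by batch gradient descent in NumPy and ranks features by weight magnitude. The learning rate, iteration count, and penalty strength are illustrative values.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """L2-regularized logistic regression fitted by batch gradient descent on standardized features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize so weights are on a common scale
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))     # predicted probability of LOS
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def rank_features(names, w):
    """Order feature names by the absolute value of their standardized weight."""
    return [names[i] for i in np.argsort(-np.abs(w))]
```

The sign of each weight indicates whether higher feature values push the prediction toward LOS or toward control. The magnitude ranking gives lists like the four features named above.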


In this study, we applied ML to discriminate between time intervals of LOS and C patients in the hours before sepsis, using data obtained from routine patient monitoring. Differences between LOS and C in HRV, respiration, and body motion features are visible up to 4 hours preceding the CRASH moment. For all classifiers, the combination of all features performs better in the ML experiments than the individual feature subsets, showing the importance of combining vital signs that measure different physiologic parameters. The observed accuracies for predicting sepsis 3 and 5 hours before CRASH are not yet optimal. Nevertheless, they show the potential of routine patient monitoring to predict LOS at an early stage, similar to findings in other studies (7,8,12,13), in particular when features from different types of signals are combined. This is in line with the literature: the importance of HRV parameters as indicators of LOS has been investigated thoroughly (11,12), and changes in respiration preceding sepsis have also been reported (14). In addition, lethargy is known to be an important indicator of sepsis (10).

We observed that traditional time-domain HRV measures (RMSSD, SDNN) also vary in nonseptic patients, while tailored HRV parameters (pDec, SDDec), which focus on unbalanced or asymmetric HRV by zooming in on decelerations, showed more robust differences. We also observed that after CRASH, nearly all features change direction (decrease or increase) rather drastically, probably because of clinical interventions related to CRASH (intubation, volume expansion, or medication). However, changes can already be seen just before CRASH. We speculate that this is due to nurse handling in response to the increasing number of apneas associated with sepsis. As a result, the electrocardiogram signal is “contaminated” by external influences, making it less reliable for prediction 1 hour before CRASH.

Recent meta-analyses by Fleuren et al (9) and Verstraete et al (10) reviewed the diagnostic accuracy of existing sepsis prediction models in adults and neonates, respectively; the latter compared 12 prediction models based on statistical methods. The features used in these works rely strongly on blood culture analysis, with temperature as the only monitoring signal regularly used. In addition, Moorman et al (11,13), Griffin and Moorman (7), and Griffin et al (8,21) consistently reported the benefits of combining HRV features with patient demographics in a prediction score to predict upcoming sepsis in neonates. Of the models reviewed by Verstraete et al (10), three showed a sensitivity of at least 95%, but with much lower specificity and PPV. In comparison, our ML models achieve lower sensitivity, with a maximum of 0.82 ± 0.19, but show balanced performance across sensitivity, specificity, precision, and overall accuracy or AUC (Table 2).

The main advantage of ML models is that they learn from one set or cohort and are optimized to predict a class or response specifically for unseen data, using techniques that avoid overfitting to the training data. Only a few studies have used ML-based methods to predict sepsis in neonates, as most efforts have focused on adults (9). Stanculescu et al (22) used critical alarm information, either directly annotated (23) or inferred from the monitoring signals, together with hidden Markov models, a time-dependent method, to predict sepsis in neonates. Furthermore, Gómez et al (24) used power spectral features derived from HRV and several ML methods to detect patients with sepsis, with rather promising results. The latter is perhaps the most complete work in the literature using ML to predict sepsis in neonates, reporting a sensitivity of 0.94, specificity of 0.94, PPV of 0.95, and AUC of 0.94. However, that work focuses on early-onset sepsis (first 48 hr of life), whereas our focus is on LOS.

One of the main strengths of our study is that we used a separate cohort of patients as the control group. This guarantees that samples in the control group are not unknowingly contaminated, which would otherwise affect the results. Furthermore, our matching scheme yields patients of comparable maturation, allowing assessment of feature differences as they approach the CRASH/ECM moment, as seen in Figure 2. These results imply that such feature changes are related to the onset of sepsis.

Defining the onset of sepsis by a CRASH moment has its limitations. Clinicians may suspect an upcoming deterioration several hours before the start of antibiotic treatment, which suggests that the ideal timing of a predictive alert may be anchored on clinical suspicion rather than on clinical intervention. Therefore, increasing the time window for an alerting analytic is useful, although its usefulness is a trade-off between sensitivity and specificity. Our analytics do not have very high sensitivity (approximately 0.80, 3–5 hr before CRASH), but they show balanced performance across sensitivity, specificity, precision, and overall accuracy. Another main limitation of this work is the rather small amount of data left for training our models after the matching process. Because of the restricted use of skin electrodes in very preterm infants in the first week of life, only part of the LOS and control cases could be used for the matching procedure. However, the demographics and causative agents are representative of a real-life NICU environment. Because of the limited amount of data, we deliberately chose simple (yet still powerful) ML models. This is another limitation of our work, as other efforts have evaluated a wider variety of models (24).

Our ML methods could be improved by collecting more data from different patients. This would allow us to maintain one-to-one matching while having more samples to train our models, and perhaps to leverage more complex methods such as those used by Gómez et al (24). Also, other pathologies such as necrotizing enterocolitis need to be studied separately, as they may produce false-positive sepsis predictions (14).

Future efforts might use more features and types of sensors to further improve early detection, for example, temperature or frequency-based analysis of the RR intervals, which have been shown to vary in the presence of sepsis (8,22,24–26). Similarly, we have not yet used gestational age and postnatal age for classification, although these parameters have improved model performance in the past (8,21).


In conclusion, in this study, we used ML on high-resolution vital sign data obtained from routine patient monitoring to predict LOS. We showed that combining features representing the complex clinical picture of neonatal sepsis in preterm infants leads to better prediction performance up to 5 hours before the CRASH moment, suggesting that mortality due to LOS could be reduced by enabling preemptive action.


1. Stoll BJ, Hansen NI, Bell EF, et al.; Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network. Trends in care practices, morbidity, and mortality of extremely preterm neonates, 1993–2012. JAMA. 2015; 314:1039–1051
2. Mahieu LM, Buitenweg N, Beutels P, et al. Additional hospital stay and charges due to hospital-acquired infections in a neonatal intensive care unit. J Hosp Infect. 2001; 47:223–229
3. Stoll BJ, Hansen NI, Adams-Chapman I, et al.; National Institute of Child Health and Human Development Neonatal Research Network. Neurodevelopmental and growth impairment among extremely low-birth-weight infants with neonatal infection. JAMA. 2004; 292:2357–2365
4. Shane AL, Sánchez PJ, Stoll BJ. Neonatal sepsis. Lancet. 2017; 390:1770–1780
5. Schelonka RL, Chai MK, Yoder BA, et al. Volume of blood required to detect common neonatal pathogens. J Pediatr. 1996; 129:275–278
6. Dong Y, Speer CP. Late-onset neonatal sepsis: Recent developments. Arch Dis Child Fetal Neonatal Ed. 2015; 100:F257–F263
7. Griffin MP, Moorman JR. Toward the early diagnosis of neonatal sepsis and sepsis-like illness using novel heart rate analysis. Pediatrics. 2001; 107:97–104
8. Griffin MP, Lake DE, Moorman JR. Heart rate characteristics and laboratory tests in neonatal sepsis. Pediatrics. 2005; 115:937–941
9. Fleuren LM, Klausch TLT, Zwager CL, et al. Machine learning for the prediction of sepsis: A systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 2020; 46:383–400
10. Verstraete EH, Blot K, Mahieu L, et al. Prediction models for neonatal health care-associated sepsis: A meta-analysis. Pediatrics. 2015; 135:e1002–e1014
11. Moorman JR, Rusin CE, Lee H, et al. Predictive monitoring for early detection of subacute potentially catastrophic illnesses in critical care. Annu Int Conf IEEE Eng Med Biol Soc. 2011; 2011:5515–5518
12. Fairchild KD. Predictive monitoring for early detection of sepsis in neonatal ICU patients. Curr Opin Pediatr. 2013; 25:172–179
13. Moorman JR, Carlo WA, Kattwinkel J, et al. Mortality reduction by heart rate characteristic monitoring in very low birth weight neonates: A randomized trial. J Pediatr. 2011; 159:900–906.e1
14. Fairchild KD, Lake DE, Kattwinkel J, et al. Vital signs and their cross-correlation in sepsis and NEC: A study of 1065 VLBW infants in two NICUs. Pediatr Res. 2017; 81:315–321
15. Joshi R, Kommers D, Oosterwijk L, et al. Predicting neonatal sepsis using features of heart rate variability, respiratory characteristics and ECG-derived estimates of infant motion. IEEE J Biomed Health Inform. 2019; 24:681–692
16. Sullivan BA, Fairchild KD. Predictive monitoring for sepsis and necrotizing enterocolitis to prevent shock. Semin Fetal Neonatal Med. 2015; 20:255–261
17. Stoll BJ, Hansen N, Fanaroff AA, et al. Late-onset sepsis in very low birth weight neonates: The experience of the NICHD Neonatal Research Network. Pediatrics. 2002; 110:285–291
18. Vermont Oxford Network. Manual of Operations: Part 2. Data Definitions & Infant Data Forms. Release 22.0. 2007. Available at: Accessed October 2018
19. Hofer N, Zacharias E, Müller W, et al. An update on the use of C-reactive protein in early-onset neonatal sepsis: Current insights and new tasks. Neonatology. 2012; 102:25–36
20. Hemels MA, van den Hoogen A, Verboon-Maciolek MA, et al. Shortening the antibiotic course for the treatment of neonatal coagulase-negative staphylococcal sepsis: Fine with three days? Neonatology. 2012; 101:101–105
21. Griffin MP, O’Shea TM, Bissonette EA, et al. Abnormal heart rate characteristics preceding neonatal sepsis and sepsis-like illness. Pediatr Res. 2003; 53:920–926
22. Stanculescu I, Williams CKI, Freer Y. A hierarchical switching linear dynamical system applied to the detection of sepsis in neonatal condition monitoring. July 23–27, 2014, Conference on Uncertainty in Artificial Intelligence (UAI), Quebec City, Canada, pp 752–761
23. Stanculescu I, Williams CK, Freer Y. Autoregressive hidden Markov models for the early detection of neonatal sepsis. IEEE J Biomed Health Inform. 2014; 18:1560–1570
24. Gómez R, García N, Collantes G, et al. Development of a non-invasive procedure to early detect neonatal sepsis using HRV monitoring and machine learning algorithms. June 5–7, 2019, IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, pp 132–137
25. Mahieu LM, De Muynck AO, De Dooy JJ, et al. Prediction of nosocomial sepsis in neonates by means of a computer-weighted bedside scoring system (NOSEP score). Crit Care Med. 2000; 28:2026–2033
26. Joshi R, Bierling BL, Long X, et al. A ballistographic approach for continuous and non-obtrusive monitoring of movement in neonates. IEEE J Transl Eng Health Med. 2018; 6:2700809

    infant; premature; intensive care units; neonatal; machine learning; monitoring; physiologic; predictive value of tests; sepsis


    Copyright © 2021 The Authors. Published by Wolters Kluwer Health, Inc. on behalf of the Society of Critical Care Medicine.