Continuous, accurate pulse oximetry is rarely achieved in the clinical environment. The performance of conventional pulse oximeters is affected by many factors, including motion, poor tissue perfusion, ambient light, and electromagnetic interference (1). Performance limitations have been reported in multiple clinical studies. A prospective evaluation of 20,802 adults in the operating room (OR) and recovery room revealed an overall failure rate of 2.5%; the failure rate increased to 7.2% in patients with ASA physical status IV (2). Freund et al. (3) found that once oximetry failed intraoperatively, no data were provided for 32% of the mean anesthesia time. In a study of adults in the recovery room, Wiklund et al. (4) found an average alarm frequency of 7%; 77% of these alarms were false and were caused by motion, low perfusion, or sensor displacement. In the pediatric intensive care unit (PICU), 71% of pulse oximeter alarms were false (5).
Over the last decade, several pulse oximeter manufacturers have worked diligently to improve oximetry performance. Nellcor (Pleasanton, CA) has introduced the N-395 and, more recently, the N-595 oximeters as improvements in low-signal performance. Masimo Corporation (Mission Viejo, CA) pioneered advances in oximetry performance under the challenging clinical conditions of motion and poor tissue perfusion with its Signal Extraction Technology (SET®) (6). Clinical studies performed to validate the improved performance of the new technologies relative to conventional pulse oximetry offer conflicting views on their relative superiority (7–11).
Improvements in pulse oximetry technology are directed at artifact rejection during noisy signal conditions, with the putative advantage of enhanced patient safety through better detection of true desaturation events and fewer false alarms. Although compelling evidence from both laboratory and clinical environments suggests that new oximetry technology performs better in detecting hypoxemic events during motion, this improved performance may come at a price. The lower false alarm rate achieved with improved technology may be associated with reduced or delayed detection of true alarms (12) or with inappropriate rejection of good data. The degree of hypoxemia may also adversely affect oximeter performance (13). Previous work has only partially addressed pulse oximetry accuracy at differing saturations (14–22).
Because rejection algorithms may differentially affect data reporting, the aim of this study was to compare the data availability, signal heuristics, and agreement of two new oximeter technologies with those of an older standard. This study uses a novel means of assessing overall oximetry performance across a spectrum of signal integrity and arterial oxygen saturation.
A blinded side-by-side comparison of Masimo SET® (software version 3.0), Nellcor N-395 (software version 18.104.22.168), and GE Marquette Solar 8000 (Nellcor second-generation N200 equivalent) oximeters was performed under an IRB-approved protocol, and written informed consent was obtained for each patient enrolled in the study. The Solar 8000 was the unblinded clinical standard device used for patient care. The Masimo SET® oximeter was set in the default mode of 8-s averaging for all but the sleep studies, for which it was set in the 2-s FastSat® averaging mode. The Nellcor N-395 averaging mode is not user selectable. Clinical staff placed the patient sensors at sites appropriate for each clinical situation. Masimo LNOP® Neo sensors were used with the Masimo oximeter, and Nellcor Oxisensor® II N-25 sensors were used with both the N-395 and Solar 8000 oximeters.
Both newer devices provide indicators of data quality determined by proprietary analyses of signal characteristics. The primary signal heuristic available from Masimo is Signal IQ (SIQ), a linear metric of pulse oximetry signal quality scaled from 0 to 1. When signal quality is questionable, Masimo reports a “Low Signal IQ” (LoSIQ) message (corresponding to SIQ <0.3), and it provides a continuous readout of SIQ across the range of measured saturations (23). Signal heuristics available from the Nellcor N-395 oximeter include motion (MOT) and pulse search (PS). The MOT indicator lights when motion artifact is possible; the PS indicator lights during prolonged and challenging monitoring conditions and flashes during loss of the pulse signal (24). No signal heuristics were available from the Solar 8000 for analysis. Our study tested MOT, PS, and SIQ as signal quality indicators. We defined a “high-quality signal” as the absence of any signal “flag” (i.e., no MOT, PS, or LoSIQ) together with SIQ >0.9.
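The signal-quality definitions above can be expressed as a small classification rule. The sketch below is a hypothetical illustration: the `Sample` record and its field names are assumptions, not either vendor's export format, and the displayed LoSIQ message is approximated here by its documented threshold, SIQ <0.3.

```python
from dataclasses import dataclass

# Hypothetical per-sample record; field names are illustrative only.
@dataclass
class Sample:
    siq: float  # Masimo Signal IQ, scaled 0 to 1
    mot: bool   # Nellcor N-395 motion (MOT) indicator displayed
    ps: bool    # Nellcor N-395 pulse search (PS) indicator displayed

LOW_SIQ_THRESHOLD = 0.3  # LoSIQ message corresponds to SIQ < 0.3
HIGH_QUALITY_SIQ = 0.9   # study's high-quality criterion, SIQ > 0.9

def masimo_flagged(s: Sample) -> bool:
    """Approximate the LoSIQ message by the SIQ < 0.3 threshold."""
    return s.siq < LOW_SIQ_THRESHOLD

def nellcor_flagged(s: Sample) -> bool:
    """True when either N-395 heuristic (MOT or PS) is displayed."""
    return s.mot or s.ps

def is_high_quality(s: Sample) -> bool:
    """No flag from either manufacturer and SIQ > 0.9."""
    return (not masimo_flagged(s)
            and not nellcor_flagged(s)
            and s.siq > HIGH_QUALITY_SIQ)
```

For example, `is_high_quality(Sample(siq=0.95, mot=False, ps=False))` is true, while the same sample with `mot=True` is not.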
Data from all three sources were acquired simultaneously on a personal computer at 62.5 Hz using proprietary software, exported at a sample rate of 1 Hz, and further reduced to 10-s samples for analysis of data availability, measures of agreement, and signal heuristics and warnings, stratified by both signal integrity and Spo2. Comparisons among all three devices used the χ² test for discrete data and analysis of variance for continuous data, with two-sample or paired Student’s t-tests assuming unequal variance for post hoc comparisons as appropriate. Probabilities of flagged data were compared by the χ² test, and binomial exact confidence intervals were calculated. Bias (device − [average of devices]) and limits of agreement (bias ± 2 sd) among devices were assessed by Bland-Altman methodology across discrete intervals of Spo2 and SIQ (25). Differences in bias were tested by Student’s t-test, and differences in limits of agreement (precision) by the variance-ratio F-test. Significant differences are reported at P < 0.05, P < 0.005, and P < 0.0005. All statistical calculations were performed with Stata statistical software (Version 7; StataCorp, College Station, TX).
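As a minimal sketch of the Bland-Altman computation for one device pair, the hypothetical function `bland_altman` below returns the mean difference and the bias ± 2 SD limits of agreement. It assumes the conventional pairwise difference a − b; the text defines bias relative to the average of devices, so the exact quantity the authors tabulated in Stata may differ.

```python
import statistics

def bland_altman(a, b):
    """Bias and 2-SD limits of agreement for paired readings a and b.

    Differences are taken as a - b. Returns (bias, lower, upper).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 2 * sd, bias + 2 * sd
```

Precision in the sense used here is the width of the returned interval, which widens as the spread of paired differences grows.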
The study period for each patient began when all three oximetry sensors were attached and reporting data and ended when the first sensor was removed. Data acquired during cardiopulmonary bypass (one patient) and during periods when sensors were disconnected (e.g., during site rotation) were excluded from analysis. Data and alarms from the test devices were not observable by clinical staff. Co-oximetry data were not collected during this study.
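The data reduction described above (1-Hz export, exclusion of bypass and site-rotation periods, 10-s samples) might be sketched as follows. This is an assumption-laden illustration rather than the study's procedure: averaging within each window, discarding any window that touches an excluded second, and using `None` for a device that is not reporting are choices made here, and `ten_second_samples` is a hypothetical helper.

```python
from typing import AbstractSet, List, Optional, Sequence, Tuple

Row = Tuple[Optional[float], ...]  # one reading per device; None = no value

def ten_second_samples(readings: Sequence[Row],
                       excluded: AbstractSet[int] = frozenset(),
                       window: int = 10) -> List[Row]:
    """Reduce per-second readings (e.g. Masimo, N-395, Solar) to
    per-device means over complete 10-s windows.

    Windows containing any excluded second are dropped; a trailing
    partial window is ignored. A device with no values in a window
    gets None for that window.
    """
    out: List[Row] = []
    for start in range(0, len(readings) - window + 1, window):
        idx = range(start, start + window)
        if any(i in excluded for i in idx):
            continue  # skip windows overlapping excluded periods
        block = [readings[i] for i in idx]
        means = []
        for dev in range(len(block[0])):
            vals = [r[dev] for r in block if r[dev] is not None]
            means.append(sum(vals) / len(vals) if vals else None)
        out.append(tuple(means))
    return out
```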
The subjects ranged in age from 2 days to 15 yr, with 8 patients <2 mo old. Weights ranged from 1.4 kg to 79.8 kg with 9 patients <3 kg. Twenty-three patients were studied in acute care settings (neonatal intensive care unit, PICU, and OR) and four in the sleep laboratory. Total study duration was 85.6 h (2.74 ± 2.32 h/patient).
Data Reporting and Signal Heuristics
Table 1 shows the frequency of signal heuristics by device and by care setting. Of the 344,388 s of data acquired, 308,301 (89.54%) were available for analysis after exclusion of periods when not all sensors were in place or during cardiopulmonary bypass (one patient). Data reporting was highest for the Solar 8000 (Nellcor second-generation equivalent device), which displayed data 98.7% of the time overall, compared with 98.4% for both the Masimo SET and the Nellcor N-395 (P < 0.001). Masimo reported data with no signal heuristic flag 97.1% of the time overall (98.2% in the sleep lab and 96.6% in acute care). Nellcor N-395 reported data with no signal heuristic flag 84.7% of the time overall (90.7% in the sleep lab and 81.3% in acute care). Masimo’s LoSIQ message flagged 1.2% of overall data (0.3% in the sleep lab and 1.8% in acute care). The Nellcor N-395 MOT message flagged 13.7% of overall data (7.7% in the sleep lab and 17.1% in acute care). The Nellcor PS message flagged 0.1% of overall data (0.0% in the sleep lab and 0.1% in acute care) and, for the purpose of subsequent analysis, was incorporated into the “no data” category.
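Flagged-data proportions like those above come with binomial exact confidence intervals. We assume these are Clopper-Pearson intervals; the sketch below computes them with the standard library only, by bisecting on the binomial tail probabilities rather than calling a beta-quantile function. The helper names are hypothetical.

```python
import math

def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0  # all mass sits at X = n > k
    log_p, log_q = math.log(p), math.log1p(-p)
    log_nfact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(log_nfact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)

def _bisect_increasing(g, target: float, iters: int = 100) -> float:
    """Find p in [0, 1] where the increasing function g(p) crosses target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion k/n."""
    if not 0 <= k <= n or n == 0:
        raise ValueError("require 0 <= k <= n and n > 0")
    # Lower bound solves P(X >= k | p) = alpha/2; P(X >= k) increases with p.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: 1.0 - _binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha/2, i.e. 1 - P(X <= k) = 1 - alpha/2.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: 1.0 - _binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper
```

Given the counts of flagged and total 10-s samples for one device, `clopper_pearson(flagged, total)` gives a 95% interval for the flagged proportion. Each evaluation sums up to k + 1 terms, which remains practical at the roughly 30,000 ten-second samples a data set of this size yields.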
Effect of Signal Heuristics on Clean Data Availability
Because algorithmic data rejection is reported by both manufacturers to be an important advance in technology, we examined the relationship between the continuous measure of SIQ available on the Masimo device and the presence or absence of quality flags. The graphical representation of this analysis is shown in Figure 1. As expected, data rejection by the Masimo device changed sharply as SIQ decreased below 0.3. For the N395, there was an almost linear relationship between SIQ and the probability of displaying unflagged data. For the Solar 8000 N200 equivalent device, data were displayed until SIQ approached zero. The distinct shapes of the curves for the different devices confirm the presence of differential rejection algorithms.
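A curve like the one described for Figure 1 can be estimated by binning samples on SIQ and computing the fraction of unflagged samples in each bin. The sketch below assumes each 10-s sample carries its Masimo SIQ and a boolean `flagged` for the device under study, with “no data” counted as flagged; the ten equal-width bins and the function name are illustrative choices, not the study's.

```python
def unflagged_probability_by_siq(samples, n_bins=10):
    """samples: iterable of (siq, flagged) pairs with siq in [0, 1].

    Returns a list of (bin_lower_edge, probability_unflagged), where the
    probability is None for bins that contain no samples.
    """
    counts = [0] * n_bins
    unflagged = [0] * n_bins
    for siq, flagged in samples:
        # Clamp so SIQ == 1.0 falls in the top bin and stray values stay in range.
        b = min(max(int(siq * n_bins), 0), n_bins - 1)
        counts[b] += 1
        if not flagged:
            unflagged[b] += 1
    return [(b / n_bins, unflagged[b] / counts[b] if counts[b] else None)
            for b in range(n_bins)]
```

Running this once per device on the same samples yields one curve per device, so differently shaped curves would reflect differing rejection behavior.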
Effect of Saturation on Clean Data Availability
When signal quality is poor, the amplitude of noise or motion-induced changes in detected light will far exceed the amplitude of the arterial signal, so that the red-to-infrared ratio calculated by a pulse oximeter approaches one, resulting in a reported saturation of approximately 85%, the isosbestic point. Because oximeters tend to report artifactual data near this isosbestic saturation, we examined the relationship between reported saturation and the presence of quality flags. This relationship is shown for the three devices in Figure 2. Both the Masimo and N-395 devices were more likely to flag data as suspect as reported saturation decreased, but there was no evidence of increased flagging in the isosbestic range.
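The argument above can be made concrete with the ratio of ratios, R = (AC_red/DC_red)/(AC_IR/DC_IR). Real oximeter calibration curves are empirical and proprietary; this sketch uses the commonly quoted linear approximation Spo2 ≈ 110 − 25R, which gives 85% at R = 1. It also assumes, as a simplification, that artifact adds the same relative modulation at both wavelengths.

```python
def ratio_of_ratios(ac_red, dc_red, ac_ir, dc_ir):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir)."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)

def approx_spo2(r):
    """Commonly quoted linear approximation, Spo2 ~ 110 - 25 R (percent)."""
    return 110.0 - 25.0 * r

def ratio_with_artifact(ac_red, dc_red, ac_ir, dc_ir, artifact):
    """R when an artifact adds the same relative modulation (a fraction of
    DC) at both wavelengths; a simplifying assumption for illustration."""
    return (ac_red / dc_red + artifact) / (ac_ir / dc_ir + artifact)

# Clean signal: 0.5% red and 1% infrared modulation gives R = 0.5 (about 97.5%).
clean_r = ratio_of_ratios(0.5, 100.0, 1.0, 100.0)
# An artifact roughly 20x the arterial modulation pulls R toward 1.
noisy_r = ratio_with_artifact(0.5, 100.0, 1.0, 100.0, artifact=0.2)
print(f"clean: R={clean_r:.3f}, Spo2 ~ {approx_spo2(clean_r):.1f}%")
print(f"noisy: R={noisy_r:.3f}, Spo2 ~ {approx_spo2(noisy_r):.1f}%")
```

As the artifact term grows, both numerator and denominator are dominated by it, so R tends to 1 and the displayed value drifts toward about 85% regardless of the true saturation.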
Effect of Signal Heuristics on Accuracy
Table 2 tests agreement among devices by bias and precision analysis, stratified by the displayed heuristics. With no Masimo signal heuristic flag, the Masimo-N395 device pair exhibits bias and precision similar to those of the Masimo-N200 pair. In the presence of the LoSIQ flag, the performance of each pairing worsened compared with the unflagged condition. These results are illustrated in Figure 3, where agreement within the Masimo-N395 and Masimo-N200 pairs is observed to deteriorate at SIQ <0.3. Not surprisingly, the N395-N200 pair has a relatively small bias across the range of SIQ but does demonstrate a significant reduction in precision at low SIQ.
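Comparing agreement between conditions (for example, unflagged versus LoSIQ samples for one device pair) involves the two tests named in the methods: a t-test on bias and a variance-ratio F-test on precision. The sketch below computes only the test statistics with the standard library, using Welch's unequal-variance form to match the methods; obtaining P values would require the t and F distributions, e.g., `scipy.stats.t.sf` and `scipy.stats.f.sf`. Function names are hypothetical.

```python
import math
import statistics

def welch_t(d1, d2):
    """Welch's t statistic and degrees of freedom for the difference in
    mean bias between two sets of paired differences."""
    m1, m2 = statistics.mean(d1), statistics.mean(d2)
    v1, v2 = statistics.variance(d1), statistics.variance(d2)
    n1, n2 = len(d1), len(d2)
    se2 = v1 / n1 + v2 / n2  # squared standard error of m1 - m2
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

def variance_ratio(d1, d2):
    """F statistic (larger variance over smaller) and its (numerator,
    denominator) degrees of freedom, comparing precision of two sets."""
    v1, v2 = statistics.variance(d1), statistics.variance(d2)
    if v1 >= v2:
        return v1 / v2, (len(d1) - 1, len(d2) - 1)
    return v2 / v1, (len(d2) - 1, len(d1) - 1)
```

Placing the larger variance in the numerator gives an F ≥ 1, which suits a two-sided test when the tail probability is doubled.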
Performance differences were also compared according to Nellcor heuristic flags. With no Nellcor flag, the Masimo-N395 pair exhibits a higher bias than the Masimo-N200 pair, and the Masimo-N200 pair has a lower bias (but a broader confidence interval) than the N395-N200 pair. Agreement deteriorated during MOT and PS conditions for both the Masimo-N395 and the Masimo-N200 pairs. When only “high quality” signal (no flags from either manufacturer) was analyzed, similar bias and precision were observed across device pairings, and agreement between any two devices was greatest.
Effect of Oxygen Saturation on Accuracy
The reported Spo2 also affected agreement between devices. In Figure 4, the difference between devices (bias) is plotted as a function of 10% increments of average Spo2, using all available signal data. Agreement between devices begins to deteriorate at an Spo2 of less than 80% and grows progressively worse at lower saturations. Confidence intervals are much broader at lower saturations when flagged data are incorporated into the analysis. Figure 5 shows limits of agreement for the Masimo-N395 pairing and reveals deterioration of reliability at Spo2 <70%.
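Stratifying agreement by average Spo2, as in the analysis above, amounts to grouping paired differences by the two-device mean. In this sketch the 10-point band edges, the placement of 100% in the 90–100 band, and the helper name `agreement_by_spo2_band` are assumptions.

```python
import statistics
from collections import defaultdict

def agreement_by_spo2_band(a, b, width=10):
    """Group paired readings by the average of the two devices in
    10-point bands (keyed by lower edge, e.g. 70 for 70-79) and return
    {band: (n, bias, sd_of_differences)}; sd is None with fewer than 2 pairs."""
    bands = defaultdict(list)
    for x, y in zip(a, b):
        avg = (x + y) / 2
        band = min(int(avg // width) * width, 100 - width)  # 100% joins the 90 band
        bands[band].append(x - y)
    out = {}
    for band in sorted(bands):
        d = bands[band]
        sd = statistics.stdev(d) if len(d) > 1 else None
        out[band] = (len(d), statistics.mean(d), sd)
    return out
```

Limits of agreement for each band then follow as bias ± 2 sd; sparse low-saturation bands will show wide, unstable limits simply because they contain few pairs.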
Under optimal signal conditions, with no signal heuristic displayed, there is little difference in precision and bias between the two newer technologies, but agreement between devices was significantly affected by SIQ, MOT, and hypoxemia. Our calculated overall bias between the Masimo and N-395 devices (0.20 ± 3.01) is consistent with previous publications comparing these devices with co-oximetry (19). However, under poor signal conditions or with hypoxemia, agreement between the new devices deteriorated. Without co-oximetry data, it is impossible to tell which, if any, of the oximeters are accurate under these conditions. Both new technologies had heuristics that separate data with better agreement from data with poorer agreement. Both Masimo’s LoSIQ and Nellcor’s PS flag a relatively small proportion of samples, which show higher bias and broader confidence limits. Additionally, the N395 MOT heuristic flags a substantial proportion of samples (13.7% overall and 17.1% in the acute care setting) with broader confidence intervals and less precision. Overall, the Masimo device flagged less of its data as questionable than did the Nellcor N-395. It appears that the Nellcor MOT indicator is more sensitive but less specific than the Masimo LoSIQ flag. These results are consistent with the laboratory-derived receiver operating characteristics of both devices (7).
We found that the Solar 8000 second-generation oximetry technology reported more data (failing to post data only 1.3% of the time) than either of the newer technologies. Failure to report oximetry data may translate clinically into missed desaturations as well as missed changes in saturation, in both frequency and duration. Bohnhorst et al. (12) studied 17 unsedated preterm infants, comparing a conventional pulse oximeter and two new-generation pulse oximeters with transcutaneous partial pressure of oxygen. The conventional pulse oximeter detected all hypoxemia episodes, whereas 5.4% of the episodes were missed by the Nellcor OXISMART instrument and 0.5% by the Masimo instrument. With older technology and no signal heuristic reporting, judging data validity is left to the clinician.
Pulse oximetry is an important tool in the detection of sleep-disordered breathing. In our sleep lab, Masimo posted more unflagged data than Nellcor N-395 (98.2% versus 90.7%). Brouillette et al. (26), in a sleep study comparing Masimo SET® and Nellcor N-395 oximeters, found that the Masimo oximeter registered many fewer false desaturations caused by motion artifact. Of the 75 desaturation events not associated with movement artifact, Masimo detected 98.6% whereas the Nellcor N-395 detected 45.3% (P < 0.01). In our sleep lab analysis, Masimo signal averaging was set to 2 s to facilitate identification of brief desaturations. This introduces a potential source of bias when comparing oximeters with differing signal-averaging times.
To further characterize the performance of the Masimo and Nellcor N-395 oximeters, we stratified the data by epochs of Spo2 and SIQ. SIQ was chosen as the reference because it is the only available signal heuristic that provides a continuous assessment of data integrity. As the differing shapes of the “probability of good data” curves in Figure 1 show, differences between the Masimo and Nellcor signal analysis algorithms produced different probabilities of reporting good data across the range of SIQ. Masimo has a high probability of reporting good data at SIQ above 0.3, whereas the N395 algorithms for PS and MOT rejected more data at higher SIQ, reflecting differing rejection strategies. The Solar 8000 oximeter actually displayed more data at SIQ <0.4 than either the Masimo device or the N395. This is probably related to the limited ability of the older technology to reject data of questionable integrity.
Similarly, Figure 2 shows a decrease in oximetry performance at lower saturations for all devices. This is consistent with previous reports (14–22). In a study comparing Nellcor N-395 and Masimo SET® oximeters with co-oximetry-measured oxygen saturation in 25 children with low perfusion (as evidenced by increased serum lactic acid), the absolute difference between measured and calculated saturations was significantly greater for both pulse oximeters when Sao2 was <90% than when Sao2 was ≥90% (13). Differential data rejection may contribute to differences in reported saturations.
Although overall agreement between the new devices was good, interval analysis reveals significant differences at low SIQ and low Spo2. In Figures 3 and 4, agreement between Masimo and Nellcor N-395 decreases both at SIQ ≤0.3 and at Spo2 <70% (as reflected by the widening of confidence intervals below these levels). Data points in the 90%–100% range (83.8% of all time in our data set) contribute to the high overall agreement between devices. However, clinically relevant disagreement occurs much more frequently during periods of desaturation. The wide limits of agreement emphasize the clinical limitations of pulse oximetry technology despite improvements in artifact rejection. Because far fewer data points were available for analysis at Spo2 <70% (2.9% of data), conventional bias and precision analysis over the whole saturation range underestimates the disagreement between devices at low saturation, which may be more clinically significant when monitoring chronically desaturated patients. Although we observed disagreement between oximeters at lower saturations, it is possible that one oximeter is more accurate in this range. Collection of co-oximetry data would facilitate the analysis of agreement between devices. Variability in data collection could be reduced by tighter control of patient age, clinical problems, and care area heterogeneity.
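Why pooled analysis understates low-saturation disagreement follows from the law of total variance: the overall SD of differences is a weighted mixture dominated by the well-populated strata. In the worked example below only the 2.9% weight comes from the data above; the per-stratum SDs (1.5 and 6 points, both with zero bias) are hypothetical values chosen for illustration.

```python
import math

def pooled_sd(strata):
    """strata: list of (weight, bias, sd) with weights summing to 1.

    Returns the SD of the pooled differences via the law of total variance:
    within-stratum variance plus the spread of stratum biases.
    """
    mean = sum(w * m for w, m, _ in strata)
    var = sum(w * (s ** 2 + (m - mean) ** 2) for w, m, s in strata)
    return math.sqrt(var)

# Hypothetical SDs; the 2.9% weight is the share of data at Spo2 < 70%.
strata = [(0.971, 0.0, 1.5),   # Spo2 >= 70%
          (0.029, 0.0, 6.0)]   # Spo2 < 70%
overall = pooled_sd(strata)
print(f"pooled limits of agreement: +/-{2 * overall:.1f} points")
print(f"low-saturation limits: +/-{2 * 6.0:.1f} points")
```

With these hypothetical values the pooled SD is about 1.8, so pooled limits of roughly ±3.6 points conceal limits of ±12 points within the low-saturation stratum.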
In conclusion, the purported clinical advantages of new oximetry technology are fewer missed desaturations and fewer false alarms. Both Masimo SET® and Nellcor N-395 flagged questionable data, but their different artifact rejection algorithms resulted in different probabilities of presenting good data. With SIQ >0.3, Masimo SET® reported significantly more unflagged data than Nellcor N-395. Agreement between devices deteriorated slightly and differently with the available heuristics. Agreement between the newer devices deteriorated with lower signal integrity and lower Spo2, such that the devices are not clinically equivalent. Even under optimal signal conditions, when all tested technologies exhibited similar bias, the wide limits of agreement revealed continued limitations in pulse oximetry technology.
1. Trivedi NS, Ghouri AF, Shah NK, et al. Effects of motion, ambient light, and hypoperfusion on pulse oximeter function. J Clin Anesth 1997; 9: 179–83.
2. Moller JT, Pedersen T, Rasmussen LS, et al. Randomized evaluation of pulse oximetry in 20,802 patients: I. Anesthesiology 1993; 78: 436–44.
3. Freund PR, Overand PT, Cooper J, et al. A prospective study of intraoperative pulse oximetry failure. J Clin Monit 1991; 7: 253–8.
4. Wiklund L, Hok B, Stahl K, Jordeby-Jonsson A. Postanesthesia monitoring revisited: incidence of true and false alarms from different monitoring devices. J Clin Anesth 1994; 6: 182–8.
5. Lawless ST. Crying wolf: false alarms in a pediatric intensive care unit. Crit Care Med 1994; 22: 981–5.
6. Goldman JM, Petterson MT, Kopotic RJ, Barker SJ. Masimo signal extraction pulse oximetry. J Clin Monit 2000; 16: 475–83.
7. Barker SJ. “Motion-resistant” pulse oximetry: a comparison of new and old models. Anesth Analg 2002; 95: 967–72.
8. Jopling MW, Mannheimer PD, Bebout DE. Sensitivity and specificity performance during motion artifact in three pulse oximeters designed for use in motion [abstract]. Anesthesiology 2000; 93(3A): A585.
9. Jopling MW, Mannheimer PD, Bebout DE. Issues in the laboratory evaluation of pulse oximeter performance. Anesth Analg 2002; 94: S62–8.
10. Malviya S, Reynolds PI, Voepel-Lewis T, et al. False alarms and sensitivity of conventional pulse oximetry versus the Masimo SET technology in the pediatric postanesthesia care unit. Anesth Analg 2000; 90: 1336–40.
11. Hay WW, Rodden DJ, Collins SM, et al. Reliability of conventional and new pulse oximetry in neonatal patients. J Perinatol 2002; 22: 360–6.
12. Bohnhorst B, Peter CS, Poets CF. Pulse oximeters’ reliability in detecting hypoxemia and bradycardia: Comparison between a conventional and two new generation oximeters. Crit Care Med 2000; 28: 1565–8.
13. Torres A, Skender K, Worhrley J, et al. Assessment of 2 new generation pulse oximeters during low perfusion in children [abstract]. Crit Care Med 2001; 29: A117.
14. Robertson FA, Hoffman GM. Effects of signal integrity and saturation on accuracy of Masimo SET® and Nellcor N395 pulse oximeters [abstract]. Anesthesiology 2002; 96: A555.
15. Robertson FA, Hoffman GM. Clinical evaluation of Masimo SET® and Nellcor N395 oximeters during optimal signal conditions in difficult-to-monitor neonates [abstract]. Anesthesiology 2002; 96: A556.
16. Robertson FA, Hoffman GM. Effects of signal integrity and saturation on data availability in Masimo SET® and Nellcor N395 pulse oximeters [abstract]. Anesthesiology 2002; 96: A599.
17. Jay GD, Hughes L, Renzi FP. Pulse oximetry is accurate in acute anemia from hemorrhage. Ann Emerg Med 1994; 24: 32–5.
18. Severinghaus JW, Koh SO. Effect of anemia on pulse oximeter accuracy at low saturation. J Clin Monit 1990; 6: 85–8.
19. Wouters PF, Gehring H, Meyfroidt G, et al. Accuracy of pulse oximeters: the European multi-center trial. Anesth Analg 2002; 94: S13–6.
20. Schmitt HJ, Schuetz WH, Proeschel PA, Jaklin C. Accuracy of pulse oximetry in children with cyanotic congenital heart disease. J Cardiothorac Vasc Anesth 1993; 7: 61–5.
21. Thrush D, Hodges MR. Accuracy of pulse oximetry during hypoxemia. South Med J 1994; 87: 518–21.
22. Gehring H, Holger M, Reigchert, et al. The accuracy of a new generation of pulse oximeters [abstract]. Anesthesiology 2002; 96: A561.
23. Radical signal extraction pulse oximeter operator’s manual. Irvine, CA: Masimo Corporation, 2000.
24. Nellcor operator’s manual, N-395 pulse oximeter. St. Louis, MO: Mallinckrodt Inc., 2000.
25. Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician 1983; 32: 307–17.
26. Brouillette RT, Lavergne J, Leimanis A. Differences in pulse oximetry technology can affect detection of sleep-disordered breathing in children. Anesth Analg 2002; 94: S47–53.
© 2004 International Anesthesia Research Society