Analogous to the temporal medians for Spo2,ref and RR, we also defined a single Q value for each recording as the temporal median of Q(t). As shown in the Results section, a threshold value Qthr was determined as an objective measure so that only reliable data were used in the calibration procedure.
Statistical Approach and Analysis
Before our measurements, we faced many unknowns such as variations in PPG signal strength among individuals, subject motion, and the extent to which hypoxia could be induced in the normobaric hypoxic tent. Rather than conducting a pilot study to investigate each of these unknowns, we recruited relatively large numbers of subjects for studies N and H (25 each), in the anticipation that this would provide a sufficiently large and clean data set to determine a stable linear regression between RR and Spo2,ref. Thus, we enrolled many subjects in the study and determined a posteriori if we succeeded in establishing a stable linear regression. This process is described in the Results section.
All measurements for studies N and H were performed consecutively without intermediate analysis to adjust the study design. Inclusion and exclusion criteria for studies N, H, and C are in Supplemental Digital Content, Supplemental Table 3 (http://links.lww.com/AA/B428). For study N, all 25 recruited individuals were included while for study H, 21 individuals were included after checking for inclusion and exclusion criteria. For study C, we were somewhat limited in inclusion because the climate room had limited availability. The purpose of study C was exploratory.
For studies N and H, an objective measure was used to discard recordings with unreliable RR. As shown in the Results section, the 17 cleanest data points resulted in a regression line that was essentially the same as the regression line for the 31 cleanest data points; that is, any error due to variation in the regression was much smaller than the inaccuracies caused by variations among subjects. The determination of C1 and C2 was eventually based on the 31 cleanest data points. Next, with these calibration constants, we proceeded to answer the overall study question: Can a single calibration curve provide Spo2 estimates for a population of individuals with acceptable accuracy? For this, we adopted Arms, a metric used for pulse oximeters to describe accuracy (ISO 80601-2-61, 2011, section 201.12.1.101.2.2). An * superscript is used to indicate at least 1 major difference from the International Organization for Standardization (ISO) standard, which is that in our formulation (Equation 3) we used median values of the traces, thus discarding short-term errors.
Because, in our calibration approach, this expression was mathematically similar to that of a standard deviation, we used the χ2 approach to calculate an error estimate for A*rms.
The mean difference, or “bias,” was calculated according to Equation 4, also following the ISO standard.
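Equations 3 and 4 are not reproduced in this text. For orientation only, the ISO-style root-mean-square accuracy and mean difference, with the per-recording medians substituted as described above, can be written as follows. The index i, the count n, and the symbol B are notation assumed here, and the exact normalization used in the study may differ from the 1/n shown.

```latex
A^{*}_{\mathrm{rms}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}
  \left(\mathrm{Spo}_{2,\mathrm{cam},i} - \mathrm{Spo}_{2,\mathrm{ref},i}\right)^{2}},
\qquad
B = \frac{1}{n}\sum_{i=1}^{n}
  \left(\mathrm{Spo}_{2,\mathrm{cam},i} - \mathrm{Spo}_{2,\mathrm{ref},i}\right)
```

Here each Spo2,cam,i and Spo2,ref,i is the temporal median of one recording, which is why short-term errors do not enter the sum.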
In study C, the protocol was repeated once for each of the 5 individuals. To assess statistical significance among parameters (eg, DC reflectance, PPG signal strength or Spo2,cam–Spo2,ref), caused by exposure to different temperatures, we averaged the values from the 2 sessions and then treated these averages as 5 independent samples (from 5 individuals).
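The averaging of the 2 sessions and the treatment of the 5 individuals as independent paired samples can be sketched as below. This is a minimal illustration, not the analysis code used in the study: the function names and the numbers in the usage lines are hypothetical, and the critical t value for 4 degrees of freedom is hard-coded from standard tables.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 99% critical value of Student's t for 4 degrees of freedom
# (5 paired samples), taken from standard t tables.
T_CRIT_99_DF4 = 4.604


def session_average(session_1, session_2):
    """Average the two repeated sessions per individual."""
    return [(a + b) / 2 for a, b in zip(session_1, session_2)]


def paired_mean_ci(first, second, t_crit=T_CRIT_99_DF4):
    """Return (mean difference, CI half-width) for paired samples.

    The default t_crit is only valid for exactly 5 pairs.
    """
    diffs = [b - a for a, b in zip(first, second)]
    half_width = t_crit * stdev(diffs) / sqrt(len(diffs))
    return mean(diffs), half_width


# Hypothetical per-individual values at two temperatures, two sessions each.
room = session_average([2.0, 2.4, 1.8, 2.2, 2.6], [2.2, 2.2, 2.0, 2.0, 2.4])
cold = session_average([1.1, 1.5, 1.0, 1.2, 1.6], [1.3, 1.3, 1.2, 1.0, 1.4])
change, half_width = paired_mean_ci(room, cold)
print(f"mean change {change:.2f}, 99% CI half-width {half_width:.2f}")
```

With only 5 pairs the critical value is large, which is one reason the intervals reported for study C are wide.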
Throughout this study, we reported 99% confidence intervals (CIs) because no prior data were used to choose our sample sizes. When Student t tests were used, we reported the P values.
Filtering the Data Set and Determination of Qthr and Calibration Constants C1, C2
A total of 46 data points from 39 individuals for studies N and H are shown in Figure 5A and listed in Supplemental Table 1 (Supplemental Digital Content, http://links.lww.com/AA/B428). As expected, similar to calibration curves for conventional pulse oximetry, a negative correlation between median RR and median Spo2,ref is seen. Some outlier data points are from signals similar to those in Figure 4B or C that are obviously corrupted. As mentioned, we wished to discard such data in an objective manner. We defined an objective quality metric Q and, indeed, the signals in Figure 4B and C had low Q values. However, the question remains which threshold value Qthr to use for this metric to discard data for which the signal corruption is less obvious. In the following paragraph, we describe the process of determining Qthr in an objective manner.
In Figure 5B, we used Qthr as a parameter to discard data with Q < Qthr and plotted the coefficient of determination (R2) for regression on the remaining data. Three domains can be identified: insufficient filtering, appropriate filtering, and too aggressive filtering. At Qthr < 1.4, recordings with corrupted data had a large impact on the regression; only moderate R2 values were found. For 1.4 < Qthr < 1.9, R2 was high but, more importantly, regression lines were very similar to each other (Figure 5A). For Qthr > 1.9, both the R2 values and regression slopes were unstable. Here, the filtering was too aggressive, and interindividual variations combined with poor statistics (too few data points) dominated the regression slope. We chose Qthr = 1.4 as the filtering value to obtain a clean set of data for the calibration, although any value between 1.4 and 1.9 gave essentially equal calibration results. In other words, the 17 cleanest recordings gave the same regression as the 31 cleanest recordings and those in between (14 regression lines shown in Figure 5A), indicating that the data set was sufficiently large to determine a population calibration.
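The threshold sweep described above can be sketched as follows. This is an illustration under stated assumptions: it uses ordinary least squares rather than the robust regression used in the study, the function names are hypothetical, and the records are synthetic (4 clean points placed on a line plus 2 deliberately corrupted points with low Q), not the study data.

```python
def linear_fit(x, y):
    """Return (slope, intercept, r_squared) of an ordinary least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


def sweep_threshold(records, thresholds, min_points=3):
    """For each Qthr, drop records with Q < Qthr and refit the line.

    records: iterable of (median RR, median Spo2_ref, median Q).
    Returns a list of (Qthr, points kept, slope, R^2).
    """
    results = []
    for q_thr in thresholds:
        kept = [(rr, spo2) for rr, spo2, q in records if q >= q_thr]
        if len(kept) < min_points:
            continue  # too few points left for a meaningful regression
        rr_vals, spo2_vals = zip(*kept)
        slope, _, r2 = linear_fit(rr_vals, spo2_vals)
        results.append((q_thr, len(kept), slope, r2))
    return results


# Synthetic records: clean points on a line, corrupted points with low Q.
clean = [(rr, 118.0 - 45.9 * rr, q)
         for rr, q in [(0.40, 2.1), (0.50, 1.8), (0.60, 1.6), (0.70, 1.5)]]
corrupted = [(0.90, 97.0, 0.8), (0.45, 88.0, 1.0)]
records = clean + corrupted

for q_thr, n_kept, slope, r2 in sweep_threshold(records, [0.0, 1.0, 1.4]):
    print(f"Qthr={q_thr:.1f}  n={n_kept}  slope={slope:.1f}  R2={r2:.3f}")
```

Plotting R2 and slope against Qthr in this way is what makes the three filtering domains visible: the useful range is where both stay stable as the threshold changes.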
The regression line for Qthr = 1.4 defined the calibration curve with constants C1 = 118.0 and C2 = 45.9 (Figure 6). With calibration defined, the next step was to analyze the A*rms as this determined an estimate of the system accuracy. After linear regression on 31 data points, the point estimate of the A*rms was 1.15%. We applied a 1-sided, 99% CI based on the χ2 distribution with 29 degrees of freedom for the root mean square error after a linear regression of our data. This gave an upper limit of 1.65% for A*rms.
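The 1-sided upper limit can be approximated with a short calculation. The sketch below assumes the usual relation for a root mean square error s with df degrees of freedom, upper limit = s·sqrt(df / χ2(α, df)), and, to stay within the Python standard library, replaces the exact χ2 quantile with the Wilson-Hilferty approximation. Both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty), not the exact value."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3


def rms_upper_limit(rms, df, confidence=0.99):
    """One-sided upper confidence limit for a root mean square error."""
    lower_chi2 = chi2_quantile_wh(1.0 - confidence, df)
    return rms * sqrt(df / lower_chi2)


# Point estimate 1.15% with 29 degrees of freedom (31 points, 2 fitted constants).
upper = rms_upper_limit(1.15, 29)
print(f"approximate upper 99% limit: {upper:.2f}%")
```

With these inputs the approximation lands close to the reported upper limit of 1.65%; any small difference may come from rounding of the point estimate or from the approximated quantile.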
Perturbation of Physiological Conditions and Validation of Calibration Under Normoxic Conditions
One source of feedback on the effect of cooling was obtained with reflectance spectra on the forehead. Typically, the reflectance at low temperatures was slightly higher than that at room temperature for wavelengths between 450 and 600 nm, which was likely due to a lower blood volume caused by vasoconstriction. Figure 7A shows reflectance spectra for room and cold temperatures for one individual, as well as the average difference for 5 individuals. The difference just reached statistical significance (at 99% CI) for a few yellow and green wavelengths. An increased reflectance in this wavelength region only was consistent with a reduced blood volume at skin depths of about 0.1 to 0.3 mm (Figure 7B).22 These skin layers were dominant in camera-based PPG measurements (Figure 1B). Although the difference was barely significant, we considered this a first sign of successful perturbation relevant for investigating calibration of camera-based pulse oximetry.
A stronger manifestation of physiological perturbation was the dependency of PPG signal strength on temperature, for both contact and contactless PPG (Figure 7C). Pulsatile strengths varied considerably at any temperature, which was due to natural variation among individuals (Supplemental Digital Content, Supplemental Table 3, http://links.lww.com/AA/B428) and the fact that temperature control was not perfect. Because of strong wind from the cooling ventilators, we recorded only when they were off. Thus, actual temperatures increased quite rapidly during recordings, and the temperatures indicated per data point were only averages. Nevertheless, the impact of ambient temperature on pulsatility in the finger was significant. A paired, 1-sided t test analysis of the changes in pulsatile strength with temperature showed that, for the finger probe, pulsatility at room temperature was larger than that at medium temperatures (P = 0.0096) and pulsatility at medium temperatures was larger than that at cold temperatures (P = 0.006). For the forehead, the result was similar for room and medium temperatures (P = 0.002). For medium and cold temperatures, the difference did not reach a significant level (P = 0.034); here, only 3 data points were available with Q > Qthr. Overall, for the temperature range of 23 to 7°C, the amplitude reduction was stronger at the finger than on the forehead (approximate factors of 30 and 4, respectively). More interestingly, however, we observed that the regression lines for red and IR were more or less parallel, which illustrates that the RR, critical for Spo2 calibratability, did not appear to be negatively affected. Please note that the wavelengths used in contact probes were different from those used for the camera, which explains the different red/IR ratios. To investigate calibration robustness with temperature, we did not further analyze the red and IR regression lines because this would have implied that we assumed the Spo2 did not change with temperature.
In fact, on average, Spo2,ref increased slightly by 0.21% (±0.08%, 99% CI) per degree Celsius cooling. Instead, we computed Spo2,cam values from the RR and the earlier established calibration constants. The results are shown in Figure 8 combined with the main results of studies N and H to allow an easy visual comparison. The data used for calibration are shown along with the computed upper limit for A*rms (99% CI), indicated as dashed lines. The ISO requirement of Arms <4% is also indicated for reference.
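Computing Spo2,cam from RR and then summarizing the discrepancies can be sketched as below. The linear form Spo2,cam = C1 − C2·RR is an assumption made here; it is consistent with the negative correlation and the constants reported above, but the calibration equation itself is not reproduced in this text. The function names and the sample RR and reference values are hypothetical.

```python
from math import sqrt

# Calibration constants reported for Qthr = 1.4.
C1, C2 = 118.0, 45.9


def spo2_cam(rr, c1=C1, c2=C2):
    """Camera-based Spo2 estimate (%), assuming a linear calibration in RR."""
    return c1 - c2 * rr


def a_rms(estimates, references):
    """Root-mean-square difference between estimates and reference values."""
    n = len(estimates)
    return sqrt(sum((e - r) ** 2 for e, r in zip(estimates, references)) / n)


def bias(estimates, references):
    """Mean difference (estimate minus reference)."""
    return sum(e - r for e, r in zip(estimates, references)) / len(estimates)


# Hypothetical per-recording medians: ratio of ratios and reference Spo2.
rr_medians = [0.42, 0.47, 0.55, 0.63]
ref_medians = [98.0, 97.0, 93.0, 89.5]

cam_medians = [spo2_cam(rr) for rr in rr_medians]
print(f"A*rms = {a_rms(cam_medians, ref_medians):.2f}%, "
      f"bias = {bias(cam_medians, ref_medians):+.2f}%")
```

Because the inputs are per-recording medians, the resulting figure corresponds to the starred metric that discards short-term errors, not to the segment-level accuracy.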
It is encouraging that, with a different experimental setting, the Spo2,cam values from study C were estimated quite well with the calibration derived from studies N and H. Even though study C explored the impact of ambient temperature on calibratability rather than validation of the calibration, the data for 2 individuals (40 and 41) who were not in the calibration studies at room temperature can be considered a first validation of the calibration at normoxic conditions. While the mean difference values for room, medium, and cold temperature were small (−0.7%, −0.5%, and −0.7%), the 99% CIs were relatively large (±0.7%, ±2.1%, and ±2.4%, respectively), reflecting the poor statistics in study C. One indication of whether calibration is robust to cold perturbation is the pairwise change in Spo2 discrepancy (Spo2,cam – Spo2,ref) per individual for a change in temperature, because such an analysis discards any potential systematic errors resulting from the different experimental setting. With Q > Qthr, we had 5 Spo2,cam – Spo2,ref values at room and medium temperature but only 3 at cold temperature, which gave 5 pairs for room-medium and 3 pairs for medium-cold. An additional comparison was made between the values for room and cold temperature. The average changes in discrepancies for these 3 temperature steps were small: 0.3%, 0.5%, and −0.1% for room-medium, medium-cold, and room-cold, respectively. However, the 99% CIs were relatively large: ±2.0%, ±0.7%, and ±1.3%, respectively.
The presented data provide strong evidence that a single calibration curve for a population of healthy adult individuals can be used to estimate Spo2 contactlessly with an acceptable accuracy of A*rms <1.65% (upper 99%, one-sided confidence limit). While noting that we discarded short-term errors, we consider this accuracy acceptable in view of typical Arms values of 2% or 3% for commercial pulse oximeters and the maximum of 4% as specified by the ISO standard (80601-2-61, 2011). In our view, this finding was not trivial because the contactless nature of camera-based pulse oximetry implies interrogation of much shallower skin layers than conventional pulse oximetry.
The calibration was performed on the basis of data for 26 individuals from a data pool of 39. Using an objective measure, the data for 15 individuals were discarded because subject motion and/or low pulsatile strengths caused the signals to be unreliable for accurate assessment of pulsatile strength. Overall pulsatile strength varied significantly among individuals: the lowest and highest median IR pulsatile strengths in the data with Q > 1.4 were 0.9 × 10−3 and 4.6 × 10−3, respectively. The discarded data tended to have lower average pulsatility than the nondiscarded data (IR pulsatility: 0.9 × 10−3 and 1.2 × 10−3, respectively). Even though we found that 14 regression lines were stable for 1.4 < Q < 1.9, we wanted to verify that the corresponding measured pulsatile strengths were clean, that is, that the signal was due to true PPG and not due to motion or noise (camera or lamp). In our approach of assessing pulsatile strength, noise would artificially increase pulsatile strength in both channels (red and IR) to the same extent, pushing the RR value closer to 1. If this were the case, the discrepancy (Spo2,cam − Spo2,ref) would tend to become negative for traces with relatively low pulsatile strength. Analysis of our data (Supplemental Digital Content, Supplemental Table 1, http://links.lww.com/AA/B428) showed that (Spo2,cam − Spo2,ref) in fact trended slightly in the opposite direction (not significant), indicating that the measured pulsatilities were caused by true PPG (skin color changes) and did not have a significant noise component. Apparently, the approach of discarding unreliable data based on Q provided satisfactory filtering of the data. Moreover, in study C, where PPG amplitudes were reduced significantly, we measured traces with Q > 1.4 at a pulsatile strength as low as 0.7 × 10−3.
PPG pulse amplitude on the forehead decreased to a much smaller extent than at the finger. This was qualitatively consistent with observations of Bebout and Mannheimer9 who also compared pulsatile strengths on the forehead with the finger. It should be noted, however, that the contact forehead sensor interrogated deeper skin layers than the camera (Figure 1 and comments by Mannheimer17 on referring to such probes as “surface” versus “reflective” probes). Other differences in the present study were that the temperature reduction was slightly larger and acclimatization was shorter.
Unfortunately, the discarded group included 5 of 7 people with dark skin, leaving only 2 such subjects in the calibration set (subjects 7 and 29, Supplemental Digital Content, Supplemental Table 3, http://links.lww.com/AA/B428). Although not enough data were available for statistical proof, the 2 data points had relatively small errors (0.5% and 1.5%), suggesting that there was not necessarily relevant bias due to melanin concentration. While melanin reduces the signal-to-noise ratio, we did not see evidence, experimental or theoretical, of a potential calibration problem in contactless pulse oximetry caused by melanin.
In this study, we focused on fundamental calibratability based on median values of traces, discarding short-term errors. However, to obtain an impression of general accuracy, we considered all recordings of all individuals and divided them into segments of 10 seconds. In Figure 9, each point represents the average value for a 10-second segment. We filtered with Q > 1.4, per segment, resulting in 2500 segments (59% of all segments). Please note that the data in Figure 9 now include 6 individuals with skin phototype IV or higher (indicated by asterisks in the legend). For data that were in the calibration set, “+” symbols are used, while for the additional data we use “−.” Data for dark-skinned individuals are emphasized by using larger symbols. Some data points feature a relatively large error. It should be noted that the Qthr of 1.4 was derived for whole traces and is likely not strict enough for the shorter 10-second segments. Also, the data that were not used in the calibration stem from traces with lower Q values; thus, it is not surprising that these data feature larger errors than the segments stemming from the traces with higher Q values. We include this figure to give the reader an impression of data that were discarded in the calibration. It also serves as a reminder that the calibration error A*rms is an underestimate because it discards short-term errors. While noting that some individuals contribute much more than others, the overall A**rms for Figure 9 is 2.54%. Short-term errors can be mitigated with various methods (eg, filtering and error concealment) and, of course, with algorithms that address motion, unlike the approach used in this study. Many factors will therefore change (reduce) the eventual Arms. The value of 2.54% is therefore merely illustrative, which is why we did not compute a confidence interval.
Nevertheless, we believe that the finding of an A**rms of 2.54%, together with the A*rms <1.65% (upper 99% 1-sided confidence limit) for long-term calibratability, provides strong evidence for the fundamental feasibility of contactless pulse oximetry.
Calibration of contactless, camera-based pulse oximetry was performed by robust linear regression on 31 data points measured on a population of 26 healthy individuals under normoxic and hypoxic conditions (Spo2 83% – 100%). Discarding short-term errors, we found an accuracy of A*rms <1.65% (99% one-sided, upper confidence limit), which compared well with Arms values for conventional pulse oximeters (typical values are 2% – 3%) and the maximum (4%) allowed by ISO 80601-2-61, 2011.
When subjects were exposed to temperature changes from room temperature down to cold (about 5°C), the average changes in discrepancy between Spo2,ref and Spo2,cam were 0.3%, 0.5%, and −0.1% for room-medium, medium-cold, and room-cold, respectively. With 99% CIs of ±2.0%, ±0.7%, and ±1.3%, respectively, these accuracies do not necessarily violate the ISO standard. Although further research on more than just 5 individuals is needed to narrow the intervals, these results are encouraging.
Challenges such as subject motion and low pulsatile strength have to be addressed to make this new measurement practical and successful.
Name: Wim Verkruysse, PhD.
Contribution: This author was involved in study design, conduct of the study, and manuscript preparation.
Name: Marek Bartula, MSc.
Contribution: This author was involved in study design and manuscript preparation.
Name: Erik Bresch, PhD.
Contribution: This author was involved in conduct of the study, data analysis, and manuscript preparation.
Name: Mukul Rocque, MSc.
Contribution: This author was involved in conduct of the study, data analysis, and manuscript preparation.
Name: Mohammed Meftah, BSc.
Contribution: This author was involved in study design, conduct of the study, and manuscript preparation.
Name: Ihor Kirenko, PhD.
Contribution: This author was involved in conduct of the study and critical reading.
This manuscript was handled by: Maxime Cannesson, MD, PhD.
We greatly appreciate the invaluable suggestions from Siegfried Kaestle, Andreas Schlack, Alexander Dubielczyk, and Rolf Neumann from Philips Medizin-Systeme, Böblingen, Germany. Furthermore, we also thank our software expert Patriek Bruins from Philips Innovation Services, Eindhoven, The Netherlands.
1. Aoyagi T, Kishi M, Yamaguchi K, Watanabe S. Improvement of the earpiece oximeter. Abstracts of the 13th annual meeting of the Japanese Society for Medical Electronics and Biological Engineering. 1974:Osaka; 90–91.
2. Severinghaus JW. Takuo Aoyagi: discovery of pulse oximetry. Anesth Analg 2007;105:S1–4.
3. Hertzman A, Spielman C. Observations on the finger volume pulse recorded photoelectrically. American Journal of Physiology 1937;119:334–335.
4. Wieringa FP, Mastik F, van der Steen AF. Contactless multiple wavelength photoplethysmographic imaging: a first step toward “SpO2 camera” technology. Ann Biomed Eng 2005;33:1034–41.
5. Humphreys K, Ward T, Markham C. Noncontact simultaneous dual wavelength photoplethysmography: a further step toward noncontact pulse oximetry. Rev Sci Instrum 2007;78:044304–6.
6. Huelsbusch M, Blazek V. In: Clough AV, Chen CT, eds. Contactless mapping of rhythmical phenomena in tissue perfusion using PPGI. SPIE Proceedings of the Medical Imaging. 2002:San Diego, CA: Physiology and Function from Multidimensional Images, 110–117.
7. Takano C, Ohta Y. Heart rate measurement based on a time-lapse image. Med Eng Phys 2007;29:853–7.
8. Verkruysse W, Svaasand LO, Nelson JS. Remote plethysmographic imaging using ambient light. Opt Express 2008;16:21434–45.
9. Bebout DE, Mannheimer PD. Effects of cold-induced peripheral vasoconstriction on pulse amplitude at various pulse oximeter sensor sites. Anesthesiology 2002;96:A558.
10. Cooke J, Scharf J. Improving pulse oximeter performance. Anesthesiology 2002;96:A593.
11. Sun Y, Thakor N. Photoplethysmography Revisited: From Contact to Noncontact, From Point to Imaging. IEEE Trans Biomed Eng 2016;63:463–77.
12. Kong L, Zhao Y, Dong L, Jian Y, Jin X, Li B, Feng Y, Liu M, Liu X, Wu H. Non-contact detection of oxygen saturation based on visible light imaging device using ambient light. Opt Express 2013;21:17464–71.
13. Bal U. Non-contact estimation of heart rate and oxygen saturation using ambient light. Biomed Opt Express 2015;6:86–97.
14. Shao D, Liu C, Tsow F, Yang Y, Du Z, Iriya R, Yu H, Tao N. Noncontact Monitoring of Blood Oxygen Saturation Using Camera and Dual-Wavelength Imaging System. IEEE Trans Biomed Eng 2015; Epub ahead of print.
15. Corral F, Paez G, Strojnik M. A photoplethysmographic imaging system with supplementary capabilities. Optica Applicata 2014;44:191–204.
16. Guazzi AR, Villarroel M, Jorge J, Daly J, Frise MC, Robbins PA, Tarassenko L. Non-contact measurement of oxygen saturation with an RGB camera. Biomed Opt Express 2015;6:3320–38.
17. Mannheimer PD. The light-tissue interaction of pulse oximetry. Anesth Analg 2007;105:S10–7.
18. Farrell TJ, Patterson MS, Wilson B. A diffusion theory model of spatially resolved, steady-state diffuse reflectance for the noninvasive determination of tissue optical properties in vivo. Med Phys 1992;19:879–88.
19. Fine I. In: Genina EA, Derbov VL, Meglinski I, Tuchin VV, eds. The optical origin of the PPG signal. SPIE Proceedings of the Saratov Fall Meeting. 2013:Saratov: Optical Technologies in Biophysics and Medicine XV; and Laser Physics and Photonics XV, 9031031.
20. Shvartsman LD, Fine I. Optical transmission of blood: effect of erythrocyte aggregation. IEEE Trans Biomed Eng 2003;50:1026–33.
21. Kamshilin AA, Nippolainen E, Sidorov IS, Vasilev PV, Erofeev NP, Podolian NP, Romashko RV. A new look at the essence of the imaging photoplethysmography. Sci Rep 2015;5:10494.
22. Verkruysse W, Lucassen GW, van Gemert MJ. Simulation of color of port wine stain skin and its dependence on skin variables. Lasers Surg Med 1999;25:131–9.
Copyright © 2016 International Anesthesia Research Society.