For every MAP-RN value documented for a patient, a temporally matched MAP-AUTO was sought. MAP-AUTO was computed from the continuous ABP signal sourced from the same Philips CMS bedside patient monitor. A MAP-AUTO value was therefore available for nearly any time, except for relatively uncommon episodes without at least 10 secs of continuous, reliable ABP waveform data within the preceding 6 mins, in which case the unmatched MAP-RN value was excluded from further analysis.
Computation of MAP-AUTO comprised two processing steps. First, unreliable ABP data with low SQIs were identified and excluded. Second, a representative value was extracted from the remaining reliable ABP waveform data. For the first step, unreliable ABP data were identified using an SQI algorithm that has been previously described in detail (13, 14), which combines functionality of two antecedent SQI algorithms (11, 12). Any segments of ABP waveform data with an SQI rating less than a threshold, i.e., SQI < threshold, were excluded from a given analysis. All analyses were repeated using the full spectrum of integer SQI cut-off thresholds, from 0 to 100.
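The exclusion step and the threshold sweep can be sketched as follows. This is a minimal illustration, not the study's code: the `(start_time_s, sqi, samples)` tuple and the function names `reliable_windows` and `sweep_thresholds` are hypothetical. It assumes that each 10-sec window carries one integer SQI and that windows with SQI below the cut-off are dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of one 10-sec ABP window:
# (start_time_s, sqi from 0 to 100, pressure samples in mm Hg)
Window = Tuple[float, int, List[float]]


def reliable_windows(windows: List[Window], threshold: int) -> List[Window]:
    """Keep windows whose SQI meets the cut-off; windows with SQI < threshold are excluded."""
    if not 0 <= threshold <= 100:
        raise ValueError("SQI threshold must be an integer from 0 to 100")
    return [w for w in windows if w[1] >= threshold]


def sweep_thresholds(windows: List[Window]) -> Dict[int, int]:
    """Number of windows retained at every integer SQI cut-off from 0 to 100 inclusive."""
    return {t: len(reliable_windows(windows, t)) for t in range(101)}
```

At a cut-off of 0 every window is retained. At a cut-off of 100 only windows rated 100 survive. Repeating the whole analysis for each value returned by `range(101)` mirrors the "full spectrum" of cut-offs described above.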
The SQI algorithm computes variables related to the shape of the waveforms and, on the basis of whether these variables are within normative ranges and are similar to the prior beats' values, outputs a rating of the reliability of the ABP waveform. SQI is expressed as an integer between 0 (poorest quality data) and 100 (highest quality data) for consecutive nonoverlapping 10-sec windows of ABP. The waveform features that are considered by the SQI algorithm include, for each pulse, the systolic pressure, diastolic pressure, mean pressure, pulse pressure (the difference between the systolic blood pressure and the diastolic blood pressure in a beat), pulse-to-pulse interval, maximum and minimum slope during the up-stroke, duration of the up-stroke, duration of the crest of the beat, and average of all negative slopes (a metric of spiky, nonphysiologic noise in the waveform).
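The SQI algorithm itself is described in references 13 and 14 and is not reproduced here. The sketch below only illustrates how several of the per-beat features listed above could be extracted. It assumes that beat onsets have already been detected and that the sampling rate `fs` is known. The names `BeatFeatures` and `beat_features` are hypothetical, and the normative ranges and beat-to-beat comparisons that turn such features into an SQI are not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BeatFeatures:
    systolic: float             # maximum pressure within the beat (mm Hg)
    diastolic: float            # minimum pressure within the beat (mm Hg)
    mean: float                 # arithmetic mean pressure over the beat (mm Hg)
    pulse_pressure: float       # systolic minus diastolic (mm Hg)
    interval_s: float           # pulse-to-pulse interval (s)
    max_upslope: float          # steepest sample-to-sample rise (mm Hg/s)
    mean_negative_slope: float  # average of all negative slopes (mm Hg/s)


def beat_features(abp: Sequence[float], onsets: Sequence[int], fs: float) -> List[BeatFeatures]:
    """Features for each beat delimited by consecutive onset indices (illustrative only)."""
    feats: List[BeatFeatures] = []
    for start, end in zip(onsets[:-1], onsets[1:]):
        beat = list(abp[start:end])
        if len(beat) < 2:
            continue  # too short to compute slopes
        sbp, dbp = max(beat), min(beat)
        slopes = [(b - a) * fs for a, b in zip(beat[:-1], beat[1:])]
        negative = [s for s in slopes if s < 0]
        feats.append(BeatFeatures(
            systolic=sbp,
            diastolic=dbp,
            mean=sum(beat) / len(beat),
            pulse_pressure=sbp - dbp,
            interval_s=(end - start) / fs,
            max_upslope=max(slopes),
            mean_negative_slope=sum(negative) / len(negative) if negative else 0.0,
        ))
    return feats
```

Up-stroke and crest durations are omitted because they require a fiducial-point definition that the text does not give.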
After exclusion of unreliable ABP data (i.e., SQI < threshold), for a given time t in a patient's chronologic record, MAP-AUTO and SBP-AUTO were computed using the median value from the remaining reliable ABP waveform in the most recent 6 mins (a series of permutations on this primary methodology were computed; see the Sensitivity Analysis section below for details). As noted above, if at time t there were <10 secs of continuous reliable ABP waveform data within the past 6 mins, we registered MAP-AUTO and SBP-AUTO as unavailable.
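Under stated assumptions, MAP-AUTO at time t can be sketched as a median over the reliable windows that lie entirely within the preceding 6 mins. The sketch assumes that each window stores beat-wise mean pressures and that one fully reliable 10-sec window satisfies the "≥10 secs of continuous reliable data" requirement. The function name `map_auto` and its parameters are hypothetical. Passing systolic values instead would give SBP-AUTO. The `reducer` and `lookback_s` parameters allow the minimum-value and longer-window permutations of the sensitivity analysis to be expressed.

```python
import statistics
from typing import Callable, List, Optional, Tuple

WINDOW_S = 10.0  # SQI is assigned to consecutive, non-overlapping 10-sec windows


def map_auto(windows: List[Tuple[float, int, List[float]]],
             t: float,
             threshold: int,
             lookback_s: float = 360.0,
             reducer: Callable[[List[float]], float] = statistics.median) -> Optional[float]:
    """Summary of reliable pressures in the lookback ending at t, or None if unavailable.

    Each window is (start_time_s, sqi, beat-wise mean pressures in mm Hg).
    """
    samples: List[float] = []
    for start, sqi, values in windows:
        fully_inside = start >= t - lookback_s and start + WINDOW_S <= t
        if fully_inside and sqi >= threshold and values:
            samples.extend(values)
    # No complete reliable window in the lookback: register MAP-AUTO as unavailable.
    return reducer(samples) if samples else None
```

Usage: `map_auto(windows, t, 92)` gives the primary median-based value at a cut-off of SQI ≥92, and `map_auto(windows, t, 92, reducer=min)` gives the minimum-value variant.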
In this investigation, the automated archive was created retrospectively by processing the ABP waveform data using the investigational algorithm, but there is no technical reason why this could not function in real time (i.e., the processing is unsupervised by humans, and it is not computationally intensive). Therefore, for this study, we refer to the set of MAP-AUTO and SBP-AUTO data as an archive.
Our study outcome was “consensus hypotension” within the subsequent 4 hrs. Consensus hypotension was defined by a mean arterial pressure (MAP) of ≤70 mm Hg, documented at the same time by both MAP-RN and MAP-AUTO. By requiring the simultaneous agreement of MAP-RN and MAP-AUTO, we limited bias in favor of either documentation source. Each MAP measurement was evaluated as follows (also see the examples in Figures 1 and 2).
- 1) True positive: MAP ≤70 mm Hg; within the next 4 hrs, is followed by consensus hypotension.
- 2) False positive: MAP ≤70 mm Hg; within the next 4 hrs, is not followed by consensus hypotension.
- 3) True negative: MAP >70 mm Hg; within the next 4 hrs, is not followed by consensus hypotension.
- 4) False negative: MAP >70 mm Hg; within the next 4 hrs, is followed by consensus hypotension.
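The outcome definition and the four-way classification above can be sketched as follows. The data layout and the function names are hypothetical. The sketch assumes that "within the next 4 hrs" means the half-open interval (t, t + 4 hrs] and that unmatched MAP-RN values (MAP-AUTO unavailable) cannot contribute to consensus hypotension.

```python
from typing import List, Optional, Tuple

HYPO_MMHG = 70.0          # hypotension threshold for MAP
HORIZON_S = 4 * 3600.0    # look-ahead window of 4 hrs


def consensus_hypotension_times(pairs: List[Tuple[float, float, Optional[float]]]) -> List[float]:
    """Times at which MAP-RN and the matched MAP-AUTO were both <= 70 mm Hg.

    pairs: (time_s, map_rn, map_auto), with map_auto None when unavailable.
    """
    return [t for t, rn, auto in pairs
            if auto is not None and rn <= HYPO_MMHG and auto <= HYPO_MMHG]


def classify(t: float, map_value: float, consensus_times: List[float]) -> str:
    """Label one MAP measurement (RN or AUTO) as TP, FP, TN, or FN."""
    followed = any(t < tc <= t + HORIZON_S for tc in consensus_times)
    if map_value <= HYPO_MMHG:
        return "TP" if followed else "FP"
    return "FN" if followed else "TN"
```

The same `classify` call is applied separately to MAP-RN and MAP-AUTO at each time point, which yields the matched pairs used in the statistical comparison.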
The fact that the RN and the algorithm could disagree about blood pressure at a given point in time may be counterintuitive. Briefly, discrepancies occurred because the RN and the algorithm used different “filters” to exclude unreliable data and then to summarize blood pressure over an observation interval; consequently, there were many instances in which they disagreed about which blood pressure value to document. Using a fixed threshold for hypotension, i.e., MAP ≤70 mm Hg, there are four possible observation combinations from the two documentation sources. In most cases, MAP-RN and MAP-AUTO both indicate stability (MAP >70 mm Hg). The most likely interpretation of this combination is that the patient is in fact hemodynamically stable, but it is also possible that both documentation sources are errant. Table 1 details the possible clinical interpretations for each of the four paired measurement scenarios for MAP-AUTO and MAP-RN.
The key assumption in this study design is that, on average, a more valid MAP measurement will have a higher association with future hemodynamic states. The associations of MAP-AUTO and MAP-RN with future consensus hypotension were statistically compared using McNemar's test on contingency tables from matched pairs. We compared MAP-RN vs. MAP-AUTO first by using whatever SQI threshold gave equal sensitivities (but potentially unequal specificities) and then by using whatever SQI threshold gave equal specificities (but potentially unequal sensitivities).
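The text names McNemar's test but not its exact form. The sketch below uses the common chi-square version with continuity correction, which is an assumption. For 1 degree of freedom, the upper-tail probability is `erfc(sqrt(x / 2))`. The test uses only the discordant pairs. When sensitivities are compared at matched specificity, one plausible construction restricts the table to measurements followed by consensus hypotension. In that construction, `b` counts pairs in which only MAP-AUTO was positive and `c` counts pairs in which only MAP-RN was positive. The function name `mcnemar` is hypothetical.

```python
import math
from typing import Tuple


def mcnemar(b: int, c: int, continuity: bool = True) -> Tuple[float, float]:
    """McNemar chi-square statistic and two-sided p value from the discordant counts b and c."""
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

For large samples like those in this study, an exact binomial version would give practically the same conclusion.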
We reanalyzed our data using several permutations of the primary methodology, investigating a different definition of hypotension, different methods of algorithmically processing the data, and different definitions of hemodynamic instability.
We explored the following alternatives to our primary methodology (summarized in Table 2).
- 1) We examined systolic blood pressure instead of MAP. For systolic blood pressure, we changed our hypotension definition to be <90 mm Hg.
- 2) We computed MAP-AUTO as the minimum value from all reliable ABP waveform data from the most recent 6 mins (instead of using the median value).
- 3) We computed MAP-AUTO as the median value from all reliable ABP waveform data from the most recent 60 mins (instead of 6 mins).
- 4) We altered the outcome to consensus hypotension or an increase of at least 100% in vasopressor infusion rate (Levophed, vasopressin, Neosynephrine, dopamine, or epinephrine).
- 5) We altered the inclusion criteria: when determining whether there was 4 hrs of antecedent “consensus stability”, MAP-AUTO was computed using a lower (SQI ≥0) and a higher (SQI ≥90) threshold.
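The vasopressor component of the alternative outcome (permutation 4) can be sketched as a check for an increase of at least 100% in any vasopressor infusion rate within the next 4 hrs. This rests on several assumptions that the text does not settle. The baseline is taken as the rate running at time t. Rates are compared within a drug only. Starting a drug from zero is not counted, although another reasonable reading would count it. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def vasopressor_doubled(baseline: Dict[str, float],
                        future_rates: List[Tuple[float, str, float]],
                        t: float,
                        horizon_s: float = 4 * 3600.0) -> bool:
    """True if any vasopressor rate reaches >= 2x its rate at time t within (t, t + horizon].

    baseline: drug name -> infusion rate at time t (units are drug specific).
    future_rates: charted (time_s, drug name, rate) entries.
    """
    for ts, drug, rate in future_rates:
        if not (t < ts <= t + horizon_s):
            continue
        base = baseline.get(drug, 0.0)
        # Assumption: an infusion started from zero is not treated as a doubling.
        if base > 0 and rate >= 2 * base:
            return True
    return False
```

The alternative outcome is then consensus hypotension or `vasopressor_doubled(...)` within the same 4-hr horizon.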
Working from the beginning of each record in the 2,320 adult ICU visits with archived ABP waveform data, we found a total of 35,659 valid 8-hr episodes (episodes that begin with 4 hrs of consensus stability) from 757 unique ICU visits. Subject characteristics are summarized in Table 3.
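The paper does not specify how candidate episode start times were stepped through each record, so the following is only a sketch under loudly stated assumptions. Candidate starts advance in a hypothetical `step_s` (default 1 hr). "Consensus stability" is taken to mean at least one matched pair in the first 4 hrs, with every matched pair having both MAP-RN and MAP-AUTO >70 mm Hg. Unmatched MAP-RN values are skipped, consistent with their exclusion. An episode also needs 8 hrs of record.

```python
from typing import List, Optional, Tuple

HOUR = 3600.0
Pair = Tuple[float, float, Optional[float]]  # (time_s, map_rn, map_auto or None)


def consensus_stable(pairs: List[Pair], start: float, end: float,
                     threshold: float = 70.0) -> bool:
    """Assumed definition: >=1 matched pair in [start, end), all with both values > threshold."""
    matched = [(rn, auto) for t, rn, auto in pairs
               if start <= t < end and auto is not None]
    return bool(matched) and all(rn > threshold and auto > threshold
                                 for rn, auto in matched)


def find_episodes(pairs: List[Pair], record_end: float,
                  step_s: float = HOUR) -> List[float]:
    """Start times of 8-hr episodes whose first 4 hrs show consensus stability."""
    episodes: List[float] = []
    start = pairs[0][0] if pairs else record_end
    while start + 8 * HOUR <= record_end:
        if consensus_stable(pairs, start, start + 4 * HOUR):
            episodes.append(start)
        start += step_s
    return episodes
```

Whether the study allowed overlapping episodes is not stated. A larger `step_s`, such as 8 hrs, would produce non-overlapping episodes instead.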
The receiver operating characteristic curve for the association between any archived MAP value and subsequent hypotension is shown in Figure 3. For all blood pressure data intervals that were analyzed, the RNs documented a single blood pressure value, and their summary sensitivity and 1 − specificity are plotted. By contrast, the MAP-AUTO was adjustable depending on how the data reliability criteria were set: The SQI (from an automated algorithm) provided a rating of the reliability of the ABP waveform from 0 to 100. As the SQI cut-off approached 100, the reliability criteria grew more stringent; i.e., only the most pristine blood pressure measurements were archived. As a result, the specificity increased, the sensitivity decreased, and the positive predictive value for subsequent hypotension approached 80%. Conversely, as the SQI cut-off approached 0, the reliability criteria grew more relaxed until eventually all blood pressure measurements were used. As a result, the specificity decreased, the sensitivity increased, and the positive predictive value for subsequent hypotension eventually degraded to below 5%.
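Each MAP-AUTO point on the curve, and its PPV, follows directly from the contingency table obtained at one SQI cut-off. A minimal sketch, assuming the per-threshold counts `(tp, fp, tn, fn)` are already available (the function names are hypothetical):

```python
from typing import Dict, List, Tuple


def operating_point(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """(sensitivity, 1 - specificity, PPV) from one 2x2 contingency table."""
    nan = float("nan")
    sens = tp / (tp + fn) if tp + fn else nan
    fpr = fp / (fp + tn) if fp + tn else nan
    ppv = tp / (tp + fp) if tp + fp else nan
    return sens, fpr, ppv


def roc_sweep(tables: Dict[int, Tuple[int, int, int, int]]) -> List[Tuple[float, ...]]:
    """One (threshold, sensitivity, 1 - specificity, PPV) row per SQI cut-off, in ascending order."""
    return [(thr, *operating_point(*tables[thr])) for thr in sorted(tables)]
```

MAP-RN, which has no adjustable cut-off, contributes a single `operating_point` from its own table.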
In Table 4, data from five points in Figure 3 are tabulated as contingency tables. Specifically, we report data for the MAP-RN values, as well as for four illustrative points on the MAP-AUTO curve, corresponding to SQI = 100, SQI ≥92, SQI ≥18, and SQI ≥0, which are identified in Figure 3.
The MAP-AUTO and MAP-RN were statistically compared at two points, and the p values from McNemar's test are presented in Table 5 for two contingency tables: one resulting from the different sensitivities when the specificities are matched and one resulting from the different specificities when the sensitivities are matched (the points on Figure 3 where SQI ≥18 and SQI ≥92, respectively). The improvements in sensitivity and specificity obtained by using the MAP-AUTO vs. the MAP-RN were statistically significant (p < .0001).
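With integer cut-offs, exact equality of sensitivity or specificity is unlikely. A reasonable sketch, which is an assumption rather than the documented procedure, picks the SQI threshold whose value is closest to the MAP-RN value. The function name and the example numbers below are hypothetical and are not the study's results.

```python
from typing import List, Tuple


def matched_threshold(points: List[Tuple[int, float, float]], target: float, key: str) -> int:
    """SQI cut-off whose sensitivity (key='sens') or specificity (key='spec') is closest to target.

    points: (sqi_threshold, sensitivity, specificity) for the MAP-AUTO curve.
    """
    idx = {"sens": 1, "spec": 2}[key]
    return min(points, key=lambda p: abs(p[idx] - target))[0]
```

Usage with made-up values: for `points = [(0, 0.95, 0.40), (18, 0.80, 0.70), (92, 0.55, 0.90)]`, matching a specificity of 0.71 selects SQI ≥18, and matching a sensitivity of 0.56 selects SQI ≥92.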
For our sensitivity analysis, we evaluated a different definition of hypotension, different methods of algorithmically processing the data, and different definitions of hemodynamic stability. Table 6 lists the permutations that we explored (the top row recapitulates the findings from the primary analysis). We found our results insensitive to these permutations, and in all cases, MAP-AUTO/SBP-AUTO was significantly more sensitive (at a matched level of specificity) and more specific (at a matched level of sensitivity) than MAP-RN/SBP-RN. When the window length was increased from 6 to 30 mins, the difference between the two signals decreased, resulting in the highest p value of .00042. All other p values were consistently <.0001.
For illustrative purposes, we show one receiver operating characteristic curve from the sensitivity analysis. In this permutation, we compared the association between archived mean arterial pressure data (MAP-RN or MAP-AUTO) and hemodynamic instability, defined by either consensus hypotension or the doubling of the infusion rate of any vasopressor drug. Figure 4 shows the corresponding receiver operating characteristic curve and positive predictive value (PPV) for varying SQI thresholds. Compared with Figure 3, MAP-AUTO performance degrades slightly, while the MAP-RN performance point is similar. All the same, as in Table 5, the MAP-AUTO values are statistically superior to their matched MAP-RN counterparts.
In this analysis, RN documentation of blood pressure data in stable ICU patients does not improve the clinical validity of the ICU medical record, as compared with an automated archiving methodology. We found a small but highly significant advantage to the automated methodology, a finding that persisted throughout a set of sensitivity analyses, suggesting that this is not idiosyncratic to one method of analysis but is probably generalizable to a spectrum of different definitions of clinical validity and different methods of automated archiving. This has notable implications for present-day hospital operations, as well as for technological capabilities that might develop in the future.
In today's hospitals, substantial time and effort are spent on clinical documentation (17). If complete human attentiveness to clinical variables were possible, it would presumably be impossible to beat the clinical team at selecting representative data to aid a clinical evaluation. The findings here suggest that, for MAP and systolic blood pressure at least, clinician vital sign documentation offers no archival benefit over the described automated methodology, perhaps because it is impossible to maintain perfect focus given diverse work duties and the repetitiveness of some tasks. It is possible, then, that some of this documentation time and effort is not strictly necessary when an automated alternative is available (note that documentation may have other benefits, such as creating awareness in clinicians, which is addressed in the Limitations section). Furthermore, clinicians on rounds routinely review documented vital signs to assess the course of a disease process, the efficacy of a therapy, the development of a new pathology, etc. Our findings suggest that there may be a more valid alternative to reviewing an RN-documented archive of blood pressure data, and automatic archiving agents may also prove valid for other vital signs, such as respiratory rate (18), urine output, etc., although this is a matter of speculation. Using MAP-AUTO (or SBP-AUTO) offers one additional advantage over MAP-RN (SBP-RN): the SQI threshold can be adjusted to alter the operating characteristics (sensitivity and specificity) to best suit clinical needs. MAP-AUTO has a PPV similar to that of MAP-RN when the SQI threshold is set extremely low (e.g., SQI >5). At the same time, we found that the PPV of a highly reliable (e.g., SQI >90) MAP-AUTO measurement approaches 80%.
Operationally, it would be valuable to communicate this extra information to caregivers: in some circumstances there is a possibility of future hypotension (e.g., hypotension at a moderate SQI), whereas in others there is a high probability of future hypotension (e.g., hypotension at a high SQI). The motivation, of course, would be to give caregivers the information they need to respond appropriately, either mitigating or preventing the subsequent hypotension. MAP-RN, by contrast, is a fixed value, with a PPV of <50%. There is no easy way to modify the sensitivity or specificity of MAP-RN or to extract information beyond what was documented.
Our findings also inform technological capabilities that may be developed in the future. First, there is substantial interest in early warning scores, in which vital signs and other data are continually monitored and, when abnormal conditions are detected, a clinician response team is mobilized to respond to the incipient deterioration of a patient (19, 20). Such early warning score functionality could potentially be automated, and our findings offer preliminary evidence that human oversight, e.g., to ensure that spurious blood pressure data are excluded, may be neither necessary nor even desirable. Automatically archived blood pressure data, using an adjustable SQI, may be the best source of input data for such decision-support algorithms in the ICU. These results may also pertain to non-ICU hospital wards, where the benefits of rapid response teams (which are typically activated on the basis of abnormal vital sign data) have been quite inconsistent in published reports (20). We found that human-documented blood pressure data were inferior to the automated archive, and it is likely that some RNs are better than others at charting clinically valid blood pressure data. We speculate that one reason rapid response team programs have had varied success is inconsistent vital sign collection practices among different nursing staffs. An automated archive may offer a more valid, continual, and consistent method of data collection for early warning score applications.
Finally, these findings suggest another interesting hypothesis to be developed and tested in future work: the “secretarial” aspects of documentation, i.e., recording a tedious list of variables and findings for future review, may distract from the real-time benefit of documentation, namely, obliging caregivers to reexamine their patients on a regular basis. Ideally, future clinical processes would emphasize the continual reexamination of patients by the clinical staff (rather than secretarial tasks), employing novel clinical information systems to reduce the effort of data archiving and to automatically highlight interesting patterns or changes in the clinical variables, ensuring that such patterns are not accidentally overlooked. In terms of data display, our findings suggest it might be reasonable for clinicians on rounds to review automatically archived blood pressure records, with the associated reliability measures indicated, so that clinicians can assess for themselves which data are most meaningful. The current study is important because it suggests that there is real room for improvement in today's ICU documentation, justifying future work on optimizing computer–clinician interactions to yield the best patient care.
Overall, our results provide very preliminary evidence that clinician oversight is not strictly necessary for valid collection of physiologic data. Therefore, the creation of extremely large physiologic records, e.g., archiving continuous physiologic data for days at a time during an inpatient admission, need not be limited by a requirement for clinician oversight (although whether such a practice would offer any clinical benefit, and justify the substantial data storage requirement, remains a completely open question).
There are several important limitations to consider. Our findings relate only to blood pressure data for initially stable ICU patients within a single institution. The findings may not apply to other vital signs, to consistently unstable ICU patients, to non-ICU patients, or to ICU patients in other institutions. However, there is gathering evidence that automated methods for excluding unreliable vital sign measurements and maximizing the clinical validity of the data may prove as good as or better than clinicians (18, 21, 22).
Our findings do not apply to patients with ongoing hemodynamic instability; we only examined records with periods of antecedent blood pressure stability. For actively unstable patients, there may be reasons why clinician-documented data would be more valid, e.g., because RN attention is more focused and reliable or because the pathophysiologic condition is too complex for a simple computer algorithm. However, a corollary of this reasoning is that our results may be even more applicable outside the ICU, where the staff-to-patient ratio is lower and infrequent events such as hypotension are therefore less likely to be identified.
It is possible that our methodology is unfair to the RN: after documenting hypotension, the RN might intervene therapeutically and so avoid future hypotension. Then, even though the documentation of hypotension was valid, our methodology would treat this scenario as a false positive for MAP-RN because there was no future hypotension. As a result, we may be underestimating test characteristics, such as PPV, for MAP-RN. However, these occurrences are unlikely to alter our major findings, for two reasons:
- 1) Overall, there were substantial differences between MAP-RN and MAP-AUTO (e.g., PPV as plotted in Figure 3).
- 2) In the sensitivity analysis, when we modified the outcome definition to include therapeutic interventions for hypotension (consensus hypotension or an increase in vasopressor infusion rate), there was only marginal improvement in the MAP-RN PPV (Figure 4), while MAP-AUTO remained significantly superior in terms of sensitivity and specificity, with p values still <.0001 (Table 6).
Furthermore, note that such occurrences could never inflate the PPV of MAP-AUTO, nor would they alter the finding that adjusting the SQI cut-off yields a wide range of PPVs for MAP-AUTO.
As a final study limitation, we note that vital sign documentation is not merely for archival purposes. The process of reviewing and documenting vital signs may alert the clinician to a troublesome condition or to malfunctioning equipment. Even if clinicians do not need to formally document blood pressure data, an alternative, perhaps more time-efficient mechanism, such as an interactive graphical user interface, would likely be necessary to ensure that the clinician is well aware of the patient's current blood pressure. Fully automated care of ICU patients may someday be possible (perhaps using some of the automated techniques employed in this study), but our findings are limited to the archival value of MAP-AUTO vs. MAP-RN data. Real clinicians do not have the luxury of looking back many minutes for the last reliable blood pressure measurement when making real-time management decisions in critically ill patients.
In an initially stable ICU patient population, clinician-documented blood pressure values were inferior to an automated archiving agent with signal quality filtering as early indicators of hemodynamic instability. These findings suggest that human oversight may not be necessary for creating a valid archive of vital sign data within an electronic medical record. Furthermore, if clinician documentation is an unreliable early indicator of hemodynamic instability, then an automated archive may be a preferable source of data for early warning systems that identify patients at risk of decompensation.
1. Kaiser W, Findeis M: Artifact processing during exercise testing. J Electrocardiol
2. Tsien CL, Fackler JC: Poor prognosis for existing monitors in the intensive care unit. Crit Care Med
3. Edmonds ZV, Mower WR, Lovato LM, et al: The reliability of vital sign measurements. Ann Emerg Med
4. Lovett PB, Buchwald JM, Stürmann K, et al: The vexatious vital: Neither clinical measurements by nurses nor an electronic monitor provides accurate measurements of respiratory rate in triage. Ann Emerg Med
5. Jones DW, Appel LJ, Sheps SG, et al: Measuring blood pressure accurately: new and persistent challenges. JAMA
6. Friesdorf W, Konichezky S, Gross-Alltag F, et al: Data quality of bedside monitoring in an intensive care unit. Int J Clin Monit Comput
7. Kacmarek RM: Alarms. In: Principles and Practice of Intensive Care Monitoring. Tobin MJ (Ed). New York, McGraw-Hill, 1998, pp 133–140
8. Goldman JM, Schrenker RA, Jackson JL, et al: Plug-and-play in the operating room of the future. Biomed Instrum Technol
9. Amoore JN: A simulation study of the consistency of oscillometric blood pressure measurements with and without artefacts. Blood Press Monit
10. Portet F, Hernández AI, Carrault G: Evaluation of real-time QRS detection algorithms in variable contexts. Med Biol Eng Comput
11. Li Q, Mark RG, Clifford GD: Robust heart rate estimation from multiple asynchronous noisy sources using signal quality indices and a Kalman filter. Physiol Meas
12. Li Q, Mark RG, Clifford GD: Artificial arterial blood pressure artifact models and an evaluation of a robust blood pressure and heart rate estimator. Biomed Eng Online
13. Sun J, Reisner AT, Mark RG: A signal abnormality index for arterial blood pressure waveforms. Comput Cardiol
14. Zong W, Moody GB, Mark RG: Reduction of false arterial blood pressure alarms using signal quality assessment and relationships between the electrocardiogram and arterial blood pressure. Med Biol Eng Comput
15. Hug C, Clifford GD: An analysis of the errors in recorded heart rate and blood pressure in the ICU using a complex set of signal quality metrics. Comput Cardiol
16. Clifford GD, Scott DJ, Villarroel M: User Guide and Documentation for the MIMIC-II Database. MIMIC-II Database Version 2, Release 1. Cambridge, MA, Massachusetts Institute of Technology, 2009
17. Poissant L, Pereira J, Tamblyn R, et al: The impact of electronic health records on time efficiency of physicians and nurses: A systematic review. J Am Med Inform Assoc
18. Chen L, Reisner AT, Gribok A, et al: Can we improve the clinical utility of respiratory rate as a monitored vital sign? Shock
19. McGaughey J, Alderdice F, Fowler R, et al: Outreach and early warning systems (EWS) for the prevention of intensive care admission and death of critically ill adult patients on general hospital wards. Cochrane Database Syst Rev
20. Winters BD, Pham JC, Hunt EA, et al: Rapid response systems: A systematic review. Crit Care Med
21. Aboukhalil A, Nielsen L, Saeed M, et al: Reducing false alarm rates for critical arrhythmias using the arterial blood pressure waveform. J Biomed Inform
22. Reisner AT, Chen L, McKenna TM, et al: Automatically-computed prehospital severity scores are equivalent to scores based on medic documentation. J Trauma
Keywords: hypotension; intensive care; physiologic monitoring; electronic medical record; digital signal processing; automatic data processing
© 2011 by the Society of Critical Care Medicine and Lippincott Williams & Wilkins