Assessing Point-of-Care Hemoglobin Measurement: Be Careful We Don't Bias with Bias

Morey, Timothy E. MD; Gravenstein, Nikolaus MD; Rice, Mark J. MD

doi: 10.1213/ANE.0b013e31822906b2
Editorials: Editorials

From the Department of Anesthesiology, University of Florida College of Medicine, Gainesville, Florida.

Supported by institutional funds only.

The authors declare no conflicts of interest.

Reprints will not be available from the authors.

Address correspondence to Timothy E. Morey, MD, Department of Anesthesiology, University of Florida College of Medicine, PO Box 100254, Gainesville, FL 32610-0254. Address e-mail to

Accepted June 6, 2011

A growing number of investigators are publishing research reports detailing the performance of point-of-care total hemoglobin determination in comparison with a time-proven “gold-standard” method such as calibrated, laboratory CO-Oximetry instruments that can be found in most hospital “stat” laboratories.1–5 In this issue, Berkow et al.1 publish findings from their well-conducted study in which a Pulse CO-Oximetry system was used to measure hemoglobin concentrations and compared with laboratory CO-Oximetry in patients undergoing major spinal surgery with high risk for blood loss. From these data and following Bland-Altman analysis, Berkow et al.1 conclude, “… Pulse CO-Oximetry demonstrated clinically acceptable accuracy of hemoglobin measurement.…” In contrast, our own view is that the analytical methods and selection of patient population (and their interactions) forestall such conclusions. More specifically, we propose that the Bland-Altman analysis and the hemoglobin concentration range studied make data interpretation problematic, especially if one extends the conclusions to a patient population not (yet) studied.

A significant finding made by Berkow et al.1 regarding the accuracy of the Pulse CO-Oximetry instrument was that a minimal bias (−0.1 g/dL) was calculated from the nonexcluded data, contributing to the conclusion that the device had “… clinically acceptable accuracy.…” Although high bias correctly implies inaccuracy, low bias means little in analyzing this type of instrument. To illustrate this point, we offer the following examples: Figure 1A is a Bland-Altman plot of data with high accuracy and low bias. The values for the difference between the reference and the tested instrument in this figure approximate 0. In Figure 1B, the data are obviously not accurate because there are large differences between the reference and tested instruments. However, because the spread above and below 0 is approximately the same, the bias is close to 0, just as in Figure 1A. Figure 1C shows a Bland-Altman plot that has both inaccurate data and high bias. The data points are clustered to 1 side of the 0 line. Instances represented by Figure 1C occur frequently in the medical literature. For example, this type of error is manifest in some point-of-care glucose meters in the presence of anemia.6 Therefore, minimal bias as reported by Berkow et al.1 does not necessarily imply an accurate instrument. It might only mean that the differences are distributed approximately equally above and below 0. Although the devices for Figure 1, A and B, have a bias close to 0, they are not equally accurate. Indeed, the dispersion of the points around the 0 line (i.e., the limits of agreement) has an enormous bearing on the interpretation of “accuracy.”
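
The contrast between Figure 1, A and B, can be sketched numerically. The following is a minimal illustration only: the hemoglobin values are invented (they are not data from Berkow et al.), the function name `bland_altman` is hypothetical, and the difference is assumed to be taken as device minus reference.

```python
from statistics import mean, stdev

def bland_altman(reference, device):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumption: difference = device - reference, and the limits of
    agreement are bias +/- 1.96 * SD of the differences.
    """
    diffs = [d - r for r, d in zip(reference, device)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width

# Hypothetical reference hemoglobin values (g/dL), not study data.
reference = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0]

# Device A: small errors scattered evenly around 0 (as in Figure 1A).
device_a = [8.1, 8.9, 10.1, 10.9, 12.1, 12.9]
# Device B: large errors scattered evenly around 0 (as in Figure 1B).
device_b = [10.0, 7.0, 12.0, 9.0, 14.0, 11.0]

bias_a, lo_a, hi_a = bland_altman(reference, device_a)
bias_b, lo_b, hi_b = bland_altman(reference, device_b)

print(f"Device A: bias {bias_a:+.2f}, limits {lo_a:+.2f} to {hi_a:+.2f} g/dL")
print(f"Device B: bias {bias_b:+.2f}, limits {lo_b:+.2f} to {hi_b:+.2f} g/dL")
```

Both hypothetical devices have a bias of essentially 0, yet the limits of agreement for device B are many times wider than those for device A. The bias alone cannot distinguish them.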

Figure 1

In addition, the limits of agreement drive consideration of whether the new device may substitute for the reference instrument and may be even more important than bias. From the inventors of the Bland-Altman technique: “How far apart measurements can be without leading to problems … is a question of clinical judgement [sic].”7 The problem with many device studies is that the acceptable limits of agreement are often not decided beforehand. This leaves the study vulnerable to post hoc, and possibly erroneous, judgments about what the limits of agreement mean in a clinical context. In contrast, most treatment studies state a priori not only the primary end point under consideration, but also the magnitude of the difference that is clinically significant, so that the subject sample size required to meet type I and II error parameters can be calculated. Bland and Altman note that “the decision about what is acceptable agreement is a clinical one; statistics alone cannot answer the question.…”7 Even Berkow et al.'s properly calculated and presented bias and limits of agreement alone do not and cannot provide a decision point without clinical context.

Beyond these considerations, nonstatistical factors such as selection of the subject population may interfere with interpretation. For most patients, the critical range wherein the measured hemoglobin becomes essential is 6 to 10 g/dL.8 Patients with hemoglobin concentrations exceeding 10 g/dL will rarely receive red cell transfusion, whereas severely anemic patients (<6 g/dL) likely will. When studying hemoglobin measurement devices, investigators often record a large proportion of their observations at concentrations >10 g/dL, values that are of minimal interest to anesthesiologists and that interfere with effective data interpretation. For example, Berkow et al.1 recorded only approximately 16 of 130 nonexcluded data pairs (12%) with hemoglobin concentrations <10 g/dL. Similarly, Macknet et al.2 acquired only 79 of 335 data pairs (24%) at hemoglobin concentrations <10 g/dL, whereas Miller et al.3 obtained a larger fraction of points (approximately 38 of 78 pairs; about 49%) at hemoglobin concentrations <10 g/dL. The discerning reader will naturally wonder how such an overwhelming preponderance of points for hemoglobin >10 g/dL might affect data analysis and interpretation. Inclusion of a super-majority of hemoglobin values >10 g/dL biases the Bland-Altman bias. This happens because the Bland-Altman bias is simply the average of the differences between the new and reference methods. More importantly, the standard deviation of these differences (multiplied by 1.96) determines the limits of agreement. If 90% of the data originate at hemoglobin concentrations >10 g/dL, then the average and standard deviation are 90% weighted toward these values. The device could perform well in the upper range, but less accurately in the clinically more important 6 to 10 g/dL range. Even a flawlessly performed statistical analysis will obscure this fact because the calculation of bias and limits of agreement will be driven by the observations in the hemoglobin range of minimal clinical interest, i.e., >10 g/dL. Indeed, this phenomenon has been reported for the i-STAT® device by Steinfelder-Visscher et al.,9 who noted a significant negative bias of 2.2% when the hematocrit was <25%, but hardly any bias for observations >25%. They opined, “… the discrepancy in hematocrit bias shows that accuracy established in one patient population cannot be automatically extrapolated to other patient populations, thus stressing the need for separate evaluation.”9 We wholeheartedly agree.
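
The weighting effect described above can be sketched with a small stratified calculation. The data are invented for illustration (18 pairs above 10 g/dL and 2 pairs in the 6 to 10 g/dL range, roughly a 90:10 split), the helper `bias` is a hypothetical name, and the difference is assumed to be device minus reference.

```python
from statistics import mean

def bias(pairs):
    """Mean of (device - reference) over (reference, device) pairs."""
    return mean(device - ref for ref, device in pairs)

# Hypothetical data in g/dL, invented for illustration only.
# 18 pairs above 10 g/dL: the device reads slightly low (-0.3 or -0.1).
high = [
    (10.5 + 0.25 * i, 10.5 + 0.25 * i + (-0.3 if i % 2 == 0 else -0.1))
    for i in range(18)
]
# 2 pairs in the 6 to 10 g/dL range: the device reads markedly high.
low = [(7.5, 8.9), (9.0, 10.6)]

pairs = high + low
pooled_bias = bias(pairs)
low_bias = bias([p for p in pairs if 6.0 <= p[0] <= 10.0])
high_bias = bias([p for p in pairs if p[0] > 10.0])

print(f"Pooled bias:        {pooled_bias:+.2f} g/dL")
print(f"Bias, 6 to 10 g/dL: {low_bias:+.2f} g/dL")
print(f"Bias, >10 g/dL:     {high_bias:+.2f} g/dL")
```

In this invented example the pooled bias is about −0.03 g/dL, which looks reassuring, whereas the bias within the 6 to 10 g/dL stratum is about +1.5 g/dL and in the opposite direction. Reporting bias and limits of agreement separately for the clinically relevant range would expose such a discrepancy, provided enough observations fall in that range.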

Additionally, this problem is compounded in the presence of proportional bias, an interaction between the analysis method and hemoglobin range selection. Bias can be fixed or proportional. Proportional bias occurs when the difference between the methods varies as a function of the magnitude of the measurement (Fig. 1D), whereas fixed bias is independent of the magnitude (Fig. 1, A–C). Bland-Altman analysis is designed to measure fixed, not proportional, bias.10 Reviewing Figure 2 of Berkow et al.,1 if one were to regress the points, it appears that the slope of the linear regression line may be negative, although the authors did not test whether the slope of the line was significantly different from 0, the ideal slope and an essential assumption for Bland-Altman calculations of bias and limits of agreement.7 Therefore, a magnitude or proportional effect may be evident in these data.
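
One way to check the zero-slope assumption is to regress the differences on the averages of each pair and compare the slope with its standard error. The sketch below uses ordinary least squares written out by hand; the data are invented, the function name `slope_of_differences` is hypothetical, and the difference is assumed to be device minus reference. It is not the analysis performed by Berkow et al.

```python
from math import sqrt
from statistics import mean

def slope_of_differences(reference, device):
    """OLS slope of (device - reference) regressed on the pair averages.

    Returns (slope, standard error of the slope).
    """
    x = [(r + d) / 2 for r, d in zip(reference, device)]
    y = [d - r for r, d in zip(reference, device)]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    # Residual variance uses n - 2 degrees of freedom.
    std_err = sqrt(residual_ss / (len(x) - 2) / sxx)
    return slope, std_err

# Hypothetical data (g/dL): the device reads high at low hemoglobin
# and low at high hemoglobin, i.e., a proportional effect.
reference = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
device = [8.1, 8.7, 9.6, 10.1, 10.9, 11.6, 12.5, 13.1]

slope, std_err = slope_of_differences(reference, device)
t_stat = slope / std_err  # compare with a t distribution, n - 2 df

print(f"slope {slope:.3f}, SE {std_err:.3f}, t {t_stat:.1f}")
```

A slope whose t statistic is far from 0 (judged against a t distribution with n − 2 degrees of freedom) suggests that the difference depends on the hemoglobin level, in which case a single pooled bias and a single pair of limits of agreement may mislead.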

What does it all mean to an anesthesiologist? Briefly, it means that the bias for patients with hemoglobin 6 to 10 g/dL may be greater (and perhaps in the opposite direction) than that calculated for all patients and that the studied Pulse CO-Oximetry system may (although not certainly) actually overestimate the true hemoglobin in this range. In contrast, the calculated bias for all patients suggests that the Pulse CO-Oximetry system underestimates the true hemoglobin slightly. Because so few patients with hemoglobin 6 to 10 g/dL were studied, one cannot know the degree of overestimation. Can the reader envision integrating this variable fixed and proportional bias into clinical care? Moreover, as Ludbrook10 cogently writes, in the setting of coexisting fixed and proportional bias, one cannot know what fraction of the bias is fixed or proportional using the Bland-Altman technique, although least products regression analysis does discriminate between these 2 types of bias. For this reason, the limits of agreement for Bland-Altman analysis are “… reliable only when there is not proportional bias.”10 What does this mean for our specialty? In addition to not knowing the bias, one cannot know the true limits of agreement. Is an individual patient's Pulse CO-Oximetry hemoglobin concentration within approximately ±2 g/dL of the true value as reported by Berkow et al.,1 or ±4 g/dL of the true value? For an anemic patient with a Pulse CO-Oximetry hemoglobin concentration of approximately 7.9 g/dL, would one be comfortable transfusing this patient given these large limits of agreement? In fact, the concurrently measured true hemoglobin was approximately 9.4 g/dL from Figure 3 of Berkow et al. (time 14:00 to 14:30).1 Similarly, Gayat et al.4 concluded for emergency department patients that Pulse CO-Oximetry would have led to incorrect transfusion decisions in 13% of all cases. These issues could be better addressed by studying patients who are in the physiological state in which anesthesiologists contemplate red cell transfusion, not in the conditions that usually exist in the preoperative holding area or in the operating room at the commencement of anesthesia before significant hemorrhage. Such data would provide more reliable guidance regarding the performance characteristics of point-of-care measurements.

As suggested by Berkow et al.,1 the fundamental advantage of point-of-care monitoring for hemoglobin is the seconds-to-minutes results for Pulse CO-Oximetry, electroconductivity, or photometry methods as compared with longer times for traditional laboratory instruments. This rapidity, however, should be balanced against the accuracy of the data supplied by the devices. Who cares how fast one receives information, if it is inaccurate? We need better methods to analyze the accuracy of new devices and how their results may affect our clinical decision making. Taking lessons from other specialties, we have previously suggested a modified error grid analysis and Cohen κ statistic to supplement Bland-Altman analysis.11 Similarly, Miller et al.3 and Gayat et al.4 have independently advocated for zoned error analysis. In addition, we propose that hemoglobin values entered into these statistical tests be focused on the 6 to 10 g/dL range of interest to more appropriately answer the primary reason for the existence of these devices for an anesthesiologist: “Should I give my patient blood, or not?”
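
As a rough sketch of how a Cohen κ statistic might supplement Bland-Altman analysis, the example below measures whether a device and the reference method lead to the same yes/no decision. Everything here is an assumption made for illustration: the data are invented, the function name `cohen_kappa` is hypothetical, and the 8.0 g/dL trigger is an arbitrary placeholder within the 6 to 10 g/dL range, not a transfusion recommendation and not the modified error grid described in reference 11.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)

# Arbitrary placeholder trigger (g/dL), for illustration only.
TRIGGER = 8.0

# Hypothetical paired values (g/dL) within the 6 to 10 g/dL range.
reference = [6.5, 7.2, 7.8, 8.4, 9.1, 9.6, 7.0, 8.9]
device = [7.4, 7.9, 8.6, 8.1, 9.9, 9.0, 7.5, 8.2]

# True means "below the trigger" under each method.
ref_decision = [hb < TRIGGER for hb in reference]
dev_decision = [hb < TRIGGER for hb in device]

kappa = cohen_kappa(ref_decision, dev_decision)
print(f"Cohen kappa for decision agreement: {kappa:.2f}")
```

In this invented data set the two methods agree on 7 of 8 decisions, and κ is 0.75 after correcting for the agreement expected by chance. Unlike bias, this statistic speaks directly to whether the device would change what the clinician does.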

Name: Timothy E. Morey, MD.

Contribution: This author helped analyze the data and write the manuscript.

Attestation: Timothy E. Morey approved the final manuscript.

Name: Nikolaus Gravenstein, MD.

Contribution: This author helped analyze the data and write the manuscript.

Attestation: Nikolaus Gravenstein approved the final manuscript.

Name: Mark J. Rice, MD.

Contribution: This author helped analyze the data and write the manuscript.

Attestation: Mark J. Rice approved the final manuscript.

This manuscript was handled by: Dwayne R. Westenskow, PhD.

1. Berkow L, Rotolo S, Mirski E. Continuous noninvasive hemoglobin monitoring during complex spine surgery. Anesth Analg 2011;113:1396–402
2. Macknet MR, Allard M, Applegate RL, Rook J. The accuracy of noninvasive and continuous total hemoglobin measurement by Pulse CO-Oximetry in human subjects undergoing hemodilution. Anesth Analg 2010;111:1424–6
3. Miller RD, Ward TA, Shiboski SC, Cohen NH. A comparison of three methods of hemoglobin monitoring in patients undergoing spine surgery. Anesth Analg 2011;112:858–63
4. Gayat E, Bodin A, Sportiello C, Boisson M, Dreyfus JF, Mathieu E, Fischler M. Performance evaluation of a noninvasive hemoglobin monitoring device. Ann Emerg Med 2011;57:330–3
5. Causey MW, Miller S, Foster A, Beekley A, Zenger D, Martin M. Validation of noninvasive hemoglobin measurements using the Masimo Radical-7 SpHb Station. Am J Surg 2011;201:590–6
6. Rice MJ, Pitkin AD, Coursin DB. Review article: glucose measurement in the operating room: more complicated than it seems. Anesth Analg 2010;110:1056–65
7. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–60
8. American Society of Anesthesiologists Task Force on Perioperative Blood Transfusion and Adjuvant Therapies. Practice guidelines for perioperative blood transfusion and adjuvant therapies: an updated report by the American Society of Anesthesiologists Task Force on Perioperative Blood Transfusion and Adjuvant Therapies. Anesthesiology 2006;105:198–208
9. Steinfelder-Visscher J, Teerenstra S, Gunnewiek JM, Weerwind PW. Evaluation of the i-STAT point-of-care analyzer in critically ill adult patients. J Extra Corpor Technol 2008;40:57–60
10. Ludbrook J. Statistical techniques for comparing measurers and methods of measurement: a critical review. Clin Exp Pharmacol Physiol 2002;29:527–36
11. Morey TE, Gravenstein N, Rice MJ. Let's think clinically instead of mathematically about device accuracy. Anesth Analg 2011;113:89–91
© 2011 International Anesthesia Research Society