The evolution of method comparison studies has occurred through a series of fits and starts, the most notable of which occurred in the early 1980s as the shortcomings of the traditional, linear regression approach to method comparisons were described and a complementary, “agreement”-based methodology was proposed.1–3 However, the evolution did not stop there. As clinicians began to deemphasize absolute values and focus on trend monitoring (cynics argue that the latter is easier, advocates suggest it is more useful), additional techniques were developed, the 2 most notable of which are the 4-quadrant and polar plotting techniques developed by Perrino et al.4 and Critchley et al.5
In this issue of Anesthesia & Analgesia, Saugel et al.6 describe, step by step, how these plots are produced and, most importantly, how they differ. By “transforming” trending data from Cartesian to polar coordinate systems, Saugel et al.6 demonstrate, both visually and mathematically, how the selection of a particular analytical technique can impact the results. The fact that highly discordant measurements in which the average change is 0 are excluded from the quantitative estimate of agreement when using the polar technique, but not the 4-quadrant technique, is essential to proper interpretation of these tests.
Why does the polar plotting technique do this? Because the polar plotting technique makes no assumptions about which method is better. Ironically, most published studies that use the polar plotting technique compare a new method of measurement with an accepted reference standard, and in these instances, an agnostic approach may not be appropriate.7–9 This question of “what is truth” may sound philosophical on the surface but in reality has significant mathematical implications. For instance, if changes in cardiac output (ΔCO) are measured by a pulmonary artery catheter and a magic 8 ball, and the pulmonary artery catheter estimates that ΔCO is −2 L/min but the magic 8 ball estimates that ΔCO is 2 L/min, the polar analysis will exclude that data point because the “true” ΔCO is 0 L/min. This is, in essence, information loss. The 4-quadrant technique, by contrast, will place this point in quadrant 2, which will negatively affect the estimate of concordance.
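The contrast above can be made concrete with a short numerical sketch (our own illustration, not from the cited papers; the exact exclusion-zone threshold is study specific and is assumed here, as are all variable names):

```python
# Sketch: how the same highly discordant pair of cardiac-output changes
# is handled by the 4-quadrant and polar approaches.
import math

d_ref, d_test = -2.0, 2.0   # DeltaCO by reference (PAC) and test device, L/min

# 4-quadrant view: the point (d_ref, d_test) falls in quadrant 2
# (reference decreased, test increased), so it is discordant and
# lowers the concordance rate.
concordant = (d_ref * d_test) > 0

# Polar view: the radius is the mean change, and the angle is measured
# as the offset from the line of identity (y = x).  A mean change of 0
# gives radius 0, so the point falls inside any exclusion zone and is
# dropped from the quantitative analysis.
radius = (d_ref + d_test) / 2.0
angle = math.degrees(math.atan2(d_test, d_ref)) - 45.0  # offset from y = x

print("4-quadrant concordant:", concordant)  # False -> counted against concordance
print("polar radius:", radius)               # 0.0 -> excluded by the polar method
```

The point is exactly the information loss described above: the 4-quadrant analysis penalizes this maximally discordant pair, while the polar analysis silently discards it.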
Interestingly, the Bland-Altman technique, as it was originally described, takes the same approach to the relative value of data.1–3 In some sense, this is appropriate. As Altman pointed out, clinicians are not particularly interested in the probability that the slope of a best fit line between 2 outputs is not 0 “when the two variables are obviously associated by their very nature… What we really want to know in these studies is how well the two measures agree.”1 However, this assumes that 2 variables are actually related. A key element to the agreement strategy, which is often overlooked, is that “good” agreement does not necessarily imply any correlation if the range of data tested is small. From the viewpoint of the clinician, performance of an agreement analysis makes the a priori assumption that a correlation actually exists.
Take, for instance, 2 monitors, X and Y, which estimate cardiac output (Θ). If the slope of the best fit line relating X and Y is m, estimates of Θ can be described as

X = Θ + ε_X and Y = mΘ + ε_Y,

where Θ represents the true value of cardiac output and ε represents normally distributed measurement error. Now let us assume that m = 0, that is, that there is absolutely no statistical correlation between the 2 devices (i.e., they are completely independent of one another). If X and Y are “tested” over a narrow range of values (i.e., the distribution of Θ is small [Fig. 1, upper left quadrant]), it is possible for a random number generator to produce limits of agreement approaching what would be deemed clinically “acceptable” by Critchley and Critchley,10 who in 1999 stated that “Bias and precision statistics has now replaced correlation and regression as the accepted statistical technique of comparing two techniques measuring the same physiological variable, such as cardiac output.”
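This thought experiment is easy to simulate (a sketch with assumed parameters; the specific means and error standard deviations are ours, chosen only to make the narrow-range point):

```python
# Two monitors whose readings are completely independent (m = 0) are
# "compared" over a narrow range of true cardiac output.  The limits of
# agreement can still look numerically respectable even though the
# devices share no information at all.
import random
import statistics

random.seed(1)
n = 100
theta = [random.gauss(5.0, 0.1) for _ in range(n)]    # narrow range of true CO (L/min)
x = [t + random.gauss(0.0, 0.3) for t in theta]       # monitor X = theta + error
y = [5.0 + random.gauss(0.0, 0.3) for _ in theta]     # monitor Y ignores theta (m = 0)

diffs = [xi - yi for xi, yi in zip(x, y)]
bias = statistics.mean(diffs)
sd = statistics.stdev(diffs)
print(f"bias {bias:.2f} L/min; "
      f"limits of agreement {bias - 1.96*sd:.2f} to {bias + 1.96*sd:.2f} L/min")
# The limits are narrow only because the tested range of theta is narrow,
# not because the monitors agree in any meaningful sense.
```

A scatterplot of x against y would immediately reveal the absence of any relationship, which is precisely why the plotting step recommended below should not be skipped.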
Clearly, many in the anesthesia community agree, because some investigators have stopped publishing correlation coefficients and even basic scatterplots altogether, relying exclusively on limits of agreement and/or trending analysis to compare methods of measurement,7,8,11–15 yet this was not the intent of Bland or Altman. Indeed, in their 1983 manuscript, they stated “The first step, one which should be mandatory, is to plot the data… For the purposes of comparing the methods the line of identity is much more informative, and is essential to get a correct visual assessment of the relationship.”2 Three years later, in their oft-cited Lancet paper, they reemphasized that “The first step is to examine the data… A simple plot of the results of one method against those of the other though without a regression line is a useful start.”3
Thus, when conducting method comparison studies, we suggest that the following steps be taken:
- The authors explicitly state whether or not 1 device is being used as a reference standard.
- The authors state, a priori, the clinically acceptable range of values for each statistical test used. For instance, the acceptable limits of agreement for blood pressure monitoring are likely to be narrower during cerebral aneurysm surgery than during placement of ear tubes.
- Data are initially presented in a scatterplot to allow visual assessment of the relationship, as suggested by Bland and Altman.2,3 This will reduce the probability that the limits of agreement between measures that have no meaningful relationship are deemed clinically acceptable. Preiss and Fisher16 described a random permutation technique that estimates the probability that a paired data set arose from unassociated measurements, which some investigators may want to consider using.
- The limits of agreement are calculated and plotted, as described by Bland and Altman, with allowances made for repeated measures when necessary.2,3,17,18 For repeated measures, Bland and Altman base the estimate of the limits of agreement on both the within-subject variances and the between-subject variance of the differences.17 Myles and Cui18 recommend calculating the mean of repeated measures and using a random effects model to account for the reduced variation that occurs with averaging. When the residuals are not normally distributed, a transformation may be appropriate. For instance, some data may be distributed log-normally. In these instances, Dexter et al.19 have suggested both a nonparametric approach (based on ranking the observations and selecting cutoff values that remain within the desired percentiles [taking into account sample size and degrees of freedom]) and a parametric approach (based on the Student t distribution) for the calculation of prediction limits, either of which is acceptable. Last, confidence intervals around these limits should be calculated and displayed graphically, ensuring that the limits are interpreted properly (i.e., that the sample size is sufficient to draw meaningful conclusions from the data).3,17
- When trending data are analyzed, the analysis should be performed using either the 4-quadrant technique (when an accepted reference standard is being used) or the polar plotting technique (when there is no accepted reference standard).4,5 If the polar plotting technique is used, particular attention must be paid to the exclusion zone, because the polar plotting technique may exclude important data points that are highly discordant.6
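The core single-measurement computation behind the limits-of-agreement step above can be sketched as follows (a minimal illustration of our own; the repeated-measures and nonparametric variants require the cited methods, and the example data are invented):

```python
# Bland-Altman analysis for single paired measurements: bias, 95% limits
# of agreement, and approximate 95% confidence intervals around the limits.
import math
import statistics

def bland_altman(a, b):
    diffs = [ai - bi for ai, bi in zip(a, b)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Approximate standard error of each limit (Bland and Altman, 1986):
    # sqrt(3 * s^2 / n)
    se_loa = math.sqrt(3 * sd**2 / n)
    ci = [(lim - 1.96 * se_loa, lim + 1.96 * se_loa) for lim in loa]
    return bias, loa, ci

# Invented paired cardiac-output readings (L/min) for illustration only.
a = [4.8, 5.1, 5.5, 4.9, 5.2, 5.0, 5.3, 4.7]
b = [5.0, 5.2, 5.3, 5.1, 5.0, 4.9, 5.4, 4.8]
bias, loa, ci = bland_altman(a, b)
print(f"bias {bias:.2f}; LoA {loa[0]:.2f} to {loa[1]:.2f}")
```

Plotting the difference against the mean of each pair, with the limits and their confidence intervals drawn as horizontal bands, completes the presentation recommended above.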
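The scatterplot step can be supplemented with a resampling check in the spirit of the random permutation idea of Preiss and Fisher16; the sketch below is our own simplification, not their exact procedure, and the data are invented:

```python
# Estimate how often randomly re-paired (i.e., deliberately unassociated)
# data would yield limits of agreement at least as narrow as those observed.
# A small proportion suggests the observed agreement is not a narrow-range
# artifact of the kind illustrated earlier.
import random
import statistics

def loa_width(a, b):
    d = [ai - bi for ai, bi in zip(a, b)]
    return 2 * 1.96 * statistics.stdev(d)

def permutation_fraction(a, b, n_perm=2000, seed=0):
    rng = random.Random(seed)
    observed = loa_width(a, b)
    hits = 0
    for _ in range(n_perm):
        shuffled = b[:]
        rng.shuffle(shuffled)   # break any true pairing between the methods
        if loa_width(a, shuffled) <= observed:
            hits += 1
    return hits / n_perm        # small fraction -> agreement beyond chance

# Invented, deliberately well-correlated paired readings (L/min).
a = [3.1, 4.2, 5.0, 5.8, 6.9, 4.5, 5.5, 6.2]
b = [3.3, 4.0, 5.2, 5.6, 7.1, 4.4, 5.7, 6.0]
pval = permutation_fraction(a, b)
print("approximate fraction:", pval)
```

Because the invented pairs span a wide range and track one another closely, re-paired data rarely match the observed width; over a narrow range, as in the m = 0 example, the fraction would be large.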
Saugel et al.6 have pointed out a subtle but extremely important and underappreciated difference in 2 frequently used statistical techniques. They have also helped clarify a concept that has clearly confused many (the original paper describing the polar plotting technique states that “agreement is shown by the angle the vector makes with the line of identity [y = x] and magnitude of change by the length of the vector (Fig. 4),” yet in the fifth published figure, and the Appendix, the average value is used, not the length of the vector).5 It is our hope that standardizing the presentation of data will improve the inferences derived from these important studies.
Name: Robert H. Thiele, MD.
Contribution: This author helped design the study and write the manuscript.
Attestation: Robert H. Thiele approved the final manuscript.
Name: Timothy L. McMurry, PhD.
Contribution: This author helped design the study and write the manuscript.
Attestation: Timothy L. McMurry approved the final manuscript.
This manuscript was handled by: Franklin Dexter, MD, PhD.
1. Altman DG. Statistics and ethics in medical research: V–analysing data. Br Med J. 1980;281:1473–5
2. Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician. 1983;32:307–17
3. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–10
4. Perrino AC Jr, O’Connor T, Luther M. Transtracheal Doppler cardiac output monitoring: comparison to thermodilution during noncardiac surgery. Anesth Analg. 1994;78:1060–6
5. Critchley LA, Lee A, Ho AM. A critical review of the ability of continuous cardiac output monitors to measure trends in cardiac output. Anesth Analg. 2010;111:1180–92
6. Saugel B, Grothe O, Wagner JY. Tracking changes in cardiac output: statistical considerations on the 4-quadrant plot and the polar plot methodology. Anesth Analg. 2015;121:514–24
7. Maus TM, Reber B, Banks DA, Berry A, Guerrero E, Manecke GR. Cardiac output determination from endotracheally measured impedance cardiography: clinical evaluation of endotracheal cardiac output monitor. J Cardiothorac Vasc Anesth. 2011;25:770–5
8. van der Kleij SC, Koolen BB, Newhall DA, Gerritse BM, Rosseel PM, Rijpstra TA, Geisler FE, van der Meer NJ. Clinical evaluation of a new tracheal impedance cardiography method. Anaesthesia. 2012;67:729–33
9. Suehiro K, Tanaka K, Mikawa M, Uchihara Y, Matsuyama T, Matsuura T, Funao T, Yamada T, Mori T, Nishikawa K. Improved performance of the fourth-generation FloTrac/Vigileo system for tracking cardiac output changes. J Cardiothorac Vasc Anesth. 2015;29:656–62
10. Critchley LA, Critchley JA. A meta-analysis of studies using bias and precision statistics to compare cardiac output measurement techniques. J Clin Monit Comput. 1999;15:85–91
11. Møller-Sørensen H, Hansen KL, Østergaard M, Andersen LW, Møller K. Lack of agreement and trending ability of the endotracheal cardiac output monitor compared with thermodilution. Acta Anaesthesiol Scand. 2012;56:433–40
12. Cooper ES, Muir WW. Continuous cardiac output monitoring via arterial pressure waveform analysis following severe hemorrhagic shock in dogs. Crit Care Med. 2007;35:1724–9
13. Hadian M, Kim HK, Severyn DA, Pinsky MR. Cross-comparison of cardiac output trending accuracy of LiDCO, PiCCO, FloTrac and pulmonary artery catheters. Crit Care. 2010;14:R212
14. Cecconi M, Dawson D, Casaretti R, Grounds RM, Rhodes A. A prospective study of the accuracy and precision of continuous cardiac output monitoring devices as compared to intermittent thermodilution. Minerva Anestesiol. 2010;76:1010–7
15. Sokolski M, Rydlewska A, Krakowiak B, Biegus J, Zymlinski R, Banasiak W, Jankowska EA, Ponikowski P. Comparison of invasive and non-invasive measurements of haemodynamic parameters in patients with advanced heart failure. J Cardiovasc Med (Hagerstown). 2011;12:773–8
16. Preiss D, Fisher J. A measure of confidence in Bland-Altman analysis for the interchangeability of two methods of measurement. J Clin Monit Comput. 2008;22:257–9
17. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–60
18. Myles PS, Cui J. Using the Bland-Altman method to measure agreement with repeated measures. Br J Anaesth. 2007;99:309–11
19. Dexter F, Epstein RH, Bayman EO, Ledolter J. Estimating surgical case durations and making comparisons among facilities: identifying facilities with lower anesthesia professional fees. Anesth Analg. 2013;116:1103–15